Pulse—keeping a check on our services

Developers take pride in their services’ uptime. They want to know when services go down or become slow to respond.

At Slice, we use Elastic Heartbeat to monitor the uptime of internal services and alert us when they go down. Heartbeat has a nifty dashboard (through Kibana) that displays the uptime of all the services it is monitoring.

Heartbeat in a nutshell

You install Heartbeat on a server and configure the services you want it to monitor. We have a configuration file for each of our services.

A sample Heartbeat configuration file in YAML:

- type: http
  id: foo
  name: bar
  enabled: true
  urls: ["https://foo.sliceit.com/ping"]
  schedule: '@every 30s'

With the above, Heartbeat will ping foo.sliceit.com/ping every 30 seconds.

Every service that we want Heartbeat to monitor maps to one configuration file of the above format. 

Whenever we design systems at Slice, one of our guiding principles is to make them easy for everyone to use.

In line with this principle, we wanted to give our developers the path of least resistance to monitoring their services’ uptime. We asked ourselves: how can we make it easy for Slice developers to add their services to Heartbeat monitoring?

Pulse is the framework that we came up with.

Pulse

Developers write Heartbeat configuration files for their services and commit them to a GitHub repo. The check-in triggers a CI/CD workflow (through AWS CodeBuild) that syncs these files to an AWS S3 bucket. On the Heartbeat server, we have a cron job that periodically syncs the configuration files from the S3 bucket to a local directory. We have configured Heartbeat to look for configuration files in this directory.
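To make the repo-to-S3 step concrete, here is a minimal sketch of what a CodeBuild buildspec for it could look like. It is an illustration, not our actual build file: the `monitors/` folder and the `example-pulse-configs` bucket are hypothetical names, and it assumes the CodeBuild role is allowed to write to the bucket.

```yaml
# buildspec.yml (illustrative sketch; folder and bucket names are hypothetical)
version: 0.2

phases:
  build:
    commands:
      # Copy only the YAML monitor files from the repo to the bucket.
      # --delete removes objects from the bucket when a file is removed from the repo.
      - aws s3 sync monitors/ s3://example-pulse-configs/monitors/ --exclude "*" --include "*.yml" --delete
```

The cron job on the Heartbeat server would then run the mirror-image command, syncing from the bucket into the local directory that Heartbeat reads.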

With Pulse in place, all a developer has to do to monitor a new service is check in its Heartbeat configuration file to the Pulse GitHub repo. After a minute or so, the service starts appearing in the Heartbeat monitoring dashboard on Kibana. We have integrated alerting with Heartbeat using Elastic Watcher to notify us of service downtimes.
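For new files to show up without restarting Heartbeat, the server can use Heartbeat's dynamic monitor reloading. The fragment below is a sketch of that setting in `heartbeat.yml`. The directory path and the reload period are assumptions chosen for illustration, not our production values.

```yaml
# heartbeat.yml (illustrative sketch; path and period are assumed values)
heartbeat.config.monitors:
  # Directory that the cron job syncs the S3 bucket into (hypothetical path)
  path: /etc/heartbeat/monitors.d/*.yml
  # Pick up added, changed, or removed monitor files without a restart
  reload.enabled: true
  # How often Heartbeat re-scans the directory
  reload.period: 30s
```

With a setup like this, the delay before a service appears on the dashboard is roughly the build time plus the cron interval plus the reload period.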

Pulse has given us visibility into the uptime of our services, which helps us keep Slice snappy and reliable.


Heartbeat dashboard image from Elastic Heartbeat website.
