In this post, we will walk you through how we replaced our cron jobs with Hercules—a job scheduling framework that we developed internally.
If working on such problems excites you, our infrastructure engineering team is hiring. If you care about developer productivity and tooling, check out the job description and apply here.
Hassles of cron jobs
Deploying a recurring job through cron is a pain. You ssh into a server, then figure out the cron expression and the permissions. You worry about how you will find out if your job fails. You fret about your job’s log management and log shipping. You think about the cost: why pay for a server that is in use for only a couple of hours a day and sits idle the rest of the time? You get anxious about some other cron job on the server consuming all the CPU and memory, starving your job of resources and killing it.

As a developer, you want to concentrate on writing the code and let someone else manage all the mundane housekeeping stuff for you. That someone for us is Hercules—our job scheduling framework.
We used to rely on good old cron to schedule recurring workloads. As Slice grew, so did the number of jobs. Beyond a certain point, scheduling jobs through cron became a hassle.
Capacity planning
When you have a team of developers scheduling jobs through cron, beyond a certain scale the jobs start trampling on each other. One job might be running and consuming a significant portion of the CPU and memory when the cron scheduler kicks off another. The two then compete for the same resources, and you end up with erratic job failures.
Capacity planning and efficient utilization of hardware are problematic with cron jobs. When the jobs are not running, the servers sit idle, but you still pay for them. At the same time, if multiple jobs run simultaneously, you need to provision the hardware for peak usage.
Observability
Observability is another challenge with cron jobs.
You do not have a single view of all the jobs (spanning machines) and their execution history. Cron does not give you the ability to monitor job execution or alert on job failures.
Creating and maintaining a consistent execution environment for cron jobs is also problematic when the jobs depend on a specific directory structure, files, or other dependencies.
Hercules
To sidestep all the above challenges, we developed Hercules, a container job scheduling framework. We wanted to keep operational overhead to a minimum, so we built Hercules on top of AWS Fargate, a serverless compute engine for containers.
Developers package their jobs as Docker container images and push these images to AWS ECR; we have automated this as part of our build pipeline.
They create a schedule.json file with:
- Docker image URL
- CPU and memory requirements
- Docker run directive
- Schedule (cron or rate expression)
An example schedule.json file is shown below:
{
  "image": "56789076.dkr.ecr.ap-south-1.amazonaws.com/foo", // image
  "name": "foo", // name
  "cpu": 512, // CPU
  "memory": 1024, // memory
  "command": ["./foo"], // run directive
  "scheduleExpression": "rate(5 minutes)" // schedule
}
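To make the shape of this file concrete, here is a minimal Python sketch of how a schedule.json could be loaded and sanity-checked before scheduling. It is an illustration, not Hercules's actual code: the `JobSpec` class, the `parse_schedule` function, and the validation rules are all hypothetical. It also assumes the `//` annotations in the example above are explanatory only, since standard JSON does not allow comments.

```python
import json
import re
from dataclasses import dataclass

# Assumed patterns for the two schedule styles mentioned above (rate or cron).
# The checks Hercules really performs are not described in this post.
RATE_RE = re.compile(r"^rate\(\d+ (minute|minutes|hour|hours|day|days)\)$")
CRON_RE = re.compile(r"^cron\(.+\)$")

REQUIRED_FIELDS = ("image", "name", "cpu", "memory", "command", "scheduleExpression")


@dataclass(frozen=True)
class JobSpec:
    """Hypothetical in-memory form of one schedule.json file."""
    name: str
    image: str
    cpu: int
    memory: int
    command: tuple
    schedule_expression: str


def parse_schedule(text: str) -> JobSpec:
    """Parse the JSON text of a schedule.json file and validate its fields."""
    raw = json.loads(text)

    missing = [key for key in REQUIRED_FIELDS if key not in raw]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))

    if not isinstance(raw["command"], list) or not raw["command"]:
        raise ValueError("command must be a non-empty list")

    expr = raw["scheduleExpression"]
    if not (RATE_RE.match(expr) or CRON_RE.match(expr)):
        raise ValueError("unsupported schedule expression: " + expr)

    return JobSpec(
        name=raw["name"],
        image=raw["image"],
        cpu=int(raw["cpu"]),
        memory=int(raw["memory"]),
        command=tuple(raw["command"]),
        schedule_expression=expr,
    )
```

Rejecting a malformed file at build time, rather than when the job first runs, keeps the feedback loop short for the developer who wrote it.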
We have integrated Hercules with our build pipeline. During a build, Hercules scans for schedule.json files in the build artifact. Hercules creates a graph of all the jobs specified in the build, queries Fargate for the existing jobs, creates a diff between the two, and schedules jobs accordingly—adding new jobs, modifying existing jobs, and removing jobs not needed anymore.
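The diffing step can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the real implementation: it represents both the jobs declared in the build and the jobs that already exist as plain dictionaries keyed by job name, whereas Hercules builds a graph. The names `Plan` and `plan_changes` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Names of jobs to create, modify, and delete (hypothetical structure)."""
    to_add: list = field(default_factory=list)
    to_update: list = field(default_factory=list)
    to_remove: list = field(default_factory=list)


def plan_changes(desired: dict, existing: dict) -> Plan:
    """Compare declared jobs with scheduled jobs.

    Both arguments map a job name to its definition (a dict of settings).
    """
    plan = Plan()
    for name in sorted(desired):
        if name not in existing:
            plan.to_add.append(name)        # declared but not yet scheduled
        elif desired[name] != existing[name]:
            plan.to_update.append(name)     # scheduled, but the definition changed
    for name in sorted(existing):
        if name not in desired:
            plan.to_remove.append(name)     # scheduled but no longer declared
    return plan


if __name__ == "__main__":
    desired = {
        "foo": {"cpu": 512, "scheduleExpression": "rate(5 minutes)"},
        "bar": {"cpu": 256, "scheduleExpression": "rate(1 hour)"},
    }
    existing = {
        "foo": {"cpu": 256, "scheduleExpression": "rate(5 minutes)"},
        "baz": {"cpu": 256, "scheduleExpression": "rate(1 day)"},
    }
    print(plan_changes(desired, existing))
```

Because the plan is computed from the declared state rather than from a list of commands, running the same build twice yields an empty plan the second time.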

Notifications are first-class citizens in Hercules. If a job fails prematurely, Hercules triggers an alert to a Slack channel.
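As a rough illustration of what such an alert might involve, the sketch below turns a task-stopped event into a Slack message payload. How Hercules actually detects failures is not covered in this post, so everything here is an assumption: the function name `failure_alert` is hypothetical, the input is assumed to look like an ECS "Task State Change" event, and the output is the simple `{"text": ...}` body that Slack incoming webhooks accept.

```python
from typing import Optional


def failure_alert(event: dict) -> Optional[dict]:
    """Return a Slack message payload if the task failed, otherwise None.

    Assumes `event` resembles an ECS task state change event, with the
    task's status and container exit codes under the "detail" key.
    """
    detail = event.get("detail", {})
    if detail.get("lastStatus") != "STOPPED":
        return None  # the task is still pending or running

    # A container with a non-zero exit code, or with no exit code at all
    # (for example, it never started), is treated as failed.
    failed = [c for c in detail.get("containers", []) if c.get("exitCode") != 0]
    if not failed:
        return None

    names = ", ".join(c.get("name", "?") for c in failed)
    reason = detail.get("stoppedReason", "unknown")
    return {"text": f"Job failed: {names} (reason: {reason})"}
```

The returned dictionary could then be sent as JSON in an HTTP POST to a Slack incoming webhook URL. Keeping the decision logic separate from the network call makes it easy to test without touching Slack.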
Hercules relays job execution logs to CloudWatch and our internal Elasticsearch cluster for easy search and analysis.
Conclusion
With Hercules bearing the burden of scheduling jobs and taking care of all the mundane stuff, developers can concentrate on writing their code and let Hercules do the rest. They no longer have to ssh into servers, deploy their jobs by hand, or worry about CPU and memory requirements and job failures.
Hercules has made scheduling jobs at Slice a declarative, productive, and hassle-free process.