Deployments using Immutable Infrastructure

Don’t you feel like pulling out your hair when your code works in testing but not in production? Don’t you just hate it when you face downtime due to faulty deployments that are out of your control? We do too. Read on to find out how we resolved this nerve-wracking problem.

What are mutable and immutable infrastructure?

As the name suggests, mutable infrastructure is infrastructure that changes (mutates) incrementally over time. Suppose you have a server (a bare-metal machine or a virtual machine) with your web application deployed on it. As you add exciting new features to your app, you repeatedly SSH into the machine, pull the latest code, install new dependencies, and restart your application. You can do this manually or automate it, but either way it is mutable because you are modifying the existing machine.

On the other hand, immutable infrastructure never changes once it is deployed. If you want to deploy a new version of your app, you tear down the old infrastructure and create new infrastructure in its place. As simple as that.

Why bother with immutable infrastructure?

Consider all the steps involved in updating a server using the mutable approach: multiple network calls to GitHub (that surely can’t fa– 😞 ), downloading dependencies, installing dependencies, and so on. If you’ve ever tried doing this manually, you need not be told that things can go wrong. And if you remember good old Murphy’s Law: anything that can go wrong will go wrong.

You might be thinking, “Sure, it can go wrong, but it rarely does”. You are right; it does work 99% of the time. But that is exactly what makes it so difficult to debug and fix things when they do go wrong. As you scale, you increase the number of machines you deploy to, and with it the magnitude of the problem. On a larger fleet, things can go wrong in many more ways: an intermediate step can fail and leave you with a half-cooked update. Instead of going from version 1.0 to version 2.0, you might end up at version 1.5 or 1.8.

With immutable infrastructure, you always deploy exactly what you tested. There is no intermediate state your server might end up in: it is either the previous version or the tested new version. If you use a template to launch your machines, such as an AWS AMI, you can replicate a tried-and-tested configuration thousands of times, identically every time. What’s more, it gives you peace of mind.

What tools did we use for our immutable infrastructure deployments?

Finally, a short section on how we did it. We use TypeScript with Node.js for our backend application. We generate an AMI containing the new version of our software using AWS CodeBuild integrated with GitHub and HashiCorp Packer. After we test this AMI and verify it works, we update the launch template for our AWS Auto Scaling group with this AMI. In the end, we trigger an Instance Refresh for the auto-scaling group. Easy and safe!

  • AWS CodeBuild
  • HashiCorp Packer
  • AWS AMI
  • AWS Launch Template
  • AWS Auto Scaling Group

How to do it?

The following instructions are specific to AWS. However, the concepts can be translated to other cloud providers as well.

Step 1: Building an AMI using HashiCorp Packer

Packer is a free tool from HashiCorp. You can use it to build an AMI by providing a JSON template that describes how the AMI should be built. You can then execute it with the following commands:

packer validate template.json # check if the template is well formed
packer build template.json

This will start the execution of Packer which performs the following steps:

  1. It will provision a new EC2 instance based on a source_ami provided by you and wait for the instance to become available.
    Packer gives you options to specify the AWS region, VPC, and SSH key pair for this new instance. It will use sensible defaults if these are not provided.
  2. It will perform the steps as detailed by you to convert that new instance into your desired server configuration.
    You have many options here. You can have Packer SSH into that machine and execute shell commands, or have Packer SCP files onto it. The Packer provisioners documentation lists all the options.
  3. It will save the state of that instance as an AMI and wait for the AMI to become available and then exit.
    You have to provide a name for the AMI, and you can have Packer add tags to it so you can easily identify the AMI later.

Here is a sample Packer template file:

{
	"variables": {
		"aws_region": "{{env `AWS_REGION`}}",
		"aws_ami_name": "<unique-ami-name>",
		"source_ami": "{{env `SOURCE_AMI`}}"
	},

	"builders": [
		{
			"type": "amazon-ebs",
			"region": "{{user `aws_region`}}",
			"instance_type": "t3.medium",
			"ssh_username": "ubuntu",
			"ami_name": "{{user `aws_ami_name`}}",
			"ami_description": "project build ami for production",
			"associate_public_ip_address": "true",
			"source_ami": "{{user `source_ami`}}",
			"tags": {
				"branch": "master",
				"timestamp": "{{timestamp}}"
			}
		}
	],

	"provisioners": [
		{
			"type": "shell",
			"inline": [
				"git clone <github-repo>",
				"cd <project-dir>",
				"git checkout master",
				"npm run build",
				"echo built project",
				"sudo systemctl enable service"
			]
		}
	]
}

You can use this template as a starting point. For more details, see the Packer documentation for the amazon-ebs builder.

Step 2: Create an AWS CodeBuild pipeline to run Packer

The AWS documentation has detailed instructions on how to create an AWS CodeBuild project which will run the Packer template you created in the previous step.
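CodeBuild reads its build steps from a buildspec file in the repository. The snippet below is a minimal, hypothetical buildspec, not our exact configuration; the Packer version and file names are placeholders you would adapt:

```yaml
version: 0.2

phases:
  install:
    commands:
      # Download and install Packer (version is a placeholder)
      - curl -fsSL -o packer.zip https://releases.hashicorp.com/packer/1.7.8/packer_1.7.8_linux_amd64.zip
      - unzip -o packer.zip -d /usr/local/bin
  build:
    commands:
      # Validate, then build the AMI from the template in the repo
      - packer validate template.json
      - packer build template.json
```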

Note: When you build new AMIs for a long-running branch, you will end up with an ever-growing store of AMIs. To bound this, you can keep only the latest 5 or 10 images, which means deleting an old AMI whenever you create a new one. Below is a shell script that does that:

#!/bin/bash
set -e
# Note: hyphens are not valid in shell variable names, so use ami_list
ami_list=$( /usr/local/bin/aws ec2 describe-images --filters "Name=tag:branch,Values=$1" --query 'Images[*].[CreationDate,ImageId]' --output json | jq '. | sort_by(.[0])' )
numImages=$( echo "$ami_list" | jq '. | length' )
maxImages=$2
toDelete="$((numImages - maxImages))"
echo "Images to delete: $toDelete"
while [ "$toDelete" -gt 0 ]
do
	index="$((toDelete - 1))"
	ami_id=$( echo "$ami_list" | jq -r ".[$index][1]" )
	echo "deleting $ami_id"
	bash -e ./delete_ami.sh "$ami_id"
	toDelete="$((toDelete - 1))"
	echo "deleted $ami_id"
done
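You can sanity-check the pruning logic without touching AWS by feeding the same jq pipeline a mock image list. This is a self-contained sketch (the image IDs are made up); with seven images and a limit of five, the two oldest are selected:

```shell
#!/bin/bash
set -e
# Mock output of `aws ec2 describe-images`, already in [CreationDate, ImageId] form
mock='[["2021-03-01T00:00:00Z","ami-ccc"],["2021-01-01T00:00:00Z","ami-aaa"],["2021-02-01T00:00:00Z","ami-bbb"],["2021-04-01T00:00:00Z","ami-ddd"],["2021-05-01T00:00:00Z","ami-eee"],["2021-06-01T00:00:00Z","ami-fff"],["2021-07-01T00:00:00Z","ami-ggg"]]'
ami_list=$( echo "$mock" | jq '. | sort_by(.[0])' )   # oldest first
numImages=$( echo "$ami_list" | jq '. | length' )
maxImages=5
toDelete="$((numImages - maxImages))"
while [ "$toDelete" -gt 0 ]
do
	index="$((toDelete - 1))"
	ami_id=$( echo "$ami_list" | jq -r ".[$index][1]" )
	echo "would delete $ami_id"   # prints ami-bbb, then ami-aaa
	toDelete="$((toDelete - 1))"
done
```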

The delete_ami.sh script can be found below. Remember, if you are using an EBS-backed AMI, deregistering the AMI isn’t enough; you also need to delete the underlying EBS snapshots.

#!/bin/bash
set -e
ami_id=$1
temp_snapshot_id=""
# shellcheck disable=SC2039
ebs_array=( $(/usr/local/bin/aws ec2 describe-images --image-ids "$ami_id" --output text --query "Images[*].BlockDeviceMappings[*].Ebs.SnapshotId") )
ebs_array_length=${#ebs_array[@]}
echo "Deregistering AMI: $ami_id"
/usr/local/bin/aws ec2 deregister-image --image-id "$ami_id" > /dev/null
echo "Removing Snapshot"
for (( i=0; i<$ebs_array_length; i++ ))
do
	temp_snapshot_id=${ebs_array[$i]}
	echo "Deleting Snapshot: $temp_snapshot_id"
	/usr/local/bin/aws ec2 delete-snapshot --snapshot-id "$temp_snapshot_id" > /dev/null
done

Final step: Use the newly created AMI to update your auto-scaling group

This part assumes that you already have an auto-scaling group running which uses a launch template. If not, the AWS Auto Scaling documentation is a good place to start.

  1. Get the latest AMI id for the branch you need to deploy.
#!/bin/bash
set -e
ami_list=$( /usr/local/bin/aws ec2 describe-images --filters "Name=tag:branch,Values=$1" --query 'Images[*].[CreationDate,ImageId]' --output json | jq '. | sort_by(.[0])' )
numImages=$( echo "$ami_list" | jq '. | length' )
latest_ami_id=$( echo "$ami_list" | jq -r ".[$((numImages - 1))][1]" )
echo "$latest_ami_id" > latest_ami_id   # overwrite, not append, so the file holds only the latest id

The above shell script fetches the list of AMIs for the branch you specify by filtering on the tags set in your Packer template. It then sorts the list by creation time, picks the latest, and writes its AMI id to a file.
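Sorting by CreationDate works because ISO-8601 timestamps sort correctly as plain strings. A quick check with mock data (the image IDs are made up) shows the newest image landing last:

```shell
#!/bin/bash
set -e
# Two mock [CreationDate, ImageId] pairs, deliberately out of order
latest=$( echo '[["2021-02-11T09:30:00Z","ami-new"],["2021-01-05T10:00:00Z","ami-old"]]' \
	| jq -r 'sort_by(.[0]) | last | .[1]' )
echo "$latest"   # prints: ami-new
```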

2. Create a new version for your launch template.

Note: there is a limit on how many launch template versions you can have, so it is a good idea to bound this number too. A good limit would be the same one you put on your AMIs, as launch template versions map one-to-one to AMIs.

#!/bin/bash
set -e
launch_template_id=$1
ami_id=$2
versions_to_keep=$3
if [ "$versions_to_keep" -lt 2 ]
	then
		echo "Must keep at least 2 versions of launch template"
		exit 1
	else
		echo "keeping $versions_to_keep versions of the launch template"
fi
new_version=$( /usr/local/bin/aws ec2 create-launch-template-version --launch-template-id "$launch_template_id" --source-version 1 --launch-template-data "{\"ImageId\":\"$ami_id\"}" )
version_number=$( echo "$new_version" | jq '.LaunchTemplateVersion.VersionNumber' )
if [ "$version_number" -gt "$((versions_to_keep + 1))" ]
	then
		version_to_delete=$((version_number - versions_to_keep))
		out=$( /usr/local/bin/aws ec2 delete-launch-template-versions --launch-template-id "$launch_template_id" --versions "$version_to_delete" )
	else
		echo "Not deleting any launch template version"
fi
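To see how this retention rule behaves, it can be factored into a tiny function and exercised without any AWS calls (the function name is ours, for illustration). With versions_to_keep=5, nothing is deleted until version 7 exists, at which point version 2 rolls off; version 1 always survives because it is the base for new versions:

```shell
#!/bin/bash
# Hypothetical helper mirroring the retention arithmetic above
version_to_delete() {
	local version_number=$1 versions_to_keep=$2
	if [ "$version_number" -gt "$((versions_to_keep + 1))" ]
		then
			echo "$((version_number - versions_to_keep))"
		else
			echo "none"
	fi
}

version_to_delete 6 5   # prints: none (still within the limit)
version_to_delete 7 5   # prints: 2 (version 2 rolls off; version 1 stays)
```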

3. Start ASG instance refresh.

#!/bin/bash
set -e
config_file=$1
asg_name=$2
out=$( /usr/local/bin/aws autoscaling start-instance-refresh --cli-input-json "file://$config_file" )
interval=15
((end_time=SECONDS+3600+interval))   # poll for up to an hour
while ((SECONDS < end_time))
do
  status=$( /usr/local/bin/aws autoscaling describe-instance-refreshes --max-records 1 --auto-scaling-group-name "$asg_name" | jq -r '.InstanceRefreshes[0].Status' )
  echo "Instance refresh $status"
  if [ "$status" = "Successful" ]
  	then
  		echo "ASG updated"
  		exit 0
  fi
  if [ "$status" = "Failed" ]
  	then
  		echo "ASG update failed"
  		exit 1
  fi
  if [ "$status" = "Cancelled" ]
  	then
  		echo "ASG update cancelled"
  		exit 1
  fi
  sleep ${interval}
done

echo "ASG update exceeded timeout"
exit 1

This is a sample config file for the ASG instance refresh (note that InstanceWarmup is in seconds):

{
	"AutoScalingGroupName": "my_asg",
	"Preferences": {
		"InstanceWarmup": 5,
		"MinHealthyPercentage": 50
	}
}

Voila! We now have an immutable deployment pipeline.