Blue/Green Deployments in AWS Fargate with Automated Testing and Rollbacks

Introduction

AWS CodeDeploy makes it easy to set up Blue/Green deployments for your containerised applications running in AWS Fargate. In this article, I will show how you can configure CodeDeploy and Fargate to allow automated testing of your deployments before they receive production traffic. Additionally, I will show how you can configure automatic rollbacks if your application generates errors after receiving production traffic.

Design

For this demonstration, our container application will be a simple Apache web server. An application load balancer will route production traffic to the containers. Our Docker code will be stored in an AWS CodeCommit repository. AWS CodeBuild will be used to build the Docker image, and AWS CodeDeploy will of course be used to perform the deployments. We will use AWS CodePipeline to wrap the build and deploy stages into a deployment pipeline. The diagram below represents our design.

Blue/Green Deployment Pipeline Design

During a deployment, the new v2 code is launched in a second set of one or more containers. These new containers are registered with the “green” target group. The green target group is attached to a test listener on the application load balancer (port 8080 in this demonstration). We then run our tests against the test listener. When testing is complete, we signal the deployment to continue, at which point the live listener (port 80) is switched to the green target group. The security group rules for our load balancer only allow ingress on port 8080 from within our VPC, which prevents end users from accessing the release prematurely.

As we will see later, CodeDeploy automatically handles the registration of containers to the blue/green target groups and also the registration of listeners to target groups.
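To make the traffic shift concrete, here is a toy model of the two listeners and the two target groups. It is only a sketch of the sequence described above, not how CodeDeploy is implemented; the class and function names are invented for illustration, and I am assuming both listeners point at the blue target group before the deployment starts.

```python
from dataclasses import dataclass, field

PROD_PORT = 80    # live listener, as in this demonstration
TEST_PORT = 8080  # test listener, as in this demonstration


@dataclass
class LoadBalancer:
    """Toy model: maps each listener port to the target group it forwards to."""
    # Assumption: both listeners forward to "blue" before a deployment.
    listeners: dict = field(
        default_factory=lambda: {PROD_PORT: "blue", TEST_PORT: "blue"}
    )

    def route_test_traffic(self, target_group: str) -> None:
        self.listeners[TEST_PORT] = target_group

    def route_production_traffic(self, target_group: str) -> None:
        self.listeners[PROD_PORT] = target_group


def deploy(alb: LoadBalancer, tests_pass: bool) -> str:
    """Walk through the traffic shift and return the group serving production."""
    # The new containers become reachable on the test listener only.
    alb.route_test_traffic("green")
    if tests_pass:
        # Tests passed, so the live listener is cut over to the new containers.
        alb.route_production_traffic("green")
    return alb.listeners[PROD_PORT]
```

Calling `deploy(LoadBalancer(), tests_pass=False)` leaves production on the blue target group, which is the behaviour we rely on later when a test fails.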

Prerequisites

The resources deployed in this solution are described with Terraform, an infrastructure-as-code tool. Install the latest version of the Terraform CLI.

Next, ensure you have the git-remote-codecommit utility installed. Most often this can be installed with:

sudo pip install git-remote-codecommit

Terraform

The Terraform code at aw5academy/terraform/ecs-blue-green-demo can be used to provision the resources we need for this demonstration. Deploy this code to your environment by running:

git clone https://gitlab.com/aw5academy/terraform/ecs-blue-green-demo.git
cd ecs-blue-green-demo/
terraform init
terraform apply

Output From Terraform Apply

Note the “alb_dns_name” output, as we will need this value later.

Docker

We now need to push our Docker code to the CodeCommit repository created by Terraform. Run the following commands to set it up:

git clone codecommit::us-east-1://ecs-blue-green-demo codecommit
git clone https://gitlab.com/aw5academy/docker/ecs-blue-green-demo.git
cp -r ecs-blue-green-demo/* codecommit/
cd codecommit/
git add .
git commit -m "v1"
git push origin master

CodePipeline

If you open the AWS Console and navigate to the CodePipeline service, you will see that the “ecs-blue-green-demo” pipeline has started due to our commit to the CodeCommit repository. Wait for the pipeline to complete our first deployment.

CodePipeline Successful Release

Now let’s check that our application is working by opening the “alb_dns_name” Terraform output from earlier in our browser.

Application Response

Great! We have a working application.

CodeDeploy Hooks

Hooks are a feature of CodeDeploy which allow you to perform actions at certain points in a deployment before the deployment continues to the next step. The hooks for ECS/Fargate are listed in the AWS CodeDeploy documentation. The hook we are most interested in is “AfterAllowTestTraffic”. We want to run tests during this phase of the deployment to validate our release before sending production traffic to it. To do this, we add an AWS Lambda function reference to our appspec.yaml. This Lambda function (source code at aw5academy/terraform/ecs-blue-green-demo/lambda-src/deploy-hook/lambda_function.py) writes the hook details to an Amazon S3 bucket for a CodeBuild project to reference. This CodeBuild project (source code at aw5academy/docker/ecs-blue-green-demo/test.sh) runs in parallel with our CodeDeploy deployment in our pipeline and performs our tests during the “AfterAllowTestTraffic” stage.
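The hand-off between the hook and the test project can be sketched roughly as follows. This is not the code from the repositories above, just an illustration under some assumptions: the bucket name, object key and helper names are placeholders, and I am assuming the test side reports its result back with the CodeDeploy `put_lifecycle_event_hook_execution_status` call. The `DeploymentId` and `LifecycleEventHookExecutionId` fields are what CodeDeploy passes to a lifecycle hook Lambda.

```python
import json


def build_hook_record(event: dict) -> str:
    """Serialise the two identifiers CodeDeploy passes to a lifecycle hook Lambda."""
    return json.dumps(
        {
            "deploymentId": event["DeploymentId"],
            "lifecycleEventHookExecutionId": event["LifecycleEventHookExecutionId"],
        }
    )


def lambda_handler(event, context, s3_client=None,
                   bucket="example-hook-bucket", key="hook.json"):
    """Write the hook details to S3 so that a separate test job can pick them up."""
    # bucket and key are hypothetical placeholders, not the values used in the demo.
    if s3_client is None:
        import boto3  # imported lazily so the helpers can be exercised without boto3
        s3_client = boto3.client("s3")
    body = build_hook_record(event).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body)


def report_result(codedeploy_client, record: str, passed: bool) -> None:
    """Tell CodeDeploy whether the hook succeeded, so the deployment continues or fails."""
    ids = json.loads(record)
    codedeploy_client.put_lifecycle_event_hook_execution_status(
        deploymentId=ids["deploymentId"],
        lifecycleEventHookExecutionId=ids["lifecycleEventHookExecutionId"],
        status="Succeeded" if passed else "Failed",
    )
```

The point of the split is that the Lambda returns quickly while the deployment stays paused at “AfterAllowTestTraffic” until something reports a status for that hook execution.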

Automated Testing

Let’s test our deployment process by deliberately introducing an error. If you examine our test script at aw5academy/docker/ecs-blue-green-demo/test.sh, you can see that we expect our application to return “Hello from v1”. So let’s break this by changing the application to return “Hello from v2” instead. Run the following commands from the CodeCommit checkout to do this:

sed -i "s,Hello from v1,Hello from v2,g" start.sh
git commit -a -m "v2"
git push origin master

This action will automatically trigger our pipeline, and if you navigate to the CodeDeploy service in the AWS Console, you can follow the deployment when it starts. After some time, you should see a failure on the “AfterAllowTestTraffic” stage, as we expected.

CodeDeploy Failure

When we check the CodeBuild logs for our test project we can see the problem. As we noted, our tests still expect the application to respond with “Hello from v1”.

CodeBuild Error Logs
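The check that failed here amounts to fetching the page from the test listener and comparing it with the expected text. The real test is the shell script test.sh mentioned above; the following is only a rough Python equivalent, with function names of my own choosing and the assumption that the application serves its greeting at the root path.

```python
import urllib.request


def response_matches(body: str, expected: str) -> bool:
    """Return True when the expected greeting appears in the response body."""
    return expected in body


def check_test_listener(alb_dns_name: str, expected: str,
                        port: int = 8080, timeout: float = 10.0) -> bool:
    """Fetch the root page from the test listener and compare it with `expected`."""
    # Port 8080 is the test listener in this demonstration; it is only
    # reachable from inside the VPC, so this must run from within it.
    url = f"http://{alb_dns_name}:{port}/"
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        body = resp.read().decode("utf-8", errors="replace")
    return response_matches(body, expected)
```

With the v2 application deployed, `response_matches("Hello from v2", "Hello from v1")` is false, which is the mismatch reported in the CodeBuild logs.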

CodeDeploy and CloudWatch Alarms

There is one more way we can validate our deployments. Suppose we would like to monitor our deployments for some time after we route production traffic to them, and roll back if we notice any issues. By combining CodeDeploy and CloudWatch Alarms, we can do this in an automated way.

AWS CodeDeploy allows you to retain the existing containers for a period of time after a deployment. In our demonstration, for simplicity, we have configured this to 5 minutes, but it can be many hours if you wish. With this setting and properly configured CloudWatch alarms, you can monitor your application post-deployment; if any of your alarms move into the alarm state during the retention time, CodeDeploy will automatically roll back to the previous version.
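In this demonstration these settings live in the Terraform code, but it can help to see them in the shape the CodeDeploy API uses. The helper below is a sketch that only builds the relevant fragment of a deployment group definition; the function name and the alarm name passed to it are hypothetical, and a real ECS deployment group needs further settings that are omitted here.

```python
def rollback_settings(alarm_name: str, retention_minutes: int = 5) -> dict:
    """Build the deployment group fragment for retention and alarm-based rollback."""
    return {
        "blueGreenDeploymentConfiguration": {
            # Keep the old (blue) tasks around for a while after traffic shifts.
            "terminateBlueInstancesOnDeploymentSuccess": {
                "action": "TERMINATE",
                "terminationWaitTimeInMinutes": retention_minutes,
            },
        },
        # Alarms that CodeDeploy watches during the deployment.
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": alarm_name}],
        },
        # Roll back when the deployment fails or a watched alarm fires.
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
    }
```

With boto3, a fragment like this could be passed as keyword arguments to the CodeDeploy client's `update_deployment_group` call alongside the application and deployment group names.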

In our demonstration, we have configured our Docker container to send the httpd access logs to a CloudWatch Logs group. A log metric filter will send a data point whenever our httpd access logs contain the string “ 404 ”, that is, whenever a request is made for something the server cannot find. Next, we have a CloudWatch alarm that will move into the alarm state when 1 or more data points are received from the log metric filter.
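The logic of the filter and alarm pair can be mimicked locally in a few lines. This sketch uses a plain substring test in place of the CloudWatch filter pattern and a simple threshold in place of the alarm, so it only approximates what the managed services do; the function names are mine.

```python
def count_404_datapoints(log_lines) -> int:
    """Count access log lines containing the literal string " 404 "."""
    return sum(1 for line in log_lines if " 404 " in line)


def alarm_state(datapoints: int, threshold: int = 1) -> str:
    """Return "ALARM" once the number of data points reaches the threshold."""
    return "ALARM" if datapoints >= threshold else "OK"
```

Feeding in one access log line with a 200 status and one with a 404 status yields a single data point, which is already enough to put the alarm into the alarm state with the threshold of 1 used here.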

In the next section, we will see how CodeDeploy works with this CloudWatch alarm to automatically roll back when needed.

Automated Rollbacks

Let’s go back and fix the error we introduced. In our CodeCommit checkout, run the following commands:

sed -i "s,Hello from v1,Hello from v2,g" test.sh
git commit -a -m "v2 -- fix test"
git push origin master

Our tests have been corrected to match the new response from our application. If you open the AWS CodeDeploy service you should see the deployment happening again. This time you will see that it proceeds past the “AfterAllowTestTraffic” stage and that production traffic has been routed to the new set of containers.

CodeDeploy Wait

We can verify by opening the URL from our Terraform “alb_dns_name” output.

Application Response

Our application has been fully released and is serving production traffic. Now let’s deliberately cause an error by generating a 404. You can do this by appending any random path to the end of our URL. As expected, we get a 404.

Application 404 Response

When we inspect our CloudWatch logs we can see the request in the access logs.

CloudWatch Logs 404 Error

Next, if we go back to CodeDeploy, we should see the alarm reported and a rollback being initiated.

CodeDeploy Alarm Rollback

Looks good! Now to confirm, we open our URL from the Terraform “alb_dns_name” output again to verify that the application has been rolled back to v1.

Application Response

Success!

Wrap-Up

I hope this article has demonstrated how powerful AWS CodeDeploy can be when configured with supporting services and features.

Ensure you clean up the resources created here by running the following from the root of your checkout of the Terraform code:

terraform init
terraform destroy
