Shared Chef Code For Amazon Machine Images and Docker Images

If you are building Docker images you might be using a Dockerfile to define the state of the image. For example:

FROM centos:7

RUN yum install -y httpd
RUN mkdir -p /var/www/html/images
COPY images/foo.png /var/www/html/images/foo.png
RUN chown apache: /var/www/html/images/foo.png

If you have been creating Amazon Machine Images (AMIs) you might be using Chef or some other configuration management tool.

A problem arises when you want to avoid duplicating code across the two. Let’s say you want to create a “baseline” image for your development teams to use. It will likely share the same configuration in terms of what packages you need to install and what security configurations are required. How do you avoid writing code in your Dockerfile and duplicating that for your AMIs?

One solution is to use Chef to define your configuration for both images. In this article I will show how you can develop AMIs and Docker images using Chef and Kitchen and build the images using Jenkins and Packer.

Workstation Setup

First, let’s set up our workstation. I highly recommend installing Windows Subsystem for Linux 2 (WSL 2). To do so, install Windows 10 build 18932 or later. Then from PowerShell run:

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux
Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform

Install the Ubuntu app from the Microsoft Store and from PowerShell run:

wsl --set-version Ubuntu 2

Install Docker Desktop, open the ‘WSL 2 Tech Preview’ section of Docker Desktop and click ‘Start’.

From your Ubuntu app, install Chef Workstation, which in addition to Chef provides the Kitchen tool and the ec2 and dokken drivers.

wget "" -O /tmp/chef.deb
sudo dpkg -i /tmp/chef.deb
rm -f /tmp/chef.deb


If you clone the repository and inspect its kitchen.yml file, you can see how Docker images and AMIs can be developed with the same Chef code.

driver:
  name: <%= ENV['KITCHEN_DRIVER'] || 'dokken' %>
  <% if ENV['KITCHEN_DRIVER'] == "dokken" %>
  chef_version: 15.3.14
  <% else %>
  associate_public_ip: false
  availability_zone: us-east-1a
  iam_profile_name: kitchen-instance
  instance_type: t3.micro
  interface: private
  region: us-east-1
  security_group_filter:
    tag:   'Name'
    value: 'kitchen-sg'
  subnet_filter:
    tag:   'Name'
    value: 'kitchen-subnet'
  tags:
    Name: baseline-testkitchen
  <% end %>

transport:
  <% if ENV['KITCHEN_DRIVER'] == "dokken" %>
  name: dokken
  <% else %>
  connection_retries: 5
  connection_timeout: 10
  name: ssh
  username: ec2-user
  <% end %>

provisioner:
  <% if ENV['KITCHEN_DRIVER'] == "dokken" %>
  name: dokken
  <% else %>
  client_rb:
    environment: build
  environments_path: environments
  name: chef_zero
  product_name: chef
  product_version: 15.3.14
  <% end %>

platforms:
  - name: amazon
    driver:
      <% if ENV['KITCHEN_DRIVER'] == "dokken" %>
      image: amazonlinux:2
      <% else %>
      image_id: ami-0ce71448843cb18a1
      block_device_mappings:
        - device_name: /dev/xvda
          ebs:
            volume_type: gp2
            volume_size: 8
            delete_on_termination: true
      <% end %>

suites:
  - name: default
    run_list:
      - recipe[baseline]
    attributes:
      docker: <% if ENV['KITCHEN_DRIVER'] == "dokken" %>true<% else %>false<% end %>

verifier:
  name: inspec
  inspec_tests:
    - path: test/default
    - path: test/<%= ENV['KITCHEN_DRIVER'] %>

We can use the KITCHEN_DRIVER environment variable to control whether we want to use the ec2 or dokken driver.

Rapid development of Docker images is possible here because Kitchen creates a running container and you may iteratively run ‘kitchen converge’ as you develop your cookbooks.


If you clone the repository you will see a Jenkins pipeline file (Jenkinsfile). This pipeline has a ‘berks’ stage to package the cookbooks and then parallel stages for the Docker build and the AMI build. Both images are built with Packer…

./packer build docker.json
./packer build ec2.json

The configuration values for these ‘docker.json’ and ‘ec2.json’ Packer files are extracted from the kitchen.yml of the cloned cookbook. The benefit of this is that your kitchen.yml contains all of the information needed to develop Docker images and AMIs in Kitchen and to build the images using Packer.
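As a rough illustration of that extraction step (not the actual scripts from the repository; the function and key names here are assumptions), mapping values from a parsed kitchen.yml into Packer template dicts might look like:

```python
import json

# Hypothetical sketch: the key names mirror the kitchen.yml shown above,
# but the repository's real extraction scripts may differ.
def packer_from_kitchen(kitchen):
    driver = kitchen["driver"]
    ec2 = {"builders": [{
        "type": "amazon-ebs",
        "region": driver["region"],
        "instance_type": driver["instance_type"],
        "source_ami": kitchen["platforms"][0]["driver"]["image_id"],
    }]}
    docker = {"builders": [{
        "type": "docker",
        "image": "amazonlinux:2",
        "commit": True,
    }]}
    return json.dumps(ec2, indent=2), json.dumps(docker, indent=2)

# Values taken from the kitchen.yml above:
kitchen = {
    "driver": {"region": "us-east-1", "instance_type": "t3.micro"},
    "platforms": [{"driver": {"image_id": "ami-0ce71448843cb18a1"}}],
}
ec2_json, docker_json = packer_from_kitchen(kitchen)
```

The appeal of this approach is that kitchen.yml stays the single source of truth; the Packer JSON is derived, never hand-edited.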

Final Comments

I have been using this solution for a few weeks and it has been working very well. I hope others may find this useful.

Deploy a Serverless SFTP Server With AWS

Let’s imagine you want to migrate an existing SFTP server to Amazon Web Services (AWS). You might consider deploying an EC2 instance to facilitate this. With this approach you are responsible for maintaining and patching that instance. Also, if you want to make your service highly available you would have to deploy multiple instances across availability zones. This is all doable but where possible I always try to take advantage of any serverless offerings in AWS — let Amazon handle the server patching, the high availability and scaling. In this article I will show how you can deploy a serverless SFTP server with AWS.


First let’s define our requirements. The solution we build must meet the following demands:

  • High availability – minimum 2 availability zones;
  • Allow password and public key authentication;
  • Should be “chroot” enabled so users can only view their own files;
  • Allow for IP whitelisting to control access;
  • Users’ data should be encrypted;
  • Use an existing DNS record to point to the server’s endpoint;
  • Keep an existing SSH host key;


The requirements for this SFTP server can be satisfied with the following design.

In this design we will use the AWS Transfer for SFTP service to provision an endpoint for our SFTP server. This service does not currently support password-based authentication, so we need to configure our own identity provider. For our IDP we will use API Gateway calling a Lambda function, with the user credentials stored in AWS Secrets Manager. For the file storage we will use S3 with encryption enabled.

Our requirements demand that we implement IP whitelisting to control access to the SFTP server. The out-of-the-box implementation of the AWS Transfer for SFTP service deploys a public endpoint, so we will instead deploy this using a VPC endpoint. A network load balancer will then be deployed across two availability zones with a target group configured to point to the IPs of our “transfer.server” VPC endpoint. We then create a Route 53 alias record pointing to the DNS record for the network load balancer.
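The target-group wiring depends on discovering the private IPs behind the VPC endpoint. As a hedged sketch (the response dict below is canned; a real script would obtain it from boto3’s EC2 DescribeNetworkInterfaces call against the endpoint’s network interface ids), the IP extraction could look like:

```python
# Hypothetical sketch: pull the private IPs of the transfer.server VPC
# endpoint's network interfaces so they can be registered as NLB targets.
# The dict shape matches EC2's DescribeNetworkInterfaces response; a real
# script would call, e.g.:
#   ec2.describe_network_interfaces(NetworkInterfaceIds=endpoint_eni_ids)
def endpoint_ips(describe_response):
    return sorted(
        eni["PrivateIpAddress"]
        for eni in describe_response["NetworkInterfaces"]
    )

# Canned response standing in for the real API call:
response = {
    "NetworkInterfaces": [
        {"NetworkInterfaceId": "eni-aaa", "PrivateIpAddress": ""},
        {"NetworkInterfaceId": "eni-bbb", "PrivateIpAddress": ""},
    ]
}
```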


Let’s now use the Terraform code I have published at to provision the resources needed.

After cloning this repository, we can modify the default values in the file to match our environment. We need to provide the id of the VPC where the SFTP endpoint and network load balancer will be created, as well as the ids of the subnets.

Now we can deploy the stack with Terraform:

$ terraform init
Terraform has been successfully initialized!
$ terraform apply
Plan: 36 to add, 0 to change, 0 to destroy.
Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_alb_target_group.sftp: Creating...
aws_eip.a: Creating...
aws_eip.b: Creating...
Creation complete after 1s

Apply complete! Resources: 36 added, 0 changed, 0 destroyed.

Adding Users

With the stack deployed we can now add our users. Let’s add two users: one using public key authentication and the other using password authentication.

Open the AWS Secrets Manager service and click on “Store a new secret”. For secret type, select “Other type of secrets”. Enter “Password” as the key, and as the value enter the password we would like to set for this user.

On the next page we need to give the secret a name. The format of the name is ${stackName}/${userName} where the stack name is the value of the “name” variable from the Terraform variables file. In this example I am creating a user called “foo” with a stack name of “serverlessftp” so my secret name is “serverlessftp/foo”.

To create a user that uses public key authentication the process is the same except the name of the secret key is “PublicKey” with the value of course being the public key of the SSH key pair.
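To make the identity provider’s behaviour concrete, here is a hypothetical sketch of its decision logic. The secret dict stands in for the Secrets Manager lookup of ${stackName}/${userName}; the role ARN and bucket names are made up, and the real Lambda in this stack may differ:

```python
# Hypothetical sketch of the identity provider Lambda's decision logic.
# AWS Transfer treats an empty response as an authentication failure.
def auth_response(secret, password, role_arn, bucket, username):
    config = {
        "Role": role_arn,
        "HomeDirectory": "/{}/{}".format(bucket, username),
    }
    if password:
        # Password authentication: compare against the stored "Password" key.
        if secret.get("Password") == password:
            return config
        return {}  # empty response rejects the login
    # Public key authentication: hand the stored key back to AWS Transfer,
    # which verifies it against the client's key pair.
    if "PublicKey" in secret:
        config["PublicKeys"] = [secret["PublicKey"]]
        return config
    return {}
```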

Bucket Layout

The S3 bucket containing the files for our server holds one folder per user at its top level. So when creating a new user you should also create a matching folder in the bucket.
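A minimal sketch of that user-creation step, assuming a boto3 S3 client is passed in (S3 has no real directories, so a zero-byte key ending in “/” is what the console displays as a folder):

```python
# Hypothetical sketch: create the per-user "folder" when adding a user.
# `s3` is assumed to be a boto3 S3 client (boto3.client("s3")).
def create_user_folder(s3, bucket, username):
    key = username.strip("/") + "/"
    # A zero-byte object with a trailing-slash key acts as the folder.
    s3.put_object(Bucket=bucket, Key=key)
    return key
```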

IP Whitelisting

Should you wish to limit the ingress to your SFTP server you can do so by adjusting the network access control list (NACL) rules on your subnets.

SSH Host Key

To preserve the SSH host key we update the AWS Transfer for SFTP server post deployment and provide our own SSH host key. This can only be done from the CLI but is as simple as:

$ aws transfer update-server --server-id s-abcd1234abcd12345 --host-key file:///path/to/your/host-key

Route 53

To create a Route 53 record pointing to your SFTP endpoint, first obtain the DNS name of the network load balancer.

$ terraform state show aws_alb.sftp |grep "dns_name" |cut -d '=' -f2 |xargs

You can then create an alias record pointing to this.


Now we are ready to try this out. Let’s use WinSCP to configure access to our server.

Now let’s test it…

Success! Notice also how the user has no view beyond their own folder in the S3 bucket, so our “chroot” is working. Now, what about public key authentication…

$ sftp -i ~/bar
Connected to
sftp> ls

That’s working too!


So that’s it! By combining a number of different Amazon Web Services we can construct solutions that do not require traditional server-based approaches.

Serverless Web Applications In AWS

In this article I will demonstrate how to start developing serverless web applications in Amazon Web Services (AWS). A serverless architecture allows developers to focus on their code — the complexity of building and maintaining the infrastructure necessary to run the code is removed from their view.


Starfleet has asked you to create a web application that will allow fleet Captains to view their log entries. There are no servers in the future so you will have to do it with serverless!

We can achieve this with the following design:

Our front-end code will be stored in S3, served via CloudFront. A Route 53 record will be created for our domain. The “back-end” code will be delivered via API Gateway with data storage in Secrets Manager and DynamoDB.


Before we can deploy our design we need to set up a Route 53 zone and a wildcard AWS Certificate Manager (ACM) certificate. Alternatively you can reuse existing resources. Ensure your certificate is a wildcard as shown below.

Serverless Framework

We will use the Serverless Framework CLI to build and deploy our API code. We can install it via:

$ npm install -g serverless

Next, clone the code I have developed at and deploy it with serverless, specifying your AWS Account ID as an argument:

$ mkdir serverless
$ cd serverless
$ git clone .
$ sls deploy --accountid 123456789012


The remainder of our code will be in Terraform. You can download the binary for Terraform from and unpack it anywhere on your PATH.

Now you can deploy the Terraform code I have developed at with:

$ mkdir terraform
$ cd terraform
$ git clone .
$ terraform init
$ terraform apply
  The domain of your website.

  Enter a value:
Apply complete!


We are now ready to try our web application. Let’s see what happens when we request

Success! But… how did that work? Firstly, we requested “/” and yet we ended up at “/login/”. And if we examine the S3 bucket which stores our front-end code, we only see:

If our request was for “/login/” how did the “login.html” document get returned? This is where the magic of Lambda@Edge comes in. Have a look at the code at and specifically this block:

def handler(event, context):
  request = event['Records'][0]['cf']['request']
  requestedUri = request['uri']
  if requestedUri == "/":
    request['uri'] = "/home.html"
  elif requestedUri == "/login/":
    request['uri'] = "/login.html"
  return request

In the above code we update requests for “/login/” to “/login.html”. So CloudFront (which is where these Lambda@Edge functions run) requests the “login.html” object from S3 and returns that to the client. If we examine our CloudWatch logs for this Lambda we can see this happening:

But… if you remember, our original request was for “/” not “/login/” so how did our request become “/login/”? Again, it’s the Lambda@Edge function at work:

  if requestedUri == "/":
    parsedCookies = parseCookies(headers)
    if parsedCookies and 'session-id' in parsedCookies:
      sessionid = parsedCookies['session-id']
      if validSessionId(sessionid):
        return request
    redirectUrl = "https://" + headers['host'][0]['value'] + "/login/"
    response = {
      'status': '302',
      'statusDescription': 'Found',
      'headers': {
        'location': [{
          'key': 'Location',
          'value': redirectUrl
        }]
      }
    }
    return response

In this code block, the Lambda is checking the request for the presence of a session id and if none is found it responds with a redirect to the “/login/” page.
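The parseCookies helper referenced in these snippets is not shown in the excerpts. A minimal version, assuming CloudFront’s event header layout (lowercase header names mapping to lists of key/value dicts), could look like:

```python
# Hypothetical sketch of the parseCookies helper used above. CloudFront
# delivers headers as lowercase keys mapping to lists of
# {'key': ..., 'value': ...} dicts; the Cookie value itself is a
# semicolon-separated list of name=value pairs.
def parseCookies(headers):
    parsedCookies = {}
    for header in headers.get('cookie', []):
        for cookie in header['value'].split(';'):
            if '=' in cookie:
                key, value = cookie.split('=', 1)
                parsedCookies[key.strip()] = value.strip()
    return parsedCookies
```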

Session Management

Let’s log in to the application. In we have deployed two passwords to Secrets Manager. Try logging in with either:

Username = picard
Password = makeitso

Username = kirk
Password = KHAAAN!!!

I chose Picard 🙂 and the web application has successfully returned my logs. Now let’s dig into what happened here. Firstly, if we look at the serverless framework code at we can see that after the password is checked against what is stored in Secrets Manager, an item is put into the “my-serverless-website-sessions” DynamoDB table.

def setSessionId():
  global SESSION_ID
  SESSION_ID = secrets.token_urlsafe()
  try:
    dynamodb = boto3.client('dynamodb', AWS_REGION)
    epoch = datetime.datetime.utcfromtimestamp(0)
    now = datetime.datetime.utcnow()
    now_seconds = int((now - epoch).total_seconds())
    ttl = now_seconds + 60
    dynamodb.put_item(TableName='my-serverless-website-sessions', Item={'userName':{'S':USERNAME},'session-id':{'S':SESSION_ID},'creationTime':{'N':str(now_seconds)},'ttl':{'N':str(ttl)}})
  except ClientError as err:
    return None

A neat feature of DynamoDB is Time to Live (TTL), which allows you to set when an item in your table expires. DynamoDB will automatically remove the item after this time (though not immediately). You can view the table in the DynamoDB console to see existing sessions.

The login API returns the session id to the client which is then saved to a cookie and the home (“/”) page is requested. This is when the home API is invoked. In this API, we get the userName from the session id by looking up the “my-serverless-website-sessions” DynamoDB table. If an item exists we then query the “my-serverless-website-logs” DynamoDB table for this user and return the results to the client for display.

As mentioned, the TTL feature of DynamoDB does not instantly remove items after they expire so our code should really verify that a session id is still valid. This logic is not present in the sample code.
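A hedged sketch of what that missing validation could look like, comparing the item’s ttl attribute (shaped as in the put_item call above) against the current time:

```python
import datetime

# Hypothetical sketch: since DynamoDB's TTL deletion is lazy, treat a
# session as valid only if the item exists and its 'ttl' attribute is
# still in the future. The item shape matches the put_item call earlier.
def session_is_valid(item, now_seconds=None):
    if item is None:
        return False  # no such session in the table
    if now_seconds is None:
        epoch = datetime.datetime.utcfromtimestamp(0)
        now_seconds = int((datetime.datetime.utcnow() - epoch).total_seconds())
    return int(item['ttl']['N']) > now_seconds
```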

Handling Errors

CloudFront will, by default, look up S3 for whatever the user requests. So if a user requests “/foo”, CloudFront will try to find a key named “foo”. If it is not found, a rather ugly error is displayed to the user. To avoid this, we can enable a feature of CloudFront that responds with a custom path whenever certain error codes would be returned. This setting is enabled in the code:

resource "aws_cloudfront_distribution" "my-serverless-website" {
  custom_error_response {
    error_code         = "403"
    response_code      = "403"
    response_page_path = "/error.html"
  }
}

So now when errors occur we get a much prettier output:


If you have deployed my Terraform and serverless code to your environment you can clean it up with the following steps:

$ cd terraform
$ terraform destroy
$ cd ../serverless
$ sls remove --accountid 123456789012

Note: you might get an error when destroying the Terraform stack. Lambda@Edge resources can take some time (as much as 24 hours) before they are fully removed, so you should repeat the Terraform destroy a day or two later.


That’s it! I hope this very simple example has demonstrated the capabilities of a serverless architecture.

Continuous Deployment with AWS CodePipeline and Chef Zero

In this article I will show how you can use AWS CodePipeline and Chef Zero to implement a blue-green continuous deployment model to automatically release changes to your EC2 hosted web application.

AWS CodePipeline is a fully managed continuous delivery service that helps you automate your release pipelines for fast and reliable application and infrastructure updates. CodePipeline automates the build, test, and deploy phases of your release process every time there is a code change, based on the release model you define.

Chef Zero

We will use Chef to define the state of each EC2 instance that hosts our web application. At I have created a very simple cookbook that will install Apache Tomcat and deliver some custom files to be served by the web application.

AWS CodeBuild

If you are familiar with Chef but not with AWS CodePipeline you might be curious about the buildspec.yml file in the cookbook repository.

This file is used in the build phase of our pipeline to instruct AWS CodeBuild to install Chef Workstation and package our cookbooks using Berkshelf.

AWS CodeDeploy

Another file of note is the appspec.yml file. This is used in the deploy phase of the pipeline to instruct AWS CodeDeploy how to update / install our web application code.

In this example the appspec.yml file instructs CodeDeploy to execute scripts/ and it is in this script that we run Chef Zero.

Terraform Code

In order to create a pipeline in AWS CodePipeline we first need to create some prerequisite AWS resources such as an AWS CodeCommit repository to store our application code as well as the AWS CodeBuild and AWS CodeDeploy resources I mentioned earlier. Additionally, we need a running web application to actually deploy our code to. I have created some Terraform code at that will provision these resources for us.

$ mkdir ~/codepipeline-demo
$ cd ~/codepipeline-demo
$ git clone terraform
$ cd terraform
$ vi terraform.tfvars
$ terraform init
Terraform has been successfully initialized!

$ terraform apply
data.template_file.user-data: Refreshing state...

Plan: 28 to add, 0 to change, 0 to destroy.

Do you want to perform these actions?
  Terraform will perform the actions described above.
  Only 'yes' will be accepted to approve.

  Enter a value: yes

aws_security_group.asg: Creating...
aws_iam_role.codebuild: Creating...
Apply complete! Resources: 28 added, 0 changed, 0 destroyed.

AWS CodeCommit

An additional prerequisite step is to publish the cookbook code to the CodeCommit repository that was created by Terraform. First, ensure your working environment is configured to work with CodeCommit by referring to this document.

Now you can clone my cookbook code from GitLab.

$ cd ~/codepipeline-demo
$ git clone colmmg-chef

Next, get the HTTPS clone URL of your CodeCommit repository from the AWS Console.

You can now push my cookbook code to your repository by running the following.

$ cd ~/codepipeline-demo
$ git clone <<YOUR_HTTPS_CLONE_URL>> chef
$ cd chef
$ cp -r ../colmmg-chef/* .
$ git add --all
$ git commit -m "Initial commit"
$ git push origin master

AWS CodePipeline Setup

We are now ready to create our pipeline! Open the CodePipeline service in the AWS Console and click to create a new pipeline.

In the first step, select the “codepipeline-chefzero-webapp” IAM role and “codepipeline-artifacts” S3 bucket.

In the source step, select AWS CodeCommit.

Choose AWS CodeBuild for the build stage.

Lastly, select AWS CodeDeploy for the deploy stage.

Review your choices in the next page and then create the pipeline. Your pipeline will automatically start.

AWS CodePipeline

Each stage of the pipeline will be executed starting with the retrieval of the cookbook code from the CodeCommit repository. The next stage is the build stage where the cookbook code is packaged. From CodePipeline you can click into each stage to view more details and if we do this for the build stage we can see the build logs.

Perhaps the most interesting stage is the deploy stage. When it starts you can click into it and will see a view like this.

From this page we can see exactly what is happening with our deployment including which instances traffic is being directed to.

The page will update as the deployment progresses.

When the deployment is complete you can test the web application by retrieving the DNS record of the application load balancer that was created by Terraform.


Continuous Deployment

Now let’s give our pipeline a proper test! Suppose the product team have asked that the background image of the web application be updated and deployed. With one commit we can fulfill this request.

$ cd ~/codepipeline-demo/chef
$ sed -i "s,div id=\"blue\",div id=\"green\",g" files/default/index.jsp
$ git commit -a -m "Updating background image"
$ git push origin master

That’s it! AWS CodePipeline will now take over and automatically deploy this change. If you make requests to the web application during the deployment you will see the change being rolled out, with both the old and new images being returned until finally only the new image is displayed.


To remove the resources that were created in this demo, first you should manually delete any autoscaling groups that CodePipeline provisioned. They will have names starting with “CodeDeploy_codepipeline-chefzero-webapp”.

Next you can destroy the Terraform infrastructure.

$ cd ~/codepipeline-demo/terraform
$ terraform destroy

Now you can manually delete the pipeline from the CodePipeline service.

As part of the creation of the pipeline, a CloudWatch events rule was created. This would have been deleted when you deleted the pipeline but the associated IAM role and policy would not have been removed. The role will have a name like “cwe-role-eu-west-1-codepipeline-chefzero-webapp”. You should remove this role and the attached policy.


This is a very simple example showing some of the capabilities of CodePipeline. Along with CodeBuild and CodeDeploy, it has other features including:

  • use S3 as the source instead of CodeCommit;
  • deploy to a percentage of instances at a time instead of all at once;
  • deploy to Amazon Elastic Container Service (ECS) and Lambda;
  • add a manual approval stage to the pipeline so a human must interact with it before a deployment can occur;
  • integration with Jenkins;

I hope this article and the associated code helps you get started with these very powerful services!

EC2 Jump Host For ECS Fargate Docker Containers


The benefits of Docker containers are well understood; however, the challenges of managing the host operating system remain. AWS Fargate solves this problem.

Fargate makes it easy for you to focus on building your applications. Fargate removes the need to provision and manage servers.

By outsourcing the management of the host OS to AWS you do lose some control. You can’t log in to the EC2 instance and run `docker exec`! In well-tuned applications this should not be an issue: your logs and metrics will be pushed to Amazon CloudWatch or some other service and there should be no need to log in to a container. But for that initial period of adjusting to this new model, or for the times when you can debug faster with container access, you may wish to have a mechanism to log in to your Fargate containers. In this article I will show how you can set up such a mechanism.


To follow this guide you will need Terraform v0.12.23. Later versions may work but the guide was tested with 0.12.23.

You will also need extensive permissions to your AWS account as we will be creating resources across many services including IAM.


To demonstrate access to Fargate containers we will run a Docker container in Amazon Elastic Container Service. We will build the image for this container from its source in AWS CodeCommit via AWS CodeBuild and store the image in Amazon Elastic Container Registry. The image will be Amazon Linux 2 based with an Apache server and SSH access.

SSH ingress to the Fargate containers will be via an EC2 instance designated as our ECS jump host.


Let’s first setup the infrastructure. We begin by setting up some environment variables:

$ export AWS_ACCOUNT_ID=`aws sts get-caller-identity |jq .Account |xargs`
$ export AWS_REGION=us-east-1

Next we create our Amazon S3 bucket that will store our Terraform remote state:

$ aws s3 mb s3://tf-state-$AWS_REGION-$AWS_ACCOUNT_ID --region $AWS_REGION


We now deploy some foundation resources that we will need to use AWS CodePipeline. CodePipeline will glue our CodeCommit repository and CodeBuild project together to form an end-to-end pipeline.

$ git clone
$ sed -i "s,123456789012,$AWS_ACCOUNT_ID,g"
$ terraform init
$ terraform apply


Now we deploy the baseline resources required for our use of ECS. This stack will also generate the EC2 jump host that we will use later. A security group rule will be created to allow SSH ingress to the jump host from your public IP.

$ git clone
$ sed -i "s,123456789012,$AWS_ACCOUNT_ID,g"
$ echo "my-ip = \"`curl`\"" >> terraform.tfvars
$ ssh-keygen -t rsa -N '' -C "ECS Key" -f ~/.ssh/ecs
$ echo "ecs-public-key = \"`cat ~/.ssh/`\"" >> terraform.tfvars

Before continuing you should copy the contents of the private key (~/.ssh/ecs) into the user_data.tpl file at the line “INSERT PRIVATE KEY HERE”.

Now, you can deploy this Terraform stack:

$ terraform init
$ terraform apply


We now need to deploy a CodePipeline project to build our Docker image.

$ git clone
$ sed -i "s,123456789012,$AWS_ACCOUNT_ID,g"
$ terraform init
$ terraform apply

When complete, a build will be attempted but it will fail because our CodeCommit repo is empty.

The code that you need to push is at colmmg/docker/fargate-ssh but before we push this code to CodeCommit let me explain the configuration. The Dockerfile is straightforward: we install httpd and openssh and expose ports 80 and 22. Our ENTRYPOINT is modified to run the script. In this script we start SSH in the background, write the value of the `SSH_PUBLIC_KEY` environment variable to the `/root/.ssh/authorized_keys` file and lastly start httpd in the foreground. The `SSH_PUBLIC_KEY` environment variable is set in our ECS task definition which we will set up later.

Let’s clone our CodeCommit repo. You may obtain the clone details from the CodeCommit console.

$ git clone codecommit::us-east-1://apache

Grab the code from colmmg/docker/fargate-ssh and copy it into your CodeCommit repo:

$ git clone /tmp/docker-fargate-ssh
$ cp /tmp/docker-fargate-ssh/* .
$ git add .
$ git commit -m "Initial commit"
$ git push origin master

If you return to the CodePipeline console you will see that our push has triggered a build.

Click on the build details to follow along as your image is built by CodeBuild.


We can now setup our ECS cluster and service that will run our Docker container.

$ git clone
$ sed -i "s,123456789012,$AWS_ACCOUNT_ID,g"
$ terraform init
$ terraform apply

When complete you can navigate to the ECS service and after a short period of time a task will be launched which will be your running container:

EC2 Jump Host

With everything in place we can now test out our solution. From the EC2 service, find the EC2 instance that was launched earlier. It will be named `ecs-jumpbox`. SSH into this instance:

$ ssh -i ~/.ssh/ecs

Now try to ssh into your Docker container:

$ ssh root@apache.local
Last login: Sun Apr  5 16:16:38 2020 from ip-173-51-19-127.ec2.internal

Success! If you are wondering how `apache.local` resolves to the container’s IP it is because as part of our deployments we configured ECS Service Discovery.


That’s it! I hope you find this guide useful!

Automated UI Testing With Amazon CloudWatch Synthetics


In a previous article I investigated the use of Amazon SageMaker to perform automated UI testing for a web application. The intent was to produce an automated test suite which could detect obvious visual errors. In this article I will demonstrate a more robust technique using Amazon CloudWatch Synthetics. More specifically, I will be using the “visual monitoring” feature of CloudWatch Synthetics.

CloudWatch Synthetics now supports visual monitoring, allowing you to catch visual defects on your web application’s end user experience. The new visual monitoring feature makes it possible to catch visual defects that cannot be scripted.


Let’s first deploy the required infrastructure which will be used for the demonstration. We can provision the required resources using Terraform. The terraform code I have created at can be used as follows:

git clone
cd cloudwatch-synthetics-demo/
terraform init
terraform apply

Be sure to make a note of the alb_dns_name output — we will need this later.


With our infrastructure provisioned we now need to deploy our sample web application. The code I have authored at represents a mock search engine application. We can deploy it with:

git clone codecommit::us-east-1://cloudwatch-synthetics-demo codecommit
git clone
cp -r cloudwatch-synthetics-demo/* codecommit/
cd codecommit/
git add .
git commit -m "v1"
git push origin master

Initial Deployment

From the AWS Management Console you can follow the deployment of our mock web application in the AWS CodePipeline and AWS CodeDeploy services.

CodeDeploy Start
CodeDeploy Complete

We can now view the web application by opening the alb_dns_name output from the terraform step.

Mock Search Engine Web Application


You can use Amazon CloudWatch Synthetics to create canaries, configurable scripts that run on a schedule, to monitor your endpoints and APIs. Canaries offer programmatic access to a headless Google Chrome Browser via Puppeteer or Selenium Webdriver.

From the CloudWatch Synthetics service in the AWS console, we will create a canary to monitor our web application.

Create the canary using the “Visual monitoring” blueprint.

Create Canary Wizard

Provide myapp as the name of the canary, enter the alb_dns_name output from the terraform step as the application endpoint with port 8080 and select “15%” for the visual variance threshold.

Create Canary Endpoint

Let the wizard create a new S3 bucket to store the canary results and select the cloudwatch-synthetics-demo-canary IAM role.

Canary Data Storage

Deploy the canary into the VPC created by terraform — within the two private subnets and select the cloudwatch-synthetics-demo-canary security group.

Canary VPC

After clicking “Create”, allow some time for the canary resources to be created. You should then see your canary starting and completing.

Canary First Run
Canary First Pass

Excluded Areas

In our mock web application, the first search result contains an image with some associated text. Suppose in a real world scenario, this image changes from search-to-search. Let’s demonstrate this by changing the image of our mock application. Using the ECS Exec feature described in we can remote into our Fargate task to manually edit the HTML file. Retrieve the task id from the ECS service in the AWS console to remote in:

aws ecs execute-command \
    --region us-east-1 \
    --cluster cloudwatch-synthetics-demo \
    --task <task-id-retrieved-from-aws-console> \
    --container main \
    --interactive \
    --command "/bin/bash"

Once connected, switch the image with:

sed -i "s/canary.jpg/canary2.jpg/g" /var/www/html/index.html

If we now run our canary we see an error.

Failed Canary

We do not want our UI testing to alert us when this part of our webpage changes, because we know it will always vary. We can configure our canary to ignore changes to this zone. If you edit your canary from the AWS console and scroll to the “Visual Monitoring” section you will see an “Edit Baseline” button.

Canary Edit – Visual Monitoring

Clicking on this button, we can draw areas in our baseline image that should be excluded from our testing. Let’s do this for the image in our first search result.

Baseline Image Edit

Running the canary again we now get a pass.

Canary Pass With Excluded Areas

Deployment Pipeline

We are now ready to test this canary in our deployment pipeline. Our ideal deployment pipeline will behave as follows:

  1. Code changes to our web application are deployed to an out-of-service (staging) area.
  2. Canary tests are performed against this staging area.
  3. Any failures in the canary tests will result in the cancellation of the deployment.
  4. When no test failures occur, our staging area should be promoted to be in-service and serving our end users.

To meet these requirements we can combine the AfterAllowTestTraffic CodeDeploy hook with our visual monitoring canary. The lambda function deployed as part of our terraform code triggers the start of our canary during the AfterAllowTestTraffic phase of the CodeDeploy deployment. If you remember from earlier, we set the canary's application endpoint to use port 8080. This is the test listener port, which can only be accessed internally and not by our end users. It is also the listener where CodeDeploy routes new versions of our application during a deployment.
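As a rough illustration of what such a hook function can do, here is a minimal sketch. The canary name "myapp", the RUN_CANARY handling, the fixed sleep and the injectable clients are assumptions for illustration and not the article's actual source code:

```python
import os
import time

def handler(event, context, codedeploy=None, synthetics=None):
    # Sketch of an AfterAllowTestTraffic hook: start the canary, wait,
    # then report the result back to CodeDeploy so the deployment can
    # continue or be cancelled. Clients are injectable for testing.
    if codedeploy is None:
        import boto3
        codedeploy = boto3.client("codedeploy")
        synthetics = boto3.client("synthetics")

    status = "Succeeded"
    if os.environ.get("RUN_CANARY", "true").lower() == "true":
        synthetics.start_canary(Name="myapp")  # assumed canary name
        time.sleep(120)  # crude wait; start_canary returns no execution id
        runs = synthetics.get_canary_runs(Name="myapp", MaxResults=1)
        if runs["CanaryRuns"][0]["Status"]["State"] != "PASSED":
            status = "Failed"

    codedeploy.put_lifecycle_event_hook_execution_status(
        deploymentId=event["DeploymentId"],
        lifecycleEventHookExecutionId=event["LifecycleEventHookExecutionId"],
        status=status,
    )
    return status
```

Reporting "Failed" back to CodeDeploy is what cancels the deployment before production traffic is shifted.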

Let’s test our solution by introducing an obvious error in our web application’s UI. From our CodeCommit checkout, run the following commands to change the font-size of our search result’s title text from 18px to 36px.

sed -i "s/18px/36px/g" index.html
git commit -a -m "v2"
git push origin master

Like our previous deployment, you can follow along in the AWS console in the CodePipeline and CodeDeploy services. The CodeDeploy service will eventually result in the following:

CodeDeploy Fail

As expected, the AfterAllowTestTraffic phase has thrown an error. We can get further details by checking our canary.

Canary Font Failure

The pipeline has done exactly what we expected. UI errors have been detected and the deployment has been cancelled.

Intentional UI Changes

Should you wish to release intentional UI changes that will create significant variance from your baseline, you can temporarily disable the UI testing canary for that release. To do this, simply update the RUN_CANARY environment variable of the deploy hook lambda function before releasing.

Lambda Environment Variables

After the release you can then edit the canary and set the next run as the new baseline.

Set New Baseline


It is important to note that the Synthetics API does not return an execution id when a canary run is started, so it is not possible to know with 100% certainty whether the run we triggered succeeded. In my Python code, I added some sub-optimal sleeps to work around this, but you may not be able to rely on that approach in a production setting. Hopefully, Amazon can address this in time.
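One way to make the workaround slightly less fragile is to match on run timestamps rather than sleeping blindly. The sketch below is my own illustration of that idea (function and parameter names are assumptions, not from the article's source), and it is still best effort, since a scheduled run could also start inside the window:

```python
import time

def wait_for_new_run(synthetics, name, started_at, timeout=300, poll=15, clock=time):
    # Since start_canary returns no execution id, poll get_canary_runs
    # until a run that started after our trigger time appears, then
    # return its state. `clock` is injectable so the loop can be tested.
    deadline = clock.time() + timeout
    while clock.time() < deadline:
        runs = synthetics.get_canary_runs(Name=name, MaxResults=1)["CanaryRuns"]
        if runs and runs[0]["Timeline"]["Started"].timestamp() >= started_at:
            return runs[0]["Status"]["State"]
        clock.sleep(poll)
    return "TIMED_OUT"
```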

Canaries are not limited to visual monitoring. It is worth exploring some of the other features which you could incorporate into a deployment pipeline or even a health check for your production endpoints.

To clean-up the resources created in this article, manually delete the lambda layers and functions created by the canary. Empty and delete the cw-syn-* S3 bucket, then execute terraform destroy from your checkout of the terraform code. Be aware that it will take some time for this operation to complete, so please be patient.

I hope you have found this article useful.

OpenVPN over AWS Systems Manager Session Manager


AWS Systems Manager Session Manager allows you to establish a shell session to your EC2 instances and Fargate containers even when these resources don’t have a public IP address. Also, with EC2 instance port forwarding, you can redirect any port inside your remote instance to a local port on your client to interact with your private EC2 instance based applications. A common use case for this might be to access a web application running on your instance from your browser.

However, Session Manager sessions are limited to a single resource — one EC2 instance or one Fargate container. So, it is not possible to use Session Manager alone to create an ingress point allowing access to all resources within your private VPC.

In this article, I will show how you can combine Session Manager with OpenVPN to allow a secure network path from your client to all resources within your private VPC.


The below diagram illustrates the design for our solution.

We will launch an EC2 instance in a private subnet which will act as our OpenVPN server. We will then establish a Session Manager port forwarding session between our client and this EC2 instance. Then, using an OpenVPN client, we will tunnel to the OpenVPN server over the Session Manager session. With our VPN connection in place, we can then access all private applications in our VPC.

The configuration of the OpenVPN server will be done with an install script. Because Session Manager does not support UDP, our OpenVPN server will be configured in TCP mode.


To be able to use Session Manager from the AWS CLI you also need to install the Session Manager Plugin.

Install the OpenVPN client.

Install the Terraform CLI.


The solution can be deployed via Terraform:

git clone
cd openvpn-ssm
terraform init
terraform apply

This Terraform code will provision the required VPC and Session Manager resources. The OpenVPN server is not deployed here… that comes later.

Make a note of the output alb-dns. This is the DNS record for a sample application load balancer deployed to the private subnets. If you try to access this you will not be able to connect.

This is expected because as we can see from the load balancer settings, this is an internal load balancer, meaning it can only be accessed from resources within the VPC.

Session Manager Preferences

The Session Manager preferences can’t be configured via Terraform. So we must set these manually with the following steps:

  • Login to the AWS console;
  • Open the Systems Manager service;
  • Click on ‘Session Manager’ under ‘Node Management’;
  • Click on the ‘Preferences’ tab;
  • Click ‘Edit’;
  • Enable KMS Encryption and point to the alias/session-manager key;
  • Enable session logging to S3 bucket ssm-session-logs... with encryption enabled;
  • Enable session logging to CloudWatch log group /aws/ssm/session-logs with encryption enabled;
  • Save the changes;

Start VPN Script

With our infrastructure deployed via Terraform, we can now try to launch our OpenVPN server. The start-vpn script can be used to do this. This script performs the following steps:

  • Obtains the launch template for the OpenVPN instance;
  • Starts the EC2 instance;
  • Waits for the instance to be ready for Session Manager sessions;
  • Waits for the instance to complete its user_data, which is where the OpenVPN server is installed and configured;
  • Downloads the OpenVPN client config file generated by the server;
  • Starts a port forwarding Session Manager session;
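The final port forwarding step boils down to an `aws ssm start-session` call using the AWS-StartPortForwardingSession document. Below is a sketch of how such a command could be assembled; the function name is my own, and port 1194 is an assumption based on the OpenVPN default:

```python
import json
import shlex

def ssm_port_forward_cmd(instance_id, remote_port=1194, local_port=1194, region="us-east-1"):
    # Build the Session Manager port forwarding command. Port 1194 is
    # an assumption (the OpenVPN default); the script's actual ports
    # may differ.
    params = {"portNumber": [str(remote_port)], "localPortNumber": [str(local_port)]}
    return (
        "aws ssm start-session"
        f" --region {region}"
        f" --target {instance_id}"
        " --document-name AWS-StartPortForwardingSession"
        f" --parameters {shlex.quote(json.dumps(params))}"
    )
```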

Let’s try this script now by running:


Our Session Manager session is up and awaiting connections.

OpenVPN Client

Now we need to configure our OpenVPN client. In the previous step, an ssm.ovpn file was downloaded from S3. Make a note of the location of this file. Next, launch the OpenVPN client and select to import a profile by file.

Navigate to the location of the ssm.ovpn file.

Now click on Connect.

Our tunnel is now in place.

If the VPN fails to connect on Windows Subsystem for Linux try restarting WSL by running the following from a command prompt:

wsl --shutdown

Then rerun the script and try to connect from your OpenVPN client again.


Now we can test if our solution works by trying to access the load balancer that failed to connect earlier. Try it again in your browser:


Important Points

This solution was created for fun more than as a realistic real-world solution. The performance of this feature has not been thoroughly tested. Indeed, we are using TCP instead of UDP because Session Manager only supports TCP. TCP is known to be sub-optimal for VPN traffic and can suffer from a phenomenon known as TCP meltdown.

However, the security of this configuration is quite strong. The communication between the client and AWS is both HTTPS and KMS encrypted. Also, no customer-managed network ingress is required, so your VPC can be entirely private.

You may find this useful in a small development team environment. But for corporate settings, consider AWS Client VPN instead.


To clean-up the resources created in this guide, first destroy any EC2 instances with:

aws ec2 terminate-instances --region us-east-1 --instance-ids $(aws ec2 describe-instances \
    --region us-east-1 \
    --filters "Name=tag:Name,Values=openvpn-server" \
    --query "Reservations[*].Instances[*].[InstanceId]" \
    --output text | xargs)

Then, from the root of your checkout of the Terraform code run:

terraform init
terraform destroy

Serverless Caching With AWS AppConfig and Lambda Extensions


In this article I will show how you can deploy a simple caching solution for AWS Lambda functions by combining the AWS AppConfig service with the Lambda Extensions feature.

To demonstrate this, let's create a problem that we must solve. Suppose you have been asked to implement a solution that will allow the engineers on your team to query any IPv4 address to check if it is in the AWS IP address ranges.


Our solution to this problem will be very straightforward. We will have an Amazon S3 bucket serving a single HTML page, which allows the user to input an IP to check. This rudimentary web application will make a request to an Amazon API Gateway REST API. The API will use Lambda to check the input IP against the list of Amazon IP ranges. The IP ranges will be stored in AWS AppConfig.
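The core membership check is simple with Python's `ipaddress` module. A sketch, using the `prefixes`/`ip_prefix` shape of AWS's published ip-ranges.json (the sample CIDRs below are purely illustrative):

```python
import ipaddress

def ip_in_aws_ranges(ip, prefixes):
    # `prefixes` follows the shape of the "prefixes" list in AWS's
    # published ip-ranges.json: [{"ip_prefix": "52.94.76.0/22", ...}, ...]
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(p["ip_prefix"]) for p in prefixes)

# Illustrative sample; the real file contains thousands of entries.
sample_prefixes = [{"ip_prefix": "52.94.76.0/22"}, {"ip_prefix": "3.5.140.0/22"}]
```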

Next, we will take advantage of the AppConfig Lambda extension so that our function does not need to call AppConfig on every invocation.

Lambda Extension

To use the AppConfig Lambda extension, we first attach a Lambda layer to our function code. The documentation here provides all of the per-region ARNs for the AppConfig Lambda extension.

Once attached, we modify our function code to make a request to a localhost HTTP endpoint that is created by the layer. This endpoint will regularly poll AppConfig for your configuration data and maintain a local cache of it which is available to your function.

Your function code can then query the endpoint for the configuration data. Some sample code showing this in use would be:

import json
import urllib.request

def lambda_handler(event, context):
    app_config_app_name = "foo"
    app_config_env_name = "live"
    app_config_profile_name = "data"

    url = (f"http://localhost:2772/applications/{app_config_app_name}"
           f"/environments/{app_config_env_name}"
           f"/configurations/{app_config_profile_name}")
    config = json.loads(urllib.request.urlopen(url).read())


The AWS Serverless Application Model (SAM) code can be used to deploy this solution.

This code defines:

  • A lambda function;
  • The AppConfig Lambda extension layer;
  • An API Gateway API;

Deploy the code with:

git clone
cd awsip
sam deploy --guided

When deployed, the API endpoint will be output. We will need this value in the next section.

Web Application

Let’s now create an S3 bucket to host our web application. From the S3 service, create a bucket and untick the Block all public access checkbox and acknowledge the warning.

In your checkout of the sam/awsip repository you will see a HTML document at files/index.html. Edit this file and replace the variable value with the value output in the previous step.


Next, upload this file to your bucket. Expand the “Permissions” section and tick the “Grant public-read access” radio button and acknowledge the warning.


We now need to deploy the IP data to AppConfig. In your checkout of the sam/awsip repository there is a Python script in the scripts/ directory. Run this script to load the AWS IP ranges into AppConfig.

python3 scripts/

We can confirm the script has worked by seeing the “1” and “2” configuration profiles under the “awsip” application in the AppConfig console.


We can now test our solution. In the S3 console, open the index.html object we uploaded and open the link under “Object URL”. Let's try an IP that we know does not exist in the AWS ranges.

Great! That works! Now let's also check an IP that we know does exist in the AWS ranges.


Confirming Caching

Our solution works but can we verify that we are seeing a performance improvement by using the Lambda extension?

Earlier when you ran sam deploy, two outputs were the function ARNs. The first, “CheckIpFunctionArn” is the function attached to our API which contains the Lambda extension feature. The second, “AppConfigCheckIpFunctionArn” is a separate function that does not have the Lambda extension and instead makes a request to AppConfig directly for the configuration.

In your checkout of the sam/awsip repository you will see a Bash script in the scripts/ directory. Run this script and provide the name of the first function, e.g.

bash scripts/ awsip-CheckIpFunction-S7XVscISBk2q

This script will invoke our function 20 times and report the non-cold-start average duration.
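You could compute such an average yourself from Lambda's REPORT log lines, since a cold start is identifiable by the presence of an "Init Duration" field. A rough sketch (my own illustration, not the actual benchmark script):

```python
import re

def warm_average_ms(report_lines):
    # Average the Duration values from Lambda REPORT log lines, skipping
    # cold starts (lines that carry an "Init Duration" field).
    durations = []
    for line in report_lines:
        if "Init Duration" in line:
            continue  # cold start invocation
        match = re.search(r"\bDuration: ([\d.]+) ms", line)
        if match:
            durations.append(float(match.group(1)))
    return sum(durations) / len(durations)
```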

So we see an average duration of approx. 897ms. Let's now try with the other function.

bash scripts/ awsip-AppConfigCheckIpFunction-H2quZz4TWC28

We see now that the average duration is 959ms. So our caching saves us approx. 60ms.


This very simple solution implements serverless caching by using AWS AppConfig as a data source.

You can clean-up the resources by deleting the CloudFormation stacks created by SAM and you may also use the Python script at scripts/ to remove the AppConfig resources.

Serverless Jenkins and Amazon ECS Exec

In this very short article I will show how you can create a serverless Jenkins instance and start a shell session in an AWS Fargate task without opening SSH ports or managing SSH keys.

Why Serverless?

No server is easier to manage than no server.

Werner Vogels, CTO @ Amazon

Managing a fleet of EC2 instances for your Jenkins slaves is cumbersome and time-consuming, even when baking the configuration into an Amazon Machine Image (AMI). By combining AWS serverless products we can run an instance of Jenkins with substantially less overhead.


We will run our Jenkins master node in an AWS Fargate cluster. The JENKINS_HOME will be stored on an Amazon Elastic File System. We won’t have Jenkins slaves but will instead run jobs on AWS CodeBuild using the Jenkins plugin.


The provided Terraform code can be used to provision the components we need. We can utilise this code as follows:

git clone
cd sls-jenkins
terraform init
terraform apply

Once applied, we get the following:

Wait a few moments for the ECS task to fully start then open the jenkins-url output in your browser. You should see the Unlock Jenkins page:

ECS Exec

We can obtain the password from the task logs.

However, let’s take advantage of a new feature of Fargate called ECS Exec. With this feature we can start a shell session in any container without requiring SSH ports to be opened or authenticating with SSH keys. To use this feature, ensure you have the latest version of the AWS CLI as well as the latest version of the session manager plugin.

Find the task id of the sls-jenkins task in the ECS console and use it with the command:

aws ecs execute-command \
    --region us-east-1 \
    --cluster sls-jenkins \
    --task <task-id> \
    --container sls-jenkins \
    --interactive \
    --command "/bin/bash"

You can then find the password in the /mnt/efs/secrets/initialAdminPassword file.

Use the value to login to Jenkins and complete the setup wizard.


We will run Jenkins jobs in AWS CodeBuild.

AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers. 

AWS CodeBuild – Fully Managed Build Service (

First, install the CodeBuild plugin.

Next, create a new pipeline job. You can use the provided sample project as the source of the job.

The Jenkinsfile in this sample project starts a build of the sls-jenkins-small CodeBuild project. When we run the build we get the following output:

The logs from CodeBuild are pulled into Jenkins and displayed in the console output.

Persistent Storage

To verify our Jenkins configuration will persist, let's stop the ECS task.

If we open Jenkins in our browser, we see an outage as expected.

ECS will now launch a new task and remount the EFS file system that stores our JENKINS_HOME. If successful, we will see the sample-project job that we created earlier.



This solution may be a good fit for very simple Jenkins implementations. You will find that the EFS performance is not as good as EBS or ephemeral storage. There is also a queueing and provisioning time for CodeBuild which you would not experience with your own fleet of EC2 instances. These factors should be considered but if you spend a lot of time maintaining your CI/CD infrastructure, this solution could be useful to you.

Blue/Green Deployments in AWS Fargate with Automated Testing and Rollbacks


AWS CodeDeploy makes it easy to setup Blue/Green deployments for your containerised applications running in AWS Fargate. In this article, I will show how you can configure CodeDeploy and Fargate to allow automated testing of your deployments before they receive production traffic. Additionally, I will show how you can configure automatic rollbacks, if your application generates errors after receiving production traffic.


For this demonstration, our container application will be a simple Apache web server. An application load balancer will route production traffic to the containers. Our Docker code will be stored in an AWS CodeCommit repository. AWS CodeBuild will be used to build the Docker image and AWS CodeDeploy will of course be used to perform the deployments. We will use AWS CodePipeline to wrap the build and deploy stages into a deployment pipeline. The below diagram represents our design.

Blue/Green Deployment Pipeline Design

During a deployment, the new v2 code is launched in a second set of one or more containers. These new containers are registered with the “green” target group. The green target group is registered to a test listener on the application load balancer (port 8080 in this demonstration). We then perform our testing against the test listener. When testing is complete, we signal for the deployment to continue, at which point the live listener (port 80) is registered to the green target group. The security group rules for our load balancer only allow ingress on port 8080 from within our VPC, thus preventing end users from accessing the release prematurely.

As we will see later, CodeDeploy automatically handles the registration of containers to the blue/green target groups and also the registration of listeners to target groups.


The resources deployed in this solution are described with Terraform — an infrastructure as code software tool. Install the latest version of the Terraform CLI.

Next, ensure you have the git-remote-codecommit utility installed. Most often this can be installed with:

sudo pip install git-remote-codecommit


The Terraform code at aw5academy/terraform/ecs-blue-green-demo can be used to provision the resources we need for this demonstration. Deploy this code to your environment by running:

git clone
cd ecs-blue-green-demo/
terraform init
terraform apply
Output From Terraform Apply

Note the “alb_dns_name” output — we will need this value later.


We now need to push our Docker code to the CodeCommit repository created by Terraform. Run the following commands to set it up:

git clone codecommit::us-east-1://ecs-blue-green-demo codecommit
git clone
cp -r ecs-blue-green-demo/* codecommit/
cd codecommit/
git add .
git commit -m "v1"
git push origin master


If you open the AWS Console and navigate to the CodePipeline service you will see that the “ecs-blue-green-demo” pipeline has started due to our commit to the CodeCommit repository. Wait for the pipeline to complete our first deployment.

CodePipeline Successful Release

Now let's check that our application is working by opening the “alb_dns_name” Terraform output from earlier in our browser.

Application Response

Great! We have a working application.

CodeDeploy Hooks

Hooks are a feature of CodeDeploy which allow you to perform actions at certain points in a deployment before the deployment continues to the next step. The hooks available for ECS/Fargate are defined in the CodeDeploy documentation. The hook we are most interested in is “AfterAllowTestTraffic”. We want to run tests during this phase of the deployment to validate the release before sending production traffic to it. To do this we will add an AWS Lambda function reference to our appspec.yaml. This lambda (source code at aw5academy/terraform/ecs-blue-green-demo/lambda-src/deploy-hook/) writes the hook details to an Amazon S3 bucket for a CodeBuild project to reference. This CodeBuild project (source code at aw5academy/docker/ecs-blue-green-demo/) runs in parallel to our CodeDeploy deployment in our pipeline and performs our tests during the “AfterAllowTestTraffic” stage.
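For context, an ECS appspec.yaml with such a hook has roughly this shape. The container name, port and lambda ARN below are placeholders rather than the demo's actual values:

```yaml
version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "main"
          ContainerPort: 80
Hooks:
  - AfterAllowTestTraffic: "arn:aws:lambda:us-east-1:111111111111:function:deploy-hook"
```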

Automated Testing

Let’s test our deployment process by deliberately introducing an error. If you examine our test script at aw5academy/docker/ecs-blue-green-demo/ you can see that we expect our application to return “Hello from v1”. So let’s break this by changing it to return “Hello from v2” instead. Run the following commands from the CodeCommit checkout to do this:

sed -i "s,Hello from v1,Hello from v2,g"
git commit -a -m "v2"
git push origin master

This action will automatically trigger our pipeline and if you navigate to the CodeDeploy service in the AWS Console you can follow the deployment when it starts. After some time you should see a failure on the “AfterAllowTestTraffic” stage as we expected.

CodeDeploy Failure

When we check the CodeBuild logs for our test project we can see the problem. As we noted, our tests still expect the application to respond with “Hello from v1”.

CodeBuild Error Logs

CodeDeploy and CloudWatch Alarms

There is one more way we can validate our deployments. Suppose we would like to monitor our deployments for some time after we route production traffic to them and, if we notice any issues, roll back. By combining CodeDeploy and CloudWatch alarms we can do this in an automated way.

AWS CodeDeploy allows you to retain the existing containers for a period of time after a deployment. In our demonstration, for simplicity, we have configured it to 5 minutes but it can be many hours if you wish. With this setting, and properly configured CloudWatch alarms, you can monitor your application post-deployment and if any of your alarms move into the alarm state during the retention time, CodeDeploy will automatically rollback to the previous version.
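In API terms, the deployment-group settings behind this behaviour look roughly like the following sketch of the structure the Terraform code configures (the alarm name is a placeholder):

```python
def rollback_config(alarm_names, wait_minutes=5):
    # Sketch of CodeDeploy deployment-group settings: roll back on alarm
    # or failure, and keep the old (blue) task set alive for a
    # termination wait window after traffic shifts.
    return {
        "autoRollbackConfiguration": {
            "enabled": True,
            "events": ["DEPLOYMENT_FAILURE", "DEPLOYMENT_STOP_ON_ALARM"],
        },
        "alarmConfiguration": {
            "enabled": True,
            "alarms": [{"name": n} for n in alarm_names],
        },
        "blueGreenDeploymentConfiguration": {
            "terminateBlueInstancesOnDeploymentSuccess": {
                "action": "TERMINATE",
                "terminationWaitTimeInMinutes": wait_minutes,
            },
        },
    }
```

A dict with this shape could be passed to the CodeDeploy update_deployment_group API, which is effectively what the Terraform provider does on our behalf.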

In our demonstration, we have configured our Docker container to send the httpd access logs to a CloudWatch Logs group. A log metric filter will send a data point whenever our httpd access logs contain the string ” 404 ” — i.e. whenever a request is made to the server which can’t be served. Next, we have a CloudWatch alarm that will move into the alarm state when 1 or more data points are received from the log metric filter.

In the next section we will see how CodeDeploy works with this CloudWatch alarm to automatically rollback when needed.

Automated Rollbacks

Let’s go back and fix the error we introduced. In our CodeCommit checkout, run the following commands:

sed -i "s,Hello from v1,Hello from v2,g"
git commit -a -m "v2 -- fix test"
git push origin master

Our tests have been corrected to match the new response from our application. If you open the AWS CodeDeploy service you should see the deployment happening again. This time you will see that it proceeds past the “AfterAllowTestTraffic” stage and that production traffic has been routed to the new set of containers.

CodeDeploy Wait

We can verify by opening the URL from our Terraform “alb_dns_name” output.

Application Response

Our application has been fully released and is serving production traffic. Now let's deliberately cause an error by generating a 404. You can do this by appending any random path to the end of our URL. As expected, we get a 404.

Application 404 Response

When we inspect our CloudWatch logs we can see the request in the access logs.

CloudWatch Logs 404 Error

Next, if we go back to CodeDeploy we should see a reporting of the alarm and a rollback being initiated.

CodeDeploy Alarm Rollback

Looks good! Now to confirm, we open our URL from the Terraform “alb_dns_name” output again to verify that the application has been rolled back to v1.

Application Response



I hope this article has demonstrated how powerful AWS CodeDeploy can be when configured with supporting services and features.

Ensure you clean-up the resources created here by running the following from the root of your checkout of the Terraform code:

terraform init
terraform destroy