In this very short article I will show how you can create a serverless Jenkins instance and start a shell session in an AWS Fargate task without opening SSH ports or managing SSH keys.
No server is easier to manage than no server.
Werner Vogels, CTO @ Amazon
Managing a fleet of EC2 instances for your Jenkins agents is cumbersome and time-consuming, even when baking the configuration into an Amazon Machine Image (AMI). By combining AWS serverless products, we can run an instance of Jenkins with substantially less overhead.
git clone https://gitlab.com/aw5academy/terraform/sls-jenkins.git
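The stack can then be initialised and applied with the standard Terraform workflow (this sketch assumes the repository defaults are used for any input variables):
cd sls-jenkins
terraform init
terraform apply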
Once applied, we get the following:
Wait a few moments for the ECS task to fully start, then open the jenkins-url output in your browser. You should see the Unlock Jenkins page:
We can obtain the password from the task logs.
However, let’s take advantage of a new feature of Fargate called ECS Exec. With this feature we can start a shell session in any container without opening SSH ports or authenticating with SSH keys. To use this feature, ensure you have the latest versions of the AWS CLI and the Session Manager plugin.
Find the task id of the sls-jenkins task in the ECS console and use it with the following command:
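A sketch of the ECS Exec invocation, assuming the cluster is named sls-jenkins and the container jenkins (adjust to match the names created by the Terraform stack):
aws ecs execute-command \
  --cluster sls-jenkins \
  --task <task-id> \
  --container jenkins \
  --interactive \
  --command "/bin/bash"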
You can then find the password in the /mnt/efs/secrets/initialAdminPassword file.
Use this value to log in to Jenkins and complete the setup wizard.
We will run Jenkins jobs in AWS CodeBuild.
AWS CodeBuild is a fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy. With CodeBuild, you don’t need to provision, manage, and scale your own build servers.
The Jenkinsfile in this sample project starts a build of the sls-jenkins-small CodeBuild project. When we run the build we get the following output:
The logs from CodeBuild are pulled into Jenkins and displayed in the console output.
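For reference, roughly the same flow can be reproduced with the AWS CLI; this is an illustrative sketch of what the Jenkins job does, not the plugin's actual implementation:
# Start a build of the CodeBuild project and capture its build id
BUILD_ID=$(aws codebuild start-build --project-name sls-jenkins-small --query 'build.id' --output text)
# Poll until the build leaves the IN_PROGRESS state
while [ "$(aws codebuild batch-get-builds --ids "$BUILD_ID" --query 'builds[0].buildStatus' --output text)" = "IN_PROGRESS" ]; do
  sleep 10
done
aws codebuild batch-get-builds --ids "$BUILD_ID" --query 'builds[0].buildStatus' --output text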
To verify our Jenkins configuration will persist, let's stop the ECS task.
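The task can be stopped from the console or from the CLI; using the same placeholder cluster name as before:
aws ecs stop-task --cluster sls-jenkins --task <task-id>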
And if we open Jenkins in our browser we see an outage as expected.
ECS will now launch a new task and remount the EFS file system that stores our JENKINS_HOME. If successful, we will see the sample-project job that we created earlier.
This solution may be a good fit for very simple Jenkins implementations. You will find that EFS performance is not as good as that of EBS or ephemeral storage. There is also a queueing and provisioning time for CodeBuild which you would not experience with your own fleet of EC2 instances. These factors should be considered, but if you spend a lot of time maintaining your CI/CD infrastructure, this solution could be useful to you.
AWS CodeDeploy makes it easy to set up blue/green deployments for your containerised applications running in AWS Fargate. In this article, I will show how you can configure CodeDeploy and Fargate to allow automated testing of your deployments before they receive production traffic. Additionally, I will show how you can configure automatic rollbacks if your application generates errors after receiving production traffic.
For this demonstration, our container application will be a simple Apache web server. An application load balancer will route production traffic to the containers. Our Docker code will be stored in an AWS CodeCommit repository. AWS CodeBuild will be used to build the Docker image and AWS CodeDeploy will of course be used to perform the deployments. We will use AWS CodePipeline to wrap the build and deploy stages into a deployment pipeline. The below diagram represents our design.
During a deployment, the new v2 code is launched in a second set of one or more containers. These new containers are registered with the "green" target group. The green target group is attached to a test listener on the application load balancer (port 8080 in this demonstration). We then perform our testing against the test listener. When testing is complete, we signal for the deployment to continue, at which point the live listener (port 80) is switched to the green target group. The security group rules for our load balancer only allow ingress on port 8080 from within our VPC, thus preventing end users from accessing the release prematurely.
As we will see later, CodeDeploy automatically handles the registration of containers to the blue/green target groups and also the registration of listeners to target groups.
If you open the AWS Console and navigate to the CodePipeline service you will see that the “ecs-blue-green-demo” pipeline has started due to our commit to the CodeCommit repository. Wait for the pipeline to complete our first deployment.
Now let's check that our application is working by opening the "alb_dns_name" Terraform output from earlier.
Great! We have a working application.
Hooks are a feature of CodeDeploy that allow you to perform actions at certain points in a deployment before it continues to the next step. The hooks for ECS/Fargate are defined here. The hook we are most interested in is "AfterAllowTestTraffic". We want to run tests during this phase of the deployment to validate the release before sending production traffic to it. To do this, we will add an AWS Lambda function reference to our appspec.yaml. This Lambda (source code at aw5academy/terraform/ecs-blue-green-demo/lambda-src/deploy-hook/lambda_function.py) writes the hook details to an Amazon S3 bucket for a CodeBuild project to reference. That CodeBuild project (source code at aw5academy/docker/ecs-blue-green-demo/test.sh) runs in parallel with our CodeDeploy deployment in our pipeline and performs our tests during the "AfterAllowTestTraffic" stage.
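Whatever performs the tests must report the result back to CodeDeploy so the deployment can proceed or fail. A minimal sketch of that call, assuming the deployment id and hook execution id have been passed along from the hook event:
aws deploy put-lifecycle-event-hook-execution-status \
  --deployment-id "$DEPLOYMENT_ID" \
  --lifecycle-event-hook-execution-id "$HOOK_EXECUTION_ID" \
  --status Succeeded   # or Failed, if the tests did not pass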
Let's test our deployment process by deliberately introducing an error. If you examine our test script at aw5academy/docker/ecs-blue-green-demo/test.sh you can see that we expect our application to return "Hello from v1". So let's break this by changing the application to return "Hello from v2" instead. Run the following commands from the CodeCommit checkout to do this:
sed -i "s,Hello from v1,Hello from v2,g" start.sh
git commit -a -m "v2"
git push origin master
This action will automatically trigger our pipeline, and if you navigate to the CodeDeploy service in the AWS Console you can follow the deployment when it starts. After some time you should see a failure on the "AfterAllowTestTraffic" stage, as expected.
When we check the CodeBuild logs for our test project we can see the problem. As we noted, our tests still expect the application to respond with “Hello from v1”.
CodeDeploy and CloudWatch Alarms
There is one more way we can validate our deployments. Suppose we would like to monitor our deployments for some time after we route production traffic to them, and roll back if we notice any issues. By combining CodeDeploy and CloudWatch alarms we can do this in an automated way.
AWS CodeDeploy allows you to retain the existing containers for a period of time after a deployment. In our demonstration, for simplicity, we have configured this to 5 minutes, but it can be many hours if you wish. With this setting, and properly configured CloudWatch alarms, you can monitor your application post-deployment, and if any of your alarms move into the alarm state during the retention time, CodeDeploy will automatically roll back to the previous version.
In our demonstration, we have configured our Docker container to send the httpd access logs to a CloudWatch Logs group. A log metric filter emits a data point whenever our httpd access logs contain the string " 404 ", i.e. whenever a request is made to the server that can't be served. Next, we have a CloudWatch alarm that moves into the alarm state when 1 or more data points are received from the log metric filter.
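A minimal sketch of the equivalent CLI calls; the log group, metric, and alarm names here are placeholders (the demo itself creates these resources with Terraform):
aws logs put-metric-filter \
  --log-group-name /ecs/ecs-blue-green-demo \
  --filter-name httpd-404 \
  --filter-pattern '" 404 "' \
  --metric-transformations metricName=Httpd404Count,metricNamespace=EcsBlueGreenDemo,metricValue=1

aws cloudwatch put-metric-alarm \
  --alarm-name ecs-blue-green-demo-404 \
  --metric-name Httpd404Count \
  --namespace EcsBlueGreenDemo \
  --statistic Sum \
  --period 60 \
  --evaluation-periods 1 \
  --threshold 1 \
  --comparison-operator GreaterThanOrEqualToThreshold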
In the next section we will see how CodeDeploy works with this CloudWatch alarm to automatically rollback when needed.
Let’s go back and fix the error we introduced. In our CodeCommit checkout, run the following commands:
sed -i "s,Hello from v1,Hello from v2,g" test.sh
git commit -a -m "v2 -- fix test"
git push origin master
Our tests have been corrected to match the new response from our application. If you open the AWS CodeDeploy service you should see the deployment happening again. This time you will see that it proceeds past the “AfterAllowTestTraffic” stage and that production traffic has been routed to the new set of containers.
We can verify by opening the URL from our Terraform “alb_dns_name” output.
Our application has been fully released and is serving production traffic. Now let's deliberately cause an error by generating a 404. You can do this by appending any random path to our URL. As expected, we get a 404.
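For example, assuming the ALB DNS name from the Terraform output is stored in the ALB_DNS variable:
curl -s -o /dev/null -w "%{http_code}\n" "http://$ALB_DNS/this-path-does-not-exist"
# Expected output: 404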
When we inspect our CloudWatch logs we can see the request in the access logs.
Next, if we go back to CodeDeploy, we should see the alarm being reported and a rollback being initiated.
Looks good! Now to confirm, we open our URL from the Terraform “alb_dns_name” output again to verify that the application has been rolled back to v1.
I hope this article has demonstrated how powerful AWS CodeDeploy can be when configured with supporting services and features.
Here, we have an EventBridge rule watching for tagging operations against S3 objects in our bucket. When detected, our Lambda is invoked, which loads each record of the CSV as an item into a DynamoDB table.
An Amazon CloudWatch alarm, which monitors the ScanBacklogPerTask metric, notifies the Application Auto Scaling service.
Application Auto Scaling updates the running task count of an ECS service.
The tasks in the ECS service mount the EFS file system so that the latest ClamAV virus definitions are available.
The tasks then receive messages from the SQS queue.
Each message contains details of the S3 object to be scanned. The task downloads the object and performs a clamdscan on it.
The result of the virus scan (either “CLEAN” or “INFECTED”) is set as the “av-status” tag on the S3 object.
Note also that the ECS scan service runs in a protected VPC subnet. That is, a subnet which has no internet access.
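A minimal sketch of the tagging step described above, assuming the bucket and key have been read from the SQS message and the scan result is held in a variable:
SCAN_RESULT="CLEAN"   # or "INFECTED", depending on the clamdscan result
aws s3api put-object-tagging \
  --bucket "$BUCKET" \
  --key "$KEY" \
  --tagging "TagSet=[{Key=av-status,Value=$SCAN_RESULT}]"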
The Docker code for the ECS tasks can be found at aw5academy/docker/clamav. The Docker containers built from this code poll SQS for messages and perform the ClamAV virus scan. We will come back to this later.
One last step: we need to trigger a run of the freshclam task so that the ClamAV database files are present on our EFS file system. The easiest way to do this is to update the schedule for the task from the ECS console and set it to run every minute.
We can verify that the database is updated from the task logs.
Now let’s test our solution by uploading a file directly to the S3 bucket. When we do, we can check the metrics for our SQS queue for activity as well as the logs for the ECS scan tasks.
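For example, assuming BUCKET holds the bucket name from the Terraform outputs and sample.csv is any local file:
aws s3 cp sample.csv "s3://$BUCKET/sample.csv"
# After the scan task has processed the message, inspect the tag:
aws s3api get-object-tagging --bucket "$BUCKET" --key sample.csv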
Success! We can see from the metrics that a message was sent to the queue and deleted shortly after. And the ECS logs show the file being scanned and the S3 object being tagged.
As one final test, let's see if a virus will be detected and appropriate action taken. This solution has been designed to block access to all objects uploaded to S3 unless they have an "av-status" tag with the value "CLEAN". So we expect to have no access to a virus-infected file.
Rather than using a real virus we will use the EICAR test file. Let’s upload a file with this content to see what happens.
Great! The object has been properly tagged as infected. But are we blocked from accessing the file? Let’s try downloading it.
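Assuming the same bucket, and using eicar.txt as a placeholder for the key of the infected object, the download attempt should fail because the bucket policy only permits access to objects tagged as clean:
aws s3 cp "s3://$BUCKET/eicar.txt" .
# Expected to fail with an access denied / forbidden error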
We are denied as expected.
Now let’s check out part 3 where we implement the loading of our CSV data.
Suppose a file transfer workload exists between a business and their customers. A comma-separated values (CSV) file is transferred to the business and the records are loaded into a database. The business has regulatory requirements mandating that all external assets are virus scanned before being processed. Additionally, an intrusion prevention system (IPS) must operate on all public endpoints.
In the following 3 articles I will demonstrate how we can build a serverless system that meets these requirements.
Make a note of both the bucket-name and sftp-endpoint outputs… we will use both of these values later.
With Terraform applied we can inspect the created components in the AWS console. Let’s first check our SFTP endpoint which can be found in the AWS Transfer Family service.
We can also see the AWS Network Firewall, which is in the VPC service.
Let's test out our solution. First, in the root of the Terraform directory, there is an example.pem file, which is the private key we will use to authenticate with the SFTP endpoint. Copy this to your Windows host machine so we can use it with WinSCP.
In WinSCP, create a new site and provide the sftp endpoint. For username we will use “example”.
Select "Advanced" and provide the path to the example.pem you copied over. WinSCP will require you to convert it to a .ppk file.
Now login and copy a file across.
Lastly, verify the file exists in S3 from the AWS console.
Now let’s continue with part 2 where we will implement the anti-virus scanning.
This article will be a little different from previous posts. Having only recently started to explore AWS machine learning, I am still in the early stages of my study of these services. So for this article, I wanted to share what I have learned so far in the form of a possible use for machine learning: automated UI testing.
Let’s suppose we have a web application that provides a listing of search results — maybe a search engine or some kind of eCommerce website. We want to ensure the listings are displaying correctly so we have humans perform UI testing. Can we train machines to do this work for us?
The most difficult part of building a machine learning model appears to be collecting the right training data. Our training data will consist of screenshots of the web page: the "good" images are captured when the application is working as expected, and the "bad" images when there is some error in the display of the application.
We can then augment our training data by applying random orientation changes, contrast changes, and so on. This increases the number of images in our training set.
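As a rough illustration, this kind of augmentation can be scripted with ImageMagick; the rotation angles and contrast values here are arbitrary:
mkdir -p augmented
for f in good/*.png; do
  for angle in -3 3; do
    # Rotate slightly and nudge the contrast to create a new variant
    convert "$f" -rotate "$angle" -brightness-contrast 0x10 \
      "augmented/$(basename "$f" .png)-r${angle}.png"
  done
done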
Now we can open the Amazon SageMaker service and create our training job. We upload the training data to an Amazon S3 bucket so that SageMaker can download it.
Once created, the training job will start. We can view metrics from the job as it is working.
You can see the training accuracy improving over time.
Now that we have our model trained, we can test how good it is by deploying it to a SageMaker Model Endpoint. Once deployed, we can test it with invoke-endpoint. We provide a screenshot image to this API call and the result returned to us will be two values: the probability of the image being “good” and the probability of it being “bad”.
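A sketch of that call, assuming the endpoint is named ui-test and the model accepts raw image bytes:
aws sagemaker-runtime invoke-endpoint \
  --endpoint-name ui-test \
  --content-type application/x-image \
  --body fileb://screenshot.png \
  output.json
cat output.json   # the response contains a probability per class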
A partial success! The model did well for some tests and not so well for others.
Some thoughts and conclusions I have made after completing this experiment:
The algorithm used in this model was Image Classification. I am not sure this is the best choice. Most of the "good" images are very similar. Probably too similar. We might need another algorithm which, rather than classifying the image, detects abnormalities.
As mentioned earlier, gathering the training data is the difficult part. It is possible that this mock application is not capable of producing enough variation. A real world application may produce better results. Additionally, actual errors observed in the past could be used to train the model.
Even with the less-than-great results from this experiment, this solution could be used in a CI/CD pipeline. The sample errors I generated were sometimes very subtle, such as text being off by a few pixels. The model could be retrained to detect only very obvious errors. Then, an application's build pipeline could run very quick sanity tests to catch obvious UI errors.
A recent AWS Fargate feature update has added support for S3 hosted environment files. In this article I will show how you could use this to manage your application’s configuration. I will also demonstrate how changes to the configuration can be released in a blue-green deployment.
The solution we will build will follow the design shown in the below diagram.
If you have any issues with this step, navigate to the CodeCommit service and open the ecs-env-file-demo repository for clone instructions and prerequisites.
As soon as we push our code to CodeCommit, our release pipeline will trigger. Navigate to the CodePipeline service and open the ecs-env-file-demo pipeline.
Wait until this release completes.
Application Configuration Changes
We can now test our process for making configuration changes. Navigate to the CodeCommit service and open our ecs-env-file-demo repository. Then open the cfg.env file. You can see that our configuration file has a value of “blue” for our CSS_BACKGROUND variable. This is the variable that our Apache server uses for the webpage’s background colour.
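Under the hood, the container definition in the task definition points at the S3 object holding this file. A quick way to see this (the task definition family and bucket name here are placeholders) is:
aws ecs describe-task-definition \
  --task-definition ecs-env-file-demo \
  --query 'taskDefinition.containerDefinitions[0].environmentFiles'
# Expected shape:
# [
#   {
#     "value": "arn:aws:s3:::<config-bucket>/cfg.env",
#     "type": "s3"
#   }
# ]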
Let’s change this value to “green”, enter the appropriate Author details and click “Commit changes”.
We can now use the CodeDeploy service to follow our deployment. First navigate to the CodePipeline service and open our ecs-env-file-demo pipeline; when the CodeDeploy stage begins, click the Details link to go to the CodeDeploy service.
Our deployment has started. Note that our deployments use a canary release, with 20% of the traffic receiving the new changes for 5 minutes. After that, 100% of the traffic receives the new changes. In your checkout of the Terraform code, there is a deployment-tester.html file. This is a page of 9 HTML iframes whose source is the DNS name of the load balancer in our application stack. The page auto-refreshes every 5 seconds.
If you open this deployment-tester.html file (you may need to open developer tools and disable the cache for it to be effective), you will be able to verify our release is working as expected. It should initially show just the original blue.
Now you can wait for CodeDeploy to enter the next stage.
We now have 20% of our traffic routed to the new application configuration — the green. Let’s check this in our deployment-tester.html file:
And to complete the process, we can wait for CodeDeploy to finish and verify the application is fully green.
Clean up the created resources with:
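Assuming you are in the checkout of the Terraform code:
terraform destroy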
I hope this very simple example has effectively demonstrated the new capability in AWS Fargate.
In this article I will show how you can run your AWS CodeBuild projects locally. AWS CodeBuild is a “fully managed continuous integration service that compiles source code, runs tests, and produces software packages that are ready to deploy”. By running your CodeBuild projects locally you can test code changes before committing, allowing you to rapidly develop and debug your projects.
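As a rough sketch of the setup: AWS publishes a codebuild_build.sh helper script in its aws-codebuild-docker-images repository for running a buildspec locally in Docker. The image tag and paths below are illustrative assumptions:
# Run the project's buildspec locally, from the directory containing your source
./codebuild_build.sh -i aws/codebuild/standard:5.0 -a ./artifacts -s .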
We will use the EC2 instance as a mock for an application that needs to communicate with our Aurora database.
Note: at the time of writing this article, Terraform does not support RDS Proxy resources, so we will need to create this component manually from the AWS console.
Let’s first deploy our Terraform code with:
git clone https://gitlab.com/aw5academy/terraform/rds-proxy.git
Once Terraform has been applied, it is worth examining the security groups that were created.
We can see that the Aurora database only allows connections from the Proxy and the Proxy only allows connections from the EC2 instance.
Additionally, a Secrets Manager secret was created. Our RDS Proxy will use the values from this secret to connect to our database. Note how it is the proxy alone that uses these credentials. We will see later that our application (the EC2 instance) will use IAM authentication to establish a connection with the RDS proxy and so the application never needs to know the database credentials.
Now we can create our RDS Proxy from the AWS RDS console. During the creation of the proxy, provide the following settings:
Select PostgreSQL for Engine compatibility;
Tick Require Transport Layer Security;
Select rds-proxy-test for Database;
Select the secret with prefix rds-proxy-test for Secrets Manager secret(s);
Select rds-proxy-test-proxy-role for IAM role;
Select Required for IAM authentication;
Select rds-proxy-test-proxy for Existing VPC security groups;
Now wait for the proxy to be created. This can take some time. Once complete, obtain the RDS Proxy endpoint from the console, which we will use to connect to from our EC2 instance.
Let’s test our setup. SSH into the EC2 instance with:
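Once on the instance, a sketch of connecting through the proxy with IAM authentication; the proxy endpoint, database user, database name, and region are placeholders:
TOKEN=$(aws rds generate-db-auth-token \
  --hostname <proxy-endpoint> \
  --port 5432 \
  --username <db-user> \
  --region <region>)
# The token is used as the password; the proxy holds the real database credentials
PGPASSWORD="$TOKEN" psql "host=<proxy-endpoint> port=5432 dbname=postgres user=<db-user> sslmode=require"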
As we have seen, the RDS Proxy feature can improve application security, with the proxy alone having access to the database credentials and the application using IAM authentication to connect to the proxy.
Application resilience is also improved, since RDS Proxy reduces failover times by up to 66%.
Lastly, your applications will be able to scale more effectively since RDS Proxy will pool and share connections to the database.
To clean up the resources we created, first delete the RDS Proxy from the console and then, from your terminal, destroy the Terraform stack with:
In this article I will show how you can launch an Amazon Linux EC2 instance with a desktop environment that will serve as a jumpbox. Connections to this jumpbox will be made through RDP via a session manager port tunneling session. By using session manager, our EC2 instance’s security group does not require ingress rules allowing RDP or other ports to connect, thus improving the security of the jumpbox.
Before continuing with this article I would strongly recommend reading my earlier article Access Private EC2 Instances With AWS Systems Manager Session Manager. That article will explain the fundamental workings of session manager and shows how to deploy resources to your AWS account that will be required for setting up the jumpbox described in this article.
When the session-manager stack is deployed, we need to read some of the Terraform outputs, as we will need their values for the jumpbox stack's input variables. We can retrieve the outputs and set them as environment variables with:
git clone https://gitlab.com/aw5academy/terraform/jumpbox.git
After the stack deploys, wait approximately 5 minutes. This allows time for the aw5academy/chef/jumpbox Chef cookbook, which runs as part of the EC2 instance's user data, to converge. This cookbook installs the MATE desktop environment on the Amazon Linux instance. Also see here for more information on installing a GUI on Amazon Linux.
Let’s make sure we can connect to the jumpbox with a terminal session. The jump.sh script can be used:
You should see something like the following:
Now we can try a remote desktop session. Terminate the terminal session with exit and then run:
bash jump.sh -d
You should now see the port forwarding session being started:
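For reference, a port-forwarding session like this is typically started with something along these lines; this is a sketch of what jump.sh wraps, not its exact contents:
aws ssm start-session \
  --target <instance-id> \
  --document-name AWS-StartPortForwardingSession \
  --parameters '{"portNumber":["3389"],"localPortNumber":["55678"]}'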
Also printed are the connection details for RDP. Open your RDP client and enter localhost:55678 for the computer to connect to and provide the supplied user name. Check the Allow me to save credentials option and click Connect:
Provide the password at the prompt and click OK:
Behind The Scenes
An explanation of what is occurring when we use our jump.sh script…
In order to start an RDP session, the client needs to know the username and password for an account on the jumpbox. Rather than creating a generic account to be shared among clients, we dynamically create temporary accounts with a one-day lifetime. This is accomplished through the following actions, sketched in code after the list:
The client creates a random username using urandom;
The client creates a random password using urandom;
The client creates a SHA-512 hash of the password using openssl;
The client writes the hashed password to parameter store;
The jumpbox retrieves the hashed password from parameter store;
The jumpbox deletes the hashed password from parameter store;
The jumpbox creates an account with the provided username and the retrieved hash of the password;
The jumpbox marks the account and password to expire after 1 day;
With these steps, the password never leaves the client, is always stored encrypted and/or hashed, and is only stored for as long as it is required.
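A rough sketch of the commands involved on each side; the parameter path, string lengths, and variable names are illustrative, not the exact jump.sh implementation:
# Client side: generate a random username and password, hash the password, store the hash
RDP_USER="u$(tr -dc 'a-z0-9' < /dev/urandom | head -c 8)"
RDP_PASS="$(tr -dc 'A-Za-z0-9' < /dev/urandom | head -c 20)"
PASS_HASH="$(openssl passwd -6 "$RDP_PASS")"
aws ssm put-parameter --name "/jumpbox/$RDP_USER" --type SecureString --value "$PASS_HASH"

# Jumpbox side (the username is passed to the jumpbox): fetch and delete the hash,
# then create the account and mark it to expire after one day
PASS_HASH="$(aws ssm get-parameter --name "/jumpbox/$RDP_USER" --with-decryption --query Parameter.Value --output text)"
aws ssm delete-parameter --name "/jumpbox/$RDP_USER"
useradd -m "$RDP_USER"
usermod -p "$PASS_HASH" "$RDP_USER"
chage -E "$(date -d '+1 day' +%Y-%m-%d)" -M 1 "$RDP_USER"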
That's all there is to it. After your jumpbox is enabled, you can configure your private applications to accept traffic from the jumpbox's security group. The Chromium browser can then be used to access these applications securely. I hope you find this article useful.