Serverless File Transfer Workload – Part 2 – AntiVirus

Introduction

We require uploaded files to be scanned for viruses before they can be processed further.

Design

Our design for this solution can be represented in the following diagram.

AntiVirus Diagram

There is a lot going on here, so let’s walk through each part.

  • We use ClamAV to perform the anti-virus scans.
  • ClamAV definitions are stored in an Amazon Elastic File System (EFS).
  • An Amazon EventBridge scheduled rule starts an Amazon Elastic Container Service (ECS) task every 3 hours, which runs freshclam to update the virus database on the EFS file system.
  • A bucket notification is created for the S3 bucket that stores files to be scanned.
  • When new objects are created in this bucket, the event is sent to an Amazon Simple Queue Service (SQS) queue.
  • An Amazon EventBridge scheduled rule invokes an AWS Lambda function every minute.
  • The Lambda function determines the ScanBacklogPerTask (the number of messages waiting per running task) by reading attributes of the SQS queue and the ECS service’s task count, following the backlog-per-instance approach described in the AWS guide on scaling based on Amazon SQS.
  • The Lambda function publishes the ScanBacklogPerTask metric to Amazon CloudWatch.
  • An Amazon CloudWatch alarm, which monitors the ScanBacklogPerTask metric, notifies the Application Auto Scaling service.
  • Application Auto Scaling updates the desired task count of the ECS service.
  • The tasks in the ECS service mount the EFS file system so that the latest ClamAV virus definitions are available.
  • The tasks then receive messages from the SQS queue.
  • Each message contains details of the S3 object to be scanned. The task downloads the object and performs a clamdscan on it.
  • The result of the virus scan (either “CLEAN” or “INFECTED”) is set as the “av-status” tag on the S3 object.
  • Note also that the ECS scan service runs in a protected VPC subnet, that is, a subnet with no internet access.
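The scaling signal described above is simple arithmetic: messages waiting in the queue divided by tasks running. The article computes it in a Lambda function whose code is not shown here, so the following is only a minimal sketch of the same calculation using the AWS CLI. The queue URL, cluster and service names, the "ClamAV" metric namespace, and the choice to divide by 1 when no tasks are running are all assumptions.

```shell
# Hypothetical CLI sketch of the ScanBacklogPerTask calculation; NOT the
# article's Lambda code. The "ClamAV" namespace is an assumed name.

# backlog_per_task MESSAGES RUNNING_TASKS
# Divides the queue backlog by the running task count. With zero tasks we
# divide by 1 (an assumption) so the metric can still trigger a scale-out.
backlog_per_task() {
  awk -v m="$1" -v t="$2" 'BEGIN { if (t < 1) t = 1; printf "%.2f\n", m / t }'
}

# publish_backlog_metric QUEUE_URL CLUSTER SERVICE
# Reads the queue depth and running task count, publishes the metric and
# prints the value that was published.
publish_backlog_metric() {
  local queue_url=$1 cluster=$2 service=$3 messages tasks value

  # Visible messages waiting in the queue
  messages=$(aws sqs get-queue-attributes \
    --queue-url "$queue_url" \
    --attribute-names ApproximateNumberOfMessages \
    --query 'Attributes.ApproximateNumberOfMessages' --output text) || return 1

  # Tasks currently running in the scan service
  tasks=$(aws ecs describe-services \
    --cluster "$cluster" --services "$service" \
    --query 'services[0].runningCount' --output text) || return 1

  value=$(backlog_per_task "$messages" "$tasks")

  aws cloudwatch put-metric-data \
    --namespace ClamAV \
    --metric-name ScanBacklogPerTask \
    --value "$value" || return 1

  echo "$value"
}
```

For example, publish_backlog_metric "$QUEUE_URL" clamav clamav-scan (both names hypothetical) prints and publishes the current value. Dividing by the task count rather than publishing the raw queue depth means the alarm threshold can be read as "messages one task can reasonably work through", independent of how many tasks are running.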

Docker

The Docker code for the ECS tasks can be found at aw5academy/docker/clamav. The Docker containers built from this code poll SQS for messages and perform the ClamAV virus scan. We will come back to this later.
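To make the scan flow concrete, here is a hedged sketch of one iteration of such a polling loop using the AWS CLI, jq and clamdscan. It is not the code in aw5academy/docker/clamav; it assumes the message body is the raw S3 event (as the queue receives it directly from the bucket notification) and that clamd is running in the container so clamdscan can reach it.

```shell
# Hypothetical sketch of one iteration of the scan loop; NOT the code in
# aw5academy/docker/clamav. Assumes the AWS CLI, jq and a running clamd.

# av_status EXIT_CODE: map a clamdscan exit code to the "av-status" tag value.
# clamdscan exits 0 when no virus is found, 1 when a virus is found, 2 on error.
av_status() {
  case "$1" in
    0) echo CLEAN ;;
    1) echo INFECTED ;;
    *) return 1 ;;   # scan error: no tag value
  esac
}

# scan_one_message QUEUE_URL: receive at most one message, scan the object it
# refers to, tag the object and delete the message.
scan_one_message() {
  local queue_url=$1 msg receipt body bucket key tmp rc status
  # Long-poll for up to 20 seconds; the query yields "null" if nothing arrives
  msg=$(aws sqs receive-message --queue-url "$queue_url" \
    --max-number-of-messages 1 --wait-time-seconds 20 \
    --query 'Messages[0]' --output json) || return 1
  if [ -z "$msg" ] || [ "$msg" = "null" ]; then
    return 0
  fi

  receipt=$(printf '%s' "$msg" | jq -r '.ReceiptHandle')
  body=$(printf '%s' "$msg" | jq -r '.Body')
  bucket=$(printf '%s' "$body" | jq -r '.Records[0].s3.bucket.name // empty')
  # Keys in S3 event notifications are URL-encoded; this sketch does not decode
  # them, so it only handles keys without spaces or special characters.
  key=$(printf '%s' "$body" | jq -r '.Records[0].s3.object.key // empty')

  if [ -z "$bucket" ] || [ -z "$key" ]; then
    # Not an object record (e.g. the s3:TestEvent S3 sends when the
    # notification is configured): discard it.
    aws sqs delete-message --queue-url "$queue_url" --receipt-handle "$receipt"
    return 0
  fi

  tmp=$(mktemp)
  if ! aws s3 cp "s3://$bucket/$key" "$tmp" --only-show-errors; then
    rm -f "$tmp"
    return 1   # message becomes visible again after the visibility timeout
  fi

  # --fdpass lets clamd read the file even if it runs as a different user
  clamdscan --no-summary --fdpass "$tmp"
  rc=$?
  rm -f "$tmp"

  if status=$(av_status "$rc"); then
    # Note: put-object-tagging replaces any existing tags on the object
    aws s3api put-object-tagging --bucket "$bucket" --key "$key" \
      --tagging "TagSet=[{Key=av-status,Value=$status}]" || return 1
    aws sqs delete-message --queue-url "$queue_url" --receipt-handle "$receipt"
  else
    return 1   # leave the message in the queue for a retry
  fi
}
```

A container entrypoint could then loop with while true; do scan_one_message "$QUEUE_URL"; done, where QUEUE_URL is a hypothetical environment variable set in the task definition. Deleting the message only after tagging means a task that crashes mid-scan does not lose work: the message simply reappears for another task.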

Terraform

The Terraform code that will provision our infrastructure can be found at aw5academy/terraform/clamav.

When you apply the code, you will be prompted for a bucket name. Enter the name of the bucket that was created in the first part of this series.

Terraform apply

Configuration

Once Terraform has been applied, we need to push the Docker code to the CodeCommit repository that Terraform created. The following commands do this:

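# Clone the Docker code from GitLab
git clone https://gitlab.com/aw5academy/docker/clamav.git clamav-aw5academy
# Install the CodeCommit git remote helper and make sure it is on the PATH
pip3 install git-remote-codecommit
export PATH=$PATH:~/.local/bin
# Clone the (empty) CodeCommit repository created by Terraform in us-east-1
git clone codecommit::us-east-1://clamav
# Copy the code across; the glob skips dotfiles, so the GitLab .git directory is not copied
cp -r clamav-aw5academy/* clamav/
cd clamav
git add .
git commit -m "Initial commit"
# Push to master; HEAD:master also works if your git names new branches "main"
git push origin HEAD:master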

You should then see the code in the CodeCommit console.

CodeCommit

Next, we need to start a build of the AWS CodeBuild project, which clones the clamav repository, performs a Docker build and pushes the image to an Amazon Elastic Container Registry (ECR) repository.
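The build can also be started from the command line. Below is a hedged sketch using the AWS CLI's codebuild start-build and batch-get-builds commands. The project name clamav is an assumption; substitute the name Terraform gave the project.

```shell
# Hypothetical sketch: start the CodeBuild project and wait for the result.
# The default project name "clamav" is an assumption.

# wait_for_build BUILD_ID: poll every 15 s until the build is no longer
# IN_PROGRESS, print the final status and succeed only on SUCCEEDED.
wait_for_build() {
  local build_id=$1 status
  while :; do
    status=$(aws codebuild batch-get-builds --ids "$build_id" \
      --query 'builds[0].buildStatus' --output text) || return 1
    if [ "$status" != "IN_PROGRESS" ]; then
      break
    fi
    sleep 15
  done
  echo "$status"
  [ "$status" = "SUCCEEDED" ]
}

# start_image_build [PROJECT]: start a build and wait for it to finish
start_image_build() {
  local project=${1:-clamav} build_id
  build_id=$(aws codebuild start-build --project-name "$project" \
    --query 'build.id' --output text) || return 1
  wait_for_build "$build_id"
}
```

After a SUCCEEDED build, aws ecr describe-images --repository-name clamav (repository name assumed) should list the newly pushed image.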

Docker build in AWS CodeBuild
ECR repository

One last step: we need to trigger a run of the freshclam task so that the ClamAV database files are present on our EFS file system. The easiest way to do this is to edit the task’s schedule in the ECS console and set it to run every minute. Once the database files are in place, set the schedule back to every 3 hours.
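Instead of editing the schedule, the task can be started once with aws ecs run-task. This sketch assumes the task runs on Fargate; the cluster, task definition, subnet and security group are placeholders, and the simplest choice is to reuse the values configured on the scheduled rule.

```shell
# Hypothetical sketch: start the freshclam task once instead of changing its
# schedule. Assumes Fargate; all IDs are placeholders for Terraform's values.

# network_config SUBNET SECURITY_GROUP: build the awsvpc network configuration
# in the CLI's shorthand syntax. freshclam downloads from the ClamAV mirrors,
# so SUBNET needs outbound internet access (e.g. through a NAT gateway),
# unlike the scan service's protected subnet.
network_config() {
  printf 'awsvpcConfiguration={subnets=[%s],securityGroups=[%s],assignPublicIp=DISABLED}\n' "$1" "$2"
}

# run_freshclam_once CLUSTER TASK_DEFINITION SUBNET SECURITY_GROUP
# Starts one task and prints its ARN.
run_freshclam_once() {
  aws ecs run-task \
    --cluster "$1" \
    --task-definition "$2" \
    --launch-type FARGATE \
    --network-configuration "$(network_config "$3" "$4")" \
    --query 'tasks[0].taskArn' --output text
}
```

With the printed ARN, aws ecs wait tasks-stopped --cluster CLUSTER --tasks TASK_ARN blocks until the run has finished, after which the task logs can be checked. If the subnet is public rather than behind a NAT gateway, assignPublicIp would need to be ENABLED instead.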

ECS Scheduled Task

We can verify that the database is updated from the task logs.

Freshclam logs

Testing

Now let’s test our solution by uploading a file directly to the S3 bucket. When we do, we can check the metrics for our SQS queue for activity as well as the logs for the ECS scan tasks.
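The same check can be scripted: upload a file, then poll the object’s tags until the scanner sets av-status. This is a hedged sketch; the bucket and key are placeholders, and the two-minute default wait is an arbitrary choice.

```shell
# Hypothetical sketch: poll an uploaded object until the scanner tags it.
# Bucket and key are placeholders.

# get_av_status BUCKET KEY: print the object's "av-status" tag value; the CLI
# prints "None" (or nothing) while the tag is absent.
get_av_status() {
  aws s3api get-object-tagging --bucket "$1" --key "$2" \
    --query "TagSet[?Key=='av-status'].Value | [0]" --output text
}

# wait_for_av_status BUCKET KEY [ATTEMPTS]: check every 5 s, up to ATTEMPTS
# times (default 24, about two minutes), and print the status once it appears.
wait_for_av_status() {
  local bucket=$1 key=$2 attempts=${3:-24} status
  while [ "$attempts" -gt 0 ]; do
    status=$(get_av_status "$bucket" "$key")
    case "$status" in
      ""|None) ;;                      # not tagged yet
      *) echo "$status"; return 0 ;;
    esac
    attempts=$((attempts - 1))
    sleep 5
  done
  return 1
}
```

For example, after aws s3 cp test.txt "s3://$BUCKET/test.txt", running wait_for_av_status "$BUCKET" test.txt should print CLEAN for a clean file once a scan task has processed the message.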

SQS metrics
ECS scan logs

Success! We can see from the metrics that a message was sent to the queue and deleted shortly after. And the ECS logs show the file being scanned and the S3 object being tagged.

Virus Check

As one final test, let’s see if a virus will be detected and appropriate action taken. This solution has been designed to block access to all objects uploaded to S3 unless they carry the “av-status” tag with the value “CLEAN”. So we expect to have no access to a virus-infected file.

Rather than using a real virus, we will use the EICAR test file, a harmless file that anti-virus products detect as if it were a virus. Let’s upload a file with this content to see what happens.
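One way to create and upload the test file from a shell is sketched below. The key eicar.txt is an arbitrary example, and anti-virus software on your own machine may quarantine the temporary file as soon as it is written.

```shell
# Sketch: create the standard EICAR test file and upload it.
# The key "eicar.txt" is an arbitrary example.

# make_eicar: write the 68-byte EICAR test string to stdout. Single quotes keep
# the shell from expanding "$" and "!", and printf '%s' writes the string
# unchanged with no trailing newline.
make_eicar() {
  printf '%s' 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*'
}

# upload_eicar BUCKET: write the file to a temp path, upload it, clean up,
# and return the upload's exit status.
upload_eicar() {
  local bucket=$1 file rc
  file=$(mktemp)
  make_eicar > "$file"
  aws s3 cp "$file" "s3://$bucket/eicar.txt"
  rc=$?
  rm -f "$file"
  return "$rc"
}
```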

S3 Infected

Great! The object has been properly tagged as infected. But are we blocked from accessing the file? Let’s try downloading it.

S3 download error

We are denied as expected.
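The denial can also be confirmed from the command line. This hedged sketch streams the object to stdout and discards it, keeping only the error output; it treats an error mentioning 403, AccessDenied or Forbidden as the expected denial, which is an assumption about the wording of the CLI’s error message.

```shell
# Hypothetical sketch: confirm that an object cannot be downloaded.
# Bucket and key are placeholders.
expect_download_denied() {
  local bucket=$1 key=$2 err
  # Stream the object to stdout, discard it, and capture only stderr
  if err=$(aws s3 cp "s3://$bucket/$key" - 2>&1 >/dev/null); then
    echo "UNEXPECTED: download succeeded" >&2
    return 1
  fi
  case "$err" in
    *403*|*AccessDenied*|*Forbidden*) echo "denied as expected" ;;
    *) echo "download failed for another reason: $err" >&2; return 1 ;;
  esac
}
```

Distinguishing a denial from other failures (such as a network error) keeps the check from passing for the wrong reason.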

Now let’s check out part 3 where we implement the loading of our CSV data.
