This is the code and configuration to carry out the antivirus checks on a single file from S3.

The lambda takes parameters to configure the scanning options.
Parameter Name | Optional | Default Value | Description | Example |
---|---|---|---|---|
consignment_id | false | N/A | TDR UUID for the consignment the object to scan belongs to | |
file_id | false | N/A | Name of the object to scan | |
user_id | true | None | TDR UUID of the user who uploaded the object to scan | |
original_path | true | None | Original path to the object to scan. Used to create a local version of the object for scanning | |
scan_type | true | N/A | Deprecated. Use a combination of the optional parameters to set the configuration. Type of scan to run | metadata / consignment |
s3_source_bucket | true | tdr-upload-files-cloudfront-dirty-{tdr environment} | S3 bucket containing the object to scan | {some AWS S3 bucket name} |
s3_source_bucket_key | true | {user_id}/{consignment_id}/{file_id} | S3 bucket key of the object to scan | |
s3_upload_bucket | true | tdr-upload-files-{tdr environment} | S3 bucket to copy clean objects to. If an empty string, no copy occurs | |
s3_upload_bucket_key | true | {consignment_id}/{file_id} | S3 bucket key of the clean object. If an empty string, no copy occurs | |
s3_quarantine_bucket | true | tdr-upload-files-quarantine-{tdr environment} | S3 bucket to copy infected objects to | |
guard_duty_malware_scan_enabled | true | True | Flag indicating whether the source S3 bucket has AWS GuardDuty malicious object scanning enabled | |
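The defaulted S3 values are built from the required identifiers and the TDR environment. As a rough illustration only (this is not the lambda's actual code; the helper name and the `environment` argument are assumptions), the defaults in the table combine like this:

```python
# Illustrative sketch only: shows how the defaults in the table above combine.
# The real logic lives in src/matcher.py; this helper name is hypothetical.
def apply_default_parameters(event, environment):
    """Return the event with the defaulted S3 parameters filled in."""
    user_id = event.get("userId")
    consignment_id = event["consignmentId"]
    file_id = event["fileId"]

    defaults = {
        "s3SourceBucket": f"tdr-upload-files-cloudfront-dirty-{environment}",
        "s3SourceBucketKey": f"{user_id}/{consignment_id}/{file_id}",
        "s3UploadBucket": f"tdr-upload-files-{environment}",
        "s3UploadBucketKey": f"{consignment_id}/{file_id}",
        "s3QuarantineBucket": f"tdr-upload-files-quarantine-{environment}",
        "guardDutyMalwareScanEnabled": True,
    }
    # Values supplied in the event take precedence over the defaults
    return {**defaults, **event}
```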
Object to scan details:
- Consignment Id: bf2181c7-70e4-448d-b122-be561d0e797a
- File Id: myFileToScan.txt
- Original Path: identifier1/identifier2/myFileToScan.txt
- S3 source bucket name: some-source-bucket
- S3 source object key: same as original path
- S3 clean bucket name: some-clean-bucket
- S3 clean object key: same as S3 source object key
- S3 quarantine bucket name: tdr-upload-files-quarantine-{tdr environment}
- AWS GuardDuty Malicious Scanning Enabled: False
Event configuration to support the above would be as follows:
```json
{
  "consignmentId": "bf2181c7-70e4-448d-b122-be561d0e797a",
  "fileId": "myFileToScan.txt",
  "originalPath": "identifier1/identifier2/myFileToScan.txt",
  "s3SourceBucket": "some-source-bucket",
  "s3SourceBucketKey": "identifier1/identifier2/myFileToScan.txt",
  "s3UploadBucket": "some-clean-bucket",
  "s3UploadBucketKey": "identifier1/identifier2/myFileToScan.txt",
  "guardDutyMalwareScanEnabled": false
}
```
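To try an event like this against a deployed instance of the lambda, you could invoke it with boto3. This is a sketch only: the function name below is a placeholder, not the real lambda name, and the response shape depends on the handler.

```python
# Sketch only: invoke the deployed lambda with the example event above.
# "your-antivirus-lambda-name" is a placeholder, not the real function name.
import json

import boto3

event = {
    "consignmentId": "bf2181c7-70e4-448d-b122-be561d0e797a",
    "fileId": "myFileToScan.txt",
    "originalPath": "identifier1/identifier2/myFileToScan.txt",
    "s3SourceBucket": "some-source-bucket",
    "s3SourceBucketKey": "identifier1/identifier2/myFileToScan.txt",
    "s3UploadBucket": "some-clean-bucket",
    "s3UploadBucketKey": "identifier1/identifier2/myFileToScan.txt",
    "guardDutyMalwareScanEnabled": False
}

lambda_client = boto3.client("lambda")
response = lambda_client.invoke(
    FunctionName="your-antivirus-lambda-name",  # placeholder
    Payload=json.dumps(event).encode("utf-8"),
)
print(json.loads(response["Payload"].read()))
```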
The lambda function is built by GitHub Actions. There are three workflows, each defined in its own yml file.

The first runs git secrets and the Python tests. This is the standard multibranch pipeline job which runs on PRs and on merges to master. If it runs on the master branch, it also rebuilds the three Docker images locally and uses them to build the lambda.
There are three docker images that are used to build the lambda.
File name | Image Name | Description |
---|---|---|
Dockerfile-yara | yara | Installs yara and the dependencies it needs, such as openssl, on an Alpine image. |
Dockerfile-compile | yara-rules | Uses yara as the base image. Gets the yara rules from GitHub and compiles them into a single file for yara to use. |
Dockerfile-dependencies | yara-dependencies | Installs the necessary software on an Amazon Linux image and zips it up to be used by the lambda. |
These images are built locally on every master build and are not stored in ECR.
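If you want to reproduce the image builds outside the workflow, a minimal sketch using the Docker SDK for Python is shown below. The SDK is an assumption used for illustration; plain docker build commands work just as well.

```python
# Illustrative sketch, assuming the Docker SDK for Python (pip install docker).
# Builds the three images described above in order, since yara-rules uses yara
# as its base image.
import docker

client = docker.from_env()

for dockerfile, tag in [
    ("Dockerfile-yara", "yara"),
    ("Dockerfile-compile", "yara-rules"),
    ("Dockerfile-dependencies", "yara-dependencies"),
]:
    image, _logs = client.images.build(path=".", dockerfile=dockerfile, tag=tag)
    print(f"Built {tag}: {image.id}")
```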
The second workflow deploys the lambda from S3.
Create a virtual environment in the antivirus directory:

```bash
python3 -m venv venv
```

Activate the environment:

```bash
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
```
To run the lambda locally, you will need to add code at the bottom of the matcher.py file. This will connect to the integration S3 dirty bucket, so you will need to run it with integration credentials.

```python
matcher_lambda_handler({
    "userId": "7ad28066-7a76-4e07-b540-f005b6919328",
    "consignmentId": "bf2181c7-70e4-448d-b122-be561d0e797a",
    "fileId": "df216308-e78b-4328-90ef-8e4ebfef6b9d",
    "originalPath": "original/path"
}, None)
```

Then either run it through the CLI:

```bash
python src/matcher.py
```

or debug it through PyCharm or the IDE of your choice.
The following environment variables need to be set for this to work, although the AWS ones are optional if you're using profiles:

- `AWS_ACCESS_KEY_ID`
- `AWS_SECRET_ACCESS_KEY`
- `AWS_SESSION_TOKEN`
- `AWS_LAMBDA_FUNCTION_VERSION` - the lambda function version. It is normally provided by the lambda runtime, so it needs to be set here; set it to "$LATEST".
- `ENVIRONMENT` - intg, staging or prod.
- `ROOT_DIRECTORY` - the root directory for the file download in the lambda file system.
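If you'd rather set these from Python than export them in the shell, a minimal sketch follows. The values are examples only, and the import path for the handler is an assumption about the project layout.

```python
# Sketch: set the required environment variables before calling the handler.
# The values are examples; AWS credentials can come from a profile instead.
import os

os.environ["AWS_LAMBDA_FUNCTION_VERSION"] = "$LATEST"
os.environ["ENVIRONMENT"] = "intg"
os.environ["ROOT_DIRECTORY"] = "/tmp"  # example download root

# Import after the variables are set; the module path is an assumption.
from src.matcher import matcher_lambda_handler

matcher_lambda_handler({
    "userId": "7ad28066-7a76-4e07-b540-f005b6919328",
    "consignmentId": "bf2181c7-70e4-448d-b122-be561d0e797a",
    "fileId": "df216308-e78b-4328-90ef-8e4ebfef6b9d",
    "originalPath": "original/path"
}, None)
```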
- Normal run: `python -m pytest`
- Run with coverage: `python -m pytest --cov=src`
- Run with coverage and missing lines: `python -m pytest --cov-report term-missing --cov=src`
- Run with coverage, missing lines and junit output: `python -m pytest --cov-report term-missing --junitxml="result.xml" --cov=src`
There is a GitHub Actions job, TDR Check Antivirus Rules, which is run on a schedule from the .github/workflows/check_rules file. It runs at 07:20 on Mondays and carries out the following steps.
- Builds the base yara image.
- Builds the rules yara image.
- Gets the highest version tag from git.
- Downloads the lambda zip file from S3.
- Unzips the zip file and copies the existing compiled rule file into the current directory.
- Downloads the tests file from the tdr-antivirus-test-files-mgmt S3 bucket.
- Builds the docker image from the Dockerfile-run-tests file which copies the existing compiled rules and the test files to the container.
- Runs this Docker image, which runs a Python script that carries out these steps (a sketch of the comparison logic follows this list):
  - Compares the rule identifiers in the old compiled rules file with the new one.
  - If there are new yara rules in the new compiled rules, it runs the antivirus checks against the test files from the S3 bucket.
  - NOTE: it will not detect rules that have been removed, and it won't check whether existing rules have been changed either. Comparison based on hashes was tried, but it is likely to bring up too many false positives.
  - Yara is run against the test files using the new rules. If no matches are found, the script returns exit code zero and the GitHub Actions job triggers the antivirus multibranch master branch build.
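As a rough sketch of that comparison logic (this is not the actual script in the repository; it assumes the yara-python package and uses placeholder file names):

```python
# Sketch of the rule comparison step, assuming yara-python (pip install yara-python).
# File names below are placeholders, not the paths used by the real script.
import sys

import yara

old_rules = yara.load("old-compiled-rules")  # rules from the existing lambda zip
new_rules = yara.load("new-compiled-rules")  # freshly compiled rules

old_ids = {rule.identifier for rule in old_rules}
new_ids = {rule.identifier for rule in new_rules}
added = new_ids - old_ids

if not added:
    print("No new rules; nothing to check.")
    sys.exit(0)

# Run the new rules against the downloaded test files; any match is a failure.
test_files = ["test-file-1", "test-file-2"]  # placeholders for the S3 test files
matching = [path for path in test_files if new_rules.match(path)]

if matching:
    print(f"New rules matched test files: {matching}")
    sys.exit(1)

print(f"{len(added)} new rule(s) found and no matches against the test files.")
sys.exit(0)
```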
These steps can be run manually by copying the commands from the .github/workflows/check_rules.yml file.
Because this runs the master build job without any code changes, you end up with more than one version pointing to the same commit. We still store a separate zip file in S3 for each version, though, so we can still roll back if necessary.
Important Note: tdr-antivirus uses >= v1.5.0 of Terraform. Ensure that Terraform >= v1.5.0 is installed before proceeding.
The terraform directory contains Terraform files which are used to create the tdr-antivirus-test-files-mgmt bucket. This bucket is used to store the files against which we run a periodic check of any new yara rules. It should almost never need to be updated.
To run the Terraform:
- Navigate to the `terraform` directory: `[location of project]: cd terraform`
- Clone the `tdr-terraform-modules` repository into the `terraform` directory: `[location of terraform directory]: git clone https://github.com/nationalarchives/tdr-terraform-modules.git`
- Initiate Terraform: `[location of terraform directory]: terraform init`
- Select the `default` Terraform workspace: `[location of terraform directory]: terraform workspace select default`
- Run the Terraform `plan`/`apply` commands with AWS credentials that have access to the TDR management environment as needed: `[location of terraform directory]: terraform {plan/apply}`