This project extends the ClamAV software capability to be able to extract and scan the contents of archives greater than 2GB. ClamAV is unable to scan files larger than 2GB.

Cisco-Talos/clamav-large-archive-scanner

ClamAV Large Archive Scanner

The ClamAV Large Archive Scanner utility is a wrapper around the ClamAV clamd and clamdscan programs that provides a way to scan archives which exceed ClamAV's maximum file size limit. At the time of writing (2024/03/09), ClamAV cannot scan any file or archive larger than 2 GiB.

Important: This utility is a workaround to supplement ClamAV until such time as archives larger than 2 GiB can be scanned. This utility is not intended to replace clamscan or clamdscan. We have no intention of providing feature parity between this utility and clamscan or clamdscan.

This utility does not work around ClamAV's file size limitations for non-archive files. It will not enable you to scan large documents, graphics, videos, etc. In case you were wondering whether large files could be chunked into smaller files and then scanned: no. That is not an effective way to scan large files.
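One reason chunking falls short can be shown with a toy example. This is a simplified illustration only, not a description of ClamAV's matching engine: the byte pattern and helper names are hypothetical, and real detection also relies on file-format structure that chunking destroys.

```python
# Hypothetical byte pattern standing in for a malware signature.
SIGNATURE = b"MALICIOUS-PAYLOAD"


def split_into_chunks(data: bytes, chunk_size: int) -> list:
    """Cut data into consecutive pieces of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def any_chunk_matches(chunks, signature: bytes) -> bool:
    """Scan each chunk on its own, as a naive chunked scanner would."""
    return any(signature in chunk for chunk in chunks)


# The pattern starts at offset 10 and crosses the 16-byte chunk boundary.
data = b"A" * 10 + SIGNATURE + b"B" * 10
chunks = split_into_chunks(data, chunk_size=16)

whole_file_match = SIGNATURE in data                 # pattern found in the intact file
chunked_match = any_chunk_matches(chunks, SIGNATURE)  # pattern missed once it is split
print(whole_file_match, chunked_match)
```

The pattern is present in the intact data but in none of the individual chunks, because it straddles a chunk boundary.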

The utility has three sub-commands: scan, unpack, and cleanup.

The scan command combines the other two commands to unpack, scan, and clean up.

The unpack command provides the ability to extract archives or mount disk images of the supported archive types without scanning them.

The cleanup command is complementary to the unpack command, enabling you to easily un-mount or delete the extracted archive contents.

Supported Archive Types

The ClamAV Large Archive Scanner supports extraction or mounting of the following types of archives:

  • TAR
  • ZIP
  • ISO
  • VMDK
  • TARGZ
  • QCOW2

Installation

We provide two options for installation. You may run the utility in your local environment or you may run the utility in a Docker container. The Docker container is easier.

Running in Your Local Environment

To use the ClamAV Large Archive Scanner in your local environment, you will need to install an assortment of supporting tools and libraries:

  • Install Python 3.9 or newer.

  • Install the required Python packages. We suggest using a venv virtual environment. From the project root directory, run the following:

    python3 -m venv .venv
    source .venv/bin/activate
    pip3 install .

    If you open a new terminal, you will need to reactivate your Python virtual environment using: source .venv/bin/activate

  • Install ClamAV. Both clamd and clamdscan are required. On some Linux distributions, these are packaged separately. You can verify that they are present by running which clamd and which clamdscan.

  • Install libmagic which is required to determine file types.

  • Install libguestfs which is needed to unpack VMDK/QCOW2 disk images.
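The check for clamd and clamdscan mentioned in the list above can also be scripted. The following is a minimal sketch using Python's standard shutil.which; the function name missing_tools is hypothetical and not part of this project.

```python
import shutil

# The two ClamAV programs this utility depends on.
REQUIRED_TOOLS = ("clamd", "clamdscan")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


missing = missing_tools()
if missing:
    print("Not found on PATH: " + ", ".join(missing))
else:
    print("clamd and clamdscan are both available.")
```

This only confirms that the programs can be found; it does not check that clamd is running or configured.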

You will need to start the clamd service before you can use the ClamAV Large Archive Scanner. This may require some initial configuration, including using freshclam to download the latest malware detection signatures. See the ClamAV documentation for more information on how to set up ClamAV.

Regarding clamd.conf config options, you must set the LocalSocket option (or TCPSocket option), at a minimum. On some systems, this is preconfigured. For the ClamAV Large Archive Scanner project, the goal is to scan extremely large archives, so you'll also need to add the following settings to max out ClamAV's file size capabilities:

MaxFileSize 0
MaxScanSize 0
MaxScanTime 3600000
MaxFiles 100000
MaxRecursion 20

Finally, you may also wish to raise an alert if the limits have been exceeded. You can do so by adding this option:

AlertExceedsMax yes

Note: Regarding the selected config options...

  • MaxFileSize 0 - This maxes out the file size limit for ClamAV.

  • MaxScanSize 0 - This maxes out the scan size limit. The scan size is the total amount of bytes scanned per file when extracting files from archives, decompressing embedded files, normalizing scripts, or even re-scanning the same data as a different type. The total number of bytes scanned is often much larger than the original file size, even for plain text files.

  • MaxScanTime 3600000 - This increases the scan time limit per file scanned to 1 hour (60 x 60 x 1000 milliseconds). Scanning large archives will take a long time. You may wish to increase or disable this limit.

  • MaxFiles 100000 - This increases the limit for the number of embedded files scanned. You may wish to increase or disable this limit.

  • MaxRecursion 20 - This increases the maximum recursion depth from 17 to 20. Scan recursion is the process of unpacking and scanning embedded files. Files unpacked by the Large Archive Scanner before passing to ClamAV for scanning do not count towards the maximum recursion depth. The maximum recursion depth cannot be disabled.

  • AlertExceedsMax yes - This option will cause scans to alert when a scan limit was exceeded, with the signature names like:

    • Heuristics.Limits.Exceeded.MaxFileSize
    • Heuristics.Limits.Exceeded.MaxScanSize
    • Heuristics.Limits.Exceeded.MaxFiles
    • Heuristics.Limits.Exceeded.MaxRecursion
    • Heuristics.Limits.Exceeded.MaxScanTime

See the clamd.conf.sample config for more details.
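To confirm that a clamd.conf carries the limits listed above, you could parse it and compare. The sketch below is a rough illustration, not part of this project: the function names are hypothetical, the parser only handles simple "Option value" lines and "#" comments, and values are compared as plain strings, so an equivalent value written differently would be reported as a difference.

```python
# Limits suggested in this README for scanning very large archives.
RECOMMENDED = {
    "MaxFileSize": "0",
    "MaxScanSize": "0",
    "MaxScanTime": "3600000",
    "MaxFiles": "100000",
    "MaxRecursion": "20",
}

# 1 hour expressed in milliseconds, matching the MaxScanTime value above.
ONE_HOUR_MS = 60 * 60 * 1000


def parse_clamd_conf(text):
    """Parse 'Option value' lines; if an option repeats, the last one wins."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def differing_settings(text, recommended=RECOMMENDED):
    """Map each option that differs to a (found, recommended) pair."""
    actual = parse_clamd_conf(text)
    return {
        key: (actual.get(key), wanted)
        for key, wanted in recommended.items()
        if actual.get(key) != wanted
    }


# Made-up config fragment used only to exercise the functions.
sample = """
# Example fragment
LocalSocket /tmp/clamd.sock
MaxFileSize 0
MaxScanSize 0
MaxScanTime 120000
MaxFiles 100000
"""
print(differing_settings(sample))
```

For the sample fragment, the result flags MaxScanTime (set to a different value) and MaxRecursion (not set at all).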

After you've installed everything and have started clamd, you may run a scan with the ClamAV Large Archive Scanner. For example:

archive scan /path/to/archive

Tip: If you have multiple ClamAV installations outside the normal $PATH, you may need to add the bin directory for your preferred version to the $PATH before attempting a scan.

For example:

PATH=~/clams/1.3.0/bin:$PATH

archive scan /path/to/archive
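If you launch scans from a Python script rather than a shell, the same PATH adjustment can be made for a single child process. This is a sketch under that assumption; the helper name env_with_preferred_clamav is hypothetical.

```python
import os


def env_with_preferred_clamav(bin_dir, base_env=None):
    """Return a copy of the environment with bin_dir placed first on PATH."""
    env = dict(os.environ if base_env is None else base_env)
    bin_dir = os.path.expanduser(bin_dir)  # expand a leading "~" as the shell would
    existing = env.get("PATH", "")
    env["PATH"] = bin_dir + os.pathsep + existing if existing else bin_dir
    return env


# Example with an explicit base environment, so the result is predictable.
example = env_with_preferred_clamav("/opt/clams/bin", {"PATH": "/usr/bin"})
print(example["PATH"])

# To run a scan with the adjusted PATH (not executed here):
#   import subprocess
#   subprocess.run(["archive", "scan", "/path/to/archive"],
#                  env=env_with_preferred_clamav("~/clams/1.3.0/bin"))
```

Only the child process sees the modified PATH; the environment of the calling script is left unchanged.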

To learn more, run archive --help, or skip to Usage.

Running in a Docker Container

The simplest way to use the ClamAV Large Archive Scanner is using a Docker container.

The provided Dockerfile may be used to build a container with the environment and tools necessary to run this utility. This Dockerfile also increases the ClamAV scan limits described in the previous section.

This Docker container is based on the ClamAV project's clamav-debian image. You can find additional instructions for how to customize and use this container here.

Note: Privileged mode is needed to mount ISO archives when they are unpacked.

To build the image, run:

docker build . -t clamav-large-archive-scanner --load

To start the container, run:

docker run \
    --interactive \
    --tty \
    --rm \
    --name "clam_container_01" \
    clamav-large-archive-scanner

Tip: You may wish to mount a directory containing the archives you wish to scan as a volume in the container. You can do so when starting the container, as follows. Replace /some/path with the actual directory you wish to mount. The directory contents will be found within the running container under /target:

docker run \
    --interactive \
    --tty \
    --rm \
    --mount type=bind,source=/some/path,target=/target \
    --name "clam_container_01" \
    clamav-large-archive-scanner
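If you start the container from a script, the docker run arguments above can be assembled programmatically. The sketch below only builds the argument list and does not start anything. The function name is hypothetical, and resolving the host directory to an absolute path is a precaution added here, not something this README requires.

```python
import os


def docker_run_command(host_dir,
                       container_name="clam_container_01",
                       image="clamav-large-archive-scanner",
                       target="/target"):
    """Build the docker run argument list with host_dir bind-mounted at target."""
    source = os.path.abspath(os.path.expanduser(host_dir))
    return [
        "docker", "run",
        "--interactive",
        "--tty",
        "--rm",
        "--mount", "type=bind,source={},target={}".format(source, target),
        "--name", container_name,
        image,
    ]


command = docker_run_command("/some/path")
print(" ".join(command))
```

The resulting list can be passed to subprocess.run from a terminal session. The --interactive and --tty flags expect a terminal to be attached.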

After the container is up and running, and after clamd has finished loading, you may execute commands in the running container, or open a shell in the running container to execute commands.

Tip: Within the container, you can use clamdscan --ping 100 to wait up to 100 seconds for clamd to finish loading. If clamd takes longer than 100 seconds to load, or fails to load, then the command will exit with a non-zero exit code. For example:

docker exec --interactive --tty "clam_container_01" clamdscan --ping 100
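In an automated setup, the exit code of the ping command can gate the scan. The following is a minimal sketch with a hypothetical helper name. It omits --interactive and --tty from docker exec because a non-interactive script typically has no terminal attached, which is a deviation from the command shown above.

```python
import subprocess

# Ping command for the container name used in this README, without TTY flags.
PING_IN_CONTAINER = [
    "docker", "exec", "clam_container_01",
    "clamdscan", "--ping", "100",
]


def command_succeeded(command):
    """Run command and report whether it exited with status 0."""
    result = subprocess.run(command, check=False)
    return result.returncode == 0


# Usage (not executed here, since it needs a running container):
#   if command_succeeded(PING_IN_CONTAINER):
#       print("clamd is ready")
#   else:
#       print("clamd did not finish loading in time")
```

Because clamdscan --ping exits with a non-zero code when clamd does not respond in time, a False result here means the scan should not be started yet.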

If you used the --mount option to mount a directory at /target containing some_archive.tgz, you could scan it like this:

docker exec --interactive --tty "clam_container_01" archive scan /target/some_archive.tgz

Or, to enter a shell in the container, run:

docker exec --interactive --tty "clam_container_01" /bin/bash

To shut down the container, run:

docker kill clam_container_01

Usage

Usage: archive [OPTIONS] COMMAND [ARGS]...

Options:
  -t, --trace        Enable trace logging. By default, log all actions to
                     /tmp/clam_unpacker.log
  --trace-file PATH  Override the default trace log file
  -v, --verbose      Enable verbose logging
  -q, --quiet        Disable all logging
  --help             Show this message and exit.

Commands:
  cleanup
  scan
  unpack

Commands

  • scan

    This command is used to scan regular files and directories.

    The scan command combines the other two commands to unpack, scan, and clean up.

    Use the following options to customize scan behavior:

    Usage: archive scan [OPTIONS] PATH
    
    Options:
      --min-size TEXT   Minimum file size to unpack (default: 2.0 GiB).
      --ignore-size     Ignore file size lower limit (equivalent to --min-size=0).
      --tmp-dir PATH    Temporary working directory (default: /tmp).
      -ff, --fail-fast  Stop scanning after the first failure.
      --allmatch        Continue scanning if a signature match occurs.
      --help            Show this message and exit.
    
  • unpack

    This command unpacks or mounts supported large archives to a given directory. By default, a "large" archive is one larger than 2 GiB. This action is recursive.

    Archives smaller than 2 GiB will be skipped. You may use --ignore-size or --min-size=0 to unpack all supported archives, regardless of size.

    Usage: archive unpack [OPTIONS] PATH
    
    Options:
      -r, --recursive  Recursively unpack files.
      --min-size TEXT  Minimum file size to unpack (default: 2.0 GiB).
      --ignore-size    Ignore file size lower limit (equivalent to --min-size=0).
      --tmp-dir PATH   Directory to unpack files to (default: /tmp).
      --help           Show this message and exit.
    
  • cleanup

    This command cleans up the temporary directories and files that were created while unpacking or scanning the given input file or directory.

    Usage: archive cleanup [OPTIONS] PATH
    
    Options:
      --file          Recursively cleanup directories associated with the file.
      --tmp-dir PATH  Directory to search for unpacked files(default: /tmp).
      --help          Show this message and exit.
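Before running unpack or scan on a directory, it can help to see which files meet the default 2 GiB size threshold. The sketch below is a standalone illustration and does not use this project's code. The function name is hypothetical, and treating a file of exactly the threshold size as "large" is an assumption, since the text above does not state how the boundary is handled. It also does not check whether a file is a supported archive type.

```python
import os

GIB = 1024 ** 3
DEFAULT_MIN_SIZE = 2 * GIB  # mirrors the documented --min-size default of 2.0 GiB


def large_files(root, min_size=DEFAULT_MIN_SIZE):
    """Return sorted (path, size) pairs for files under root of at least min_size bytes."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # skip files that vanished or cannot be read
            if size >= min_size:  # assumption: the boundary is inclusive
                found.append((path, size))
    return sorted(found)


# Usage (replace the path with a real directory):
#   for path, size in large_files("/path/to/archives"):
#       print("{:.1f} GiB  {}".format(size / GIB, path))
```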
    

Examples

Using the scan command to scan an archive:

archive -t -v scan /path/to/archive

Using the unpack command to unpack an archive:

archive -t -v unpack /path/to/archive
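The same commands can be driven from Python with subprocess. This is a sketch only: the helper names are hypothetical, the flags are taken from the usage text above, and the meaning of the archive exit code is not documented here, so the wrapper simply returns it. The accepted format for --min-size is likewise not specified above, so the value is passed through unchanged.

```python
import subprocess


def build_scan_command(path, min_size=None, ignore_size=False, tmp_dir=None,
                       fail_fast=False, allmatch=False,
                       trace=False, verbose=False):
    """Assemble an 'archive scan' argument list from the documented options."""
    command = ["archive"]
    if trace:
        command.append("--trace")      # global option, goes before the sub-command
    if verbose:
        command.append("--verbose")    # global option, goes before the sub-command
    command.append("scan")
    if ignore_size:
        command.append("--ignore-size")
    elif min_size is not None:
        command += ["--min-size", str(min_size)]
    if tmp_dir is not None:
        command += ["--tmp-dir", tmp_dir]
    if fail_fast:
        command.append("--fail-fast")
    if allmatch:
        command.append("--allmatch")
    command.append(path)
    return command


def run_scan(path, **options):
    """Run the scan and return the process exit code without interpreting it."""
    return subprocess.run(build_scan_command(path, **options), check=False).returncode


print(build_scan_command("/path/to/archive", trace=True, verbose=True))
```

Passing the arguments as a list avoids shell quoting problems with archive paths that contain spaces.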

Contributing

There are many ways to contribute.

Unit Tests

This repo includes some tests to verify correct functionality. You can run the tests from your local environment or within the running Docker container.

  1. First, install the ClamAV Large Archive Scanner utility using one of the two installation methods described above.

  2. Then install the test prerequisites:

     source .venv/bin/activate
     pip3 install -r ./src/clamav_large_archive_scanner/test/requirements.txt

  3. Now run the unit tests:

     pytest -v

License

This project is licensed under the BSD 3-Clause license.
