diff --git a/.gitignore b/.gitignore index 6efa9e73b..8318b688e 100644 --- a/.gitignore +++ b/.gitignore @@ -27,3 +27,5 @@ docs/api/*.txt dulwich.dist-info .stestr target/ +# Files created by OSS-Fuzz when running locally +fuzz_*.pkg.spec diff --git a/NEWS b/NEWS index 87fd9843c..a42cbdc36 100644 --- a/NEWS +++ b/NEWS @@ -4,6 +4,8 @@ * Ship ``tests/`` and ``testdata/`` in sdist. (Jelmer Vernooij, #1292) + * Add initial integration with OSS-Fuzz for continuous fuzz testing and first fuzzing test (David Lakin, #1302) + 0.22.1 2024-04-23 * Handle alternate case for worktreeconfig setting (Will Shanks, #1285) diff --git a/fuzzing/README.md b/fuzzing/README.md new file mode 100644 index 000000000..a1a0160e1 --- /dev/null +++ b/fuzzing/README.md @@ -0,0 +1,190 @@ +# Fuzzing Dulwich + +[![Fuzzing Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/dulwich.svg)][oss-fuzz-issue-tracker] + +This directory contains files related to Dulwich's suite of fuzz tests that are executed daily on automated +infrastructure provided by [OSS-Fuzz][oss-fuzz-repo]. This document aims to provide necessary information for working +with fuzzing in Dulwich. + +The latest details regarding OSS-Fuzz test status, including build logs and coverage reports, is available +on [the Open Source Fuzzing Introspection website](https://introspector.oss-fuzz.com/project-profile?project=dulwich). + +## How to Contribute + +There are many ways to contribute to Dulwich's fuzzing efforts! Contributions are welcomed through issues, +discussions, or pull requests on this repository. + +Areas that are particularly appreciated include: + +- **Tackling the existing backlog of open issues**. While fuzzing is an effective way to identify bugs, that information + isn't useful unless they are fixed. If you are not sure where to start, the issues tab is a great place to get ideas! +- **Improvements to this (or other) documentation** make it easier for new contributors to get involved, so even small + improvements can have a large impact over time. If you see something that could be made easier by a documentation + update of any size, please consider suggesting it! + +For everything else, such as expanding test coverage, optimizing test performance, or enhancing error detection +capabilities, jump into the "Getting Started" section below. + +## Getting Started with Fuzzing Dulwich + +> [!TIP] +> **New to fuzzing or unfamiliar with OSS-Fuzz?** +> +> These resources are an excellent place to start: +> +> - [OSS-Fuzz documentation][oss-fuzz-docs] - Continuous fuzzing service for open source software. +> - [Google/fuzzing][google-fuzzing-repo] - Tutorials, examples, discussions, research proposals, and other resources + related to fuzzing. +> - [CNCF Fuzzing Handbook](https://github.com/cncf/tag-security/blob/main/security-fuzzing-handbook/handbook-fuzzing.pdf) - + A comprehensive guide for fuzzing open source software. +> - [Efficient Fuzzing Guide by The Chromium Project](https://chromium.googlesource.com/chromium/src/+/main/testing/libfuzzer/efficient_fuzzing.md) - + Explores strategies to enhance the effectiveness of your fuzz tests, recommended for those looking to optimize their + testing efforts. + +### Setting Up Your Local Environment + +Before contributing to fuzzing efforts, ensure Python and Docker are installed on your machine. Docker is required for +running fuzzers in containers provided by OSS-Fuzz. [Install Docker](https://docs.docker.com/get-docker/) following the official guide if you do not already have it. + +### Understanding Existing Fuzz Targets + +Review the `fuzz-targets/` directory to familiarize yourself with how existing tests are implemented. See +the [Files & Directories Overview](#files--directories-overview) for more details on the directory structure. + +### Contributing to Fuzz Tests + +Start by reviewing the [Atheris documentation][atheris-repo] and the section +on [Running Fuzzers Locally](#running-fuzzers-locally) to begin writing or improving fuzz tests. + +## Files & Directories Overview + +The `fuzzing/` directory is organized into three key areas: + +### Fuzz Targets (`fuzz-targets/`) + +Contains Python files for each fuzz test. + +**Things to Know**: + +- Each fuzz test targets a specific part of Dulwich's functionality. +- Test files adhere to the naming convention: `fuzz_.py`, where `` indicates the + functionality targeted by the test. +- Any functionality that involves performing operations on input data is a possible candidate for fuzz testing, but + features that involve processing untrusted user input or parsing operations are typically going to be the most + interesting. +- The goal of these tests is to identify previously unknown or unexpected error cases caused by a given input. For that + reason, fuzz tests should gracefully handle anticipated exception cases with a `try`/`except` block to avoid false + positives that halt the fuzzing engine. + +### Dictionaries (`dictionaries/`) + +Provides hints to the fuzzing engine about inputs that might trigger unique code paths. Each fuzz target may have a +corresponding `.dict` file. For information about dictionary syntax, refer to +the [LibFuzzer documentation on the subject](https://llvm.org/docs/LibFuzzer.html#dictionaries). + +**Things to Know**: + +- OSS-Fuzz loads dictionary files per fuzz target if one exists with the same name, all others are ignored. +- Most entries in the dictionary files found here are escaped byte values that were recommended by the fuzzing + engine after previous runs. +- A default set of dictionary entries are created for all fuzz targets as part of the build process, regardless of an + existing file here. +- Development or updates to dictionaries should reflect the varied formats and edge cases relevant to the + functionalities under test. +- Example dictionaries (some of which are used to build the default dictionaries mentioned above) can be found here: + - [AFL++ dictionary repository](https://github.com/AFLplusplus/AFLplusplus/tree/stable/dictionaries#readme) + - [Google/fuzzing dictionary repository](https://github.com/google/fuzzing/tree/master/dictionaries) + +### OSS-Fuzz Scripts (`oss-fuzz-scripts/`) + +Includes scripts for building and integrating fuzz targets with OSS-Fuzz: + +- **`container-environment-bootstrap.sh`** - Sets up the execution environment. It is responsible for fetching default + dictionary entries and ensuring all required build dependencies are installed and up-to-date. +- **`build.sh`** - Executed within the Docker container, this script builds fuzz targets with necessary instrumentation + and prepares seed corpora and dictionaries for use. + +**Where to learn more:** + +- [OSS-Fuzz documentation on the build.sh](https://google.github.io/oss-fuzz/getting-started/new-project-guide/#buildsh) +- [See Dulwich's build.sh and Dockerfile in the OSS-Fuzz repository](https://github.com/google/oss-fuzz/tree/master/projects/dulwich) + +## Running Fuzzers Locally + +This approach uses Docker images provided by OSS-Fuzz for building and running fuzz tests locally. It offers +comprehensive features but requires a local clone of the OSS-Fuzz repository and sufficient disk space for Docker +containers. + +### Build the Execution Environment + +Clone the OSS-Fuzz repository and prepare the Docker environment: + +```shell +git clone --depth 1 https://github.com/google/oss-fuzz.git oss-fuzz +cd oss-fuzz +python infra/helper.py build_image dulwich +python infra/helper.py build_fuzzers --sanitizer address dulwich +``` + +> [!TIP] +> The `build_fuzzers` command above accepts a local file path pointing to your Dulwich repository clone as the last +> argument. +> This makes it easy to build fuzz targets you are developing locally in this repository without changing anything in +> the OSS-Fuzz repo! +> For example, if you have cloned this repository (or a fork of it) into: `~/code/dulwich` +> Then running this command would build new or modified fuzz targets using the `~/code/dulwich/fuzzing/fuzz-targets` +> directory: +> ```shell +> python infra/helper.py build_fuzzers --sanitizer address dulwich ~/code/dulwich +> ``` + +Verify the build of your fuzzers with the optional `check_build` command: + +```shell +python infra/helper.py check_build dulwich +``` + +### Run a Fuzz Target + +Setting an environment variable for the fuzz target argument of the execution command makes it easier to quickly select +a different target between runs: + +```shell +# specify the fuzz target without the .py extension: +export FUZZ_TARGET=fuzz_configfile +``` + +Execute the desired fuzz target: + +```shell +python infra/helper.py run_fuzzer dulwich $FUZZ_TARGET -- -max_total_time=60 -print_final_stats=1 +``` + +> [!TIP] +> In the example above, the "`-- -max_total_time=60 -print_final_stats=1`" portion of the command is optional but quite +> useful. +> +> Every argument provided after "`--`" in the above command is passed to the fuzzing engine directly. In this case: +> - `-max_total_time=60` tells the LibFuzzer to stop execution after 60 seconds have elapsed. +> - `-print_final_stats=1` tells the LibFuzzer to print a summary of useful metrics about the target run upon + completion. +> +> But almost any [LibFuzzer option listed in the documentation](https://llvm.org/docs/LibFuzzer.html#options) should +> work as well. + +#### Next Steps + +For detailed instructions on advanced features like reproducing OSS-Fuzz issues or using the Fuzz Introspector, refer +to [the official OSS-Fuzz documentation][oss-fuzz-docs]. + + + +[oss-fuzz-repo]: https://github.com/google/oss-fuzz + +[oss-fuzz-docs]: https://google.github.io/oss-fuzz + +[oss-fuzz-issue-tracker]: https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:dulwich + +[google-fuzzing-repo]: https://github.com/google/fuzzing + +[atheris-repo]: https://github.com/google/atheris diff --git a/fuzzing/dictionaries/fuzz_configfile.dict b/fuzzing/dictionaries/fuzz_configfile.dict new file mode 100644 index 000000000..384e3aba6 --- /dev/null +++ b/fuzzing/dictionaries/fuzz_configfile.dict @@ -0,0 +1,31 @@ +"\\357\\273\\277" +"\\\\\\015\\012" +"\\001\\000" +"\\000\\000\\000\\000" +"\\001\\000\\000\\000" +"\\377h" +"-\\000\\000\\000\\000\\000\\000\\000" +"[\\000\\000\\000\\000\\000\\000\\000" +"H]\\000" +"2\\000\\000\\000\\000\\000\\000\\000" +"\\377\\377\\377\\377\\377\\377\\377;" +"]\\377" +"\\000\\000\\000\\000\\000\\000\\000B" +"\\\\\\012" +"\\000\\000\\000\\000\\000\\000\\0001" +"rue" +"b\\271\\"" +"\\000\\000\\000\\000\\000\\000\\000]" +"\\\\\\000\\000\\000\\000\\000\\000\\000" +"\\330\\330 +"\\000\\000\\000\\000\\000\\000\\000\\000" +"\\377\\377\\377\\377" +"%\\000\\000\\000\\000\\000\\000\\000" +"\\000\\000\\000\\000\\000\\000\\000\\\\" +"\\377\\377\\377\\377\\377\\377\\377$" +"[\\000\\000\\000\\000\\000\\000\\000" +"p\\012" +"\\001\\000\\000\\000\\000\\000\\000\\"" +"\\337\\000\\000\\000\\000\\000\\000\\000" +"\\001\\000\\000\\000\\000\\000\\000\\000" +"\\\\0=" diff --git a/fuzzing/fuzz-targets/fuzz_configfile.py b/fuzzing/fuzz-targets/fuzz_configfile.py new file mode 100644 index 000000000..25cd76b20 --- /dev/null +++ b/fuzzing/fuzz-targets/fuzz_configfile.py @@ -0,0 +1,41 @@ +import atheris +import sys +from io import BytesIO + +with atheris.instrument_imports(): + from dulwich.config import ConfigFile + + +def is_expected_error(error_list, error_msg): + for error in error_list: + if error in error_msg: + return True + return False + + +def TestOneInput(data): + try: + ConfigFile.from_file(BytesIO(data)) + except ValueError as e: + expected_errors = [ + "without section", + "invalid variable name", + "expected trailing ]", + "invalid section name", + "Invalid subsection", + "escape character", + "missing end quote", + ] + if is_expected_error(expected_errors, str(e)): + return -1 + else: + raise e + + +def main(): + atheris.Setup(sys.argv, TestOneInput) + atheris.Fuzz() + + +if __name__ == "__main__": + main() diff --git a/fuzzing/oss-fuzz-scripts/build.sh b/fuzzing/oss-fuzz-scripts/build.sh new file mode 100644 index 000000000..5d0bb4dd1 --- /dev/null +++ b/fuzzing/oss-fuzz-scripts/build.sh @@ -0,0 +1,37 @@ +# shellcheck shell=bash + +set -euo pipefail + +python3 -m pip install . + +# Directory to look in for dictionaries, options files, and seed corpora: +SEED_DATA_DIR="$SRC/seed_data" + +find "$SEED_DATA_DIR" \( -name '*_seed_corpus.zip' -o -name '*.options' -o -name '*.dict' \) \ + ! \( -name '__base.*' \) -exec printf 'Copying: %s\n' {} \; \ + -exec chmod a-x {} \; \ + -exec cp {} "$OUT" \; + +# Build fuzzers in $OUT. +find "$SRC/dulwich/fuzzing" -name 'fuzz_*.py' -print0 | while IFS= read -r -d '' fuzz_harness; do + compile_python_fuzzer "$fuzz_harness" + + common_base_dictionary_filename="$SEED_DATA_DIR/__base.dict" + if [[ -r "$common_base_dictionary_filename" ]]; then + # Strip the `.py` extension from the filename and replace it with `.dict`. + fuzz_harness_dictionary_filename="$(basename "$fuzz_harness" .py).dict" + output_file="$OUT/$fuzz_harness_dictionary_filename" + + printf 'Appending %s to %s\n' "$common_base_dictionary_filename" "$output_file" + if [[ -s "$output_file" ]]; then + # If a dictionary file for this fuzzer already exists and is not empty, + # we append a new line to the end of it before appending any new entries. + # + # LibFuzzer will happily ignore multiple empty lines in a dictionary but fail with an error + # if any single line has incorrect syntax (e.g., if we accidentally add two entries to the same line.) + # See docs for valid syntax: https://llvm.org/docs/LibFuzzer.html#id32 + echo >>"$output_file" + fi + cat "$common_base_dictionary_filename" >>"$output_file" + fi +done diff --git a/fuzzing/oss-fuzz-scripts/container-environment-bootstrap.sh b/fuzzing/oss-fuzz-scripts/container-environment-bootstrap.sh new file mode 100755 index 000000000..b3b0e9f84 --- /dev/null +++ b/fuzzing/oss-fuzz-scripts/container-environment-bootstrap.sh @@ -0,0 +1,55 @@ +#!/usr/bin/env bash + +set -euo pipefail + +################# +# Prerequisites # +################# + +for cmd in python3 git wget rsync; do + command -v "$cmd" >/dev/null 2>&1 || { + printf '[%s] Required command %s not found, exiting.\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cmd" >&2 + exit 1 + } +done + +SEED_DATA_DIR="$SRC/seed_data" +mkdir -p "$SEED_DATA_DIR" + +############# +# Functions # +############# + +download_and_concatenate_common_dictionaries() { + # Assign the first argument as the target file where all contents will be concatenated + target_file="$1" + + # Shift the arguments so the first argument (target_file path) is removed + # and only URLs are left for the loop below. + shift + + for url in "$@"; do + wget -qO- "$url" >>"$target_file" + # Ensure there's a newline between each file's content + echo >>"$target_file" + done +} + +fetch_seed_data() { + rsync -avc "$SRC/dulwich/fuzzing/dictionaries/" "$SEED_DATA_DIR/" +} + +######################## +# Main execution logic # +######################## + +fetch_seed_data + +download_and_concatenate_common_dictionaries "$SEED_DATA_DIR/__base.dict" \ + "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/utf8.dict" \ + "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/url.dict" + +# The OSS-Fuzz base image has outdated dependencies by default so we upgrade them below. +python3 -m pip install --upgrade pip +# Upgrade to the latest versions known to work at the time the below changes were introduced: +python3 -m pip install 'setuptools~=69.0' 'pyinstaller~=6.0'