Initial OSS-Fuzz Integration and First Fuzzing Test
Introduces an initial fuzzing test and supporting files for
integrating Dulwich into OSS-Fuzz as discussed in:
#1302

The corresponding PR on the OSS-Fuzz repo is:
google/oss-fuzz#11900
DaveLak committed May 3, 2024
1 parent 5f0497d commit 35792cd
Showing 7 changed files with 358 additions and 0 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -27,3 +27,5 @@ docs/api/*.txt
dulwich.dist-info
.stestr
target/
# Files created by OSS-Fuzz when running locally
fuzz_*.pkg.spec
2 changes: 2 additions & 0 deletions NEWS
@@ -4,6 +4,8 @@

* Ship ``tests/`` and ``testdata/`` in sdist. (Jelmer Vernooij, #1292)

* Add initial integration with OSS-Fuzz for continuous fuzz testing and first fuzzing test (David Lakin, #1302)

0.22.1 2024-04-23

* Handle alternate case for worktreeconfig setting (Will Shanks, #1285)
190 changes: 190 additions & 0 deletions fuzzing/README.md
@@ -0,0 +1,190 @@
# Fuzzing Dulwich

[![Fuzzing Status](https://oss-fuzz-build-logs.storage.googleapis.com/badges/dulwich.svg)][oss-fuzz-issue-tracker]

This directory contains files related to Dulwich's suite of fuzz tests that are executed daily on automated
infrastructure provided by [OSS-Fuzz][oss-fuzz-repo]. This document aims to provide necessary information for working
with fuzzing in Dulwich.

The latest details regarding OSS-Fuzz test status, including build logs and coverage reports, are available
on [the Open Source Fuzzing Introspection website](https://introspector.oss-fuzz.com/project-profile?project=dulwich).

## How to Contribute

There are many ways to contribute to Dulwich's fuzzing efforts! Contributions are welcomed through issues,
discussions, or pull requests on this repository.

Areas that are particularly appreciated include:

- **Tackling the existing backlog of open issues**. While fuzzing is an effective way to identify bugs, finding them
isn't useful unless they are fixed. If you are not sure where to start, the issues tab is a great place to get ideas!
- **Improvements to this (or other) documentation** make it easier for new contributors to get involved, so even small
improvements can have a large impact over time. If you see something that could be made easier by a documentation
update of any size, please consider suggesting it!

For everything else, such as expanding test coverage, optimizing test performance, or enhancing error detection
capabilities, jump into the "Getting Started" section below.

## Getting Started with Fuzzing Dulwich

> [!TIP]
> **New to fuzzing or unfamiliar with OSS-Fuzz?**
>
> These resources are an excellent place to start:
>
> - [OSS-Fuzz documentation][oss-fuzz-docs] - Continuous fuzzing service for open source software.
> - [Google/fuzzing][google-fuzzing-repo] - Tutorials, examples, discussions, research proposals, and other resources
related to fuzzing.
> - [CNCF Fuzzing Handbook](https://github.com/cncf/tag-security/blob/main/security-fuzzing-handbook/handbook-fuzzing.pdf) -
A comprehensive guide for fuzzing open source software.
> - [Efficient Fuzzing Guide by The Chromium Project](https://chromium.googlesource.com/chromium/src/+/main/testing/libfuzzer/efficient_fuzzing.md) -
Explores strategies to enhance the effectiveness of your fuzz tests, recommended for those looking to optimize their
testing efforts.

### Setting Up Your Local Environment

Before contributing to fuzzing efforts, ensure Python and Docker are installed on your machine. Docker is required for
running fuzzers in containers provided by OSS-Fuzz. [Install Docker](https://docs.docker.com/get-docker/) following the official guide if you do not already have it.

### Understanding Existing Fuzz Targets

Review the `fuzz-targets/` directory to familiarize yourself with how existing tests are implemented. See
the [Files & Directories Overview](#files--directories-overview) for more details on the directory structure.

### Contributing to Fuzz Tests

Start by reviewing the [Atheris documentation][atheris-repo] and the section
on [Running Fuzzers Locally](#running-fuzzers-locally) to begin writing or improving fuzz tests.

## Files & Directories Overview

The `fuzzing/` directory is organized into three key areas:

### Fuzz Targets (`fuzz-targets/`)

Contains Python files for each fuzz test.

**Things to Know**:

- Each fuzz test targets a specific part of Dulwich's functionality.
- Test files adhere to the naming convention: `fuzz_<API Under Test>.py`, where `<API Under Test>` indicates the
functionality targeted by the test.
- Any functionality that involves performing operations on input data is a possible candidate for fuzz testing, but
features that involve processing untrusted user input or parsing operations are typically going to be the most
interesting.
- The goal of these tests is to identify previously unknown or unexpected error cases caused by a given input. For that
reason, fuzz tests should gracefully handle anticipated exception cases with a `try`/`except` block to avoid false
positives that halt the fuzzing engine.
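
As a minimal sketch of the `try`/`except` pattern described in the last bullet, the example below filters anticipated
errors by message and re-raises anything else. `parse_setting`, `EXPECTED_ERRORS`, and `check_one_input` are
hypothetical names used only for illustration; a real harness would call a Dulwich API and be registered with Atheris.

```python
def parse_setting(data: bytes):
    """Hypothetical stand-in for the API under test; not part of Dulwich."""
    text = data.decode("utf-8")  # can raise UnicodeDecodeError, a ValueError subclass
    if "=" not in text:
        raise ValueError("missing '=' separator")
    name, value = text.split("=", 1)
    if not name.strip():
        raise ValueError("empty variable name")
    return name.strip(), value.strip()


# Substrings of the error messages this (hypothetical) parser is expected to raise.
EXPECTED_ERRORS = ("missing '=' separator", "empty variable name")


def check_one_input(data: bytes) -> int:
    """Harness body: swallow anticipated errors, re-raise everything else."""
    try:
        parse_setting(data)
    except ValueError as e:
        if any(message in str(e) for message in EXPECTED_ERRORS):
            return -1  # anticipated failure, not a finding
        raise  # unexpected failure: let the fuzzing engine report it
    return 0
```

In this sketch an undecodable input such as `b"\xff=1"` still raises, because its message is not in the expected
list. That is the kind of unanticipated error a fuzz test exists to surface.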

### Dictionaries (`dictionaries/`)

Provides hints to the fuzzing engine about inputs that might trigger unique code paths. Each fuzz target may have a
corresponding `.dict` file. For information about dictionary syntax, refer to
the [LibFuzzer documentation on the subject](https://llvm.org/docs/LibFuzzer.html#dictionaries).

**Things to Know**:

- OSS-Fuzz loads a dictionary file for a fuzz target only if one exists with the same name as the target; all other
dictionary files are ignored.
- Most entries in the dictionary files found here are escaped byte values that were recommended by the fuzzing
engine after previous runs.
- A default set of dictionary entries is created for all fuzz targets as part of the build process, regardless of
whether a file exists here.
- New or updated dictionaries should reflect the varied formats and edge cases relevant to the functionality under
test.
- Example dictionaries (some of which are used to build the default dictionaries mentioned above) can be found here:
- [AFL++ dictionary repository](https://github.com/AFLplusplus/AFLplusplus/tree/stable/dictionaries#readme)
- [Google/fuzzing dictionary repository](https://github.com/google/fuzzing/tree/master/dictionaries)
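
Because a single malformed line can make the engine reject a dictionary, it can help to check a `.dict` file before
committing it. The sketch below is a deliberately strict approximation of the syntax in the LibFuzzer documentation
(one quoted entry per line, an optional `name=` prefix, `#` comments, and the `\\`, `\"`, and `\xAB` escapes). It is
not LibFuzzer's own parser, and `invalid_dictionary_lines` is a hypothetical helper name.

```python
import re

# One entry per line: an optional `name=` prefix followed by a quoted, non-empty value.
# Inside the quotes, only the `\\`, `\"`, and `\xAB` escapes are accepted by this check.
ENTRY = re.compile(r'^(?:\w+\s*=\s*)?"(?:[^"\\]|\\\\|\\"|\\x[0-9A-Fa-f]{2})+"$')


def invalid_dictionary_lines(text: str) -> list:
    """Return the 1-based numbers of lines that do not look like valid entries."""
    bad = []
    for number, line in enumerate(text.splitlines(), start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments are ignored
        if not ENTRY.match(stripped):
            bad.append(number)
    return bad
```

Calling `invalid_dictionary_lines(Path("dictionaries/fuzz_configfile.dict").read_text())` would list any lines worth
a second look, for example an entry that is missing its closing quote.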

### OSS-Fuzz Scripts (`oss-fuzz-scripts/`)

Includes scripts for building and integrating fuzz targets with OSS-Fuzz:

- **`container-environment-bootstrap.sh`** - Sets up the execution environment. It is responsible for fetching default
dictionary entries and ensuring all required build dependencies are installed and up-to-date.
- **`build.sh`** - Executed within the Docker container, this script builds fuzz targets with necessary instrumentation
and prepares seed corpora and dictionaries for use.
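
The dictionary step that `build.sh` performs for each fuzz target (appending the shared base dictionary to the
target's own dictionary, with a separating newline when the target file already has content) can be sketched in
Python as follows. This is an illustrative equivalent only; the build itself uses the shell script, and
`append_base_dictionary` is a hypothetical name.

```python
from pathlib import Path


def append_base_dictionary(base: Path, target: Path) -> None:
    """Append the entries in `base` to `target`, creating `target` if needed."""
    # A non-empty target may not end with a newline, so add one first to keep
    # the last existing entry and the first appended entry on separate lines.
    needs_separator = target.exists() and target.stat().st_size > 0
    with target.open("ab") as out:
        if needs_separator:
            out.write(b"\n")
        out.write(base.read_bytes())
```

An extra empty line is harmless here, whereas two entries fused onto one line would be a syntax problem, which is
why the separator is written unconditionally for non-empty files.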

**Where to learn more:**

- [OSS-Fuzz documentation on the build.sh](https://google.github.io/oss-fuzz/getting-started/new-project-guide/#buildsh)
- [See Dulwich's build.sh and Dockerfile in the OSS-Fuzz repository](https://github.com/google/oss-fuzz/tree/master/projects/dulwich)

## Running Fuzzers Locally

This approach uses Docker images provided by OSS-Fuzz for building and running fuzz tests locally. It offers
comprehensive features but requires a local clone of the OSS-Fuzz repository and sufficient disk space for Docker
containers.

### Build the Execution Environment

Clone the OSS-Fuzz repository and prepare the Docker environment:

```shell
git clone --depth 1 https://github.com/google/oss-fuzz.git oss-fuzz
cd oss-fuzz
python infra/helper.py build_image dulwich
python infra/helper.py build_fuzzers --sanitizer address dulwich
```

> [!TIP]
> The `build_fuzzers` command above accepts a local file path pointing to your Dulwich repository clone as the last
> argument.
> This makes it easy to build fuzz targets you are developing locally in this repository without changing anything in
> the OSS-Fuzz repo!
> For example, if you have cloned this repository (or a fork of it) into: `~/code/dulwich`
> Then running this command would build new or modified fuzz targets using the `~/code/dulwich/fuzzing/fuzz-targets`
> directory:
> ```shell
> python infra/helper.py build_fuzzers --sanitizer address dulwich ~/code/dulwich
> ```

Verify the build of your fuzzers with the optional `check_build` command:

```shell
python infra/helper.py check_build dulwich
```

### Run a Fuzz Target

Setting an environment variable for the fuzz target argument of the execution command makes it easier to quickly select
a different target between runs:

```shell
# specify the fuzz target without the .py extension:
export FUZZ_TARGET=fuzz_configfile
```

Execute the desired fuzz target:

```shell
python infra/helper.py run_fuzzer dulwich $FUZZ_TARGET -- -max_total_time=60 -print_final_stats=1
```
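
When switching between targets and engine options often, the same command line can be assembled programmatically.
The sketch below is a hypothetical convenience wrapper, not part of OSS-Fuzz or Dulwich; it only rebuilds the
command shown above, placing engine options after the `--` separator.

```python
import shlex


def run_fuzzer_command(target: str, **engine_options) -> list:
    """Build the argv for `infra/helper.py run_fuzzer` (hypothetical helper)."""
    command = ["python", "infra/helper.py", "run_fuzzer", "dulwich", target]
    if engine_options:
        # Everything after "--" is passed through to the fuzzing engine.
        command.append("--")
        command += [f"-{name}={value}" for name, value in engine_options.items()]
    return command


argv = run_fuzzer_command("fuzz_configfile", max_total_time=60, print_final_stats=1)
print(shlex.join(argv))
# python infra/helper.py run_fuzzer dulwich fuzz_configfile -- -max_total_time=60 -print_final_stats=1
```

The resulting list can be passed to `subprocess.run(argv, check=True)` from the root of the OSS-Fuzz clone.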

> [!TIP]
> In the example above, the "`-- -max_total_time=60 -print_final_stats=1`" portion of the command is optional but quite
> useful.
>
> Every argument provided after "`--`" in the above command is passed to the fuzzing engine directly. In this case:
> - `-max_total_time=60` tells LibFuzzer to stop execution after 60 seconds have elapsed.
> - `-print_final_stats=1` tells LibFuzzer to print a summary of useful metrics about the target run upon
>   completion.
>
> But almost any [LibFuzzer option listed in the documentation](https://llvm.org/docs/LibFuzzer.html#options) should
> work as well.

#### Next Steps

For detailed instructions on advanced features like reproducing OSS-Fuzz issues or using the Fuzz Introspector, refer
to [the official OSS-Fuzz documentation][oss-fuzz-docs].



[oss-fuzz-repo]: https://github.com/google/oss-fuzz

[oss-fuzz-docs]: https://google.github.io/oss-fuzz

[oss-fuzz-issue-tracker]: https://bugs.chromium.org/p/oss-fuzz/issues/list?sort=-opened&can=1&q=proj:dulwich

[google-fuzzing-repo]: https://github.com/google/fuzzing

[atheris-repo]: https://github.com/google/atheris
31 changes: 31 additions & 0 deletions fuzzing/dictionaries/fuzz_configfile.dict
@@ -0,0 +1,31 @@
"\\357\\273\\277"
"\\\\\\015\\012"
"\\001\\000"
"\\000\\000\\000\\000"
"\\001\\000\\000\\000"
"\\377h"
"-\\000\\000\\000\\000\\000\\000\\000"
"[\\000\\000\\000\\000\\000\\000\\000"
"H]\\000"
"2\\000\\000\\000\\000\\000\\000\\000"
"\\377\\377\\377\\377\\377\\377\\377;"
"]\\377"
"\\000\\000\\000\\000\\000\\000\\000B"
"\\\\\\012"
"\\000\\000\\000\\000\\000\\000\\0001"
"rue"
"b\\271\\""
"\\000\\000\\000\\000\\000\\000\\000]"
"\\\\\\000\\000\\000\\000\\000\\000\\000"
"\\330\\330
"\\000\\000\\000\\000\\000\\000\\000\\000"
"\\377\\377\\377\\377"
"%\\000\\000\\000\\000\\000\\000\\000"
"\\000\\000\\000\\000\\000\\000\\000\\\\"
"\\377\\377\\377\\377\\377\\377\\377$"
"[\\000\\000\\000\\000\\000\\000\\000"
"p\\012"
"\\001\\000\\000\\000\\000\\000\\000\\""
"\\337\\000\\000\\000\\000\\000\\000\\000"
"\\001\\000\\000\\000\\000\\000\\000\\000"
"\\\\0="
41 changes: 41 additions & 0 deletions fuzzing/fuzz-targets/fuzz_configfile.py
@@ -0,0 +1,41 @@
import atheris
import sys
from io import BytesIO

with atheris.instrument_imports():
    from dulwich.config import ConfigFile


def is_expected_error(error_list, error_msg):
    for error in error_list:
        if error in error_msg:
            return True
    return False


def TestOneInput(data):
    try:
        ConfigFile.from_file(BytesIO(data))
    except ValueError as e:
        expected_errors = [
            "without section",
            "invalid variable name",
            "expected trailing ]",
            "invalid section name",
            "Invalid subsection",
            "escape character",
            "missing end quote",
        ]
        if is_expected_error(expected_errors, str(e)):
            return -1
        else:
            raise e


def main():
    atheris.Setup(sys.argv, TestOneInput)
    atheris.Fuzz()


if __name__ == "__main__":
    main()
37 changes: 37 additions & 0 deletions fuzzing/oss-fuzz-scripts/build.sh
@@ -0,0 +1,37 @@
# shellcheck shell=bash

set -euo pipefail

python3 -m pip install .

# Directory to look in for dictionaries, options files, and seed corpora:
SEED_DATA_DIR="$SRC/seed_data"

find "$SEED_DATA_DIR" \( -name '*_seed_corpus.zip' -o -name '*.options' -o -name '*.dict' \) \
  ! \( -name '__base.*' \) -exec printf 'Copying: %s\n' {} \; \
  -exec chmod a-x {} \; \
  -exec cp {} "$OUT" \;

# Build fuzzers in $OUT.
find "$SRC/dulwich/fuzzing" -name 'fuzz_*.py' -print0 | while IFS= read -r -d '' fuzz_harness; do
  compile_python_fuzzer "$fuzz_harness"

  common_base_dictionary_filename="$SEED_DATA_DIR/__base.dict"
  if [[ -r "$common_base_dictionary_filename" ]]; then
    # Strip the `.py` extension from the filename and replace it with `.dict`.
    fuzz_harness_dictionary_filename="$(basename "$fuzz_harness" .py).dict"
    output_file="$OUT/$fuzz_harness_dictionary_filename"

    printf 'Appending %s to %s\n' "$common_base_dictionary_filename" "$output_file"
    if [[ -s "$output_file" ]]; then
      # If a dictionary file for this fuzzer already exists and is not empty,
      # we append a new line to the end of it before appending any new entries.
      #
      # LibFuzzer will happily ignore multiple empty lines in a dictionary but fail with an error
      # if any single line has incorrect syntax (e.g., if we accidentally add two entries to the same line.)
      # See docs for valid syntax: https://llvm.org/docs/LibFuzzer.html#id32
      echo >>"$output_file"
    fi
    cat "$common_base_dictionary_filename" >>"$output_file"
  fi
done
55 changes: 55 additions & 0 deletions fuzzing/oss-fuzz-scripts/container-environment-bootstrap.sh
@@ -0,0 +1,55 @@
#!/usr/bin/env bash

set -euo pipefail

#################
# Prerequisites #
#################

for cmd in python3 git wget rsync; do
  command -v "$cmd" >/dev/null 2>&1 || {
    printf '[%s] Required command %s not found, exiting.\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$cmd" >&2
    exit 1
  }
done

SEED_DATA_DIR="$SRC/seed_data"
mkdir -p "$SEED_DATA_DIR"

#############
# Functions #
#############

download_and_concatenate_common_dictionaries() {
  # Assign the first argument as the target file where all contents will be concatenated
  target_file="$1"

  # Shift the arguments so the first argument (target_file path) is removed
  # and only URLs are left for the loop below.
  shift

  for url in "$@"; do
    wget -qO- "$url" >>"$target_file"
    # Ensure there's a newline between each file's content
    echo >>"$target_file"
  done
}

fetch_seed_data() {
  rsync -avc "$SRC/dulwich/fuzzing/dictionaries/" "$SEED_DATA_DIR/"
}

########################
# Main execution logic #
########################

fetch_seed_data

download_and_concatenate_common_dictionaries "$SEED_DATA_DIR/__base.dict" \
  "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/utf8.dict" \
  "https://raw.githubusercontent.com/google/fuzzing/master/dictionaries/url.dict"

# The OSS-Fuzz base image has outdated dependencies by default so we upgrade them below.
python3 -m pip install --upgrade pip
# Upgrade to the latest versions known to work at the time the below changes were introduced:
python3 -m pip install 'setuptools~=69.0' 'pyinstaller~=6.0'
