Name files by hash of their contents #2077

Closed
smoelius opened this issue May 9, 2024 · 7 comments

Comments

@smoelius
Contributor

smoelius commented May 9, 2024

Is there a way to name crashes, hangs, etc. by the hash of their contents, e.g., SHA1 or SHA256?

If not, is this something that you would be open to?

@vanhauser-thc
Member

That is what libfuzzer does. It loses interesting information, such as the source input the crash was mutated from, and it would break existing tooling.
Maybe as an option - why would you find that helpful?

@smoelius
Contributor Author

smoelius commented May 9, 2024

> why would you find that helpful?

It makes it easier to tell when a new input has been generated, e.g., in comparing runs of a target, or comparing runs of different targets whose inputs have the same structure.
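A minimal sketch of the idea, for illustration only: the file name is derived purely from the input bytes, so identical inputs from different runs or targets get identical names. To keep the example short it uses FNV-1a (64-bit) as a stand-in for SHA1 or SHA256; FNV-1a is not collision resistant, and the function names `content_hash` and `hash_filename` are hypothetical, not part of AFL++.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* FNV-1a (64-bit). Stand-in for SHA1/SHA256 in this sketch only;
   it is NOT collision resistant. */
static uint64_t content_hash(const uint8_t *buf, size_t len) {

  uint64_t h = 0xcbf29ce484222325ULL;               /* FNV offset basis */

  for (size_t i = 0; i < len; i++) {

    h ^= buf[i];
    h *= 0x100000001b3ULL;                          /* FNV prime */

  }

  return h;

}

/* Write a 16-hex-digit file name for `buf` into `out`.
   `out` must hold at least 17 bytes (16 digits plus the terminator). */
static void hash_filename(const uint8_t *buf, size_t len, char *out,
                          size_t out_len) {

  assert(out_len >= 17);
  snprintf(out, out_len, "%016" PRIx64, content_hash(buf, len));

}
```

With names like these, comparing two output directories reduces to comparing two sets of file names: a name present in one run but not the other corresponds to an input the other run did not produce.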

If it were an option, would you imagine it being enabled by an environment variable?

@vanhauser-thc
Member

But if another instance finds the same bug, the crashing input will very likely have a different hash, because mutations give it a different length or different bits.
The crash deduplication in afl is minimal and only per instance, but IMHO effective. Naming files by content hash will help in only a few cases, while losing interesting information and creating incompatibilities with important tools like casr-afl, the best tool for crash deduplication.

If you still think that this change would help you (which I honestly doubt :) ) then send a PR. Yes, it should be an env variable.
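A rough sketch of what an opt-in switch could look like, so the default naming (and the tooling that depends on it) stays untouched unless the user asks for hash names. Everything named here is an assumption made for the example: the variable `EXAMPLE_HASH_FILENAMES`, the helpers `use_hash_names` and `crash_filename`, and the simplified `id:...,src:...` layout, which only approximates the real AFL++ file name format.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical variable name, chosen for this example only. */
#define EXAMPLE_HASH_ENV "EXAMPLE_HASH_FILENAMES"

/* Returns 1 if the user opted in to hash-based names, 0 otherwise. */
static int use_hash_names(void) {

  return getenv(EXAMPLE_HASH_ENV) != NULL;

}

/* Build a crash file name in `out`.
   by_hash != 0: name is the 16-hex-digit content hash.
   by_hash == 0: simplified sequential name that keeps the source entry. */
static void crash_filename(char *out, size_t out_len, int by_hash,
                           unsigned id, unsigned src, uint64_t hash) {

  assert(out_len >= 21);                 /* room for the longer form */

  if (by_hash) {

    snprintf(out, out_len, "%016" PRIx64, hash);

  } else {

    snprintf(out, out_len, "id:%06u,src:%06u", id, src);

  }

}
```

A caller would pass `use_hash_names()` as `by_hash`. Note that the hash form drops the `src` field, which is exactly the information loss mentioned above.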

@smoelius
Contributor Author

I see that the repo already contains a SHA1 implementation:

void sha1_init(sha1nfo *s);

To avoid having two implementations, I'm going to move that code into a src/afl-sha1.c file and have custom_mutators/libfuzzer/FuzzerSHA1.cpp refer to it. OK?

@SonicStark
Contributor

Give AFLplusplus/include/xxhash.h a try; it can be used directly in the main components of afl++.

@vanhauser-thc
Member

@SonicStark I think this defeats the purpose of seeing whether a libfuzzer instance found the same crash or not (but as I said, IMHO this will rarely be of any use anyway).

@smoelius please keep the existing SHA1 implementation where it is: it is a download of libfuzzer that I will update at some point, and then things would break. Just put the best SHA1 that you find into src/afl-performance.c.

@smoelius
Contributor Author

Hmm. Not sure why this didn't automatically close.
