-
Notifications
You must be signed in to change notification settings - Fork 2.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
libass: add seed corpus #12687
base: master
Are you sure you want to change the base?
libass: add seed corpus #12687
Conversation
tyler92 is a new contributor to projects/libass. The PR must be approved by known contributors before it can be merged. The past contributors are: DonggeLiu, rcombs, astiob, devtty1er, Dor1s, inferno-chromium, ssbr (unverified), mikea (unverified) |
@astiob What do you think about the change? |
I’ll be honest: I don’t know fuzzing well enough to comment on this, so I asked around, and my impression is that people are concerned about (1) whether this doesn’t worsen the fuzzing workload due to repetitiveness and human-friendliness of the libass-tests samples and (2) why these tags don’t seem to be generated automatically, seeing as they’re present in the dictionary. If you’re certain that this improves coverage and reasonably sure that it doesn’t have other negative effects, then I figure this is probably good? even if it raises additional questions. |
Additionally even the samples in libass-tests unfortunately don't yet cover all tags — but the dictionary contains all tags (for more than two years already), so as mentioned above the poor coverage is surprising and somewhat worrying. Thanks for bringing attention to this! I’m not sure why it didn’t yet manage to generate any samples for so many tags. In libass/libass#568 the CIFuzz action always early-exited due to timeouts making it useless, though at the time regular OSS Fuzz still seemed to work fine (producing new reports). |
You're right - the primary root cause of the low coverage is the fuzzer's inefficiency. I hope that, once it improves, those tags will be covered even without an additional seed corpus. Regarding the seed corpus, it’s fine to include human-friendly samples, as the fuzzer can minimize the corpus by finding a smaller set that achieves the same coverage. One advantage of having such a seed corpus is that you can add samples that triggered crashes but weren’t discovered by the fuzzer for some reason. Given that, I’m undecided on whether to close this PR |
If the fuzzer actually discards samples not improving coverages during load this might work out, though perhaps it might be better to be more selective about the included files, some rely on special fonts etc. Without having tested it, i’m guessing the following may be the most useful:
While e.g. If a more selective set works out or you’re sure the fuzzer won't waste any time on spurious samples other than a neglible bit during initial corpus loading, it probably makes sense to do this until we figured out what’s wrong with the main fuzzer. |
I noticed that ass_parse_tags is not covered good enough. This PR adds a seed corpus from the libass-tests repository which improves coverage