Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libass: add seed corpus #12687

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

tyler92
Copy link
Contributor

@tyler92 tyler92 commented Nov 4, 2024

I noticed that ass_parse_tags is not covered good enough. This PR adds a seed corpus from the libass-tests repository which improves coverage

Copy link

github-actions bot commented Nov 4, 2024

tyler92 is a new contributor to projects/libass. The PR must be approved by known contributors before it can be merged. The past contributors are: DonggeLiu, rcombs, astiob, devtty1er, Dor1s, inferno-chromium, ssbr (unverified), mikea (unverified)

@tyler92
Copy link
Contributor Author

tyler92 commented Nov 6, 2024

@astiob What do you think about the change?

@astiob
Copy link
Contributor

astiob commented Nov 8, 2024

I’ll be honest: I don’t know fuzzing well enough to comment on this, so I asked around, and my impression is that people are concerned about (1) whether this doesn’t worsen the fuzzing workload due to repetitiveness and human-friendliness of the libass-tests samples and (2) why these tags don’t seem to be generated automatically, seeing as they’re present in the dictionary.

If you’re certain that this improves coverage and reasonably sure that it doesn’t have other negative effects, then I figure this is probably good? even if it raises additional questions.

@TheOneric
Copy link

Additionally even the samples in libass-tests unfortunately don't yet cover all tags — but the dictionary contains all tags (for more than two years already), so as mentioned above the poor coverage is surprising and somewhat worrying. Thanks for bringing attention to this!

I’m not sure why it didn’t yet manage to generate any samples for so many tags. In libass/libass#568 the CIFuzz action always early-exited due to timeouts making it useless, though at the time regular OSS Fuzz still seemed to work fine (producing new reports).
Rechecking this now, timeouts still occur within the first few minutes of fuzzing for address and undefined exits even early due to signed-integer overflows which used to be and should be ignored for libass (at least atm). Does one of those issues now also bog down OSS Fuzz or is something else going on?

@tyler92
Copy link
Contributor Author

tyler92 commented Nov 9, 2024

You're right - the primary root cause of the low coverage is the fuzzer's inefficiency. I hope that, once it improves, those tags will be covered even without an additional seed corpus.

Regarding the seed corpus, it’s fine to include human-friendly samples, as the fuzzer can minimize the corpus by finding a smaller set that achieves the same coverage. One advantage of having such a seed corpus is that you can add samples that triggered crashes but weren’t discovered by the fuzzer for some reason.

Given that, I’m undecided on whether to close this PR

@TheOneric
Copy link

Regarding the seed corpus, it’s fine to include human-friendly samples, as the fuzzer can minimize the corpus by finding a smaller set that achieves the same coverage.

If the fuzzer actually discards samples not improving coverages during load this might work out, though perhaps it might be better to be more selective about the included files, some rely on special fonts etc. Without having tested it, i’m guessing the following may be the most useful:

crash/generic.ass
regression/draw-images/*.ass
regression/karaoke/*.ass
regression/scaling_playres/*.ass
regression/ssa/*.ass
regression/v4++/*.ass

While e.g. font_nonunicode seems unhelpful for fuzzing and I’m unsure about regression/embedded_font/efont.ass.

If a more selective set works out or you’re sure the fuzzer won't waste any time on spurious samples other than a neglible bit during initial corpus loading, it probably makes sense to do this until we figured out what’s wrong with the main fuzzer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants