Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introspector Builds failing for ujson #11770

Open
gordonfierce opened this issue Apr 1, 2024 · 4 comments
Open

Introspector Builds failing for ujson #11770

gordonfierce opened this issue Apr 1, 2024 · 4 comments
Assignees

Comments

@gordonfierce
Copy link

https://oss-fuzz-build-logs.storage.googleapis.com/log-3153a1bb-577c-4ac1-b2a4-254149bace81.txt

I've been looking at the python projects in oss-fuzz, and the ujson introspector job appears to get bogged down in the middle of building the callgraph ( for hypothesis in particular). Locally it looks like this impacts any project that tries to run the introspector on a python project with a hypothesis structured fuzzer, but the fuzzing step itself is unaffected.

@jonathanmetzman
Copy link
Contributor

CC @DavidKorczynski

@DavidKorczynski DavidKorczynski self-assigned this Apr 2, 2024
@DavidKorczynski
Copy link
Collaborator

Did you try to run this locally @gordonfierce ? It looks like something within pycg is either processing a lot or some infinite loop. I've found PyCG tedious to debug so before digging down I wanted to check if you have anymore insights?

@gordonfierce
Copy link
Author

Yeah, I have been doing some profiling and although the other changes I've made have been net-positive, the real bottleneck appears to be in the transitive_closure function here: https://github.com/AdaLogics/PyCG/blob/master/pycg/machinery/definitions.py#L85-L114

That gets called multiple times per outermost loop, builds the transitive closure dictionary from scratch every time, and the runtime grows with the number of definitions. So at some point while looping over every file in hypothesis it gets up to a few thousand definitions every time a file loads. It's possible to squeak out a little more performance just twiddling language features like I was doing, but the next step forward is likely to do a better job caching the closured dictionary, initializing the transitive closure function better per-file so it's not starting from nothing every run, or finding a better data structure for this.

I'm going on a trip tomorrow so I'll probably not get back to it until Monday but it's my current ongoing project to figure out what else can be improved here. But early this week I happened to come across this project where some folks have taken their own fork of PyCG and improved on it with their own call graph approach pythonJaRvis.github.io. I haven't dug all of the way in, but notably they claim to scale much better and I noticed that they don't have the transitive_closure function at all. Looking around their repo it's not terribly close to something that fuzz-introspector can start building with, but it was a stroke of luck I saw them in the citation graph on arxiv. Will take a bit of diffing and testing, but I imagine one could reverse engineer anything particularly cool they've done and port it to the PyCG in fuzz-introspector.

@gordonfierce
Copy link
Author

So, thanks for merging my PRs @DavidKorczynski, but after I built OSS-Fuzz pointed at my kitchen-sink refactor branch that threw all of those changes plus some others I scrounged up, it still runs into the quadratic blowup over the number of definitions getting transitive-closured. Locally it went from running for 48+ hours to crashing the container after three hours, so some more fundamental approach is needed. I'll still rebase my kitchen-sink branch onto the latest PyCG and extract a few other refactors into a PR of things that improve the readability and might squeak out a few 5% speed improvements here or there.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants