New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introspector Builds failing for ujson #11770
Comments
Did you try to run this locally @gordonfierce ? It looks like something within pycg is either processing a lot or some infinite loop. I've found PyCG tedious to debug so before digging down I wanted to check if you have anymore insights? |
Yeah, I have been doing some profiling and although the other changes I've made have been net-positive, the real bottleneck appears to be in the transitive_closure function here: https://github.com/AdaLogics/PyCG/blob/master/pycg/machinery/definitions.py#L85-L114 That gets called multiple times per outermost loop, builds the transitive closure dictionary from scratch every time, and the runtime grows with the number of definitions. So at some point while looping over every file in hypothesis it gets up to a few thousand definitions every time a file loads. It's possible to squeak out a little more performance just twiddling language features like I was doing, but the next step forward is likely to do a better job caching the closured dictionary, initializing the transitive closure function better per-file so it's not starting from nothing every run, or finding a better data structure for this. I'm going on a trip tomorrow so I'll probably not get back to it until Monday but it's my current ongoing project to figure out what else can be improved here. But early this week I happened to come across this project where some folks have taken their own fork of PyCG and improved on it with their own call graph approach pythonJaRvis.github.io. I haven't dug all of the way in, but notably they claim to scale much better and I noticed that they don't have the transitive_closure function at all. Looking around their repo it's not terribly close to something that fuzz-introspector can start building with, but it was a stroke of luck I saw them in the citation graph on arxiv. Will take a bit of diffing and testing, but I imagine one could reverse engineer anything particularly cool they've done and port it to the PyCG in fuzz-introspector. |
So, thanks for merging my PRs @DavidKorczynski, but after I built OSS-Fuzz pointed at my kitchen-sink refactor branch that threw all of those changes plus some others I scrounged up, it still runs into the quadratic blowup over the number of definitions getting transitive-closured. Locally it went from running for 48+ hours to crashing the container after three hours, so some more fundamental approach is needed. I'll still rebase my kitchen-sink branch onto the latest PyCG and extract a few other refactors into a PR of things that improve the readability and might squeak out a few 5% speed improvements here or there. |
https://oss-fuzz-build-logs.storage.googleapis.com/log-3153a1bb-577c-4ac1-b2a4-254149bace81.txt
I've been looking at the python projects in oss-fuzz, and the ujson introspector job appears to get bogged down in the middle of building the callgraph ( for hypothesis in particular). Locally it looks like this impacts any project that tries to run the introspector on a python project with a hypothesis structured fuzzer, but the fuzzing step itself is unaffected.
The text was updated successfully, but these errors were encountered: