Introspector Builds failing for ujson #11770

gordonfierce · 2024-04-01T20:59:22Z

https://oss-fuzz-build-logs.storage.googleapis.com/log-3153a1bb-577c-4ac1-b2a4-254149bace81.txt

I've been looking at the python projects in oss-fuzz, and the ujson introspector job appears to get bogged down in the middle of building the callgraph ( for hypothesis in particular). Locally it looks like this impacts any project that tries to run the introspector on a python project with a hypothesis structured fuzzer, but the fuzzing step itself is unaffected.

jonathanmetzman · 2024-04-02T14:14:11Z

CC @DavidKorczynski

DavidKorczynski · 2024-04-09T23:15:04Z

Did you try to run this locally @gordonfierce ? It looks like something within pycg is either processing a lot or some infinite loop. I've found PyCG tedious to debug so before digging down I wanted to check if you have anymore insights?

gordonfierce · 2024-04-10T14:13:30Z

Yeah, I have been doing some profiling and although the other changes I've made have been net-positive, the real bottleneck appears to be in the transitive_closure function here: https://github.com/AdaLogics/PyCG/blob/master/pycg/machinery/definitions.py#L85-L114

That gets called multiple times per outermost loop, builds the transitive closure dictionary from scratch every time, and the runtime grows with the number of definitions. So at some point while looping over every file in hypothesis it gets up to a few thousand definitions every time a file loads. It's possible to squeak out a little more performance just twiddling language features like I was doing, but the next step forward is likely to do a better job caching the closured dictionary, initializing the transitive closure function better per-file so it's not starting from nothing every run, or finding a better data structure for this.

I'm going on a trip tomorrow so I'll probably not get back to it until Monday but it's my current ongoing project to figure out what else can be improved here. But early this week I happened to come across this project where some folks have taken their own fork of PyCG and improved on it with their own call graph approach pythonJaRvis.github.io. I haven't dug all of the way in, but notably they claim to scale much better and I noticed that they don't have the transitive_closure function at all. Looking around their repo it's not terribly close to something that fuzz-introspector can start building with, but it was a stroke of luck I saw them in the citation graph on arxiv. Will take a bit of diffing and testing, but I imagine one could reverse engineer anything particularly cool they've done and port it to the PyCG in fuzz-introspector.

gordonfierce · 2024-04-10T14:25:51Z

So, thanks for merging my PRs @DavidKorczynski, but after I built OSS-Fuzz pointed at my kitchen-sink refactor branch that threw all of those changes plus some others I scrounged up, it still runs into the quadratic blowup over the number of definitions getting transitive-closured. Locally it went from running for 48+ hours to crashing the container after three hours, so some more fundamental approach is needed. I'll still rebase my kitchen-sink branch onto the latest PyCG and extract a few other refactors into a PR of things that improve the readability and might squeak out a few 5% speed improvements here or there.

DavidKorczynski self-assigned this Apr 2, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Introspector Builds failing for ujson #11770

Introspector Builds failing for ujson #11770

gordonfierce commented Apr 1, 2024

jonathanmetzman commented Apr 2, 2024

DavidKorczynski commented Apr 9, 2024

gordonfierce commented Apr 10, 2024

gordonfierce commented Apr 10, 2024

Introspector Builds failing for ujson #11770

Introspector Builds failing for ujson #11770

Comments

gordonfierce commented Apr 1, 2024

jonathanmetzman commented Apr 2, 2024

DavidKorczynski commented Apr 9, 2024

gordonfierce commented Apr 10, 2024

gordonfierce commented Apr 10, 2024