Radon struggles to collect statistics from large files #223

Sam152 · 2021-08-10T09:00:13Z

I have some large files that radon struggles to analyse. I created an example to demonstrate the problem: https://gist.githubusercontent.com/Sam152/50e8ef27cceb899084b42a069237a7b8/raw/bb21870395df86a0062c22353b532b45d31bd3f5/sample.py (~800 lines)

In my case, running `radon raw big-package` takes 28.38s. In reality, the module I'm trying to analyse has ~5000 lines with a similar amount of AST per line.

If I double my 800-line example, the run takes roughly 115.50s, which is about four times the original 28.38s. My feeling is that something scales worse than O(n) in the size of the AST, possibly closer to O(n^2).
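
As a rough check on that feeling, the two timings above are enough to estimate a growth exponent, assuming the run time follows a simple power law t = c * n^k. This is only a back-of-the-envelope sketch; `scaling_exponent` is a hypothetical helper name, not part of radon.

```python
import math


def scaling_exponent(time_small, time_large, size_ratio):
    """Estimate k in t = c * n**k from two timings.

    Assumes a pure power law, which is only an approximation:
    constant overheads and noise are ignored.
    """
    return math.log(time_large / time_small) / math.log(size_ratio)


if __name__ == "__main__":
    # Timings reported in this issue: 28.38s for ~800 lines,
    # 115.50s after doubling the file (size ratio of 2).
    exponent = scaling_exponent(28.38, 115.50, 2)
    print(f"estimated exponent: {exponent:.2f}")  # close to 2
```

An exponent near 2 is consistent with quadratic behaviour, though two data points cannot rule out other growth patterns.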

Any pointers on whether something can be optimised here? Or is the nature of the analysis such that speeding this process up is simply not possible?

Thanks in advance, if anyone can share their experience.

Cheers,
Sam

On a side note, while researching this issue, I found radon cited in an academic paper, which I thought was interesting and worth sharing (https://arxiv.org/pdf/2007.08978.pdf).


rubik · 2021-08-26T08:40:02Z

Hi Sam, thanks for sharing the example. Indeed, it's quite surprising to see such a long run time for such a simple file.

The raw command is definitely the slowest, and that's because it does not use the ast module to parse the file; instead, it uses tokenize. The latter is written in pure Python instead of C, so that's already a slowing factor. Moreover, when parsing the AST we can use efficient techniques like the visitor pattern, which are not available with the tokenize module.
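
To make the comparison concrete, here is a minimal sketch that walks the same source once with `tokenize` and once with an `ast.NodeVisitor`, timing each pass. It uses only the standard library and does not reproduce what Radon does internally; `make_source`, `count_tokens`, `AssignCounter`, and `timed` are hypothetical names chosen for illustration, and the relative timings may differ between Python versions.

```python
import ast
import io
import time
import tokenize


def make_source(n_lines):
    """Build a synthetic module with one assignment per line."""
    return "".join(f"value_{i} = [{i}, {i} + 1, ({i} * 2)]\n" for i in range(n_lines))


def count_tokens(source):
    """Count tokens using the tokenize module."""
    reader = io.StringIO(source).readline
    return sum(1 for _ in tokenize.generate_tokens(reader))


class AssignCounter(ast.NodeVisitor):
    """Visitor pattern: count assignment statements in a parsed tree."""

    def __init__(self):
        self.assignments = 0

    def visit_Assign(self, node):
        self.assignments += 1
        self.generic_visit(node)  # keep descending into child nodes


def count_assignments(source):
    visitor = AssignCounter()
    visitor.visit(ast.parse(source))
    return visitor.assignments


def timed(func, source):
    """Return (result, elapsed_seconds) for a single call."""
    start = time.perf_counter()
    result = func(source)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    source = make_source(5000)
    tokens, t_tok = timed(count_tokens, source)
    assigns, t_ast = timed(count_assignments, source)
    print(f"tokenize: {tokens} tokens in {t_tok:.3f}s")
    print(f"ast:      {assigns} assignments in {t_ast:.3f}s")
```

Both passes here are linear in the size of the file, so any gap between them is a constant factor rather than the superlinear growth reported above.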

However, the superlinear complexity is definitely in Radon's code. It performs some complicated operations to count logical lines, and I suspect that's where the slowest code is. I think your example highlights one of the inefficiencies particularly well.

The next step would be to profile the code. A flamegraph should already give some very useful hints. I'll try to investigate this when I've got time.
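
As a first pass before a flamegraph, the standard library's `cProfile` can already show which functions dominate. The sketch below is an assumption about how one might set that up: `profile_call` is a hypothetical helper, and `slow_line_count` is a deliberately quadratic stand-in, not Radon's logical-line counting. To profile Radon itself, the stand-in would be swapped for Radon's own entry point.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10):
    """Run func(*args) under cProfile; return (result, report_text)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


def slow_line_count(source):
    """Hypothetical stand-in for the analysis under test (quadratic on purpose)."""
    lines = source.splitlines()
    count = 0
    for i in range(len(lines)):
        # Copying the prefix on every iteration costs O(i), hence O(n^2) overall.
        prefix = lines[: i + 1]
        if prefix[-1].strip():
            count += 1
    return count


if __name__ == "__main__":
    sample = "x = 1\n" * 2000
    _, report = profile_call(slow_line_count, sample)
    print(report)
```

Running the same profile on files of increasing size and comparing the per-function totals should help point at the code path whose cost grows faster than the input.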

Sam152 · 2021-09-03T03:16:57Z

Thanks for the info, that's really helpful!
