pydoc is obscenely slow for some modules #118465

Closed
serhiy-storchaka opened this issue May 1, 2024 · 4 comments
Labels
performance Performance or resource usage

Comments

@serhiy-storchaka
Member

serhiy-storchaka commented May 1, 2024

For example, ./python -m pydoc test.test_enum takes 32 seconds. It takes 20 seconds in 3.12, 15 seconds in 3.11 and only 1.6 seconds in 3.10. Perhaps test.test_enum has grown, but the main culprit is bpo-35113, and further changes like gh-106727 only added to it.

For every class without a docstring, pydoc tries to find its comments by calling inspect.getcomments(), which calls inspect.findsource(), which reads and parses the module source, then traverses the tree to find the class with the matching qualname. Since the whole module is parsed and traversed again for each class, this has quadratic complexity for large modules with many classes.

I tried to optimize the AST traversal code and got it down to 18 seconds on main. It still has quadratic complexity. Further optimization would require introducing a cache and finding the positions of all classes in one pass.
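
A one-pass index of class positions could look roughly like the sketch below. This is not the code from either PR; ClassLineIndexer and index_classes are hypothetical names, and the sketch records the line of the class keyword, ignoring decorators.

```python
import ast


class ClassLineIndexer(ast.NodeVisitor):
    """Collect {qualname: [line numbers]} for every class in one traversal."""

    def __init__(self):
        self.stack = []      # names of the enclosing scopes
        self.positions = {}  # qualname -> first lines of matching definitions

    def visit_ClassDef(self, node):
        self.stack.append(node.name)
        qualname = ".".join(self.stack)
        # A qualname can appear more than once (conditional definitions),
        # so keep every candidate line instead of overwriting.
        self.positions.setdefault(qualname, []).append(node.lineno)
        self.generic_visit(node)
        self.stack.pop()

    def visit_FunctionDef(self, node):
        # Classes defined inside a function get "<locals>" in their qualname.
        self.stack.extend([node.name, "<locals>"])
        self.generic_visit(node)
        del self.stack[-2:]

    visit_AsyncFunctionDef = visit_FunctionDef


def index_classes(source):
    """Parse the source once and return the positions of all classes."""
    indexer = ClassLineIndexer()
    indexer.visit(ast.parse(source))
    return indexer.positions


SOURCE = """\
class A:
    class B:
        pass

def make():
    class C:
        pass
    return C
"""

positions = index_classes(SOURCE)
print(positions)
```

With such an index cached per module, each class lookup becomes a dict access instead of a fresh parse and traversal.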

But it would all be much simpler and faster to simply save the co_firstlineno value of the code object executed during class creation in the class dict (as __firstlineno__, for example).
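
As a rough illustration, reading such an attribute needs no parsing at all. The sketch assumes an interpreter that actually stores __firstlineno__ in the class namespace (the name proposed above); class_first_line is a hypothetical helper that returns None where the attribute is absent.

```python
class Example:
    pass


def class_first_line(cls):
    """Return the recorded first line of a class definition, or None."""
    # Look only in the class's own namespace so that a value inherited
    # from a base class is not mistaken for this class's position.
    return vars(cls).get("__firstlineno__")


line = class_first_line(Example)
print(line)  # an int where the attribute is set, otherwise None
```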

@serhiy-storchaka serhiy-storchaka added the performance Performance or resource usage label May 1, 2024
@AlexWaygood AlexWaygood changed the title pydoc is obscently slow for some modules pydoc is obscenely slow for some modules May 1, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue May 1, 2024
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue May 1, 2024
It is set by compiler with the line number of the first line of
the class definition.
@serhiy-storchaka
Member Author

#118471 reduces the time from 32 to 18 seconds.
#118475 reduces the time to just 1 second.

@carljm
Member

carljm commented May 1, 2024

Not only does a runtime __firstlineno__ for classes fix the performance problem, it also improves correctness. In cases where there are multiple conditional definitions of a class, the previous code had to guess. Now we know exactly where the runtime class object you are asking about was actually defined.
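
The ambiguity is easy to reproduce. In the sketch below (the Reader class and the source string are made up for illustration), a search of the AST by name finds two equally plausible definitions, and nothing in the source says which one produced the live class.

```python
import ast

SOURCE = """\
import sys

if sys.platform == "win32":
    class Reader:
        pass
else:
    class Reader:
        pass
"""

# Every ClassDef named "Reader" is a candidate; the AST alone cannot
# tell which branch was executed at import time.
candidates = sorted(
    node.lineno
    for node in ast.walk(ast.parse(SOURCE))
    if isinstance(node, ast.ClassDef) and node.name == "Reader"
)
print(candidates)
```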

@terryjreedy
Member

pyclbr produces a custom tree of class and function descriptors, including line numbers, in one visitor pass from the module ast. The consumer can then poke around the tree as desired without rereading and reparsing. There is intentionally no code execution so that unknown-to-be-safe files can be browsed.
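
A minimal sketch of that usage, assuming a throwaway module written to a temporary directory (the module name pyclbr_demo_mod and its contents are made up for this example):

```python
import os
import pyclbr
import tempfile

SOURCE = """\
class Outer:
    class Inner:
        pass

    def method(self):
        pass

def func():
    pass
"""

with tempfile.TemporaryDirectory() as tmpdir:
    # The file is only read and parsed by pyclbr, never imported or executed.
    with open(os.path.join(tmpdir, "pyclbr_demo_mod.py"), "w") as f:
        f.write(SOURCE)
    tree = pyclbr.readmodule_ex("pyclbr_demo_mod", path=[tmpdir])

outer = tree["Outer"]
summary = {
    "Outer": outer.lineno,
    "Outer.Inner": outer.children["Inner"].lineno,
    "Outer.method": outer.children["method"].lineno,
    "func": tree["func"].lineno,
}
print(summary)
```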

inspect gets information from live objects. So it seems sensible that all the information inspect might need, or a source index thereto, should be included with the object. I presume the first line of Python functions is already available in the code object.
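
For functions this can be checked directly: the code object carries co_firstlineno. The sketch below compiles a small made-up source string so that the line numbers are predictable.

```python
SOURCE = """\
def top_level():
    pass

class Holder:
    def method(self):
        pass
"""

namespace = {}
exec(compile(SOURCE, "<demo>", "exec"), namespace)

# The line number is read from the live function object, with no reparsing.
top_level_line = namespace["top_level"].__code__.co_firstlineno
method_line = namespace["Holder"].method.__code__.co_firstlineno
print(top_level_line, method_line)
```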

Is it possible to parse and construct an ast for a single statement starting with its first line?
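
One way this might be done, sketched under the assumption that the first line is already known: inspect.getblock() extracts the block starting at a given line, and the dedented result can be handed to ast.parse(). parse_block_at is a hypothetical helper, not an existing API.

```python
import ast
import inspect
import textwrap

SOURCE = """\
if True:
    class A:
        x = 1

        def f(self):
            return self.x

y = 2
"""


def parse_block_at(source, first_line):
    """Parse only the class or function block starting at first_line (1-based)."""
    lines = source.splitlines(keepends=True)
    # getblock() tokenizes from the given line and stops where the block ends.
    block = inspect.getblock(lines[first_line - 1:])
    # Dedent so that a nested definition parses as a top-level statement.
    return ast.parse(textwrap.dedent("".join(block)))


tree = parse_block_at(SOURCE, 2)
cls = tree.body[0]
print(type(cls).__name__, cls.name)
```

Line numbers in the resulting tree are relative to the extracted block, so they would need an offset of first_line - 1 to map back to the file.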

Perhaps when pydoc is given a filename, it should directly call ast and run a custom visitor, like pyclbr now does, and not use inspect. This would be a separate issue from enhancing live objects for the benefit of inspect.

@serhiy-storchaka
Member Author

Yes, the current code that searches for the class definition in the sources is awful and not completely reliable, but the code that preceded it was worse, although faster.

My concern is that this attribute is only used in inspect.findsource() (and indirectly in inspect.getcomments()). But it is the only reliable solution to this problem; otherwise we can only guess. There may be several class definitions with the same name in the file, and the class can be renamed after creation, which breaks any attempt to search for it.

The other solution is to deprecate inspect.findsource() and inspect.getcomments() for classes.

> Perhaps when pydoc is given a filename, it should directly call ast and run a custom visitor, like pyclbr now does, and not use inspect. This would be a separate issue from enhancing live objects for the benefit of inspect.

Maybe, but usually (when used as help() in the REPL) it is given a live object: a function, a class or a module. Even if it is given an object path and can load and parse the module code, we have the problem of multiple definitions (depending on conditions), generated classes and functions (see for example turtle), and classes and functions imported from other modules where they are implemented (C implementations, submodules). It makes sense to show what the user gets when they import the module, even if it is implemented elsewhere, and not what they could potentially get on other platforms or in a different environment.

serhiy-storchaka added a commit that referenced this issue May 6, 2024
It is set by compiler with the line number of the first line of
the class definition.
SonicField pushed a commit to SonicField/cpython that referenced this issue May 8, 2024
It is set by compiler with the line number of the first line of
the class definition.