Bytecode interpreter section: provide a full specification of each opcode #1078

Open
Christopher-Chianelli opened this issue Apr 17, 2023 · 4 comments
Labels
enhancement guide-new content Additions; New content or section needed needs: decision Needs consensus decision from core devs

Comments

@Christopher-Chianelli

Describe the enhancement or feature you'd like
The documentation for the dis module provides a summary of what each opcode does. However, the summary is not enough to fully understand the opcode's behavior. Take, for instance, the documentation for SEND:

(https://docs.python.org/3/library/dis.html#opcode-SEND)

Sends None to the sub-generator of this generator. Used in yield from and await statements.
  • It is not immediately obvious how many values are pushed to or popped from the stack.
  • It is not immediately obvious that SEND has an oparg and branches depending on the sub-generator's state.
  • It is not immediately obvious how one would use SEND to implement yield from (i.e. the context in which SEND is used).
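As a rough illustration of the gap described above, the sketch below uses the existing dis API to list the instructions CPython emits for a trivial yield from. The function name delegating is made up for this example, and the exact instruction sequence is an assumption that varies between CPython versions (SEND on 3.11 and later, YIELD_FROM on earlier releases). The listing shows that the delegation opcode carries an argument, but not what that argument means for the stack.

```python
import dis

def delegating():
    # Hypothetical example function; an empty tuple keeps it self-contained.
    yield from ()

# Collect (offset, opname, argval) for each instruction of the generator's code.
instructions = [(ins.offset, ins.opname, ins.argval)
                for ins in dis.get_instructions(delegating)]

for offset, opname, argval in instructions:
    # SEND exists on CPython 3.11+; older versions emit YIELD_FROM instead.
    marker = "  <-- delegation opcode" if opname in ("SEND", "YIELD_FROM") else ""
    print(f"{offset:4} {opname:28} {argval!r}{marker}")
```

For jump instructions, dis resolves argval to the target offset, which is the only hint in the output that the opcode can branch.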

I propose a full spec be given in a format that looks like this:

Opcode Name

Stack Prior: ... [expected stack state]
Stack After: ... [new stack state]

Description of Opcode

Example sources that generate the opcode

For the SEND opcode, it would look like this:

SEND(target_delta)

Stack Prior:                            ... subgenerator, sent_value
Stack if subgenerator is not exhausted: ... subgenerator, yielded_value
Stack if subgenerator is exhausted:     ... subgenerator

Pops the top of the stack and sends it to the sub-generator of this generator. If the sub-generator is
not exhausted, the yielded value is pushed onto the stack. Otherwise, jumps forward by
target_delta, leaving subgenerator on the stack. Used to implement yield from and await statements.

Example Sources:
# yield from subgenerator is implemented as the following loop
# (with None initially at the top of the stack)
#
# SEND (sends the top of stack to the subgenerator)
# YIELD_VALUE (returns the yielded value to the caller)
# RESUME
# JUMP_BACKWARD_NO_INTERRUPT (to SEND)
# POP_TOP (target of SEND)
#
# Before the loop, GET_YIELD_FROM_ITER is used to get the generator
# that will act as the subgenerator
yield from subgenerator
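
The loop described in the comments above can be modelled in pure Python. This is only a simplified sketch, not how CPython implements it: it assumes the sub-generator is a real generator (so it has a send method) and leaves out the throw() and close() forwarding that yield from also performs. The names delegate and counter are made up for the example.

```python
def delegate(subgenerator):
    """Simplified model of `result = yield from subgenerator` (assumption: no throw/close handling)."""
    sent_value = None  # None is initially at the top of the stack
    while True:
        try:
            # Corresponds to SEND: pass the sent value to the sub-generator.
            yielded_value = subgenerator.send(sent_value)
        except StopIteration as stop:
            # Sub-generator exhausted: leave the loop with its return value.
            return stop.value
        # Corresponds to YIELD_VALUE followed by RESUME: hand the value to the
        # caller and pick up whatever the caller sends back.
        sent_value = yield yielded_value

def counter():
    # Hypothetical sub-generator used to drive the model.
    received = yield 1
    yield received
    return "done"

gen = delegate(counter())
first = next(gen)            # value yielded by counter() on start-up
second = gen.send("hello")   # counter() echoes back the sent value
try:
    gen.send(None)
except StopIteration as stop:
    result = stop.value      # return value of counter(), passed through
print(first, second, result)
```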

This is similar to how the Java Virtual Machine Specification documents its opcodes (https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-6.html), with an additional section showing example sources from which the opcode is emitted.

Describe alternatives you've considered

  • Add a full specification for each opcode to the documentation for dis instead. Arguably, since a full specification needs to dive deep into specific details, some CPython internals would leak into an otherwise user-readable doc. Additionally, it increases the maintenance burden on dis's documentation (which currently only needs to list each opcode with a brief description).

Additional context
I have already written documentation in the above format (in AsciiDoc) for the majority of CPython 3.11 bytecodes: https://github.com/Christopher-Chianelli/optapy/blob/jpyinterpreter-docs/jpyinterpreter-docs/src/modules/ROOT/pages/opcodes/opcodes.adoc . I can convert the documentation to reStructuredText and create a PR to this repo if this issue is accepted.

@encukou
Member

encukou commented Apr 18, 2023

IMO, this is changing way too fast to be documented here. The devguide is too version-independent.

AFAIK the stack effect info is nowadays in bytecodes.c, as (inputs -- outputs), as documented with the code.
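
A coarser form of the stack effect information can also be queried at runtime through dis.stack_effect, which reports only the net change in stack depth rather than named inputs and outputs. The sketch below is a minimal example, assuming an interpreter where SEND exists (CPython 3.11 or later); the helper name send_stack_effects is made up, and the numbers it prints depend on the CPython version, so none are shown here.

```python
import dis

def send_stack_effects():
    """Return SEND's net stack effect for both branches, or None if SEND is absent."""
    opcode = dis.opmap.get("SEND")  # only defined on CPython 3.11+
    if opcode is None:
        return None
    return {
        # Effect when execution continues with the next instruction.
        "fall_through": dis.stack_effect(opcode, 0, jump=False),
        # Effect when the instruction jumps to its target.
        "jump": dis.stack_effect(opcode, 0, jump=True),
    }

effects = send_stack_effects()
print(effects)
```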

@CAM-Gerlach
Member

Might this belong instead in the bytecode instructions section of the dis documentation in the main CPython docs? That's where the opcodes are currently documented (e.g. SEND), and those docs are version-dependent.

Should this be moved to the CPython repo? Closed? Or what are the next steps here?

@Christopher-Chianelli
Author

I considered adding it to the dis documentation, but decided to create the issue here because the documentation can quickly become out of date if a bytecode developer forgets to update it. In my opinion, the only thing worse than not having a specification is having an incorrect one. If the documentation lives here, being out of date is less harmful, since it would only affect CPython bytecode developers (rather than all dis users), who are hopefully familiar enough with bytecode changes to recognize when something goes out of date.
I don't have a problem with moving this issue to the CPython repo so that it can be discussed whether the dis documentation should have a full specification.

@encukou
Member

encukou commented Jun 13, 2023

Parts of the documentation can be auto-generated. I don't know if the source format is stable enough to maintain another consumer for it, though.
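
As a rough sketch of what auto-generation could look like, the example below builds documentation stubs from the runtime dis module. This is a swapped-in data source chosen for illustration, not the bytecodes.c source format discussed in this thread, and it only recovers the opcode name, number, and net stack effect. The helper names stack_effect_or_none, opcode_stubs, and render are made up for the example.

```python
import dis

def stack_effect_or_none(opcode):
    """Net stack effect with oparg 0 (or without an oparg), or None if dis rejects both."""
    for oparg in (0, None):
        try:
            return dis.stack_effect(opcode, oparg)
        except ValueError:
            # dis rejects an oparg for opcodes that take none, and vice versa.
            continue
    return None

def opcode_stubs():
    """Return (opcode, name, net_stack_effect) rows sorted by opcode number."""
    return [(opcode, name, stack_effect_or_none(opcode))
            for name, opcode in sorted(dis.opmap.items(), key=lambda item: item[1])]

def render(rows):
    """Render rows as plain-text stubs that a human would still need to fill in."""
    lines = []
    for opcode, name, effect in rows:
        effect_text = "unknown" if effect is None else f"{effect:+d}"
        lines.append(f"{name} (opcode {opcode})")
        lines.append(f"    Net stack effect with oparg 0: {effect_text}")
    return "\n".join(lines)

stubs = opcode_stubs()
print(render(stubs[:5]))
```

The description and example sources for each opcode would still have to be written by hand; only the mechanical parts lend themselves to generation.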

@willingc willingc added needs: decision Needs consensus decision from core devs guide-new content Additions; New content or section needed enhancement labels Oct 11, 2023

4 participants