Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restructure code so that cast is less often needed #1042

Open
dhdaines opened this issue Sep 19, 2024 · 3 comments
Open

Restructure code so that cast is less often needed #1042

dhdaines opened this issue Sep 19, 2024 · 3 comments
Labels
status: accepted type: development Related to the development process for maintainers

Comments

@dhdaines
Copy link
Contributor

dhdaines commented Sep 19, 2024

pdfinterp.py is full of code like this:

    def do_G(self, gray: PDFStackT) -> None:
        """Set gray level for stroking operations"""
        self.graphicstate.scolor = cast(float, gray)
        self.scs = self.csmap["DeviceGray"]

It appears that the intent here (which would be logical to a Java programmer, for instance) is to ensure that the object in question is really a float, coercing it if possible, and throwing an exception if not.

Otherwise, various other code down the line will inevitably throw some other, possibly less obvious, exception. But also, it means that in the case where an object has a union type, e.g. Color:

Color = Union[
    float,  # Greyscale
    Tuple[float, float, float],  # R, G, B
    Tuple[float, float, float, float],  # C, M, Y, K
]

one could (except one cannot, see below) reliably check at runtime which of the possible values it is.

But that's not what typing.cast does! It is actually type assertion (like as in TypeScript) - it says to mypy, "I know this is a float so quit complaining that it isn't". It does nothing at runtime at all.

This is a longstanding issue for some users of pdfminer.six, for example: jsvine/pdfplumber#917 (comment)

It also turns out to be the source of some fuzz errors, since invalid or corrupted PDFs can easily have objects of the wrong type, and instead of causing a PDFSyntaxError or PDFValueError this leads to some other exception which is not caught.

@dhdaines
Copy link
Contributor Author

dhdaines commented Sep 19, 2024

Note that Color of 4 values could also be RGBA, but that's beside the point ;-)

(this is wrong, there is no RGBA ... it could be DeviceN though)

@pdfminer pdfminer deleted a comment from KaboChow Nov 26, 2024
@pietermarsman
Copy link
Member

pietermarsman commented Nov 26, 2024

As someone who identifies as a Python programmer, it does hurt to be recognized as a Java one 😄

These casts were introduced to satisfy mypy when we introduced mypy type checking. Since the data flow in pdfminer.six is not always as explicit as mypy needs it to be.

Ideally we restructure the code so that these casts are no longer needed. I changed the title to reflect that.

@pietermarsman pietermarsman changed the title You keep using that word cast. I do not think it means what you think it means 😀 Restructure code so that cast is less often needed Nov 26, 2024
@pietermarsman pietermarsman added type: development Related to the development process for maintainers status: accepted labels Nov 26, 2024
@dhdaines
Copy link
Contributor Author

As someone who identifies as a Python programmer, it does hurt to be recognized as a Java one 😄

My apologies ;-) not referring to you specifically!

Ideally we restructure the code so that these casts are no longer needed. I changed the title to reflect that.

In many cases they can simply be changed to num_value...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: accepted type: development Related to the development process for maintainers
Projects
None yet
Development

No branches or pull requests

3 participants
@dhdaines @pietermarsman and others