It appears that the intent here (which would be logical to a Java programmer, for instance) is to ensure that the object in question is really a float, coercing it if possible, and throwing an exception if not.
Otherwise, various other code down the line will inevitably throw some other, possibly less obvious, exception. But also, it means that in the case where an object has a union type, e.g. Color:

    Color = Union[
        float,  # Greyscale
        Tuple[float, float, float],  # R, G, B
        Tuple[float, float, float, float],  # C, M, Y, K
    ]

one could (except one cannot, see below) reliably check at runtime which of the possible values it is.
But that's not what typing.cast does! It is actually a type assertion (like `as` in TypeScript) - it says to mypy, "I know this is a float so quit complaining that it isn't". It does nothing at runtime at all.
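To make the point concrete, here is a minimal sketch of what typing.cast does at runtime. The variable names and the string value are made up for illustration; they are not taken from pdfinterp.py.

```python
from typing import cast

raw = "not a number"      # stand-in for a wrongly typed object from a corrupted PDF
width = cast(float, raw)  # tells mypy "treat this as float"; nothing is checked

assert width is raw       # cast() returned its argument unchanged
assert isinstance(width, str)

try:
    width * 0.5           # the real failure only surfaces here, away from the cast
except TypeError as exc:
    print(f"TypeError raised later: {exc}")
```

The cast line succeeds, and the error appears only when the value is actually used as a number.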
It also turns out to be the source of some fuzz errors, since invalid or corrupted PDFs can easily have objects of the wrong type, and instead of causing a PDFSyntaxError or PDFValueError this leads to some other exception which is not caught.
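One way to get the behaviour the casts seem to intend is a small checking helper. This is only a sketch: safe_float is a hypothetical name, the PDFValueError class below is a local stand-in so the example is self-contained, and rejecting bool is an assumption about what should count as a number.

```python
from typing import Any


class PDFValueError(ValueError):
    """Local stand-in for the library's exception (assumption, for illustration)."""


def safe_float(obj: Any) -> float:
    """Return obj as a float, or raise PDFValueError if it is not numeric."""
    # bool is a subclass of int in Python, so it is rejected explicitly here.
    if isinstance(obj, bool) or not isinstance(obj, (int, float)):
        raise PDFValueError(
            f"expected a number, got {type(obj).__name__}: {obj!r}"
        )
    return float(obj)
```

With something like this, safe_float(3) returns 3.0, while safe_float(b"abc") raises PDFValueError at the point where the bad object is first seen, so a caller that already handles PDF errors can catch it.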
As someone who identifies as a Python programmer, it does hurt to be recognized as a Java one 😄
These casts were introduced to satisfy mypy when we introduced mypy type checking, since the data flow in pdfminer.six is not always as explicit as mypy needs it to be.
Ideally we restructure the code so that these casts are no longer needed. I changed the title to reflect that.
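As a rough illustration of that kind of restructuring (not a proposal for the actual pdfminer.six code), an isinstance check is both a real runtime check and something mypy understands for narrowing, so no cast is needed. The describe_color function is hypothetical; the Color alias follows the union quoted in the issue.

```python
from typing import Tuple, Union

Color = Union[
    float,  # Greyscale
    Tuple[float, float, float],  # R, G, B
    Tuple[float, float, float, float],  # C, M, Y, K
]


def describe_color(color: Color) -> str:
    """Hypothetical helper: report which colour model a Color value uses."""
    # isinstance narrows the union for the type checker and checks at runtime.
    if isinstance(color, (int, float)):
        return "grey"
    # Past this point only the two tuple variants remain; tell them apart by length.
    if len(color) == 3:
        return "rgb"
    return "cmyk"
```

For example, describe_color(0.5) gives "grey" and describe_color((0.0, 0.0, 0.0, 1.0)) gives "cmyk".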
pietermarsman changed the title from "You keep using that word cast. I do not think it means what you think it means 😀" to "Restructure code so that cast is less often needed" on Nov 26, 2024
pdfinterp.py is full of code like this. This is a longstanding issue for some users of pdfminer.six, for example: jsvine/pdfplumber#917 (comment)