Image source turning into .html for xhtml2pdf #3298

Open
PeterNerlich opened this issue Dec 17, 2024 · 0 comments
Labels
bug Something isn't working needs-further-investigation This issue needs some further research/investigation

Describe the Bug

Found in the server logs. It still needs to be confirmed whether the PDF was exported anyway or whether the export aborted.

PIL cannot load the image because it is itself an HTML file. The content is logged as
<img class="alignnone size-full wp-image-576" src="https://www.bamf.de/DE/Themen/AsylFluechtlingsschutz/AblaufAsylverfahrens/Ausgang/Aufenthaltserlaubnis/aufenthaltserlaubnis-node.html" alt="" width="150" height="150"/>,
which shows that the src does not point to an image file, as would be expected.

  • Either the source is already wrong in the content in our database. That would be a user error, but we should not allow it, or should at least show a warning on save. (One argument against prohibiting image sources ending in .html: file extensions are merely convention, and nothing stops a web server from delivering any resource under any URI, regardless of its filename extension.)
  • Or, between reading the content string from our database and xhtml2pdf interpreting it, the URL in src is requested and answered with a redirect, which rewrites the src or at least makes it appear as the redirect target in the logs. I think some servers do this to prevent hotlinking, where other websites embed their images and thereby off-load the bandwidth onto the server's infrastructure without its operator profiting from ads or the user seeing its services. The image might also simply have been removed from the server since the content was last edited in our system.
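For the first option, a warning on save could be sketched roughly as follows. This is only an illustration using the Python standard library; the names `ImgSrcCollector` and `suspicious_image_sources` are hypothetical and not part of our code base. Because extensions are merely convention, the result should be treated as a hint for the editor, not as proof that the source is broken.

```python
from html.parser import HTMLParser
from mimetypes import guess_type
from urllib.parse import urlparse


class ImgSrcCollector(HTMLParser):
    """Collect the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags like <img ... /> also end up here by default.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def suspicious_image_sources(content):
    """Return img sources whose URL path does not look like an image file.

    This only inspects the extension, so it can only justify a warning.
    """
    collector = ImgSrcCollector()
    collector.feed(content)
    collector.close()
    flagged = []
    for src in collector.sources:
        mime, _ = guess_type(urlparse(src).path)
        if mime is None or not mime.startswith("image/"):
            flagged.append(src)
    return flagged
```

Both kinds of URL from the log would be flagged by this check: the ones ending in .html and the ones without any extension.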

Steps to Reproduce

TBD: find out which page was being exported as PDF at that moment and what original source may have led to this.
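To narrow down the affected pages, the distinct failing sources could first be pulled out of the server log and then searched for in the database. A minimal sketch, assuming the log keeps the format shown in the excerpt below (a warning line followed by the quoted img tag); `failing_sources` is a hypothetical helper name:

```python
import re

# Matches the src attribute of the <img> tag that xhtml2pdf logs.
IMG_SRC = re.compile(r'<img\b[^>]*\bsrc="([^"]+)"')


def failing_sources(log_text):
    """Map each failing image src to the number of times it was logged."""
    lines = log_text.splitlines()
    counts = {}
    # The tag is logged on the line directly after the warning.
    for previous, current in zip(lines, lines[1:]):
        if "xhtml2pdf - Error in handling image" in previous:
            match = IMG_SRC.search(current)
            if match:
                counts[match.group(1)] = counts.get(match.group(1), 0) + 1
    return counts
```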

Expected Behavior

Foreign images included in content should either work across all parts of our system or fail right away, in a way that is visible to the editor. Missing or misbehaving images from external systems should be handled gracefully: the PDF should still be exported, either without the image or with a placeholder in its place.
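One possible way to handle this gracefully is to check the fetched bytes before they reach the PDF renderer and to fall back to a placeholder (or to nothing) when they are not an image. The sketch below is an assumption about how that could look, not how xhtml2pdf or our export currently works. It only compares a few well-known file signatures, and `looks_like_image` and `image_or_fallback` are hypothetical names.

```python
# Leading bytes of a few common image formats (not an exhaustive list).
IMAGE_SIGNATURES = (
    b"\x89PNG\r\n\x1a\n",  # PNG
    b"\xff\xd8\xff",  # JPEG
    b"GIF87a",  # GIF
    b"GIF89a",  # GIF
)


def looks_like_image(data: bytes) -> bool:
    """Return True if the data starts with a known image signature."""
    return data.startswith(IMAGE_SIGNATURES)


def image_or_fallback(data: bytes, placeholder=None):
    """Return the data if it is an image, otherwise the placeholder.

    With placeholder=None the caller can drop the image entirely.
    """
    if looks_like_image(data):
        return data
    return placeholder
```

An HTML response such as the ones in the log starts with markup rather than an image signature, so it would be replaced or skipped instead of being handed to PIL.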

Actual Behavior

The error and traceback appear multiple times in the server logs. It still needs to be investigated whether the PDF was exported anyway or whether the export aborted.

Additional Information

Log Excerpt
Dec 17 12:46:07 WARNING django.request - 404 Not Found: /api/v3/regions/lesvos/
Dec 17 12:46:25 WARNING xhtml2pdf - Error in handling image
'<img class="alignnone size-full wp-image-576" src="https://www.bamf.de/DE/Themen/AsylFluechtlingsschutz/AblaufAsylverfahrens/Ausgang/Aufenthaltserlaubnis/aufenthaltserlaubnis-node.html" alt="" width="150" height="150"/>'
Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 323, in __init__
    self._image = self._read_image(self.fp)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 353, in _read_image
    return PILImage.open(fp)
           ^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/PIL/Image.py", line 3498, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f6d09fad5d0> fileName=<_io.BytesIO object at 0x7f6d09fad5d0>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/tags.py", line 354, in start
    img = PmlImage(
          ^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 467, in __init__
    img = self.getImage()
          ^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 528, in getImage
    img = PmlImageReader(imgdata)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 342, in __init__
    raise RuntimeError("{0} {1} {2}".format(et, ev, tb))
RuntimeError: <class 'PIL.UnidentifiedImageError'> cannot identify image file <_io.BytesIO object at 0x7f6d09fad5d0> fileName=<_io.BytesIO object at 0x7f6d09fad5d0> <traceback object at 0x7f6cf97ce640>
Dec 17 12:46:25 WARNING xhtml2pdf - Error in handling image
'<img class="alignnone size-full wp-image-573" src="https://www.bamf.de/DE/Themen/AsylFluechtlingsschutz/AblaufAsylverfahrens/Ausgang/Aufenthaltserlaubnis/aufenthaltserlaubnis-node.html" alt="" width="150" height="150"/>'
Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 323, in __init__
    self._image = self._read_image(self.fp)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 353, in _read_image
    return PILImage.open(fp)
           ^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/PIL/Image.py", line 3498, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f6cf8874ae0> fileName=<_io.BytesIO object at 0x7f6cf8874ae0>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/tags.py", line 354, in start
    img = PmlImage(
          ^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 467, in __init__
    img = self.getImage()
          ^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 528, in getImage
    img = PmlImageReader(imgdata)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 342, in __init__
    raise RuntimeError("{0} {1} {2}".format(et, ev, tb))
RuntimeError: <class 'PIL.UnidentifiedImageError'> cannot identify image file <_io.BytesIO object at 0x7f6cf8874ae0> fileName=<_io.BytesIO object at 0x7f6cf8874ae0> <traceback object at 0x7f6cfb4e60c0>
Dec 17 12:46:26 WARNING django.request - 404 Not Found: /api/v3/augsburg/ro/parents/?url=/augsburg/ro/sanatatea/proiecte-de-s%C4%83n%C4%83tate
Dec 17 12:46:26 WARNING xhtml2pdf - Error in handling image
'<img class="alignnone wp-image-600" src="https://www.bamf.de/DE/Themen/AsylFluechtlingsschutz/FamilienasylFamiliennachzug/familienasylfamiliennachzug-node.html" alt="" width="15" height="15"/>'
Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 323, in __init__
    self._image = self._read_image(self.fp)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 353, in _read_image
    return PILImage.open(fp)
           ^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/PIL/Image.py", line 3498, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f6d08f55800> fileName=<_io.BytesIO object at 0x7f6d08f55800>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/tags.py", line 354, in start
    img = PmlImage(
          ^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 467, in __init__
    img = self.getImage()
          ^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 528, in getImage
    img = PmlImageReader(imgdata)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 342, in __init__
    raise RuntimeError("{0} {1} {2}".format(et, ev, tb))
RuntimeError: <class 'PIL.UnidentifiedImageError'> cannot identify image file <_io.BytesIO object at 0x7f6d08f55800> fileName=<_io.BytesIO object at 0x7f6d08f55800> <traceback object at 0x7f6cf8a78f00>
Dec 17 12:46:27 WARNING django.request - 404 Not Found: /api/v3/augsburg/ro/parents/?url=/augsburg/ro/sanatatea/proiecte-de-s%C4%83n%C4%83tate

[…]

Dec 17 12:58:23 WARNING xhtml2pdf - Error in handling image
'<img class="alignnone wp-image-588" src="https://integreat.app/bodenseekreis/en/authorities-and-advice/emergency-numbers-sos" alt="" width="15" height="15"/>'
Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 323, in __init__
    self._image = self._read_image(self.fp)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 353, in _read_image
    return PILImage.open(fp)
           ^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/PIL/Image.py", line 3498, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f6d09db8f90> fileName=<_io.BytesIO object at 0x7f6d09db8f90>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/tags.py", line 354, in start
    img = PmlImage(
          ^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 467, in __init__
    img = self.getImage()
          ^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 528, in getImage
    img = PmlImageReader(imgdata)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 342, in __init__
    raise RuntimeError("{0} {1} {2}".format(et, ev, tb))
RuntimeError: <class 'PIL.UnidentifiedImageError'> cannot identify image file <_io.BytesIO object at 0x7f6d09db8f90> fileName=<_io.BytesIO object at 0x7f6d09db8f90> <traceback object at 0x7f6d090999c0>
Dec 17 12:58:23 WARNING xhtml2pdf - Error in handling image
'<img class="alignnone wp-image-345" src="https://integreat.app/bodenseekreis/en/health/general-information/medicines-and-pharmacies" alt="" width="15" height="15"/>'
Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 323, in __init__
    self._image = self._read_image(self.fp)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 353, in _read_image
    return PILImage.open(fp)
           ^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/PIL/Image.py", line 3498, in open
    raise UnidentifiedImageError(msg)
PIL.UnidentifiedImageError: cannot identify image file <_io.BytesIO object at 0x7f6d08bcf330> fileName=<_io.BytesIO object at 0x7f6d08bcf330>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/tags.py", line 354, in start
    img = PmlImage(
          ^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 467, in __init__
    img = self.getImage()
          ^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 528, in getImage
    img = PmlImageReader(imgdata)
          ^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/integreat-cms/.venv/lib/python3.11/site-packages/xhtml2pdf/xhtml2pdf_reportlab.py", line 342, in __init__
    raise RuntimeError("{0} {1} {2}".format(et, ev, tb))
RuntimeError: <class 'PIL.UnidentifiedImageError'> cannot identify image file <_io.BytesIO object at 0x7f6d08bcf330> fileName=<_io.BytesIO object at 0x7f6d08bcf330> <traceback object at 0x7f6d08ac6e00>

Related Issues
