Images aren't saved with an extension corresponding to the MIME type #13190

Open
Paebbels opened this issue Dec 22, 2024 · 5 comments

@Paebbels

Paebbels commented Dec 22, 2024

Describe the bug

When referencing a shield from shields.io, the shield URL has no dedicated file extension such as .png. This is permitted: HTTP identifies the transmitted content type through MIME types such as image/png, so a URL does not need to carry an extension.

Example URL: https://img.shields.io/badge/doc-CC--BY%204.0-green?longCache=true&style=flat-square&logo=CreativeCommons&logoColor=fff

Behavior of Sphinx:

  1. The image is downloaded as doc-CC--BY%204.0-green, so .0-green is treated as its file extension.
  2. The space escape %20 isn't rewritten, which causes problems in LaTeX, where % is the comment character.
    This has been reported separately as #13189 (%20 is not escaped in image file names/URLs for LaTeX).
  3. The generated LaTeX is \sphinxincludegraphics[height=22\sphinxpxdimen]{{doc-CC--BY%2041}.0-green}.
    I'm not sure about the Sphinx specific includegraphics macro, but usually, LaTeX should have a look at the file extension (file type) to decide for further file processing.
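
A minimal sketch of why .0-green ends up as the extension, assuming the file name is taken from the last URL path segment and split at its last dot. This uses only the Python standard library and is not Sphinx's actual download code:

```python
import os.path
from urllib.parse import unquote, urlsplit

url = ("https://img.shields.io/badge/doc-CC--BY%204.0-green"
       "?longCache=true&style=flat-square&logo=CreativeCommons&logoColor=fff")

# Take the last path segment of the URL; the query string is not part of it.
basename = urlsplit(url).path.rsplit("/", 1)[-1]

# splitext() splits at the last dot, so the dot inside "4.0" starts the "extension".
stem, ext = os.path.splitext(basename)

# Decoding the percent-escape shows the intended name, with a space instead of "%20".
decoded = unquote(basename)

print(basename)  # doc-CC--BY%204.0-green
print(stem, ext)  # doc-CC--BY%204 .0-green
print(decoded)  # doc-CC--BY 4.0-green
```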

How to Reproduce

Example code:

.. |SHIELD:svg:pyTooling-doc-license| image:: https://img.shields.io/badge/doc-CC--BY%204.0-green?longCache=true&style=flat-square&logo=CreativeCommons&logoColor=fff
   :alt: Documentation License
   :height: 22
   :target: License.html

|SHIELD:svg:pyTooling-doc-license|

Output:

Generated LaTeX Code:

\sphinxAtStartPar
\sphinxhref{https://GitHub.com/pyTooling/Actions}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{pyTooling-Actions-63bf7f}.png}} \sphinxhref{https://GitHub.com/pyTooling/Actions/blob/main/LICENSE.md}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{pyTooling}.png}} \sphinxhref{https://pyTooling.github.io/pyTooling/}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{website}.png}} \sphinxhref{https://GitHub.com/pyTooling/Actions/blob/main/doc/License.rst}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{doc-CC--BY%2041}.0-green}}
\sphinxhref{https://GitHub.com/pyTooling/Actions/tags}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{Actions}.png}} \sphinxhref{https://GitHub.com/pyTooling/Actions/releases}{\sphinxincludegraphics[height=22\sphinxpxdimen]{{Actions1}.png}}

Expected behavior:

  • Download external resources and save the files according to the content's MIME type.
    • If the file extension in the URL matches one of the MIME type's extensions, use the filename as is.
    • If the file extension does not match, append the MIME type's preferred file extension.
  • Generate LaTeX code that uses the correct file extensions.

Current filename: doc-CC--BY%204.0-green
Corrected filename: doc-CC--BY%204.0-green.png
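
The expected behavior could be sketched roughly as follows. The function name filename_for_mime_type and the PREFERRED_EXTENSIONS table are hypothetical and only illustrate the rule described above; they are not part of Sphinx:

```python
import mimetypes
import os.path

# Hypothetical table of preferred extensions, for illustration only.
PREFERRED_EXTENSIONS = {
    "image/png": ".png",
    "image/svg+xml": ".svg",
    "image/jpeg": ".jpg",
    "image/gif": ".gif",
}


def filename_for_mime_type(filename, content_type):
    """Return filename, extended by the MIME type's preferred extension if needed."""
    # Drop parameters such as "; charset=utf-8" from the Content-Type value.
    mime = content_type.split(";", 1)[0].strip().lower()

    preferred = PREFERRED_EXTENSIONS.get(mime) or mimetypes.guess_extension(mime)
    if preferred is None:
        # Unknown MIME type: leave the filename untouched.
        return filename

    # Every extension that is acceptable for this MIME type.
    accepted = set(mimetypes.guess_all_extensions(mime)) | {preferred}

    ext = os.path.splitext(filename)[1].lower()
    if ext in accepted:
        return filename
    return filename + preferred


print(filename_for_mime_type("doc-CC--BY%204.0-green", "image/png"))
```

With the file name from this report and image/png, the sketch yields the corrected filename given above, while a name that already ends in .png is returned unchanged.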

Environment Information

Platform:              win32; (Windows-11-10.0.22631-SP0)
Python version:        3.13.1 (tags/v3.13.1:0671451, Dec  3 2024, 19:06:28) [MSC v.1942 64 bit (AMD64)])
Python implementation: CPython
Sphinx version:        8.1.3
Docutils version:      0.21.2
Jinja2 version:        3.1.4
Pygments version:      2.18.0

Sphinx extensions

No extensions needed.

Additional context

LaTeX environment: MikTeX (all updated)
LaTeX processor: xelatex

This error occurs when Sphinx generates the LaTeX code; the LaTeX system itself has no influence on it.

@jfbu
Contributor

jfbu commented Dec 22, 2024

> 3. I'm not sure about the Sphinx specific includegraphics macro, but usually, LaTeX should have a look at the file extension (file type) to decide for further file processing.

Just to mention that LaTeX does not require an explicit file extension: when building the PDF from the LaTeX source file, it tries various extensions until a file is found. However, in your MWE the .0-green is indeed interpreted as an explicit filename extension.

Note that, generally speaking, it is to be expected that builders such as LaTeX cannot easily handle sources which contain HTML-specific markup and conventions. A simpler example drops ?longCache=true&style=flat-square&logo=CreativeCommons&logoColor=fff.

With

.. image:: https://img.shields.io/badge/doc-CC--BY%204.0-green

input, the HTML will look like this

<img alt="https://img.shields.io/badge/doc-CC--BY%204.0-green" src="https://img.shields.io/badge/doc-CC--BY%204.0-green" />

so 0-green looks like an extension but web browsers know how to fetch correct filenames.

Trying make latex raises a warning

/path/to/sphinxtests/13190_image_filenames/index.rst:14: WARNING: a suitable image for latex builder not found: ['image/svg+xml'] (https://img.shields.io/badge/doc-CC--BY%204.0-green)

and creates LaTeX markup like this

\noindent\sphinxincludegraphics{{/path/to/sphinxtests/13190_image_filenames/_build/doctrees/images/c00128cef661b557f379b5ecea39df4b8203ab34/doc-CC--BY%204}.0-green}

which does interpret 0-green as file extension.

The warning is misleading because the remote image file was indeed fetched and stored at the indicated location. The warning seems to say that only an svg extension was looked for; it appears the existing https://img.shields.io/badge/doc-CC--BY%204.0-green.png was not tried, or else it would have been found. I lack the time to investigate, but perhaps there is a bug here in that https://img.shields.io/badge/doc-CC--BY%204.png was looked for (considering .png as an alternative extension to .0-green).

@Paebbels
Author

Your judgement isn't fully correct.

  1. Yes, LaTeX checks multiple extensions, to find the most suitable one, but it needs to have an extension. Yes, it doesn't need the extension in the LaTeX code, but it needs extensions on disk.
  2. The file is downloaded by Sphinx and stored on disk. LaTeX doesn't read the file (live) from the shields.io server. Thus, it's the responsibility of Sphinx to correctly download and store resources it caches on disk.
    Download path: _build/latex/doc-CC--BY%204.0-green

I've added the generated LaTeX code to the bug description.

As an explanation for shields.io, they use 2 different subdomains:

  • img.shields.io => SVG
  • raster.shields.io => PNG
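
Given these two subdomains, one conceivable workaround for builders that cannot use SVG is to request the PNG variant by swapping the host. A small sketch, where the helper name to_raster_shield is hypothetical and it is assumed that raster.shields.io accepts the same path and query as img.shields.io:

```python
from urllib.parse import urlsplit, urlunsplit


def to_raster_shield(url):
    """Rewrite an img.shields.io URL to its raster.shields.io (PNG) counterpart."""
    parts = urlsplit(url)
    if parts.netloc != "img.shields.io":
        # Not an SVG shield URL: return it unchanged.
        return url
    # Path and query are kept as is; only the host changes.
    return urlunsplit(parts._replace(netloc="raster.shields.io"))


print(to_raster_shield(
    "https://img.shields.io/badge/doc-CC--BY%204.0-green?style=flat-square"
))
```

This does not address the file naming problem itself; the downloaded file would still lack a .png extension.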

@jfbu
Contributor

jfbu commented Dec 22, 2024

Please avoid saying that comments are "not fully correct" when you are referring to your own interpretation of what I wrote, if you want to bear a by now small chance that I devote some of my unpaid time to your problems with Sphinx.

It is not latex which does the act of including an image file into a produced PDF but the pdftex engine itself of course which is producing that PDF file. The binary pdflatex is only a slim wrapper of pdftex telling it to use the so-called latex format file which is a (big) pre-digested set of TeX macros, and it is not what we mean when we refer to "latex". So when I say that latex needs no explicit extension I am 100% fully correct as the context implies I am not referring to the actual files but only to the \includegraphics usage (which is a LaTeX macro whose functioning relies on some so-called "drivers" in the TeX distributions which are specific files translating into engine primitives).

Besides, if I try to execute open -a Preview doc-CC--BY%204.0-green on the macOS command line, it simply does not work. And I can't open the file using the GUI either (at least that of Preview.app). So, generally speaking, image files do need a correct extension in their filenames when we discuss the general matter of image files and the binaries able to open them and display them on the screen.

@Paebbels
Author

> [...] if you want to bear a by now small chance that I devote some of my unpaid time to your problems with Sphinx.

Sorry, but I don't think you're the only developer spending unpaid time.


So please, let's come back to the problem, which is not saving downloaded files according to the MIME type. The MIME type is present in the HTTP response to the Python code within the download routine.
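
As an illustration of that point, the MIME type can be read from the response headers without looking at the URL at all. The sketch below takes any mapping of header names to values; the helper name mime_type_from_headers is hypothetical, and the example header value is an assumption rather than a captured shields.io response:

```python
from email.message import Message


def mime_type_from_headers(headers):
    """Return the bare MIME type from a Content-Type header, or None if absent."""
    raw = headers.get("Content-Type")
    if raw is None:
        return None
    # Let the standard library parse the header and strip parameters
    # such as "charset=utf-8".
    msg = Message()
    msg["Content-Type"] = raw
    return msg.get_content_type()


# Assumed example header value, not taken from a real response.
print(mime_type_from_headers({"Content-Type": "image/svg+xml;charset=utf-8"}))
```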

The presented problem isn't related to LaTeX.

@jfbu
Contributor

jfbu commented Dec 24, 2024

> The presented problem isn't related to LaTeX.

Very well. I will thus leave to others the task of reviewing PRs solving this.
