Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

docx: use proper DPI for fallback PNG #9288

Open
wants to merge 8 commits into
base: main
Choose a base branch
from
Open

Conversation

edwintorok
Copy link
Contributor

Pandoc calls rsvg-convert by specifying only a DPI. But that causes rsvg-convert to try to guess a pixel size as documented in its manpage.

Quite often that will result in a very low dpi (much lower than 72).
That is actually a useful diagnostic tool to see when the fallback image is used instead of SVG, but there should be a way to override this from the command-line.

Here is how a document using the SVG_logo from pandoc's test/command looks like.

test.md:

![Caption test](images/SVG_logo){fig-alt="Alt test" width="5in"}

Convert with:
/usr/bin/pandoc --default-image-extension=svg img.md -t docx -o img.docx --trace

This is how it looks like in the free version of Office.com:
office

LibreOffice 7.6.4.1:
lo

Google Docs:
gdocs

The PNG produced has only 100x100 pixels (matching the SVG's viewbox):

$ unzip -o img.docx
$ file word/media/rId23.png
word/media/rId23.png: PNG image data, 100 x 100, 8-bit/color RGBA, non-interlaced

And the --dpi flag of pandoc has no effect, you can specify a DPI of 1000 and it'll still produce the exact same image.

Solution

Pandoc needs to tell rsvg-convert the desired size in the document, at least when it is known in inches or pixels.
Add the image attributes to the mediabag to allow access to this information in the fallback conversion code.

Currently this is only done when something like a markdown source is used, I haven't updated all the other readers to extract the image attributes.

It'd be good if it also did this when the image is specified as a percentage of pagesize, but I couldn't figure out how to do the asks pagewidth inside PandocMonad.

And now it looks OK everywhere:
Google Docs:
gdocsok
LibreOffice:
logood
Office:
officeok

Workarounds:

  • Convert SVG to .EMF:
$ inkscape images/SVG_logo.svg --export-filename images/SVG_logo.emf
$ pandoc --default-image-extension=emf img.md -t docx -o img.docx --trace

That will look better, but still have jagged edges everywhere except Office365.
The advantage is that this format doesn't need any PNG fallbacks, and remains an actual vector format.
Still out of all the current options, this is currently (with an unpatched pandoc) the best way I could find to embed vector images.

  • Use ODT instead

That will look the best with LibreOffice:
odtlocalok

But Google Docs will convert to some unknown DPI when opened and still look bad
odtjagged

@edwintorok
Copy link
Contributor Author

edwintorok commented Dec 26, 2023

Draft pull request, because I'm not sure how you'd like to address this in the readers of other formats.
And storing all the attributes may be too much, perhaps we should store just the desired size in pt in the mediabag instead.

BTW the ODT may need a fallback image stored too, if not when you open and save in LibreOffice it'll add one, but it'll be a very low resolution one too.

@edwintorok
Copy link
Contributor Author

I think this is ready for review now:

  • it doesn't require changes in the Lua API
  • almost no changes to writers other than Docx
  • properly supports width percentages
  • creates multiple fallback images for different sizes as needed
  • test added for SVG->PNG fallback generation

One disadvantage that I can see:

  • requires small change to PandocMonad API

I'd like to hear your thoughts on whether this is going in the right direction

@edwintorok edwintorok marked this pull request as ready for review December 27, 2023 13:07
@jgm
Copy link
Owner

jgm commented Dec 27, 2023

Two quick comments:

  • pageWidth as a Maybe Int -- what are the units here? That wasn't clear to me.
  • I'm not persuaded we need a change in the PandocMonad API for this. The change is added runConversion (which is just a thin wrapper around pipeProcess, and actually not limited to conversions, but quite a general function for running any program). runConversion is only used once in this PR, and we could just use pipeProcess directly there to do the same thing, as far as I can see.

I'd like to see a more minimal patch without the PandocMonad change, or a stronger rationale for adding runConversion.

@edwintorok
Copy link
Contributor Author

edwintorok commented Dec 27, 2023

Thanks, I'll add some comments about pagewidth, I think the units are points.

Another approach that I considered (that avoids modifying PandocMonad) is to add a getPageWidth to Docx.hs, although this will perform some operations twice (parsing the reference.docx to figure out the size). The code to do that parsing can be factored out into a separate top-level function but it'll get called twice.
I cannot call pipeProcess directly because that is in the IO monad, and the Docx writer runs in the Pandoc monad.

@jgm
Copy link
Owner

jgm commented Dec 27, 2023

Oh, I see now; you changed the type of svgToPng so it works with PandocMonad instead of MonadIO.

@jgm
Copy link
Owner

jgm commented Dec 27, 2023

What troubles me is that runConversion is just very close to runAnyProgram. For example, you could do runConversion("rm","anyFile.txt",mempty).

If we were okay with that, it would be cleaner, I think, just to require PandocMonad instances to be instances of MonadIO (so liftIO works); we could define an instance for pure monads that just throws an error. I've considered that at times, but always shied away.

An alternative would be just to add svgToPng to PandocMonad. This wouldn't be a universal escape hatch for running programs.

@edwintorok
Copy link
Contributor Author

I understand the desire the restrict what you can do (to avoid bugs and security issues).

I can make the change to use svgToPng in the pandocmonad, shouldn't take long.

This will be needed to run the conversion inside the PandocMonad,
where we know the desired image size.

The arguments are: (dpi, width, height).
The width and height is optional to more easily convert existing code.

[API change]

Signed-off-by: Edwin Török <[email protected]>
Introduce getOrCreateFallback, and pass the desired size in points to
rsvg-convert.
Otherwise it'll guess the size based on the SVG's viewbox and completely
ignore the DPI argument.

Signed-off-by: Edwin Török <[email protected]>
Just look at --trace output.
Can't use a golden test because the actual .png will be different
depending on rsvg-convert version.

Signed-off-by: Edwin Török <[email protected]>
@edwintorok
Copy link
Contributor Author

edwintorok commented Dec 27, 2023

The updated API changing commit is here: 682f972

I made both width and height optional to keep this commit smaller and the interface more flexible (should you want to revert back to the old behaviour then just pass Nothing, Nothing, the API doesn't have to be changed again).

I don't particularly like that there is a 4-tuple here instead of 4 arguments, but I couldn't figure out how to implement the various instances and lift for 4 arguments (all other APIs here take only 1 argument).

Should I introduce a data record here to hold those 4 arguments instead?

@jgm
Copy link
Owner

jgm commented Dec 27, 2023

With one argument you can do e.g.

  putCommonState = lift . putCommonState

but with four something like this should work:

  svgToPng w x y z = lift $ svgToPng w x y z

@edwintorok
Copy link
Contributor Author

but with four something like this should work:

  svgToPng w x y z = lift $ svgToPng w x y z

Oh that is surprisingly simple, I was too caught up in trying to keep the LHS short.

I pushed a fixup, see whether you like it better or not.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants