docx: use proper DPI for fallback PNG #9288

edwintorok · 2023-12-26T18:11:52Z

Pandoc calls rsvg-convert by specifying only a DPI. But that causes rsvg-convert to try to guess a pixel size as documented in its manpage.

Quite often that will result in a very low dpi (much lower than 72).
That is actually a useful diagnostic tool to see when the fallback image is used instead of SVG, but there should be a way to override this from the command-line.

Here is how a document using the SVG_logo from pandoc's test/command looks like.

test.md:

![Caption test](images/SVG_logo){fig-alt="Alt test" width="5in"}

Convert with:
/usr/bin/pandoc --default-image-extension=svg img.md -t docx -o img.docx --trace

This is how it looks like in the free version of Office.com:

LibreOffice 7.6.4.1:

Google Docs:

The PNG produced has only 100x100 pixels (matching the SVG's viewbox):

$ unzip -o img.docx
$ file word/media/rId23.png
word/media/rId23.png: PNG image data, 100 x 100, 8-bit/color RGBA, non-interlaced

And the --dpi flag of pandoc has no effect, you can specify a DPI of 1000 and it'll still produce the exact same image.

Solution

Pandoc needs to tell rsvg-convert the desired size in the document, at least when it is known in inches or pixels.
Add the image attributes to the mediabag to allow access to this information in the fallback conversion code.

Currently this is only done when something like a markdown source is used, I haven't updated all the other readers to extract the image attributes.

It'd be good if it also did this when the image is specified as a percentage of pagesize, but I couldn't figure out how to do the asks pagewidth inside PandocMonad.

And now it looks OK everywhere:
Google Docs:

LibreOffice:

Office:

Workarounds:

Convert SVG to .EMF:

$ inkscape images/SVG_logo.svg --export-filename images/SVG_logo.emf
$ pandoc --default-image-extension=emf img.md -t docx -o img.docx --trace

That will look better, but still have jagged edges everywhere except Office365.
The advantage is that this format doesn't need any PNG fallbacks, and remains an actual vector format.
Still out of all the current options, this is currently (with an unpatched pandoc) the best way I could find to embed vector images.

Use ODT instead

That will look the best with LibreOffice:

But Google Docs will convert to some unknown DPI when opened and still look bad

edwintorok · 2023-12-26T18:17:47Z

Draft pull request, because I'm not sure how you'd like to address this in the readers of other formats.
And storing all the attributes may be too much, perhaps we should store just the desired size in pt in the mediabag instead.

BTW the ODT may need a fallback image stored too, if not when you open and save in LibreOffice it'll add one, but it'll be a very low resolution one too.

edwintorok · 2023-12-27T13:07:21Z

I think this is ready for review now:

it doesn't require changes in the Lua API
almost no changes to writers other than Docx
properly supports width percentages
creates multiple fallback images for different sizes as needed
test added for SVG->PNG fallback generation

One disadvantage that I can see:

requires small change to PandocMonad API

I'd like to hear your thoughts on whether this is going in the right direction

jgm · 2023-12-27T16:56:08Z

Two quick comments:

pageWidth as a Maybe Int -- what are the units here? That wasn't clear to me.
I'm not persuaded we need a change in the PandocMonad API for this. The change is added runConversion (which is just a thin wrapper around pipeProcess, and actually not limited to conversions, but quite a general function for running any program). runConversion is only used once in this PR, and we could just use pipeProcess directly there to do the same thing, as far as I can see.

I'd like to see a more minimal patch without the PandocMonad change, or a stronger rationale for adding runConversion.

edwintorok · 2023-12-27T18:02:44Z

Thanks, I'll add some comments about pagewidth, I think the units are points.

Another approach that I considered (that avoids modifying PandocMonad) is to add a getPageWidth to Docx.hs, although this will perform some operations twice (parsing the reference.docx to figure out the size). The code to do that parsing can be factored out into a separate top-level function but it'll get called twice.
I cannot call pipeProcess directly because that is in the IO monad, and the Docx writer runs in the Pandoc monad.

jgm · 2023-12-27T18:44:36Z

Oh, I see now; you changed the type of svgToPng so it works with PandocMonad instead of MonadIO.

jgm · 2023-12-27T18:49:50Z

What troubles me is that runConversion is just very close to runAnyProgram. For example, you could do runConversion("rm","anyFile.txt",mempty).

If we were okay with that, it would be cleaner, I think, just to require PandocMonad instances to be instances of MonadIO (so liftIO works); we could define an instance for pure monads that just throws an error. I've considered that at times, but always shied away.

An alternative would be just to add svgToPng to PandocMonad. This wouldn't be a universal escape hatch for running programs.

edwintorok · 2023-12-27T18:53:22Z

I understand the desire the restrict what you can do (to avoid bugs and security issues).

I can make the change to use svgToPng in the pandocmonad, shouldn't take long.

Signed-off-by: Edwin Török <[email protected]>

This will be needed to run the conversion inside the PandocMonad, where we know the desired image size. The arguments are: (dpi, width, height). The width and height is optional to more easily convert existing code. [API change] Signed-off-by: Edwin Török <[email protected]>

Introduce getOrCreateFallback, and pass the desired size in points to rsvg-convert. Otherwise it'll guess the size based on the SVG's viewbox and completely ignore the DPI argument. Signed-off-by: Edwin Török <[email protected]>

Just look at --trace output. Can't use a golden test because the actual .png will be different depending on rsvg-convert version. Signed-off-by: Edwin Török <[email protected]>

Signed-off-by: Edwin Török <[email protected]>

edwintorok · 2023-12-27T19:45:54Z

The updated API changing commit is here: 682f972

I made both width and height optional to keep this commit smaller and the interface more flexible (should you want to revert back to the old behaviour then just pass Nothing, Nothing, the API doesn't have to be changed again).

I don't particularly like that there is a 4-tuple here instead of 4 arguments, but I couldn't figure out how to implement the various instances and lift for 4 arguments (all other APIs here take only 1 argument).

Should I introduce a data record here to hold those 4 arguments instead?

jgm · 2023-12-27T19:59:06Z

With one argument you can do e.g.

  putCommonState = lift . putCommonState

but with four something like this should work:

  svgToPng w x y z = lift $ svgToPng w x y z

edwintorok · 2023-12-27T20:07:42Z

but with four something like this should work:
  svgToPng w x y z = lift $ svgToPng w x y z

Oh that is surprisingly simple, I was too caught up in trying to keep the LHS short.

I pushed a fixup, see whether you like it better or not.

edwintorok force-pushed the svg branch 5 times, most recently from 6da581d to f06bb78 Compare December 27, 2023 13:05

edwintorok marked this pull request as ready for review December 27, 2023 13:07

edwintorok force-pushed the svg branch 3 times, most recently from de4ec4d to 136bb84 Compare December 27, 2023 16:48

edwintorok added 6 commits December 27, 2023 18:54

hlint: fix some warnings

4aef107

log(svgToPng): log calls to rsvg-convert with --trace

0c88562

Signed-off-by: Edwin Török <[email protected]>

fix(docx): use proper DPI when creating fallback images

0f45ea1

Introduce getOrCreateFallback, and pass the desired size in points to rsvg-convert. Otherwise it'll guess the size based on the SVG's viewbox and completely ignore the DPI argument. Signed-off-by: Edwin Török <[email protected]>

test(docx): add PNG fallback generation test for SVG

d56f9bb

Just look at --trace output. Can't use a golden test because the actual .png will be different depending on rsvg-convert version. Signed-off-by: Edwin Török <[email protected]>

fix(docx): honour percentage widths for SVG images

438a451

Signed-off-by: Edwin Török <[email protected]>

edwintorok force-pushed the svg branch from 136bb84 to 438a451 Compare December 27, 2023 19:28

edwintorok added 2 commits December 27, 2023 20:04

fixup! refactor(svgToPng): introduce PandocMonad.svgToPng

39cc916

fixup! fix(docx): use proper DPI when creating fallback images

1e2dbf9

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

docx: use proper DPI for fallback PNG #9288

docx: use proper DPI for fallback PNG #9288

edwintorok commented Dec 26, 2023

edwintorok commented Dec 26, 2023 •

edited

edwintorok commented Dec 27, 2023

jgm commented Dec 27, 2023

edwintorok commented Dec 27, 2023 •

edited

jgm commented Dec 27, 2023

jgm commented Dec 27, 2023

edwintorok commented Dec 27, 2023

edwintorok commented Dec 27, 2023 •

edited

jgm commented Dec 27, 2023

edwintorok commented Dec 27, 2023

docx: use proper DPI for fallback PNG #9288

Are you sure you want to change the base?

docx: use proper DPI for fallback PNG #9288

Conversation

edwintorok commented Dec 26, 2023

Solution

Workarounds:

edwintorok commented Dec 26, 2023 • edited

edwintorok commented Dec 27, 2023

jgm commented Dec 27, 2023

edwintorok commented Dec 27, 2023 • edited

jgm commented Dec 27, 2023

jgm commented Dec 27, 2023

edwintorok commented Dec 27, 2023

edwintorok commented Dec 27, 2023 • edited

jgm commented Dec 27, 2023

edwintorok commented Dec 27, 2023

edwintorok commented Dec 26, 2023 •

edited

edwintorok commented Dec 27, 2023 •

edited

edwintorok commented Dec 27, 2023 •

edited