Soupault's HTML prettifying doesn't preserve whitespace correctly #46
Comments
This is an interesting issue indeed... Intuitively, However, I agree that the current "always put tags on separate lines" approach is a bit heavy-handed and often produces a result that is the opposite of pretty. I'd be happy to work with the maintainer of lambdasoup to make it more flexible, but I suppose we'll have to wait for WHATWG's response regarding whitespace significance to know whether the current behavior should still be allowed or not. Meanwhile, you can disable pretty-printing with
Correction: with I should probably also improve the docs for that section, because right now all those options are lumped together in "Basic configuration", and the commented config sample with them is really huge.
I don't think we have to wait to see what the browser vendors and the spec body do with this issue. A functional HTML tokenizer and parser needs to keep the whitespace intact; this is very clear from the WHATWG spec. pretty_print_html=false definitely solves my issue. I also think it would be a better default.
Is there actually an extra space This may actually be a browser issue. P.S. If you want specific formatting, run a prettifier or a minifier on the code after Soupault generates the site. I was doing this with Zola before I found Soupault and it actually helped me catch a few errors in the framework I was using. I'd love to see asset pipeline or post-processing support in Soupault, though it's not really that difficult to just do those steps manually or in a shell script.
@egrieco Since 4.0.0, you can use the "save" hook to take over the output writing stage. The only shortcoming is that there's no Lua function that would allow you to send a string to an external filter's stdin. However, it's not hard to add; it's just that I haven't had a use case for it yet and no one else has asked me to add it. If an HTML formatter supports modifying a file in-place, it's a non-issue, of course: you can just run it on the page file after writing it. I wonder if I should also add a separate "post-write" hook specifically for these cases, though.
Yeah, I hadn't gotten around to looking into whether an "asset pipeline" could be implemented directly within Soupault. This would be useful to generate several sizes of images and potentially several formats to use in P.S. @dmbaturin Soupault is one of the coolest and most useful pieces of software I've run across in at least a decade. You really saved my students. I've been wondering how I was going to go from basic "intro to web dev" to a static site generator without a lot of needless pain. Almost all of the generators have some major flaw that contributes to severe friction or limitations in what sites can be built. I cannot thank you enough for Soupault. I have plenty more to say, but don't want to pollute this issue. :)
@egrieco Maybe make a separate issue for discussions of post-processing. In fact, I do already have a plugin that handles assets in a non-trivial way: https://github.com/dmbaturin/iproute2-cheatsheet/blob/master/plugins/inline-assets.lua reads asset files and inlines them into the page (CSS and JS as is, images Base64 encoded).
@dmbaturin Soupault just keeps getting better and better. :) I haven't been playing with Soupault for even a full day yet. I'm setting up several sites in it now. Let me get a better handle on what it can actually do so I don't file any spurious issues. In the meantime I sent you an email from my @egx.com address. My profound thanks for building Soupault.
The following markdown document:
is converted by
pandoc -f markdown -t html -fmarkdown-implicit_figures --no-highlight
into:
However, after soupault is done parsing the output, the following HTML is produced:
This introduces another space after the period, which is visible in selections in Firefox and has no visible effect in Chrome. See also whatwg/html#8003
However, regardless of how browsers handle this, I think soupault should allow me to remove the trailing whitespace, and especially should not mangle it by itself. Ideally, an HTML5 tokenizer should produce exactly the same tokens before and after soupault has parsed and serialized the document.
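To make the "same tokens before and after" criterion concrete, here is a minimal sketch of such a check in Python. It is only an illustration, not part of soupault: it uses the standard library's `html.parser`, which is a lenient parser rather than a spec-compliant HTML5 tokenizer, and the names `TokenCollector`, `tokenize`, and `same_tokens` are hypothetical helpers introduced here.

```python
from html.parser import HTMLParser


class TokenCollector(HTMLParser):
    """Collect a flat list of tokens (start tags, end tags, text) from HTML.

    Assumption: html.parser is only an approximation of an HTML5 tokenizer.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        self.tokens.append(("start", tag, tuple(attrs)))

    def handle_endtag(self, tag):
        self.tokens.append(("end", tag))

    def handle_data(self, data):
        # Merge adjacent text chunks so that chunking never affects the result.
        if self.tokens and self.tokens[-1][0] == "text":
            self.tokens[-1] = ("text", self.tokens[-1][1] + data)
        else:
            self.tokens.append(("text", data))


def tokenize(html):
    """Return the token list for an HTML string, whitespace included."""
    parser = TokenCollector()
    parser.feed(html)
    parser.close()
    return parser.tokens


def same_tokens(before, after):
    """True if both documents produce identical tokens, including whitespace."""
    return tokenize(before) == tokenize(after)
```

With a made-up paragraph (not the one from this report), `same_tokens("<p>Hi.</p>", "<p>\nHi.\n</p>")` is false, because the text token gains leading and trailing newlines. That is the kind of difference a whitespace-preserving round trip would have to avoid.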