Replacement for normalizeWhitespace option? #3548
I have inherited a project using cheerio 1.0.0-rc.10 that relies on the normalizeWhitespace option when processing XML documents. It appears more recent versions of cheerio have removed this option. Is there a way to achieve whitespace normalization with more recent versions? The 1.0.0-rc.10 version has a small flaw in whitespace normalization in xmlMode: it normalizes Unicode non-breaking space characters away. However, I can't upgrade without a replacement for this functionality.

The reason we use this is that our XML documents include a lot of content that is essentially HTML, for example paragraphs, lists, and images. Authors spread paragraph text over multiple lines, skip lines, or indent things for readability, just as in HTML. We want our processor to receive this content parsed as if it were HTML, with each run of whitespace normalized to a single space. But the documents themselves are custom XML, not HTML.
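If the core requirement is "collapse each run of whitespace in text to a single space, but leave non-breaking spaces alone", that part is small enough to write by hand. Below is a minimal TypeScript sketch that works on plain strings of text-node content; the name collapseWhitespace is hypothetical and not a cheerio API. It matches only the five ASCII whitespace characters defined by the HTML standard (space, tab, LF, FF, CR). A pattern built on \s would also collapse U+00A0, because JavaScript's \s includes it.

```typescript
// Hypothetical helper, not part of cheerio: collapse every run of the five
// HTML "ASCII whitespace" characters (space, tab, LF, FF, CR) into one space.
// \s is avoided on purpose: in JavaScript it also matches U+00A0 (NO-BREAK SPACE).
const ASCII_WHITESPACE_RUN = /[ \t\n\f\r]+/g;

function collapseWhitespace(text: string): string {
  return text.replace(ASCII_WHITESPACE_RUN, " ");
}

// Text spread over several indented lines, plus one no-break space.
const sample = "First line of a\n    paragraph,\n\n  with\u00A0a no-break space.";
console.log(collapseWhitespace(sample));
```

This keeps a leading or trailing run as a single space instead of trimming it, since whether trimming is safe depends on where the text node sits in the document.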
The reason this feature was removed is that it didn't handle many edge cases in HTML. For example, <pre> tags should not have their whitespace compressed. You probably want to use a library such as html-minifier-terser that properly implements whitespace compression.
OK, thanks. Yes, if we choose to upgrade cheerio, we could add another pass to normalize whitespace before processing. But I will probably just stick with rc.10; I've already worked around the rc.10 bug with NBSP characters. Our tool processes XML files, not HTML, and the normalizeWhitespace setting does exactly what we want (apart from the NBSP glitch), so this change definitely broke something we relied on for XML processing with cheerio.
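On a newer cheerio, that extra pass could run over the parsed tree rather than being a parser option. The TypeScript sketch below assumes a minimal, hypothetical node shape (TreeText, TreeElement) loosely modeled on the type/data/name/children fields of the domhandler nodes cheerio builds. These are not cheerio's real type definitions, and comments, CDATA, and processing instructions are ignored, so the checks would need adapting to the actual tree. Elements named in preserve are skipped entirely; the default of just pre is an assumption, and a custom XML vocabulary would list its own whitespace-sensitive elements there.

```typescript
// Hypothetical, minimal node shapes loosely modeled on domhandler's
// { type, data, name, children } fields. Not cheerio's real types.
type TreeText = { type: "text"; data: string };
type TreeElement = { type: "tag"; name: string; children: TreeNode[] };
type TreeNode = TreeText | TreeElement;

// ASCII whitespace only, so U+00A0 (NO-BREAK SPACE) survives.
const WS_RUN = /[ \t\n\f\r]+/g;

// Collapse whitespace in every text node, in place, except below elements
// whose names are in `preserve` (assumed default: just "pre").
function normalizeTree(
  node: TreeNode,
  preserve: ReadonlySet<string> = new Set(["pre"]),
): void {
  if (node.type === "text") {
    node.data = node.data.replace(WS_RUN, " ");
    return;
  }
  if (preserve.has(node.name)) return; // whitespace is significant here
  for (const child of node.children) normalizeTree(child, preserve);
}

// Usage: a custom XML section with a prose paragraph and a preformatted block.
const doc: TreeElement = {
  type: "tag",
  name: "section",
  children: [
    { type: "tag", name: "para", children: [{ type: "text", data: "Spread\n   over\n\nlines" }] },
    { type: "tag", name: "pre", children: [{ type: "text", data: "keep\n   this" }] },
  ],
};
normalizeTree(doc);
```

Because it rewrites text nodes in place, a pass like this would run once after loading the document and before the rest of the processing. For an XML vocabulary, the skip test could instead look for xml:space="preserve" on an ancestor, which is the XML 1.0 mechanism for marking whitespace as significant.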