-
Hi @headius!

Once upon a time Sanitize did use Nokogiri alone to parse HTML. But while Nokogiri can parse HTML sufficiently for some use cases, it's limited (at least in MRI) to the parsing capabilities of libxml, which is fundamentally an XML parser, not an HTML parser. With the rise in popularity of HTML5 (now the living HTML standard), Nokogiri's non-standard HTML parsing behavior isn't sufficient to meet Sanitize's needs.

Sanitize needs a fast, standards-compliant HTML parser to ensure that it sees and handles HTML exactly the same way as the browsers that will end up consuming the HTML Sanitize generates. Apart from compatibility, this is also a security concern: if malicious HTML could be crafted to exploit a quirk in Sanitize's HTML parser and bypass Sanitize's transforms while still being parsed correctly by a standards-compliant browser, then Sanitize could be tricked into generating unsafe HTML.

When I made the decision to use Gumbo (via nokogumbo) as Sanitize's HTML parser, it was the only available standards-compliant HTML parser for Ruby. As far as I know it still is, although I haven't surveyed the territory lately to see if something new has appeared. I'd be open to considering alternative parsers if they exist, but compliance with the HTML parsing spec is a hard requirement.
-
It seems like the jsoup library for Java would be a good analog, and it could be called directly from Ruby when running on JRuby. This may be more appropriate as a backend for nokogumbo, but I wanted to point it out here since this bug is still open.
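As a rough illustration of the idea, here is a minimal sketch of per-engine backend selection. The helper names `html5_backend` and `parse_html5` are hypothetical and are not part of Sanitize or nokogumbo. The jsoup branch assumes JRuby with the jsoup jar already on the classpath, and the other branch assumes the nokogumbo gem is installed. Whether jsoup meets the spec-compliance requirement would still need to be verified.

```ruby
# Hypothetical helper: choose an HTML5 parser backend from the Ruby engine name.
# RUBY_ENGINE is "jruby" on JRuby and "ruby" on MRI.
def html5_backend(engine = RUBY_ENGINE)
  engine == "jruby" ? :jsoup : :gumbo
end

# Hypothetical helper: parse a full HTML document with the selected backend.
# Each dependency is loaded lazily, so only the chosen branch needs to be installed.
def parse_html5(html, engine = RUBY_ENGINE)
  case html5_backend(engine)
  when :jsoup
    # Assumption: running on JRuby with the jsoup jar on the classpath.
    require "java"
    Java::OrgJsoup::Jsoup.parse(html) # returns an org.jsoup.nodes.Document
  when :gumbo
    # Assumption: the nokogumbo gem (C extension wrapping Gumbo) is installed.
    require "nokogumbo"
    Nokogiri::HTML5(html)             # returns a Nokogiri document
  end
end
```

The two branches return different document types, so a real backend would also need an adapter that exposes a common node API to the caller.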
-
Sanitize currently depends on the "nokogumbo" library. Unfortunately this library is a tiny C extension wrapping the Gumbo library, and so sanitize can't be used on JRuby right now.
The nokogumbo ext could easily be FFI-based, so I will file an issue for that. But perhaps there's another library (or nokogiri itself) that can do what you need?