Aws::Xml::Parser::ParsingError: xmlParseCharRef: invalid xmlChar value 8 #3081
Comments
Thanks for opening an issue. I will investigate. Are you able to get it working purely using the SAX parser in nokogiri? If not then I think this may be a bug report for that project. I saw your note about it parsing just fine but that appears to be a different parser (not using SAX)? |
Hmm, you're right. I just tried to recreate it using SAX:
require "nokogiri"
class MyDocument < Nokogiri::XML::SAX::Document
def start_element(name, attrs = [])
puts "Start element: #{name}"
end
def end_element(name)
puts "End element: #{name}"
end
end
good_xml_string = <<-XML
<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n
<SomeKeys>
<A>AAA</A>
<B>BBB</B>
<C>CCC</C>
</SomeKeys>
XML
Nokogiri::XML::SAX::Parser.new(MyDocument.new).parse(good_xml_string)
# Start element: SomeKeys
# Start element: A
# End element: A
# Start element: B
# End element: B
# Start element: C
# End element: C
# End element: SomeKeys
bad_xml_string = <<-XML
<?xml version=\"1.0\" encoding=\"UTF-8\" standalone=\"no\"?>\n
<SomeKeys>
<A>AAA</A>
<B>BB\bB</B>
<C>CCC</C>
</SomeKeys>
XML
Nokogiri::XML::SAX::Parser.new(MyDocument.new).parse(bad_xml_string)
# Start element: SomeKeys
# Start element: A
# End element: A
# Start element: B
And the parser definitely fails after hitting that \b character. |
Reported here: sparklemotion/nokogiri#3299 |
As I mentioned in that bug report, I think the XML is simply malformed as backspace characters aren't allowed by the standard and libxml2 stopped parsing the input and reported an error when it encountered one. |
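To make the point about the standard concrete: XML 1.0 permits only tab, line feed, and carriage return below U+0020, so U+0008 (backspace) falls outside the allowed `Char` range. Below is a minimal sketch in plain Ruby that lists the offending code points in a string. The constant `XML10_ILLEGAL` and the method `illegal_xml_codepoints` are hypothetical names for illustration, not part of Nokogiri, libxml2, or the SDK.

```ruby
# Negation of the XML 1.0 `Char` production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
XML10_ILLEGAL = /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

# Hypothetical helper: returns the code points in `str` that XML 1.0 forbids.
def illegal_xml_codepoints(str)
  str.encode(Encoding::UTF_8).scan(XML10_ILLEGAL).map(&:ord)
end

p illegal_xml_codepoints("BB\bB") # => [8]
p illegal_xml_codepoints("BBB")   # => []
```

The `8` reported for the first string is the same value that appears in the `invalid xmlChar value 8` error message.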
Is the \b character critical in your application? If it's causing issues in parsing, and since s3 is an xml service, perhaps avoid it as a key? |
I commented in the Nokogiri issue, but if y'all want to put the SAX parser into recovery mode (or give callers the option to do so), it's possible to set recovery on the parser context:
p = Nokogiri::XML::SAX::Parser.new(MyDocument.new)
p.parse(bad_xml_string) do |context|
  context.recovery = true
end
In this case, the parser will continue after calling the error callback. By default, however, parse errors are treated as fatal errors. |
I tried adding recovery and I'm not seeing that work. I started a PR here #3082 |
@mullermp Just to clarify: the Looking at the test failures it seems like this gem is likely raising an exception when the callback is invoked. Up to you what you want to do there, really. |
Ah. I see. You are correct. I'm actually not sure I should change this behavior - it doesn't seem consistent with the 5 other parsers. It seems like this is just "bad data" to begin with. Oga, Ox and Rexml seem to handle this case though. LibXML doesn't have a recovery option that I can see. |
After discussing with the team, I don't think it makes sense to handle this. It's inconsistent across all parsers and also with JRuby's nokogiri. I think it's best (and safest) not to handle this, and keys should not include this character. |
This issue is now closed. Comments on closed issues are hard for our team to see. |
Thanks for all the responses, and for looking into it. Yeah, I agree we should probably be scrubbing illegal XML characters. It's interesting that ActiveStorage also doesn't scrub any control characters except for https://api.rubyonrails.org/classes/ActiveStorage/Filename.html#method-i-sanitized
ActiveStorage::Filename.new("\b").sanitized
=> "\b" |
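For anyone who does want to scrub keys on the application side before uploading, here is a rough sketch in plain Ruby. It assumes that replacing each character outside the XML 1.0 `Char` range with `-` is acceptable for your keys. `NON_XML10_CHARS` and `scrub_for_xml` are hypothetical names, and this is not something ActiveStorage or the SDK does for you.

```ruby
# Matches anything outside the XML 1.0 `Char` production
# (#x9, #xA, #xD, #x20-#xD7FF, #xE000-#xFFFD, #x10000-#x10FFFF).
NON_XML10_CHARS = /[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\u{10000}-\u{10FFFF}]/

# Hypothetical helper: replaces characters that XML 1.0 cannot carry.
def scrub_for_xml(key, replacement = "-")
  key.gsub(NON_XML10_CHARS, replacement)
end

p scrub_for_xml("BB\bB") # => "BB-B"
p scrub_for_xml("a\tb")  # tab is legal in XML 1.0, so the string is unchanged
```

Note that replacing characters changes the key, so two distinct original names can collapse into the same scrubbed key. Rejecting such keys outright may be the safer choice in some applications.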
Describe the bug
When trying to upload to a key with a \b in it, I get a parser error after upload. Here is the full trace:
I'm using the NokogiriEngine it seems:
Expected Behavior
When I put the same text through Nokogiri, it parses it just fine:
Current Behavior
The following error:
Reproduction Steps
The above is mimicking what this line is doing:
aws-sdk-ruby/gems/aws-sdk-core/lib/aws-sdk-core/xml/parser/nokogiri_engine.rb
Line 15 in ba75fe2
Possible Solution
Not sure
Additional Information/Context
Gem name ('aws-sdk', 'aws-sdk-resources' or service gems like 'aws-sdk-s3') and its version
aws-sdk-core (3.201.3)
Environment details (Version of Ruby, OS environment)
Ruby 3.2