Position of the parse result offset in case of status_bad_start_element #606

lo-asys · 2024-01-23T13:19:48Z

Hi,
according to the docs, the offset field of a parse result points to the last successfully parsed character in the input data. When parsing fails with status_bad_start_element, however, the offset seems to be the parser's last scan position at the point the error was reported.
I'd like to suggest a change here: The offset should point to the position of the opening '<' of the bad tag instead.
The specific use case where this would be helpful is receiving a stream of XML messages over the network, where a single message may be split across multiple network packets like so:

P1: '<a x="y" /><b foo="bar" '
P2: ' baz="blob"></b><c />'

In this case, the receiver wants to store the substring of packet 1 that contains the incomplete element b and prepend it to the content of packet 2 on the next iteration, so that the element can be parsed in full there.
Doing this would be much easier if pugixml reported the offset of the opening '<' here.
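
For illustration, here is a rough sketch of the workaround the receiver has to do today, assuming the reported offset lies somewhere inside the incomplete tag. `find_tag_start`, `split_at_bad_element` and `BufferSplit` are hypothetical helpers and not part of pugixml. The sketch scans backwards from the reported offset to the nearest '<' and splits the buffer there; it does not handle a '<' inside comments or CDATA sections, which is part of why having the parser report the position would be more robust.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical helper (not part of pugixml): index of the nearest '<' at or
// before error_offset, or std::string_view::npos if there is none.
std::size_t find_tag_start(std::string_view buffer, std::size_t error_offset) {
    // Assumption: the reported offset lies within the parsed buffer.
    assert(error_offset <= buffer.size());
    return buffer.rfind('<', error_offset);
}

// The part of the buffer that can be parsed now, and the tail that has to be
// carried over and prepended to the next packet.
struct BufferSplit {
    std::string_view complete;
    std::string_view carry;
};

// Hypothetical helper: split the buffer at the opening '<' of the bad element.
BufferSplit split_at_bad_element(std::string_view buffer, std::size_t error_offset) {
    const std::size_t start = find_tag_start(buffer, error_offset);
    if (start == std::string_view::npos) {
        return {buffer, std::string_view{}};  // no tag start found, nothing to carry
    }
    return {buffer.substr(0, start), buffer.substr(start)};
}
```

In a real receiver, `error_offset` would come from the `offset` member of the `pugi::xml_parse_result` returned for packet 1, and `carry` would be copied into a string and prepended to packet 2 before parsing again.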

I'm not currently aware of other common use cases for the offset value in this error scenario (it's my first project using this library 😉), but if other users might find this helpful too, I'd be glad if you considered it.

Greetings, and thanks for your good work!

lo-asys added the enhancement label Jan 23, 2024
