Bump org.jsoup:jsoup from 1.17.2 to 1.18.2 #1244

Closed
wants to merge 1 commit into from

Conversation

dependabot[bot]
Contributor

dependabot bot commented on behalf of github Nov 27, 2024

Bumps org.jsoup:jsoup from 1.17.2 to 1.18.2.

Release notes

Sourced from org.jsoup:jsoup's releases.

jsoup 1.18.2

Improvements

  • Optimized the throughput and memory use throughout the input read and parse flows, with heap allocations and GC down between -6% and -89%, and throughput improved up to +143% for small inputs. Most input sizes will see throughput increases of ~20%. These performance improvements come through recycling the backing byte[] and char[] arrays used to read and parse the input. 2186
  • Speed optimized html() and Entities.escape() when the input contains UTF characters in a supplementary plane, by around 49%. 2183
  • The form associated elements returned by FormElement.elements() now reflect changes made to the DOM, subsequently to the original parse. 2140
  • In the TreeBuilder, the onNodeInserted() and onNodeClosed() events are now also fired for the outermost / root Document node. This enables source position tracking on the Document node (which was previously unset). And it also enables the node traversor to see the outer Document node. 2182
  • Selected Elements can now be position swapped inline using Elements#set(). 2212
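
The first item above credits the gains to recycling the backing byte[] and char[] arrays. As a rough illustration of that general idea only (this is not jsoup's code; the class name `CharBufferPool` and its parameters are hypothetical), a small pool can hand out a previously used array instead of allocating a new one on every read:

```java
import java.util.ArrayDeque;

/** Hypothetical sketch of array recycling; not taken from jsoup. */
final class CharBufferPool {
    private final ArrayDeque<char[]> free = new ArrayDeque<>();
    private final int bufferSize;
    private final int maxPooled;

    CharBufferPool(int bufferSize, int maxPooled) {
        this.bufferSize = bufferSize;
        this.maxPooled = maxPooled;
    }

    /** Returns a pooled buffer if one is available, otherwise allocates a new one. */
    synchronized char[] borrow() {
        char[] buf = free.pollFirst();
        return buf != null ? buf : new char[bufferSize];
    }

    /** Gives a buffer back; it is kept only if it has the pool's size and the pool has room. */
    synchronized void release(char[] buf) {
        if (buf.length == bufferSize && free.size() < maxPooled) {
            free.addFirst(buf);
        }
    }
}
```

A caller would borrow a buffer before reading, fill and consume it, then release it in a finally block. Contents are not cleared on release, so a caller must track how many chars it actually wrote.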

Bug Fixes

  • Element.cssSelector() would fail if the element's class contained a * character. 2169
  • When tracking source ranges, a text node following an invalid self-closing element may be left untracked. 2175
  • When a document has no doctype, or a doctype not named html, it should be parsed in Quirks Mode. 2197
  • With a selector like div:has(span + a), the has() component was not working correctly, as the inner combining query caused the evaluator to match those against the outer's siblings, not children. 2187
  • A selector query that included multiple :has() components in a nested :has() might incorrectly execute. 2131
  • When cookie names in a response are duplicated, the simple view of cookies available via Connection.Response#cookies() will provide the last one set. Generally it is better to use the Jsoup.newSession method to maintain a cookie jar, as that applies appropriate path selection on cookies when making requests. 1831
  • When parsing named HTML entities, base entities should resolve if they are a prefix of the input token (and not in an attribute). 2207
  • Fixed incorrect tracking of source ranges for attributes merged from late-occurring elements that were implicitly created (html or body). 2204
  • Follow the current HTML specification in the tokenizer to allow < as part of a tag name, instead of emitting it as a character node. 2230
  • Similarly, allow a < as the start of an attribute name, vs creating a new element. The previous behavior was intended to parse closer to what we anticipated the author's intent to be, but that does not align to the spec or to how browsers behave. 1483
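
The entity item above (2207) says base entities should resolve when they are a prefix of the input token. A minimal sketch of that longest-prefix lookup, assuming a tiny hand-picked entity table (the class `EntityPrefixSketch`, its table, and the `resolve` method are hypothetical and are not jsoup's tokenizer):

```java
import java.util.Map;

/** Hypothetical sketch of prefix matching for named entities; not jsoup's implementation. */
final class EntityPrefixSketch {
    // Small illustrative subset of entities that may match without a trailing ';'.
    private static final Map<String, String> BASE = Map.of(
            "amp", "&",
            "lt", "<",
            "gt", ">",
            "quot", "\"",
            "not", "\u00AC",
            "copy", "\u00A9");

    /**
     * Resolves the text that followed '&' by trying the longest prefix first.
     * Returns the replacement plus any unmatched remainder, or null if no prefix matches.
     */
    static String resolve(String name) {
        for (int end = name.length(); end > 0; end--) {
            String replacement = BASE.get(name.substring(0, end));
            if (replacement != null) {
                return replacement + name.substring(end);
            }
        }
        return null;
    }
}
```

With this table, `resolve("notit")` yields `¬it`, because `not` is the longest known prefix. The release note excludes attribute values from this behavior, which the sketch does not model.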

jsoup-1.18.1

https://jsoup.org/news/release-1.18.1

Improvements

  • Stream Parser: A StreamParser provides a progressive parse of its input. As each Element is completed, it is emitted via a Stream or Iterator interface. Elements returned will be complete with all their children, and an (empty) next sibling, if applicable. Elements (or their children) may be removed from the DOM during the parse, e.g. to conserve memory, providing a mechanism to parse an input document that would otherwise be too large to fit into memory, yet still providing a DOM interface to the document and its elements. Additionally, the parser provides a selectFirst(String query) / selectNext(String query), which will run the parser until a hit is found, at which point the parse is suspended. It can be resumed via another select() call, or via the stream() or iterator() methods. 2096
  • Download Progress: added a Response Progress event interface, which reports progress as URLs are downloaded (and parsed). Supported on both a session and a single connection level. 2164, 656
  • Added Path accepting parse methods: Jsoup.parse(Path), Jsoup.parse(path, charsetName, baseUri, parser), etc. 2055
  • Updated the button tag configuration to include a space between multiple button elements in the Element.text() method. 2105
  • Added support for the ns|* all elements in namespace Selector. 1811
  • When normalising attribute names during serialization, invalid characters are now replaced with _, vs being stripped. This should make the process clearer, and generally prevent an invalid attribute name being coerced unexpectedly. 2143

Changes

  • Removed previously deprecated internal classes and methods. 2094

... (truncated)

Changelog

Sourced from org.jsoup:jsoup's changelog.

1.18.2 (2024-Nov-27)


1.18.1 (2024-Jul-10)

Improvements

  • Stream Parser: A StreamParser provides a progressive parse of its input. As each Element is completed, it is

... (truncated)

Commits
  • 71063c3 [maven-release-plugin] prepare release jsoup-1.18.2
  • 1a91aac Use the incoming node's parent if outgoing has already been removed
  • df404cf test case for Issue #2212
  • 28db617 Test for #1938
  • d27370a Follow spec so < can start an attribute name
  • 0ef4b70 Allow < in tag name state
  • 51909b1 Tweak HTML javadoc >
  • 91b5a56 Copy attribute source range when merging attributes
  • 5ee376b Entity decoding supports prefix matches
  • 708fc1f Make And and Or constructors public
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Bumps [org.jsoup:jsoup](https://github.com/jhy/jsoup) from 1.17.2 to 1.18.2.
- [Release notes](https://github.com/jhy/jsoup/releases)
- [Changelog](https://github.com/jhy/jsoup/blob/master/CHANGES.md)
- [Commits](jhy/jsoup@jsoup-1.17.2...jsoup-1.18.2)

---
updated-dependencies:
- dependency-name: org.jsoup:jsoup
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <[email protected]>
dependabot bot added the dependencies (Pull requests that update a dependency file) and java (Pull requests that update Java code) labels Nov 27, 2024
Contributor

TobiGr commented Nov 27, 2024

This still breaks a test (see #1189). Somebody needs to investigate whether we just need to update the test, which I think is quite likely. The HTML escaping was extended in a previous version.

YoutubeDescriptionHelperTest > testNoRuns() FAILED
    org.opentest4j.AssertionFailedError: expected: <abc *a* _c_ &lt;br&gt; <br> &lt;a href="#"&gt;test&lt;/a&gt; &nbsp;&amp;amp;> but was: <abc *a* _c_ &lt;br&gt; <br> &lt;a href=&quot;#&quot;&gt;test&lt;/a&gt; &nbsp;&amp;amp;>
        at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151)
        at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132)
        at app//org.junit.jupiter.api.AssertEquals.failNotEqual(AssertEquals.java:197)
        at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:182)
        at app//org.junit.jupiter.api.AssertEquals.assertEquals(AssertEquals.java:177)
        at app//org.junit.jupiter.api.Assertions.assertEquals(Assertions.java:1145)
        at app//org.schabi.newpipe.extractor.services.youtube.YoutubeDescriptionHelperTest.assertRunsToHtml(YoutubeDescriptionHelperTest.java:20)
        at app//org.schabi.newpipe.extractor.services.youtube.YoutubeDescriptionHelperTest.testNoRuns(YoutubeDescriptionHelperTest.java:36)
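
The expected and actual strings in the failure above differ only in how the double quotes around # are written: literal `"` in the expected value, `&quot;` in the actual one. A minimal plain-Java sketch of an escaper with and without quote escaping reproduces that difference. It is not jsoup's implementation and does not show which jsoup call the extractor uses; `EscapeSketch` and the `escapeQuotes` flag are hypothetical names for illustration.

```java
/** Hypothetical sketch of HTML escaping with optional quote escaping; not jsoup's code. */
final class EscapeSketch {
    static String escape(String input, boolean escapeQuotes) {
        StringBuilder out = new StringBuilder(input.length() + 16);
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"':
                    // Quotes only need escaping inside attribute values,
                    // but escaping them everywhere is also valid HTML.
                    out.append(escapeQuotes ? "&quot;" : "\"");
                    break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "<a href=\"#\">test</a>";
        System.out.println(escape(raw, false)); // &lt;a href="#"&gt;test&lt;/a&gt;
        System.out.println(escape(raw, true));  // &lt;a href=&quot;#&quot;&gt;test&lt;/a&gt;
    }
}
```

Both outputs render identically in a browser, which supports the suggestion that updating the test's expected string may be enough.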

Contributor Author

dependabot bot commented on behalf of github Dec 2, 2024

Superseded by #1245.

dependabot bot closed this Dec 2, 2024
dependabot bot deleted the dependabot/gradle/org.jsoup-jsoup-1.18.2 branch December 2, 2024 10:45