
fix: Add Support for lxml >= 5.2.0 #217

Merged


michael-genson (Contributor) commented Apr 1, 2024

lxml 5.1.0 removed `_ElementStringResult`, which we use during DOM traversal. This PR attempts to import `_ElementStringResult` as usual, but falls back to defining it explicitly if the import fails. This fixes #215.

I chose to attempt to import first to preserve existing behavior if an older version of lxml is installed.

Additionally, we use the lxml HTML cleaner, which now lives in its own separate package. Because of this, I've added the optional `html_clean` extra (`lxml[html_clean]`) to requirements.txt. As noted below, this raises a warning when using lxml < 5.2.0, but otherwise doesn't affect older versions.
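Since lxml 5.2.0 the HTML cleaner ships as the standalone `lxml_html_clean` package, which the `html_clean` extra pulls in. A hedged sketch of a runtime lookup that works with either layout; the function name and the lookup order are assumptions for illustration, not part of extruct:

```python
import importlib


def find_html_cleaner():
    """Return a Cleaner class from whichever module provides it, else None.

    Hypothetical helper: tries the standalone package first, then the
    legacy lxml.html.clean module (which on lxml >= 5.2.0 raises
    ImportError unless lxml_html_clean is installed).
    """
    for module_name in ("lxml_html_clean", "lxml.html.clean"):
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue  # module missing; try the next location
        return getattr(module, "Cleaner", None)
    return None
```

With the extra installed, the first lookup succeeds; with lxml < 5.2.0, pip only warns about the unknown extra and the legacy module still provides `Cleaner`.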

michael-genson changed the title from "fix: Add Support for lxml >= 5.1" to "fix: Add Support for lxml >= 5.2.0" Apr 1, 2024
Review comments on extruct/xmldom.py (outdated, resolved)
Review comments on requirements.txt (outdated, resolved)
michael-genson (Contributor, Author)

Looks like the test failures are related to mf2py, not lxml. I'm looking into whether we should update the tests (probably) or pin mf2py < 2 (probably not).

michael-genson (Contributor, Author) commented Apr 4, 2024

mf2py released v2 a few months ago with a single breaking change: it now includes image `alt` text in its microformats output by default (see the changelog in the release). The reasoning is in this mf2py issue, but TL;DR: image alt text has been part of the microformats2 parsing spec for several years, and omitting it was deprecated behavior that has now been removed entirely.
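The difference shows up in parsed `u-photo` properties: under the microformats2 parsing spec, an `img` with an `alt` attribute yields an object with `value` and `alt` keys instead of a bare URL string. A hypothetical test helper (not from this PR) that accepts both shapes; the example dictionaries are written by hand, not captured from mf2py, and the 1.x shape assumes its default settings:

```python
from typing import Dict, List, Union

# A u-photo value is either a bare URL (assumed mf2py 1.x default) or a
# {"value": ..., "alt": ...} dict (mf2py 2.x default when alt is present).
PhotoValue = Union[str, Dict[str, str]]


def photo_urls(values: List[PhotoValue]) -> List[str]:
    """Hypothetical helper: extract URLs from u-photo values of either shape."""
    return [v["value"] if isinstance(v, dict) else v for v in values]


# Hand-written examples of the two output shapes.
old_style = ["https://example.com/cat.jpg"]
new_style = [{"value": "https://example.com/cat.jpg", "alt": "A cat"}]
```

Normalizing like this lets a test assert on URLs regardless of the mf2py version, at the cost of not checking the new `alt` field.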

To accommodate this, I've updated the tests, even though the change isn't directly related to the lxml update.

wRAR closed this Apr 5, 2024
wRAR reopened this Apr 5, 2024
Review comments on extruct/xmldom.py (outdated, resolved)
Co-authored-by: James Addison <[email protected]>

Successfully merging this pull request may close these issues.

Package breaking due to change in lxml
4 participants