Add a way to bypass the options[:tags] whitelist #95

Merged: 2 commits, Nov 11, 2023

tuzz (Contributor) commented Nov 9, 2023

Readability is primarily concerned with extracting text, not images. We use Readability to extract images by setting `tags: %w[img]`, which preserves `<img>` tags in the output HTML. However, this doesn’t work when the image is nested under a non-whitelisted node, e.g.

<figure>
  <img src="…" />
</figure>

I think we want to whitelist all nodes for extraction, because our code already strips out the nodes it doesn’t care about. Therefore, add a mechanism to bypass the node whitelist by setting `tags: %w[*]`, i.e. a wildcard.
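
To illustrate the problem and the proposed wildcard, here is a minimal sketch in plain Ruby. It does not use the library's code: `Node`, `sanitize`, and `tag_names` are hypothetical names, and the rule that a non-whitelisted element collapses into its text content is an assumption made for illustration.

```ruby
# Hypothetical minimal node type; the real library works on parsed HTML nodes.
Node = Struct.new(:name, :children, :text) do
  # Concatenated text of this node and all of its descendants.
  def inner_text
    text.to_s + children.map(&:inner_text).join
  end
end

# Returns a sanitized copy of the tree.
# Assumption: a non-whitelisted element is replaced by a text node holding
# its inner text, so any elements nested inside it are lost.
# A "*" entry in tags bypasses the whitelist entirely.
def sanitize(node, tags)
  allow_all = tags.include?("*")
  if node.name == "#text" || allow_all || tags.include?(node.name)
    Node.new(node.name, node.children.map { |child| sanitize(child, tags) }, node.text)
  else
    Node.new("#text", [], node.inner_text)
  end
end

# Flattens the tree into a list of node names, in document order.
def tag_names(node)
  [node.name] + node.children.flat_map { |child| tag_names(child) }
end

# <div><figure><img /></figure></div>
figure = Node.new("figure", [Node.new("img", [], nil)], nil)
doc    = Node.new("div", [figure], nil)

p tag_names(sanitize(doc, %w[div img])) # the <img> under <figure> is gone
p tag_names(sanitize(doc, %w[*]))       # every node is kept
```

With `%w[div img]` the `<figure>` is not whitelisted, so it and the `<img>` inside it are dropped even though `img` itself is allowed. With `%w[*]` every node survives and the caller can strip whatever it does not need afterwards.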

tuzz added 2 commits November 9, 2023 18:03
Readability is primarily concerned with extracting text, not images. We
are using readability to extract images by setting `tags: %w[img]` which
preserves <img> tags in the output HTML. However, this won’t work if the
image is nested under a non-whitelisted node, e.g.

```html
<figure>
  <img src="…" />
</figure>
```

I think we basically just want to whitelist all nodes for extraction
because our SplitCleanService already handles stripping of nodes it
doesn’t care about. Therefore, add a mechanism to bypass the node
whitelisting by setting `tags: %w[*]`, i.e. a wildcard.
@cantino cantino merged commit cc1b0d2 into cantino:master Nov 11, 2023
cantino (Owner) commented Nov 11, 2023

Thanks!
