Extra div added (is it expected?) #66

Open
thbar opened this issue Mar 18, 2014 · 4 comments

Comments

thbar commented Mar 18, 2014

First, thanks for your work on readability :-)

A quick bit of feedback (I'm not a heavy user myself): while upgrading an old setup today, I noticed that raw content is now wrapped in two levels of divs:

1.9.3-p484 :003 > Readability::Document.new("My content").content
 => "<div><div><p>My content</p></div></div>" 

whereas previously (with a two-year-old version) it was returned as:

 => "<div><p>My content</p></div>" 

Is this expected? I understand that this specific test case is a bit unrealistic (no tags at all), but I wondered whether similar issues could occur with properly formatted HTML.

Owner

cantino commented Mar 18, 2014

Good question. I wonder which changes caused that. I'm not actively working on Readability these days, but am always willing to vet pull requests from interested contributors.

cqcn1991 commented May 1, 2016

Yes, I'm having the same problem.
Is there anything I can do to fix this?

borama commented May 1, 2016

I am not sure whether this is expected or whether there is anything to fix, but I found the following:

The two <div> tags are added in the get_article method. The method first always wraps the found article in a <div> (here). Then it copies all child tags of the found article and, if the article itself is a tag other than <p> or <div>, changes that tag to <div> (here). Because your article node, i.e. the parent node of the single paragraph in your input HTML, is the <body> tag, it is changed to a <div> tag, effectively resulting in two <div>s in the output.
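
To make the two steps concrete, here is a rough sketch of the behaviour as I understand it. It is a simplified model in plain Ruby and not the gem's actual code: `Node` and `build_article` are hypothetical names made up for illustration, and the real get_article works on Nokogiri nodes and does more than this.

```ruby
# Simplified, hypothetical model of the wrapping described above.
# Node and build_article are illustrative names, not part of the gem.
Node = Struct.new(:name, :children) do
  # Serialize the node and its children as a plain HTML string.
  def to_html
    inner = children.map { |c| c.is_a?(Node) ? c.to_html : c.to_s }.join
    "<#{name}>#{inner}</#{name}>"
  end
end

def build_article(candidate)
  # Step 1: the output always starts as a fresh <div> wrapper.
  output = Node.new("div", [])

  # Step 2: the candidate is copied in; any tag other than <p> or <div>
  # is renamed to <div>.
  copy = Node.new(candidate.name, candidate.children.dup)
  copy.name = "div" unless %w[p div].include?(copy.name)

  output.children << copy
  output
end

# Tagless input gets parsed as <body><p>My content</p></body>,
# so the candidate node here is <body>.
body = Node.new("body", [Node.new("p", ["My content"])])
puts build_article(body).to_html
# prints <div><div><p>My content</p></div></div>
```

In this model the outer <div> comes from step 1 and the inner one from the renamed <body> in step 2, which matches the output reported above.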

Owner

cantino commented May 30, 2016

Thanks @borama, I'm open to a PR with a fix if you're diving into it.
