Readability's title #64
Comments
Hey @tybenz! Interesting idea. Do you want to work on a pull request for that?
Yeah. I'd love to. I don't know enough about the scoring algorithm though. Wondering if you had any ideas on what a good start might be.
No problem. I'd try to write a failing spec, then I'd take a look at
Also, I want to get something straight. Is it true that you only ever score p tags, td tags, and their parents and grandparents? https://github.com/cantino/ruby-readability/blob/master/lib/readability.rb#L270-L271 Am I missing something?
Sorry to necro this issue. Yes, that's right @tybenz, it only scores Today I opened a pull request to allow you to specify other nodes to score, such as
Readability pulls its article title from the `title` tag, right? Well, more often than not, the `title` tag has a whole lot of other information besides just the title of the article. It usually includes the title of the site itself and sometimes a category.

I know the original readability script just grabbed the title, but I'm wondering if this version of the script can be modified to grab the actual title of the article from the markup. It seems as though the scoring system is set up to exclude the header tag that contains the article title.
Example:
In the above example, readability will always grab the content from `.article-content` and not the `<article>` tag itself. What can I do to modify the script to grab the whole article, title and all?
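
To make the `title` problem concrete, here is a minimal sketch of one possible workaround, assuming the `title` text joins the article title, category, and site name with a separator such as `|`, `-`, or `»`. The helper `guess_article_title` and the constant `TITLE_SEPARATORS` are hypothetical names made up for illustration; they are not part of ruby-readability, and this is not how the library picks its title.

```ruby
# Hypothetical helper, NOT part of ruby-readability.
# Assumption: site name / category are joined to the article title
# with " | ", " - " or " » " (whitespace on both sides).
TITLE_SEPARATORS = /\s+(?:\||-|»)\s+/

# Returns the longest separator-delimited segment of the raw <title> text,
# on the (unverified) guess that the article title is the longest part.
def guess_article_title(raw_title)
  text = raw_title.to_s.strip
  parts = text.split(TITLE_SEPARATORS).map(&:strip).reject(&:empty?)
  return text if parts.empty?

  parts.max_by(&:length)
end

puts guess_article_title("How to Parse HTML in Ruby | Tech | Example Blog")
```

This is only a string heuristic and will pick the wrong segment whenever the site name is longer than the article title. Reading the heading inside the article markup, as asked above, would likely be more reliable, but that depends on how the scoring treats header tags.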