Running readability on the HTML from https://100wordstory.org/submit/, I expected more markup to remain than readability leaves intact.
Expected
Observed
In the screenshot above, the following content is stripped out:
the red "Submit" heading:
<h1 class="titles"><a href="https://100wordstory.org/submit/" rel="bookmark" title="SubmitPermanent Link to ">Submit</a></h1>
the red "Submissions are now open through January 9, 2024" and "Submit!" headings and links:
<h2 style="text-align: center;"><a href="https://100wordstory.submittable.com/submit">Submissions are now open through January 9, 2024!</a></h2>
<h2 style="text-align: center;"><a href="https://100wordstory.submittable.com/submit">Submit!</a></h2>
Turning on debug: true doesn't seem to explain why these items are missing:
% readability -d https://100wordstory.org/submit/
/Users/avk/.rvm/gems/ruby-2.7.8@wbm/gems/ruby-readability-0.7.0/bin/readability:31: warning: calling URI.open via Kernel#open is deprecated, call URI.open directly or use URI#open
Removing unlikely candidate - magnific_popup-css
Removing unlikely candidate - nav superfishmenu-100-word-story-menu
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page menu-item-73menu-item-73
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page current-menu-item page_item page-item-6 current_page_item menu-item-72menu-item-72
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page menu-item-83menu-item-83
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page menu-item-189menu-item-189
Removing unlikely candidate - menu-item menu-item-type-post_type menu-item-object-page menu-item-70menu-item-70
Removing unlikely candidate - header
Removing unlikely candidate - comments
Removing unlikely candidate - commentlist clearfix
Removing unlikely candidate - comment even thread-even depth-1 parentcomment-65
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - comment byuser comment-author-100words bypostauthor odd alt depth-2comment-66
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - comment byuser comment-author-100words bypostauthor even thread-odd thread-alt depth-1comment-57
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - comment odd alt thread-even depth-1comment-56
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - comment even thread-odd thread-alt depth-1comment-52
Removing unlikely candidate - comment-author vcard
Removing unlikely candidate - comment-meta commentmetadata
Removing unlikely candidate - sidebar-wrapper
Removing unlikely candidate - sidebar
Removing unlikely candidate - sidebar-box widget_blockblock-3
Removing unlikely candidate - widget_text sidebar-box widget_custom_htmlcustom_html-2
Removing unlikely candidate - sidebar-box widget_texttext-3
Removing unlikely candidate - sidebar-box widget_texttext-4
Removing unlikely candidate - sidebar-box widget_texttext-7
Removing unlikely candidate - sidebar-box widget_linkslinkcat-10
Removing unlikely candidate - footer
Altering div(#pages.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Altering div(#.) to p
Top 5 candidates:
Candidate div#.post-wrapper with score 51.935052531041066
Candidate div#left-div. with score 16.71186440677966
Best candidate div#.post-wrapper with score 51.935052531041066
Conditionally cleaned div#.addtoany_share_save_container addtoany_content addtoany_content_bottom with weight 25 and content score 0 because it has too short a content length without a single image.
Conditionally cleaned div#.a2a_kit a2a_kit_size_24 addtoany_list with weight 0 and content score 0 because it has too short a content length without a single image.
Conditionally cleaned div#.recentposts with weight 25 and content score 0 because it has too short a content length without a single image.
<div><div><p>100 words for your story … no more or no less. Tell a story, pen a slice of your memoir, or try your hand at an essay.</p><p>You get 100 words—exactly 100 words—which is both the pain and the pleasure here. It’s short, you tell yourself. You could write 100 words at a bus stop, on your lunch break, in your sleep. But with 100 words you must tell the whole story in its entirety, so it holds together like a perfect little doll house. (Your title is not part of the 100 words.)</p><p>Please include a short bio (25 words, max!) with your submission. Also, did we say exactly 100 words? We weren’t kidding! We count words according to Microsoft Word’s word-count tally. Also, make friends with your spell-check, or have a friend proofread your story.</p><p>We currently charge a $2 submission fee, the minimum in order to cover the costs of the submission system.</p><p></p><p></p></div></div>
Any ideas on how to broaden or include this content?
I think this is the same code as in the original readability.js, from which this gem was ported. You could parameterize it if you want to make it more flexible.
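To make the parameterization idea concrete, here is a minimal sketch. It assumes the headings are dropped because nearly all of their text sits inside links (a link-density check), which the debug output above does not confirm. The helper names (`visible_text`, `link_density`, `keep_heading?`), the `max_link_density` parameter, and the 0.33 default are all hypothetical and are not part of the gem's API. The sketch uses only the standard library and regexes rather than a real HTML parser, so it only suits simple fragments like the ones quoted in this issue.

```ruby
# Hypothetical, stdlib-only sketch of a parameterized link-density check.
# Not the gem's implementation; names and the 0.33 default are assumptions.

# Strip tags and return the visible text of a simple HTML fragment.
def visible_text(html)
  html.gsub(/<[^>]*>/, "").strip
end

# Fraction of the fragment's visible text that sits inside <a> elements.
def link_density(html)
  total = visible_text(html).length
  return 0.0 if total.zero?

  link_chars = html.scan(%r{<a\b[^>]*>(.*?)</a>}mi)
                   .flatten
                   .sum { |inner| visible_text(inner).length }
  link_chars.to_f / total
end

# Keep a heading only when its link density is at or below the threshold.
# Making the threshold a keyword argument is the "parameterize it" idea.
def keep_heading?(html, max_link_density: 0.33)
  link_density(html) <= max_link_density
end

heading = '<h1 class="titles"><a href="https://100wordstory.org/submit/" rel="bookmark">Submit</a></h1>'

puts link_density(heading)                          # 1.0
puts keep_heading?(heading)                         # false
puts keep_heading?(heading, max_link_density: 1.0)  # true
```

Under these assumptions, a heading that consists entirely of a link has a density of 1.0, so it would only survive if the caller raised the threshold.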
Thanks for your work on this neat gem.