Releases: domenic/worm-scraper
4.9.0
General content fixes, applied to both Worm and Ward:
- Hyphenated self-preservation, vat-grown, shell-shocked, a just-in-case, dog-tired, one-sided, medium-sized, teary-eyed, and worst-case scenario.
- Hyphenated second-guess, built-in, face-to-face, and fight-or-flight when appropriate.
- Fixed hyphenation code to work even when the phrase is capitalized.
- Removed over-capitalization of judo, aikido, and tae kwon do.
- De-italicized some commas when two words are italicized in a row.
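The notes don't show how the hyphenation code handles capitalized phrases. One plausible approach is to match the phrase case-insensitively and rewrite only the whitespace inside each match, which keeps the original capitalization. The sketch below is hypothetical TypeScript; `hyphenatePhrase` and `escapeRegExp` are illustrative names, not the scraper's API.

```typescript
// Hypothetical helper for illustration; not taken from worm-scraper.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hyphenate every occurrence of a multi-word phrase, ignoring case.
function hyphenatePhrase(text: string, phrase: string): string {
  // Allow any run of whitespace between the phrase's words.
  const words = phrase.trim().split(/\s+/).map(escapeRegExp);
  const pattern = new RegExp(`\\b${words.join("\\s+")}\\b`, "gi");
  // Rewrite only the whitespace inside each match, so "Dog tired" keeps its
  // capital D and becomes "Dog-tired".
  return text.replace(pattern, (match) => match.replace(/\s+/g, "-"));
}

console.log(hyphenatePhrase("Dog tired, she sat. He was dog tired.", "dog tired"));
// Dog-tired, she sat. He was dog-tired.
```

This is a blanket replacement. Phrases such as "second-guess" or "face-to-face", which are hyphenated only in some grammatical roles, would still need per-instance review.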
Ward-specific content fixes:
- Spot fixes through Last 20.end (i.e., the end of the book).
- Always capitalize "Uncle Neil" and "Aunt Fleur".
- De-capitalized "flock" since lowercase was much more prevalent.
- De-capitalized "giants", but always capitalized "Mathers Giant", "Mother Giant", and "Goddess Giant".
4.8.0
General content fixes, applied to both Worm and Ward:
- Hyphenated self-esteem, self-loathing, self-harm, level-headed, and clear-cut.
- Hyphenated hand-to-hand when used as an adjective.
- Removed hyphens from "high five" and "fist bump", except where they were used as verbs.
- Removed over-capitalization of "parahumans".
- Removed more non-breaking spaces, which would manifest as lines wrapping strangely or as sentences followed by too many spaces.
Ward-specific content fixes:
- Spot fixes through Infrared 19.4.
- Standardized on always capitalizing "Titan" and "Titans" after Sundown 17.y (and before when used as part of a name, e.g. "Kronos Titan").
- Fixed capitalization of "Stranger Titan" and "the Stranger"; they were being erroneously de-capitalized by the existing PRT designation de-capitalization code.
- Fixed misspellings of "Tattletale" as "Tatteltale".
- Always capitalize the "Aunt" in "Aunt Sarah".
- Always capitalize "Fragile One".
- Always capitalize "Machine Army".
4.7.0
General content fixes, applied to both Worm and Ward:
- Standardized on "Jotun", replacing some instances of "Jotunn".
- Standardized on "Juliette", replacing some instances of "Juliet".
- Standardized on "Dragon-craft" and "Dragon-mech", replacing some instances that were missing the hyphen or capitalization.
- Standardized on "A.I." instead of "AI".
- Fixed the possessive of Marquis.
- Hyphenated spelled-out numbers from 21 through 99.
- Hyphenated "self-conscious" and derivatives.
- Hyphenated compound words ending in "-haired".
- Hyphenated compound words ending in "-dimensional".
- Hyphenated "one-on-one".
- Hyphenated "day-to-day" when appropriate.
- Removed hyphenation around "hundred" and "percent".
- Always capitalize "Nazi" and "English".
- Capitalized more instances of "Earth" when referring to the planet.
- Removed over-capitalization of "english muffin" and "french toast".
- Removed over-capitalization of "church".
- Removed over-capitalization of "corona pollentia", "radiata", and "gemma".
- Fixed capitalization and apostrophes for "’Cage" (as in Birdcage).
- Fixed a few incomplete or misplaced ellipses.
- Fixed end-of-line commas that should be periods.
- Removed extra spaces before closing quotation marks.
- Removed italicization from commas.
- Italicized question marks in single-question-word sentences.
- Fixed various instances of opening quotation marks being used where an apostrophe belongs.
- Replaced some hyphen-minuses with em dashes when they preceded a question mark.
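Here is a minimal sketch of the spelled-out-number rule, assuming a single regex pass is enough. The scraper's actual fixup code may differ, and the names below are hypothetical. A tens word followed by whitespace and a units word is joined with a hyphen. The captured words are reused as written, so capitalization survives.

```typescript
// Hypothetical sketch: hyphenate spelled-out numbers from 21 through 99.
const TENS = "twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety";
const UNITS = "one|two|three|four|five|six|seven|eight|nine";

// A tens word, whitespace, then a units word, matched case-insensitively.
const COMPOUND_NUMBER = new RegExp(`\\b(${TENS})\\s+(${UNITS})\\b`, "gi");

function hyphenateNumbers(text: string): string {
  // "$1" and "$2" reinsert the words exactly as they appeared.
  return text.replace(COMPOUND_NUMBER, "$1-$2");
}

console.log(hyphenateNumbers("Twenty one capes, ninety nine problems, thirty seconds."));
// Twenty-one capes, ninety-nine problems, thirty seconds.
```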
Ward-specific content fixes:
- Spot fixes through From Within 16.z.
- Standardized on "Crock o’ Shit", replacing some instances of "Crock o’Shit" or "Croc o’Shit".
- Fixed the possessive of Semiramis.
- Always capitalize "Dauntless Titan".
- Always capitalize "the Bunker".
- Always capitalize "U-turn".
- Fixed capitalization and apostrophes for "’Lace" (as in Anelace).
- Removed over-capitalization of "aunt" and "uncle".
- Removed over-capitalization of season names.
- Removed over-capitalization of "math".
- Undid the conversion of color-based team names (like "Team Green-Black") to en dashes; they're more like compound adjectives, so we restore their hyphen-minuses.
4.6.1
4.6.0
Program improvements:
- Improved the conversion step's resilience in the face of busy filesystems (e.g. due to antivirus scanners).
- The conversion step now displays the amount of time it takes.
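To illustrate the kind of resilience described above, here is a hypothetical retry helper. It is not the scraper's actual code. It assumes that a busy filesystem shows up as Node-style errors whose `code` is `EBUSY`, `EPERM`, or `EACCES`, as can happen when a scanner briefly locks a file. The helper retries with exponential backoff and rethrows anything else immediately.

```typescript
// Hypothetical sketch: retry an operation that fails because another process
// (for example an antivirus scanner) briefly holds the file.
const TRANSIENT_CODES = new Set(["EBUSY", "EPERM", "EACCES"]);

function delay(ms: number): Promise<void> {
  return new Promise((resolve) => setTimeout(() => resolve(), ms));
}

async function withRetries<T>(
  operation: () => Promise<T>,
  { attempts = 5, delayMs = 100 } = {},
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (err) {
      const code = (err as { code?: unknown } | null)?.code;
      // Give up on non-transient errors, or once every attempt is used up.
      if (typeof code !== "string" || !TRANSIENT_CODES.has(code) || attempt >= attempts) {
        throw err;
      }
      // Exponential backoff: delayMs, then 2x, 4x, ...
      await delay(delayMs * 2 ** (attempt - 1));
    }
  }
}
```

Wrapping a call such as `fs.promises.writeFile(path, contents)` as `withRetries(() => fs.promises.writeFile(path, contents))` would apply this to chapter output. Which error codes count as transient is a judgment call.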
General content fixes, applied to both Worm and Ward:
- Standardized on the style "Case Fifty-Three", instead of the other styles used (which include, but are not limited to, "Case-53", "case 53", "case-fifty-three", "Case Fifty Three", ...). Similarly for other PRT cases.
- Standardized on "Mrs. Yamada", replacing many instances of "Ms."
- Removed over-capitalization of "university".
- Fixed a variety of missing periods between sentences.
- Fixed many erroneously italicized exclamation points, closing quotation marks, and commas.
- Fixed backward closing single quotation marks.
- Fixed a large variety of incorrect closing double quotation marks.
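Here is a hypothetical sketch of the "Case Fifty-Three" standardization. It assumes one regex pass can catch the variants listed above and is not the scraper's code. Digit forms are converted to words, and spelled-out forms are recapitalized and hyphenated. All names are illustrative.

```typescript
// Hypothetical sketch: normalize PRT case references to "Case Fifty-Three" style.
const ONES_NAMES = ["", "One", "Two", "Three", "Four", "Five", "Six", "Seven", "Eight",
  "Nine", "Ten", "Eleven", "Twelve", "Thirteen", "Fourteen", "Fifteen", "Sixteen",
  "Seventeen", "Eighteen", "Nineteen"];
const TENS_NAMES = ["", "", "Twenty", "Thirty", "Forty", "Fifty", "Sixty", "Seventy",
  "Eighty", "Ninety"];

// Converts 1-99 to words, e.g. 53 -> "Fifty-Three".
function numberToWords(n: number): string {
  if (n < 20) return ONES_NAMES[n];
  const ones = n % 10;
  const tens = TENS_NAMES[Math.floor(n / 10)];
  return ones === 0 ? tens : `${tens}-${ONES_NAMES[ones]}`;
}

const TENS_WORDS = "twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety";
const UNIT_WORDS = "one|two|three|four|five|six|seven|eight|nine";
// "case", spaces or hyphens, then 1-99 in digits or a spelled-out 20-99.
const CASE_REFERENCE = new RegExp(
  `\\bcase[\\s-]+([1-9]\\d?|(?:${TENS_WORDS})(?:[\\s-](?:${UNIT_WORDS}))?)\\b`,
  "gi",
);

function capitalize(word: string): string {
  return word.charAt(0).toUpperCase() + word.slice(1).toLowerCase();
}

function normalizeCaseNames(text: string): string {
  return text.replace(CASE_REFERENCE, (_match: string, num: string) => {
    const words = /^\d+$/.test(num)
      ? numberToWords(Number(num))
      : num.split(/[\s-]/).map(capitalize).join("-");
    return `Case ${words}`;
  });
}

console.log(normalizeCaseNames("Case-53, case 53, case-fifty-three, Case Fifty Three."));
// Case Fifty-Three, Case Fifty-Three, Case Fifty-Three, Case Fifty-Three.
```

A blanket pass like this would also catch idioms such as "in case twenty people show up", so in practice each match would need a quick check.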
Ward-specific content fixes:
- Spot fixes through Heavens 12.4.
- Standardized on "Amias", replacing a few instances of "Amais".
- Capitalized "Heartbroken" when it's used as a proper noun.
- Fixed a couple of instances of "Hardboil" to be "Hard Boil".
- Fixed instances of "Warden’s" that should be "Wardens’".
- Always lowercased "headquarters" when talking about "the Wardens’ headquarters".
- Fixed the capitalization and apostrophe placement for the truncated names "’Piece", "’Joint", and "’Tend".
- Fixed double periods.
- Fixed inconsistently bolded colons after sender names in text conversations.
4.5.0
Program improvements:
- Fixed the warnings about incorrect text replacements so that they no longer get garbled by the progress bar. (You generally won't see these warnings unless you are developing this scraper or wildbow updates the source text.)
General content fixes, applied to both Worm and Ward:
- Restored the author's original scene breaks, which were "■" for Worm, and either "⊙", "☽", or "⊙ ⊙ ⊙ ⊙ ⊙" for Ward. (One instance of "⊙⊙" was assumed erroneous and corrected to a single "⊙".) Previously "■" and "⊙" were being replaced with horizontal rules.
- Changed a few instances of "T-shirt" to "t-shirt"; the latter is overwhelmingly more common.
- Fixed missing spaces after commas.
- Fixed a variety of misplaced, backward, or extraneous quotation marks.
- Correctly hyphenated "X-year-old"; usually it was "X year old", but all of "X-year-old", "X year-old", and "X-year old" also appeared.
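The "X-year-old" normalization could be sketched as follows. This assumes X is either digits or a spelled-out number up to ninety-nine and is a hypothetical helper, not the scraper's code. Requiring the singular "year" leaves predicate uses such as "two years old" untouched, and a lookahead keeps the plural noun "year-olds" working.

```typescript
// Hypothetical sketch: turn "X year old", "X year-old", and "X-year old"
// into "X-year-old", where X is digits or a spelled-out number (1-99).
const SMALL_NUMBERS = "one|two|three|four|five|six|seven|eight|nine|ten|eleven|twelve|" +
  "thirteen|fourteen|fifteen|sixteen|seventeen|eighteen|nineteen";
const TENS_NUMBERS = "twenty|thirty|forty|fifty|sixty|seventy|eighty|ninety";
const NUMBER = `\\d+|(?:${TENS_NUMBERS})(?:-(?:one|two|three|four|five|six|seven|eight|nine))?|${SMALL_NUMBERS}`;

// The lookahead allows an optional plural "s" but rejects words like "older".
const AGE = new RegExp(`\\b(${NUMBER})[ -]year[ -]old(?=s?\\b)`, "gi");

function hyphenateAges(text: string): string {
  // Keeps the number as written; "year" and "old" are emitted in lowercase.
  return text.replace(AGE, "$1-year-old");
}

console.log(hyphenateAges("A five year old and two 12-year olds. He is two years old."));
// A five-year-old and two 12-year-olds. He is two years old.
```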
Ward-specific content fixes:
- Spot fixes through Polarize 10.x.
- Fixed a few instances where "the Pharmacist" was not capitalized, even after transitioning from a profession to a cape name.
4.4.0
General content fixes, applied to both Worm and Ward:
- Always capitalize "Earths" when referring to other worlds.
Worm-specific content fixes:
- Always capitalize "The Clairvoyant". The previous fixup pass missed cases where his name started a sentence.
Ward-specific content fixes:
- Spot fixes through Beacon 8.12.
- Always capitalize "the Megalopolis".
- Fixed instances of a hyphen-minus followed by a comma (i.e. -,), mostly replacing them with em dashes.
4.3.0
Program improvements:
- Used parallelism during the conversion process, so it should be significantly faster on multi-core computers (i.e. all modern computers).
- Introduced a progress bar during the conversion process, instead of printing out each chapter filename as it is converted.
- Removed an unnecessary dependency (xmlserializer) and updated other dependencies.
General content fixes, applied to both Worm and Ward:
- Removed the capitalization of the "master" PRT designation, like all the others.
- Fixed a few instances of "P.R.T." to be "PRT", which is overwhelmingly more common.
- Fixed the common misspelling of "shoulder blade" as "shoulderblade".
- Fixed the common misspelling of "preemptive(ly)" as "pre-emptive(ly)".
- Fixed a variety of uncapitalized sentences.
- Fixed more cases of extra spaces after sentences.
- Fixed missing commas and periods at the end of quotations.
- Fixed various dash issues.
- Fixed Ward's indentation (for blockquote-type paragraphs) to match Worm's, at 30px, instead of mostly being 40px but sometimes 30px.
Ward-specific content fixes:
- Spot fixes through Torch 7.x.
- Fixed a few places where the letters "tv" were over-capitalized by the converter, e.g. in "outvoted".
- Ensured all train station names are capitalized (e.g. "Norwalk station" became "Norwalk Station").
- Settled on lowercasing "kiss and kill"; it was inconsistent.
- Ensured that all instances of "Patrol", when discussing the proper noun and its derivatives, are capitalized. This partially reverses the change made in v4.2.0 to standardize on "patrol block".
- Replaced the hyphen-minus with an en dash for another joint name, G–N.
- Fixed missing periods at the end of sentences.
4.2.0
General fixes, applied to both Worm and Ward:
- Capitalized "Dad" and "Mom" when used as names.
- Capitalized a couple of "Birdcage" instances.
- Added a hyphen to all instances of "able-bodied".
- Fixed various erroneous repeated words.
- Fixed extra spaces after periods.
- Fixed missing spaces after periods.
- Fixed some hyphen-minuses that should be em dashes, at the beginning of italicized quotes.
- Converted hyphen-minuses to en dashes for joint names.
- Fixed some periods that were over-italicized. (This isn't noticeable, but it bugs me.)
Worm-specific fixes:
- Fixed the possessive of "Chuckles" to be "Chuckles’s" instead of "Chuckles’".
Ward-specific fixes:
- Standardized on "patrol block" instead of "Patrol block" or "Patrol Block". All three were in use, but the first was the most widely used.
- Fixed the over-capitalization of "clairvoyants", which was introduced by worm-scraper for Worm but backfired for Ward.
- Fixed some instances of "Cedar point" and "Hollow point" to be "Cedar Point" and "Hollow Point", respectively.
- Fixed some instances of "Resound" to be "ReSound".
- Fixed the possessive of various new-in-Ward names ending with "s".
- Spot fixes through Shadow 5.8.