Add some kind of edition tracker inside the generated EPUBs #46

Open
domenic opened this issue Nov 6, 2024 · 2 comments

@domenic
Owner

domenic commented Nov 6, 2024

It'd be nice, if one sees a worm-scraper-produced EPUB in the wild, to know what version it is, and therefore how many of the text fixups it has.

The ideal version of this would only count changes that cause different EPUB outputs, but I don't think that's feasible without error-prone manual updates. So we'll probably just inject the worm-scraper version.

@domenic
Owner Author

domenic commented Nov 6, 2024

If we can get a date instead of a version number, then at least in EPUB 3, it looks like <meta property="dcterms:modified">...</meta> is the way to go here.

We should also probably include publication date.
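
A minimal sketch of what emitting those two dates could look like. The helper names (`formatModified`, `dateMetadata`) and the sample dates are made up for illustration, and using `<dc:date>` for the publication date is an assumption. As far as I know, EPUB 3 wants `dcterms:modified` in UTC with second precision (`CCYY-MM-DDThh:mm:ssZ`), so the milliseconds from `toISOString()` are stripped.

```javascript
"use strict";

// Format a Date as UTC with second precision, e.g. 2024-11-06T01:02:03Z.
// toISOString() always ends in ".sssZ", so we drop the milliseconds.
function formatModified(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical helper: builds the date-related lines for the package
// document's <metadata> element. Not worm-scraper's actual code.
function dateMetadata({ published, modified }) {
  return [
    `<dc:date>${formatModified(published)}</dc:date>`,
    `<meta property="dcterms:modified">${formatModified(modified)}</meta>`
  ].join("\n");
}

// Placeholder dates, purely for demonstration.
console.log(dateMetadata({
  published: new Date("2024-11-01T00:00:00Z"),
  modified: new Date("2024-11-06T01:02:03.456Z")
}));
```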

domenic added a commit that referenced this issue Nov 6, 2024
As part of this, a few invisible structural improvements:

* Use the same cover HTML everywhere, instead of one per book.
* Include each chapter's publication date as microdata in each output chapter.
* Move each chapter's original URL from an HTML comment to microdata.

And a few possibly-visible improvements:

* Include the publication date for each book in the EPUB's metadata. (It's set to the last chapter's publication date.)
* Add a last-modified date to the EPUB metadata, equal to the date the EPUB was generated. This might suffice for #46, but you could also imagine something better...
* Add a landmark for the beginning of the content, which should allow some readers to skip past the cover when desired.
* Stop marking the cover as "auxiliary", which makes sure the cover appears in certain viewers (such as Calibre).

Fixes #45.

@domenic
Owner Author

domenic commented Nov 6, 2024

As of fba981f there is a dcterms:modified giving the date that the edition was built.

This isn't fully satisfactory, because we would ideally want to output the same version/modified date for multiple people building at separate times, as long as the contents are the same.

Possible improvements:

  • Embed multiple dates, one for the source text and one for the worm-scraper update, only allowing one of them to be the dcterms:modified that will show up in (some) ebook readers.
    • It's still kind of annoying to get the worm-scraper publication date. I guess we could query the npm API?? That's a bit silly though... A prepublish hook that adds the date in a text file, maybe, is the way to go.
  • Don't use dates, at least for the worm-scraper version; just use invisible custom metadata corresponding to the package version number.
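
A rough sketch of the "prepublish hook that adds the date in a text file" option. The file name `publish-date.txt` and both function names are hypothetical; nothing like this exists in worm-scraper today. The demo writes to a temporary directory so it can be run without touching a real package.

```javascript
"use strict";
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Hypothetical: record the publish date in a file that ships with the package.
function writePublishDate(filePath, date = new Date()) {
  fs.writeFileSync(filePath, date.toISOString() + "\n", "utf-8");
}

// Hypothetical: read the recorded date back at EPUB-generation time.
function readPublishDate(filePath) {
  return new Date(fs.readFileSync(filePath, "utf-8").trim());
}

// Demo against a temporary directory, using a fixed sample date.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "publish-date-"));
const demoFile = path.join(demoDir, "publish-date.txt");
writePublishDate(demoFile, new Date("2024-11-06T00:00:00Z"));
console.log(readPublishDate(demoFile).toISOString()); // 2024-11-06T00:00:00.000Z
```

Presumably the write step would be wired up through an npm lifecycle script such as `prepublishOnly` in `package.json`, so the date is stamped once per published version rather than once per build.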

Rejected ideas:

  • max(latest date any source text is updated, latest publication date of worm-scraper).

    • Rejected because this doesn't work well in edge cases. E.g., if one worm-scraper version is published 2024-11-06 and another 2024-11-08, and then the source text is updated 2024-11-10, both the older and newer worm-scraper versions will produce the same date (2024-11-10). The same problem arises if worm-scraper is updated after a source text modification but people are holding on to old source text.
  • Use neither dates nor version numbers, but instead content hashes.

    • Rejected because content hashes aren't comparable so you can't tell whether a produced EPUB is "better" than another.
  • Try to combine both worm-scraper version info and content version info into a single version string, or pseudo-date.

    • Rejected because this is a fundamentally two-dimensional problem.
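
To make the first and last rejections concrete, here is a small illustration using the dates from the example above. The function and field names (`maxDate`, `compareEditions`, `scraperDate`, `sourceDate`) are invented for this sketch, and it assumes dates are plain `YYYY-MM-DD` strings, which compare correctly as strings.

```javascript
"use strict";

// Same-format ISO dates (YYYY-MM-DD) order correctly under string comparison.
function maxDate(a, b) {
  return a > b ? a : b;
}

const sourceUpdated = "2024-11-10";
const olderScraper = "2024-11-06";
const newerScraper = "2024-11-08";

// Both builds collapse to the same date, hiding the scraper difference.
const fromOlder = maxDate(sourceUpdated, olderScraper);
const fromNewer = maxDate(sourceUpdated, newerScraper);
console.log(fromOlder, fromNewer); // 2024-11-10 2024-11-10

// Keeping the two dimensions separate gives only a partial order:
// some pairs of editions are simply not comparable.
function compareEditions(a, b) {
  const cmp = (x, y) => (x < y ? -1 : x > y ? 1 : 0);
  const s = cmp(a.scraperDate, b.scraperDate);
  const t = cmp(a.sourceDate, b.sourceDate);
  if (s === 0 && t === 0) return "same";
  if (s >= 0 && t >= 0) return "newer";
  if (s <= 0 && t <= 0) return "older";
  return "incomparable";
}

console.log(compareEditions(
  { scraperDate: "2024-11-08", sourceDate: "2024-11-01" },
  { scraperDate: "2024-11-06", sourceDate: "2024-11-10" }
)); // incomparable
```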
