infra: improve mdbook preprocessing infrastructure #4135

Merged: 8 commits, Dec 5, 2024

@chriskrycho commented Dec 5, 2024

  • Pull all the mdbook preprocessing packages into a single mdbook_trpl package which contains multiple binaries. Once this lands here, we will need to integrate a change into rust-lang/rust, since there is now only the single crate instead of several.

  • Use anyhow for the preprocessor error reporting: mdbook already uses it under the hood, so this doesn’t cost us anything, and it gets us some niceties.

  • Decouple the mdbook preprocessors from mdbook’s own dependencies: they are not actually using mdbook directly, but instead are working with the text it supplies via pipes, so they can use whatever versions of tools make the most sense. Specifically, drop the dependency on mdbook and update to using more current versions of pulldown-cmark and pulldown-cmark-to-cmark.

This enables some meaningful improvements to the nostarch output as well as to the general rendering of Markdown tables (coming in future PRs).


  • Before merging, confirm that the integration in rust-lang/rust is smooth and without issue.

- Create an `mdbook_trpl` library package which hosts shared concerns
  for the packages, e.g. error and config handling.
- Move `mdbook-trpl-note` and `mdbook-trpl-listing` into the new shared
  package, with binaries at `src/bin/(note|listing)/main.rs` and the
  existing libraries at `src/note/mod.rs` and `src/listing/mod.rs` with
  their associated tests.
- Extract their actual shared pieces into the crate root.
- Update `tools/nostarch.sh` to build all the bins in `mdbook_trpl` at
  one time.

At the moment, this doesn't do a lot except trim down the number of
packages in the repository, but it sets things up nicely to support more
preprocessors (which I am going to add shortly).

Use `anyhow` for error reporting in the preprocessors: mdbook already
does this, and `anyhow` has nice error reporting.

Add a preprocessor for figures. Building on the shared infra from
`mdbook_trpl`, it takes nicely accessible and semantic HTML input like
this:

    <figure>
    
    <img src="https://www.example.com/something.jpg">
    
    <figcaption>Figure 12-34: An illustration of something</figcaption>
    
    </figure>
    
It produces output like this:

    <img src="https://www.example.com/something.jpg">
    
    Figure 12-34: An illustration of something

This matches what we need for the nostarch output. Accordingly, wire up
the `nostarch/book.toml` to use this. There is no need to worry about
ordering of the two preprocessors, because in `Simple` mode the listing
preprocessor emits plain text output for listings (much as this does for
figures).
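
A minimal sketch of the figure transformation described above, assuming a purely line-based rewrite with only the standard library. The function name `simplify_figures` is hypothetical and this is not the preprocessor's actual implementation; it only illustrates dropping the `<figure>` wrapper and unwrapping the `<figcaption>` text.

```rust
/// Rewrite `<figure>` markup into the simpler form: drop the `<figure>`
/// wrapper lines and keep only the text inside `<figcaption>`.
/// Hypothetical helper; assumes each tag sits on its own line.
fn simplify_figures(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    for line in input.lines() {
        let trimmed = line.trim();
        if trimmed == "<figure>" || trimmed == "</figure>" {
            // The wrapper carries no content of its own, so skip it.
            continue;
        }
        match trimmed
            .strip_prefix("<figcaption>")
            .and_then(|rest| rest.strip_suffix("</figcaption>"))
        {
            // Keep the caption text, without its tags.
            Some(caption) => out.push(caption.to_string()),
            // Everything else (e.g. the `<img>` line) passes through as-is.
            None => out.push(line.to_string()),
        }
    }
    // Trim the blank lines left behind where the wrapper used to be.
    out.join("\n").trim().to_string()
}
```

Calling `simplify_figures` on the input shown above would leave the `<img>` line, a blank line, and the bare caption text.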

Instead of using `mdbook::utils::new_cmark_parser`, provide our own utility
function which does the same thing. This allows us to preprocess the
text using versions of `pulldown-cmark` and `pulldown-cmark-to-cmark`
different from those used by `mdbook` itself so long as we take care
that the configuration continues to match.

Updating `pulldown-cmark` and `pulldown-cmark-to-cmark` to more current
versions (in principle) lets us get better support for round-tripping and
therefore less hoop-jumping to get the desired Markdown output after
preprocessing some text. This intentionally keeps the rendering the same
as it was on previous versions.

Preprocessors using `pulldown-cmark-to-cmark` do not yet perform round
trips 100% correctly, and insert leading spaces and an extra initial `>`
for block quotes, so strip those. Once that is fixed upstream, this will
become a no-op, and can be removed then.
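
As a rough illustration of the cleanup described above, here is a std-only sketch. The exact shape of the damaged output is an assumption: it supposes a block quote line comes back as ` > > text` where `> text` was intended, and that the input has no nested block quotes (a legitimate `> >` would be collapsed too). The name `fix_block_quotes` is hypothetical.

```rust
/// Strip the leading spaces and the extra initial `>` from block quote
/// lines. Hypothetical helper; the damaged shape ` > > text` is assumed,
/// not taken from the preprocessor source.
fn fix_block_quotes(rendered: &str) -> String {
    rendered
        .lines()
        .map(|line| {
            let trimmed = line.trim_start_matches(' ');
            match trimmed.strip_prefix("> >") {
                // Doubled marker: keep a single `>` and the rest of the line.
                Some(rest) => format!(">{rest}"),
                // Single marker: only the leading spaces needed removing.
                None if trimmed.starts_with('>') => trimmed.to_string(),
                // Not a block quote line: leave it alone, indentation included.
                None => line.to_string(),
            }
        })
        .collect::<Vec<_>>()
        .join("\n")
}
```

On input that is already correct (`> text`), the function returns the line unchanged, which is the sense in which such a workaround becomes a no-op once the upstream behavior is fixed.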

Stop using the `pulldown-cmark` parser to rewrite the listings. Instead,
find the tags and parse them with the DOM parser, and then rewrite the
lines of the string directly.
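
The approach of finding tags and rewriting lines directly can be sketched as follows. Everything specific here is an assumption for illustration: the `<Listing>` tag name, its `number` and `caption` attributes, the plain-text output format, and the naive `attr` helper that stands in for a real DOM parser. The point is only that lines which are not tags pass through untouched rather than being re-rendered.

```rust
/// Read the value of ` name="..."` from a single opening tag.
/// A naive stand-in for a DOM parser; assumes double-quoted values.
fn attr<'a>(tag: &'a str, name: &str) -> Option<&'a str> {
    let key = format!(" {name}=\"");
    let start = tag.find(key.as_str())? + key.len();
    let end = tag[start..].find('"')? + start;
    Some(&tag[start..end])
}

/// Rewrite hypothetical `<Listing ...>` / `</Listing>` lines into a
/// plain-text caption, copying every other line through unchanged.
fn rewrite_listings(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    let mut pending: Option<String> = None;
    for line in input.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with("<Listing") {
            // Parse only the tag itself; the body is never re-rendered.
            let number = attr(trimmed, "number").unwrap_or("?");
            let caption = attr(trimmed, "caption").unwrap_or("");
            pending = Some(format!("Listing {number}: {caption}"));
        } else if trimmed == "</Listing>" {
            // Emit the caption where the closing tag was.
            if let Some(caption) = pending.take() {
                out.push(caption);
            }
        } else {
            out.push(line.to_string());
        }
    }
    out.join("\n")
}
```

Because the body lines are copied as strings, characters such as `|` inside them are not re-escaped by a Markdown renderer.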

This fixes escaping of `|` in a table. Bonus trivia: comment formatting
in the Cargo.toml for `mdbook_trpl`.

@chriskrycho merged commit 9900d97 into main on Dec 5, 2024
6 checks passed
@chriskrycho deleted the mdbook-fixes branch on December 5, 2024 17:23