Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement multithreading for rendering and local search indexing, use JSDOM for better section split performance and reliability. #3386

Open
wants to merge 52 commits into
base: main
Choose a base branch
from

Conversation

zhangyx1998
Copy link
Contributor

@zhangyx1998 zhangyx1998 commented Dec 29, 2023

Before

  vitepress v1.0.0-rc.33

  ✓ building client + server bundles - 9.43 seconds
- ✓ rendering pages - 1 minute, 0.95 seconds
  ✓ generating sitemap - 0 seconds
  build complete in 1 minute, 10.56 seconds.
vp-single-thread-render

After

buildConcurrency = 10

  vitepress v1.0.0-rc.33

  ✓ building client + server bundles - 9.4 seconds
+ ✓ rendering pages - 42.26 seconds
  ✓ generating sitemap - 0.01 seconds
  build complete in 51.85 seconds.
vp-multi-thread-render

Challenges

In addition to vuetifyjs/vite-ssg@1f1663a, this PR also need to handle the problem of passing siteConfig to worker threads. This is very challenging because we cannot just import userConfig multiple times for each worker, since userConfig may have side-effects or internal state dependencies.

After some research, I found that there does not exist such a package for this unique challenge, therefore I created one (rpc-magic-proxy) for it. This utility will (1) serialize siteConfig into a pure static object so it can be sent through RPC channel, and (2) deserialize it on the worker side as a proxy. It will proxy all the function calls back to the main thread. With only one side effect being all the function calls have to be "awaited".

Thanks to the asynchronous coding paradigm that have already been taken in this project, I did not encounter any issue plugging this proxy into the render function.

One thing worth mentioning is that, due to the RPC proxy overhead, the speedup is not as significant as seen in #3374. But it still brought 30% speedup when building large sites, so it can be a nice-to-have option.

Comments and insights needed!

Please feel free comment on possible problems and/or ways to improve parallelism!

@zhangyx1998 zhangyx1998 marked this pull request as draft December 29, 2023 08:29
@zhangyx1998
Copy link
Contributor Author

For discussions general parallelism, please go to #3183

@zhangyx1998
Copy link
Contributor Author

@brc-dd Do you have time to look at this?

@brc-dd
Copy link
Member

brc-dd commented Dec 29, 2023

Ah, I'm currently looking into other PRs. Will get back to you on this in few days.

@zhangyx1998
Copy link
Contributor Author

Ah, I'm currently looking into other PRs. Will get back to you on this in few days.

Sure, thanks!

@zhangyx1998
Copy link
Contributor Author

Latest perf:

  vitepress v1.0.0-rc.33

  ✓ building client + server bundles - 9.05 seconds
+ ✓ rendering pages - 25.24 seconds
  ✓ generating sitemap - 0.01 seconds
  build complete in 34.5 seconds.

This is more than 50% speedup compared to single thread (60.95s). I think this is good enough to be included as an experimental feature (not enabled by default).

@zhangyx1998 zhangyx1998 marked this pull request as ready for review December 30, 2023 16:22
@brc-dd
Copy link
Member

brc-dd commented Dec 31, 2023

The bundle step

await task('building client + server bundles', async () => {
clientResult = config.mpa
? null
: ((await build(await resolveViteConfig(false))) as Rollup.RollupOutput)
serverResult = (await build(
await resolveViteConfig(true)
)) as Rollup.RollupOutput
})
can be made async too. We should do something about the cache in
const cache = new LRUCache<string, MarkdownCompileResult>({ max: 1024 })
though. In parallel client and server bundles, it doesn't get hit much. Also, LRU doesn't seem like a good idea there, as even in sync bundles, it won't get hit at all if there are 2048 or more pages.

@zhangyx1998
Copy link
Contributor Author

Is there a repo I can use for test? My own project do not have heavy deps so I have no way to test it.

@zhangyx1998
Copy link
Contributor Author

One low hanging fruit could be running two builds in two workers. But I do not know if it would really help.

@zhangyx1998
Copy link
Contributor Author

zhangyx1998 commented Jan 4, 2024

Never mind, I found the cause: a global variable is used to pass around site config.

Not exactly the cause of the problem above, I've done many other tricks to make vite happy in a worker thread.

@zhangyx1998 zhangyx1998 changed the title feat: implement multithreading for several time consuming tasks feat: implement multithreading for rendering and local search indexing Jan 8, 2024
@zhangyx1998 zhangyx1998 changed the title feat: implement multithreading for rendering and local search indexing feat: implement multithreading for rendering and local search indexing, use JSDOM for better section split performance and reliability. Jan 8, 2024
@zhangyx1998
Copy link
Contributor Author

@brc-dd I've reverted changes related to parallel bundling. Everything else is stable and ready for review.

Regarding parallel bundling, I've tried several different approaches (as you can see in the commit history), but each of them has their own problems.

  1. Re-importing user config in every worker is not an option, because user config may have side effects or internal states. Therefore, userConfig can only be imported once in the main thread and be exposed to workers by rpc-magic-proxy.

  2. markdown-it does not support async hooks, but all functions in user-config, including user defined markdown hooks, will be converted to async (as explained in 1). This will break markdown-it when run in worker threads with user provided hooks. Therefore, markdown-it can only live in main thread.

  3. Vite/Rollup has some issue resolving node imports in worker threads. Although this can be patched using a helper plugin, I am not sure if the patch actually fixed the problem or just hide it away.

I've developed some tricks to get around the problems above, but those tricks are hacky and unstable. Also, forcing markdown renderer to stay in main thread will produce heavy RPC overhead, making it even slower than single thread.

I think it's better to create another PR dedicated for parallel bundling, when there is actually someone complaining about bundling performance.

@zhangyx1998
Copy link
Contributor Author

BTW, local search indexing now only takes about 10 seconds for 1700 pages. It previously took 4.7 hours.

@brc-dd brc-dd self-assigned this Jan 28, 2024
@github-actions github-actions bot added the stale label Mar 7, 2024
@github-actions github-actions bot removed the stale label Apr 28, 2024
@Machineric
Copy link

This PR looks very promising. Any updates?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants