Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf: faster handling of static paths #2148

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

skirtles-code
Copy link
Contributor

@skirtles-code skirtles-code commented Feb 20, 2024

Background

Original issue: #2132. In short, the performance of adding a large number of routes is slow, and is particularly problematic with SSR.

This PR attempts some performance improvements. While the changes are currently unpolished and experimental, I believe these changes are 'realistic', in the sense that they could be tidied up and included without making breaking changes.

These changes are separate from the caching and binary search ideas that have already been suggested. Each strategy has its pros and cons and they each target slightly different use cases.

My priority was just runtime performance, not clean code, maintainability, or bundle size. Balancing those other concerns can come later. I'm currently looking to share and discuss the main ideas, rather than worrying about the minutiae of the implementation.

Overview of the changes

I started coming at this from a slightly different angle, investigating the performance of route resolution rather than addition. I figured that iterating through the array of routes and testing each one against the path might be slow when the number of routes is large. On pages with a lot of links this could happen many times.

After some profiling, I had two main ideas for how to speed this up:

  1. Testing using a RegExp might not be the most efficient approach, especially for routes that are effectively static (i.e. no params).
  2. Route paths often begin with a static section. e.g. The Nuxt i18n module prefixes route paths with the locale, e.g. /en/foo. If we're trying to match /de/bar then we can ignore all the paths beginning with /en without needing to check each one individually.

Roughly speaking, these are the two changes that are implemented here. There is some overlap between them, as they both require some initial analysis of the route path to establish which portions are static. Thankfully, most of that work is already done by tokenizePath(), and that step is already very fast.

While these changes were motivated by investigating route resolution, they also significantly improve the performance of route addition.

The changes in detail

The matcher tree

The idea here is inspired by Eduardo's suggestion here to use a trie. Whether this is actually what he had in mind I'm not sure, but I think it's a similar concept.

Paths often start with a static section. e.g. in /en/products/:id, the /en/products portion is static. This allows us to create a tree structure to store the route, with a node for en and a child node for products.

When we want to match a path, say /en/products/1234, we can walk that tree to find candidate routes. There may still be multiple routes that need to be checked, but we can immediately skip routes in different parts of the tree, such as those that begin with /de, or those beginning /en/users.

This also speeds up the route creation process. Routes no longer need inserting into a massive array. Instead, each node of the tree holds a much smaller array of routes, so the insertion point can be found much quicker.

There are various complications, such as case sensitivity. I'm currently assuming that calling toUpperCase() is sufficient to achieve matching equivalent to the i flag of a RegExp. This is based on something I read here, but it would need further verification to confirm it gives consistent casing.

While this approach does seem to be really fast, it does lead to a lot of extra garbage collection. This can lead to spikes in the performance when the garbage collector kicks in. This is understandable, as the data structures involved are much more complicated. It needs further investigation to understand whether this can be avoided or if it's really a problem worth worrying about.

Further work may be needed on the getRoutes() method. This can no longer simply return the matchers array, as it doesn't exist. I currently have it building a new array each time it's called. It may need some form of caching.

Exact path matching

For routes that are entirely static, i.e. that have no params, it's possible to bypass the tree mentioned above and just store them directly in an object as a quick lookup. Case sensitivity still needs taking into account, but it's still really fast for both adding routes and resolving routes. Keeping these static routes out of the tree can greatly reduce the size of the tree.

Parsing static paths

tokenizePath() parses the route path into tokens. It's already really fast, so I've left that alone.

Those tokens are then passed to tokensToParser(), which does a lot of stuff. For entirely static paths, without any params, I believe this is doing a lot of unnecessary work.

I've introduced staticPathToParser(), which creates a simpler implementation of PathParser. This is only used in cases where the path is entirely static. It skips most of the work. For example, rather than using a RegExp to check the path it uses a === check instead (also taking account of the sensitive and strict options as required). This not only makes the addition of routes faster, it also makes the parser itself more efficient.

Further work

All of these changes assume a static prefix to the path. If an application has a dynamic section at the start of all its paths then these changes won't help at all. There needs to be a reasonable spread within the tree for there to be any benefit. In those problematic cases, the binary search proposal would still be beneficial, as would the caching proposal.

This is nowhere near an exhaustive list of things to do, but:

  1. Investigate the GC problems.
  2. Write a lot more tests to confirm this actually handles all the edge cases correctly.
  3. Establish more concretely how much each of these changes helps performance. I was checking performance all the way through writing them, so I'm confident they do help, but I don't have concrete numbers to share.
  4. Evaluate other trade-offs, such as maintainability and bundle size. The code will need polishing up, including handling all edge cases, before this can be properly evaluated.

There are several places, e.g. pathParserRanker.ts, where I just hacked in whatever changes I needed, without much regard for anything else. I'll worry about that once the bigger picture becomes a bit clearer.


Update: I've now written a lot more tests and there were problems in some edge cases. Most of these problems were with end: false, but the handling of the sensitive option wasn't quite correct either. I've just pushed fixes for most of those problems. The remaining (known) problems are just with end: false, and only occur in cases where it has the same score as another route. As noted in #2153, I think that would best be addressed by including end in the score calculation.

Copy link

netlify bot commented Feb 20, 2024

Deploy Preview for vue-router canceled.

Name Link
🔨 Latest commit 987ea09
🔍 Latest deploy log https://app.netlify.com/sites/vue-router/deploys/65e5b4e5dc8b540009bf6995

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

1 participant