perf: faster handling of static paths #2148
Draft
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Background
Original issue: #2132. In short, the performance of adding a large number of routes is slow, and is particularly problematic with SSR.
This PR attempts some performance improvements. While the changes are currently unpolished and experimental, I believe these changes are 'realistic', in the sense that they could be tidied up and included without making breaking changes.
These changes are separate from the caching and binary search ideas that have already been suggested. Each strategy has its pros and cons and they each target slightly different use cases.
My priority was just runtime performance, not clean code, maintainability, or bundle size. Balancing those other concerns can come later. I'm currently looking to share and discuss the main ideas, rather than worrying about the minutiae of the implementation.
Overview of the changes
I started coming at this from a slightly different angle, investigating the performance of route resolution rather than addition. I figured that iterating through the array of routes and testing each one against the path might be slow when the number of routes is large. On pages with a lot of links this could happen many times.
After some profiling, I had two main ideas for how to speed this up:
RegExp
might not be the most efficient approach, especially for routes that are effectively static (i.e. noparams
)./en/foo
. If we're trying to match/de/bar
then we can ignore all the paths beginning with/en
without needing to check each one individually.Roughly speaking, these are the two changes that are implemented here. There is some overlap between them, as they both require some initial analysis of the route path to establish which portions are static. Thankfully, most of that work is already done by
tokenizePath()
, and that step is already very fast.While these changes were motivated by investigating route resolution, they also significantly improve the performance of route addition.
The changes in detail
The matcher tree
The idea here is inspired by Eduardo's suggestion here to use a trie. Whether this is actually what he had in mind I'm not sure, but I think it's a similar concept.
Paths often start with a static section. e.g. in
/en/products/:id
, the/en/products
portion is static. This allows us to create a tree structure to store the route, with a node foren
and a child node forproducts
.When we want to match a path, say
/en/products/1234
, we can walk that tree to find candidate routes. There may still be multiple routes that need to be checked, but we can immediately skip routes in different parts of the tree, such as those that begin with/de
, or those beginning/en/users
.This also speeds up the route creation process. Routes no longer need inserting into a massive array. Instead, each node of the tree holds a much smaller array of routes, so the insertion point can be found much quicker.
There are various complications, such as case sensitivity. I'm currently assuming that calling
toUpperCase()
is sufficient to achieve matching equivalent to thei
flag of aRegExp
. This is based on something I read here, but it would need further verification to confirm it gives consistent casing.While this approach does seem to be really fast, it does lead to a lot of extra garbage collection. This can lead to spikes in the performance when the garbage collector kicks in. This is understandable, as the data structures involved are much more complicated. It needs further investigation to understand whether this can be avoided or if it's really a problem worth worrying about.
Further work may be needed on the
getRoutes()
method. This can no longer simply return thematchers
array, as it doesn't exist. I currently have it building a new array each time it's called. It may need some form of caching.Exact path matching
For routes that are entirely static, i.e. that have no
params
, it's possible to bypass the tree mentioned above and just store them directly in an object as a quick lookup. Case sensitivity still needs taking into account, but it's still really fast for both adding routes and resolving routes. Keeping these static routes out of the tree can greatly reduce the size of the tree.Parsing static paths
tokenizePath()
parses the routepath
into tokens. It's already really fast, so I've left that alone.Those tokens are then passed to
tokensToParser()
, which does a lot of stuff. For entirely static paths, without anyparams
, I believe this is doing a lot of unnecessary work.I've introduced
staticPathToParser()
, which creates a simpler implementation ofPathParser
. This is only used in cases where the path is entirely static. It skips most of the work. For example, rather than using aRegExp
to check the path it uses a===
check instead (also taking account of thesensitive
andstrict
options as required). This not only makes the addition of routes faster, it also makes the parser itself more efficient.Further work
All of these changes assume a static prefix to the
path
. If an application has a dynamic section at the start of all its paths then these changes won't help at all. There needs to be a reasonable spread within the tree for there to be any benefit. In those problematic cases, the binary search proposal would still be beneficial, as would the caching proposal.This is nowhere near an exhaustive list of things to do, but:
There are several places, e.g.
pathParserRanker.ts
, where I just hacked in whatever changes I needed, without much regard for anything else. I'll worry about that once the bigger picture becomes a bit clearer.Update: I've now written a lot more tests and there were problems in some edge cases. Most of these problems were with
end: false
, but the handling of thesensitive
option wasn't quite correct either. I've just pushed fixes for most of those problems. The remaining (known) problems are just withend: false
, and only occur in cases where it has the same score as another route. As noted in #2153, I think that would best be addressed by includingend
in the score calculation.