Added support for basic incremental index sorting #3656
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Performance bottleneck
I am using a lot of multi-table simple queries with
ORDER BY
(and eventuallyOFFSET/LIMIT
), which aren't optimized in any sense. All records are fetched, a final sorting is done and eventually anOFFSET/LIMIT
is applied. There is an existing optimization in H2 based on theSelect.sortUsingIndex
flag, which hints that the records are already fetched in a sorted way (because an index is used), so a final sorting is not required. Also, LIMIT clause has a lot to benefit from this, as only some records are fetched. However, this works only for single-table queries.Example
The most basic example is something like
SELECT * FROM TEST1 CROSS JOIN TEST2 ORDER BY TEST1.F1 TEST2.F2 LIMIT 5
. Right now, H2 sequentially scans both tables, sorts the result set according toORDER BY TEST1.F1 TEST2.F2
and retrieves the first 5 rows. If both tables have ~1000 unique records, this operation takes ~450ms (embedded anonymous in-memory database v2.1.214).Research
LIMIT
is also optimized due to this technique.LIMIT
in the context of multi-tableORDER BY
(LIMIT 1
is considerably faster thanLIMIT 100000
even for multi-table queries). Of course, these are C based back-ends, but the theoretic approach is relevant here.LIMIT
values. This may be misleading, as the final sorting performance difference may spoil the performance check.Solution
I suggest using the best b-tree index for the
topFilterTable
- the one that matches the longest prefix of columns from theORDER BY
clause as possible. Using such index will ensure that the result set is partially sorted on the firstSelect.alreadySorted
elements fromORDER BY
.When
OFFSET/LIMIT
is specified, onlyLIMIT
elements shall be retrieved initially. To ensure completeness, extra records should be fetched until a "bigger" record is encountered based on the firstSelect.alreadySorted
fields. Sorting these and applyingLIMIT
is a complete and sound solution.Using something like
LIMIT X
will makeSELECT
to retrieve onlyX
records and eventually fetches some extra records to ensure completeness.ORDER BY
has many fields matched by an index or the records have a high variance. Very few extra records are fetched (maybe none). In this case, the number of extra comparisons is low, so the overhead is clearly acceptable comparing to the fetching of all records.ORDER BY
and these have a very low variance. Many extra records are fetched (maybe the whole result-set). In this case, the number of extra comparisons is high, but the number of fields for which one comparison is done is very low (1/2). Therefore, the overhead of applying this solution on result-sets with low variance is negligible.Result
The same SQL I provided earlier now runs in ~6 ms. The change never compromises as this does not interfere with join order or index selection. Only when it is safe to use/upgrade an index,
Select.alreadySorted
is used.CROSS JOIN
queries withORDER BY
andLIMIT
are working considerably faster.ORDER BY
are not affected by the change.ORDER BY
andLIMIT
ORDER BY
andLIMIT
, but always preserves the invariant proposed by the index in the first place.