[Cosmos][VectorSearch] Non Streaming Order By Query #39897

aayush3011 · 2024-04-24T15:10:49Z

Description

Java follow-up to the .NET PR: Azure/azure-cosmos-dotnet-v3#4362

Using the nonStreamingOrderBy flag that is now present in the query plan, we create a separate query execution context for these types of operations.
This update introduces NonStreamingOrderByQueryQueryContext, which is essential for vector search capabilities. Previously, the SDK assumed that documents returned in response to ORDER BY queries were fully ordered across all continuations. With the newly implemented non-streaming ORDER BY feature in the backend, this assumption is no longer valid.
The approach is to keep a priority queue (PQ) of size TOP + 1 or LIMIT + OFFSET + 1 for every document producer. This query pipeline is a blocking pipeline, similar to GROUP BY. We accumulate all the backend responses for each document producer in its own PQ, which is ordered using the comparator passed by the backend (BE). Once all the document producers have been processed, their PQs are re-balanced together to yield the results.
The PQ size is TOP + 1 or LIMIT + OFFSET + 1 because, once the PQ holds TOP documents, we want to add one more document and then remove one based on the ordering. If we used just TOP or LIMIT + OFFSET, adding a new document after reaching the limit would make the PQ 50% bigger, which could mess up the results.
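
The add-then-evict idea above can be sketched in a few lines of Java. This is only an illustration under stated assumptions, not the code in this PR: it uses `java.util.PriorityQueue`, the class and method names (`BoundedTopKSketch`, `boundedQueue`, `mergeTopK`) are hypothetical, and the comparator is passed in by the caller rather than taken from the backend query plan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch: names and structure are illustrative, not the SDK's.
final class BoundedTopKSketch {

    /**
     * Collects one producer's items into a queue that never retains more than
     * `limit` items, where `order` sorts the best item first.
     */
    public static <T> PriorityQueue<T> boundedQueue(Iterable<T> items, int limit, Comparator<T> order) {
        // The comparator is reversed so the head of the heap is the worst
        // retained item. Capacity limit + 1 leaves room for the one extra
        // item that is added before an eviction.
        PriorityQueue<T> pq = new PriorityQueue<>(limit + 1, order.reversed());
        for (T item : items) {
            pq.add(item);
            if (pq.size() > limit) {
                pq.poll(); // drop the worst item, keeping `limit` items
            }
        }
        return pq;
    }

    /**
     * Blocking merge: drains every producer into its own bounded queue, then
     * combines the survivors and returns the best `limit` items in order.
     */
    public static <T> List<T> mergeTopK(List<? extends Iterable<T>> producers, int limit, Comparator<T> order) {
        List<T> survivors = new ArrayList<>();
        for (Iterable<T> producer : producers) {
            survivors.addAll(boundedQueue(producer, limit, order));
        }
        survivors.sort(order);
        return new ArrayList<>(survivors.subList(0, Math.min(limit, survivors.size())));
    }
}
```

For example, with two producers returning the scores `[0.9, 0.1, 0.5]` and `[0.3, 0.7, 0.2]`, a limit of 3, and natural ordering, `mergeTopK` returns `[0.1, 0.2, 0.3]`. For an OFFSET/LIMIT query the limit passed in would be LIMIT + OFFSET, with the offset applied afterwards; that step is left out of the sketch.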

All SDK Contribution checklist:

The pull request does not introduce [breaking changes]
CHANGELOG is updated for new features, bug fixes or other significant changes.
I have read the contribution guidelines.

General Guidelines and Best Practices

Title of the pull request is clear and informative.
There are a small number of commits, each of which has an informative message. This means that previously merged commits do not appear in the history of the PR. For more information on cleaning up the commits in your PR, see this page.

Testing Guidelines

Pull request includes test coverage for the included changes.

…ngPolicy (Azure#40004) * Adding changes for vectorIndex and vectorEmbeddingPolicy * Adding some necessary comments * Adding test case * updating enum values * Updating test case * Updating test case * Updating test case * updating changelog * Updating test case * Resolving comments * Resolving comments * Fixing test case * Resolving comments * Resolving Comments * Fixing build issues * Resolving comments * Resolving Comments

azure-sdk · 2024-05-03T20:09:16Z

API change check

APIView has identified API level changes in this PR and created following API reviews.

com.azure:azure-cosmos

azure-pipelines · 2024-05-17T06:49:13Z

Azure Pipelines successfully started running 1 pipeline(s).

xinlian12 · 2024-05-17T15:06:09Z

...rc/main/java/com/azure/cosmos/implementation/query/DocumentQueryExecutionContextFactory.java

- } else {
- initialPageSize = pageSizeWithTopOrLimit;
- }
+ initialPageSize = pageSizeWithTopOrLimit;


Why the change here?

aayush3011 · 2024-05-17

Because when TOP is greater than 100 and maxItemCount is not set, initialPageSize defaults to 100, so the TOP value was always 100.

And for non-streaming ORDER BY, TOP or LIMIT will always be present, and its value should be picked up.

FabianMeiswinkel · 2024-05-17

LIMIT is dangerous, because for several queries OFFSET/LIMIT are actually processed client-side.

aayush3011 · 2024-05-17

But initialPageSize is used as the size of our PQ. For LIMIT, if we use the default initialPageSize, we will not get the correct number of items back.

Pilchie · 2024-05-17

From some of the discussions on the JS SDK, I think we need OFFSET+LIMIT tests for at least the single-partition and multi-partition cases, since they are processed differently by the service.

...ure-cosmos/src/main/java/com/azure/cosmos/implementation/query/NonStreamingOrderByUtils.java

xinlian12

LGTM, thanks

aayush3011 · 2024-05-17T16:11:18Z

/azp run java - cosmos - tests

azure-pipelines · 2024-05-17T16:11:39Z

Azure Pipelines successfully started running 1 pipeline(s).

aayush3011 · 2024-05-17T19:54:50Z

/azp run java - cosmos - tests

azure-pipelines · 2024-05-17T19:55:09Z

Azure Pipelines successfully started running 1 pipeline(s).

aayush3011 · 2024-05-17T20:28:07Z

/azp run java - cosmos - tests

azure-pipelines · 2024-05-17T20:28:27Z

Azure Pipelines successfully started running 1 pipeline(s).

aayush3011 · 2024-05-17T22:52:40Z

/azp run java - cosmos - tests

azure-pipelines · 2024-05-17T22:52:58Z

Azure Pipelines successfully started running 1 pipeline(s).

aayush3011 · 2024-05-18T00:12:16Z

/azp run java - cosmos - tests

azure-pipelines · 2024-05-18T00:12:36Z

Azure Pipelines successfully started running 1 pipeline(s).

aayush3011 · 2024-05-18T03:32:51Z

/azp run java - cosmos - tests

azure-pipelines · 2024-05-18T03:33:10Z

Azure Pipelines successfully started running 1 pipeline(s).

kushagraThapar · 2024-05-20T00:05:58Z

/check-enforcer override

The query and service teams have decided to go ahead with the vector search feature behind an opt-out environment variable; we need to release today.

Initial changes

540a16d

github-actions bot added the Cosmos label Apr 24, 2024

aayush3011 added 6 commits April 25, 2024 12:27

Initial changes

6f49c75

Merge branch 'main' into users/akataria/nonStreamingOrderBy

97509eb

Merge branch 'main' into users/akataria/nonStreamingOrderBy

86b36d3

Initial changes

a979c11

Initial changes

e2756a5

Initial changes

8be2277

aayush3011 marked this pull request as ready for review May 3, 2024 22:11

aayush3011 requested review from kushagraThapar, FabianMeiswinkel, kirankumarkolli, xinlian12, milismsft, simorenoh, jeet1995 and Pilchie as code owners May 3, 2024 22:11

aayush3011 requested review from azure-sdk, kushagraThapar and FabianMeiswinkel and removed request for Pilchie, kirankumarkolli, kushagraThapar, milismsft, jeet1995, FabianMeiswinkel, simorenoh and xinlian12 May 3, 2024 22:13

xinlian12 reviewed May 17, 2024

...ure-cosmos/src/main/java/com/azure/cosmos/implementation/query/NonStreamingOrderByUtils.java (outdated)

xinlian12 approved these changes May 17, 2024

Resolving comments, adding new test cases

7002362

Adding argument to run emulator tests

508e94a

aayush3011 requested review from hallipr, weshaggard, benbp and JimSuplizio as code owners May 17, 2024 19:34

fixing emulator test pipeline

822bd67

fixing emulator test pipeline

46fe7cb

aayush3011 added 2 commits May 17, 2024 16:46

Adding logging for variable AZURE_COSMOS_DISABLE_NON_STREAMING_ORDER_BY

5657b75

Adding logging for variable AZURE_COSMOS_DISABLE_NON_STREAMING_ORDER_BY

015a77c

fixing emulator test pipeline

f87be45

kushagraThapar merged commit 0c4e817 into Azure:main May 20, 2024
85 of 93 checks passed
