-
Notifications
You must be signed in to change notification settings - Fork 1.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Cosmos][VectorSearch] Non Streaming Order By Query #39897
[Cosmos][VectorSearch] Non Streaming Order By Query #39897
Conversation
…ngPolicy (Azure#40004) * Adding changes for vectorIndex and vectorEmbeddingPolicy * Adding some necessary comments * Adding test case * updating enum values * Updating test case * Updating test case * Updating test case * updating changelog * Updating test case * Resolving comments * Resolving comments * Fixing test case * Resolving comments * Resolving Comments * Fixing build issues * Resolving comments * Resolving Comments
API change check APIView has identified API level changes in this PR and created following API reviews. |
Azure Pipelines successfully started running 1 pipeline(s). |
} else { | ||
initialPageSize = pageSizeWithTopOrLimit; | ||
} | ||
initialPageSize = pageSizeWithTopOrLimit; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because for Top greater than 100, and maxItemCount not set, initialPageSize by default is 100. So the top value was always being 100.
And if Non-streaming order by, top or limit will always be there, and their value should be picked up.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
limit is dangerous - because for several queries offset/limit are actually client-side processing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But initialPageSize
is used as the size of our PQ. For limit if we use the default initialPageSize
, then we will not get the correct number of items back.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From some of the discussions on the JS SDK, I think we need to have OFFSET+LIMIT
tests for at least the single partition and multiple partition cases, since they are processed differently by the service.
...ure-cosmos/src/main/java/com/azure/cosmos/implementation/query/NonStreamingOrderByUtils.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/azp run java - cosmos - tests |
Azure Pipelines successfully started running 1 pipeline(s). |
/check-enforcer override |
Query and service team has decided to go ahead with the opt-out env variable vector search feature, need to release today.
Description
Java follow up to the .Net PR: Azure/azure-cosmos-dotnet-v3#4362
nonStreamingOrderBy
that is now present in the query plan, we choose to create a separate query execution context for these types of operations.NonStreamingOrderByQueryQueryContext
, essential for vector search capabilities. Previously, the SDK operated under the assumption that documents returned in response to ORDER BY queries were fully ordered across all continuations. However, with the newly implemented non-streaming OrderBy feature in the backend, this assumption is no longer valid.TOP +1
orLIMIT+OFFSET+1
for every document producer. This query pipeline functions as a blocking pipeline, similar to the GROUP BY. We will accumulate all the backend responses for every document producer in it's own PQ which is ordered using the comparator passed by the BE. Once all the document producers are processed, the are re-balanced together to yield the results.TOP +1
orLIMIT+OFFSET+1
, because once the PQ has reached the TOP size, we want to add one document, and then remove based on the ordering. If we just usedTOP
orLIMIT+OFFSET
, adding a new document after reaching the limit would make the PQ 50% bigger, which could mess up the results.All SDK Contribution checklist:
General Guidelines and Best Practices
Testing Guidelines