Integrate Stored Procedure and Query Plan request routing to Gateway V2 endpoint #47759
Open
jeet1995 wants to merge 69 commits into
Conversation
Member
Author
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Member
Author
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Member
Author
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
xinlian12
reviewed
Jan 30, 2026
xinlian12
reviewed
Jan 30, 2026
xinlian12
reviewed
Jan 30, 2026
xinlian12
reviewed
Jan 30, 2026
xinlian12
reviewed
Jan 30, 2026
mbhaskar
reviewed
Jan 30, 2026
mbhaskar
reviewed
Jan 30, 2026
xinlian12
reviewed
Jun 13, 2026
Member
|
✅ Review complete (01:52) Posted 3 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
xinlian12
reviewed
Jun 13, 2026
xinlian12
reviewed
Jun 13, 2026
Member
|
✅ Review complete (17:17) Posted 2 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
xinlian12
reviewed
Jun 13, 2026
Member
|
✅ Review complete (51:55) Posted 1 inline comment(s). Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage |
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
readManyByPartitionKeys validates any caller-supplied custom query by fetching a query plan and asserting it is single-partition and non-hybrid. Until now that validation called fetchQueryPlanForValidation with no DocumentCollection, so the plan request was pinned to Gateway V1 (useGatewayMode = (partitionKeyDefinition == null)).

Thread the container's DocumentCollection from RxDocumentClientImpl.validateCustomQueryForReadManyByPartitionKeys -> DocumentQueryExecutionContextFactory.fetchQueryPlanForValidation so QueryPlanRetriever has the PartitionKeyDefinition needed to convert PartitionKeyInternal-formatted queryRanges from the thin client (Gateway V2) proxy into the EPK-hex Range<String> entries the query pipeline consumes.

With this wiring, the validation query plan goes to the thin client when the client is configured for it, and remains on Gateway V1 otherwise. No behavior change on the non-thin-client path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
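The routing gate described in this commit can be sketched as follows. This is a minimal illustration and not the SDK's code: `PkDefinitionSketch`, `QueryPlanRouteSketch`, `Route`, and the `thinClientEnabled` flag are hypothetical names, and only the `useGatewayMode = (partitionKeyDefinition == null)` condition is taken from the commit message.

```java
import java.util.List;

// Hypothetical stand-in for the SDK's PartitionKeyDefinition; only its presence matters here.
final class PkDefinitionSketch {
    final List<String> paths;

    PkDefinitionSketch(List<String> paths) {
        this.paths = paths;
    }
}

final class QueryPlanRouteSketch {
    enum Route { GATEWAY_V1, THIN_CLIENT }

    // Without a partition key definition the request stays on Gateway V1 (the gate quoted above).
    // With one, the request may go to the thin client, but only if the client is configured for it.
    static Route choose(PkDefinitionSketch partitionKeyDefinition, boolean thinClientEnabled) {
        boolean useGatewayMode = (partitionKeyDefinition == null);
        if (useGatewayMode || !thinClientEnabled) {
            return Route.GATEWAY_V1;
        }
        return Route.THIN_CLIENT;
    }
}
```

Under these assumptions, passing `null` reproduces the old validation behavior, while threading a definition through is what makes the thin-client route reachable.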
- Add ReadManyByPartitionKeyQueryPlanRoutingTest unit tests that pin the useGatewayMode gate in QueryPlanRetriever: gateway mode when DocumentCollection is null and partitioned mode when a PartitionKeyDefinition is present.
- Add three readManyByPartitionKeys E2E tests to ThinClientQueryE2ETest that exercise the validation QueryPlan path through Direct TCP (baseline) and Gateway V2 (thin client), covering no-custom-query, projection+filter, and parameterized variants. Each thin-client diagnostics page is asserted to use the :10250 endpoint.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
QueryPlan requests intentionally carry no RCS/CL headers (matches the V1 HTTP behavior). When the V2 thin-client routes the QueryPlan precursor through the same :10250 endpoint as the data query, the spy must skip the QueryPlan frame so the assertion checks the actual data-query frame. This mirrors the IS_QUERY_PLAN_REQUEST filter on the V1 path.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
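A minimal sketch of the spy filter this commit describes, assuming the spy records one entry per outgoing frame. `CapturedFrameSketch`, `SpyFilterSketch`, and both field names are hypothetical; the real test inspects RNTBD frames rather than these stand-in objects.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical record of a frame seen by the spy on the :10250 endpoint.
final class CapturedFrameSketch {
    final boolean isQueryPlan;        // true for the QueryPlan precursor request
    final boolean carriesRcsClHeaders; // stand-in for "RCS/CL headers are present"

    CapturedFrameSketch(boolean isQueryPlan, boolean carriesRcsClHeaders) {
        this.isQueryPlan = isQueryPlan;
        this.carriesRcsClHeaders = carriesRcsClHeaders;
    }
}

final class SpyFilterSketch {
    // Drop QueryPlan frames so header assertions only see data-query frames.
    static List<CapturedFrameSketch> dataQueryFrames(List<CapturedFrameSketch> frames) {
        return frames.stream()
            .filter(frame -> !frame.isQueryPlan)
            .collect(Collectors.toList());
    }

    // The assertion target: every remaining (data-query) frame carries the headers.
    static boolean dataFramesCarryHeaders(List<CapturedFrameSketch> frames) {
        return dataQueryFrames(frames).stream().allMatch(frame -> frame.carriesRcsClHeaders);
    }
}
```

Without the filter, the header-less QueryPlan frame would fail the check even though the data query is correct.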
Member
Author
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
The thin-client/Gateway-V2 proxy returns a non-2xx response with a raw, non-JSON, NUL-padded error body for invalid-syntax queries. In RxGatewayStoreModel.validateOrThrow, new CosmosError(body) attempted to parse that body as JSON and threw IllegalArgumentException, which escaped the method before the existing status-carrying throw could run. Upstream then wrapped it as statusCode 0.

Wrap the CosmosError(body) construction in a narrow try/catch (IllegalArgumentException) and fall back to the non-parsing CosmosError(errorCode, message) constructor with a sanitized body. The existing throw now fires with the correct status (400) and the proxy error text is preserved.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
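The fallback pattern can be sketched in isolation. Everything below is a hypothetical stand-in: `ErrorBodySketch.parse` imitates a JSON-parsing constructor with a crude shape check, `sanitize` simply strips NUL characters, and `StatusExceptionSketch` plays the role of the status-carrying exception. None of this is the SDK's CosmosError implementation.

```java
// Hypothetical status-carrying exception, standing in for the SDK's own exception type.
final class StatusExceptionSketch extends RuntimeException {
    final int statusCode;

    StatusExceptionSketch(int statusCode, String message) {
        super(message);
        this.statusCode = statusCode;
    }
}

final class ErrorBodySketch {
    final String message;
    final boolean parsedAsJson;

    private ErrorBodySketch(String message, boolean parsedAsJson) {
        this.message = message;
        this.parsedAsJson = parsedAsJson;
    }

    // Stand-in for a JSON-parsing constructor: rejects anything not shaped like a JSON object.
    static ErrorBodySketch parse(String body) {
        String trimmed = body == null ? "" : body.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new IllegalArgumentException("error body is not JSON");
        }
        return new ErrorBodySketch(trimmed, true);
    }

    // Assumed sanitization: remove NUL padding and surrounding whitespace.
    static String sanitize(String body) {
        return body == null ? "" : body.replace("\u0000", "").trim();
    }

    // Narrow try/catch: only the parse failure is handled, and it never escapes.
    static ErrorBodySketch fromResponseBody(String body) {
        try {
            return parse(body);
        } catch (IllegalArgumentException e) {
            return new ErrorBodySketch(sanitize(body), false);
        }
    }

    // The status-carrying throw now always runs for non-2xx responses.
    static void validateOrThrow(int statusCode, String body) {
        if (statusCode >= 200 && statusCode < 300) {
            return;
        }
        throw new StatusExceptionSketch(statusCode, fromResponseBody(body).message);
    }
}
```

Keeping the catch limited to IllegalArgumentException means other failures still surface as before, which matches the "narrow" wording in the commit.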
F1: strict 400 + unconditional thin-client endpoint check on invalid query, locking the statusCode-0->400 fix.
F2: ordered-vs-sorted-set document-ID comparison (ORDER BY sequence-compared, others set-compared).
F3: ID-set equality + no-duplicate check across drained continuation pages.
F4: numeric-tolerance (1e-6) comparison for scalar and GROUP BY aggregates to avoid float-formatting false mismatches.
F5: validated vector/full-text/hybrid queries match Direct vs thin-client end-to-end through the proxy.

Validated live via -Pthinclient: 84 tests, 0 failures.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
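The F2 to F4 comparisons can be illustrated with a few small helpers. This is a sketch under stated assumptions: `ParityChecksSketch` and its method names are hypothetical, and the 1e-6 tolerance is treated as an absolute difference, which the commit message does not specify.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

final class ParityChecksSketch {
    // F2: ORDER BY results are compared as sequences, everything else as sorted sets.
    // Result counts are assumed to be asserted separately, so set comparison is enough here.
    static boolean idsMatch(List<String> directIds, List<String> thinClientIds, boolean hasOrderBy) {
        if (hasOrderBy) {
            return directIds.equals(thinClientIds);
        }
        return new TreeSet<>(directIds).equals(new TreeSet<>(thinClientIds));
    }

    // F3: no document ID may appear twice across the drained continuation pages.
    static boolean hasNoDuplicates(List<List<String>> pages) {
        Set<String> seen = new HashSet<>();
        for (List<String> page : pages) {
            for (String id : page) {
                if (!seen.add(id)) {
                    return false;
                }
            }
        }
        return true;
    }

    // F4: aggregates match when they differ by at most 1e-6 (assumed absolute tolerance).
    static boolean aggregatesMatch(double direct, double thinClient) {
        return Math.abs(direct - thinClient) <= 1e-6;
    }
}
```

The tolerance check is what lets values such as `0.1 + 0.2` and `0.3` compare as equal even though their floating-point representations differ.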
Rewrite THINCLIENT_TEST_MATRIX.md as a reviewable test-design specification reverse-engineered from the committed ThinClientQueryE2ETest code (84 tests). Documents the differential-testing oracle (Direct :443 baseline vs thin client :10250 SUT), the data-model fixture, every assertion contract (endpoint provenance, ordered-vs-unordered ID equality, scalar/GROUP BY tolerance), the full 84-test matrix, the F1-F5 hardened special cases (continuation draining, invalid-query 400, vector/FTS/hybrid ranking, readMany validation path), advertised-feature coverage, and known gaps (CountIf/DCount/MultipleOrderBy) for reviewer sign-off.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Member
Author
|
@sdkReviewAgent |
Member
Author
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Executes actionable review feedback on ThinClientQueryE2ETest plus two harness fixes surfaced during live validation against thin-client-multi-region-ci.

ThinClientQueryE2ETest:
- Strict ordering: ORDER BY results validated with isStrictlyOrdered across the board (stricter-by-default, per reviewer guidance) instead of set parity.
- Add testMultipleOrderBy() with a composite-index container to cover the MultipleOrderBy query feature.
- Add testDCount() using the canonical Cosmos DCount idiom (COUNT over a DISTINCT VALUE subquery); SQL-standard COUNT(DISTINCT ...) is not valid Cosmos SQL grammar.
- Reword multi-EPK-range comment/javadoc to reflect emulator/backend reality (multiple metadata ranges served by a single backend partition; SDK routing and query pipeline still exercised).

Harness fixes:
- TestSuiteBase: add wait-and-poll utility waitForCollectionToBeAvailableToRead (predicate on NotFound/substatus 1013) to deflake "Collection is not yet available for read" on freshly created containers; update call sites in OrderbyDocumentQueryTest, NonStreamingOrderByQueryVectorSearchTest, QueryValidationTests, ReadFeedCollectionsTest.
- SinglePartitionDocumentQueryTest: make the processMessage Mockito assertion mode-aware (thin-client routes query plan + query => times(2)).

Validated live against thin-client-multi-region-ci: 86/86 green.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
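The wait-and-poll harness fix can be sketched roughly as below. This is not the TestSuiteBase code: `ProbeFailureSketch`, `WaitForReadSketch`, the `Runnable` probe, and the timeout and interval parameters are all hypothetical; only the retry predicate (NotFound, taken here as status 404, with substatus 1013) comes from the commit message.

```java
import java.time.Duration;

// Hypothetical failure type carrying the status and substatus of a failed read probe.
final class ProbeFailureSketch extends RuntimeException {
    final int statusCode;
    final int subStatusCode;

    ProbeFailureSketch(int statusCode, int subStatusCode) {
        super("status=" + statusCode + ", substatus=" + subStatusCode);
        this.statusCode = statusCode;
        this.subStatusCode = subStatusCode;
    }
}

final class WaitForReadSketch {
    // Retry only the "not yet available for read" case: NotFound with substatus 1013.
    static boolean isNotYetAvailable(ProbeFailureSketch e) {
        return e.statusCode == 404 && e.subStatusCode == 1013;
    }

    // Runs the probe until it succeeds; returns the number of attempts made.
    // Any other failure, or a retryable failure after the deadline, is rethrown.
    static int waitForRead(Runnable readProbe, Duration timeout, Duration interval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        int attempts = 0;
        while (true) {
            attempts++;
            try {
                readProbe.run();
                return attempts;
            } catch (ProbeFailureSketch e) {
                if (!isNotYetAvailable(e) || System.nanoTime() - deadline >= 0) {
                    throw e;
                }
            }
            try {
                Thread.sleep(interval.toMillis());
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for container", ie);
            }
        }
    }
}
```

Restricting the retry to one status and substatus pair keeps genuine failures, such as a bad request, from being masked by the polling loop.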
Member
Author
|
/azp run java - cosmos - tests |
|
Azure Pipelines successfully started running 1 pipeline(s). |
Summary
This PR integrates Query Plan and Stored Procedure request routing into the Gateway V2 thin-client (proxy) path, adds the RNTBD protocol surface the proxy needs to generate a query plan, and deserializes the proxy-generated query plan back into the client query pipeline. It also fixes a thin-client query-plan error-status regression and a hybrid/full-text diagnostics gap, and ships an oracle-style E2E suite that validates thin-client parity against Direct TCP.
Production changes (azure-cosmos)
RNTBD protocol surface (Gateway V2 proxy)
- RntbdConstants.java: new QueryPlan operation type (0x0042) and two new request headers, SupportedQueryFeatures (0x00FF, String) and QueryVersion (0x0100, SmallString). IDs match the server-side proxy (ADO PR 1982503).
- RntbdRequestHeaders.java: populate the SupportedQueryFeatures and QueryVersion RNTBD tokens from their HTTP-header equivalents so the proxy can extract them from the RNTBD body.
- RntbdRequestFrame.java: wire the new QueryPlan operation type onto the request frame.
Query-plan routing & deserialization
- QueryPlanRetriever.java: advertise CountIf in SUPPORTED_QUERY_FEATURES; route getQueryPlanThroughGatewayAsync through thin-client mode and pass the partitionKeyDefinition; add a defense-in-depth guard that converts a non-2xx / malformed query-plan response into a clean 400 instead of a leaked exception.
- PartitionedQueryExecutionInfo.java: thin-client deserialization overload that accepts the partition-key definition (and response timeline), used to construct the query pipeline from a proxy-generated query plan.
- QueryInfo.java: getGroupByAliasToAggregateType() and CountIf handling.
- DocumentQueryExecutionContextFactory.java, IDocumentQueryClient.java: thread the partition-key definition into query-plan retrieval.
Stored-procedure routing
- RxDocumentClientImpl.java, RxDocumentServiceRequest.java, ThinClientStoreModel.java: route stored-procedure execution through the Gateway V2 thin-client path.
Bug fix: thin-client query-plan error status (statusCode 0→400)
- RxGatewayStoreModel.validateOrThrow: a thin-client query-plan error frame (non-JSON, NUL-padded body) caused new CosmosError(body) to throw, which escaped before the intended throw dce and surfaced upstream as statusCode 0. It now falls back to a sanitized CosmosError, so the existing throw carries the real 400 with the server-provided substatus and message preserved. Strictly additive: 2xx responses and valid-JSON error bodies are byte-identical to before; only a previously leaked exception path is corrected.
Diagnostics fix: hybrid / full-text
- HybridSearchDocumentQueryExecutionContext.java: propagate the component-query client-side request statistics into the synthetic final FeedResponse, so endpoint diagnostics still show the core response path for RRF(...) / full-text queries.
Partition-key range routing
- PartitionKeyInternalHelper.java: convert PartitionKeyInternal ranges into sorted EPK ranges for multi-range routing.
Test changes (azure-cosmos-tests)
The monolithic ThinClientE2ETest is replaced (−378) by a focused thin-client E2E suite (TestNG group thinclient, proxy :10250):
- ThinClientQueryE2ETest
- ThinClientChangeFeedE2ETest: forFullRange(), forLogicalPartition(), incremental change feed
- ThinClientPointOperationE2ETest
- ThinClientStoredProcedureE2ETest: PartitionKey.NONE
- PartitionKeyInternalTest: PartitionKeyInternal ranges to sorted EPK ranges
- QueryPlanRetrieverSupportedFeaturesTest
- ReadManyByPartitionKeyQueryPlanRoutingTest: readMany query-plan routing
- GatewayReadConsistencyStrategySpyWireTest
Plus: ThinClientTestBase / TestSuiteBase thin-client helpers, pom.xml thinclient profile wiring, and THINCLIENT_TEST_MATRIX.md coverage matrix.
Query-test methodology
ThinClientQueryE2ETest runs each query shape against the same seeded data through two paths, Direct and thin client.
Assertions: (1) thin-client diagnostics include a request to the :10250 proxy endpoint, (2) Direct and thin-client result counts match, (3) result contents match, preserving order for ORDER BY queries and compared as sorted sets otherwise.
Assertion hardening (F1–F5)
- 400 + unconditional :10250 endpoint assertion on invalid query (locks the statusCode 0→400 fix above; no longer tolerates 0).
- ... ORDER BY).
- 1e-6 numeric tolerance for scalar and GROUP BY aggregates (avoids SUM/AVG float-formatting false mismatches).
Query coverage validated
- ORDER BY, DISTINCT, TOP, OFFSET/LIMIT
- GROUP BY
- JOIN, EXISTS, LIKE, BETWEEN
- RRF(...) queries
Query feature header validation
QueryPlanRetrieverSupportedFeaturesTest verifies Java now advertises CountIf while intentionally not advertising:
- ListAndSetAggregate: Java does not yet implement MAKELIST/MAKESET aggregation.
- HybridSearchSkipOrderByRewrite: currently fails Java thin-client hybrid validation against staging with a backend 400 / SC1001 syntax error.
All SDK Contribution checklist:
Testing Guidelines