GH-44: Clear signature columns before re-populating in ArrowFlightStatement#executeFlightInfoQuery #1135
Open
hkad98 wants to merge 1 commit into apache:main
…ghtStatement#executeFlightInfoQuery
ArrowFlightStatement#executeFlightInfoQuery appended the dataset schema columns to the statement's reused Meta.Signature without first clearing them, doubling the column list on every invocation.
When the FlightInfo has at least one endpoint, ArrowFlightJdbcVectorSchemaRootResultSet overwrites signature.columns from the actual stream schema and hides the duplication. With an empty endpoint list (reported against both Rust- and Denodo-based Flight SQL servers) that overwrite never runs and ResultSetMetaData#getColumnCount() reports 2x the schema width.
Regression introduced in 15.0.0 (GH-33475 prepared-statement parameter binding) when handle.signature became mutable across executions.
Adds a regression test that registers a mock query with no endpoints and asserts ResultSetMetaData#getColumnCount() matches the schema.
Closes apache#44.
Rationale
`ArrowFlightStatement#executeFlightInfoQuery` appends the dataset schema columns to the statement's reused `Meta.Signature` without first clearing them, so the column list doubles on every invocation. When the `FlightInfo` has at least one endpoint, `ArrowFlightJdbcVectorSchemaRootResultSet#populateData` overwrites `signature.columns` from the actual stream schema and hides the duplication. With an empty endpoint list (the case reported by @mingnuj in the original issue against a Rust-based Flight SQL server, and independently reproducible against Denodo Express 9.4.2) that overwrite never runs, and `ResultSetMetaData#getColumnCount()` reports `2× columnCount`. Calcite/Avatica then refuses the metadata with `Cannot have more columns with the same name.`
This is a regression introduced in 15.0.0 by the prepared-statement parameter binding work in GH-33475, which made `handle.signature` mutable and shared across the `prepareAndExecute` → `executeFlightInfoQuery` path.
Fix
One line: `signature.columns.clear()` before the existing `addAll(...)`. The fresh schema returned at execute time is authoritative; pre-existing entries on the signature are stale by definition. This is exactly the workaround @mingnuj proposed in the issue and confirmed working.
Test
Adds `ResultSetMetadataTest#testShouldNotDuplicateColumnsWhenFlightInfoHasNoEndpoints` plus a `LEGACY_REGULAR_NO_ENDPOINTS_SQL_CMD` fixture in `CoreMockedSqlProducers` (registered with an empty result-provider list, so `getFlightInfoStatement` returns a `FlightInfo` with zero endpoints, the bug-triggering shape). The test asserts `getColumnCount() == 1`, fails on `main` with `Expected: <1> but was <2>`, and passes after the fix.
`./gradlew`-equivalent run for this module:
Result: 1234 tests run, 0 failures, 0 errors, 44 skipped.
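For reviewers who want to see the mechanism in isolation, the following is a minimal sketch of the append-without-clear behavior and the one-line fix. It does not use the driver or Avatica classes: `Signature`, `prepare`, `execute`, and `columnCountAfterExecute` are hypothetical stand-ins, and the sketch assumes the signature already holds the schema columns by the time the execute step runs.

```java
import java.util.ArrayList;
import java.util.List;

public class SignatureColumnsSketch {

    /** Hypothetical stand-in for the statement's reused signature; not the Avatica class. */
    static final class Signature {
        final List<String> columns = new ArrayList<>();
    }

    /** Assumed prepare step: the schema columns are already recorded on the signature. */
    static Signature prepare(List<String> schemaColumns) {
        Signature signature = new Signature();
        signature.columns.addAll(schemaColumns);
        return signature;
    }

    /**
     * Assumed execute step: re-populates the columns from the fresh schema.
     * With clearFirst == false this models the pre-fix behavior (append only).
     */
    static void execute(Signature signature, List<String> schemaColumns, boolean clearFirst) {
        if (clearFirst) {
            signature.columns.clear(); // the one-line fix: drop stale entries first
        }
        signature.columns.addAll(schemaColumns);
    }

    /** Column count a caller sees when no endpoint overwrites the list afterwards. */
    static int columnCountAfterExecute(boolean clearFirst) {
        List<String> schema = List.of("id"); // single-column schema, as in the regression test
        Signature signature = prepare(schema);
        execute(signature, schema, clearFirst);
        return signature.columns.size();
    }

    public static void main(String[] args) {
        System.out.println("without clear: " + columnCountAfterExecute(false));
        System.out.println("with clear:    " + columnCountAfterExecute(true));
    }
}
```

In this model the append-only path yields two columns for a one-column schema, and the cleared path yields one, which mirrors the `Expected: <1> but was <2>` failure described above.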
Cross-server reproductions
The bug surfaces against any Flight SQL server that returns an empty endpoint list:
At the wire level, the same `FlightInfo` is parsed correctly as 1 column by `pyarrow.flight`, by ADBC C++ / Python, and by Denodo's own native JDBC driver; only `flight-sql-jdbc-driver` 18.x / 19.0.0 doubles the column count, confirming the issue is Java-side.
Closes #44.