[Java][FlightSQL] Column Duplication When Selecting from no result record in Arrow Flight SQL JDBC Driver #44467

Open
mingnuj opened this issue Oct 18, 2024 · 1 comment


mingnuj commented Oct 18, 2024

Describe the bug, including details regarding any error messages, version, and platform.

I encountered an issue when working with Arrow Flight SQL, and I would appreciate your help.

I created a test_table with the following columns:

CREATE TABLE default.public.test_table (col1 int);

When I run a SELECT * query without inserting any data into the table, the columns appear duplicated.
The output was attached as a screenshot (image not preserved here).

I am developing the database myself, using Rust and connecting through FlightSQL.
When a SELECT is executed against an empty table, I handle this in the get_flight_info method by returning a FlightInfo with no endpoints.

This did not occur in Arrow Flight SQL JDBC Driver versions prior to 15.0.0, but it started happening with version 15.0.0.

Component(s)

Java


mingnuj commented Oct 28, 2024

I've been building each Flight SQL-related branch following the Arrow 15.0.0 release, and found that the issue first appears after commit [GH-33475: Add parameter binding for Prepared Statements in JDBC driver (#38404)].

After debugging last week, I observed the following:

When querying from DBeaver, the prepareAndExecute function in ArrowFlightMetaImpl.java (flight-sql-jdbc-core) is called. Before this commit, prepareAndExecute created a fresh signature on each call and held it in a final local variable. With the parameter binding change, the code now uses the handle's signature instead, so the same mutable object is shared across calls.

  • Previous code (ArrowFlightMetaImpl.java, line 160):
    final Signature signature = newSignature(query); // Immutable
    final long updateCount = statementType.equals(StatementType.UPDATE) ? preparedStatement.executeUpdate() : -1;
    synchronized (callback.getMonitor()) {
        callback.clear();
        callback.assign(signature, null, updateCount);
    }
    callback.execute();
    final MetaResultSet metaResultSet = MetaResultSet.create(handle.connectionId, handle.id, false, signature, null);
  • Updated code (ArrowFlightMetaImpl.java, line 196):
    PreparedStatement preparedStatement = prepareForHandle(query, handle);
    final StatementType statementType = preparedStatement.getType();
    final long updateCount = statementType.equals(StatementType.UPDATE) ? preparedStatement.executeUpdate() : -1;
    synchronized (callback.getMonitor()) {
        callback.clear();
        callback.assign(handle.signature, null, updateCount);
    }
    callback.execute();
    final MetaResultSet metaResultSet = MetaResultSet.create(handle.connectionId, handle.id, false, handle.signature, null); // Mutable

The addition is needed for parameter binding, yet it introduces an issue where columns are appended repeatedly. During execution, the executeFlightInfoQuery function in ArrowFlightStatement.java (in flight-sql-jdbc-core) is invoked, which appends columns.

@Override
public FlightInfo executeFlightInfoQuery() throws SQLException {
   final PreparedStatement preparedStatement = getConnection().getMeta().getPreparedStatement(handle);
   final Meta.Signature signature = getSignature();
   if (signature == null) {
       return null;
   }

   final Schema resultSetSchema = preparedStatement.getDataSetSchema();
   signature.columns.addAll(ConvertUtils.convertArrowFieldsToColumnMetaDataList(resultSetSchema.getFields()));
   setSignature(signature);

   return preparedStatement.executeQuery();
}
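
To make the mechanism concrete, here is a minimal, self-contained sketch of the append behaviour. It does not use the real driver classes: DemoSignature and appendSchemaColumns are hypothetical stand-ins for the signature object and for the addAll step shown above, and the column is represented as a plain string.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the signature: only the mutable column list matters here.
final class DemoSignature {
    final List<String> columns = new ArrayList<>();
}

class SignatureDuplicationDemo {
    // Mimics the append step: schema columns are added to whatever
    // the shared signature already holds, without clearing it first.
    static void appendSchemaColumns(DemoSignature signature, List<String> schemaFields) {
        signature.columns.addAll(schemaFields);
    }

    public static void main(String[] args) {
        DemoSignature shared = new DemoSignature(); // reused across calls, like handle.signature
        List<String> schema = List.of("col1");

        appendSchemaColumns(shared, schema); // first pass
        appendSchemaColumns(shared, schema); // second pass on the same object

        System.out.println(shared.columns); // prints [col1, col1]
    }
}
```

If the signature is recreated for every call, the second append starts from an empty list and the duplication does not show up, which matches the behaviour before the commit.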

With every query, the signature column set is duplicated and propagated. In cases where the result has endpoints (e.g., a typical doPut operation in FlightSqlClient), the column list seems consistent as the signature is replaced with the result signature. However, in my case, where there are no endpoints, the columns appear duplicated.

To address this, I temporarily fixed the issue by simply calling clear on the column list before adding new columns.

@Override
public FlightInfo executeFlightInfoQuery() throws SQLException {
   final PreparedStatement preparedStatement = getConnection().getMeta().getPreparedStatement(handle);
   final Meta.Signature signature = getSignature();
   if (signature == null) {
       return null;
   }

   final Schema resultSetSchema = preparedStatement.getDataSetSchema();
   signature.columns.clear();
   signature.columns.addAll(ConvertUtils.convertArrowFieldsToColumnMetaDataList(resultSetSchema.getFields()));
   setSignature(signature);

   return preparedStatement.executeQuery();
}
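
The effect of the workaround can be shown with the same kind of stand-in types. This is only a sketch of the clear-then-add pattern; ColumnHolder and replaceSchemaColumns are hypothetical names, not part of the driver.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the signature's mutable column list.
final class ColumnHolder {
    final List<String> columns = new ArrayList<>();
}

class IdempotentColumnsDemo {
    // Clear before adding, so repeated calls leave exactly one copy of the schema.
    static void replaceSchemaColumns(ColumnHolder signature, List<String> schemaFields) {
        signature.columns.clear();
        signature.columns.addAll(schemaFields);
    }

    public static void main(String[] args) {
        ColumnHolder shared = new ColumnHolder();
        List<String> schema = List.of("col1");

        replaceSchemaColumns(shared, schema);
        replaceSchemaColumns(shared, schema); // second call does not duplicate

        System.out.println(shared.columns); // prints [col1]
    }
}
```

This makes the step idempotent, but it still mutates an object shared through the handle, which is why it feels like a workaround to me.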

However, I believe a more fundamental solution could be beneficial here. Could anyone provide insights or assistance?
Thank you.
