Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver #38404

Merged
merged 23 commits into from
Nov 4, 2023

Conversation

aiguofer
Copy link
Contributor

@aiguofer aiguofer commented Oct 23, 2023

This PR is a combination of #33961 and #14627. The goal is to support parametrized queries through the Arrow Flight SQL JDBC driver.

An Arrow Flight SQL server returns a Schema for the PreparedStatement parameters. The driver then converts the Field list associated with the Schema into a list of AvaticaParameter. When the user sets values for the parameters, Avatica generates a list of TypedValue, which we then bind to each parameter vector. This conversion between Arrow and Avatica is handled by implementations of a AvaticaParameterConverter interface for each Arrow type. This interface which provides 2 methods:

  • createParameter: Create an AvaticaParameter from the given Arrow Field.
  • bindParameter: Cast the given TypedValue and bind it to the FieldVector at the specified index.

This PR purposely leaves out a few features:

  • We currently naively cast the TypedValue values assuming users set the type correctly. If this cast fails, we raise an exception letting the user know that the cast is not supported. This could be improved in subsequent PRs to do smarter conversions from other types.
  • We currently don't provide conversions for complex types such as List, Map, Struct, Union, Interval, and Duration. The stubs are there so they can be implemented as needed.
  • Tests for specific types have not been implemented. I'm not very familiar with a lot of these JDBC types so it's hard to implement rigorous tets.

@aiguofer aiguofer marked this pull request as draft October 23, 2023 15:34
@github-actions
Copy link

⚠️ GitHub issue #33475 has been automatically assigned in GitHub to PR creator.

@aiguofer aiguofer changed the title GH-33475: [Java][jdbc-driver] Add parameter binding for Prepared Statements GH-33475: [Java] Add parameter binding for Prepared Statements in JDBC driver Oct 23, 2023
@github-actions
Copy link

⚠️ GitHub issue #33475 has been automatically assigned in GitHub to PR creator.

@github-actions
Copy link

⚠️ GitHub issue #33475 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Oct 23, 2023
@aiguofer aiguofer marked this pull request as ready for review October 23, 2023 22:07
@aiguofer aiguofer marked this pull request as draft October 30, 2023 18:53
@github-actions
Copy link

⚠️ GitHub issue #33475 has been automatically assigned in GitHub to PR creator.

@aiguofer aiguofer marked this pull request as ready for review October 30, 2023 22:28
@aiguofer
Copy link
Contributor Author

I refactored this quite a bit to put Arrow <-> Avatica conversions in the same class for each Arrow type. This should help ensure we do this correctly for all Arrow types.

Currently, testing is pretty basic and we're missing conversions for complex types, but I think it's a pretty good starting point to implement the rest.

I'm not sure why the one test failed and can't figure out how to re-run it.

@aiguofer
Copy link
Contributor Author

aiguofer commented Nov 3, 2023

@lidavidm @jduo Hey all, I've spent a lot of time on this PR. It's a little frustrating that a PR that started well after mine has merged and now I have to deal with merge conflicts while mine still hasn't been thoroughly reviewed. We need this for one of our partner integrations and there's still been no movement on this. Could I at least get some eyes on this?

@lidavidm
Copy link
Member

lidavidm commented Nov 3, 2023

@aiguofer sorry about that - I'll give this a review over the weekend once Github is working again.

@jduo
Copy link
Member

jduo commented Nov 3, 2023

@lidavidm @jduo Hey all, I've spent a lot of time on this PR. It's a little frustrating that a PR that started well after mine has merged and now I have to deal with merge conflicts while mine still hasn't been thoroughly reviewed. We need this for one of our partner integrations and there's still been no movement on this. Could I at least get some eyes on this?

Hi @aiguofer . It looks good. I didn't have additional comments.

Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

None of my comments are critical - if you want to just file an issue to track them that would be fine

We currently naively cast the TypedValue values assuming users set the type correctly. If this cast fails, we raise an exception letting the user know that the cast is not supported. This could be improved in subsequent PRs to do smarter conversions from other types.

We can file a followup for this too.

We currently don't provide conversions for complex types such as List, Map, Struct, Union, Interval, and Duration. The stubs are there so they can be implemented as needed.

Ditto.

Tests for specific types have not been implemented. I'm not very familiar with a lot of these JDBC types so it's hard to implement rigorous tets.

I would be in favor of trying to set up integration suites with systems that depend on the driver, in the same way we have integration tests with Spark and Pandas (two other projects that heavily depend on Arrow). But this is also something we can defer.

*
* @param vector FieldVector that the parameter should be bound to.
* @param typedValue TypedValue to bind as a parameter.
* @param index Vector index that the TypedValue should be bound to.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

0-index, right? (I ask because JDBC is often 1-indexed)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct! that would be the index on the Arrow vector. I'll add to the docstring just to be more clear


@Override
public boolean bindParameter(FieldVector vector, TypedValue typedValue, int index) {
byte[] value = (byte[]) typedValue.toJdbc(null);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to handle value == null? Or in general how does Avatica handle null bind parameters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohhh good catch. Avatica doesn't really do anything special for null values so it just returns null.

For example:

  public Object toJdbc(Calendar calendar) {
    return this.value == null ? null : serialToJdbc(this.type, this.componentType, this.value, calendar);
  }

I guess we need to call setNull if the value is null?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah - do you want to handle that here, or do you want to file a follow-up for now?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, let's just get this in before something else conflicts with it


@Override
public boolean bindParameter(FieldVector vector, TypedValue typedValue, int index) {
// FIXME: how should we handle TZ? Do we need to convert the value to the TZ on the vector?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The timezone on the vector is purely for display; the underlying value is always UTC in Arrow.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool that's what I was thinking. At this point Avatica has already converted to UTC so that's perfect.

private void bind(FieldVector vector, TypedValue typedValue, int index) {
try {
if (!vector.getField().getType().accept(new BinderVisitor(vector, typedValue, index))) {
throw new RuntimeException(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is there a more appropriate exception than RuntimeException?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking some type of NotImplementedException would be ideal but I couldn't find an exception like that. Any thoughts on an alternative? UnsupportedOperationException seem to imply that it will not be supported, which might not be what we want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO UnsupportedOperationException is fine for this sort of thing

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting committer review Awaiting committer review labels Nov 4, 2023
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Nov 4, 2023
@lidavidm lidavidm merged commit fc8c6b7 into apache:main Nov 4, 2023
19 checks passed
@lidavidm lidavidm removed the awaiting merge Awaiting merge label Nov 4, 2023
@lidavidm
Copy link
Member

lidavidm commented Nov 4, 2023

I filed a general followups issue at #38585

Thanks again @aiguofer!

@aiguofer
Copy link
Contributor Author

aiguofer commented Nov 4, 2023

I filed a general followups issue at #38585

Thanks again @aiguofer!

Perfect sounds good! I can work on a few of those soon after this!

Copy link

After merging your PR, Conbench analyzed the 5 benchmarking runs that have been run so far on merge-commit fc8c6b7.

There was 1 benchmark result with an error:

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 5 possible false positives for unstable benchmarks that are known to sometimes produce them.

loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…in JDBC driver (apache#38404)

This PR is a combination of apache#33961 and apache#14627. The goal is to support parametrized queries through the Arrow Flight SQL JDBC driver.

An Arrow Flight SQL server returns a Schema for the `PreparedStatement` parameters. The driver then converts the `Field` list associated with the Schema into a list of `AvaticaParameter`. When the user sets values for the parameters, Avatica generates a list of `TypedValue`, which we then bind to each parameter vector. This conversion between Arrow and Avatica is handled by implementations of a `AvaticaParameterConverter` interface for each Arrow type. This interface which provides 2 methods:
- createParameter: Create an `AvaticaParameter` from the given Arrow `Field`.
- bindParameter: Cast the given `TypedValue` and bind it to the `FieldVector` at the specified index.

This PR purposely leaves out a few features:
- We currently naively cast the `TypedValue` values assuming users set the type correctly. If this cast fails, we raise an exception letting the user know that the cast is not supported. This could be improved in subsequent PRs to do smarter conversions from other types.
- We currently don't provide conversions for complex types such as List, Map, Struct, Union, Interval, and Duration. The stubs are there so they can be implemented as needed.
- Tests for specific types have not been implemented. I'm not very familiar with a lot of these JDBC types so it's hard to implement rigorous tets.

* Closes: apache#33475
* Closes: apache#35536

Authored-by: Diego Fernandez <[email protected]>
Signed-off-by: David Li <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…in JDBC driver (apache#38404)

This PR is a combination of apache#33961 and apache#14627. The goal is to support parametrized queries through the Arrow Flight SQL JDBC driver.

An Arrow Flight SQL server returns a Schema for the `PreparedStatement` parameters. The driver then converts the `Field` list associated with the Schema into a list of `AvaticaParameter`. When the user sets values for the parameters, Avatica generates a list of `TypedValue`, which we then bind to each parameter vector. This conversion between Arrow and Avatica is handled by implementations of a `AvaticaParameterConverter` interface for each Arrow type. This interface which provides 2 methods:
- createParameter: Create an `AvaticaParameter` from the given Arrow `Field`.
- bindParameter: Cast the given `TypedValue` and bind it to the `FieldVector` at the specified index.

This PR purposely leaves out a few features:
- We currently naively cast the `TypedValue` values assuming users set the type correctly. If this cast fails, we raise an exception letting the user know that the cast is not supported. This could be improved in subsequent PRs to do smarter conversions from other types.
- We currently don't provide conversions for complex types such as List, Map, Struct, Union, Interval, and Duration. The stubs are there so they can be implemented as needed.
- Tests for specific types have not been implemented. I'm not very familiar with a lot of these JDBC types so it's hard to implement rigorous tets.

* Closes: apache#33475
* Closes: apache#35536

Authored-by: Diego Fernandez <[email protected]>
Signed-off-by: David Li <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants