
ARROW-17004: [Java] Add utility to bind Arrow data to JDBC parameters #13589

Merged
merged 8 commits into from
Jul 26, 2022

Conversation

lidavidm
Member

This extends the arrow-jdbc adapter to also allow taking Arrow data and using it to bind JDBC PreparedStatement parameters, allowing you to "round trip" data to a certain extent. This was factored out of arrow-adbc since it's not strictly tied to ADBC.

@lidavidm
Member Author

lidavidm commented Jul 12, 2022

TODOs:

  • Add documentation
  • Are we handling time/timestamp types properly when time zones come into play?
  • Add date support as well

@lidavidm lidavidm marked this pull request as ready for review July 13, 2022 16:47
@lidavidm
Member Author

CC @toddfarmer @lwhite1

Not all types are supported here but a core set is. If the approach looks reasonable I can extend coverage to at least Decimal and Binary types.

@lwhite1
Contributor

lwhite1 commented Jul 14, 2022

Overall, this looks really nice to me.

One minor nit (not from the current change set): the statement on line 87, "Currently, it is not possible to define a custom type conversion for a supported or unsupported type.", has me scratching my head a bit. Should it just say "it is not possible to define a custom type conversion"? If there's a quick re-phrasing that helps, it might be worth adding.

@lidavidm
Member Author

Thanks for taking a look!

Updated the docs, and implemented binders for binary types and decimals.

@lidavidm
Member Author

@emkornfield @liyafan82 @pitrou any opinions on having this functionality (binding Arrow data -> JDBC prepared statement parameters) here?

@pitrou
Member

pitrou commented Jul 19, 2022

Hmm, I'm out of my depth here, but I guess it looks useful? The main downside being the one-row-at-a-time mechanics, I suppose.

@liyafan82
Contributor

Interesting work! Thanks. @lidavidm
I found an example for executeUpdate; does it also support executeQuery?

@lidavidm
Member Author

> Interesting work! Thanks. @lidavidm I found an example for executeUpdate; does it also support executeQuery?

Yes. Really, the only thing this module does is call setString, setInt, etc. for you; it's up to the application to then call executeQuery, addBatch, etc., for maximum flexibility. For instance, in ADBC it's used with addBatch/executeBatch:

https://github.com/apache/arrow-adbc/blob/2485d7c3da217a7190f86128d769a7d0445755ab/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/JdbcStatement.java#L160-L166
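To make that division of labor concrete, here is a minimal, self-contained sketch of the idea (the class and interface names below are illustrative stand-ins, not the actual arrow-jdbc API): per-column binders only invoke the appropriate set* call for each row, while executing the statement stays with the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class BinderSketch {
    /** Stand-in for the set* methods of java.sql.PreparedStatement. */
    interface ParamSetter {
        void setLong(int parameterIndex, long value);
        void setString(int parameterIndex, String value);
    }

    /** Binds one column value of one row; mirrors the column-binder concept. */
    interface ColumnBinder {
        void bind(ParamSetter stmt, int parameterIndex, int rowIndex);
    }

    /** Binds every parameter for every row and records the calls made. */
    static List<String> bindAllRows() {
        long[] ids = {1L, 2L};
        String[] names = {"a", "b"};
        ColumnBinder[] binders = {
            (stmt, p, row) -> stmt.setLong(p, ids[row]),
            (stmt, p, row) -> stmt.setString(p, names[row]),
        };
        List<String> calls = new ArrayList<>();
        ParamSetter stmt = new ParamSetter() {
            public void setLong(int p, long v) { calls.add("setLong(" + p + "," + v + ")"); }
            public void setString(int p, String v) { calls.add("setString(" + p + "," + v + ")"); }
        };
        // One pass per row: bind all parameters, then the caller would invoke
        // addBatch()/executeUpdate()/executeQuery() itself.
        for (int row = 0; row < ids.length; row++) {
            for (int col = 0; col < binders.length; col++) {
                binders[col].bind(stmt, col + 1, row);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(bindAllRows());
    }
}
```

Keeping execution out of the binder is what lets one module serve executeQuery, executeUpdate, and batched inserts alike.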

@liyafan82
Contributor

> > Interesting work! Thanks. @lidavidm I found an example for executeUpdate; does it also support executeQuery?
>
> Yes. Really, the only thing this module does is call setString, setInt, etc. for you; it's up to the application to then call executeQuery, addBatch, etc., for maximum flexibility. For instance, in ADBC it's used with addBatch/executeBatch:
>
> https://github.com/apache/arrow-adbc/blob/2485d7c3da217a7190f86128d769a7d0445755ab/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/JdbcStatement.java#L160-L166

Cool! I believe this is a super useful feature. I'd like to review it in the coming days.

-Currently, it is not possible to define a custom type conversion for a supported or unsupported type.
+Currently, it is not possible to override the type conversion for a supported type, or define a new conversion for an unsupported type.
Contributor

Member Author

What I mean is that you can't implement a custom Consumer and have it be used; all you can do is change which type is assumed by the existing converters. But I'll clarify this.

Contributor

Thanks.

* \(1) Strings longer than Integer.MAX_VALUE bytes (the maximum length
of a Java ``byte[]``) will cause a runtime exception.
* \(2) If the timestamp has a timezone, the JDBC type defaults to
TIMESTAMP_WITH_TIMEZONE. If the timestamp has no timezone,
Contributor

What would happen when a timezone is absent? Would the program throw an exception?

Member Author

It'll just call setTimestamp(int, Timestamp) instead of setTimestamp(int, Timestamp, Calendar); I'll update the doc.
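As a small illustration of that choice (the helper below is hypothetical, not the adapter's actual code): when the Arrow type carries a timezone string, it can be turned into a Calendar for the three-argument setTimestamp overload; when it is absent, no Calendar is built and the two-argument overload applies.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class TimestampBinding {
    /**
     * Hypothetical helper mirroring the behavior described above: returns a
     * Calendar for setTimestamp(int, Timestamp, Calendar) when a timezone is
     * present, or null when the plain setTimestamp(int, Timestamp) overload
     * should be used instead.
     */
    static Calendar calendarFor(String arrowTimezone) {
        if (arrowTimezone == null) {
            return null; // no timezone on the Arrow type: no Calendar needed
        }
        return Calendar.getInstance(TimeZone.getTimeZone(arrowTimezone));
    }

    public static void main(String[] args) {
        System.out.println(calendarFor("UTC").getTimeZone().getID());
        System.out.println(calendarFor(null));
    }
}
```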

Contributor

Thanks for the clarification.

*/
public abstract class BaseColumnBinder<V extends FieldVector> implements ColumnBinder {
protected V vector;
protected int jdbcType;
Contributor

Can we declare the fields as final?

Member Author

Done, thanks for catching that.

}

public BigIntBinder(BigIntVector vector, int jdbcType) {
super(vector, jdbcType);
Contributor

Is a type other than Types.BIGINT allowed here?

Member Author

In principle, I wanted to allow things like binding an Int64 vector to an Int field; maybe that is too much flexibility, though.

Contributor

I see. Thanks for the clarification.

return jdbcType == null ? new TinyIntBinder((TinyIntVector) vector) :
new TinyIntBinder((TinyIntVector) vector, jdbcType);
} else {
throw new UnsupportedOperationException(
Contributor

Maybe we can extract this statement, which is shared by all type widths, to the beginning of this method?

Member Author

Done.
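The extracted statement can be sketched like this (names here are illustrative, not the actual factory code): the optional JDBC type override is resolved against the per-type default once, at the top of the method, instead of repeating the null check inside every per-width branch.

```java
public class DefaultTypeResolution {
    /**
     * Resolves an optional explicit JDBC type against a default: callers may
     * pass null to get the natural java.sql.Types value for the vector.
     */
    static int resolve(Integer explicitJdbcType, int defaultJdbcType) {
        return explicitJdbcType == null ? defaultJdbcType : explicitJdbcType;
    }

    public static void main(String[] args) {
        // No explicit type: falls back to the default for the vector width.
        System.out.println(resolve(null, java.sql.Types.TINYINT));
        // Explicit override wins over the default.
        System.out.println(resolve(java.sql.Types.SMALLINT, java.sql.Types.TINYINT));
    }
}
```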

Comment on lines 39 to 40
JdbcParameterBinder(
final PreparedStatement statement,
Contributor

Is this intentionally package-private instead of private? Maybe add a comment on the relationship between the last two parameters?

Member Author

No, changed it to private, and added some docstrings + an explicit Preconditions check for the last two parameters.
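A sketch of such a precondition (hypothetical names; the actual code uses a Preconditions-style check as described above): the last two parameters must agree, in the sense that every bound parameter index needs exactly one matching binder.

```java
public class PreconditionSketch {
    /**
     * Illustrative check on the relationship between the two trailing
     * constructor parameters: one binder per PreparedStatement parameter index.
     */
    static void checkBindings(int[] parameterIndices, Object[] binders) {
        if (parameterIndices.length != binders.length) {
            throw new IllegalArgumentException(
                "Expected one binder per parameter index, got "
                    + binders.length + " binders for "
                    + parameterIndices.length + " indices");
        }
    }

    public static void main(String[] args) {
        // Matching lengths pass silently.
        checkBindings(new int[]{1, 2}, new Object[]{new Object(), new Object()});
        // Mismatched lengths are rejected up front with a descriptive message.
        try {
            checkBindings(new int[]{1}, new Object[]{});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in the constructor gives a clearer error than a later IndexOutOfBoundsException during binding.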

@lidavidm
Member Author

I've let this sit for a while, so, having fixed an additional bug I found, I'll merge this now (though it won't make 9.0.0).

@lidavidm lidavidm changed the title to ARROW-17004: [Java] Add utility to bind Arrow data to JDBC parameters Jul 26, 2022
@lidavidm lidavidm merged commit a5a2837 into apache:master Jul 26, 2022
@lidavidm lidavidm deleted the arrow-17004 branch July 26, 2022 16:59

@ursabot

ursabot commented Jul 26, 2022

Benchmark runs are scheduled for baseline = bbf249e and contender = a5a2837. a5a2837 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.34% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.36% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Failed] a5a28377 ec2-t3-xlarge-us-east-2
[Failed] a5a28377 test-mac-arm
[Finished] a5a28377 ursa-i9-9960x
[Finished] a5a28377 ursa-thinkcentre-m75q
[Failed] bbf249e0 ec2-t3-xlarge-us-east-2
[Failed] bbf249e0 test-mac-arm
[Finished] bbf249e0 ursa-i9-9960x
[Finished] bbf249e0 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Yicong-Huang added a commit to Texera/texera that referenced this pull request Dec 13, 2022
This PR bumps the Apache Arrow version from 9.0.0 to 10.0.0.

Main changes related to PyAmber:

## Java/Scala side:

- JDBC Driver for Arrow Flight SQL
([13800](apache/arrow#13800))
- Initial implementation of immutable Table API
([14316](apache/arrow#14316))
- Substrait, transaction, cancellation for Flight SQL
([13492](apache/arrow#13492))
- Read Arrow IPC, CSV, and ORC files by NativeDatasetFactory
([13811](apache/arrow#13811),
[13973](apache/arrow#13973),
[14182](apache/arrow#14182))
- Add utility to bind Arrow data to JDBC parameters
([13589](apache/arrow#13589))

## Python side:

- The batch_readahead and fragment_readahead arguments for scanning
Datasets are exposed in Python
([ARROW-17299](https://issues.apache.org/jira/browse/ARROW-17299)).
- ExtensionArrays can now be created from a storage array through the
pa.array(..) constructor
([ARROW-17834](https://issues.apache.org/jira/browse/ARROW-17834)).
- Converting ListArrays containing ExtensionArray values to numpy or
pandas works by falling back to the storage array
([ARROW-17813](https://issues.apache.org/jira/browse/ARROW-17813)).
- Casting Tables to a new schema now honors the nullability flag in the
target schema
([ARROW-16651](https://issues.apache.org/jira/browse/ARROW-16651)).

6 participants