
ARROW-17004: [Java] Add utility to bind Arrow data to JDBC parameters #13589

Merged
merged 8 commits into from
Jul 26, 2022

Conversation

lidavidm
Member

This extends the arrow-jdbc adapter to also allow taking Arrow data and using it to bind JDBC PreparedStatement parameters, allowing you to "round trip" data to a certain extent. This was factored out of arrow-adbc since it's not strictly tied to ADBC.

@lidavidm
Member Author

lidavidm commented Jul 12, 2022

TODOs:

  • Add documentation
  • Are we handling time/timestamp types properly when time zones come into play?
  • Add date support as well

@lidavidm lidavidm marked this pull request as ready for review July 13, 2022 16:47
@lidavidm
Member Author

CC @toddfarmer @lwhite1

Not all types are supported here but a core set is. If the approach looks reasonable I can extend coverage to at least Decimal and Binary types.

@lwhite1
Contributor

lwhite1 commented Jul 14, 2022

Overall, this looks really nice to me.

One minor nit (not from the current change set): the statement on line 87, "Currently, it is not possible to define a custom type conversion for a supported or unsupported type.", has me scratching my head a bit. Should it just say "it is not possible to define a custom type conversion"? If there's a quick re-phrasing that helps, it might be worth adding.

@lidavidm
Member Author

Thanks for taking a look!

Updated the docs, and implemented binders for binary types and decimals.

@lidavidm
Member Author

@emkornfield @liyafan82 @pitrou any opinions on having this functionality (binding Arrow data -> JDBC prepared statement parameters) here?

@pitrou
Member

pitrou commented Jul 19, 2022

Hmm, I'm out of my depth here, but I guess it looks useful? The main downside being the one-row-at-a-time mechanics, I suppose.

@liyafan82
Contributor

Interesting work! Thanks. @lidavidm
I found an example for executeUpdate; does it also support executeQuery?

@lidavidm
Member Author

> Interesting work! Thanks. @lidavidm I found an example for executeUpdate; does it also support executeQuery?

Yes. Really, the only thing this module does is call setString, setInt, etc. for you; it's up to the application to then call executeQuery, addBatch, etc., for maximum flexibility. For instance, in ADBC it's used with addBatch/executeBatch:

https://github.com/apache/arrow-adbc/blob/2485d7c3da217a7190f86128d769a7d0445755ab/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/JdbcStatement.java#L160-L166
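To make that division of labor concrete, here is a minimal, self-contained sketch of the idea (the class and interface names below are illustrative stand-ins, not the actual arrow-jdbc API): per-column binders only invoke the appropriate set* call for each row, while executing the statement stays with the caller.

```java
import java.util.ArrayList;
import java.util.List;

public class BinderSketch {
    /** Stand-in for the set* methods of java.sql.PreparedStatement. */
    interface ParamSetter {
        void setLong(int parameterIndex, long value);
        void setString(int parameterIndex, String value);
    }

    /** Binds one column value of one row; mirrors the column-binder concept. */
    interface ColumnBinder {
        void bind(ParamSetter stmt, int parameterIndex, int rowIndex);
    }

    /** Binds every parameter for every row and records the calls made. */
    static List<String> bindAllRows() {
        long[] ids = {1L, 2L};
        String[] names = {"a", "b"};
        ColumnBinder[] binders = {
            (stmt, p, row) -> stmt.setLong(p, ids[row]),
            (stmt, p, row) -> stmt.setString(p, names[row]),
        };
        List<String> calls = new ArrayList<>();
        ParamSetter stmt = new ParamSetter() {
            public void setLong(int p, long v) { calls.add("setLong(" + p + "," + v + ")"); }
            public void setString(int p, String v) { calls.add("setString(" + p + "," + v + ")"); }
        };
        // One pass per row: bind all parameters, then the caller would invoke
        // addBatch()/executeUpdate()/executeQuery() itself.
        for (int row = 0; row < ids.length; row++) {
            for (int col = 0; col < binders.length; col++) {
                binders[col].bind(stmt, col + 1, row);
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println(bindAllRows());
    }
}
```

Keeping execution out of the binder is what lets one module serve executeQuery, executeUpdate, and batched inserts alike.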

@liyafan82
Contributor

> > Interesting work! Thanks. @lidavidm I found an example for executeUpdate; does it also support executeQuery?
>
> Yes. Really, the only thing this module does is call setString, setInt, etc. for you; it's up to the application to then call executeQuery, addBatch, etc., for maximum flexibility. For instance, in ADBC it's used with addBatch/executeBatch:
>
> https://github.com/apache/arrow-adbc/blob/2485d7c3da217a7190f86128d769a7d0445755ab/java/driver/jdbc/src/main/java/org/apache/arrow/adbc/driver/jdbc/JdbcStatement.java#L160-L166

Cool! I believe this is a super useful feature. I'd like to review it in the coming days.

-Currently, it is not possible to define a custom type conversion for a supported or unsupported type.
+Currently, it is not possible to override the type conversion for a supported type, or define a new conversion for an unsupported type.
Contributor

Member Author

What I mean is that you can't implement a custom Consumer and have it be used; all you can do is change which type is assumed by the existing converters. But I'll clarify this.

Contributor

Thanks.

* \(1) Strings longer than Integer.MAX_VALUE bytes (the maximum length
of a Java ``byte[]``) will cause a runtime exception.
* \(2) If the timestamp has a timezone, the JDBC type defaults to
TIMESTAMP_WITH_TIMEZONE. If the timestamp has no timezone,
Contributor

What would happen when a timezone is absent? Would the program throw an exception?

Member Author

It'll just call setTimestamp(int, Timestamp) instead of setTimestamp(int, Timestamp, Calendar); I'll update the doc.
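As a small illustration of that choice (the helper below is hypothetical, not the adapter's actual code): when the Arrow type carries a timezone string, it can be turned into a Calendar for the three-argument setTimestamp overload; when it is absent, no Calendar is built and the two-argument overload applies.

```java
import java.util.Calendar;
import java.util.TimeZone;

public class TimestampBinding {
    /**
     * Hypothetical helper mirroring the behavior described above: returns a
     * Calendar for setTimestamp(int, Timestamp, Calendar) when a timezone is
     * present, or null when the plain setTimestamp(int, Timestamp) overload
     * should be used instead.
     */
    static Calendar calendarFor(String arrowTimezone) {
        if (arrowTimezone == null) {
            return null; // no timezone on the Arrow type: no Calendar needed
        }
        return Calendar.getInstance(TimeZone.getTimeZone(arrowTimezone));
    }

    public static void main(String[] args) {
        System.out.println(calendarFor("UTC").getTimeZone().getID());
        System.out.println(calendarFor(null));
    }
}
```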

Contributor

Thanks for the clarification.

*/
public abstract class BaseColumnBinder<V extends FieldVector> implements ColumnBinder {
protected V vector;
protected int jdbcType;
Contributor

Can we declare the fields as final?

Member Author

Done, thanks for catching that.

}

public BigIntBinder(BigIntVector vector, int jdbcType) {
super(vector, jdbcType);
Contributor

Is a type other than Types.BIGINT allowed here?

Member Author

In principle, I wanted to allow things like binding an Int64 vector to an Int field; maybe that is too much flexibility, though.

Contributor

I see. Thanks for the clarification.

return jdbcType == null ? new TinyIntBinder((TinyIntVector) vector) :
new TinyIntBinder((TinyIntVector) vector, jdbcType);
} else {
throw new UnsupportedOperationException(
Contributor

Maybe we can extract this statement, which is shared by all type widths, to the beginning of this method?

Member Author

Done.
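The extracted statement can be sketched like this (names here are illustrative, not the actual factory code): the optional JDBC type override is resolved against the per-type default once, at the top of the method, instead of repeating the null check inside every per-width branch.

```java
public class DefaultTypeResolution {
    /**
     * Resolves an optional explicit JDBC type against a default: callers may
     * pass null to get the natural java.sql.Types value for the vector.
     */
    static int resolve(Integer explicitJdbcType, int defaultJdbcType) {
        return explicitJdbcType == null ? defaultJdbcType : explicitJdbcType;
    }

    public static void main(String[] args) {
        // No explicit type: falls back to the default for the vector width.
        System.out.println(resolve(null, java.sql.Types.TINYINT));
        // Explicit override wins over the default.
        System.out.println(resolve(java.sql.Types.SMALLINT, java.sql.Types.TINYINT));
    }
}
```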

Comment on lines 39 to 40
JdbcParameterBinder(
final PreparedStatement statement,
Contributor

Is this intentionally package-private instead of private? Maybe add a comment on the relationship between the last two parameters?

Member Author

No, changed it to private, and added some docstrings + an explicit Preconditions check for the last two parameters.
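A sketch of such a precondition (hypothetical names; the actual code uses a Preconditions-style check as described above): the last two parameters must agree, in the sense that every bound parameter index needs exactly one matching binder.

```java
public class PreconditionSketch {
    /**
     * Illustrative check on the relationship between the two trailing
     * constructor parameters: one binder per PreparedStatement parameter index.
     */
    static void checkBindings(int[] parameterIndices, Object[] binders) {
        if (parameterIndices.length != binders.length) {
            throw new IllegalArgumentException(
                "Expected one binder per parameter index, got "
                    + binders.length + " binders for "
                    + parameterIndices.length + " indices");
        }
    }

    public static void main(String[] args) {
        // Matching lengths pass silently.
        checkBindings(new int[]{1, 2}, new Object[]{new Object(), new Object()});
        // Mismatched lengths are rejected up front with a descriptive message.
        try {
            checkBindings(new int[]{1}, new Object[]{});
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing fast in the constructor gives a clearer error than a later IndexOutOfBoundsException during binding.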

@lidavidm
Member Author

I've let this sit for a while, so, having fixed an additional bug I found, I'll merge this now (though it won't make 9.0.0).

@lidavidm lidavidm changed the title to ARROW-17004: [Java] Add utility to bind Arrow data to JDBC parameters Jul 26, 2022
@lidavidm lidavidm merged commit a5a2837 into apache:master Jul 26, 2022
@lidavidm lidavidm deleted the arrow-17004 branch July 26, 2022 16:59

@ursabot

ursabot commented Jul 26, 2022

Benchmark runs are scheduled for baseline = bbf249e and contender = a5a2837. a5a2837 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Failed ⬇️0.0% ⬆️0.0%] ec2-t3-xlarge-us-east-2
[Failed ⬇️0.34% ⬆️0.0%] test-mac-arm
[Finished ⬇️0.0% ⬆️0.0%] ursa-i9-9960x
[Finished ⬇️0.36% ⬆️0.0%] ursa-thinkcentre-m75q
Buildkite builds:
[Failed] a5a28377 ec2-t3-xlarge-us-east-2
[Failed] a5a28377 test-mac-arm
[Finished] a5a28377 ursa-i9-9960x
[Finished] a5a28377 ursa-thinkcentre-m75q
[Failed] bbf249e0 ec2-t3-xlarge-us-east-2
[Failed] bbf249e0 test-mac-arm
[Finished] bbf249e0 ursa-i9-9960x
[Finished] bbf249e0 ursa-thinkcentre-m75q
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

Yicong-Huang added a commit to Texera/texera that referenced this pull request Dec 13, 2022
This PR bumps the Apache Arrow version from 9.0.0 to 10.0.0.

Main changes related to PyAmber:

## Java/Scala side:

- JDBC Driver for Arrow Flight SQL
([13800](apache/arrow#13800))
- Initial implementation of immutable Table API
([14316](apache/arrow#14316))
- Substrait, transaction, cancellation for Flight SQL
([13492](apache/arrow#13492))
- Read Arrow IPC, CSV, and ORC files by NativeDatasetFactory
([13811](apache/arrow#13811),
[13973](apache/arrow#13973),
[14182](apache/arrow#14182))
- Add utility to bind Arrow data to JDBC parameters
([13589](apache/arrow#13589))

## Python side:

- The batch_readahead and fragment_readahead arguments for scanning
Datasets are exposed in Python
([ARROW-17299](https://issues.apache.org/jira/browse/ARROW-17299)).
- ExtensionArrays can now be created from a storage array through the
pa.array(..) constructor
([ARROW-17834](https://issues.apache.org/jira/browse/ARROW-17834)).
- Converting ListArrays containing ExtensionArray values to numpy or
pandas works by falling back to the storage array
([ARROW-17813](https://issues.apache.org/jira/browse/ARROW-17813)).
- Casting Tables to a new schema now honors the nullability flag in the
target schema
([ARROW-16651](https://issues.apache.org/jira/browse/ARROW-16651)).

6 participants