ARROW-17004: [Java] Add utility to bind Arrow data to JDBC parameters #13589
Conversation
TODOs:
Not all types are supported here but a core set is. If the approach looks reasonable I can extend coverage to at least Decimal and Binary types.
Overall, this looks really nice to me. One minor nit (not from the current change set): the statement on line 87, "Currently, it is not possible to define a custom type conversion for a supported or unsupported type.", has me scratching my head a bit. Should it just say "it is not possible to define a custom type conversion"? If there's a quick re-phrasing that helps, it might be worth adding.
Thanks for taking a look! Updated the docs, and implemented binders for binary types and decimals.
@emkornfield @liyafan82 @pitrou any opinions on having this functionality (binding Arrow data -> JDBC prepared statement parameters) here? |
Hmm, I'm out of my depth here, but I guess it looks useful? The main downside being the one-row-at-a-time mechanics, I suppose. |
Interesting work! Thanks. @lidavidm |
Yes, or really, the only thing this module does is call the PreparedStatement setters for you.
Cool! I believe this is a super useful feature. I'd like to review it in the following days. |
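To make the one-row-at-a-time mechanics concrete, here is a minimal sketch of how the binder added by this PR is meant to be driven. It assumes the builder-style API described in the PR (JdbcParameterBinder.builder(...).bindAll().build()); the table name and statement text are made up, and this is an illustration rather than the PR's verbatim example.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

import org.apache.arrow.adapter.jdbc.JdbcParameterBinder;
import org.apache.arrow.vector.VectorSchemaRoot;

public class BindAllRows {
  // Bind each row of the VectorSchemaRoot to the statement's parameters and
  // execute once per row: the "one-row-at-a-time" flow discussed above.
  public static void insertAll(Connection connection, VectorSchemaRoot root)
      throws SQLException {
    try (PreparedStatement statement =
        connection.prepareStatement("INSERT INTO my_table VALUES (?, ?)")) {
      JdbcParameterBinder binder =
          JdbcParameterBinder.builder(statement, root).bindAll().build();
      while (binder.next()) {      // advances one row, calling a setXxx per column
        statement.executeUpdate(); // one statement execution per Arrow row
      }
    }
  }
}
```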
docs/source/java/jdbc.rst (Outdated)
- Currently, it is not possible to define a custom type conversion for a
- supported or unsupported type.
+ Currently, it is not possible to override the type conversion for a
+ supported type, or define a new conversion for an unsupported type.
Sorry, I think we do support overriding the conversion now. Please see https://github.com/apache/arrow/blob/master/java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/JdbcToArrowConfig.java#L76
What I mean is that you can't implement a custom Consumer and have it be used; all you can do is change what type is assumed by the existing converters. But I'll clarify this.
Thanks.
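For readers following this thread: the override that does exist (via the JdbcToArrowConfig linked above) only changes which JDBC type the built-in converters assume for a column; it does not plug in a custom Consumer. A rough sketch under that assumption, using the JdbcFieldInfo and JdbcToArrowConfigBuilder APIs as they existed around the time of this PR:

```java
import java.sql.Types;
import java.util.Calendar;
import java.util.HashMap;
import java.util.Map;

import org.apache.arrow.adapter.jdbc.JdbcFieldInfo;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
import org.apache.arrow.memory.RootAllocator;

public class ExplicitTypeMapping {
  public static JdbcToArrowConfig makeConfig() {
    // Tell the existing converters to treat result set column 1 as BIGINT,
    // overriding what the driver reports; no custom Consumer is involved.
    Map<Integer, JdbcFieldInfo> explicitTypes = new HashMap<>();
    explicitTypes.put(1, new JdbcFieldInfo(Types.BIGINT));
    return new JdbcToArrowConfigBuilder(new RootAllocator(), Calendar.getInstance())
        .setExplicitTypesByColumnIndex(explicitTypes)
        .build();
  }
}
```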
* \(1) Strings longer than Integer.MAX_VALUE bytes (the maximum length
  of a Java ``byte[]``) will cause a runtime exception.
* \(2) If the timestamp has a timezone, the JDBC type defaults to
  TIMESTAMP_WITH_TIMEZONE. If the timestamp has no timezone,
What would happen when a timezone is absent? Would the program throw an exception?
It'll just call setTimestamp(int, Timestamp) instead of setTimestamp(int, Timestamp, Calendar). I'll update the doc.
Thanks for the clarification.
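A small sketch of the behavior just described, i.e. picking the setTimestamp overload based on whether the Arrow timestamp type carries a timezone. The method shape here is illustrative, not the PR's actual binder code:

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.util.Calendar;
import java.util.TimeZone;

public class TimestampBindingSketch {
  // Bind an epoch-millisecond timestamp, using the Calendar overload only
  // when the Arrow type has a timezone attached.
  public static void bind(PreparedStatement statement, int parameterIndex,
      long epochMillis, String timezone) throws SQLException {
    Timestamp value = new Timestamp(epochMillis);
    if (timezone != null) {
      // Timezone present: let the driver interpret the value in that zone.
      Calendar calendar = Calendar.getInstance(TimeZone.getTimeZone(timezone));
      statement.setTimestamp(parameterIndex, value, calendar);
    } else {
      // No timezone: fall back to the two-argument overload.
      statement.setTimestamp(parameterIndex, value);
    }
  }
}
```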
 */
public abstract class BaseColumnBinder<V extends FieldVector> implements ColumnBinder {
  protected V vector;
  protected int jdbcType;
Can we declare the fields as final?
Done, thanks for catching that.
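The resulting shape would be roughly the following; the constructor is inferred from the snippets in this review, and the ColumnBinder interface is assumed to be the one under review here:

```java
import org.apache.arrow.vector.FieldVector;

// Base class with the fields made final, per the review comment above: both
// are assigned exactly once, in the constructor.
public abstract class BaseColumnBinder<V extends FieldVector> implements ColumnBinder {
  protected final V vector;
  protected final int jdbcType;

  public BaseColumnBinder(V vector, int jdbcType) {
    this.vector = vector;
    this.jdbcType = jdbcType;
  }
}
```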
}

public BigIntBinder(BigIntVector vector, int jdbcType) {
  super(vector, jdbcType);
Is a type other than Types.BIGINT allowed here?
In principle, I wanted to allow things like binding an Int64 vector to an Int field; maybe that is too much flexibility, though.
I see. Thanks for the clarification.
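To illustrate the flexibility being discussed: the binder always writes the value as a long, while the declared jdbcType can differ from Types.BIGINT. A sketch along these lines (the bind signature is an assumption, not the merged interface):

```java
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

import org.apache.arrow.vector.BigIntVector;

public class BigIntBinder extends BaseColumnBinder<BigIntVector> {
  public BigIntBinder(BigIntVector vector) {
    this(vector, Types.BIGINT); // the natural default for an Int64 vector
  }

  public BigIntBinder(BigIntVector vector, int jdbcType) {
    // jdbcType may be something other than Types.BIGINT, e.g. Types.INTEGER,
    // when binding an Int64 vector to a narrower column.
    super(vector, jdbcType);
  }

  public void bind(PreparedStatement statement, int parameterIndex, int rowIndex)
      throws SQLException {
    // Whatever jdbcType was declared, the value itself is set as a long.
    statement.setLong(parameterIndex, vector.get(rowIndex));
  }
}
```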
return jdbcType == null ? new TinyIntBinder((TinyIntVector) vector) :
    new TinyIntBinder((TinyIntVector) vector, jdbcType);
} else {
  throw new UnsupportedOperationException(
Maybe we can extract this statement for all type widths to the beginning of this method?
Done.
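The suggested refactor, sketched as a switch with one shared failure path so the UnsupportedOperationException is not repeated per bit width (names follow the snippets above but are illustrative, not the merged code):

```java
import org.apache.arrow.vector.FieldVector;
import org.apache.arrow.vector.TinyIntVector;
import org.apache.arrow.vector.types.pojo.ArrowType;

public class IntBinderFactory {
  // One default branch covers every unhandled width instead of duplicating
  // the throw inside each if/else arm.
  public static ColumnBinder forIntType(ArrowType.Int type, FieldVector vector,
      Integer jdbcType) {
    switch (type.getBitWidth()) {
      case 8:
        return jdbcType == null
            ? new TinyIntBinder((TinyIntVector) vector)
            : new TinyIntBinder((TinyIntVector) vector, jdbcType);
      // ... cases for 16, 32, and 64 bits elided ...
      default:
        throw new UnsupportedOperationException(
            "No column binder implemented for bit width " + type.getBitWidth());
    }
  }
}
```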
JdbcParameterBinder(
    final PreparedStatement statement,
Is this intentionally package-private instead of private? Maybe add a comment on the relationship between the last two parameters?
No, changed it to private, and added some docstrings + an explicit Preconditions check for the last two parameters.
I've let this sit for a while, so having fixed an additional bug I found, I'll merge this now (though not for 9.0.0).
Benchmark runs are scheduled for baseline = bbf249e and contender = a5a2837. a5a2837 is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
This PR bumps Apache Arrow version from 9.0.0 to 10.0.0. Main changes related to PyAmber:

## Java/Scala side:
- JDBC Driver for Arrow Flight SQL ([13800](apache/arrow#13800))
- Initial implementation of immutable Table API ([14316](apache/arrow#14316))
- Substrait, transaction, cancellation for Flight SQL ([13492](apache/arrow#13492))
- Read Arrow IPC, CSV, and ORC files by NativeDatasetFactory ([13811](apache/arrow#13811), [13973](apache/arrow#13973), [14182](apache/arrow#14182))
- Add utility to bind Arrow data to JDBC parameters ([13589](apache/arrow#13589))

## Python side:
- The batch_readahead and fragment_readahead arguments for scanning Datasets are exposed in Python ([ARROW-17299](https://issues.apache.org/jira/browse/ARROW-17299)).
- ExtensionArrays can now be created from a storage array through the pa.array(..) constructor ([ARROW-17834](https://issues.apache.org/jira/browse/ARROW-17834)).
- Converting ListArrays containing ExtensionArray values to numpy or pandas works by falling back to the storage array ([ARROW-17813](https://issues.apache.org/jira/browse/ARROW-17813)).
- Casting Tables to a new schema now honors the nullability flag in the target schema ([ARROW-16651](https://issues.apache.org/jira/browse/ARROW-16651)).
This extends the arrow-jdbc adapter to also allow taking Arrow data and using it to bind JDBC PreparedStatement parameters, allowing you to "round trip" data to a certain extent. This was factored out of arrow-adbc since it's not strictly tied to ADBC.