ARROW-17004: [Java] Add utility to bind Arrow data to JDBC parameters #13589

Merged on Jul 26, 2022 (8 commits)
Changes from 6 commits
112 changes: 108 additions & 4 deletions docs/source/java/jdbc.rst
@@ -84,8 +84,10 @@ inconsistent scale. A RoundingMode can be set to handle these cases:
}
}

Currently, it is not possible to define a custom type conversion for a
supported or unsupported type.
The mapping from JDBC type to Arrow type can be overridden via the
``JdbcToArrowConfig``, but it is not possible to customize the
conversion from JDBC value to Arrow value itself, nor is it possible
to define a conversion for an unsupported type.

Type Mapping
------------
@@ -120,7 +122,7 @@ The JDBC to Arrow type mapping can be obtained at runtime from
+--------------------+--------------------+-------+
| DOUBLE | Double | |
+--------------------+--------------------+-------+
| FLOAT | Float | |
| FLOAT | Float32 | |
+--------------------+--------------------+-------+
| INTEGER | Int32 | |
+--------------------+--------------------+-------+
@@ -138,7 +140,7 @@ The JDBC to Arrow type mapping can be obtained at runtime from
+--------------------+--------------------+-------+
| NVARCHAR | Utf8 | |
+--------------------+--------------------+-------+
| REAL | Float | |
| REAL | Float32 | |
+--------------------+--------------------+-------+
| SMALLINT | Int16 | |
+--------------------+--------------------+-------+
@@ -172,3 +174,105 @@ The JDBC to Arrow type mapping can be obtained at runtime from
.. _setArraySubTypeByColumnIndexMap: https://arrow.apache.org/docs/java/reference/org/apache/arrow/adapter/jdbc/JdbcToArrowConfigBuilder.html#setArraySubTypeByColumnIndexMap-java.util.Map-
.. _setArraySubTypeByColumnNameMap: https://arrow.apache.org/docs/java/reference/org/apache/arrow/adapter/jdbc/JdbcToArrowConfigBuilder.html#setArraySubTypeByColumnNameMap-java.util.Map-
.. _ARROW-17006: https://issues.apache.org/jira/browse/ARROW-17006

VectorSchemaRoot to PreparedStatement Parameter Conversion
==========================================================

The adapter can bind rows of Arrow data from a VectorSchemaRoot to
parameters of a JDBC PreparedStatement. This can be accessed via the
JdbcParameterBinder class. Each call to next() will bind parameters
from the next row of data, and then the application can execute the
statement, call addBatch(), etc. as desired. Null values will lead to
a setNull call with an appropriate JDBC type code (listed below).

.. code-block:: java

   final JdbcParameterBinder binder =
       JdbcParameterBinder.builder(statement, root).bindAll().build();
   while (binder.next()) {
     statement.executeUpdate();
   }
   // Use a VectorLoader to update the root
   binder.reset();
   while (binder.next()) {
     statement.executeUpdate();
   }

The mapping of vectors to parameters, the JDBC type code used by the
converters, and the type conversions themselves can all be customized:

.. code-block:: java

   final JdbcParameterBinder binder =
       JdbcParameterBinder.builder(statement, root)
           .bind(/*parameterIndex*/2, /*columnIndex*/0)
           .bind(/*parameterIndex*/1, customColumnBinderInstance)
           .build();

Type Mapping
------------

The Arrow to JDBC type mapping can be obtained at runtime via
a method on ColumnBinder.

+----------------------------+----------------------------+-------+
| Arrow Type | JDBC Type | Notes |
+============================+============================+=======+
| Binary | VARBINARY (setBytes) | |
+----------------------------+----------------------------+-------+
| Bool | BOOLEAN (setBoolean) | |
+----------------------------+----------------------------+-------+
| Date32 | DATE (setDate) | |
+----------------------------+----------------------------+-------+
| Date64 | DATE (setDate) | |
+----------------------------+----------------------------+-------+
| Decimal128 | DECIMAL (setBigDecimal) | |
+----------------------------+----------------------------+-------+
| Decimal256 | DECIMAL (setBigDecimal) | |
+----------------------------+----------------------------+-------+
| FixedSizeBinary | BINARY (setBytes) | |
+----------------------------+----------------------------+-------+
| Float32 | REAL (setFloat) | |
+----------------------------+----------------------------+-------+
| Int8 | TINYINT (setByte) | |
+----------------------------+----------------------------+-------+
| Int16 | SMALLINT (setShort) | |
+----------------------------+----------------------------+-------+
| Int32 | INTEGER (setInt) | |
+----------------------------+----------------------------+-------+
| Int64 | BIGINT (setLong) | |
+----------------------------+----------------------------+-------+
| LargeBinary | LONGVARBINARY (setBytes) | |
+----------------------------+----------------------------+-------+
| LargeUtf8 | LONGVARCHAR (setString) | \(1) |
+----------------------------+----------------------------+-------+
| Time[s] | TIME (setTime) | |
+----------------------------+----------------------------+-------+
| Time[ms] | TIME (setTime) | |
+----------------------------+----------------------------+-------+
| Time[us] | TIME (setTime) | |
+----------------------------+----------------------------+-------+
| Time[ns] | TIME (setTime) | |
+----------------------------+----------------------------+-------+
| Timestamp[s] | TIMESTAMP (setTimestamp) | \(2) |
+----------------------------+----------------------------+-------+
| Timestamp[ms] | TIMESTAMP (setTimestamp) | \(2) |
+----------------------------+----------------------------+-------+
| Timestamp[us] | TIMESTAMP (setTimestamp) | \(2) |
+----------------------------+----------------------------+-------+
| Timestamp[ns] | TIMESTAMP (setTimestamp) | \(2) |
+----------------------------+----------------------------+-------+
| Utf8 | VARCHAR (setString) | |
+----------------------------+----------------------------+-------+

* \(1) Strings longer than Integer.MAX_VALUE bytes (the maximum length
of a Java ``byte[]``) will cause a runtime exception.
* \(2) If the timestamp has a timezone, the JDBC type defaults to
TIMESTAMP_WITH_TIMEZONE. If the timestamp has no timezone,
Review comment (Contributor): What would happen when a timezone is absent? Would the program throw an exception?

Review comment (Member Author): It'll just call setTimestamp(int, Timestamp) instead of setTimestamp(int, Timestamp, Calendar). I'll update the doc.

Review comment (Contributor): Thanks for the clarification.

technically there is not a correct conversion from Arrow value to
JDBC value, because a JDBC Timestamp is in UTC, and we have no
timezone information. In this case, the default binder will call
`setTimestamp(int, Timestamp)
<https://docs.oracle.com/en/java/javase/11/docs/api/java.sql/java/sql/PreparedStatement.html#setTimestamp(int,java.sql.Timestamp)>`_,
which will lead to the driver using the "default timezone" (that of
the Java VM).
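
To make note (2) concrete, here is a minimal sketch of why a timestamp without a timezone is ambiguous. It is not the adapter's code: the class `TimestampZoneSketch` and the method `wallClock` are hypothetical names, and it uses only `java.time`. It assumes the stored value is a count of microseconds since the epoch, as in a Timestamp[us] column, and shows that the same stored value corresponds to different wall-clock times depending on which zone is assumed when interpreting it.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneId;
import java.time.ZoneOffset;

/** Hypothetical illustration; not part of the Arrow JDBC adapter. */
class TimestampZoneSketch {
  /** Render an epoch-microsecond value as wall-clock time in the given zone. */
  static LocalDateTime wallClock(long epochMicros, ZoneId zone) {
    // floorDiv/floorMod keep the sub-second part non-negative for pre-1970 values.
    long seconds = Math.floorDiv(epochMicros, 1_000_000L);
    long nanos = Math.floorMod(epochMicros, 1_000_000L) * 1_000L;
    return LocalDateTime.ofInstant(Instant.ofEpochSecond(seconds, nanos), zone);
  }

  public static void main(String[] args) {
    long epochMicros = 1_658_836_800_000_000L; // 2022-07-26T12:00:00Z
    // Same stored value, two different assumed zones, two different wall-clock times.
    System.out.println(wallClock(epochMicros, ZoneOffset.UTC));
    System.out.println(wallClock(epochMicros, ZoneOffset.ofHours(9)));
  }
}
```

When the vector carries no timezone, the interpretation is left to the driver and the JVM default zone, so applications that need a specific zone may prefer to bind such columns with a custom binder.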
6 changes: 6 additions & 0 deletions java/adapter/jdbc/pom.xml
@@ -82,6 +82,12 @@
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>org.assertj</groupId>
      <artifactId>assertj-core</artifactId>
      <scope>test</scope>
    </dependency>

    <dependency>
      <groupId>io.netty</groupId>
      <artifactId>netty-common</artifactId>
@@ -0,0 +1,134 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.arrow.adapter.jdbc;

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.HashMap;
import java.util.Map;

import org.apache.arrow.adapter.jdbc.binder.ColumnBinder;
import org.apache.arrow.util.Preconditions;
import org.apache.arrow.vector.VectorSchemaRoot;

/**
 * A binder binds JDBC prepared statement parameters to rows of Arrow data from a VectorSchemaRoot.
 */
public class JdbcParameterBinder {
  private final PreparedStatement statement;
  private final VectorSchemaRoot root;
  private final ColumnBinder[] binders;
  private final int[] parameterIndices;
  private int nextRowIndex;

  JdbcParameterBinder(
      final PreparedStatement statement,
Review comment (Contributor): Is this intentionally package-private instead of private? Maybe add a comment on the relationship between the last two parameters?

Review comment (Member Author): No, changed it to private, and added some docstrings plus an explicit Preconditions check for the last two parameters.

      final VectorSchemaRoot root,
      final ColumnBinder[] binders,
      int[] parameterIndices) {
    this.statement = statement;
    this.root = root;
    this.binders = binders;
    this.parameterIndices = parameterIndices;
    this.nextRowIndex = 0;
  }

  /**
   * Initialize a binder with a builder.
   *
   * @param statement The statement to bind to. The binder does not maintain ownership of the statement.
   * @param root The {@link VectorSchemaRoot} to pull data from. The binder does not maintain ownership
   *             of the vector schema root.
   */
  public static Builder builder(final PreparedStatement statement, final VectorSchemaRoot root) {
    return new Builder(statement, root);
  }

  /** Reset the binder (so the root can be updated with new data). */
  public void reset() {
    nextRowIndex = 0;
  }

  /**
   * Bind the next row to the statement.
   *
   * @return true if a row was bound, false if rows were exhausted
   */
  public boolean next() throws SQLException {
    if (nextRowIndex >= root.getRowCount()) {
      return false;
    }
    for (int i = 0; i < parameterIndices.length; i++) {
      final int parameterIndex = parameterIndices[i];
      binders[i].bind(statement, parameterIndex, nextRowIndex);
    }
    nextRowIndex++;
    return true;
  }

  /**
   * A builder for a {@link JdbcParameterBinder}.
   */
  public static class Builder {
    private final PreparedStatement statement;
    private final VectorSchemaRoot root;
    private final Map<Integer, ColumnBinder> bindings;

    Builder(PreparedStatement statement, VectorSchemaRoot root) {
      this.statement = statement;
      this.root = root;
      this.bindings = new HashMap<>();
    }

    /** Bind each column to the corresponding parameter in order. */
    public Builder bindAll() {
      for (int i = 0; i < root.getFieldVectors().size(); i++) {
        bind(/*parameterIndex=*/ i + 1, /*columnIndex=*/ i);
      }
      return this;
    }

    /** Bind the given parameter to the given column using the default binder. */
    public Builder bind(int parameterIndex, int columnIndex) {
      return bind(
          parameterIndex,
          ColumnBinder.forVector(root.getVector(columnIndex)));
    }

    /** Bind the given parameter using the given binder. */
    public Builder bind(int parameterIndex, ColumnBinder binder) {
      Preconditions.checkArgument(
          parameterIndex > 0, "parameterIndex %s must be positive", parameterIndex);
      bindings.put(parameterIndex, binder);
      return this;
    }

    /** Build the binder. */
    public JdbcParameterBinder build() {
      ColumnBinder[] binders = new ColumnBinder[bindings.size()];
      int[] parameterIndices = new int[bindings.size()];
      int index = 0;
      for (Map.Entry<Integer, ColumnBinder> entry : bindings.entrySet()) {
        binders[index] = entry.getValue();
        parameterIndices[index] = entry.getKey();
        index++;
      }
      return new JdbcParameterBinder(statement, root, binders, parameterIndices);
    }
  }
}
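
The `next()`/`reset()` protocol above can be tried out without Arrow or a JDBC driver. The following is a minimal sketch under stated assumptions, not the PR's implementation: `RowCursorSketch` is a hypothetical stand-in in which plain `long[]` arrays play the role of the VectorSchemaRoot columns and a map plays the role of the PreparedStatement parameters. It mirrors the `bindAll()` convention of mapping column `i` to the 1-based parameter `i + 1`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for JdbcParameterBinder; uses no Arrow or JDBC classes. */
class RowCursorSketch {
  private final long[][] columns; // columns[c][row], stand-in for the root's vectors
  private int nextRowIndex;
  /** Stand-in for the statement: 1-based parameter index to the last bound value. */
  final Map<Integer, Long> parameters = new HashMap<>();

  RowCursorSketch(long[][] columns) {
    this.columns = columns;
    this.nextRowIndex = 0;
  }

  private int rowCount() {
    return columns.length == 0 ? 0 : columns[0].length;
  }

  /** Bind the next row; returns false once all rows have been consumed. */
  boolean next() {
    if (nextRowIndex >= rowCount()) {
      return false;
    }
    for (int col = 0; col < columns.length; col++) {
      parameters.put(col + 1, columns[col][nextRowIndex]); // JDBC parameters are 1-based
    }
    nextRowIndex++;
    return true;
  }

  /** Rewind to the first row, e.g. after the underlying data was replaced in place. */
  void reset() {
    nextRowIndex = 0;
  }

  public static void main(String[] args) {
    long[][] columns = {{1, 2, 3}, {10, 20, 30}};
    RowCursorSketch cursor = new RowCursorSketch(columns);
    while (cursor.next()) {
      System.out.println(cursor.parameters); // one "execute" per bound row
    }
    columns[0][0] = 100; // new data loaded into the same columns
    cursor.reset();
    while (cursor.next()) {
      System.out.println(cursor.parameters);
    }
  }
}
```

As in the class above, the cursor keeps no copy of the data, so the caller decides when to refill the columns and when to call `reset()`.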
@@ -0,0 +1,44 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.arrow.adapter.jdbc.binder;

import org.apache.arrow.vector.FieldVector;

/**
 * Base class for ColumnBinder implementations.
 * @param <V> The concrete FieldVector subtype.
 */
public abstract class BaseColumnBinder<V extends FieldVector> implements ColumnBinder {
  protected final V vector;
  protected final int jdbcType;

  public BaseColumnBinder(V vector, int jdbcType) {
    this.vector = vector;
    this.jdbcType = jdbcType;
  }

  @Override
  public int getJdbcType() {
    return jdbcType;
  }

  @Override
  public V getVector() {
    return vector;
  }
}
@@ -0,0 +1,41 @@
/*
* Licensed to the Apache Software Foundation (ASF) under one or more
* contributor license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright ownership.
* The ASF licenses this file to You under the Apache License, Version 2.0
* (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

package org.apache.arrow.adapter.jdbc.binder;

import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Types;

import org.apache.arrow.vector.BigIntVector;

/** A column binder for 64-bit integers. */
public class BigIntBinder extends BaseColumnBinder<BigIntVector> {
  public BigIntBinder(BigIntVector vector) {
    this(vector, Types.BIGINT);
  }

  public BigIntBinder(BigIntVector vector, int jdbcType) {
    super(vector, jdbcType);
Review comment (Contributor): Is a type other than Types.BIGINT allowed here?

Review comment (Member Author): In principle, I wanted to allow things like binding an Int64 vector to an Int field. Maybe that is too much flexibility, though.

Review comment (Contributor): I see. Thanks for the clarification.

  }

  @Override
  public void bind(PreparedStatement statement, int parameterIndex, int rowIndex) throws SQLException {
    final long value = vector.getDataBuffer().getLong((long) rowIndex * BigIntVector.TYPE_WIDTH);
    statement.setLong(parameterIndex, value);
  }
}
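
The offset arithmetic in `bind` can be illustrated with a plain `java.nio.ByteBuffer`. This is a sketch under stated assumptions rather than Arrow code: `FixedWidthReadSketch` and `readLong` are hypothetical names, the buffer is assumed to hold packed little-endian 64-bit values, and `ByteBuffer` only accepts `int` offsets, whereas the binder above passes a `long` offset to the Arrow data buffer.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical illustration of fixed-width reads; not part of the Arrow JDBC adapter. */
class FixedWidthReadSketch {
  static final int TYPE_WIDTH = Long.BYTES; // 8 bytes per 64-bit value

  /** Read the value at rowIndex from a packed buffer of 64-bit integers. */
  static long readLong(ByteBuffer data, int rowIndex) {
    // Widen to long before multiplying so a large row index cannot overflow int.
    long byteOffset = (long) rowIndex * TYPE_WIDTH;
    // ByteBuffer indexes with int, so fail loudly instead of silently truncating.
    return data.getLong(Math.toIntExact(byteOffset));
  }

  public static void main(String[] args) {
    ByteBuffer data = ByteBuffer.allocate(3 * TYPE_WIDTH).order(ByteOrder.LITTLE_ENDIAN);
    data.putLong(0, 7L).putLong(8, -1L).putLong(16, Long.MAX_VALUE);
    for (int row = 0; row < 3; row++) {
      System.out.println(readLong(data, row));
    }
  }
}
```

Reading directly from the data buffer skips any per-row validity check, so a reader like this is only meaningful for rows that are known to be non-null.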