-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-7744: [Java] Implement Flight JDBC Driver [WIP] #6343
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some minor notes on the Flight stuff. Overall this is pretty interesting...
private static final org.slf4j.Logger logger = LoggerFactory.getLogger(Driver.class); | ||
|
||
/** JDBC connection string prefix. */ | ||
private static final String PREFIX = "jdbc:arrow://"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flight has a set of URI schemes (grpc+tcp
, grpc+tls
, etc), maybe those should be reflected?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then the Flight URI (minus the jdbc prefix) could get passed to the FlightConnection
.
@Override | ||
public Connection connect(String url, Properties properties) throws SQLException { | ||
logger.info("connect() url={}", url); | ||
//TODO this needs much more work to parse full URLs but this is enough to get end to end tests running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
java.net.URI
should be able to do all the parsing.
@Override | ||
public ResultSet executeQuery(String query) throws SQLException { | ||
|
||
FlightClient client = FlightClient.builder() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally this client should be able to live in the driver instance and get shared between statements.
|
||
CallOption callOptions = CallOptions.timeout(5, TimeUnit.SECONDS); | ||
|
||
Ticket ticket = new Ticket(query.getBytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think there's been a "right way" to use Flight but I think so far things call GetFlightInfo
to get a ticket, then DoGet
(isn't that how Dremio uses Flight for their database-like product?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I've been reading the Flight documentation again and I can see that this could make sense. However, this is all very specific to how the Flight server might implement support for arbitrary queries so I'm already running into questions about what makes sense to contribute to the Arrow repo versus just creating a JDBC driver directly for a particular Flight implementation.
.location(Location.forGrpcInsecure(flightConnection.host, flightConnection.port)) | ||
.build(); | ||
|
||
CallOption callOptions = CallOptions.timeout(5, TimeUnit.SECONDS); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that this timeout is for the entire RPC, i.e. a query that takes more than 5 seconds (to stream all the data or something) would get cancelled. Unfortunately gRPC doesn't make it easy to have per-message timeouts.
It's worth considering Calcite's Avatica as the basis for the JDBC driver. Avatica's remote driver has a pluggable RPC layer - existing implementations are JSON over HTTP and Protobuf - and I think Flight would slot in as a third. Avatica also has a local driver that shares a lot of code with the remote driver. That would allow people to switch between a Flight-based remote driver and a local driver, and see the same behavior. Not directly relevant to this effort, but I would love to use Arrow (even without Flight) as the bulk data format for other Avatica-based drivers. There are certainly Calcite clients who would like to receive a batch of rows as an Arrow blob rather than making thousands of calls to |
@julianhyde Thanks for the info. This does sound interesting and it would be great to see Avatica benefit from Arrow. |
public boolean next() throws SQLException { | ||
if (batchIndex + 1 == root.getRowCount()) { | ||
if (getNextBatch()) { | ||
batchIndex++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we should reset batchIndex to 0?
|
||
@Override | ||
public int getInt(String columnName) throws SQLException { | ||
return ResultSetHelper.getInt(getObject(columnName)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are boxing/unboxing overheads here, which should be eliminated in the future.
|
||
@Override | ||
public BigDecimal getBigDecimal(String s) throws SQLException { | ||
throw new SQLFeatureNotSupportedException(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems this can also be implemented in a way similar to getBigDecimal(int i)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes there are a lot of methods to be implemented still. I just wanted to share an early draft to start a discussion.
@julianhyde Your proposal raises some interesting questions about dependencies between Avatica and Arrow. Should Arrow depend on Avatica or the other way around? |
I'm closing this PR for now since it is stale. |
This is a starting point for the Flight JDBC driver. The following test passes, executing a query against the Rust
flight-server
example and collecting the results.