Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARROW-7744: [Java] Implement Flight JDBC Driver [WIP] #6343

Closed
wants to merge 11 commits into from

Conversation

andygrove
Copy link
Member

@andygrove andygrove commented Feb 3, 2020

This is a starting point for the Flight JDBC driver. The following test passes, executing a query against the Rust flight-server example and collecting the results.

  @Test
  public void executeQuery() throws SQLException {
    try (Connection conn = driver.connect("jdbc:arrow://localhost:50051", new Properties())) {
      try (Statement stmt = conn.createStatement()) {
        try (ResultSet rs = stmt.executeQuery("SELECT id FROM alltypes_plain")) {
          List<Integer> ids = new ArrayList<>();
          while (rs.next()) {
            ids.add(rs.getInt(1));
          }
          assertEquals(ImmutableList.of(4, 5, 6, 7, 2, 3, 0, 1), ids);
        }
      }
    }
  }

@andygrove andygrove self-assigned this Feb 3, 2020
@andygrove andygrove changed the title ARROW-7744: [Java] Implement Flight JDBC Driver ARROW-7744: [Java] Implement Flight JDBC Driver [WIP] Feb 3, 2020
@github-actions
Copy link

github-actions bot commented Feb 3, 2020

@andygrove andygrove marked this pull request as ready for review February 3, 2020 22:43
Copy link
Member

@lidavidm lidavidm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some minor notes on the Flight stuff. Overall this is pretty interesting...

private static final org.slf4j.Logger logger = LoggerFactory.getLogger(Driver.class);

/** JDBC connection string prefix. */
private static final String PREFIX = "jdbc:arrow://";
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flight has a set of URI schemes (grpc+tcp, grpc+tls, etc), maybe those should be reflected?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the Flight URI (minus the jdbc prefix) could get passed to the FlightConnection.

@Override
public Connection connect(String url, Properties properties) throws SQLException {
logger.info("connect() url={}", url);
//TODO this needs much more work to parse full URLs but this is enough to get end to end tests running
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

java.net.URI should be able to do all the parsing.

@Override
public ResultSet executeQuery(String query) throws SQLException {

FlightClient client = FlightClient.builder()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally this client should be able to live in the driver instance and get shared between statements.


CallOption callOptions = CallOptions.timeout(5, TimeUnit.SECONDS);

Ticket ticket = new Ticket(query.getBytes());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think there's been a "right way" to use Flight but I think so far things call GetFlightInfo to get a ticket, then DoGet (isn't that how Dremio uses Flight for their database-like product?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I've been reading the Flight documentation again and I can see that this could make sense. However, this is all very specific to how the Flight server might implement support for arbitrary queries so I'm already running into questions about what makes sense to contribute to the Arrow repo versus just creating a JDBC driver directly for a particular Flight implementation.

.location(Location.forGrpcInsecure(flightConnection.host, flightConnection.port))
.build();

CallOption callOptions = CallOptions.timeout(5, TimeUnit.SECONDS);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this timeout is for the entire RPC, i.e. a query that takes more than 5 seconds (to stream all the data or something) would get cancelled. Unfortunately gRPC doesn't make it easy to have per-message timeouts.

@julianhyde
Copy link

It's worth considering Calcite's Avatica as the basis for the JDBC driver.

Avatica's remote driver has a pluggable RPC layer - existing implementations are JSON over HTTP and Protobuf - and I think Flight would slot in as a third.

Avatica also has a local driver that shares a lot of code with the remote driver. That would allow people to switch between a Flight-based remote driver and a local driver, and see the same behavior.

Not directly relevant to this effort, but I would love to use Arrow (even without Flight) as the bulk data format for other Avatica-based drivers. There are certainly Calcite clients who would like to receive a batch of rows as an Arrow blob rather than making thousands of calls to ResultSet methods just to pull the data through the membrane. Any JDBC client could call resultSet.unwrap(ArrowBatch.class) just on the off-chance that their driver supports direct access to the Arrow data.

@andygrove
Copy link
Member Author

@julianhyde Thanks for the info. This does sound interesting and it would be great to see Avatica benefit from Arrow.

public boolean next() throws SQLException {
if (batchIndex + 1 == root.getRowCount()) {
if (getNextBatch()) {
batchIndex++;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we should reset batchIndex to 0?


@Override
public int getInt(String columnName) throws SQLException {
return ResultSetHelper.getInt(getObject(columnName));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are boxing/unboxing overheads here, which should be eliminated in the future.


@Override
public BigDecimal getBigDecimal(String s) throws SQLException {
throw new SQLFeatureNotSupportedException();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this can also be implemented in a way similar to getBigDecimal(int i)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there are a lot of methods to be implemented still. I just wanted to share an early draft to start a discussion.

@andygrove
Copy link
Member Author

@julianhyde Your proposal raises some interesting questions about dependencies between Avatica and Arrow. Should Arrow depend on Avatica or the other way around?

@andygrove
Copy link
Member Author

I'm closing this PR for now since it is stale.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants