Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FlightSQL] Support DoExchange (in addition to DoPut) to bind parameters and execute prepared statements #37741

Open
alamb opened this issue Sep 15, 2023 · 8 comments

Comments

@alamb
Copy link
Contributor

alamb commented Sep 15, 2023

Describe the enhancement requested

As suggested by @kou on #37720 (comment)

Usecase

  1. Reduce the number of messages round trips required to run a prepared statement via FlightSQL
  2. Avoid the need to (potentially) serialize bind parameters in a stateless architecture (see [FlightSQL] Stateless prepared statement with parameter support #37720)

Background

Currently FlightSQL requires three messages to run a prepared statement. DoPut, then a GetFlightInfo and then a DoGet:

...
DoPut `CommandPreparedStatementQuery` 
GetFlightInfo `CommandPreparedStatementQuery`
DoGet 
...

CommandPreparedStatementQuery mmd

Proposal

By supporting DoExchange instead of DoPut + GetFlightInfo + DoGet only a single round trip is needed:

...
DoExchange `CommandPreparedStatementQuery` 
...

Benefits:

  • With this approach, parameters aren't needed to send-back to client-side

Drawbacks:

  • the query execution result can be returned by only one server.

Component(s)

FlightRPC

@lidavidm
Copy link
Member

SGTM

We should add flags in (the awkwardly named) SqlInfo to indicate support for these as we did with other new features

@zeroshade
Copy link
Member

+1 and I agree with @lidavidm that we should add flags to SqlInfo

@suremarc
Copy link

suremarc commented Feb 2, 2024

I have started looking into implementing this on the Go client side, and I'm running into some difficulties. Namely, the existing interface for PreparedStatement.Execute returns a *FlightInfo, but DoExchange skips GetFlightInfo altogether. The Rust client also presents a similar interface, and I would guess that most clients in other languages do the same.

Of course, one can always drop down to raw Flight/FlightSQL calls without using the abstraction of prepared statement "handles", but this just moves complexity to consumers of the library and makes me worry if the DoExchange protocol for prepared statements will actually be adopted in the ecosystem.

So we have a few options:

  1. Change PreparedStatement.Execute to fetch the flight streams instead of just returning the FlightInfo
    • Pros: simpler for users, and hides details like using DoPut/DoExchange
    • Cons: disruptive breaking change, less control over consumption of the flight streams.
  2. Add a new PreparedStatement.DoExchange method, separate from the existing Execute implementation that uses DoPut
    • Pros: less disruptive
    • Cons: dispatching to the correct implementation (aka whatever the server supports) is still forced on users of this library

@alamb do you have any thoughts on this?

@alamb
Copy link
Contributor Author

alamb commented Feb 4, 2024

@suremarc can you remind me why we would need to pass a FlightInfo? I think we could send the required information as part of the payload of the stream, right?

For example

DoExchange

rpc DoExchange(stream FlightData) returns (stream FlightData) {}

Sends a stream of FlightData

arrow/format/Flight.proto

Lines 495 to 520 in aded7bf

message FlightData {
/*
* The descriptor of the data. This is only relevant when a client is
* starting a new DoPut stream.
*/
FlightDescriptor flight_descriptor = 1;
/*
* Header for message data as described in Message.fbs::Message.
*/
bytes data_header = 2;
/*
* Application-defined metadata.
*/
bytes app_metadata = 3;
/*
* The actual batch of Arrow data. Preferably handled with minimal-copies
* coming last in the definition to help with sidecar patterns (it is
* expected that some implementations will fetch this field off the wire
* with specialized code to avoid extra memory copies).
*/
bytes data_body = 1000;
}

And the FlightData:: flight_descriptor field has the embedded cmd message (in which FlightSQL messages are embedded)

bytes cmd = 2;

FYI @kallisti-dev who is also working on stateless prepared statement execution

@suremarc
Copy link

suremarc commented Feb 4, 2024

@alamb I think we are on the same page that DoExchange does not require passing a FlightInfo — the problem is that the existing prepared statement interfaces do require returning a FlightInfo back to the client. Introducing the DoExchange protocol breaks this assumption.

For example, see the Go prepared statement implementation: PreparedStatement.Execute, which returns a FlightInfo. The Rust FlightSQL client does this too.

@kou
Copy link
Member

kou commented Feb 5, 2024

How about just returning null or an empty FlightInfo (no endpoint)?

@kou
Copy link
Member

kou commented Feb 5, 2024

Ah, we may want to return *flight.Reader like Client.DoGet() for DoExchange version.

@alamb
Copy link
Contributor Author

alamb commented Feb 5, 2024

@alamb I think we are on the same page that DoExchange does not require passing a FlightInfo — the problem is that the existing prepared statement interfaces do require returning a FlightInfo back to the client. Introducing the DoExchange protocol breaks this assumption.

I always think of FlightInfo as a source of potential indirection (as it can contain endpoint information so the subsequent call may be to a different endpoint / hostname)

Thus if there is no subsequent calls (just DoExchange instead of DoPut/GetFlightInfo/DoGet) the lack of FlightInfo makes sense to me (as DoExchange starts feeding back data from whatever endpoint you sent data to)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants