-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARROW-4651: [Flight] Use URIs instead of host/port pair #4047
Conversation
Does the change in format/ need to be voted on the mailing list? |
Hmm, I suppose we should, even if Flight is unstable for now. |
format/Flight.proto
Outdated
* empty, the expectation is that the ticket can only be redeemed on the | ||
* current service where the ticket was generated. | ||
*/ | ||
repeated Location location = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to keep the location message and replace the internal with a single field URI. This makes adding additional fields easier in the future if needed. Also, documenting supported protocols might be useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense.
I think it should be. Let's define the URIs that we support and make sure we get consensus there. |
Ok, I will put up a proposal on the mailing list. Thanks for the comments! |
The protocol change proposal was formally accepted on the Arrow-dev mailing list. Now this PR needs to be rebased (and conflicts fixed) before it gets reviewed. @lihalite, do you have time for this? Otherwise, I can take up. |
@pitrou sorry about that, I will rebase & clean up things later today (just got back from vacation). |
I've now rebased this. |
Codecov Report
@@ Coverage Diff @@
## master #4047 +/- ##
==========================================
- Coverage 88.19% 88.12% -0.08%
==========================================
Files 779 774 -5
Lines 97927 97265 -662
Branches 1251 1251
==========================================
- Hits 86370 85714 -656
+ Misses 11321 11315 -6
Partials 236 236
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see that the C++ server API isn't changed (FlightServerBase::Init
takes a simple port number). Is it intentional?
cpp/src/arrow/flight/types.cc
Outdated
Status Location::ForGrpcInsecure(const std::string& host, const int port, | ||
Location& location) { | ||
std::stringstream uri_string; | ||
uri_string << "grpc://" << host << ':' << port; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps it can be fixed later, but this won't work for IPv6 numeric addresses, e.g. you need grpc://[::1]:80
and not grpc://::1:80
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
uriparser (understandably) doesn't deal with URI construction, so we'd need recognize IPv6 addresses, or create a separate method for such addresses. Or perhaps just require that the user pass [::1]
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, we can require that for now.
(uriparser seems able to deal with URI construction, but the API looks a bit terrible)
I intended to change the C++ server API. I think I rebased out the change on accident as I was also trying to implement the builder APIs at the same time, but now I'd rather do that in a follow up PR. Thanks for the review, I'll get to fixing these issues! |
While I'm at it, might as well add the builder APIs here, and make TLS-enabled services possible. In C++/Python, I did not go all the way to a Java-style builder - the changes there are much more invasive, and I don't think it's worth it until we fully settle on supporting another transport. (And even then, I think it could be done within the current APIs, or with minimal changes to them.) |
Rebased, tests pass. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for updating this. It looks mostly good to me, just some style issues and a couple other details.
sock.bind(('', 0)) | ||
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) | ||
port = sock.getsockname()[1] | ||
location = flight.Location.for_grpc_tcp("localhost", port) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally we would have a way to let gRPC bind the port and then return it (there is a race condition otherwise). But this needn't be in this PR.
Thanks for the feedback! I've updated things. gRPC supports binding to port 0 for a free port, we just need a way to report that back to the API user. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 from me on the C++ and Python changes.
Does this need reviewing on the Java side? |
Thanks @pitrou. For the Java side, perhaps @jacques-n could take a look again? |
Updated to fix an inadvertent API breakage. ( |
There are actually lots of little things I'm noticing here now that I'm trying to test internally, so please hold off while I fix things over the next couple days...apologies for the trouble. |
Ok, this should be ready now; fixed some inadvertent API breakages. |
.map(t -> new Location(t)).collect(Collectors.toList()); | ||
public FlightEndpoint(Flight.FlightEndpoint flt) throws URISyntaxException { | ||
locations = new ArrayList<>(); | ||
for (final Flight.Location location : flt.getLocationList()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the move away from streams in some of these changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, here it's because I wanted the checked exception to be propagated - the stream hides this. And that applies to the change in FlightInfo
as well.
java/flight/src/main/java/org/apache/arrow/flight/FlightServer.java
Outdated
Show resolved
Hide resolved
java/flight/src/main/java/org/apache/arrow/flight/FlightServer.java
Outdated
Show resolved
Hide resolved
* @param certChain The certificate chain to use. | ||
* @param key The private key to use. | ||
*/ | ||
public Builder useTls(final File certChain, final File key) throws IOException { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we have a default for certChain? Also, does it make sense to require it be a file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add an overload for InputStream.
We may be able to hard-code some platform-specific paths to try.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, I mixed this up. This is the server, so you must provide the certificate chain and key (it's the cert/key the server presents). The client defaults to a platform-specific value already, though we should let the client specify a certificate store to check against.
java/flight/src/main/java/org/apache/arrow/flight/FlightServer.java
Outdated
Show resolved
Hide resolved
java/flight/src/main/java/org/apache/arrow/flight/LocationSchemes.java
Outdated
Show resolved
Hide resolved
* Constants representing well-known URI schemes for Flight services. | ||
*/ | ||
public class LocationSchemes { | ||
public static final String GRPC = "grpc"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dumb question, what is grpc verus grpc+tcp?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not a dumb question :) So far, they're aliases for each other (so the default "grpc" protocol is insecure gRPC over TCP)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks mostly good to me on the C++/Python side. Just a few nits.
It looks like this is near merge-readiness. @jacques-n can you review/sign off on the Java changes? |
Rebased with master as well. |
I think Jacques is traveling right now so it may be a little time before we can get a go-ahead from the Java side. Would my review on the C++/Python side be helpful? |
I'd appreciate any feedback! On the Java side, I'm not in a particular rush to get this merged, and I can keep up with master, I'd just like to make sure I get everything in for 0.14. |
Cool, BTW I'm guessing on a somewhat longer release timeline than usual for 0.14 to give various in-flight efforts to sort themselves out, e.g. toward end of June / beginning July (EDIT: pun was not intended but...) |
Please make sure to run rebase any java CLs and re-run CI to make sure javadoc's are in place. |
@jacques-n I think you are the last approver on this PR |
Looks good to me. Thanks for pulling this together @lihalite! |
thanks @lihalite! It might have already been discussed here, but what is the testing strategy for TLS-enabled Flight going to be? Unless I missed it is doesn't seem that this is tested now, can we open a JIRA? |
It's not currently tested, JIRA: https://jira.apache.org/jira/browse/ARROW-5397 We'll need some way to generate certs/keys. |
The easiest thing to do is to store self-signed certs and the corresponding private key in the repo. |
I haven't changed the client/server construction interfaces to really take advantage of this. I would rather follow up with creating builders instead, as proposed, as that would be cleaner than special-casing the logic in the current constructors.