ARROW-4651: [Flight] Use URIs instead of host/port pair #4047

Closed
wants to merge 5 commits into from
Conversation

ghost

@ghost ghost commented Mar 27, 2019

I haven't changed the client/server construction interfaces to really take advantage of this. I would rather follow up with creating builders instead, as proposed, as that would be cleaner than special-casing the logic in the current constructors.

@emkornfield
Contributor

Does the change in format/ need to be voted on the mailing list?

@ghost
Author

ghost commented Mar 27, 2019

Hmm, I suppose we should, even if Flight is unstable for now.

* empty, the expectation is that the ticket can only be redeemed on the
* current service where the ticket was generated.
*/
repeated Location location = 2;
Contributor

I think it might be better to keep the Location message and replace its internals with a single URI field. This makes adding additional fields easier in the future if needed. Also, documenting the supported protocols might be useful.

Author

Yeah, makes sense.

@jacques-n
Contributor

jacques-n commented Mar 27, 2019

> Does the change in format/ need to be voted on the mailing list?

I think it should be. Let's define the URIs that we support and make sure we get consensus there.

@ghost
Author

ghost commented Mar 27, 2019

Ok, I will put up a proposal on the mailing list. Thanks for the comments!

@pitrou
Member

pitrou commented Apr 16, 2019

The protocol change proposal was formally accepted on the Arrow-dev mailing list. Now this PR needs to be rebased (and conflicts fixed) before it gets reviewed. @lihalite, do you have time for this? Otherwise, I can take it up.

@ghost
Author

ghost commented Apr 30, 2019

@pitrou sorry about that, I will rebase & clean up things later today (just got back from vacation).

@ghost
Author

ghost commented Apr 30, 2019

I've now rebased this.

@codecov-io

codecov-io commented May 1, 2019

Codecov Report

Merging #4047 into master will decrease coverage by 0.07%.
The diff coverage is 50.5%.

@@            Coverage Diff             @@
##           master    #4047      +/-   ##
==========================================
- Coverage   88.19%   88.12%   -0.08%     
==========================================
  Files         779      774       -5     
  Lines       97927    97265     -662     
  Branches     1251     1251              
==========================================
- Hits        86370    85714     -656     
+ Misses      11321    11315       -6     
  Partials      236      236
| Impacted Files | Coverage Δ |
| --- | --- |
| cpp/src/arrow/python/flight.cc | 0.79% <ø> (+0.02%) ⬆️ |
| cpp/src/arrow/python/flight.h | 0% <ø> (ø) ⬆️ |
| python/pyarrow/tests/test_flight.py | 4.13% <1.28%> (-0.58%) ⬇️ |
| cpp/src/arrow/flight/types.h | 100% <100%> (ø) ⬆️ |
| cpp/src/arrow/flight/client.h | 100% <100%> (ø) ⬆️ |
| cpp/src/arrow/flight/test-server.cc | 97.1% <100%> (+0.13%) ⬆️ |
| cpp/src/arrow/flight/server.h | 100% <100%> (ø) ⬆️ |
| cpp/src/arrow/flight/flight-test.cc | 100% <100%> (ø) ⬆️ |
| cpp/src/arrow/util/uri.cc | 100% <100%> (ø) ⬆️ |
| cpp/src/arrow/flight/test-util.h | 100% <100%> (ø) ⬆️ |

... and 221 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 445482e...3000485.

Member

@pitrou pitrou left a comment

I see that the C++ server API isn't changed (FlightServerBase::Init takes a simple port number). Is it intentional?

cpp/src/arrow/flight/types.h Outdated
cpp/src/arrow/flight/types.h Outdated
python/pyarrow/_flight.pyx Outdated
cpp/src/arrow/flight/types.cc Outdated
Status Location::ForGrpcInsecure(const std::string& host, const int port,
Location& location) {
std::stringstream uri_string;
uri_string << "grpc://" << host << ':' << port;
Member

Perhaps it can be fixed later, but this won't work for IPv6 numeric addresses, e.g. you need grpc://[::1]:80 and not grpc://::1:80.

Author

uriparser (understandably) doesn't deal with URI construction, so we'd need to recognize IPv6 addresses, or create a separate method for such addresses. Or perhaps just require that the user pass [::1]?

Member

Yes, we can require that for now.
(uriparser seems able to deal with URI construction, but the API looks a bit terrible)
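
To make the bracketing rule above concrete, here is a minimal Python sketch. It is not code from this PR: `make_grpc_uri` is a hypothetical helper, and it assumes that a host already wrapped in brackets should be passed through unchanged.

```python
import ipaddress


def make_grpc_uri(host: str, port: int) -> str:
    """Build a grpc:// URI, bracketing bare IPv6 literals (hypothetical helper)."""
    # Hosts that are already bracketed are passed through unchanged.
    if not host.startswith("["):
        try:
            if ipaddress.ip_address(host).version == 6:
                host = f"[{host}]"
        except ValueError:
            # Not a numeric address, so treat it as a hostname.
            pass
    return f"grpc://{host}:{port}"
```

With this helper, a bare `::1` and a pre-bracketed `[::1]` both yield `grpc://[::1]:80`, while hostnames and IPv4 addresses are left alone.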

python/pyarrow/_flight.pyx Outdated
@ghost
Author

ghost commented May 2, 2019

I intended to change the C++ server API. I think I rebased out the change by accident as I was also trying to implement the builder APIs at the same time, but now I'd rather do that in a follow-up PR.

Thanks for the review, I'll get to fixing these issues!

@ghost
Author

ghost commented May 3, 2019

While I'm at it, might as well add the builder APIs here, and make TLS-enabled services possible. In C++/Python, I did not go all the way to a Java-style builder - the changes there are much more invasive, and I don't think it's worth it until we fully settle on supporting another transport. (And even then, I think it could be done within the current APIs, or with minimal changes to them.)

@ghost
Author

ghost commented May 6, 2019

Rebased, tests pass.

Member

@pitrou pitrou left a comment

Thanks for updating this. It looks mostly good to me, just some style issues and a couple other details.

cpp/src/arrow/flight/client.cc Outdated
cpp/src/arrow/util/uri.h Outdated
cpp/src/arrow/flight/types.h
python/pyarrow/_flight.pyx Outdated
python/pyarrow/_flight.pyx Outdated
python/pyarrow/_flight.pyx Outdated
python/pyarrow/_flight.pyx Outdated
sock.bind(('', 0))
sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
port = sock.getsockname()[1]
location = flight.Location.for_grpc_tcp("localhost", port)
Member

Ideally we would have a way to let gRPC bind the port and then return it (there is a race condition otherwise). But this needn't be in this PR.

cpp/src/arrow/flight/server.h Outdated
@ghost
Author

ghost commented May 7, 2019

Thanks for the feedback! I've updated things.

gRPC supports binding to port 0 for a free port; we just need a way to report that back to the API user.
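
As an illustration of the port-0 idea and of the race mentioned in the review, here is a small sketch using only Python's standard `socket` module. It does not use gRPC or the Flight API; `bind_free_port` is a hypothetical name.

```python
import socket


def bind_free_port(host: str = "127.0.0.1") -> socket.socket:
    """Bind to port 0 so the OS picks a free port; return the still-open socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Options that influence binding are set before bind().
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind((host, 0))
    return sock


if __name__ == "__main__":
    sock = bind_free_port()
    port = sock.getsockname()[1]
    print("OS-assigned port:", port)
    # The port is only held while `sock` stays open. Closing it and handing
    # the number to a different server leaves a window in which another
    # process can claim the port first.
    sock.close()
```

The race goes away only if the component that binds the port is also the one that serves on it, which is why reporting the bound port back from the server is the cleaner fix.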

Member

@pitrou pitrou left a comment

+1 from me on the C++ and Python changes.

@pitrou
Member

pitrou commented May 9, 2019

Does this need reviewing on the Java side?

@ghost
Author

ghost commented May 9, 2019

Thanks @pitrou. For the Java side, perhaps @jacques-n could take a look again?

@ghost
Author

ghost commented May 13, 2019

Updated to fix an inadvertent API breakage. (Ticket is quite useless if you can't read its value...)

@ghost
Author

ghost commented May 13, 2019

There are actually lots of little things I'm noticing here now that I'm trying to test internally, so please hold off while I fix things over the next couple days...apologies for the trouble.

@ghost
Author

ghost commented May 14, 2019

Ok, this should be ready now; fixed some inadvertent API breakages.

.map(t -> new Location(t)).collect(Collectors.toList());
public FlightEndpoint(Flight.FlightEndpoint flt) throws URISyntaxException {
locations = new ArrayList<>();
for (final Flight.Location location : flt.getLocationList()) {
Contributor

Why the move away from streams in some of these changes?

Author

@ghost ghost May 15, 2019

Ah, here it's because I wanted the checked exception to be propagated - the stream hides this. And that applies to the change in FlightInfo as well.

* @param certChain The certificate chain to use.
* @param key The private key to use.
*/
public Builder useTls(final File certChain, final File key) throws IOException {
Contributor

Can't we have a default for certChain? Also, does it make sense to require it be a file?

Author

I'll add an overload for InputStream.

We may be able to hard-code some platform-specific paths to try.

Author

Oh wait, I mixed this up. This is the server, so you must provide the certificate chain and key (it's the cert/key the server presents). The client defaults to a platform-specific value already, though we should let the client specify a certificate store to check against.
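
The server/client asymmetry described here can be sketched with Python's standard `ssl` module. This is an analogy rather than the Flight or gRPC API: the function names are hypothetical, and the optional `ca_file` parameter stands in for the client-side certificate store mentioned above.

```python
import ssl


def make_server_context(cert_chain: str, key: str) -> ssl.SSLContext:
    """Server side: the certificate chain and private key must be supplied."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    # These are the credentials the server presents to connecting clients.
    ctx.load_cert_chain(certfile=cert_chain, keyfile=key)
    return ctx


def make_client_context(ca_file=None) -> ssl.SSLContext:
    """Client side: use the platform trust store unless ca_file is given."""
    # With cafile=None, create_default_context() loads the system's
    # default CA certificates.
    return ssl.create_default_context(cafile=ca_file)
```

The server has no sensible default because its identity is specific to the deployment, whereas the client can usually verify against the system's trusted roots.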

* Constants representing well-known URI schemes for Flight services.
*/
public class LocationSchemes {
public static final String GRPC = "grpc";
Contributor

Dumb question: what is grpc versus grpc+tcp?

Author

Not a dumb question :) So far, they're aliases for each other (so the default "grpc" protocol is insecure gRPC over TCP)
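
One way to picture the aliasing is a small scheme-normalization step when parsing a location URI. The sketch below uses Python's `urllib.parse`; the alias table and `parse_location` are assumptions made for illustration, based only on the reply above, and are not taken from the PR.

```python
from urllib.parse import urlsplit

# Assumed alias table: "grpc" and "grpc+tcp" both mean insecure gRPC over TCP.
SCHEME_ALIASES = {"grpc": "grpc+tcp"}


def parse_location(uri: str):
    """Return (canonical_scheme, host, port) for a location URI."""
    parts = urlsplit(uri)
    # Map an alias to its canonical spelling; unknown schemes pass through.
    scheme = SCHEME_ALIASES.get(parts.scheme, parts.scheme)
    return scheme, parts.hostname, parts.port
```

Under this assumption, `grpc://localhost:5005` and `grpc+tcp://localhost:5005` resolve to the same scheme, host, and port.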

Member

@pitrou pitrou left a comment

Looks mostly good to me on the C++/Python side. Just a few nits.

cpp/src/arrow/flight/test-integration-client.cc Outdated
python/pyarrow/tests/test_flight.py Outdated
python/pyarrow/_flight.pyx Outdated
@wesm
Member

wesm commented May 16, 2019

It looks like this is near merge-readiness. @jacques-n can you review/sign off on the Java changes?

@ghost
Author

ghost commented May 17, 2019

Rebased with master as well.

@wesm
Member

wesm commented May 17, 2019

I think Jacques is traveling right now so it may be a little time before we can get a go-ahead from the Java side. Would my review on the C++/Python side be helpful?

@ghost
Author

ghost commented May 17, 2019

I'd appreciate any feedback! On the Java side, I'm not in a particular rush to get this merged, and I can keep up with master; I'd just like to make sure I get everything in for 0.14.

@wesm
Member

wesm commented May 17, 2019

Cool. BTW, I'm guessing on a somewhat longer release timeline than usual for 0.14, to give various in-flight efforts time to sort themselves out, e.g. toward end of June / beginning of July.

(EDIT: pun was not intended but...)

@emkornfield
Contributor

Please make sure to rebase any Java CLs and re-run CI to make sure Javadocs are in place.

@wesm
Member

wesm commented May 22, 2019

@jacques-n I think you are the last approver on this PR

@jacques-n
Contributor

Looks good to me. Thanks for pulling this together @lihalite!

@wesm wesm closed this in c18251e May 22, 2019
@wesm
Member

wesm commented May 22, 2019

Thanks @lihalite! It might have already been discussed here, but what is the testing strategy for TLS-enabled Flight going to be? Unless I missed it, it doesn't seem that this is tested now. Can we open a JIRA?

@ghost
Author

ghost commented May 22, 2019

It's not currently tested, JIRA: https://jira.apache.org/jira/browse/ARROW-5397

We'll need some way to generate certs/keys.

@pitrou
Member

pitrou commented May 22, 2019

The easiest thing to do is to store self-signed certs and the corresponding private key in the repo,
preferably along with the script to regenerate them when the expiry date is reached.
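
A regeneration script along these lines could be as small as the following Python sketch. It assumes the `openssl` command-line tool is available on PATH; the function names, file names, key size, validity period, and common name are all placeholder choices, not anything decided in this thread.

```python
import subprocess


def openssl_self_signed_cmd(cert_path, key_path, days=365, common_name="localhost"):
    """Return an openssl command that writes a self-signed cert and its key."""
    return [
        "openssl", "req", "-x509", "-newkey", "rsa:2048", "-nodes",
        "-keyout", key_path, "-out", cert_path,
        "-days", str(days), "-subj", f"/CN={common_name}",
    ]


def regenerate(cert_path="cert.pem", key_path="key.pem"):
    """Run the command above; requires the openssl CLI on PATH."""
    subprocess.run(openssl_self_signed_cmd(cert_path, key_path), check=True)
```

Keeping the command in a checked-in script means the test certificates can be refreshed with the same parameters once they expire.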
