Consider using gRPC as an externally exposed API #271

Closed
zaa opened this issue Mar 2, 2017 · 23 comments
Labels
lifecycle/stale type/feature The PR added a new feature or issue requested a new feature

Comments

@zaa

zaa commented Mar 2, 2017

gRPC (http://grpc.io) has ready-made clients for Java, C++, Go, Python, etc., so Yahoo Pulsar would not need efficient clients to be reimplemented in every language. (The currently exposed WebSocket interface does not support all the methods provided by the protobuf-based protocol, has lower performance, and requires a separate WebSocket connection per topic publisher/consumer.)

@agarman

agarman commented Mar 2, 2017

Are you suggesting a gRPC service that uses the C++ lib as an alternative to the WebSocket service?
Or are you suggesting a rewrite of the Java & C++ libs to use gRPC instead of the current protobuf-based protocol?

@merlimat
Contributor

merlimat commented Mar 2, 2017

When we started the project, gRPC was not available yet, so we went with a custom protocol.

I think it would be great to offer a gRPC based interface for better integration, though I would see that as an additional layer, such as the WebSocket proxy (which can run embedded in the broker or as a separate component).

One of the primary goals of the custom binary protocol we came up with, was to have the client establish a "session" (either producer/consumer), attached to a topic, perform authentication and then let it publish/receive messages as fast as possible, with flow control to guard the rail.

Eg: we don't want to perform auth at every message, or to specify the topic name each time (which in many cases can be as long as the data itself).
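
As a rough illustration of the topic-name point, here is a minimal sketch that counts bytes for a made-up wire format with no framing. The function name, the topic name, and the 40-byte payload size are all assumptions chosen for the example, not Pulsar's actual encoding.

```python
from typing import Sequence


def bytes_on_wire(topic: str, payloads: Sequence[bytes], topic_per_message: bool) -> int:
    """Count bytes sent for a hypothetical wire format, ignoring framing."""
    topic_len = len(topic.encode("utf-8"))
    payload_total = sum(len(p) for p in payloads)
    if topic_per_message:
        # RPC style: every publish request repeats the topic name.
        return payload_total + topic_len * len(payloads)
    # Session style: the topic name is sent once, when the session is set up.
    return payload_total + topic_len


topic = "persistent://my-tenant/my-namespace/my-topic"  # made-up topic name
payloads = [b"x" * 40 for _ in range(1000)]  # small messages, sizes assumed

per_message = bytes_on_wire(topic, payloads, topic_per_message=True)
per_session = bytes_on_wire(topic, payloads, topic_per_message=False)
print(per_message, per_session)
```

With payloads about as long as the topic name, repeating the name on every message roughly doubles the traffic in this toy model.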

So, mixing the "session" with RPC seems a bit complicated. Also guaranteeing ordering would be challenging as there would be no relation for different publish requests on the same topic.

Having said that, I'd be really happy to have a gRPC based proxy service. Contributions welcome! 😄

It may also be interesting to offer the same interface (or at least a significant portion of it) as GCP pub-sub: https://cloud.google.com/pubsub/docs/reference/rpc/google.pubsub.v1

@lzaugg

lzaugg commented Jun 12, 2018

Having gRPC as an alternative to the Web Socket service would be awesome. gRPC has built-in support for bidirectional streams - so basically it could be seen as a session (it's using HTTP/2 streams), no? Guaranteeing the message order shouldn't be a problem then. I think it's also important to know if pulsar is going to loosen the message ordering constraints (at least for such an "additional" layer) because it would make things easier (for use cases where ordering is not important - in the same way as GCP pub/sub doesn't guarantee any order).

@sijie
Member

sijie commented Jun 12, 2018

> Having gRPC as an alternative to the Web Socket service would be awesome. gRPC has built-in support for bidirectional streams - so basically it could be seen as a session (it's using HTTP/2 streams), no?

Agreed. I believe ordering is not a problem with gRPC bidirectional streaming. It is actually very easy and quite fun to use gRPC bidirectional streaming.

I think the most interesting piece here is to add a GCP pub/sub proxy with gRPC protocol.

> I think it's also important to know if pulsar is going to loosen the message ordering constraints (at least for such an "additional" layer) because it would make things easier

In the context of a "shared" subscription, the message ordering constraints are already relaxed. That said, you can use an exclusive/failover subscription for ordered message consumption and a shared subscription for non-ordered message consumption.
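
The difference can be pictured with a toy model. This is only a sketch under simplifying assumptions: `dispatch`, the consumer names, and the strict round-robin policy are illustrative and do not come from the Pulsar client API.

```python
from collections import defaultdict
from itertools import cycle


def dispatch(messages, consumers, mode):
    """Toy model of how a subscription hands messages to consumers.

    'exclusive': every message goes to a single consumer, in order.
    'shared': messages are spread round-robin across all consumers,
    so no single consumer sees the full ordered sequence.
    """
    received = defaultdict(list)
    if mode == "exclusive":
        for message in messages:
            received[consumers[0]].append(message)
    elif mode == "shared":
        for message, consumer in zip(messages, cycle(consumers)):
            received[consumer].append(message)
    else:
        raise ValueError(f"unknown mode: {mode}")
    return dict(received)


messages = list(range(6))
exclusive = dispatch(messages, ["c1", "c2"], "exclusive")
shared = dispatch(messages, ["c1", "c2"], "shared")
print(exclusive)
print(shared)
```

In the shared case each consumer still sees its own slice in publish order, but the relative processing order across consumers is no longer defined.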

@sijie sijie added type/feature The PR added a new feature or issue requested a new feature triage/week-34 labels Aug 22, 2018
@RobIsHere

Whole clusters, such as k8s + Istio, run on gRPC, and it's well supported by proxies like Envoy.

You could think about clients talking to their user's topics directly, authenticated by ingress - e.g. envoy filters (see https://www.envoyproxy.io/docs/envoy/latest/configuration/http_filters/grpc_web_filter for ideas).

IMHO, I'm not convinced about GCP pub-sub. Making one thing like another is very often a large effort and a bad fit once you look into the details. When Google changes the API, do you follow?
Wrapping your already proven, tested and implemented custom protocol in gRPC is probably a huge time saver. And your client APIs are well designed as they are. Better early than Google-like ;)

@snoodleboot

I would love to see this! I can understand about pub/sub. It really doesn't serve the same purpose as Pulsar or any distributed log. Different use cases.

@cbornet
Contributor

cbornet commented Nov 8, 2018

+1. Supporting gRPC would give access to a lot more clients, integration with reactive frameworks (like RxJava or Reactor), provide application-level flow control, etc...
Another stream protocol to watch IMO is RSocket.io which has built-in integration of reactive-streams spec (non blocking streams with back-pressure). Since it was just released, it lacks client SDKs but that should evolve over time.

@cbornet
Contributor

cbornet commented Nov 20, 2018

> One of the primary goals of the custom binary protocol we came up with, was to have the client establish a "session" (either producer/consumer), attached to a topic, perform authentication and then let it publish/receive messages as fast as possible, with flow control to guard the rail.
> Eg: we don't want to perform auth at every message, or to specify the topic name each time (which in many cases can be as long as the data itself).
> So, mixing the "session" with RPC seems a bit complicated. Also guaranteeing ordering would be challenging as there would be no relation for different publish requests on the same topic.

gRPC can establish a session for bidirectional streaming! So IMO it could totally be used as the base protocol for Pulsar. The definition would look something like:

service Pulsar {
    rpc exchange(stream BaseCommand) returns (stream BaseCommand);
}

Instead of passing auth info via CommandConnect, you would pass them as gRPC's Metadata fields (similar to HTTP headers).
Note that this reuses the PulsarApi.proto, so a lot of code would be unchanged I think.
You would probably still need drivers because some functionalities require cooperation between the client and the server. But these drivers would be easier to write.
gRPC also has built-in auth mechanisms that could maybe be reused, and it has built-in flow control.
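
To make the "session over one stream" idea concrete, here is a minimal pure-Python simulation. It does not use the gRPC library: the generator `exchange` only borrows its name from the service definition above, the `metadata` dict stands in for gRPC metadata, and the command names, token, and response strings are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple

VALID_TOKENS = {"secret-token"}  # hypothetical credential store


def exchange(metadata: Dict[str, str],
             commands: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Toy stand-in for one bidirectional stream.

    Auth is checked once, from the call metadata. The topic is set by a
    single 'producer' command; later 'send' commands carry only a payload.
    Responses are yielded in the order the commands arrive.
    """
    if metadata.get("authorization") not in VALID_TOKENS:
        raise PermissionError("unauthenticated")
    topic = None
    sequence_id = 0
    for kind, value in commands:
        if kind == "producer":
            topic = value
            yield f"producer_success:{topic}"
        elif kind == "send":
            if topic is None:
                raise RuntimeError("send before producer")
            yield f"send_receipt:{topic}:{sequence_id}:{value}"
            sequence_id += 1
        else:
            raise ValueError(f"unknown command: {kind}")


session = exchange(
    {"authorization": "secret-token"},
    [("producer", "my-topic"), ("send", "a"), ("send", "b")],
)
responses = list(session)
print(responses)
```

Because all commands travel on the same stream, the server sees them in order and the per-stream state (auth result, topic, sequence counter) plays the role of the session.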

That said I have started the work on a gRPC proxy. Consumption and production are working. I need to clean it up then I'll do a PR.

@cbornet
Contributor

cbornet commented Nov 20, 2018

And if I'm not mistaken BookKeeper uses gRPC internally, so it would be coherent to make it the base protocol in Pulsar also.

@sijie
Member

sijie commented Nov 20, 2018

> That said I have started the work on a gRPC proxy. Consumption and production are working. I need to clean it up then I'll do a PR.

Look forward to your PR.

> And if I'm not mistaken BookKeeper uses gRPC internally, so it would be coherent to make it the base protocol in Pulsar also.

gRPC is only used for BookKeeper's table service; the ledger service still uses a custom protocol. But I agree with you: gRPC has a very rich ecosystem, and it is a good direction to go in general.

@merlimat
Contributor

@cbornet The reason we haven't used gRPC is that it wasn't available when we started, so we went with a custom protocol over Protocol Buffers. After that, migrating the internal protocol was a big step.

@cbornet
Contributor

cbornet commented Nov 20, 2018

Yes. That's a very good reason indeed. Maybe in the future 😄. I can understand there are bigger priorities.

@mickdelaney

Is any progress likely on this?
We're a Python and dotnet shop, and so have no viable option for using Pulsar.

@sijie
Member

sijie commented Sep 10, 2019

@mickdelaney Pulsar has a Python client, and a dotnet client is under development. Does that meet your requirements, or is gRPC your preferred option?

@mickdelaney

Hi,
Sorry for the late reply.

So we use Kafka at the moment. Confluent provides dotnet & Python clients based on librdkafka, which in theory gives a baseline for all clients that extend it.

The reality is that it's very expensive to maintain all these language drivers, so you get differences between them, and features that are still coming down the line for some of them. For example, the schema/Avro support for Kafka varies significantly between languages, Java being very different from, say, C#.

So for teams using these drivers, you have to rely on different semantics, you have to create different approaches to dealing with things like schemas, and it increases costs.

Also you have to think about the teams providing the drivers, and the costs they have in maintaining them. It's not easy.

So if there's a possibility that gRPC will fit the semantics of Pulsar's protocol, it seems to me that it's a win for everyone. The Pulsar team in particular can focus their attention on making the gRPC layer first class.

Thanks...

@TC-oKozlov

We have a real-time messaging system implemented in Erlang, and are looking at Pulsar as a pub/sub / queue message broker. Unfortunately, that means implementing our own client lib with tons of features on top of the binary / protobuf protocol. Having gRPC support would have helped greatly.

@sijie
Member

sijie commented Nov 28, 2019

@mickdelaney @TC-oKozlov thank you for your input.

Just to understand a bit more about the requirements: are you expecting a gRPC-based proxy, or the Pulsar broker protocol exposed in gRPC? These would lead to two different approaches.

A gRPC-based proxy means providing a much simpler protocol than the current broker protocol, so it is easy to have gRPC clients in different languages. But it will have its own limitations and drawbacks, such as another network hop, and some features might be hard to support.

Exposing the Pulsar broker protocol in gRPC can solve the problem of wire-level request & response encoding and decoding. However, the challenge of implementing a Pulsar client is not wire-level encoding and decoding. It is more about the logic within a Pulsar client, such as flow control, topic lookup, error handling, etc. So we would still face the same challenges that the current Pulsar clients face. It is probably even worse than implementing a language client wrapper around the Pulsar C/C++ client, because a wrapper is much simpler and less error-prone than re-implementing flow control, topic lookup and error handling in different languages.
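
As one example of the kind of client-side logic meant here, the sketch below models flow control with a permit (credit) scheme, where messages are only dispatched while the consumer has outstanding permits. This is an assumption made for illustration: the class and method names are hypothetical and the code is not taken from any Pulsar client.

```python
from collections import deque


class PermitFlowControl:
    """Toy dispatcher: deliver messages only while the consumer has permits."""

    def __init__(self):
        self.permits = 0        # how many more messages the consumer accepts
        self.backlog = deque()  # messages waiting for permits
        self.delivered = []     # messages handed to the consumer, in order

    def publish(self, message):
        self.backlog.append(message)
        self._drain()

    def grant(self, permits):
        """Consumer signals it can accept `permits` more messages."""
        self.permits += permits
        self._drain()

    def _drain(self):
        # Deliver in FIFO order until permits or backlog run out.
        while self.permits > 0 and self.backlog:
            self.delivered.append(self.backlog.popleft())
            self.permits -= 1


flow = PermitFlowControl()
for i in range(5):
    flow.publish(i)          # nothing is delivered yet: no permits granted
delivered_before = list(flow.delivered)
flow.grant(2)                # two messages go out
delivered_after_two = list(flow.delivered)
flow.grant(10)               # the rest go out, leftover permits remain
print(delivered_before, delivered_after_two, flow.delivered, flow.permits)
```

Every client implementation has to get this bookkeeping right on its side of the connection (when to grant, how many, what to do on reconnect), which is independent of how the bytes are encoded.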

I would like to collect more requirements for gRPC to understand the right approach to solving the problem here.

@cbornet
Contributor

cbornet commented Nov 28, 2019

I think moving to gRPC for the Pulsar clients would have some benefits. For instance, it already handles flow control and bidirectional streaming. For those who want to write native clients, that's one less layer to develop.
Another interesting alternative could be RSocket, which has some very nice features such as session resumption and message-level backpressure. In Java, it would be possible to have a fully reactive-streams API using these protocols.

@mickdelaney

@sijie thanks for the detailed feedback. I was thinking of the former, my thinking being that it would at least remove some of the concerns in maintaining the various language-level clients.

@cbornet
Contributor

cbornet commented Dec 4, 2020

Since v2.7.0 has been released, you can now use the gRPC protocol handler which implements PIP59.
So far all features of 2.7.0 are implemented except transactions (coming soon) and credentials refreshing (probably harder).
You can download a pre-version of the nar here.
I'd be happy to get your feedback on this. I'll publish a blog post in the coming weeks.

@cbornet
Contributor

cbornet commented Dec 6, 2020

New pre-release with full transaction support: https://github.com/cbornet/pulsar-grpc/releases/tag/v1.0.0-20201206-rc

@sl1316

sl1316 commented Nov 28, 2021

@cbornet can you provide some guidance on how to use the gRPC protocol? I only saw the binary protocol at http://pulsar.apache.org/docs/en/develop-binary-protocol/ .

@tisonkun
Member

Closed as answered by #271 (comment). New questions or issues can be created separately.

dlg99 pushed a commit to dlg99/pulsar that referenced this issue May 23, 2024
dlg99 pushed a commit to dlg99/pulsar that referenced this issue May 28, 2024