Implementation of a gRPC transport #538
base: main
Conversation
This is needed to allow for other transports to live in other packages
Great work @mkeeler !!
I have some cosmetic comments but nothing substantial.
Thinking about the questions you asked in the description:
- I think the gRPC transport should be the de facto transport for our library and so should live in the library itself, to encourage users to go with that transport if they don't have any limitations.
- About authorization, I think we should export the Auth method as you suggested and, for the time being, go with the mTLS in Consul; we can explore adding other methods in the future.
- Thinking about backward compatibility, I'm still convinced that having a dual transport that falls back to RPC is the best option, as it will not require any out-of-band communication and would not require a change to the protocol. We can also make the dual transport more intelligent and not perform a fallback on every request. About the address provider, I think it's not an issue as we can construct two address providers on the Consul side, pass one to each transport (RPC and gRPC), and then wrap both transports in a dual transport to handle fallbacks.
I will try to start a PR on top of this one to add some tests around the new transport. Let me know what you think about the dual transport (a rough sketch of the idea is below); I can maybe add it too.
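Very roughly, something like this, shown only for the AppendEntries path (the DualTransport name and the simple per-call fallback are illustrative only, not something in this PR):

```go
package example

import (
	"github.com/hashicorp/raft"
)

// DualTransport is an illustrative wrapper that tries a primary (gRPC)
// transport and falls back to a secondary (net/RPC) transport when the
// primary fails. Only AppendEntries is shown; a real implementation would
// wrap every raft.Transport method and would likely remember which peers
// needed the fallback instead of retrying gRPC on every request.
type DualTransport struct {
	primary   raft.Transport
	secondary raft.Transport
}

func (d *DualTransport) AppendEntries(id raft.ServerID, target raft.ServerAddress,
	args *raft.AppendEntriesRequest, resp *raft.AppendEntriesResponse) error {
	if err := d.primary.AppendEntries(id, target, args, resp); err == nil {
		return nil
	}
	// Fall back to the legacy transport, e.g. for peers that cannot speak
	// gRPC yet. The target address may differ per transport, which is where
	// the two address providers mentioned above would come in.
	return d.secondary.AppendEntries(id, target, args, resp)
}
```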
// mog annotation:
//
// target=github.com/hashicorp/raft.RPCHeader
// output=common.gen.go
Maybe types.gen.go to keep the names in sync
// mog annotation:
//
// target=github.com/hashicorp/raft.Log
// output=common.gen.go
Same here about the filename
select {
case g.rpcCh <- rpc:
case <-ctx.Done():
	return nil, status.FromContextError(ctx.Err()).Err()
I don't think calling this is necessary here, as the handler error will be transformed into a status code when returned anyway. I think we can just return ctx.Err() and it will be equivalent.
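In other words, the suggestion is to simplify the quoted select to something like this (a sketch of the suggested change, relying on the comment's claim that gRPC maps the raw context error to a status for us):

```go
select {
case g.rpcCh <- rpc:
case <-ctx.Done():
	// Per the comment above, the context error returned from the handler
	// is assumed to be converted into an appropriate status by gRPC.
	return nil, ctx.Err()
}
```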
const (
	// rpcMaxPipeline controls the maximum number of outstanding
	// AppendEntries RPC calls.
	rpcMaxPipeline = 128
See #541; if we use pipelining in this transport we probably need to at least mirror that PR here to avoid the same pathology.
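Illustratively, mirroring that PR would mean replacing the constant with a configurable limit along these lines (the type and field names below are hypothetical, not taken from #541 or this PR):

```go
// Config is an illustrative transport option struct; the field name is
// hypothetical and only shows the shape of the change.
type Config struct {
	// MaxInFlightAppendEntries bounds how many AppendEntries RPCs may be
	// outstanding on a pipeline at once. A small default avoids the
	// latency-multiplying behavior described in #541.
	MaxInFlightAppendEntries int
}
```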
g.logger.Info("handling an append entries pipeline")
// This is not as efficient as it could be. To improve this we could
// queue up the append entries requests in one go routine and wait
// on their completion and send results back in a second go routine.
// If sending responses is experiencing extra latency then we would
// be better pipelining the requests. However the existing net_transport
// processes commands for a singular connection serially waiting on one
// RPC to finish before processing more commands. Therefore this should
// be at least as performant as that transport.
The proposed optimization here is an alternative fix for the issue in #541.
As implemented in the PR so far, this transport will have the same behavior as the network transport, which makes pipelining almost always worse than not pipelining at all.
If we did what this comment says, though, and applied batching on the receive side here before writing to disk in one go, then that would fix the issue and we'd be able to avoid the problems of pipelining, at least in theory; we'd want to do actual performance comparisons to determine if there are other downsides.
I think ultimately it will end up being mostly equivalent though:
- at low throughput, pipelining doesn't impact perf at all since requests are far enough apart that they complete before the next one arrives
- at high throughput with sync receive processing, long pipelines prevent leader batching from grouping work, which forces the pipeline to fill, and the queue built up in the pipeline acts as a multiplier for latency.
- at high throughput with receive-side batching as proposed in this comment, you get back to the same latency as with pipelining disabled, since the follower will automatically commit to disk more efficiently once a queue builds up and so keep the queue as small as possible (until we hit the max of 64 logs per batch or whatever limit we choose to prevent unbounded growth). It's not clear, though, whether this is actually better than just not pipelining in the first place, which achieves the same thing simply by batching on the leader. In theory it might be, due to overlapping encoding/decoding and network RTT versus purely sync replication, but I suspect that if we tested it we'd only see measurable gains where the RTT is much larger than the disk commit time, which is not a typical situation for HashiCorp raft usages at least.
This is more of a brain dump for when this is picked up than a recommendation at this point, but we should do at least one of:
- make pipeline size configurable and default to a small value, as in NetworkTransport (#541: make pipelining configurable and default to max 2 in flight)
- or batch on the receive side here too, as in the in-code comment (rough sketch below).
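For the second option, a rough sketch of the decoupled receive/respond handling the in-code comment describes (this is not this PR's code; the AppendEntriesStream interface is a hypothetical stand-in for the generated gRPC stream, and error handling is simplified):

```go
package example

import (
	"github.com/hashicorp/raft"
)

// AppendEntriesStream is a hypothetical stand-in for the generated gRPC
// bidirectional stream, assumed here to already translate to and from the
// raft types rather than the protobuf ones.
type AppendEntriesStream interface {
	Recv() (*raft.AppendEntriesRequest, error)
	Send(*raft.AppendEntriesResponse) error
}

// handlePipeline sketches the optimization described in the in-code comment:
// the receive loop keeps dispatching requests to Raft while a second
// goroutine waits on completions and streams the responses back, so slow
// response writes don't stall request intake. Error handling is simplified;
// a real implementation would also stop the receive loop when sending fails.
func handlePipeline(stream AppendEntriesStream, rpcCh chan<- raft.RPC) error {
	pending := make(chan chan raft.RPCResponse, 128)
	done := make(chan error, 1)

	// Response goroutine: drain completions in order and send them back.
	go func() {
		for respCh := range pending {
			resp := <-respCh
			if resp.Error != nil {
				done <- resp.Error
				return
			}
			if err := stream.Send(resp.Response.(*raft.AppendEntriesResponse)); err != nil {
				done <- err
				return
			}
		}
		done <- nil
	}()

	// Receive loop: hand each request to Raft without waiting for the
	// previous one to complete.
	for {
		req, err := stream.Recv()
		if err != nil {
			close(pending)
			break
		}
		respCh := make(chan raft.RPCResponse, 1)
		rpcCh <- raft.RPC{Command: req, RespChan: respCh}
		pending <- respCh
	}
	return <-done
}
```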
I think after all your investigation we have generally proved that pipelining in the transport on the sending side is almost always worse in the cases where this library is intended to be used.
More or less, we want the followers to commit to disk fewer times, with larger batches, when under heavier load. It doesn't seem worth it to batch things before they hit the transport on the leader and then again on the follower.
So given that we already have batching code outside of the transport on the leader, it seems like we ought to keep this transport simpler and not do receive-side batching.
any update on this?
Overview
There are currently two commits in this PR.
The first was necessary to allow for better code organization of transports, so that not all of the code has to live at the top level. If we like this approach I think we should also move the other transports into a similar directory structure.
When implementing the transport I chose the approach of having something outside of Raft maintain the gRPC server, and providing a method on the transport to register the gRPC service with that server. No scenario comes to mind where you would want gRPC for Raft but not gRPC for other purposes, so being able to share the same grpc.Server seems ideal.
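To make the intended wiring concrete, a rough sketch where the application owns the gRPC server (the RaftRegistrar interface and the RegisterService method name are illustrative, not this PR's actual API):

```go
package example

import (
	"net"

	"google.golang.org/grpc"
)

// RaftRegistrar stands in for the gRPC Raft transport; the interface and
// method name are illustrative only.
type RaftRegistrar interface {
	RegisterService(s *grpc.Server)
}

// serveRaft shows the intended wiring: the application owns the gRPC server
// and listener, registers the Raft transport's service on it, and can host
// any of its other gRPC services on the same server and port.
func serveRaft(trans RaftRegistrar, addr string) error {
	lis, err := net.Listen("tcp", addr)
	if err != nil {
		return err
	}
	srv := grpc.NewServer()
	trans.RegisterService(srv)
	// ... register other application services on srv here ...
	return srv.Serve(lis)
}
```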
Open Questions
Should the gRPC transport and protobuf files live in their own module, so as not to force arbitrary library versions on Raft consumers who use the other transports?
Should we have some native concept of authorization? Consul requires mTLS with specific server name verification in its Raft usage. Should the Raft library itself be able to extract the gRPC transport credentials and have some callback for authorizing RPCs? Potentially it could even have access to the gRPC metadata to extract forms of authorization other than mTLS.
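For reference, one way such a hook could inspect mTLS credentials is a gRPC server interceptor that reads the peer info from the context; this only illustrates the general gRPC mechanism and is not code from this PR (the expectedName policy is a placeholder):

```go
package example

import (
	"context"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/credentials"
	"google.golang.org/grpc/peer"
	"google.golang.org/grpc/status"
)

// authInterceptor rejects RPCs whose client certificate does not verify
// against the expected name. The policy here is a placeholder for whatever
// check (e.g. Consul's server name verification) the caller wants to apply.
func authInterceptor(expectedName string) grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (interface{}, error) {
		p, ok := peer.FromContext(ctx)
		if !ok {
			return nil, status.Error(codes.Unauthenticated, "no peer info")
		}
		tlsInfo, ok := p.AuthInfo.(credentials.TLSInfo)
		if !ok || len(tlsInfo.State.PeerCertificates) == 0 {
			return nil, status.Error(codes.Unauthenticated, "mTLS required")
		}
		if err := tlsInfo.State.PeerCertificates[0].VerifyHostname(expectedName); err != nil {
			return nil, status.Error(codes.PermissionDenied, "unexpected peer identity")
		}
		return handler(ctx, req)
	}
}
```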
TODOs