distsql: flow scheduler back up could result in failed streams #27746
33690: server: refactor the grpc server r=andreimatei a=andreimatei

This patch creates the grpcServer object, to encapsulate the grpc.Server object, the serveMode object, and the interceptor that our server.Server uses to filter out RPCs. The idea is to work towards decoupling the grpc.Server from our server.Server object. I'd like to have the grpc server be created before Server.Start(): I'd like the cluster ID and node ID to be available by the time Server.Start() is called so that we can get rid of the nodeIDContainer that everybody and their dog uses. But to get these IDs, a grpc server needs to be running early to handle the "init" RPC which bootstraps a cluster. This patch doesn't accomplish too much - it doesn't do anything about actually starting to serve any requests without a Server (i.e. create the needed listeners) - but I think the patch stands on its own as a good refactoring. Release note: None

34027: distsqlrun: add metrics for queue size and wait duration r=ajwerner a=ajwerner

Long queuing can lead to errors like the one described in #27746. This change should give us more visibility into when queuing is occurring and how problematic it is. Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Andrew Werner <[email protected]>
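The metrics change above amounts to tracking how many flows are waiting and how long they wait before starting. A minimal sketch of such instrumentation, using the standard Prometheus Go client for illustration rather than CockroachDB's internal metric package (the metric names and buckets here are assumptions, not the actual implementation):

```go
package flowsched

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// Hypothetical metrics mirroring what the PR describes: a gauge for the
// current queue depth and a histogram for time spent waiting in the queue.
var (
	queuedFlows = prometheus.NewGauge(prometheus.GaugeOpts{
		Name: "distsql_flows_queued", // assumed name, not the real one
		Help: "Number of flows currently waiting to be scheduled.",
	})
	queueWaitSeconds = prometheus.NewHistogram(prometheus.HistogramOpts{
		Name:    "distsql_flows_queue_wait_seconds", // assumed name
		Help:    "Time flows spend queued before starting.",
		Buckets: prometheus.ExponentialBuckets(0.001, 2, 16),
	})
)

func init() {
	prometheus.MustRegister(queuedFlows, queueWaitSeconds)
}

// enqueueFlow records a flow entering the queue and returns a callback to be
// invoked when the flow is dequeued and started.
func enqueueFlow() (dequeued func()) {
	start := time.Now()
	queuedFlows.Inc()
	return func() {
		queuedFlows.Dec()
		queueWaitSeconds.Observe(time.Since(start).Seconds())
	}
}
```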
This behavior has become a regular nuisance. Having a fixed limit in the first place is probably wrong. The queuing can be okay if the expected wait time is low (probably not more than some multiple of the expected lifetime of the flow itself). As the queue builds up, the failure mode it triggers is that flow setup times out from the perspective of the upstream node. This ultimately results in a cryptic error for the user. We need a mechanism to determine when there are too many flows and the cluster is overloaded, and to gracefully report that to the user.
In the very short term, should we just question the default value of 500? Is it justified? My expectation is that the resource we're preserving here is RAM. What is a simplistic RAM model for each flow?
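A back-of-the-envelope sketch of such a per-flow RAM model (all constants and component names below are hypothetical assumptions for illustration, not measurements of real flow memory usage):

```go
package main

import "fmt"

// Hypothetical per-flow memory model: a flow is assumed to hold some fixed
// bookkeeping overhead plus per-processor row buffers. The constants are
// illustrative guesses, not measured values.
const (
	flowOverheadBytes    = 64 << 10  // assumed fixed overhead per flow
	perProcessorBufBytes = 256 << 10 // assumed buffered rows per processor
)

// estimateFlowRAM returns a rough upper bound on a flow's memory footprint.
func estimateFlowRAM(numProcessors int) int {
	return flowOverheadBytes + numProcessors*perProcessorBufBytes
}

func main() {
	// Under this model, 500 flows of 4 processors each would pin roughly
	// 500 * (64KiB + 4*256KiB) ≈ 530 MiB of RAM.
	perFlow := estimateFlowRAM(4)
	fmt.Printf("per flow: %d bytes, 500 flows: %d bytes\n", perFlow, 500*perFlow)
}
```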
We're limiting network and CPU use as well, though RAM is probably the most important (network and CPU will get slower; RAM will hit a wall and cause the node to die). The flow limit is a form of admission control, but it is interesting that this is happening at an internal layer and not when the query first arrives. My intuition is that once we decide to process a query, we should process it. Bumping the value of … Cc @RaduBerinde and @andreimatei for their insight on the current design.
My understanding is that @andreimatei and @andy-kimball have been discussing this throughout the day. I'm looking forward to seeing what falls out of the various discussions.
@jordanlewis any update on addressing this item?
I intend to explore putting an admission controller here. The basic approach that's being applied to KV should drop in nicely. There are some differences, most notably that there's no clear expectation that running flows will finish in a relatively timely manner the way there is for KV reads, but even the naive approach should be better than what we have today.
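A minimal sketch of what a drop-in admission gate in front of the flow scheduler could look like (the structure, names, and timeout policy are assumptions for illustration, not the actual admission-control work referenced here):

```go
package flowsched

import (
	"context"
	"errors"
	"time"
)

// ErrOverloaded is the graceful, user-facing error returned when the node
// refuses new flows instead of letting them queue indefinitely.
var ErrOverloaded = errors.New("node is overloaded; too many concurrent flows")

// admissionGate bounds concurrent flows with a semaphore, but instead of an
// unbounded queue it refuses admission once the wait budget is exhausted.
type admissionGate struct {
	slots   chan struct{} // capacity == max concurrent flows
	maxWait time.Duration // how long a flow may wait for a slot
}

func newAdmissionGate(maxFlows int, maxWait time.Duration) *admissionGate {
	return &admissionGate{slots: make(chan struct{}, maxFlows), maxWait: maxWait}
}

// admit blocks until a slot frees up, the wait budget runs out, or the
// caller's context is canceled. The returned release func must be called when
// the flow finishes.
func (g *admissionGate) admit(ctx context.Context) (release func(), _ error) {
	timer := time.NewTimer(g.maxWait)
	defer timer.Stop()
	select {
	case g.slots <- struct{}{}:
		return func() { <-g.slots }, nil
	case <-timer.C:
		return nil, ErrOverloaded
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}
```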
Superseded by #34229
When scheduling a flow, the `flowScheduler` checks how many running flows there are (`cockroach/pkg/sql/distsqlrun/flow_scheduler.go`, line 90 at a9410c5) and enqueues the flow to be run later if there are too many already running. This is a potential issue: if there are two nodes with uneven load, node A with a low load and node B with a high load, both could receive a request to set up a flow; node A could try to connect to node B, but node B would have this flow backed up in its queue. If it takes too long to dequeue the flow on node B, the whole flow will fail.
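A simplified sketch of the check-and-enqueue behavior described above (an illustrative approximation, not the actual `flow_scheduler.go` code; the field and method names are assumptions):

```go
package flowsched

import "sync"

// flow stands in for a DistSQL flow that is ready to run; the real type
// carries much more state.
type flow struct{ run func() }

// scheduler approximates the queuing behavior: run a flow immediately if we
// are under the limit, otherwise park it in a queue to be run later.
type scheduler struct {
	mu         sync.Mutex
	maxRunning int // e.g. the 500 default discussed above
	numRunning int
	queue      []*flow
}

func (s *scheduler) scheduleFlow(f *flow) {
	s.mu.Lock()
	if s.numRunning < s.maxRunning {
		s.numRunning++
		s.mu.Unlock()
		go s.runFlow(f)
		return
	}
	// Too many running flows: the flow sits here until a slot frees up, which
	// is exactly the window in which the upstream node's setup can time out.
	s.queue = append(s.queue, f)
	s.mu.Unlock()
}

func (s *scheduler) runFlow(f *flow) {
	f.run()
	s.mu.Lock()
	defer s.mu.Unlock()
	s.numRunning--
	if len(s.queue) > 0 {
		next := s.queue[0]
		s.queue = s.queue[1:]
		s.numRunning++
		go s.runFlow(next)
	}
}
```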
A solution might be to set up these connections early regardless of the load (i.e., registering a flow through `RegisterFlow` as soon as we receive a request to set up a flow, prior to a possible enqueue).
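A rough sketch of that proposed ordering, registering the flow with the inbound-stream registry before it is possibly enqueued, so upstream nodes can connect even while the flow waits (`RegisterFlow` is the call named in the issue; its signature and the surrounding structure are assumptions):

```go
package flowsched

import "time"

// flowRegistry stands in for the component that inbound streams from other
// nodes connect to.
type flowRegistry struct{}

// RegisterFlow makes the flow discoverable so that inbound stream RPCs from
// upstream nodes can attach to it. The signature here is an assumption.
func (r *flowRegistry) RegisterFlow(id string, timeout time.Duration) {
	// ...
}

// setupFlow sketches the proposed fix: make the flow reachable first, then
// hand it to the scheduler, which may run it now or queue it.
func setupFlow(reg *flowRegistry, sched func(), id string) {
	// Register before any possible enqueue, so connection attempts from
	// upstream nodes don't time out while the flow waits in the queue.
	reg.RegisterFlow(id, 10*time.Second) // timeout value is a placeholder
	sched() // schedule or enqueue as before
}
```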