[DNM] stop: disable task tracking in release builds, optimize task creation #52894
Conversation
This commit optimizes the Stopper for task creation by ripping out the existing heavyweight task tracking in production builds. I realized that my biggest concern with most of the proposals (cockroachdb#52843 and cockroachdb#51566) being floated to address cockroachdb#51544 was that they bought further into the inefficient tracking in the Stopper, not that they were doing anything inherently wrong themselves.

Before this change, creating a task acquired an exclusive mutex and then wrote to a hashmap. At high levels of concurrency, this would have become a performance chokepoint. After this change, the cost of launching a Task is three atomic increments: one to acquire a read lock, one to register with a WaitGroup, and one to release the read lock. When no one is draining the Stopper, these are all wait-free operations, which means that task creation becomes wait-free.

With a change like this, I would feel much more comfortable pushing on Stopper tasks to solve cockroachdb#51544.

This isn't a real PR. I'm just throwing it up to see how people feel about it.

cc. @tbg @andreimatei
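For illustration only, here is a minimal sketch of the fast path described above, assuming a simplified Stopper; the type, field, and method names are made up and do not match the real implementation:

```go
package stop

import "sync"

// Stopper is a simplified sketch. The write lock on mu is only taken while
// draining; task creation only ever takes the read lock.
type Stopper struct {
	mu       sync.RWMutex
	draining bool // set under mu's write lock once Stop has been called
	tasks    sync.WaitGroup
}

// RunTask runs f as a tracked task. On the happy path the bookkeeping is
// three atomic increments: RLock, WaitGroup.Add, RUnlock.
func (s *Stopper) RunTask(f func()) bool {
	s.mu.RLock()
	if s.draining {
		s.mu.RUnlock()
		return false // refuse new work once draining has begun
	}
	s.tasks.Add(1)
	s.mu.RUnlock()

	go func() {
		defer s.tasks.Done()
		f()
	}()
	return true
}

// Stop flips the draining flag under the write lock and waits for all
// in-flight tasks to finish.
func (s *Stopper) Stop() {
	s.mu.Lock()
	s.draining = true
	s.mu.Unlock()
	s.tasks.Wait()
}
```

When nobody holds the write lock, `RLock` and `RUnlock` each reduce to a single atomic add, and `WaitGroup.Add` is another, which is where the "three atomic increments" figure comes from.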
I've been thinking about doing this, but at the same time I have dreams of more task registration. Namely, I want an /rpcz page that lists all ongoing RPCs. So I guess I first want to see how much inserting into a map really hurts us. Currently that map in the stopper is just a regular map behind a mutex, so I'd also want to see whether sync.Map helps.

I was planning on experimenting myself.
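For reference, a rough sketch of the kind of benchmark that could answer that question (not code from this PR; the names are illustrative):

```go
package stop

import (
	"sync"
	"sync/atomic"
	"testing"
)

// BenchmarkTaskMapMutex registers and unregisters a task in a plain map
// guarded by a mutex, roughly the shape of the current task tracking.
func BenchmarkTaskMapMutex(b *testing.B) {
	var (
		mu    sync.Mutex
		tasks = make(map[int64]string)
		id    int64
	)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			key := atomic.AddInt64(&id, 1)
			mu.Lock()
			tasks[key] = "taskName"
			mu.Unlock()
			mu.Lock()
			delete(tasks, key)
			mu.Unlock()
		}
	})
}

// BenchmarkTaskMapSync does the same with sync.Map. sync.Map is tuned for
// mostly-read or disjoint-key workloads, so it isn't obvious it helps for a
// write-heavy registry like this; that's what the comparison would show.
func BenchmarkTaskMapSync(b *testing.B) {
	var (
		tasks sync.Map
		id    int64
	)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			key := atomic.AddInt64(&id, 1)
			tasks.Store(key, "taskName")
			tasks.Delete(key)
		}
	})
}
```

Running it with something like `go test -bench TaskMap -cpu 1,8,32` would show how both registries scale with parallelism.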
Have you seen https://grpc.io/blog/a-short-introduction-to-channelz/? I'm going to play around with it when I get the chance. It's got a "z" at the end, so it must have a blessing from the right people.

My contention here is that the task tracking in the Stopper is already too heavyweight for general usage. It does too much and it doesn't do enough well. It sounds like we all want to be using Stopper tasks more, not less, which I agree with. But we aren't going to get there if we add more RPC-specific bloat to what should be a very simple draining mechanism.
> Have you seen https://grpc.io/blog/a-short-introduction-to-channelz/? I'm going to play around with it when I get the chance. It's got a "z" at the end, so it must have a blessing from the right people.
That looks cool, but I can't tell if it tells you what RPCs are currently running on a server.
As far as this particular PR is concerned, it looks good to me and I wouldn't even bother with the "debug" implementation. When a test times out, we'll get stack traces for the tasks, which I think should be enough.
But I do want to reserve the right to re-introduce a sync.Map update per RPC start/finish in the future if no benchmark regresses too much.
We are likely going to invest more in the stopper-conferred observability in the near future as part of initiatives such as cockroachdb#58164, but the task tracking that has been a part of the stopper since near its conception has not proven to be useful in practice, while at the same time raising concerns about stopper use in hot paths.

When shutting down a running server, we don't particularly care about leaking goroutines (as the process will end anyway). In tests, we want to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at the stacks to find out why than to consult the task map. Together, this left little reason to do anything more complicated than what's left after this commit: we keep track of the running number of tasks, and wait until this drops to zero.

With this change in, we should feel comfortable using the stopper extensively and, for example, ensuring that any CRDB goroutine is anchored in a Stopper task; this is the right approach for test flakes such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in issue cockroachdb#58164 as well.

In a future change, we should make the Stopper more configurable and, through this configurability, we could in principle bring a version of the task map back (in debug builds) without baking it into the stopper, though I don't anticipate that we'll want to.

Closes cockroachdb#52894.

Release note: None
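The "track the running number of tasks and wait until it drops to zero" behavior can be pictured with a small sketch like the following (illustrative only; the names do not correspond to the real stop.Stopper internals):

```go
package stop

import "sync"

// drainCounter is a minimal sketch of counting running tasks and waiting
// for the count to reach zero; it is not the real implementation.
type drainCounter struct {
	mu        sync.Mutex
	cond      *sync.Cond
	running   int
	quiescing bool
}

func newDrainCounter() *drainCounter {
	dc := &drainCounter{}
	dc.cond = sync.NewCond(&dc.mu)
	return dc
}

// enter registers a new task; it fails once draining has started.
func (dc *drainCounter) enter() bool {
	dc.mu.Lock()
	defer dc.mu.Unlock()
	if dc.quiescing {
		return false
	}
	dc.running++
	return true
}

// exit unregisters a task and wakes the drainer when the count hits zero.
func (dc *drainCounter) exit() {
	dc.mu.Lock()
	defer dc.mu.Unlock()
	dc.running--
	if dc.running == 0 {
		dc.cond.Broadcast()
	}
}

// drain stops admitting new tasks and blocks until the count is zero.
func (dc *drainCounter) drain() {
	dc.mu.Lock()
	defer dc.mu.Unlock()
	dc.quiescing = true
	for dc.running > 0 {
		dc.cond.Wait()
	}
}
```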
59647: stop: rip out expensive task tracking r=knz a=tbg

First commit was put up for PR separately, ignore it here.

----

We are likely going to invest more in the stopper-conferred observability in the near future as part of initiatives such as #58164, but the task tracking that has been a part of the stopper since near its conception has not proven to be useful in practice, while at the same time raising concerns about stopper use in hot paths. When shutting down a running server, we don't particularly care about leaking goroutines (as the process will end anyway). In tests, we want to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at the stacks to find out why than to consult the task map. Together, this left little reason to do anything more complicated than what's left after this commit: we keep track of the running number of tasks, and wait until this drops to zero.

With this change in, we should feel comfortable using the stopper extensively and, for example, ensuring that any CRDB goroutine is anchored in a Stopper task; this is the right approach for test flakes such as in #51544 and makes sense for all of the reasons mentioned in issue #58164 as well.

In a future change, we should make the Stopper more configurable and, through this configurability, we could in principle bring a version of the task map back (in debug builds) without baking it into the stopper, though I don't anticipate that we'll want to.

Closes #52894.

Release note: None

59732: backupccl: add an owner column behind the WITH PRIVILEGES option r=pbardea a=Elliebababa

Previously, when users perform RESTORE, they are ignorant of the original owner. This PR gives ownership data as a column behind privileges.

Resolves: #57906.

Release note: None.

59746: opt: switch checks to use CrdbTestBuild instead of RaceEnabled r=RaduBerinde a=RaduBerinde

The RaceEnabled flag is not very useful for checks; e.g. apparently execbuilder tests aren't run routinely in race mode. These checks are now "live" in any test build, using the crdb_test build tag.

Release note: None

59747: tree: correct StatementTag of ALTER TABLE ... LOCALITY r=ajstorm a=otan

Release note: None

Co-authored-by: Tobias Grieger <[email protected]>
Co-authored-by: elliebababa <[email protected]>
Co-authored-by: Radu Berinde <[email protected]>
Co-authored-by: Oliver Tan <[email protected]>
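If a debug-only task map ever does come back, one way to keep it out of release builds is the crdb_test build-tag idiom referenced above; a minimal sketch of that pattern (the file names, package, and trackTasks constant are hypothetical):

```go
// task_tracking_on.go

//go:build crdb_test

package stop

// trackTasks enables the detailed task map in test builds only.
const trackTasks = true
```

```go
// task_tracking_off.go

//go:build !crdb_test

package stop

// In release builds trackTasks is a compile-time false, so any tracking
// code guarded by it is eliminated by the compiler.
const trackTasks = false
```

Code would then branch on `trackTasks`, and the dead branches disappear from release binaries at compile time.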