[DNM] stop: disable task tracking in release builds, optimize task creation #52894

nvanbenschoten · 2020-08-17T14:27:26Z

This commit optimizes the Stopper for task creation by ripping out the
existing heavyweight task tracking in production builds. I realized that
my biggest concern with most of the proposals (#52843 and #51566) being
floated to address #51544 was that they bought more into the inefficient
tracking in the Stopper, not that they were doing anything inherently
wrong themselves.

Before this change, creating a task acquired an exclusive mutex and then
wrote to a hashmap. At high levels of concurrency, this would have
become a performance chokepoint. After this change, the cost of
launching a Task is three atomic increments – one to acquire a read
lock, one to register with a WaitGroup, and one to release the read
lock. When no one is draining the Stopper, these are all wait-free
operations, which means that task creation becomes wait-free.

With a change like this, I would feel much more comfortable pushing on
Stopper tasks to solve #51544.

This isn't a real PR. I'm just throwing it up to see how people feel about it.

cc. @tbg @andreimatei

andreimatei

I've been thinking about doing this, but at the same time I have dreams of more task registration. Namely I want an /rpcz page that lists all ongoing RPCs. So I guess I first want to see how much inserting into a map really hurts us. Currently that map in the stopper is just a regular map behind a mutex, so I'd also want to see whether sync.Map helps.
I was planning on experimenting myself.
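
A rough sketch of what such an experiment could look like, with every name here (`registry`, `mutexRegistry`, `syncMapRegistry`, `hammer`) invented for illustration rather than taken from the stopper. It registers and unregisters task IDs from many goroutines against a mutex-guarded map and against `sync.Map`, and reports wall-clock time. A real comparison would more likely use `testing.B` benchmarks on the actual code paths.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// registry is a hypothetical interface for tracking in-flight tasks by ID.
type registry interface {
	add(id int64, name string)
	remove(id int64)
	size() int
}

// mutexRegistry is a regular map behind an exclusive mutex.
type mutexRegistry struct {
	mu sync.Mutex
	m  map[int64]string
}

func newMutexRegistry() *mutexRegistry {
	return &mutexRegistry{m: make(map[int64]string)}
}

func (r *mutexRegistry) add(id int64, name string) {
	r.mu.Lock()
	r.m[id] = name
	r.mu.Unlock()
}

func (r *mutexRegistry) remove(id int64) {
	r.mu.Lock()
	delete(r.m, id)
	r.mu.Unlock()
}

func (r *mutexRegistry) size() int {
	r.mu.Lock()
	defer r.mu.Unlock()
	return len(r.m)
}

// syncMapRegistry stores the same data in a sync.Map.
type syncMapRegistry struct {
	m sync.Map
}

func (r *syncMapRegistry) add(id int64, name string) { r.m.Store(id, name) }

func (r *syncMapRegistry) remove(id int64) { r.m.Delete(id) }

func (r *syncMapRegistry) size() int {
	n := 0
	r.m.Range(func(_, _ interface{}) bool {
		n++
		return true
	})
	return n
}

// hammer registers and unregisters tasks from many goroutines and
// returns the elapsed wall-clock time.
func hammer(r registry, workers, opsPerWorker int) time.Duration {
	var nextID int64
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < opsPerWorker; i++ {
				id := atomic.AddInt64(&nextID, 1) // unique ID per task
				r.add(id, "task")
				r.remove(id)
			}
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	const workers, ops = 64, 10000
	fmt.Println("mutex+map:", hammer(newMutexRegistry(), workers, ops))
	fmt.Println("sync.Map: ", hammer(&syncMapRegistry{}, workers, ops))
}
```

The timings depend heavily on core count and workload, so this only shows how the two could be compared, not which one wins. The `sync.Map` documentation describes it as optimized for keys that are written once and read many times, or for goroutines operating on disjoint key sets, so whether it helps for short-lived task entries is exactly what a measurement would need to establish.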

nvanbenschoten · 2020-08-17T22:08:11Z

> Namely I want an /rpcz page that lists all ongoing RPCs.

Have you seen https://grpc.io/blog/a-short-introduction-to-channelz/? I'm going to play around with it when I get the chance. It's got a "z" at the end, so it must have a blessing from the right people.

My contention here is that the task tracking in the Stopper is already too heavyweight for general usage. It does too much and it doesn't do enough well. It sounds like we all want to be using Stopper tasks more, not less, which I agree with. But we aren't going to get there if we add more RPC-specific bloat to what should be a very simple draining mechanism.

andreimatei

> Have you seen https://grpc.io/blog/a-short-introduction-to-channelz/? I'm going to play around with it when I get the chance. It's got a "z" at the end, so it must have a blessing from the right people.

That looks cool, but I can't tell if it tells you what RPCs are currently running on a server.

As far as this particular PR is concerned, it looks good to me and I wouldn't even bother with the "debug" implementation. When a test times out, we'll get stack traces for the tasks, which I think should be enough.
But I do want to reserve the right to re-introduce a sync.Map update per RPC start/finish in the future if no benchmark regresses too much.

We are likely going to invest more in the stopper-conferred observability in the near future as part of initiatives such as cockroachdb#58164, but the task tracking that has been a part of the stopper since near its conception has not proven to be useful in practice, while at the same time raising concern about stopper use in hot paths.

When shutting down a running server, we don't particularly care about leaking goroutines (as the process will end anyway). In tests, we want to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at the stacks to find out why than to consult the task map.

Together, this left little reason to do anything more complicated than what's left after this commit: we keep track of the running number of tasks, and wait until this drops to zero.

With this change in, we should feel comfortable using the stopper extensively and, for example, ensuring that any CRDB goroutine is anchored in a Stopper task; this is the right approach for test flakes such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in issue cockroachdb#58164 as well.

In a future change, we should make the Stopper more configurable and, through this configurability, we could in principle bring a version of the task map back (in debug builds) without baking it into the stopper, though I don't anticipate that we'll want to.

Closes cockroachdb#52894.

Release note: None

59647: stop: rip out expensive task tracking r=knz a=tbg

First commit was put up for PR separately, ignore it here.

Closes #52894.

Release note: None

59732: backupccl: add an owner column behind the WITH PRIVILEGES option r=pbardea a=Elliebababa

Previously, when users perform RESTORE, they are ignorant of the original owner. This PR gives ownership data as a column behind privileges.

Resolves: #57906.

Release note: None.

59746: opt: switch checks to use CrdbTestBuild instead of RaceEnabled r=RaduBerinde a=RaduBerinde

The RaceEnabled flag is not very useful for checks; e.g. apparently execbuilder tests aren't run routinely in race mode. These checks are now "live" in any test build, using the crdb_test build tag.

Release note: None

59747: tree: correct StatementTag of ALTER TABLE ... LOCALITY r=ajstorm a=otan

Release note: None

Co-authored-by: Tobias Grieger
Co-authored-by: elliebababa
Co-authored-by: Radu Berinde
Co-authored-by: Oliver Tan

andreimatei reviewed Aug 17, 2020

andreimatei reviewed Aug 19, 2020

nvanbenschoten mentioned this pull request Jan 22, 2021

stopper: remove RunWorker and ShouldStop #59041

Merged

tbg mentioned this pull request Feb 1, 2021

stop: rip out expensive task tracking #59647

Merged

craig bot closed this in 05c5a74 Feb 3, 2021

nvanbenschoten deleted the nvanbenschoten/stopperDrain branch February 6, 2021 01:54
