-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
*: ban the go keyword in non-test code #58164
Comments
Here is the current list of users of
|
I have looked at a few. I believe we need a mix of both. A large number can+should use a stopper and a |
@tbg this connects well to our discussion earlier last week to promote the use of the stopper to ensure that logging contexts get properly propagated. |
Thanks! I wasn't aware of this issue, glad it already exists.First I wanted to add that we'd also have to do something about My (unordered) running list of benefits for reworking our
|
I am looking at our existing use of profiler labels (https://rakyll.org/profiler-labels/) right now and what it would take to make them usable for observability - say you're looking at a node that hovers at 95% CPU, what operation causes the CPU to be consumed? I think that in the short term, profiler labels are our best bet at developing a solution here. Profiler labels are interesting because they transition into child goroutines, meaning that if we do a good job at labeling goroutines "where the work starts" and making sure that these labels hop across RPCs, we should be in a fairly good place. Right now, it's good enough so that if some SQL operation is burning CPU, you will be able to pick out which one it is. But what isn't yet possible (as far as I can tell) is that if you start with an overloaded CPU, you can tell from the profiles conclusively that a statement is responsible for it. Sure, you see some CPU attributed to that statement, but not, like, the lion's share. It seems that most of the work done on behalf of the statement is elsewhere. I still have to understand why that is, but it's possible that it ties in with this issue. One issue is definitely that when you delegate work to a thread pool (as happens for raft operations), the association breaks. But at least for read-heavy workloads, I'm not aware of such a switching point. Possibly DistSQL has one, though. |
We are likely going to invest more in the stopper-conferred observability in the near future as part of initiatives such as cockroachdb#58164, but the task tracking that has been a part of the stopper since near its conception has not proven to be useful in practice, while at the same time raising concern about stopper use in hot paths. When shutting down a running server, we don't particularly care about leaking goroutines (as the process will end anyway). In tests, we want to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at the stacks to find out why than to consult the task map. Together, this left little reason to do anything more complicated than what's left after this commit: we keep track of the running number of tasks, and wait until this drops to zero. With this change in, we should feel comfortable using the stopper extensively and, for example, ensuring that any CRDB goroutine is anchored in a Stopper task; this is the right approach for test flakes such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in issue cockroachdb#58164 as well. In a future change, we should make the Stopper more configurable and, through this configurability, we could in principle bring a version of the task map back (in debug builds) without backing it into the stopper, though I don't anticipate that we'll want to. Closes cockroachdb#52894. Release note: None
We are likely going to invest more in the stopper-conferred observability in the near future as part of initiatives such as cockroachdb#58164, but the task tracking that has been a part of the stopper since near its conception has not proven to be useful in practice, while at the same time raising concern about stopper use in hot paths. When shutting down a running server, we don't particularly care about leaking goroutines (as the process will end anyway). In tests, we want to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at the stacks to find out why than to consult the task map. Together, this left little reason to do anything more complicated than what's left after this commit: we keep track of the running number of tasks, and wait until this drops to zero. With this change in, we should feel comfortable using the stopper extensively and, for example, ensuring that any CRDB goroutine is anchored in a Stopper task; this is the right approach for test flakes such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in issue cockroachdb#58164 as well. In a future change, we should make the Stopper more configurable and, through this configurability, we could in principle bring a version of the task map back (in debug builds) without backing it into the stopper, though I don't anticipate that we'll want to. Closes cockroachdb#52894. Release note: None
We are likely going to invest more in the stopper-conferred observability in the near future as part of initiatives such as cockroachdb#58164, but the task tracking that has been a part of the stopper since near its conception has not proven to be useful in practice, while at the same time raising concern about stopper use in hot paths. When shutting down a running server, we don't particularly care about leaking goroutines (as the process will end anyway). In tests, we want to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at the stacks to find out why than to consult the task map. Together, this left little reason to do anything more complicated than what's left after this commit: we keep track of the running number of tasks, and wait until this drops to zero. With this change in, we should feel comfortable using the stopper extensively and, for example, ensuring that any CRDB goroutine is anchored in a Stopper task; this is the right approach for test flakes such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in issue cockroachdb#58164 as well. In a future change, we should make the Stopper more configurable and, through this configurability, we could in principle bring a version of the task map back (in debug builds) without backing it into the stopper, though I don't anticipate that we'll want to. Closes cockroachdb#52894. Release note: None
We are likely going to invest more in the stopper-conferred observability in the near future as part of initiatives such as cockroachdb#58164, but the task tracking that has been a part of the stopper since near its conception has not proven to be useful in practice, while at the same time raising concern about stopper use in hot paths. When shutting down a running server, we don't particularly care about leaking goroutines (as the process will end anyway). In tests, we want to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at the stacks to find out why than to consult the task map. Together, this left little reason to do anything more complicated than what's left after this commit: we keep track of the running number of tasks, and wait until this drops to zero. With this change in, we should feel comfortable using the stopper extensively and, for example, ensuring that any CRDB goroutine is anchored in a Stopper task; this is the right approach for test flakes such as in cockroachdb#51544 and makes sense for all of the reasons mentioned in issue cockroachdb#58164 as well. In a future change, we should make the Stopper more configurable and, through this configurability, we could in principle bring a version of the task map back (in debug builds) without backing it into the stopper, though I don't anticipate that we'll want to. Closes cockroachdb#52894. Release note: None
59647: stop: rip out expensive task tracking r=knz a=tbg First commit was put up for PR separately, ignore it here. ---- We are likely going to invest more in the stopper-conferred observability in the near future as part of initiatives such as #58164, but the task tracking that has been a part of the stopper since near its conception has not proven to be useful in practice, while at the same time raising concern about stopper use in hot paths. When shutting down a running server, we don't particularly care about leaking goroutines (as the process will end anyway). In tests, we want to ensure goroutine hygiene, but if a test hangs during `Stop`, it is easier to look at the stacks to find out why than to consult the task map. Together, this left little reason to do anything more complicated than what's left after this commit: we keep track of the running number of tasks, and wait until this drops to zero. With this change in, we should feel comfortable using the stopper extensively and, for example, ensuring that any CRDB goroutine is anchored in a Stopper task; this is the right approach for test flakes such as in #51544 and makes sense for all of the reasons mentioned in issue #58164 as well. In a future change, we should make the Stopper more configurable and, through this configurability, we could in principle bring a version of the task map back (in debug builds) without backing it into the stopper, though I don't anticipate that we'll want to. Closes #52894. Release note: None 59732: backupccl: add an owner column behind the WITH PRIVILEGES option r=pbardea a=Elliebababa Previously, when users perform RESTORE, they are ignorant of the original owner. This PR gives ownership data as a column behind privileges. Resolves: #57906. Release note: None. 59746: opt: switch checks to use CrdbTestBuild instead of RaceEnabled r=RaduBerinde a=RaduBerinde The RaceEnabled flag is not very useful for checks; e.g. apparently execbuilder tests aren't run routinely in race mode. These checks are now "live" in any test build, using the crdb_test build tag. Release note: None 59747: tree: correct StatementTag of ALTER TABLE ... LOCALITY r=ajstorm a=otan Release note: None Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: elliebababa <[email protected]> Co-authored-by: Radu Berinde <[email protected]> Co-authored-by: Oliver Tan <[email protected]>
This helps move the needle on cockroachdb#58164 by introducing linters that force both the use of a `ctxgroup` over an `errgroup` and prevent direct use of the `go` keyword. It's part of `make lint`, but can be invoked stand-alone as well: ``` go vet -vettool ./bin/roachvet -errgroupgo ./pkg/somewhere go vet -vettool ./bin/roachvet -nakedgo ./pkg/somewhere ``` The lint currently fails with over 200 [infractions], but I suppose a little elbow grease is all that's needed here. [infractions]: https://gist.github.com/tbg/7c1af445007650387841359de6f8fbbc Release note: None
This helps move the needle on cockroachdb#58164 by introducing linters that force both the use of a `ctxgroup` over an `errgroup` and prevent direct use of the `go` keyword. It's part of `make lint`, but can be invoked stand-alone as well: ``` go vet -vettool ./bin/roachvet -errgroupgo ./pkg/somewhere go vet -vettool ./bin/roachvet -nakedgo ./pkg/somewhere ``` Release note: None
…nd `go` This helps move the needle on cockroachdb#58164 by introducing linters that force both the use of a `ctxgroup` over an `errgroup` and prevent direct use of the `go` keyword. They are disabled by default because we need to fix the issues they find first. This can be done (for development) in `forbiddenmethod.Analyzers`. They can then also be invoked in a targeted fashion: ``` go vet -vettool ./bin/roachvet -errgroupgo ./pkg/somewhere go vet -vettool ./bin/roachvet -nakedgo ./pkg/somewhere ``` Release note: None
…nd `go` This helps move the needle on cockroachdb#58164 by introducing linters that force both the use of a `ctxgroup` over an `errgroup` and prevent direct use of the `go` keyword. They are disabled by default because we need to fix the issues they find first. This can be done (for development) in `forbiddenmethod.Analyzers`. They can then also be invoked in a targeted fashion: ``` go vet -vettool ./bin/roachvet -errgroupgo ./pkg/somewhere go vet -vettool ./bin/roachvet -nakedgo ./pkg/somewhere ``` Release note: None
…nd `go` This helps move the needle on cockroachdb#58164 by introducing linters that force both the use of a `ctxgroup` over an `errgroup` and prevent direct use of the `go` keyword. They are disabled by default because we need to fix the issues they find first. This can be done (for development) in `forbiddenmethod.Analyzers`. They can then also be invoked in a targeted fashion: ``` go vet -vettool ./bin/roachvet -errgroupgo ./pkg/somewhere go vet -vettool ./bin/roachvet -nakedgo ./pkg/somewhere ``` Release note: None
62243: forbiddenmethod: add (disabled) lint against `(*errgroup.Group).Go` and `go` r=knz,erikgrinaker,jordanlewis a=tbg This helps move the needle on #58164 by introducing linters that force both the use of a `ctxgroup` over an `errgroup` and prevent direct use of the `go` keyword. They are disabled by default because we need to fix the issues they find first. This can be done (for development) in `forbiddenmethod.Analyzers`. They can then also be invoked in a targeted fashion: ``` go vet -vettool ./bin/roachvet -errgroupgo ./pkg/somewhere go vet -vettool ./bin/roachvet -nakedgo ./pkg/somewhere ``` Release note: None 62629: docs, server: update auto-generated logging docs r=knz a=taroface The auto-generated docs at https://github.com/cockroachdb/cockroach/tree/master/docs/generated will be used as includes on the public docs site. This PR makes copyedits and some corrections to the text. - In the log format docs, I added formatting that I think will add code ticks to field names. In some cases (e.g., ‹...›) I may not have done this correctly. - The structured event description for `node_decommissioning` is not appearing correctly in eventlog.md. The generated sentence for this event does not match the corresponding line in the other event descriptions. I fixed a typo where this was written as "NodeDecommissioned" but I'm not sure if that fixes the issue. - In a few cases I pasted lines that were much longer than what was originally in the commented-out blocks. I hope this doesn't break the generation! Release note: none 63841: pgwire,logpb,eventpb: various structured logging doc updates r=taroface a=knz First commit from #62629. Fixes #63764. Fixes #63762. See individual commits for details. cc @taroface 64706: pkg/cli: new flag `--log-config-file` to simplify log configuration r=rauchenstein a=knz Fixes #64349. cc @thtruo Release note (cli change): The new parameter `--log-config-file` simplifies the process of loading the logging configuration from a YAML file. Instead of passing the content of the file via e.g. `--log=$(cat file.yaml)`, it is now possible to pass the path to the file `--log-config-file=file.yaml`. Note: each occurrence of `--log` and `--log-config-file` on the command line overrides the configuration set from previous occurrences. 64776: workload: mark tpcc idle-conns flag as runtime-only r=dt a=dt Closes #64678. Release note: none. 64777: ccl/backupccl: skip TestBackupWorkerFailure r=stevendanna a=adityamaru Refs: #64773 Reason: flaky test Generated by bin/skip-test. Release justification: non-production code changes Release note: None Co-authored-by: Tobias Grieger <[email protected]> Co-authored-by: taroface <[email protected]> Co-authored-by: Raphael 'kena' Poss <[email protected]> Co-authored-by: David Taylor <[email protected]> Co-authored-by: Aditya Maru <[email protected]>
72492: server: fix span use after Finish() r=andreimatei a=andreimatei The closure serving a pgwire connection was capturing a context with a long-lived span, and so all connections were logging to that span. That was bad, but what was even worse is that the long-lived span can end before the closures, on server shutdown. That was use-after-Finish(), which is currently reluctantly tolerated, but won't be tolerated much longer. This patch fixes it by giving each connection its own span. Touches #58164 Release note: None Co-authored-by: Andrei Matei <[email protected]>
The backup processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None
The splitAndScatter processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None
The backup processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None
The splitAndScatter processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None
The backup processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None
The splitAndScatter processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None
The backup processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None
The splitAndScatter processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None
The backup processor was spawning a naked goroutine. We don't like that very much - see cockroachdb#58164. This patch puts that goroutine under the Stopper. One benefit is that the goroutine gets its own span, so it's resilient to the parent span being Finish()ed from under it (which was a bug until the prior commit). Release note: None Release justification (bug fix): goroutines spawned by the backup processor were mutating BoundAccounts used for memory monitoring, after the BoundAccount was closed by the processor on shutdown. To prevent this, we teach the processor to wait for all goroutines to terminate before running its cleanup on shutdown.
#98269 is another example of code where a panic shouldn't bring down the node. Distsql flows should recover panics. |
Misuse of the
go
keyword is problematic for a server that desires to be HA like CockroachDB: a panic-causing stack on the resultant goroutine will crash the entire process without custom crash handling.There are already several wrappers for
go
throughout the codebase, such as Tasks and Workers creatable by the stopper. These wrappers perform cleanup and report errors to sentry, and can be extended over time. Unfortunately, they're not used consistently, and their use is not enforced.This ticket tracks removing the
go
keyword from non-test code, except for the packages that implement the wrappers, and adding a linter that bans direct use of thego
keyword except for in those places.Jira issue: CRDB-3425
The text was updated successfully, but these errors were encountered: