spanconfig/spanconfigkvsubscriber: TestBlockedKVSubscriberDisablesQueues failed #128453
Comments
Looks similar to #122608
Seems stuck on stopping the test cluster, when the defer is executed in the last line of the test:
|
Unclear why the test reports timing out at 41 minutes. From logs, the test cluster has been running between |
2 goroutines in
|
Presumably, |
The goroutine dump is the last thing printed in the logs, but it's not a fatal message. So I think the test really has been stuck in this state for 40-ish minutes: everything was nearly shut down, but some deadlock is preventing the remaining tasks from shutting down.
Removing the |
The only task in the stacks dump is the following, and it appears to be stuck:
|
This would be for the observability team. cc @dhartunian
The buffer ingester could get stuck during closing because it would independently evaluate a `running` bool and then try to write to a channel with a buffer of 1. This channel could have accumulated a payload before the ingester shut down, in which case the write would block forever.

Instead of using the boolean, the flush operation now selects on a channel that we close on shutdown. This ensures that no matter how many extra flushes are blocked, they will all get cancelled.

Resolves: cockroachdb#128453
Epic: None
Release note: None
128713: sqlstats: add close chan to buffer ingester r=xinhaoz a=dhartunian

The buffer ingester could get stuck during closing because it would independently evaluate a `running` bool and then try to write to a channel with a buffer of 1. This channel could have accumulated a payload before the ingester shut down, in which case the write would block forever.

Instead of using the boolean, the flush operation now selects on a channel that we close on shutdown. This ensures that no matter how many extra flushes are blocked, they will all get cancelled.

Resolves: #128453
Epic: None
Release note: None

Co-authored-by: David Hartunian
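For readers following along, here is a minimal sketch of the pattern the fix describes: a send into a capacity-1 channel that is guarded by a `select` on a close channel. This is not the code from the PR. The `ingester` type, its fields, and the `requestFlush` and `close` methods are hypothetical names chosen for illustration.

```go
package main

import (
	"fmt"
	"sync"
)

// ingester is a hypothetical stand-in for the buffer ingester. The names
// here are illustrative and are not taken from the CockroachDB source.
type ingester struct {
	flushCh   chan struct{} // capacity 1, as in the description above
	closeCh   chan struct{} // closed once on shutdown
	closeOnce sync.Once
}

func newIngester() *ingester {
	return &ingester{
		flushCh: make(chan struct{}, 1),
		closeCh: make(chan struct{}),
	}
}

// requestFlush queues a flush signal and reports whether it was queued.
// A bare `i.flushCh <- struct{}{}` guarded only by a bool could block
// forever once the buffer is full and the consumer has exited. Selecting
// on closeCh lets every blocked caller return on shutdown instead.
func (i *ingester) requestFlush() bool {
	select {
	case i.flushCh <- struct{}{}:
		return true
	case <-i.closeCh:
		return false
	}
}

// close signals shutdown. Closing the channel wakes all blocked senders.
func (i *ingester) close() {
	i.closeOnce.Do(func() { close(i.closeCh) })
}

func main() {
	ing := newIngester()

	// Nothing consumes flushCh in this sketch, so the first request
	// fills the buffer and every later request blocks until close.
	fmt.Println("first flush queued:", ing.requestFlush())

	results := make(chan bool, 3)
	var wg sync.WaitGroup
	for n := 0; n < 3; n++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			results <- ing.requestFlush()
		}()
	}

	ing.close()
	wg.Wait()
	close(results)

	cancelled := 0
	for queued := range results {
		if !queued {
			cancelled++
		}
	}
	fmt.Println("cancelled flushes:", cancelled)
}
```

Because the buffer is already full and nothing drains it, the three extra requests can only complete through the close channel, which matches the "no matter how many extra flushes are blocked" wording in the commit message.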
Based on the specified backports for linked PR #128713, I applied the following new label(s) to this issue: branch-release-24.2. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
spanconfig/spanconfigkvsubscriber.TestBlockedKVSubscriberDisablesQueues failed on release-24.2.0-rc @ 3d1bcc1df630c10c4065a22fc846daa911b436d6:
Fatal error:
Stack:
Log preceding fatal error
Parameters:
attempt=1
race=true
run=1
shard=1
Help
See also: How To Investigate a Go Test Failure (internal)
This test on roachdash | Improve this report!
Jira issue: CRDB-41025