ccl/streamingccl/streamproducer: TestStreamPartition failed #102286
ccl/streamingccl/streamproducer.TestStreamPartition failed with artifacts on release-23.1 @ 0d8f02757d44e05435eb8de80a311fcfe625ef1b: Fatal error:
Stack:
Log preceding fatal error
Same failure on other branches
ccl/streamingccl/streamproducer.TestStreamPartition failed with artifacts on release-23.1 @ 86a86b15c13afe0a580deed35ce7dafea7adad78: Fatal error:
Stack:
Log preceding fatal error
Fixes cockroachdb#102523 Informs cockroachdb#102286 Informs cockroachdb#86206 Release note: none
Informs cockroachdb#102523 Informs cockroachdb#102286 Informs cockroachdb#86206 Release note: none
102593: c2c: skip and deflake a few unit tests r=benbardin a=msbutler c2c: skip a few unit tests Informs #102523 Informs #102286 Informs #86206 Release note: none c2c: close pgconn with correct context in test infra When a user opens a pgx connection, they pass it a context that must also be used to close the connection; otherwise pgx's contextWatcher will run indefinitely on a separate goroutine. Previously, the `startReplication()` test helper initialized its own context to open the pgx connection and did not pass this context back to the caller. Consequently, the caller would close the connection with the incorrect context, leaking a goroutine. This patch allows the caller to pass a context into `startReplication()`, preventing the caller from closing the connection with the wrong context. Fixes #102523 Release note: None Co-authored-by: Michael Butler <[email protected]>
I can reproduce this via:
and can confirm that the rangefeed does surface this error via this log line several minutes before the test times out. I suspect there's a real bug in It seems like we saw a similar bug a while back, where an internal error caused everything to hang: #85867. To get to the bottom of this, I'll need to read a bit more code.
ccl/streamingccl/streamproducer.TestStreamPartition failed with artifacts on release-23.1 @ fa8d237846052ae72cc44b3eaebe1f38c89834f8: Fatal error:
Stack:
Log preceding fatal error
Same failure on other branches
On the client side, it seems that feed.Next() is hanging after we send the clear range request:
On the server side, I have confirmed that the rangefeed surfaces the correct error and returns, by observing this log line in the failed test's server-side logs. I also know that the eventStream (i.e. the producer sql processor) never surfaces this error, because Next() never gets called for some mysterious reason. I can confirm this because these added log lines never appear in the failed test logs. We also never Close() the eventStream, as these added log lines never appear either. On the client side, Next() should never hang. I'll need to figure out why this occurs. This seems related to a bug fixed by #85867. But there, eventStream.Close() was called. Here, it is not.
Running |
ccl/streamingccl/streamproducer.TestStreamPartition failed with artifacts on release-23.1 @ 4226a83871bbce776bc9389fca5cf084b4bb7632:
Parameters: Same failure on other branches
102876: c2c: fix rangefeed error propogration race r=stevendanna a=msbutler In the producer dist sql processor, rangefeed errors only propagate to the user if the error can be sent on a channel without blocking. Because the channel was previously unbuffered, the dist sql processor would inadvertently swallow the error if the receiver was not actively waiting on the channel. This would then cause the sql processor to hang, as the underlying rangefeed would close after the ignored error message. This patch buffers the errCh, guaranteeing that the first rangefeed error will be processed by the sql processor. If the rangefeed surfaces several errors while the buffered channel is full, these errors will be swallowed, which is fine, as the first error will always shut down the sql processor. Fixes #102286 Release note: None Co-authored-by: Michael Butler <[email protected]>
ccl/streamingccl/streamproducer.TestStreamPartition failed with artifacts on release-23.1 @ ec0b5087392d49980a2141315aae62bb2563348f:
Fatal error:
Stack:
Log preceding fatal error
Help
See also: How To Investigate a Go Test Failure (internal)
Same failure on other branches
This test on roachdash | Improve this report!
Jira issue: CRDB-27382