roachtest: c2c/shutdown/src/coordinator failed #110166

Closed
cockroach-teamcity opened this issue Sep 7, 2023 · 2 comments · Fixed by #110218

@cockroach-teamcity
Member

cockroach-teamcity commented Sep 7, 2023

roachtest.c2c/shutdown/src/coordinator failed with artifacts on master @ 4e79c65617e3f75dd39da5a9cb5544b445ee1c39:

(monitor.go:153).Wait: monitor failure: getting the job status: read tcp 172.17.0.3:56102 -> 34.148.176.220:26257: read: connection reset by peer
(assertions.go:333).Fail: 
	Error Trace:	github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:780
	            				github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:1261
	            				main/pkg/cmd/roachtest/monitor.go:119
	            				golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:75
	            				GOROOT/src/runtime/asm_amd64.s:1594
	Error:      	"2023-09-07 10:26:57.027323 +0000 UTC" is not greater than or equal to "2023-09-07 10:27:55.391844 +0000 UTC"
	Test:       	c2c/shutdown/src/coordinator
	Messages:   	cannot cutover to a time below the retained time (did the test already fail?)
(require.go:625).GreaterOrEqual: FailNow called
test artifacts and logs in: /artifacts/c2c/shutdown/src/coordinator/run_1

Parameters: ROACHTEST_arch=amd64 , ROACHTEST_cloud=gce , ROACHTEST_cpu=8 , ROACHTEST_encrypted=false , ROACHTEST_fs=ext4 , ROACHTEST_localSSD=true , ROACHTEST_ssd=0

Help

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-31294

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. T-disaster-recovery labels Sep 7, 2023
@cockroach-teamcity cockroach-teamcity added this to the 23.2 milestone Sep 7, 2023
@msbutler msbutler self-assigned this Sep 7, 2023
@stevendanna
Collaborator

stevendanna commented Sep 7, 2023

@msbutler I think this is #110095 which I hit yesterday when stressing this test as well.

@msbutler
Collaborator

msbutler commented Sep 7, 2023

You're right. I'm seeing it in logs/5.unredacted/cockroach-stderr.log. I also plan to open a PR that shuts down the main roachtest driver as soon as the jobShutdownExecutor encounters an error.

@dt dt linked a pull request Sep 7, 2023 that will close this issue
@craig craig bot closed this as completed in #110218 Sep 8, 2023
msbutler added a commit to msbutler/cockroach that referenced this issue Sep 8, 2023
…utor fails

During cockroachdb#110166, the c2c/shutdown test fataled while the job shutdown executor
was running, yet the test kept running for quite a while because the goroutine
that manages the c2c job had not realized the test failed. This patch refactors
the c2c/shutdown tests such that when the job shutdown executor detects a
failure, it cancels the context used by the goroutine managing the c2c job.

Informs cockroachdb#110166

Release note: none
craig bot pushed a commit that referenced this issue Sep 11, 2023
109439: changefeedccl: Emit span resolved event when end time reached  r=miretskiy a=miretskiy

Changefeeds support a mode in which the user wants to emit
all events that occurred since some time in the past (`cursor`)
and end the changefeed at a time in the near future (`end_time`).

In this mode, the rangefeed catchup scan starting from the `cursor`
position could take some time, maybe even a lot of time,
and in that case the very first checkpoint the kvfeed observes
will be after `end_time`. All events after `end_time`, including
checkpoints, are skipped, as they should be.

However, this meant that no changefeed checkpoint
records could be produced until the entire changefeed completed.
This PR ensures that once `end_time` is reached, we
emit one "resolved event" for that span, so that the changefeed
can produce a span-based checkpoint if needed.

Fixes #108464

Release note: None
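
As a rough illustration of the behavior described in this commit message, the sketch below filters a stream of events against an end time and emits a single resolved event per span once that time is passed. Everything here is an assumption made for illustration: the `event` and `endTimeFilter` types, the integer timestamps, and the choice to treat events at exactly the end time as in range are hypothetical and are not taken from changefeedccl.

```go
package main

import "fmt"

// event is a hypothetical, simplified changefeed event.
type event struct {
	span     string
	ts       int64
	resolved bool
}

// endTimeFilter is a hypothetical filter. It drops events past endTime but
// emits exactly one resolved event per span the first time that happens.
type endTimeFilter struct {
	endTime int64
	done    map[string]bool // spans that already got their resolved event
}

func newEndTimeFilter(endTime int64) *endTimeFilter {
	return &endTimeFilter{endTime: endTime, done: make(map[string]bool)}
}

// apply returns the event to emit and whether anything should be emitted.
func (f *endTimeFilter) apply(e event) (event, bool) {
	if e.ts <= f.endTime {
		return e, true // within range: pass through unchanged
	}
	if f.done[e.span] {
		return event{}, false // already resolved this span: skip
	}
	f.done[e.span] = true
	// First event past endTime for this span: emit one resolved event.
	return event{span: e.span, ts: f.endTime, resolved: true}, true
}

func main() {
	f := newEndTimeFilter(100)
	in := []event{
		{span: "a", ts: 150, resolved: true}, // first checkpoint is already late
		{span: "a", ts: 160},
		{span: "b", ts: 90},
	}
	for _, e := range in {
		if out, ok := f.apply(e); ok {
			fmt.Printf("emit %+v\n", out)
		}
	}
}
```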

110267: roachtest: during c2c/shutdown, shutdown main driver if shutdown executor fails r=stevendanna a=msbutler

During #110166, the c2c/shutdown test fataled while the job shutdown executor was running, yet the test kept running for quite a while because the goroutine that manages the c2c job had not realized the test failed. This patch refactors the c2c/shutdown tests such that when the job shutdown executor detects a failure, it cancels the context used by the goroutine managing the c2c job.

Informs #110166

Release note: none

110329: rangefeed: reuse annotated context in `ScheduledProcessor.process()` r=erikgrinaker a=erikgrinaker

Context construction is expensive enough to show up in CPU profiles. With 20k rangefeeds/node on an idle cluster, this made up 1% of overall CPU usage, or 4% of rangefeed scheduler CPU usage.

Epic: none
Release note: None

Co-authored-by: Yevgeniy Miretskiy
Co-authored-by: Michael Butler
Co-authored-by: Erik Grinaker