roachtest: acceptance/cli/node-status failed #51497

Closed
cockroach-teamcity opened this issue Jul 16, 2020 · 17 comments · Fixed by #51893
Assignees: irfansharif
Labels: branch-master, C-test-failure, O-roachtest, O-robot, release-blocker
Milestone: 20.2

Comments

@cockroach-teamcity
Member

(roachtest).acceptance/cli/node-status failed on master@3fba3b99d7164f3b8efb9fd8432b7749120d09c5:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.15.213:26257	10.128.15.213:26257	v20.2.0-alpha.1-1272-g3fba3b99d7	2020-07-16 06:21:20.187021+00:00	2020-07-16 06:22:32.197432+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.15.228:26257	10.128.15.228:26257	v20.2.0-alpha.1-1272-g3fba3b99d7	2020-07-16 06:21:20.523884+00:00	2020-07-16 06:22:32.524185+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.15.219:26257	10.128.15.219:26257	v20.2.0-alpha.1-1272-g3fba3b99d7	2020-07-16 06:21:20.765477+00:00	2020-07-16 06:21:56.770479+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1539,context.go:135,cluster.go:1528,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2093071-1594880325-02-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		3: 3845
		1: 4080
		2: dead
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity cockroach-teamcity added branch-master Failures and bugs on the master branch. C-test-failure Broken test (automatically or manually discovered). O-roachtest O-robot Originated from a bot. release-blocker Indicates a release-blocker. Use with branch-release-2x.x label to denote which branch is blocked. labels Jul 16, 2020
@cockroach-teamcity cockroach-teamcity added this to the 20.2 milestone Jul 16, 2020
@cockroach-teamcity
Member Author

(roachtest).acceptance/cli/node-status failed on master@af0031e3004327b8d09e23e99eb9659abf7d82de:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.0.172:26257	10.128.0.172:26257	v20.2.0-alpha.1-1310-gaf0031e300	2020-07-17 06:22:31.677555+00:00	2020-07-17 06:23:39.183048+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.0.157:26257	10.128.0.157:26257	v20.2.0-alpha.1-1310-gaf0031e300	2020-07-17 06:22:32.156945+00:00	2020-07-17 06:23:39.661763+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.0.177:26257	10.128.0.177:26257	v20.2.0-alpha.1-1310-gaf0031e300	2020-07-17 06:22:32.478725+00:00	2020-07-17 06:23:19.762618+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1539,context.go:135,cluster.go:1528,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2096175-1594966822-14-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		3: 3968
		2: dead
		1: 3939
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).acceptance/cli/node-status failed on master@d79679f3e4e86d3f85c60f5431b8d5874a95735e:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.0.14:26257	10.128.0.14:26257	v20.2.0-alpha.1-1337-gd79679f3e4	2020-07-18 06:19:34.436902+00:00	2020-07-18 06:20:46.443679+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.0.58:26257	10.128.0.58:26257	v20.2.0-alpha.1-1337-gd79679f3e4	2020-07-18 06:19:34.911818+00:00	2020-07-18 06:20:42.419492+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.0.71:26257	10.128.0.71:26257	v20.2.0-alpha.1-1337-gd79679f3e4	2020-07-18 06:19:35.24627+00:00	2020-07-18 06:20:19.444482+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1539,context.go:135,cluster.go:1528,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2098678-1595053010-02-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		3: 3824
		1: 3938
		2: dead
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).acceptance/cli/node-status failed on master@b901074cd8c2b115affb2b8f5dd89d84e5cf6e32:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.0.33:26257	10.128.0.33:26257	v20.2.0-alpha.1-1351-gb901074cd8	2020-07-19 06:17:18.640064+00:00	2020-07-19 06:18:30.648029+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.0.47:26257	10.128.0.47:26257	v20.2.0-alpha.1-1351-gb901074cd8	2020-07-19 06:17:19.107338+00:00	2020-07-19 06:18:26.615224+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.0.55:26257	10.128.0.55:26257	v20.2.0-alpha.1-1351-gb901074cd8	2020-07-19 06:17:19.427206+00:00	2020-07-19 06:18:03.729008+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1539,context.go:135,cluster.go:1528,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2100039-1595139315-15-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		3: 3840
		1: 4043
		2: dead
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).acceptance/cli/node-status failed on master@a0123f1bc050f67b942ff1e36181847f0edb3e10:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.0.111:26257	10.128.0.111:26257	v20.2.0-alpha.1-1355-ga0123f1bc0	2020-07-20 06:07:07.443825+00:00	2020-07-20 06:08:19.450953+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.0.144:26257	10.128.0.144:26257	v20.2.0-alpha.1-1355-ga0123f1bc0	2020-07-20 06:07:07.911364+00:00	2020-07-20 06:08:15.419189+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.0.38:26257	10.128.0.38:26257	v20.2.0-alpha.1-1355-ga0123f1bc0	2020-07-20 06:07:08.228314+00:00	2020-07-20 06:07:53.708796+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1539,context.go:135,cluster.go:1528,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2101174-1595225109-03-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		1: 4058
		3: 3832
		2: dead
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1789
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

@cockroach-teamcity
Member Author

(roachtest).acceptance/cli/node-status failed on master@8354896f7aa141132765c366ae7ddce9e9e7f361:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.0.53:26257	10.128.0.53:26257	v20.2.0-alpha.1-1392-g8354896f7a	2020-07-21 06:07:29.956082+00:00	2020-07-21 06:08:41.953753+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.0.35:26257	10.128.0.35:26257	v20.2.0-alpha.1-1392-g8354896f7a	2020-07-21 06:07:30.360399+00:00	2020-07-21 06:08:42.361147+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.0.9:26257	10.128.0.9:26257	v20.2.0-alpha.1-1392-g8354896f7a	2020-07-21 06:07:30.561568+00:00	2020-07-21 06:08:18.920961+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2104542-1595311333-10-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		2: dead
		1: 4106
		3: 3854
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

@knz
Contributor

knz commented Jul 21, 2020

Found the cause of this: the node status command does not ORDER BY the node ID.

Due to recent changes in SQL execution, the rows are no longer ordered. The command otherwise appears to report the values expected by the test.

The CLI code must be changed to sort the rows, or the test must be changed to expect the rows out of order.

cc @tbg for triage and prioritization. I might suggest this as a task for @irfansharif if irfan is interested?

@irfansharif
Contributor

Hm, strange, I see that the cli code already sorts by node id.

cockroach/pkg/cli/node.go

Lines 227 to 229 in f10ba17

case 0:
query := makeQuery(queryString + " ORDER BY id")
return runQuery(conn, query, false)

@irfansharif irfansharif self-assigned this Jul 21, 2020
@irfansharif
Contributor

Huh, this is a bit bizarre.

06:22:34 test.go:325: test failure: 	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.15.213:26257	10.128.15.213:26257	v20.2.0-alpha.1-1272-g3fba3b99d7	2020-07-16 06:21:20.187021+00:00	2020-07-16 06:22:32.197432+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.15.228:26257	10.128.15.228:26257	v20.2.0-alpha.1-1272-g3fba3b99d7	2020-07-16 06:21:20.523884+00:00	2020-07-16 06:22:32.524185+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.15.219:26257	10.128.15.219:26257	v20.2.0-alpha.1-1272-g3fba3b99d7	2020-07-16 06:21:20.765477+00:00	2020-07-16 06:21:56.770479+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

It is sorting by node ID, but the row it reports as dead is n3, at address 10.128.15.219. The node stopped in this test is n2, which, looking at the logs, is in fact stopped.

But the logs are more confusing still.

Looking at 1.logs/n1

W200716 06:21:57.046283 1110 kv/kvserver/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:
W200716 06:21:57.046292 1699 kv/kvserver/raft_transport.go:637  [n1] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:

Looking at 3.logs/n3 (??)

W200716 06:21:57.050108 954 kv/kvserver/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:
W200716 06:21:57.050107 847 kv/kvserver/raft_transport.go:637  [n2] while processing outgoing Raft queue to node 3: rpc error: code = Unavailable desc = transport is closing:

We've crossed wires at some point, and are confusing n2 and n3 for each other (probably in the test code).

@irfansharif
Contributor

irfansharif commented Jul 22, 2020

(This repros pretty readily.)

@irfansharif
Contributor

I think I've broken roachprod start --sequential. Now that we start all nodes in parallel, and then issue an explicit init, I don't think we automagically get serialized node ID allocations in host order. What we're seeing above is just a test that depends on the second host being n2, and so on, which we're no longer guaranteeing.
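
A small Go sketch of why this shows up as n2 and n3 being swapped. It assumes node IDs are handed out from an incrementing counter in the order nodes join the cluster; `assignNodeIDs` and the join orders below are hypothetical and are not roachprod code.

```go
package main

import "fmt"

// assignNodeIDs maps each host number to the node ID it would receive,
// assuming IDs are allocated 1, 2, 3, ... in the order hosts join.
func assignNodeIDs(joinOrder []int) map[int]int {
	ids := make(map[int]int, len(joinOrder))
	for i, host := range joinOrder {
		ids[host] = i + 1
	}
	return ids
}

func main() {
	// Hosts joining one at a time, in host order.
	sequential := assignNodeIDs([]int{1, 2, 3})
	// Hosts started together: host 3 happens to join before host 2.
	racy := assignNodeIDs([]int{1, 3, 2})

	// Node ID of host 2 in each case.
	fmt.Println(sequential[2], racy[2]) // 2 3
}
```

With the second join order, stopping host 2 takes down the node with ID 3, which matches the reports above: roachprod monitor lists host 2 as dead while node status shows id 3 as not live.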

@cockroach-teamcity
Member Author

(roachtest).acceptance/cli/node-status failed on master@e9a4f83e3eee59510f97db2c6e0df9b57cf6b944:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.0.204:26257	10.128.0.204:26257	v20.2.0-alpha.1-1427-ge9a4f83e3e	2020-07-22 06:22:11.910125+00:00	2020-07-22 06:23:23.919708+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.15.253:26257	10.128.15.253:26257	v20.2.0-alpha.1-1427-ge9a4f83e3e	2020-07-22 06:22:12.378388+00:00	2020-07-22 06:23:19.887093+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.0.203:26257	10.128.0.203:26257	v20.2.0-alpha.1-1427-ge9a4f83e3e	2020-07-22 06:22:12.646376+00:00	2020-07-22 06:22:53.217456+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2107908-1595398673-03-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		1: 4035
		3: 3843
		2: dead
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

@irfansharif
Contributor

Brief update here: I just rewrote roachprod start. I was finding it hard to change, and I kept introducing bugs as a result.

@irfansharif
Contributor

(Also it was because I'd broken --sequential, and a lot of tests depend on that behavior existing by default. Reminder to stay away from critical roachprod code going forward.)

irfansharif added a commit to irfansharif/cockroach that referenced this issue Jul 22, 2020
This is quite the workhorse, and does a lot and has to be compatible
with a lot of existing CRDB versions. It's grown organically as a result
and I'm finding it a bit difficult to maintain, breaking it down a bit
makes it clearer what the structure of it all is and would've perhaps
prevented me introducing bugs like cockroachdb#51497.

Do scrutinize the PR closely, we use `roachprod start` everywhere. It's
mostly mindless code movement but I did sneak in the fix for cockroachdb#51497
where I'd broken node ID assignments for when `roachprod start` is
called with the `--sequential` flag (true by default). I did so by
explicitly initializing the first node, and then having the remaining
nodes join on to it.

Release note: None
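
The approach described in this commit message (bring up the first node, then have the remaining nodes join it) can be sketched as below. This is only an illustration under stated assumptions: `idAllocator` and `startSequential` are hypothetical names, the allocator stands in for the cluster's node ID counter, and none of this is the actual roachprod implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// idAllocator is a hypothetical stand-in for the cluster-wide node ID
// counter: every join receives the next integer.
type idAllocator struct {
	mu   sync.Mutex
	next int
}

func (a *idAllocator) join() int {
	a.mu.Lock()
	defer a.mu.Unlock()
	a.next++
	return a.next
}

// startSequential starts hosts one at a time and waits for each host to
// obtain its node ID before starting the next one. The first host plays
// the role of the explicitly initialized node.
func startSequential(numHosts int) []int {
	alloc := &idAllocator{}
	ids := make([]int, numHosts)
	for host := 0; host < numHosts; host++ {
		done := make(chan int)
		go func() { done <- alloc.join() }()
		ids[host] = <-done // block until this host has joined
	}
	return ids
}

func main() {
	fmt.Println(startSequential(3)) // [1 2 3]
}
```

Because each join completes before the next host starts, the position in the returned slice (the host) always equals the node ID minus one, which is the property the tests rely on.
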
@cockroach-teamcity
Member Author

(roachtest).acceptance/cli/node-status failed on master@b8a50cc4d062293915969cdc83e3ec4d057cede5:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.0.133:26257	10.128.0.133:26257	v20.2.0-alpha.1-1449-gb8a50cc4d0	2020-07-23 06:03:03.244792+00:00	2020-07-23 06:04:10.753151+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.0.120:26257	10.128.0.120:26257	v20.2.0-alpha.1-1449-gb8a50cc4d0	2020-07-23 06:03:03.706698+00:00	2020-07-23 06:04:11.212337+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.0.100:26257	10.128.0.100:26257	v20.2.0-alpha.1-1449-gb8a50cc4d0	2020-07-23 06:03:04.03395+00:00	2020-07-23 06:03:45.257726+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2111252-1595484018-11-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		1: 4031
		3: 3793
		2: dead
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

craig bot pushed a commit that referenced this issue Jul 24, 2020
51243: kv/kvserver: clarify declared keys for RequestLease request r=nvanbenschoten a=nvanbenschoten

The request type does not acquire latches but does still declare keys, which was confusing without an accompanying comment.

51525: sql: handle UPSERTs for partial indexes r=mgartner a=mgartner

This commit allows partial indexes to maintain a consistent state when
`UPSERT` statements are issued on the table. There were some structural
changes made in order to better facilitate this functionality.

First, `optbuilder` now keeps track of the column IDs of synthesized
partial index predicate columns rather than ordinals. This simplifies
the complex scoping logic needed for `UPSERT`s.

Second, this commit introduces the PartialIndexUpdateManager which
helps track which partial indexes need to be updated for a given row.
Instead of passing around two `util.FastIntSet`s, this single struct is
now used. It also de-duplicates code for interpreting the synthesized
partial index predicate columns.

Fixes #50222

Release note: None


51790: roachprod: rewrite `roachprod start` r=irfansharif a=irfansharif

This is quite the workhorse, and does a lot and has to be compatible
with a lot of existing CRDB versions. It's grown organically as a result
and I'm finding it a bit difficult to maintain, breaking it down a bit
makes it clearer what the structure of it all is and would've perhaps
prevented me introducing bugs like #51497.

Do scrutinize the PR closely, we use `roachprod start` everywhere. It's
mostly mindless code movement but it's pretty fragile code.

Release note: None

51858: builtins: add ST_RelatePattern r=rytaft a=otan

Release note (sql change): Add the ST_RelatePattern builtin, which
returns whether a given DE-9IM intersection matrix matches a given
pattern.

Co-authored-by: Nathan VanBenschoten
Co-authored-by: Marcus Gartner
Co-authored-by: irfan sharif
Co-authored-by: Oliver Tan
@cockroach-teamcity
Member Author

(roachtest).acceptance/cli/node-status failed on master@bfa6307c292ef4dfed4a53cb99f506e6dab26533:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/acceptance/cli/node-status/run_1
	cli.go:76,cli.go:82,acceptance.go:96,test_runner.go:757: expected [is_available is_live true true false false true true], but found [is_available is_live true true true true false false] from:
		id	address	sql_address	build	started_at	updated_at	locality	is_available	is_live
		1	10.128.0.117:26257	10.128.0.117:26257	v20.2.0-alpha.1-1482-gbfa6307c29	2020-07-24 06:14:16.156241+00:00	2020-07-24 06:15:23.6626+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		2	10.128.0.118:26257	10.128.0.118:26257	v20.2.0-alpha.1-1482-gbfa6307c29	2020-07-24 06:14:16.385042+00:00	2020-07-24 06:15:23.890605+00:00	cloud=gce,region=us-central1,zone=us-central1-b	true	true
		3	10.128.0.160:26257	10.128.0.160:26257	v20.2.0-alpha.1-1482-gbfa6307c29	2020-07-24 06:14:16.702739+00:00	2020-07-24 06:15:10.193716+00:00	cloud=gce,region=us-central1,zone=us-central1-b	false	false

	cluster.go:1571,context.go:135,cluster.go:1560,test_runner.go:826: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-2114210-1595571129-16-n4cpu4 --oneshot --ignore-empty-nodes: exit status 1 4: skipped
		1: 4044
		3: 4012
		2: dead
		Error: UNCLASSIFIED_PROBLEM: 2: dead
		(1) UNCLASSIFIED_PROBLEM
		Wraps: (2) attached stack trace
		  | main.glob..func13
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1115
		  | main.wrap.func1
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:266
		  | github.com/spf13/cobra.(*Command).execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:830
		  | github.com/spf13/cobra.(*Command).ExecuteC
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:914
		  | github.com/spf13/cobra.(*Command).Execute
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:864
		  | main.main
		  | 	/home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1808
		  | runtime.main
		  | 	/usr/local/go/src/runtime/proc.go:203
		  | runtime.goexit
		  | 	/usr/local/go/src/runtime/asm_amd64.s:1373
		Wraps: (3) 3 safe details enclosed
		Wraps: (4) 2: dead
		Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *safedetails.withSafeDetails (4) *errors.errorString

Artifacts: /acceptance/cli/node-status

See this test on roachdash
powered by pkg/cmd/internal/issues

@knz
Contributor

knz commented Jul 24, 2020

I believe this has failed after the roachprod update. Irfan mind having a look?

@irfansharif
Contributor

The roachprod update didn't actually land the fix. It only landed the rewrite of roachprod start, which makes it easier to land the fix (which I'm sending out now).

irfansharif added a commit to irfansharif/cockroach that referenced this issue Jul 24, 2020
..and the setting of cluster settings for single node clusters.
`roachprod start --sequential` was broken in cockroachdb#51329, and the broken-ness
outlined in TODOs in cockroachdb#51790. This PR just addresses those TODOs.

Fixes cockroachdb#51497
Fixes cockroachdb#51721
Fixes cockroachdb#51738
Fixes cockroachdb#51768
Fixes cockroachdb#51769
Fixes cockroachdb#51776

Release note: None
craig bot pushed a commit that referenced this issue Jul 25, 2020
51893: roachprod: fixup `roachprod --sequential` r=irfansharif a=irfansharif

..and the setting of cluster settings for single node clusters.
`roachprod start --sequential` was broken in #51329, and the broken-ness
outlined in TODOs in #51790. This PR just addresses those TODOs.

Fixes #51497
Fixes #51721
Fixes #51738
Fixes #51768
Fixes #51769
Fixes #51776

Release note: None

Co-authored-by: irfan sharif
@craig craig bot closed this as completed in 6d6706b Jul 25, 2020