roachtest: clearrange/checks=true failed #44845
(roachtest).clearrange/checks=true failed on release-19.1@ffbadbb6e8ac7d7376611e9487f505428a24d90d:
Artifacts: /clearrange/checks=true
See this test on roachdash |
(roachtest).clearrange/checks=true failed on release-19.1@1fcf7104d19c5c7634cfb52c4302bc9e70c4b9ea:
Artifacts: /clearrange/checks=true
See this test on roachdash |
(roachtest).clearrange/checks=true failed on release-19.1@ca235a18adac0241b4e3baf144c7ff7689d952c9:
Artifacts: /clearrange/checks=true See this test on roachdash |
(roachtest).clearrange/checks=true failed on release-19.1@c406bb10543ca97010c64cc230a3c45690a7eb6c:
Artifacts: /clearrange/checks=true See this test on roachdash |
(roachtest).clearrange/checks=true failed on release-19.1@d556976a57c52e188157469ec9a64d8f388a79e9:
Artifacts: /clearrange/checks=true See this test on roachdash |
In the most recent failure, node 7 died with:
Looks like this happened during the import phase of the test, which is surprising. The last compaction stats output to the logs show:
That seems reasonable, and not terribly different from another node:
Not sure what happened here. Perhaps a lot of disk space is being used elsewhere.
(roachtest).clearrange/checks=true failed on release-19.1@cd9ecd90d2ce0f5caf362d6ffa6f782e91640837:
Artifacts: /clearrange/checks=true
See this test on roachdash |
(roachtest).clearrange/checks=true failed on release-19.1@73a373fb8c138c8ef6e4a05d7c1757207efa0a8d:
Artifacts: /clearrange/checks=true
See this test on roachdash |
Similar to what was reported in #44845 (comment), one of the nodes died during the import due to being out of space:
@jlinder what was the plan to deal with out-of-disk errors?
Was a plan ever discussed? We might just be pushing too close to the cluster capacity with this setup. We could reduce the size of the import, or switch to using EBS with larger volumes, to provide more breathing room.
I don't remember discussion of such a plan. The obvious fixes to me are to increase the disk size for the tests in question, or to change the tests to be more considerate of how they use disk (if that's an option). Since roachprod can be told the machine type and the amount of disk to use for cluster nodes, would updating roachtest to use different machine types or more disk work?
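As a rough illustration of the sizing question above, here is a back-of-the-envelope headroom check. This is not actual roachtest or roachprod code; the fixture size, node count, disk size, and headroom fraction below are all assumptions to be adjusted for the real cluster:

```go
package main

import "fmt"

func main() {
	const (
		importedLogicalGB = 2000 // assumed rough logical size of the imported fixture
		replicationFactor = 3
		nodes             = 10  // assumed cluster size
		diskPerNodeGB     = 500 // assumed disk size for the machine type
		headroomFraction  = 0.3 // keep ~30% free for compactions, rebalancing, sideloaded ssts
	)
	perNodeGB := float64(importedLogicalGB) * replicationFactor / nodes
	budgetGB := diskPerNodeGB * (1 - headroomFraction)
	fmt.Printf("~%.0f GB per node vs. a %.0f GB budget; enough headroom: %v\n",
		perNodeGB, budgetGB, perNodeGB <= budgetGB)
}
```

Shrinking the import or growing the disks in a calculation like this is effectively what the two options discussed above (a smaller fixture vs. larger EBS volumes or machine types) amount to.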
Reducing the size of the import used by the test is another option. Cc @dt in case you know of a recent change that could have affected disk imbalances during IMPORTs.
I don't know of anything that has changed there -- I don't think we've touched anything on the bulk side. How recent is "as of late"? Pebble compaction differences could have changed it, or, going back a lot further, the switch to larger ranges could be relevant.

In IMPORT we issue a split, and scatter the resulting range, any time the data producer process has sent 48 MB of data without hitting a range boundary, i.e. when it has sent that much to a single range. That threshold was picked back when the range size was 64 MB, since it meant the range was 75% full. We left it that way with the move to larger ranges and just let merges clean up afterwards, since we were already fighting hotspots and inverted LSMs and didn't want to make anything worse at the time. The normal KV background splitting and rebalancing is also enabled throughout the IMPORT, on ranges that fill bit-by-bit over time from separate small flushes.

That said, we've seen frequent cases of the allocator doing nothing when we ask it to scatter a range, even when disk space or load is not balanced, sometimes because it looks at MVCC byte counts and not actual storage bytes.
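A minimal sketch of the split-and-scatter heuristic described above. The names here are illustrative only, not the actual IMPORT code; `splitAndScatter` and the byte accounting stand in for the real AdminSplit/AdminScatter plumbing:

```go
package main

const splitThreshold = 48 << 20 // 48 MB; 75% of the old 64 MB default range size

type producer struct {
	sentToCurrentRange int64
}

// addKV is called for each key/value pair the data producer emits.
func (p *producer) addKV(key []byte, valueLen int, crossedRangeBoundary bool) {
	if crossedRangeBoundary {
		// Hitting a natural range boundary resets the byte count.
		p.sentToCurrentRange = 0
	}
	p.sentToCurrentRange += int64(len(key) + valueLen)
	if p.sentToCurrentRange >= splitThreshold {
		// Split at the current key and scatter the resulting range so that
		// subsequent data lands elsewhere.
		splitAndScatter(key)
		p.sentToCurrentRange = 0
	}
}

// splitAndScatter is a stand-in for issuing AdminSplit + AdminScatter requests.
func splitAndScatter(key []byte) {}

func main() {}
```

With the larger default range size the same 48 MB threshold produces many more small ranges up front and, as described above, relies on merges to clean up afterwards.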
The failures predate the switch to using Pebble as the default. For example, my message on May 3 was before that switch. It may not be safe to assume the first failure on this issue was also due to out-of-disk, but if it was, that bounds when the problem started. The switch to larger ranges landed on Feb 19, and the first failure on this issue was Feb 7, though many more failures have occurred since then.
@jbowens You've been running the |
(roachtest).clearrange/checks=true failed on release-19.1@0c04a92ba19eedd4762ca7feb8361433682f3ded:
Artifacts: /clearrange/checks=true
See this test on roachdash |
From the debug.zip, node 7's last reported capacity used is 381.9 GB with only 1.34 GB available, and the compactor queue shows:
I'll try to reproduce this with instrumentation on the release-19.1 branch tomorrow. |
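For reference, those figures put the store at roughly 99.65% full. A quick way to compute the fill fraction from the reported numbers:

```go
package main

import "fmt"

func main() {
	// Capacity figures reported above for node 7, in GB.
	used, avail := 381.9, 1.34
	fmt.Printf("store is %.2f%% full\n", 100*used/(used+avail))
}
```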
(roachtest).clearrange/checks=true failed on release-19.1@8ecf958ac06ee10391ceb108ba11a745de8ff4b1:
Artifacts: /clearrange/checks=true
See this test on roachdash |
(roachtest).clearrange/checks=true failed on release-19.1@7c03505d8daa19dee7f5f0268c9e728e38d4ba6d:
Artifacts: /clearrange/checks=true
See this test on roachdash |
(roachtest).clearrange/checks=true failed on release-19.1@86b7271623ad797e9c42d5f7900a5cb424fed436:
Artifacts: /clearrange/checks=true
See this test on roachdash |
(roachtest).clearrange/checks=true failed on release-19.1@efeb30fcc83c76819a832e7f12c91c891dbe0e68:
Artifacts: /clearrange/checks=true
See this test on roachdash |
@itsbilal got a reproduction
@itsbilal noticed this test failing often on AWS and never on GCP while trying to reproduce #52720. I never noticed that all the failures were specifically on AWS, and I only tried to reproduce it on GCP. Oops. None of the nodes had very much disk space headroom around when n1 ran out of space.
On AWS, this test uses a …

On the dead n1:
The earliest of these sideloaded sstables, for r2113, appears in the logs here:
The node's last log line before panicking was at 14:32:06.691529. Is it expected for a sideloaded sstable to be sitting around for > 20 minutes?
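One way to look for lingering sideloaded sstables like these is to walk the store's sideloading directory and report files older than a cutoff. This is a sketch, not CockroachDB tooling; the path below is an assumption about the store layout and should be adjusted for the actual data directory:

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
	"time"
)

func main() {
	dir := "/mnt/data1/cockroach/auxiliary/sideloading" // assumed location of sideloaded ssts
	cutoff := time.Now().Add(-20 * time.Minute)
	var total int64
	filepath.Walk(dir, func(path string, info os.FileInfo, err error) error {
		if err != nil || info.IsDir() || !strings.HasSuffix(path, ".sst") {
			return nil
		}
		if info.ModTime().Before(cutoff) {
			// Report each sideloaded sstable older than the cutoff.
			total += info.Size()
			fmt.Printf("%s\t%d bytes\t%s\n", path, info.Size(), info.ModTime().Format(time.RFC3339))
		}
		return nil
	})
	fmt.Printf("total bytes in stale sideloaded sstables: %d\n", total)
}
```

Comparing the total against the node's remaining free space gives a sense of whether lingering sideloaded files account for the missing headroom.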
Fixed in #53572.
(roachtest).clearrange/checks=true failed on release-19.1@407017cad14dfa63f19578055082dc10f3283cc4:
Artifacts: /clearrange/checks=true
Related:
See this test on roachdash
powered by pkg/cmd/internal/issues