
server: write directly to fs in disk stall detection, pull RocksDB stats #34224

Closed

awoods187 opened this issue Jan 24, 2019 · 15 comments

Labels: A-storage Relating to our storage engine (Pebble) on-disk storage. · C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) · T-storage Storage Team

awoods187 (Contributor) commented Jan 24, 2019

Describe the problem

I set up a six-node TPC-C cluster with load-based splitting turned off, and the import resulted in a dead node.

To Reproduce


If possible, provide steps to reproduce the behavior:

  1. export CLUSTER=andy-nolbs
  2. roachprod create $CLUSTER -n 7 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
  3. roachprod run $CLUSTER:1-6 -- 'sudo umount /mnt/data1; sudo mount -o discard,defaults,nobarrier /dev/nvme1n1 /mnt/data1/; mount | grep /mnt/data1'
  4. roachprod stage $CLUSTER:1-6 cockroach
  5. roachprod stage $CLUSTER:7 workload
  6. roachprod start $CLUSTER:1-6
  7. roachprod sql $CLUSTER:1
  8. SET CLUSTER SETTING kv.range_split.by_load_enabled = false
  9. roachprod adminurl --open $CLUSTER:1
  10. roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=5000 --db=tpcc"

Expected behavior
I expect the import to complete without resulting in a dead node.

Additional data / screenshots
[screenshot omitted]

Environment:

  • v2.2.0-alpha.20181217-820-g645c0c9

Jira issue: CRDB-4660

awoods187 (Contributor, Author) commented:

debug.zip

awoods187 (Contributor, Author) commented Jan 24, 2019

The node crashed:

* FATAL: [n4] disk stall detected: unable to write to <no-attributes>=/mnt/data1/cockroach within 10s
*
F190124 14:20:01.180658 18455 server/server_engine_health.go:54  [n4] disk stall detected: unable to write to <no-attributes>=/mnt/data1/cockroach within 10s
goroutine 18455 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc0004fe000, 0xc0004fe060, 0x513cb00, 0x1e)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1005 +0xd4
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x58c3a40, 0xc000000004, 0x513cbd6, 0x1e, 0x36, 0xc007634a20, 0x5c)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:872 +0x95a
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x38b4fc0, 0xc000834000, 0x4, 0x2, 0x0, 0x0, 0xc00981bf48, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2d5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x38b4fc0, 0xc000834000, 0x1, 0x4, 0x0, 0x0, 0xc00981bf48, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:71 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Shout(0x38b4fc0, 0xc000834000, 0x4, 0xc00981bf48, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:89 +0xa5
github.com/cockroachdb/cockroach/pkg/server.guaranteedExitFatal(0x38b4fc0, 0xc000834000, 0x3181e35, 0x34, 0xc004fc77b0, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server_engine_health.go:54 +0xe9
github.com/cockroachdb/cockroach/pkg/server.assertEngineHealth.func1.1()
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server_engine_health.go:62 +0xd7
created by time.goFunc
	/usr/local/go/src/time/sleep.go:172 +0x44

tbg (Member) commented Jan 28, 2019

Thanks @awoods187.

cc @dt / @petermattis -- could this be stalled writes because of L0 growing too large? I don't think so, because I can't see any sign of it in the logs (though the 60s log interval means I might've missed it):

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.0      0.0     0.0      0.0       1.1      1.1       0.0   0.2      0.0     57.6        19        32    0.593       0      0
  L4      0/0    0.00 KB   0.0      0.9     0.7      0.2       0.7      0.5       0.0   1.0    180.0    141.8         5         8    0.645   4639K   5364
  L5      7/0   28.59 MB   0.4      2.1     0.6      1.5       0.8     -0.7       5.9   1.3    128.2     47.5        17       326    0.051    102M   2678
  L6     63/0   237.79 MB   0.0     12.2     5.1      7.1       7.1      0.0       0.2   1.4     50.5     29.6       247       227    1.088    568M  1875K
 Sum     70/0   266.38 MB   0.0     15.2     6.4      8.8       9.7      0.9       6.1   1.4     54.0     34.5       288       593    0.485    675M  1883K
 Int      0/0    0.00 KB   0.0      0.1     0.0      0.1       0.1     -0.0       0.0 126.3     34.7     25.0         2         2    1.189   6577K   2824
Uptime(secs): 2412.1 total, 600.0 interval
Flush(GB): cumulative 1.068, interval 0.000
AddFile(GB): cumulative 5.742, interval 0.000
AddFile(Total Files): cumulative 301, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 313453857, interval 0
Cumulative compaction: 9.68 GB write, 4.11 MB/s write, 15.16 GB read, 6.44 MB/s read, 287.7 seconds
Interval compaction: 0.06 GB write, 0.10 MB/s write, 0.08 GB read, 0.14 MB/s read, 2.4 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
estimated_pending_compaction_bytes: 0 B

(This is just one I picked; the numbers vary, but the last line is always all zeroes. Note that I don't have the stats from the node that died, because all I have is the debug zip, which doesn't contain dead nodes' logs.)

Are these stats bad at all? L0 seems to be empty, which is good. What's Comp(sec)? I doubt it's compactions/sec. Is it comparisons/sec? It's possible that the dead node had very different logs.

tbg (Member) commented Jan 28, 2019

We should also rework the disk stall detection to distinguish between RocksDB slowness and I/O slowness. At the very least by including the rocksdb log in the crash report, but perhaps it's also worth not crashing at all when RocksDB is blocking writes but the disk still works.
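
A direct-to-filesystem probe along these lines would sidestep RocksDB entirely, so a timeout could only be blamed on the disk or filesystem rather than on engine-level write throttling. A minimal sketch of that idea, assuming a standalone main; the probe file name, the 10s threshold, and the error message are placeholders for illustration, not the actual detection code:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// probeDiskOnce writes and syncs a tiny file directly under dir, bypassing
// RocksDB, and reports an error if the write doesn't complete within threshold.
func probeDiskOnce(dir string, threshold time.Duration) error {
	done := make(chan error, 1)
	go func() {
		path := filepath.Join(dir, "disk-health-probe") // placeholder file name
		f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
		if err != nil {
			done <- err
			return
		}
		defer f.Close()
		if _, err := f.Write([]byte("ok")); err != nil {
			done <- err
			return
		}
		done <- f.Sync() // force the write through to the device
	}()

	select {
	case err := <-done:
		return err
	case <-time.After(threshold):
		return fmt.Errorf("disk stall detected: unable to write to %s within %s", dir, threshold)
	}
}

func main() {
	if err := probeDiskOnce("/mnt/data1/cockroach", 10*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

Such a probe would run on a timer per store; the 10s threshold here simply mirrors the window in the fatal message above.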

@tbg tbg added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 28, 2019
petermattis (Collaborator) commented:

> Are these stats bad at all? L0 seems to be empty, which is good. What's Comp(sec)? I doubt it's compactions/sec. Is it comparisons/sec? It's possible that the dead node had very different logs.

I'm not sure what Comp(sec) is. Comparisons seems unlikely. Perhaps "compaction seconds". That seems about right.

I don't see anything problematic in these stats. No stalls. No L0 files. Most files in L6 (as expected with dynamic_level_bytes).

> We should also rework the disk stall detection to distinguish between RocksDB slowness and I/O slowness. At the very least by including the rocksdb log in the crash report, but perhaps it's also worth not crashing at all when RocksDB is blocking writes but the disk still works.

The RocksDB event listener interface includes an OnStallConditionsChanged callback (which we currently don't use). That callback is passed information on the reason for a write stall:

enum class WriteStallCondition {
  kNormal,
  kDelayed,
  kStopped,
};

struct WriteStallInfo {
  // the name of the column family
  std::string cf_name;
  // state of the write controller
  struct {
    WriteStallCondition cur;
    WriteStallCondition prev;
  } condition;
};

Seems worthwhile to log this info at the very least.
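
For illustration, the logging could look something like the following once the event is surfaced to the Go layer; the Go types and the onStallConditionsChanged handler below are hypothetical mirrors of the C++ structs above, not existing CockroachDB code:

package main

import "log"

// writeStallCondition is a hypothetical Go-side mirror of
// rocksdb::WriteStallCondition; the actual callback lives on the C++ side and
// the cgo plumbing is not shown here.
type writeStallCondition int

const (
	stallNormal writeStallCondition = iota
	stallDelayed
	stallStopped
)

func (c writeStallCondition) String() string {
	switch c {
	case stallDelayed:
		return "delayed"
	case stallStopped:
		return "stopped"
	default:
		return "normal"
	}
}

// writeStallInfo mirrors rocksdb::WriteStallInfo.
type writeStallInfo struct {
	cfName string
	cur    writeStallCondition
	prev   writeStallCondition
}

// onStallConditionsChanged is the kind of logging a handler for the
// OnStallConditionsChanged event could do.
func onStallConditionsChanged(info writeStallInfo) {
	log.Printf("RocksDB write stall condition for column family %q changed: %s -> %s",
		info.cfName, info.prev, info.cur)
}

func main() {
	// Example: the "default" column family transitions from normal to stopped.
	onStallConditionsChanged(writeStallInfo{cfName: "default", prev: stallNormal, cur: stallStopped})
}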

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jan 29, 2019
@tbg tbg changed the title Dead node during tpc-c import server: write directly to fs in disk stall detection, pull RocksDB stats Jan 29, 2019
tbg (Member) commented Jan 29, 2019

Repurposing the issue as retitled.

awoods187 (Contributor, Author) commented Jan 30, 2019

I hit this twice today after #34399 merged on master, @tbg.

awoods187 (Contributor, Author) commented:

cockroach.log

awoods187 (Contributor, Author) commented:

cockroach.log

awoods187 (Contributor, Author) commented:

Make it a solid 3/3
cockroach.log

awoods187 (Contributor, Author) commented Jan 31, 2019

Even with running roachprod start $CLUSTER:1-15 --racks=5 -e COCKROACH_LOG_MAX_SYNC_DURATION=24h, I hit the

FATAL: [n8] disk stall detected: unable to write to <no-attributes>=/mnt/data1/cockroach within 10s

on 3 of my 4 clusters.
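
For reference, an override like -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h presumably works by the server reading the threshold from the environment and falling back to a default; a minimal sketch of that pattern (the envDuration helper is hypothetical, not the actual cockroach option plumbing):

package main

import (
	"fmt"
	"os"
	"time"
)

// envDuration returns the duration parsed from the named environment variable,
// falling back to def when it is unset or unparsable. Hypothetical helper.
func envDuration(name string, def time.Duration) time.Duration {
	if v := os.Getenv(name); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return def
}

func main() {
	// With COCKROACH_ENGINE_MAX_SYNC_DURATION=24h set, this yields 24h;
	// otherwise it falls back to the 10s default.
	maxSync := envDuration("COCKROACH_ENGINE_MAX_SYNC_DURATION", 10*time.Second)
	fmt.Println("engine max sync duration:", maxSync)
}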

tbg (Member) commented Jan 31, 2019 via email

awoods187 (Contributor, Author) commented:

So I largely got this working today (7 successes out of 8) on the restore, but I hit it one more time even with roachprod start $CLUSTER:1-15 --racks=5 -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h.

tbg (Member) commented Feb 1, 2019

What was the crash? It couldn't be the same one, though it could be a similar-looking one from the log directory. What provider/machine was that on?

awoods187 (Contributor, Author) commented Feb 8, 2019

I hit this again today (but I didn't use -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h).
