
server: write directly to fs in disk stall detection, pull RocksDB stats #34224

Closed

awoods187 opened this issue Jan 24, 2019 · 15 comments

Labels: A-storage Relating to our storage engine (Pebble) on-disk storage. · C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) · T-storage Storage Team

awoods187 (Contributor) commented Jan 24, 2019

Describe the problem

I set up a six-node TPC-C cluster with load-based splitting turned off, and the import resulted in a dead node.

To Reproduce


If possible, provide steps to reproduce the behavior:

  1. export CLUSTER=andy-nolbs
  2. roachprod create $CLUSTER -n 7 --clouds=aws --aws-machine-type-ssd=c5d.4xlarge
  3. roachprod run $CLUSTER:1-6 -- 'sudo umount /mnt/data1; sudo mount -o discard,defaults,nobarrier /dev/nvme1n1 /mnt/data1/; mount | grep /mnt/data1'
  4. roachprod stage $CLUSTER:1-6 cockroach
  5. roachprod stage $CLUSTER:7 workload
  6. roachprod start $CLUSTER:1-6
  7. roachprod sql $CLUSTER:1
  8. SET CLUSTER SETTING kv.range_split.by_load_enabled = false
  9. roachprod adminurl --open $CLUSTER:1
  10. roachprod run $CLUSTER:1 -- "./cockroach workload fixtures import tpcc --warehouses=5000 --db=tpcc"

Expected behavior
I expect the import to complete without resulting in a dead node.

Additional data / screenshots
[screenshot omitted]

Environment:

  • v2.2.0-alpha.20181217-820-g645c0c9

Jira issue: CRDB-4660

awoods187 (Contributor, Author) commented:

debug.zip

awoods187 (Contributor, Author) commented Jan 24, 2019

The node crashed:

* FATAL: [n4] disk stall detected: unable to write to <no-attributes>=/mnt/data1/cockroach within 10s
*
F190124 14:20:01.180658 18455 server/server_engine_health.go:54  [n4] disk stall detected: unable to write to <no-attributes>=/mnt/data1/cockroach within 10s
goroutine 18455 [running]:
github.com/cockroachdb/cockroach/pkg/util/log.getStacks(0xc0004fe000, 0xc0004fe060, 0x513cb00, 0x1e)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:1005 +0xd4
github.com/cockroachdb/cockroach/pkg/util/log.(*loggingT).outputLogEntry(0x58c3a40, 0xc000000004, 0x513cbd6, 0x1e, 0x36, 0xc007634a20, 0x5c)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/clog.go:872 +0x95a
github.com/cockroachdb/cockroach/pkg/util/log.addStructured(0x38b4fc0, 0xc000834000, 0x4, 0x2, 0x0, 0x0, 0xc00981bf48, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/structured.go:85 +0x2d5
github.com/cockroachdb/cockroach/pkg/util/log.logDepth(0x38b4fc0, 0xc000834000, 0x1, 0x4, 0x0, 0x0, 0xc00981bf48, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:71 +0x8c
github.com/cockroachdb/cockroach/pkg/util/log.Shout(0x38b4fc0, 0xc000834000, 0x4, 0xc00981bf48, 0x1, 0x1)
	/go/src/github.com/cockroachdb/cockroach/pkg/util/log/log.go:89 +0xa5
github.com/cockroachdb/cockroach/pkg/server.guaranteedExitFatal(0x38b4fc0, 0xc000834000, 0x3181e35, 0x34, 0xc004fc77b0, 0x2, 0x2)
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server_engine_health.go:54 +0xe9
github.com/cockroachdb/cockroach/pkg/server.assertEngineHealth.func1.1()
	/go/src/github.com/cockroachdb/cockroach/pkg/server/server_engine_health.go:62 +0xd7
created by time.goFunc
	/usr/local/go/src/time/sleep.go:172 +0x44

tbg (Member) commented Jan 28, 2019

Thanks @awoods187.

cc @dt / @petermattis -- could this be stalled writes because of L0 growing too large? I don't think so, because I can't see any sign of it in the logs (though the 60s log interval means I might've missed it):

** Compaction Stats [default] **
Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) Comp(cnt) Avg(sec) KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------
  L0      0/0    0.00 KB   0.0      0.0     0.0      0.0       1.1      1.1       0.0   0.2      0.0     57.6        19        32    0.593       0      0
  L4      0/0    0.00 KB   0.0      0.9     0.7      0.2       0.7      0.5       0.0   1.0    180.0    141.8         5         8    0.645   4639K   5364
  L5      7/0   28.59 MB   0.4      2.1     0.6      1.5       0.8     -0.7       5.9   1.3    128.2     47.5        17       326    0.051    102M   2678
  L6     63/0   237.79 MB   0.0     12.2     5.1      7.1       7.1      0.0       0.2   1.4     50.5     29.6       247       227    1.088    568M  1875K
 Sum     70/0   266.38 MB   0.0     15.2     6.4      8.8       9.7      0.9       6.1   1.4     54.0     34.5       288       593    0.485    675M  1883K
 Int      0/0    0.00 KB   0.0      0.1     0.0      0.1       0.1     -0.0       0.0 126.3     34.7     25.0         2         2    1.189   6577K   2824
Uptime(secs): 2412.1 total, 600.0 interval
Flush(GB): cumulative 1.068, interval 0.000
AddFile(GB): cumulative 5.742, interval 0.000
AddFile(Total Files): cumulative 301, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 313453857, interval 0
Cumulative compaction: 9.68 GB write, 4.11 MB/s write, 15.16 GB read, 6.44 MB/s read, 287.7 seconds
Interval compaction: 0.06 GB write, 0.10 MB/s write, 0.08 GB read, 0.14 MB/s read, 2.4 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
estimated_pending_compaction_bytes: 0 B

(This is just one I picked; the numbers vary, but the last line is always all zeroes. Note that I don't have the stats from the node that died, because all I have is the debug zip, which doesn't contain dead nodes' logs.)

Are these stats bad at all? L0 seems to be empty, which is good. What's Comp(sec)? I doubt it's compactions/sec. Is it comparisons/sec? It's possible that the dead node had very different logs.

tbg (Member) commented Jan 28, 2019

We should also rework the disk stall detection to distinguish between RocksDB slowness and I/O slowness. At the very least by including the rocksdb log in the crash report, but perhaps it's also worth not crashing at all when RocksDB is blocking writes but the disk still works.
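
A direct-to-filesystem probe along these lines would sidestep RocksDB entirely, so a timeout could only be blamed on the disk or filesystem rather than on engine-level write throttling. A minimal sketch of that idea, assuming a standalone main; the probe file name, the 10s threshold, and the error message are placeholders for illustration, not the actual detection code:

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// probeDiskOnce writes and syncs a tiny file directly under dir, bypassing
// RocksDB, and reports an error if the write doesn't complete within threshold.
func probeDiskOnce(dir string, threshold time.Duration) error {
	done := make(chan error, 1)
	go func() {
		path := filepath.Join(dir, "disk-health-probe") // placeholder file name
		f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_TRUNC, 0644)
		if err != nil {
			done <- err
			return
		}
		defer f.Close()
		if _, err := f.Write([]byte("ok")); err != nil {
			done <- err
			return
		}
		done <- f.Sync() // force the write through to the device
	}()

	select {
	case err := <-done:
		return err
	case <-time.After(threshold):
		return fmt.Errorf("disk stall detected: unable to write to %s within %s", dir, threshold)
	}
}

func main() {
	if err := probeDiskOnce("/mnt/data1/cockroach", 10*time.Second); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

Such a probe would run on a timer per store; the 10s threshold here simply mirrors the window in the fatal message above.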

@tbg tbg added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Jan 28, 2019
petermattis (Collaborator) commented:

> Are these stats bad at all? L0 seems to be empty, which is good. What's Comp(sec)? I doubt it's compactions/sec. Is it comparisons/sec? It's possible that the dead node had very different logs.

I'm not sure what Comp(sec) is. Comparisons seems unlikely. Perhaps "compaction seconds". That seems about right.

I don't see anything problematic in these stats. No stalls. No L0 files. Most files in L6 (as expected with dynamic_level_bytes).

> We should also rework the disk stall detection to distinguish between RocksDB slowness and I/O slowness. At the very least by including the rocksdb log in the crash report, but perhaps it's also worth not crashing at all when RocksDB is blocking writes but the disk still works.

The RocksDB event listener interface includes an OnStallConditionsChanged callback (which we currently don't use). That callback is passed information on the reason for a write stall:

enum class WriteStallCondition {
  kNormal,
  kDelayed,
  kStopped,
};

struct WriteStallInfo {
  // the name of the column family
  std::string cf_name;
  // state of the write controller
  struct {
    WriteStallCondition cur;
    WriteStallCondition prev;
  } condition;
};

Seems worthwhile to log this info at the very least.
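
For illustration, the logging could look something like the following once the event is surfaced to the Go layer; the Go types and the onStallConditionsChanged handler below are hypothetical mirrors of the C++ structs above, not existing CockroachDB code:

package main

import "log"

// writeStallCondition is a hypothetical Go-side mirror of
// rocksdb::WriteStallCondition; the actual callback lives on the C++ side and
// the cgo plumbing is not shown here.
type writeStallCondition int

const (
	stallNormal writeStallCondition = iota
	stallDelayed
	stallStopped
)

func (c writeStallCondition) String() string {
	switch c {
	case stallDelayed:
		return "delayed"
	case stallStopped:
		return "stopped"
	default:
		return "normal"
	}
}

// writeStallInfo mirrors rocksdb::WriteStallInfo.
type writeStallInfo struct {
	cfName string
	cur    writeStallCondition
	prev   writeStallCondition
}

// onStallConditionsChanged is the kind of logging a handler for the
// OnStallConditionsChanged event could do.
func onStallConditionsChanged(info writeStallInfo) {
	log.Printf("RocksDB write stall condition for column family %q changed: %s -> %s",
		info.cfName, info.prev, info.cur)
}

func main() {
	// Example: the "default" column family transitions from normal to stopped.
	onStallConditionsChanged(writeStallInfo{cfName: "default", prev: stallNormal, cur: stallStopped})
}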

@tbg tbg added C-enhancement Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception) and removed C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. labels Jan 29, 2019
@tbg tbg changed the title Dead node during tpc-c import server: write directly to fs in disk stall detection, pull RocksDB stats Jan 29, 2019
tbg (Member) commented Jan 29, 2019

Repurposing the issue as retitled.

awoods187 (Contributor, Author) commented Jan 30, 2019

I hit this twice today after #34399 merged on master, @tbg.

awoods187 (Contributor, Author) commented:

cockroach.log

awoods187 (Contributor, Author) commented:

cockroach.log

awoods187 (Contributor, Author) commented:

Make it a solid 3/3
cockroach.log

awoods187 (Contributor, Author) commented Jan 31, 2019

Even with running roachprod start $CLUSTER:1-15 --racks=5 -e COCKROACH_LOG_MAX_SYNC_DURATION=24h, I hit the

FATAL: [n8] disk stall detected: unable to write to <no-attributes>=/mnt/data1/cockroach within 10s

on 3 of my 4 clusters.
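
For reference, an override like -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h presumably works by the server reading the threshold from the environment and falling back to a default; a minimal sketch of that pattern (the envDuration helper is hypothetical, not the actual cockroach option plumbing):

package main

import (
	"fmt"
	"os"
	"time"
)

// envDuration returns the duration parsed from the named environment variable,
// falling back to def when it is unset or unparsable. Hypothetical helper.
func envDuration(name string, def time.Duration) time.Duration {
	if v := os.Getenv(name); v != "" {
		if d, err := time.ParseDuration(v); err == nil {
			return d
		}
	}
	return def
}

func main() {
	// With COCKROACH_ENGINE_MAX_SYNC_DURATION=24h set, this yields 24h;
	// otherwise it falls back to the 10s default.
	maxSync := envDuration("COCKROACH_ENGINE_MAX_SYNC_DURATION", 10*time.Second)
	fmt.Println("engine max sync duration:", maxSync)
}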

tbg (Member) commented Jan 31, 2019 via email

awoods187 (Contributor, Author) commented:

So I largely got this working today (7 successes out of 8) on the restore, but I hit it one more time even with roachprod start $CLUSTER:1-15 --racks=5 -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h.

tbg (Member) commented Feb 1, 2019

What was the crash? It couldn't be the same one, though it could be a similar-looking one from the log directory. What provider/machine was that on?

awoods187 (Contributor, Author) commented Feb 8, 2019

I hit this again today (but I didn't use -e COCKROACH_ENGINE_MAX_SYNC_DURATION=24h).
