
storage: Test behavior when disk fills up #19656

Closed · bdarnell opened this issue Oct 30, 2017 · 3 comments
Labels

A-storage: Relating to our storage engine (Pebble) on-disk storage.
A-testing: Testing tools and infrastructure.
C-cleanup: Tech debt, refactors, loose ends, etc. Solution not expected to significantly change behavior.
S-3-ux-surprise: Issue leaves users wondering whether CRDB is behaving properly. Likely to hurt reputation/adoption.

Comments

@bdarnell (Contributor) commented Oct 30, 2017

Prior to #19447, certain disk errors (the most likely being ENOSPC) were not being handled correctly, and we suspect that inconsistent reads could have been served once this happened. We need more testing of our behavior after disk writes have failed.

One way to do this would be a process that alternately writes a file to fill up the disk (or maybe just fallocate()), waits a bit, then deletes the file (and restarts the cockroach process if it crashed). Maybe this would make sense as a new Jepsen nemesis.
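For concreteness, here is a minimal Go sketch of such a loop. It is not from this issue: the ballast path, sleep intervals, process check, and systemd restart command are all assumptions, and a real version would likely live inside a Jepsen nemesis or roachtest rather than as a standalone binary.

```go
// Sketch of the proposed disk-filling nemesis (Linux-only; all paths and
// intervals below are assumptions, not taken from this issue).
package main

import (
	"log"
	"os"
	"os/exec"
	"path/filepath"
	"time"

	"golang.org/x/sys/unix"
)

const ballastPath = "/mnt/data1/ballast" // assumption: lives on the store's filesystem

// fillDisk reserves (roughly) all remaining space on the ballast's filesystem
// via fallocate(), without actually writing the bytes.
func fillDisk() error {
	var st unix.Statfs_t
	if err := unix.Statfs(filepath.Dir(ballastPath), &st); err != nil {
		return err
	}
	free := int64(st.Bavail) * st.Bsize
	f, err := os.Create(ballastPath)
	if err != nil {
		return err
	}
	defer f.Close()
	return unix.Fallocate(int(f.Fd()), 0, 0, free)
}

// cockroachAlive reports whether a cockroach process is still running
// (assumption: pgrep is available on the host).
func cockroachAlive() bool {
	return exec.Command("pgrep", "-x", "cockroach").Run() == nil
}

func main() {
	for {
		if err := fillDisk(); err != nil {
			log.Printf("fill: %v", err) // hitting ENOSPC here is part of the exercise
		}
		time.Sleep(30 * time.Second) // leave the disk full for a while

		if err := os.Remove(ballastPath); err != nil && !os.IsNotExist(err) {
			log.Fatalf("remove ballast: %v", err)
		}
		if !cockroachAlive() {
			// assumption: the node is managed by systemd; any restart hook works.
			if err := exec.Command("systemctl", "restart", "cockroach").Run(); err != nil {
				log.Fatalf("restart cockroach: %v", err)
			}
		}
		time.Sleep(30 * time.Second) // let the node recover before the next cycle
	}
}
```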

bdarnell added the C-cleanup, A-storage, and A-testing labels on Apr 26, 2018
@tbg (Member) commented May 21, 2018

Adding this to Jepsen might still make sense for the correctness aspect, but we already have the infrastructure for doing this available in roachtest-land. The test that comes to mind is:

  1. start a cluster with some background workload (one of the scaledata correctness tests comes to mind) and with a ballast file on one node
  2. fill up disk on the node with the ballast file
  3. wait until the process crashes
  4. verify that background workload does not stall (#7882: client hangs after one node hits disk errors)
  5. delete ballast file and restart node
  6. verify that node becomes healthy and participates in cluster again (see the health-check sketch below)
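As a rough illustration of the last two steps, here is a sketch of the kind of health polling the test could do after deleting the ballast and restarting the node. CockroachDB does serve an HTTP /health endpoint, but the address, timeout, and polling interval are assumptions, and a 200 response only shows the process is back up and serving HTTP; a real test would additionally verify that the node rejoins the cluster and that the workload resumed.

```go
// Hypothetical check for steps 5-6: poll the restarted node's /health endpoint
// until it answers again. Address and timeout are illustrative assumptions.
package main

import (
	"fmt"
	"net/http"
	"time"
)

// waitUntilHealthy polls the node's HTTP /health endpoint until it returns
// 200 OK or the deadline expires.
func waitUntilHealthy(addr string, timeout time.Duration) error {
	deadline := time.Now().Add(timeout)
	client := &http.Client{Timeout: 2 * time.Second}
	for time.Now().Before(deadline) {
		resp, err := client.Get("http://" + addr + "/health")
		if err == nil {
			resp.Body.Close()
			if resp.StatusCode == http.StatusOK {
				return nil
			}
		}
		time.Sleep(time.Second)
	}
	return fmt.Errorf("node %s did not report healthy within %s", addr, timeout)
}

func main() {
	// assumption: HTTP address of the node that was restarted in the test cluster.
	if err := waitUntilHealthy("localhost:8080", 5*time.Minute); err != nil {
		panic(err)
	}
	fmt.Println("node is healthy again")
}
```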

tbg added the S-1-stability and S-3-ux-surprise labels and removed the S-1-stability label on May 21, 2018
@tbg tbg added this to the 2.1 milestone Jul 22, 2018
@bdarnell bdarnell modified the milestones: 2.1, 2.2 Aug 15, 2018
@petermattis petermattis removed this from the 2.2 milestone Oct 5, 2018
@tbg (Member) commented Oct 11, 2018

#31187 also added the infrastructure to run on charybdefs and inject these errors, in case we don't want to use fallocate.

@tbg (Member) commented Oct 11, 2018

Folding this into #7882 (the opposite direction from what I originally planned).

@tbg tbg closed this as completed Oct 11, 2018