kola: Increase amount of disk corruption in verity test #515
Conversation
For a particular image build the filesystem seemed to be more resistant to the corruption test, be it due to the corruption hitting other structures or some non-flushable cache covering more of the corrupted area. One can still trigger the verity panic by overwriting 100 MB instead of 10 MB. Increase the amount of zeros written by the corruption test.
// write zero bytes to first 100 MB
c.MustSSH(m, fmt.Sprintf(`sudo dd if=/dev/zero of=%s bs=1M count=100 status=none`, usrdev))
I'm going to sound ignorant here, but I thought that basically changing one bit in the read-only /usr partition should trigger a failure to boot, no?
Yes, but only when the corrupted data actually gets read. Even with the ls, if the contents are cached, this is not the case.
- // write zero bytes to first 10 MB
- c.MustSSH(m, fmt.Sprintf(`sudo dd if=/dev/zero of=%s bs=1M count=10 status=none`, usrdev))
+ // write zero bytes to first 100 MB
+ c.MustSSH(m, fmt.Sprintf(`sudo dd if=/dev/zero of=%s bs=1M count=100 status=none`, usrdev))
The flush can be done directly by dd with oflag=dsync.
oflag=dsync means sync after every write - super slow. oflag=direct (no caching) and conv=fsync (flush after all writes) might be better.
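To make the difference between the two approaches concrete, here is a small sketch of the dd flag variants discussed, writing to a throwaway scratch file instead of the real /usr device (the path and the 10 MB size are placeholders, not the test's actual values):

```shell
# Scratch file standing in for the real device; path is illustrative only.
scratch=$(mktemp)

# oflag=dsync: the output is opened with O_DSYNC, so every 1M write is
# synced to storage individually before the next one starts.
dd if=/dev/zero of="$scratch" bs=1M count=10 status=none oflag=dsync

# conv=fsync: writes go through the page cache as usual, followed by a
# single fsync after the final write. (oflag=direct would additionally
# bypass the page cache, but it needs filesystem support and fails on
# e.g. tmpfs.)
dd if=/dev/zero of="$scratch" bs=1M count=10 status=none conv=fsync

size=$(stat -c %s "$scratch")
echo "$size"   # 10 MiB = 10485760 bytes
rm -f "$scratch"
```

Both variants end with the same bytes durably on disk; they differ only in how often the sync happens, which is what the timing comparisons below measure.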
On NVMe:
$: time dd if=/dev/zero of=t bs=1M count=100 status=none oflag=dsync
real 0m0.114s
user 0m0.000s
sys 0m0.057s
$: time dd if=/dev/zero of=t bs=1M count=100 status=none
real 0m0.057s
user 0m0.000s
sys 0m0.053s
On SSD:
$: time dd if=/dev/zero of=t bs=1M count=100 status=none oflag=dsync
real 0m0.514s
user 0m0.001s
sys 0m0.347s
$: time dd if=/dev/zero of=t bs=1M count=100 status=none
real 0m0.330s
user 0m0.001s
sys 0m0.265s
Alright, "super slow" may be exaggerated :) but it does get much slower for bigger writes from within a VM:
core@localhost ~ $ time sudo dd if=/dev/zero of=/dev/vdb bs=1M count=1000 status=none oflag=direct conv=fsync
real 0m0.651s
user 0m0.001s
sys 0m0.009s
core@localhost ~ $ time sudo dd if=/dev/zero of=/dev/vdb bs=1M count=1000 status=none oflag=dsync
real 0m3.348s
user 0m0.001s
sys 0m0.010s
Could that lead to the SSH command failing by triggering the panic? Currently it's two separate commands: the first writes the zeros and is not expected to fail, but places the corruption successfully; the second is the one expected to fail. We could merge them into one, but maybe the test logic is cleaner this way.
yes
Merging the commands would also work when combined with && to keep the error checking for the first one. When setting flags for dd to remove the sync, we would just have to do enough tests to make sure the test doesn't become flaky, or we could also keep the extra sync just to be sure.
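A rough sketch of that merged form, using a scratch file in place of the real /usr device node (in the actual test the path comes from usrdev, and the second command is the one expected to fail with a verity panic):

```shell
# Stand-in for the real device path (usrdev in the Go test); the real
# test writes 100 MB, reduced here to keep the sketch quick.
usrdev=$(mktemp)

# Chaining with && keeps the error check for the write: the follow-up
# command only runs if dd itself reported success.
dd if=/dev/zero of="$usrdev" bs=1M count=10 status=none && echo "corruption written"

rm -f "$usrdev"
```

With this shape, a failure of dd aborts the chain and is distinguishable from the expected failure of the second command, which is the error-checking property the && preserves.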
Testing done
Tested with the amd64 image here https://bincache.flatcar-linux.net/images/amd64/9999.9.9+kai-remove-acbuild/flatcar_production_image.bin.bz2 which was able to avoid the direct panic on the 10 MB corruption due to caching effects.