roachprod,roachtest: fix mounting on GCP and disable RAID0 in roachtests #72803
Conversation
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nicktrav and @tbg)
pkg/roachprod/vm/gce/utils.go, line 114 at r1 (raw file):
{{ else }}
  raiddisk="/dev/md0"
  mdadm -q --create ${raiddisk} --level=0 --raid-devices=${#disks[@]} "${disks[@]}"
Note for @nicktrav: it's not clear to me that this functionality works even now. Prior to my fixes, when I ran roachtest `kv/multi-store-with-overload`, which (admittedly) creates nodes with 2 SSDs intending them to be separate stores rather than a RAID 0 array, this line resulted in the following error message:
GCEMetadataScripts: startup-script: mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @nicktrav)
pkg/roachprod/vm/gce/utils.go, line 101 at r1 (raw file):
mkfs.ext4 -q -F ${disk}
mount -o ${mount_opts} ${disk} ${mountpoint}
echo "{d} ${mountpoint} ext4 ${mount_opts} 1 1" | tee -a /etc/fstab
Shouldn't this be `${d}`?
Also, couldn't we write to fstab and then "just" call `mount -a`, which gives us high confidence that it will work after a restart?
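For illustration, a minimal sketch of that fstab-first approach, assuming the surrounding startup script's `${disks[@]}` and `${mount_opts}` variables (the loop is hypothetical, not the actual patch):

```bash
# Sketch: register every disk in /etc/fstab first, then mount in one pass.
disknum=1
for d in "${disks[@]}"; do
  mountpoint="/mnt/data${disknum}"
  disknum=$((disknum + 1))
  mkdir -p "${mountpoint}"
  mkfs.ext4 -q -F "${d}"
  echo "${d} ${mountpoint} ext4 ${mount_opts} 1 1" | tee -a /etc/fstab
done
# mount -a mounts whatever fstab lists, so success here is strong evidence
# the same mounts will come back after a reboot.
mount -a
```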
Reviewed 4 of 4 files at r1, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @nicktrav)
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nicktrav and @tbg)
pkg/roachprod/vm/gce/utils.go, line 101 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Shouldn't this be `${d}`? Also, couldn't we write to fstab and then "just" call `mount -a`, which gives us high confidence that it will work after a restart?
Shouldn’t what be `${d}`? That is what I am changing the first param to; do you mean a different one?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @nicktrav)
pkg/roachprod/vm/gce/utils.go, line 101 at r1 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Shouldn’t what be `${d}`? That is what I am changing the first param to; do you mean a different one?
It reads `{d}` now, not `${d}`.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @tbg)
pkg/roachprod/vm/aws/aws.go, line 304 at r1 (raw file):
o.ImageAMI, "Override image AMI to use. See https://awscli.amazonaws.com/v2/documentation/api/latest/reference/ec2/describe-images.html")
flags.BoolVar(&o.UseMultipleDisks, ProviderName+"-enable-multiple-stores", true,
	"Enable the use of multiple stores by creating one store directory per disk. "+
Is it safe to just change the default like this? Do you need to announce this? I realize that for GCP, this patch reverts to old behavior. However, for AWS, there are probably tests that depend on this being false.
I was curious about this when I was writing the original patch, so I did some digging, and it looks like it used to be explicitly set to false. That changed in this commit. Setting it explicitly seems clearer, though my vote would be to keep this as false.
I don't really have a horse in this race, so I won't oppose changing this to true (we'll just need to update our Pebble benchmark scripts). However, I just want to point out that we're changing behavior for AWS.
pkg/roachprod/vm/gce/utils.go, line 101 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
It reads `{d}` now, not `${d}`.
Thanks for catching and fixing.
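For reference, the fixed line from the exchange above would read (with `${d}` in place of the literal `{d}`):

```bash
echo "${d} ${mountpoint} ext4 ${mount_opts} 1 1" | tee -a /etc/fstab
```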
pkg/roachprod/vm/gce/utils.go, line 114 at r1 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
Note for @nicktrav: it's not clear to me that this functionality works even now. Prior to my fixes, when I ran roachtest `kv/multi-store-with-overload`, which (admittedly) creates nodes with 2 SSDs intending them to be separate stores rather than a RAID 0 array, this line resulted in the following error message:
GCEMetadataScripts: startup-script: mdadm: Fail create md0 when using /sys/module/md_mod/parameters/new_array
I think it's just a poor error message. My understanding is that it just means that module wasn't used to create the array. I looked at the source code, and it returns 0 immediately after printing this.
If you read further down in the script output, you'll see the startup script prints the status of the block devices, and the striped disk has indeed been mounted.
For example:
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: loop0 7:0 0 55.4M 1 loop /snap/core18/2066
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: loop1 7:1 0 230.1M 1 loop /snap/google-cloud-sdk/184
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: loop2 7:2 0 67.6M 1 loop /snap/lxd/20326
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: loop3 7:3 0 32.1M 1 loop /snap/snapd/12057
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: sda 8:0 0 10G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: ├─sda1 8:1 0 9.9G 0 part /
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: ├─sda14 8:14 0 4M 0 part
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─sda15 8:15 0 106M 0 part /boot/efi
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: nvme0n1 259:0 0 375G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─md0 9:0 0 3T 0 raid0 /mnt/data1
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: nvme0n2 259:1 0 375G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─md0 9:0 0 3T 0 raid0 /mnt/data1
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: nvme0n3 259:2 0 375G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─md0 9:0 0 3T 0 raid0 /mnt/data1
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: nvme0n4 259:3 0 375G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─md0 9:0 0 3T 0 raid0 /mnt/data1
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: nvme0n5 259:4 0 375G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─md0 9:0 0 3T 0 raid0 /mnt/data1
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: nvme0n6 259:5 0 375G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─md0 9:0 0 3T 0 raid0 /mnt/data1
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: nvme0n7 259:6 0 375G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─md0 9:0 0 3T 0 raid0 /mnt/data1
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: nvme0n8 259:7 0 375G 0 disk
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: └─md0 9:0 0 3T 0 raid0 /mnt/data1
Stopping OpenBSD Secure Shell server...
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: Filesystem Size Used Avail Use% Mounted on
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: /dev/root 9.6G 1.6G 8.0G 17% /
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: devtmpfs 7.4G 0 7.4G 0% /dev
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: tmpfs 7.4G 0 7.4G 0% /dev/shm
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: tmpfs 1.5G 2.0M 1.5G 1% /run
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: tmpfs 5.0M 0 5.0M 0% /run/lock
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: tmpfs 7.4G 0 7.4G 0% /sys/fs/cgroup
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: /dev/loop1 231M 231M 0 100% /snap/google-cloud-sdk/184
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: /dev/loop0 56M 56M 0 100% /snap/core18/2066
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: /dev/sda15 105M 7.9M 97M 8% /boot/efi
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: /dev/loop2 68M 68M 0 100% /snap/lxd/20326
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: /dev/loop3 33M 33M 0 100% /snap/snapd/12057
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: tmpfs 1.5G 0 1.5G 0% /run/user/1050
Nov 16 15:52:43 travers-test-0001 GCEMetadataScripts[3276]: 2021/11/16 15:52:43 GCEMetadataScripts: startup-script: /dev/md0 2.9T 89M 2.8T 1% /mnt/data1
Funnily enough, they changed the error message to address the confusion we're talking about :)
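One quick way to sanity-check this on a node, using plain mdadm/Linux tooling (the exact invocation is illustrative, mirroring the startup script's variables):

```bash
# The "Fail create md0 when using /sys/module/md_mod/parameters/new_array"
# line is informational: per the source-code reading above, mdadm returns 0
# and the array is still assembled.
mdadm -q --create /dev/md0 --level=0 --raid-devices="${#disks[@]}" "${disks[@]}"
echo "mdadm exit status: $?"  # expected: 0 despite the new_array message
cat /proc/mdstat              # md0 should show up as an active raid0 device
```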
(Replying directly on GitHub PR as Reviewable is down).
Ah yes, thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nicktrav and @tbg)
pkg/roachprod/vm/gce/utils.go, line 101 at r1 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
It reads `{d}` now, not `${d}`.
Ah yes, thanks!
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @AlexTalks and @tbg)
@nicktrav also any preferences on what @tbg proposed with writing to `/etc/fstab` first and then just calling `mount -a`? (I suppose we'd just have to skip this for ZFS?)
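For what it's worth, a minimal sketch of that skip, assuming a hypothetical `use_zfs` variable in the startup script (ZFS pools mount via `zpool`/`zfs` and manage their own mountpoints, so they wouldn't go through fstab here):

```bash
# Hypothetical guard: only the non-ZFS (ext4) path writes fstab entries and
# relies on mount -a; ZFS mounts its datasets when the pool is created.
if [ "${use_zfs}" != "true" ]; then
  mount -a
fi
```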
pkg/roachprod/vm/gce/utils.go, line 114 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
I think it's just a poor error message. My understanding is that it just means that module wasn't used to create the array. I looked at the source code, and it returns 0 immediately after printing this.
If you read further down in the script output, you'll see the startup script prints the status of the block devices, and the striped disk has indeed been mounted. (The lsblk and df output is the same as quoted in full above.)
Funnily enough, they changed the error message to address the confusion we're talking about :)
OK - Good to know, thanks! 👍
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nicktrav and @tbg)
pkg/roachprod/vm/aws/aws.go, line 304 at r1 (raw file):
if you might have some insight @nicktrav.
This has been my first foray into the roachtest infra. Unfortunately, I don't have insight other than what we need for the Pebble / storage benchmarks. I did a quick look too, and the only other multi-SSD configuration (in code) was here.
While looking (perhaps non-exhaustively) at roachtest's invocations of roachprod, it seemed like when a test's ClusterSpec specified multiple disks, it was for multiple stores and not for a single RAID 0 store.
Seems reasonable. Perhaps worth a plug internally (mailing list or Slack) to mention this, just in case?
incorporate the --[gce|aws]-enable-multiple-stores flag by default
I'll put up a patch that explicitly sets this to make it obvious that we (storage) depend on the functionality, and will therefore be immune to whatever the default is set to.
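As a sketch, pinning it explicitly could look like this on the command line, reusing the `--[gce|aws]-enable-multiple-stores` flag name from above (the rest of the invocation is illustrative):

```bash
# Hypothetical roachprod invocation: set the flag explicitly so our scripts
# keep working no matter which default the project settles on.
roachprod create ${CLUSTER} --clouds aws --aws-enable-multiple-stores=true
```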
Force-pushed from e16cc14 to 0973008
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nicktrav and @tbg)
pkg/roachprod/vm/aws/aws.go, line 304 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
if you might have some insight @nicktrav.
This has been my first foray into the roachtest infra. Unfortunately, I don't have insight other than what we need for the Pebble / storage benchmarks. I did a quick look too, and the only other multi-SSD configuration (in code) was here.
While looking (perhaps non-exhaustively) at roachtest's invocations of roachprod, it seemed like when a test's ClusterSpec specified multiple disks, it was for multiple stores and not for a single RAID 0 store.
Seems reasonable. Perhaps worth a plug internally (mailing list or Slack) to mention this, just in case?
incorporate the --[gce|aws]-enable-multiple-stores flag by default
I'll put up a patch that explicitly sets this to make it obvious that we (storage) depend on the functionality, and will therefore be immune to whatever the default is set to.
Discussed on internal Slack and agreed it would be best to avoid side effects here. For now we'll just make sure the default in roachtests that create nodes with multiple SSDs is to mount multiple stores, and if RAID0 behavior is needed in roachtest at some point it can be added as a `ClusterSpec` option.
Reviewed 5 of 5 files at r2, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @AlexTalks)
pkg/roachprod/vm/gce/utils.go, line 101 at r1 (raw file):
Previously, AlexTalks (Alex Sarkesian) wrote…
@nicktrav also any preferences on what @tbg proposed with writing to `/etc/fstab` first and then just calling `mount -a`? (I suppose we'd just have to skip this for ZFS?)
No preference there. I'll just say that altering these bash scripts is brittle, so I'd suggest testing manually across a few combinations of config, if you do.
Fixes bugs introduced by cockroachdb#72553, which ported support for multiple stores from AWS to GCP. This change fixes the bugs that caused mounted disks not to be written to `/etc/fstab` on GCP instances, causing mounts to go missing on instance restart. Additionally, this change modifies roachtest to default to creating multiple stores on GCP in roachprod invocations rather than mounting a RAID 0 array, thus preserving the existing behavior used in roachtests such as `kv/multi-store-with-overload`.

Fixes cockroachdb#72635

Release note: None
Force-pushed from 0973008 to 3a7a29d
bors r+
Build failed (retrying...)
Build succeeded
blathers backport 21.2
Fixes bugs introduced by #72553, which ported support for multiple stores from AWS to GCP. This change fixes the bugs that caused mounted disks not to be written to `/etc/fstab` on GCP instances, causing mounts to go missing on instance restart. Additionally, this change modifies roachtest to default to creating multiple stores on GCP in roachprod invocations rather than mounting a RAID 0 array, thus preserving the existing behavior used in roachtests such as `kv/multi-store-with-overload`.

Fixes #72635

Release note: None