roachprod: stage arm64 binary #103243
Force-pushed from 91babf6 to 740cb04.
Force-pushed from 740cb04 to a7a4e6f.
Just one question inline.
Reviewed 9 of 10 files at r1, 7 of 7 files at r2, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rail and @smg260)
Force-pushed from a7a4e6f to 0f6ab46.
Reviewed 1 of 7 files at r2, 11 of 11 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rail and @srosenberg)
pkg/roachprod/roachprod.go
line 519 at r3 (raw file):
os := "linux"
arch := "amd64"
This logic looks a little clunky; is it deliberate? Having isLocal() as the first condition seems cleaner, with parameters always taking precedence:
os := ...
arch := ...
if c.IsLocal() { ... }
if stageOs != "" { ... override }
if stageArch != "" { ... override }
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rail and @smg260)
pkg/roachprod/roachprod.go
line 519 at r3 (raw file):
Previously, smg260 (Miral Gadani) wrote…
this logic looks a little clunky - is it deliberate?
isLocal() as the first condition seems cleaner, with parameters always taking precedence
os := ...
arch := ...
if c.IsLocal() { }
if stageOs != "" { .. override }
if stageArch != "" { .. override }
Good catch! Technically, you should be allowed to stage an arch that's different from your local one; e.g., running (emulated) amd64 on Apple silicon. Either way, IsLocal should logically come first, and I'll add a warning if someone attempts to stage an os/arch that differs from local.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @rail, @smg260, and @srosenberg)
pkg/roachprod/roachprod.go
line 519 at r3 (raw file):
Previously, srosenberg (Stan Rosenberg) wrote…
Good catch! Technically, you should be allowed to stage an arch that's different from your local; e.g., running (emulated) amd64 on apple silicon. Either way, IsLocal should logically come first, and I'll add a warning if someone is attempting to stage an os/arch which differs from local.
Could these defaults also come from the command line (i.e., StringVar(&stageArch, "arch", defaultArch, ...), where defaultArch = "amd64")? Then: 1) we don't need to check for command-line overrides; 2) --help is more explicit about the default values used; and 3) we don't need to support empty stageArch values (see comment on archInfoForOS).
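The reviewer's suggestion amounts to registering the default with the flag itself. roachprod uses cobra/pflag, whose `StringVar` has the same shape; to stay self-contained, this sketch uses Go's standard `flag` package instead, and `parseStageFlags` is a hypothetical helper.

```go
package main

import (
	"flag"
	"fmt"
)

// defaultArch is shown by --help and guarantees stageArch is never empty.
const defaultArch = "amd64"

// parseStageFlags (hypothetical) parses a "stage"-like command line.
func parseStageFlags(args []string) (string, error) {
	fs := flag.NewFlagSet("stage", flag.ContinueOnError)
	var stageArch string
	fs.StringVar(&stageArch, "arch", defaultArch, "architecture of the binary to stage")
	if err := fs.Parse(args); err != nil {
		return "", err
	}
	return stageArch, nil
}

func main() {
	arch, _ := parseStageFlags(nil)
	fmt.Println(arch)
	arch, _ = parseStageFlags([]string{"--arch", "arm64"})
	fmt.Println(arch)
}
```

With the default baked into the flag, downstream code never sees an empty value from the CLI, though API callers can still pass one.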
pkg/roachprod/install/staging.go
line 98 at r3 (raw file):
return darwin_arm64_ArchInfo, nil
}
return darwin_x86_64_ArchInfo, nil
One less desirable property of this approach is that a typo (or even a genuinely unsupported arch) will silently use x86_64 (e.g., --arch amd63).
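One way to avoid the silent fallback is to look up supported (os, arch) pairs explicitly and return an error otherwise. In this sketch the name `archInfoForOS` echoes the review, but the signature, the `archInfo` type, and the table contents are assumptions for illustration.

```go
package main

import "fmt"

// archInfo is a hypothetical stand-in for per-platform staging metadata.
type archInfo struct{ target string }

// supported lists the (os, arch) pairs that can be staged; anything else
// is rejected instead of falling back to x86_64.
var supported = map[string]map[string]archInfo{
	"linux":  {"amd64": {"linux/amd64"}, "arm64": {"linux/arm64"}},
	"darwin": {"amd64": {"darwin/amd64"}, "arm64": {"darwin/arm64"}},
}

func archInfoForOS(goos, goarch string) (archInfo, error) {
	byArch, ok := supported[goos]
	if !ok {
		return archInfo{}, fmt.Errorf("unsupported os %q", goos)
	}
	info, ok := byArch[goarch]
	if !ok {
		return archInfo{}, fmt.Errorf("unsupported arch %q for os %q", goarch, goos)
	}
	return info, nil
}

func main() {
	if _, err := archInfoForOS("darwin", "amd63"); err != nil {
		fmt.Println(err)
	}
}
```

A table lookup makes the typo case (`amd63`) fail loudly, and adding a platform is a one-line change.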
Force-pushed from 0f6ab46 to f6aac57.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @herkolategan, @rail, @renatolabs, and @smg260)
pkg/roachprod/roachprod.go
line 519 at r3 (raw file):
Could these defaults also be coming from the command line?
Sorta. The multi-value validation story for cobra is not great. Technically, you can implement a custom pflag.Value, but that seems like overkill. Instead, I borrowed the approach from the roachtest CLI, which installs a custom validation function using the PersistentPreRun hook.
That takes care of CLI validation, but recall that roachprod is also used as an API. archInfoForOS is used internally for staging, so I added a validation bit there too. It's still not bulletproof, but maybe good enough, considering we currently don't validate a bunch of other args; i.e., to be revisited.
pkg/roachprod/install/staging.go
line 98 at r3 (raw file):
Previously, renatolabs (Renato Costa) wrote…
One less desirable property of this approach is that a typo (or even a genuinely unsupported arch) will silently use x86_64 (e.g., --arch amd63).
Yep, I added an explicit precondition to validate supported values. I don't foresee other architectures we'd support in the near future, but you never know. If a new one does pop up, we can refactor it into an enum and use a custom pflag.Value (with shell autocompletion). For now, that seems like overkill; would you agree?
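For reference, the deferred enum-plus-custom-flag-value approach could look roughly like this. The sketch uses the standard library's `flag.Value` (which needs only `String` and `Set`); pflag's `Value` interface additionally requires `Type() string`, included here so the type would satisfy both. `archFlag` and `supportedArchs` are hypothetical names.

```go
package main

import (
	"flag"
	"fmt"
	"strings"
)

// archFlag is a hypothetical enum-style flag value that rejects
// unsupported architectures at parse time.
type archFlag string

var supportedArchs = []string{"amd64", "arm64"}

func (a *archFlag) String() string { return string(*a) }

// Set accepts only supported values, leaving the current value unchanged otherwise.
func (a *archFlag) Set(v string) error {
	for _, s := range supportedArchs {
		if v == s {
			*a = archFlag(v)
			return nil
		}
	}
	return fmt.Errorf("must be one of: %s", strings.Join(supportedArchs, ", "))
}

// Type satisfies pflag.Value; the stdlib flag package ignores it.
func (a *archFlag) Type() string { return "arch" }

func main() {
	arch := archFlag("amd64")
	fs := flag.NewFlagSet("stage", flag.ContinueOnError)
	fs.Var(&arch, "arch", "architecture to stage")
	err := fs.Parse([]string{"--arch", "amd63"}) // rejected; usage goes to stderr
	fmt.Println(err != nil, arch)
}
```

Validation then lives in the type itself, which is what makes it attractive once the set of values grows.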
Force-pushed from f6aac57 to 1e98906.
Reviewed 1 of 7 files at r2, 5 of 5 files at r4, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @rail and @renatolabs)
Force-pushed from 1e98906 to 1afbc37.
After a final check, I discovered that the binary staging logic had drifted from its source of truth (…). PTAL!
Add `--arch` to override the binary's architecture, and refactor.

As of this change, `roachprod stage` is able to stage both amd64 and arm64 binaries on linux and darwin, as well as a FIPS-enabled binary built for amd64.

In conjunction with the previous change [1], roachprod now uses an arm64-based AMI for graviton2/graviton3 machines. Below is an example of how to create a VM with graviton3:

```
roachprod create -n1 --clouds aws --aws-machine-type m7g.2xlarge --local-ssd=false $CRL_USERNAME-test
roachprod stage --arch arm64 $CRL_USERNAME-test release v23.1.0-rc.2
roachprod start $CRL_USERNAME-test
```

[1] cockroachdb#103236

Epic: none
Release note: None
Force-pushed from 1afbc37 to 153ac47.
TFTR! The next PR adds a few small improvements, so I'm merging this one since it seems functionally correct.
bors r=rail,herkolategan,smg260
Build failed:
bors retry
Build succeeded:
Previously, all roachtests used (cloud) machine types with the AMD64 (CPU) architecture. Recently [1], new CI infrastructure was added to run a clone of all the nightly roachtests, configured with FIPS; i.e., the same AMD64 machine types, but a different AMI and crdb binary, patched with FIPS-certified OpenSSL native code.

As of this PR, we add the capability to execute any roachtest in a cluster configured with either ARM64, FIPS, or AMD64 (default). This is controlled via two CLI args: `metamorphic-arm64-probability` and `metamorphic-fips-probability`. The former denotes the probability (over the uniform distribution) that a new cluster is provisioned using ARM64 VMs. The latter denotes the probability that a new AMD64 cluster is provisioned with the FIPS-compliant (kernel) configuration. If a test is compatible only with AMD64, it's effectively excluded from the set; i.e., both probabilities apply to compatible tests only.

Note that the two probabilities don't have to add up to 1. E.g., `metamorphic-arm64-probability==0.4`, `metamorphic-fips-probability==0.2` denotes that ARM64 clusters are chosen ~40% of the time, whereas of the remaining ~60% AMD64 clusters, FIPS is chosen ~20% of the time; i.e., ~12% of all clusters will use FIPS.

Note that the values '0' and '1' are absolute. Setting both to '0' is tantamount to the behavior before this PR. Setting either to '1' enforces that _all_ clusters are provisioned with either ARM64 or FIPS. If a test is _not_ compatible with the chosen configuration, its provisioning will fail. Thus, '1' is typically used for manual (debug) runs.

This PR builds on [2], which enabled ARM64 provisioning for AWS in roachprod. We add ARM64 provisioning for GCE, i.e., T2A, and refactor the 'arch' argument to denote one of: AMD64, ARM64, FIPS, where the latter isn't formally a CPU architecture; however, it simplifies provisioning and binary staging.

We also modify roachprod.List to display the CPU architecture, when it's other than AMD64, alongside the machine type; this should make it easier to see which clusters are running ARM64 and FIPS configurations as we ramp up their testing.

The PR also adds validation to cockroach binaries and libs to ensure we can execute tests under ARM64 and FIPS. Furthermore, we add an 'Enabled Assertions' header, generated at build time, to the cockroach binary; the header is used to validate whether or not the binary has runtime assertions enabled.

Epic: none
Release note: None
Resolves: cockroachdb#94957
Informs: cockroachdb#94986
[1] cockroachdb#99224
[2] cockroachdb#103243
Previously, all roachtests used (cloud) machine types with the AMD64 (CPU) architecture. Recently [1], new CI infrastructure was added to run a clone of all the nightly roachtests, configured with FIPS; i.e., the same AMD64 machine types, but a different AMI and crdb binary, patched with FIPS-certified OpenSSL native code.

As of this PR, we add the capability to execute any roachtest in a cluster configured with either ARM64, FIPS, or AMD64 (default). This is controlled via two CLI args: `metamorphic-arm64-probability` and `metamorphic-fips-probability`. The former denotes the probability (over the uniform distribution) that a new cluster is provisioned using ARM64 VMs. The latter denotes the probability that a new AMD64 cluster is provisioned with the FIPS-compliant (kernel) configuration. If a test is compatible only with AMD64, it's effectively excluded from the set; i.e., both probabilities apply to compatible tests only.

Note that the two probabilities don't have to add up to 1. E.g., `metamorphic-arm64-probability==0.4`, `metamorphic-fips-probability==0.2` denotes that ARM64 clusters are chosen ~40% of the time, whereas of the remaining ~60% AMD64 clusters, FIPS is chosen ~20% of the time; i.e., ~12% of all clusters will use FIPS.

Note that the values '0' and '1' are absolute. Setting both to '0' is tantamount to the behavior before this PR. Setting either to '1' enforces that _all_ clusters are provisioned with either ARM64 or FIPS. A test can specify its required architecture, in which case it takes precedence over the metamorphic settings.

This PR builds on [2], which enabled ARM64 provisioning for AWS in roachprod. We add ARM64 provisioning for GCE, i.e., T2A, and refactor the 'arch' argument to denote one of: AMD64, ARM64, FIPS, where the latter isn't formally a CPU architecture; however, it simplifies provisioning and binary staging.

We also modify roachprod.List to display the CPU architecture, when it's other than AMD64, alongside the machine type; this should make it easier to see which clusters are running ARM64 and FIPS configurations as we ramp up their testing.

The PR also adds validation to cockroach binaries and libs to ensure we can execute tests under ARM64 and FIPS. Furthermore, we add an 'Enabled Assertions' header, generated at build time, to the cockroach binary; the header is used to validate whether or not the binary has runtime assertions enabled.

Epic: none
Release note: None
Resolves: cockroachdb#94957
Informs: cockroachdb#94986
[1] cockroachdb#99224
[2] cockroachdb#103243
Previously, all roachtests used (cloud) machine types with the AMD64 (CPU) architecture. Recently [1], new CI infrastructure was added to run a clone of all the nightly roachtests, configured with FIPS; i.e., the same AMD64 machine types, but a different AMI and crdb binary, patched with FIPS-certified OpenSSL native code.

As of this PR, we add the capability to execute any roachtest in a cluster configured with either ARM64, FIPS, or AMD64 (default). This is controlled via two CLI args: `metamorphic-arm64-probability` and `metamorphic-fips-probability`. The former denotes the probability (over the uniform distribution) that a new cluster is provisioned using ARM64 VMs. The latter denotes the probability that a new AMD64 cluster is provisioned with the FIPS-compliant (kernel) configuration. If a test is compatible only with AMD64, it's effectively excluded from the set; i.e., both probabilities apply to compatible tests only.

Note that the two probabilities don't have to add up to 1. E.g., `metamorphic-arm64-probability==0.4`, `metamorphic-fips-probability==0.2` denotes that ARM64 clusters are chosen ~40% of the time, whereas of the remaining ~60% AMD64 clusters, FIPS is chosen ~20% of the time; i.e., ~12% of all clusters will use FIPS.

Note that the values '0' and '1' are absolute. Setting both to '0' is tantamount to the behavior before this PR. Setting either to '1' enforces that _all_ clusters are provisioned with either ARM64 or FIPS. A test can specify its required architecture, in which case it takes precedence over the metamorphic settings.

This PR builds on [2], which enabled ARM64 provisioning for AWS in roachprod. We add ARM64 provisioning for GCE, i.e., T2A, and refactor the 'arch' argument to denote one of: AMD64, ARM64, FIPS, where the latter isn't formally a CPU architecture; however, it simplifies provisioning and binary staging.

We also modify roachprod.List to display the CPU architecture, when it's other than AMD64, alongside the machine type; this should make it easier to see which clusters are running ARM64 and FIPS configurations as we ramp up their testing.

The PR also adds validation to cockroach binaries and libs to ensure we can execute tests under ARM64 and FIPS. Furthermore, we add an 'Enabled Assertions' header, generated at build time, to the cockroach binary; the header is used to validate whether or not the binary has runtime assertions enabled.

Epic: none
Release note: None
Resolves: cockroachdb#94957
Resolves: cockroachdb#89268
Informs: cockroachdb#94986
[1] cockroachdb#99224
[2] cockroachdb#103243
103710: roachtest: metamorphic ARM64 and FIPS clusters r=smg260,herkolategan a=srosenberg

Previously, all roachtests used (cloud) machine types with the AMD64 (CPU) architecture. Recently [1], new CI infrastructure was added to run a clone of all the nightly roachtests, configured with FIPS; i.e., the same AMD64 machine types, but a different AMI and crdb binary, patched with FIPS-certified OpenSSL native code.

As of this PR, we add the capability to execute any roachtest in a cluster configured with either ARM64, FIPS, or AMD64 (default). This is controlled via two CLI args: `metamorphic-arm64-probability` and `metamorphic-fips-probability`. The former denotes the probability (over the uniform distribution) that a new cluster is provisioned using ARM64 VMs. The latter denotes the probability that a new AMD64 cluster is provisioned with the FIPS-compliant (kernel) configuration. If a test is compatible only with AMD64, it's effectively excluded from the set; i.e., both probabilities apply to compatible tests only.

Note that the two probabilities don't have to add up to 1. E.g., `metamorphic-arm64-probability==0.4`, `metamorphic-fips-probability==0.2` denotes that ARM64 clusters are chosen ~40% of the time, whereas of the remaining ~60% AMD64 clusters, FIPS is chosen ~20% of the time; i.e., ~12% of all clusters will use FIPS.

Note that the values '0' and '1' are absolute. Setting both to '0' is tantamount to the behavior before this PR. Setting either to '1' enforces that _all_ clusters are provisioned with either ARM64 or FIPS. A test can specify its required architecture, in which case it takes precedence over the metamorphic settings.

This PR builds on [2], which enabled ARM64 provisioning for AWS in roachprod. We add ARM64 provisioning for GCE, i.e., T2A, and refactor the 'arch' argument to denote one of: AMD64, ARM64, FIPS, where the latter isn't formally a CPU architecture; however, it simplifies provisioning and binary staging.

We also modify roachprod.List to display the CPU architecture, when it's other than AMD64, alongside the machine type; this should make it easier to see which clusters are running ARM64 and FIPS configurations as we ramp up their testing.

Epic: none
Release note: None
Resolves: #94957
Informs: #94986
[1] #99224
[2] #103243

Co-authored-by: Stan Rosenberg <[email protected]>
Previously, all roachtests used (cloud) machine types with the AMD64 (cpu) architecture. Recently [1], new CI infrastructure was added to run a clone of all the nightly roachtests, configured with FIPS; i.e., same AMD64 machine types, different AMI and crdb binary, patched with FIPS-certified openssl native code. As of this PR, we add the capability to execute any roachtest in a cluster, configured with either ARM64, FIPS, or AMD64 (default). This is controlled via the two CLI args: `metamorphic-arm64-probability` and `metamorphic-fips-probability`. The former denotes the probability (over the uniform distribution) of a new cluster provisioned using ARM64 VMs. The latter denotes the probability of a new AMD64 cluster provisioned with the FIPS-compliant (kernel) configuration. In case a test is compatible only with AMD64, it's effectively excluded from the set; i.e., both probabilities apply to compatible tests only. Note, the two probabilties don't have to add up to 1. E.g., `metamorphic-arm64-probability==0.4`, `metamorphic-fips-probability==0.2` denotes that ARM64 clusters are chosen ~40% of the time, whereas of the remaining ~60% AMD clusters, FIPS is chosen ~20% of the time; i.e., ~12% of all clusters will use FIPS. Note, the values '0' and '1' are absolute. Setting both to '0' is tantamount to the behavior before this PR. Setting either to '1' enforces _all_ clusters are provisioned with either ARM64 or FIPS. A test can specify its required architecture, in which case, it takes precedence over metamorphic settings. This PR builds on [1], which enabled ARM64 provisioning for AWS in roachprod. We add ARM64 provisioning for GCE, i.e., T2A, as well as refactor 'arch' argument to denote one of: AMD64, ARM64, FIPS, where the latter isn't formally a CPU architecture; however, it simplifies provisioning and binary staging. 
We also modify roachprod.List to display the CPU architecture alongside the machine type whenever it's not AMD64; this should make it easier to see which clusters are running ARM64 and FIPS configurations as we ramp up their testing. The PR also adds validation of cockroach binaries and libraries to ensure we can execute tests under ARM64 and FIPS. Furthermore, we add an 'Enabled Assertions' header, generated at build time, to the cockroach binary; the header is used to determine whether the binary has runtime assertions enabled.
Epic: none
Release note: None
Resolves: cockroachdb#94957
Resolves: cockroachdb#89268
Informs: cockroachdb#94986
[1] cockroachdb#99224
[2] cockroachdb#103243
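The roachprod.List change amounts to annotating the machine type only for non-default architectures. A minimal sketch follows; `formatMachineType` and the `type [arch]` output format are assumptions for illustration, not roachprod's actual code or output.

```go
package main

import "fmt"

// formatMachineType is a hypothetical helper: it appends the CPU architecture
// to the machine type only when the architecture is something other than amd64.
func formatMachineType(machineType, arch string) string {
	if arch == "" || arch == "amd64" {
		// The default architecture is left implicit to keep the listing compact.
		return machineType
	}
	return fmt.Sprintf("%s [%s]", machineType, arch)
}

func main() {
	fmt.Println(formatMachineType("n2-standard-4", "amd64"))  // n2-standard-4
	fmt.Println(formatMachineType("t2a-standard-4", "arm64")) // t2a-standard-4 [arm64]
	fmt.Println(formatMachineType("n2-standard-4", "fips"))   // n2-standard-4 [fips]
}
```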
Add `--arch` to override the binary's architecture, along with some refactoring. As of this change, `roachprod stage` is able to stage both amd64 and arm64 binaries on linux and darwin.
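One way to resolve which binary to stage, following the reviewer's suggestion to handle the local case first and let explicit parameters always take precedence, is sketched below. `resolveStagePlatform` and its parameters are hypothetical, and the rejection of anything other than linux/darwin on amd64/arm64 is an assumption based on the combinations listed above.

```go
package main

import (
	"fmt"
	"runtime"
)

// resolveStagePlatform is a hypothetical helper, not roachprod's actual code.
// It picks the OS/arch of the binary to stage: linux/amd64 by default, the
// host's own platform for a local cluster, and explicit flags (e.g. --arch)
// always take precedence over both.
func resolveStagePlatform(isLocal bool, stageOS, stageArch string) (goos, goarch string, err error) {
	goos, goarch = "linux", "amd64"
	if isLocal {
		goos, goarch = runtime.GOOS, runtime.GOARCH
	}
	if stageOS != "" {
		goos = stageOS
	}
	if stageArch != "" {
		goarch = stageArch
	}
	// Only linux and darwin, each on amd64 or arm64, are accepted here.
	if goos != "linux" && goos != "darwin" {
		return "", "", fmt.Errorf("unsupported OS %q", goos)
	}
	if goarch != "amd64" && goarch != "arm64" {
		return "", "", fmt.Errorf("unsupported arch %q", goarch)
	}
	return goos, goarch, nil
}

func main() {
	fmt.Println(resolveStagePlatform(false, "", ""))      // linux amd64 <nil>
	fmt.Println(resolveStagePlatform(false, "", "arm64")) // linux arm64 <nil>
	fmt.Println(resolveStagePlatform(false, "windows", ""))
}
```

Applying the flag overrides last keeps the precedence rule in one obvious place, rather than interleaving it with the local/remote branches.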
In conjunction with the previous change [1], roachprod now uses an arm64-based AMI for graviton2/graviton3 machines.
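Choosing an arm64 AMI for Graviton machines requires deriving the architecture from the AWS machine type. The sketch below assumes a naming heuristic: Graviton2/Graviton3 instance families carry a "g" right after the generation number (e.g. m6g, c6gn, c7g). `gravitonFamily`, `archForMachineType`, `amiFor`, and the placeholder AMI IDs are hypothetical, not roachprod's actual implementation.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// gravitonFamily is a heuristic (an assumption, not roachprod's logic):
// Graviton2/3 families have a "g" right after the generation number,
// e.g. m6g, c6gn, t4g (Graviton2) or c7g, m7gd (Graviton3).
// First-generation A1 instances are not covered by this pattern.
var gravitonFamily = regexp.MustCompile(`^[a-z]+[0-9]+g[a-z]*$`)

// archForMachineType maps an AWS machine type such as "c7g.2xlarge"
// to the CPU architecture whose AMI should be used.
func archForMachineType(machineType string) string {
	family, _, _ := strings.Cut(machineType, ".")
	if gravitonFamily.MatchString(family) {
		return "arm64"
	}
	return "amd64"
}

// amiFor looks up an image ID per architecture; the IDs are placeholders.
func amiFor(machineType string, amis map[string]string) string {
	return amis[archForMachineType(machineType)]
}

func main() {
	amis := map[string]string{
		"amd64": "ami-AMD64-PLACEHOLDER",
		"arm64": "ami-ARM64-PLACEHOLDER",
	}
	for _, mt := range []string{"m6i.xlarge", "m6g.xlarge", "c7g.2xlarge"} {
		fmt.Println(mt, archForMachineType(mt), amiFor(mt, amis))
	}
}
```

A real implementation might instead query the instance type's supported architectures from the cloud API rather than relying on naming conventions.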
Below is an example of how to create a VM with graviton3,
[1] #103236
Epic: none
Release note: None