-
Notifications
You must be signed in to change notification settings - Fork 8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rfc: add platform support RFC #1065
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,111 @@ | ||
# RFC 008: Platform Support | ||
|
||
## Objective | ||
|
||
Implement support for diverse platforms in Contrast, ensuring that: | ||
|
||
1. Users can mix-and-match features throughout all platforms supported by Contrast | ||
2. Code with a platform dependency can be deduplicated | ||
3. New platforms and platform features can be added with low complexity (that's `O(1)`, at best) | ||
|
||
## Problem statement | ||
|
||
Contrast currently distinguishes between platforms based on their distinct features. | ||
Taking the `AKS-CLH-SNP` platform as an example, this serialized platform string conveys | ||
the following information: | ||
|
||
- Kubernetes engine: `AKS` | ||
- (Implicit: Kata upstream / Microsoft fork) | ||
- Hypervisor: `CLH` | ||
- Hardware platform: `SNP` | ||
|
||
The user picks such a platform through its serialized string (for example `AKS-CLH-SNP`), and the | ||
code then implements actions based upon that, such as when writing platform-specific Kata | ||
configuration values in the [node-installer]. | ||
|
||
With new features being added to Contrast, such as GPU support, more information needs | ||
to be contained in such a platform description. | ||
|
||
This comes at the cost of growing complexity of conditional statements | ||
depending on the platform throughout the implementation code, as the | ||
current implementation doesn't distinguish between platform features | ||
(for example which hypervisor is used) but only between full platform combinations. | ||
|
||
[node-installer]: https://github.com/edgelesssys/contrast/blob/3f39682ea9a383b2557923e257bd065e461b8ee6/nodeinstaller/internal/constants/constants.go#L48 | ||
|
||
## Proposal | ||
|
||
### Structured Representation | ||
|
||
Throughout the implementation code, platforms shouldn't be represented as _individual combinations_, | ||
but rather through a generic, structured type that contains information about their _features_. | ||
|
||
So `AKS-CLH-SNP` could become something like this: | ||
|
||
```go | ||
type Platform struct { | ||
KubernetesDistribution KubernetesDistribution | ||
Hypervisor Hypervisor | ||
HardwarePlatform HardwarePlatform | ||
} | ||
|
||
// AKS-CLH-SNP | ||
foo := Platform{ | ||
KubernetesDistribution: AKS, | ||
Hypervisor: CloudHypervisor, | ||
HardwarePlatform: SNPMilan, | ||
} | ||
``` | ||
|
||
This would allow the implementation code (for example the aforementioned switch for [Kata configuration]) | ||
to be reduced to only depend on the information bits it requires. | ||
|
||
For example, this: | ||
|
||
```go | ||
case platforms.MetalQEMUTDX, platforms.K3sQEMUTDX, platforms.RKE2QEMUTDX: | ||
``` | ||
|
||
Could become this: | ||
|
||
```go | ||
case TDX | ||
``` | ||
|
||
This would also enable easy validation of user error, as the individual types can be checked for | ||
validity at parsing time, making invalid states unrepresentable [^1]. | ||
Comment on lines
+75
to
+76
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not that it changes much about the proposal at hand, but "make invalid states unrepresentable" is not achieved if I can do myFancyPlatform := Platform{
KubernetesDistribution: RKE2,
Hypervisor: CloudHypervisor,
HardwarePlatform: Unknown,
} There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Under the assumption of only validating constructors being used in the program, of course. I do acknowledge that my example is counter-intuitive for that, but it's a simplification, after all. |
||
|
||
However, to get to such a structured format, the information needs to be passed to Contrast in another | ||
structured format, such as a configuration file or another canonical serialized format, which should | ||
be discussed in the following section. | ||
|
||
[Kata configuration]: https://github.com/edgelesssys/contrast/blob/3f39682ea9a383b2557923e257bd065e461b8ee6/nodeinstaller/internal/constants/constants.go#L48 | ||
|
||
### Serialization | ||
|
||
As platforms are passed throughout multiple (programming) languages, such as the `justfile` and Go code for | ||
developers, but - in the most primitive way - from a user invoking the CLI in a shell to Go code, there must | ||
be a serializable, machine-readable representation of a platform. | ||
|
||
This could be a configuration file that's read by the CLI and other dependants. | ||
|
||
This would reduce implementation complexity, as Contrast could rely on existing YAML / TOML / JSON / etc. parsing | ||
libraries and no canonical string representation of a platform would need to be implemented. | ||
|
||
Additionally, adding platform features would not lead to unreasonably large string representations, which might | ||
be hard to read for humans at some point. | ||
|
||
## Considerations | ||
|
||
### Node-Installer Images | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Runtimes in the sense of Kata runtimes? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Kubernetes runtimes. |
||
|
||
As it would be complex and costly to publish node-installer images for all supported platform | ||
combinations, it might be better to ship a singular one-size-fits-all node installer with support | ||
for all platforms. This could also be a foundation for heterogeneous deployments that utilize multiple | ||
platforms (for example GPU- and Non-GPU machines). | ||
Comment on lines
+104
to
+105
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The node-installer could figure out a lot of configuration without user interaction - TEE flavour, containerd paths, maybe even hypervisor. On the other hand, outside the runtime the config is only used to decide which genpolicy to use and to populate reference values. I wonder whether we actually need to expose the entire runtime config to the user. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think large parts of it have to be exposed. It might be possible that you have an instance that has GPUs available but don't want to run GPU-enabled Contrast on it, for example. Also, reference value generation needs almost all values, if I'm not mistaken? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You picked the one dimension that I did not list as candidate for automation ;) It's still an interesting question whether / how we want to deal with mixed images. As a user, how do I tell Contrast to use a GPU image for pod A, but a plain image for pod B? I imagine a config file is not granular enough here. If there is only one image per runtime, how do I mix two runtimes with one coordinator? (I think that's a scenario we might want to support eventually, though) The reference values need the image measurements, the genpolicy choice and the hardware reference values, where the last two are only a question of AKS-CLH vs. rest. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
For hardware reference values, I think it's also a matter of TDX vs. SNP (Genoa) vs. SNP (Milan) and so on, right? About mixed images, how the customer decides to do that is TBD still, I think. I don't have the solution at hand for this. About "one image per runtime": I think we could ship all variants in the node-installer if we can make them reasonably small? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We don't fill hardware values for bare metal, so it's just image measurements I think. |
||
|
||
To ensure a good developer experience, bandwidth used for transferring the node-installer images should | ||
be kept to a minimum, which is contrary to the aforementioned idea, but could be counteracted with | ||
minimizing the individual components in the node-installer, such as the node images. | ||
|
||
[^1]: https://geeklaunch.io/blog/make-invalid-states-unrepresentable/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -137,3 +137,6 @@ RTMR | |
RTMRs | ||
precalculator | ||
initrd | ||
deduplicated | ||
unrepresentable | ||
serializable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional fields that come to mind:
Snapshotter
,ImageVariant
.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you elaborate on
ImageVariant
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with GPU support or without, debug or prod, ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think those should be additional fields, ultimately. That's the key point I want to convey in this proposal. It seems I need to make that more clear?