Minimise TCB during verification by breaking out the verify subcommand #598

Open
blenessy opened this issue Jun 18, 2024 · 3 comments

@blenessy

The contrast v0.7.0 binary is very big: ~60 MB, even with DWARF and symbols removed.

I've tested breaking out the verify subcommand from the contrast CLI into its own binary (called verify). The size of verify is 9.7 MB.
I analysed the contents of this binary with GSA and attached the HTML report so you can see for yourselves. Spoiler: the biggest code chunk is related to the gRPC protocol.

Do you think breaking out verify is a good idea? Would you accept a PR with a separate verify binary?

(FWIW, I would put in more effort to further minimise the TCB of verify after the separation. I'm hoping to bring the size down to 4-5 MB.)

verify.html.gz

@katexochen katexochen added the feature request Proposing a new feature label Jun 18, 2024
@burgerdev

Hi @blenessy,

Thanks for the suggestion, I'm pretty happy that you're taking Contrast for a spin and considering contributions!

Regarding Trusted Computing Base

I fear that a reduction in binary size does not necessarily imply a reduction in TCB. Even if you compile a small verify binary and use it to get the Coordinator attestation doc and manifest, you still need to verify them. Things that come to mind:

  • The guest image (and its build process!) is in the TCB, even if this is only reflected by a TrustedMeasurement hash.
  • More or less the same goes for the coordinator/initializer OCI images.
  • The expected workload policies are either from your own run of generate or from somebody else's, but either way generate is now part of your TCB.

Now you might ask yourself why the policies obtained by `verify` are not sufficient for verification. This is mostly due to how workload policies in Kata operate (I wrote a paragraph about that [here](https://github.com/edgelesssys/contrast/blob/v0.7.0/dev-docs/coco/policy.md#storage-rules)): given a bunch of yaml it is easy to verify that `generate` produces the same result as retrieved by `verify`, but working backwards from policies to k8s resource definitions is infeasible, if only because of the precalculated dm-verity hashes for the OCI layers. This may change in the future, depending on the image handling strategy decided by upstream CoCo, but we're stuck with it for now.
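
The forward check described above (run `generate` on your own YAML, then compare the result with what `verify` retrieved) boils down to a digest comparison. The following is a minimal sketch in Go, not Contrast's actual implementation: the helper names `policyDigest` and `policiesMatch` and the sample policy bytes are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

// policyDigest returns the hex-encoded SHA-256 digest of a policy document.
// Hypothetical helper, for illustration only.
func policyDigest(policy []byte) string {
	sum := sha256.Sum256(policy)
	return hex.EncodeToString(sum[:])
}

// policiesMatch reports whether a locally generated policy and a policy
// retrieved from the deployment hash to the same SHA-256 digest.
func policiesMatch(generated, retrieved []byte) bool {
	a := sha256.Sum256(generated)
	b := sha256.Sum256(retrieved)
	return subtle.ConstantTimeCompare(a[:], b[:]) == 1
}

func main() {
	// Placeholder policy contents; real policies are produced by generate.
	generated := []byte("package agent_policy\n# rules derived from local YAML\n")
	retrieved := []byte("package agent_policy\n# rules derived from local YAML\n")
	tampered := []byte("package agent_policy\n# rules derived from other YAML\n")

	fmt.Println("digest:", policyDigest(generated))
	fmt.Println("match:", policiesMatch(generated, retrieved))
	fmt.Println("tampered match:", policiesMatch(generated, tampered))
}
```

Note that this only covers the easy direction. It says nothing about recovering the Kubernetes resources from a policy, which is the part described as infeasible.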

At the end of the day, the TCB is the transitive closure of all Contrast components and their dependencies. If the goal is to reduce the TCB, we should imho start by reducing dependencies overall.
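
To make the "transitive closure" point concrete, here is a small Go sketch that walks a dependency graph and collects everything reachable from a starting component. The graph is a hypothetical, heavily simplified stand-in (the node names are illustrative, not an actual inventory of Contrast's dependencies).

```go
package main

import (
	"fmt"
	"sort"
)

// transitiveClosure returns every node reachable from roots in deps,
// including the roots themselves, in sorted order.
func transitiveClosure(deps map[string][]string, roots ...string) []string {
	seen := make(map[string]bool)
	queue := append([]string(nil), roots...)
	for len(queue) > 0 {
		node := queue[0]
		queue = queue[1:]
		if seen[node] {
			continue // already visited; also guards against cycles
		}
		seen[node] = true
		queue = append(queue, deps[node]...)
	}
	out := make([]string, 0, len(seen))
	for n := range seen {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical graph: a small verify binary still pulls in everything
	// needed to produce the values it checks against.
	deps := map[string][]string{
		"verify":      {"manifest", "guest-image"},
		"manifest":    {"generate"},
		"generate":    {"genpolicy", "k8s-client"},
		"guest-image": {"image-build"},
	}
	fmt.Println(transitiveClosure(deps, "verify"))
}
```

In this toy graph, shrinking the `verify` node does not remove any other node from the result, which is the argument made above.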

Regarding a Standalone Verify

That being said, I can imagine situations where a smaller verify binary would be useful, even if the total TCB is unaffected. Built with the correct reference values and somehow equipped with a manifest through a side-channel, this could make verification feasible even on very constrained systems.

On the other hand, we still want to support the verify subcommand in the main binary, and we're not really eager to maintain this functionality in two binaries. However, if this is useful to you and you see a low effort way to add and maintain that second binary, I'd be open to adding it to a contrib folder, for example.

Regarding Binary Size Overall

There are three main contributors as far as I'm aware of (see table below):

  • The embedded genpolicy tool
  • Kubernetes client libs
  • gRPC

These are all used by contrast generate, so we can't avoid packaging them with the CLI.

```plain
$ gsa ./contrast
+-----------------------------------------------------------------------------+
| contrast                                                                    |
+---------+----------------------------------------------+--------+-----------+
| PERCENT | NAME                                         | SIZE   | TYPE      |
+---------+----------------------------------------------+--------+-----------+
| 38.59%  | .noptrdata                                   | 23 MB  | section   |
| 16.30%  | k8s.io/api                                   | 9.8 MB | vendor    |
| 12.47%  | .gopclntab                                   | 7.5 MB | section   |
| 10.55%  | .rodata                                      | 6.4 MB | section   |
| 2.80%   | k8s.io/client-go                             | 1.7 MB | vendor    |
| 1.82%   | github.com/google/gnostic-models             | 1.1 MB | vendor    |
| 1.62%   | google.golang.org/protobuf                   | 980 kB | vendor    |
| 1.48%   | k8s.io/apimachinery                          | 893 kB | vendor    |
| 1.46%   | crypto                                       | 879 kB | std       |
| 1.27%   | net                                          | 764 kB | std       |
| 0.98%   | google.golang.org/grpc                       | 589 kB | vendor    |
[...]
```

Cheers, Markus

@blenessy

Thanks for the quick and very insightful response @burgerdev !

As you suspected, I was under the assumption that contrast verify plus the evidence it downloads to the ./verify directory would be relatively easy to process. But it sounds like it is not as straightforward as I anticipated :).

I will definitely start by exploring your concerns more in depth before moving forward with this.

@blenessy

Hi @burgerdev! I get it (especially after reading through Life of a Confidential Container): verification is currently (0.7.1) not trivial, and the contrast CLI does a lot of the heavy lifting for us.

Nevertheless, I still predict :) that you will eventually break up the contrast CLI into multiple CLIs, considering that it seems hard to keep contrast technology-agnostic. The currently bundled genpolicy is already a Microsoft fork. What happens when other cloud providers want you to bundle their forks?

When/if that happens, it might be a good idea to take inspiration from how git-lfs extends the git command: from a DevX perspective it shows up as the git lfs subcommand. Apart from separating the code that is not shared (MS vs other cloud providers, AMD SEV-SNP vs TDX), you would also get the following benefits:

  • Less to audit - End users ("data owners") only audit the subcommands that they use/care about and that have actually changed between releases. Assuming the subcommands are built reproducibly, you only need to look closer if a subcommand has a different SHA256.
  • Possibility to optimise (performance, security, formal verification, portability) each sub-command differently by using different programming languages (rust, shell, C/C++).
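
The git-lfs-style split and the "only audit what changed" idea can be sketched together in Go. This is an illustration under assumptions, not a proposal for Contrast's actual layout: the `contrast-` binary prefix, the helper names `pluginBinary` and `changedSubcommands`, and the idea of shipping a per-subcommand digest list are all hypothetical.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"sort"
)

// pluginBinary maps a subcommand to an external binary name, mirroring how
// "git lfs" resolves to "git-lfs". The "contrast-" prefix is an assumption.
func pluginBinary(subcommand string) string {
	return "contrast-" + subcommand
}

// changedSubcommands returns, in sorted order, the subcommands in next whose
// SHA-256 digest differs from prev or that did not exist in prev.
// Both maps go from subcommand name to hex digest.
func changedSubcommands(prev, next map[string]string) []string {
	var changed []string
	for name, digest := range next {
		if prev[name] != digest {
			changed = append(changed, name)
		}
	}
	sort.Strings(changed)
	return changed
}

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: contrast <subcommand> [args...]")
		return
	}
	// Look up the subcommand binary on PATH and hand over the remaining args.
	path, err := exec.LookPath(pluginBinary(os.Args[1]))
	if err != nil {
		fmt.Fprintf(os.Stderr, "unknown subcommand %q: %v\n", os.Args[1], err)
		os.Exit(1)
	}
	cmd := exec.Command(path, os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		os.Exit(1)
	}
}
```

With reproducible builds, a data owner would compare the digest lists of two releases and only re-audit the names returned by `changedSubcommands`.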

In the shorter term I do agree that minimising the overall size of the TCB of the AKS flow is important/urgent.
In particular, the following are a huge part of the TCB:

  • kata-image: The 97 "CBL Mariner" rpms - can we leave some of the RPMs out of the Pod VM?
  • contrast: minimise dependencies and maybe use protocols with less footprint?
    • How attached are you to gRPC ? Considering that the current gRPC APIs (user-api.proto and mesh-api.proto) are tiny and rarely invoked I do not see why you need the full power (and overhead) of gRPC. Have you guys considered a traditional REST API with JSON ?
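
For a sense of what the JSON alternative could look like using only the Go standard library, here is a rough sketch of a single read-only endpoint. Everything in it is hypothetical: the `/manifests` path, the `manifestsResponse` shape, and the helper names are illustrative and not derived from user-api.proto, and the attested transport a real Coordinator API would need is left out entirely.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// manifestsResponse is a hypothetical JSON payload; the field name is
// illustrative and not taken from the existing proto definitions.
type manifestsResponse struct {
	Manifests []string `json:"manifests"`
}

// newHandler returns a handler serving one read-only JSON endpoint.
func newHandler(manifests []string) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/manifests", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodGet {
			http.Error(w, "method not allowed", http.StatusMethodNotAllowed)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		// Encoding errors are ignored here for brevity.
		_ = json.NewEncoder(w).Encode(manifestsResponse{Manifests: manifests})
	})
	return mux
}

// roundTrip calls the handler in-process and decodes the JSON reply.
func roundTrip(h http.Handler, method string) (int, manifestsResponse, error) {
	req := httptest.NewRequest(method, "/manifests", nil)
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)

	var out manifestsResponse
	if rec.Code != http.StatusOK {
		return rec.Code, out, nil
	}
	err := json.NewDecoder(rec.Body).Decode(&out)
	return rec.Code, out, err
}

func main() {
	code, out, err := roundTrip(newHandler([]string{"manifest-0"}), http.MethodGet)
	if err != nil {
		panic(err)
	}
	fmt.Println(code, out.Manifests)
}
```

Whether this actually reduces the footprint would need measuring: net/http and encoding/json come with their own code size, and other dependencies may still pull in gRPC and protobuf.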
