Minimise TCB during verification by breaking out the verify subcommand #598

blenessy · 2024-06-18T10:10:22Z

contrast v0.7.0 is very big: ~60 MB even with DWARF and symbols removed.

I've tested breaking out the verify subcommand from the contrast CLI into its own binary (called verify). The size of verify is 9.7 MB.
I analysed the contents of this binary with GSA. I attached the HTML report so you can see for yourselves. Spoiler: the biggest code chunk is related to the gRPC protocol.

Do you guys think it is a good idea to break out verify? Would you accept a PR with a separate verify binary?

(FWIW, I would put in more effort to further minimise the TCB of verify after the separation. I'm hoping to bring the size down to 4-5 MB.)

verify.html.gz

burgerdev · 2024-06-18T15:05:15Z

Hi @blenessy,

Thanks for the suggestion, I'm pretty happy that you're taking Contrast for a spin and considering contributions!

Regarding Trusted Computing Base

I fear that a reduction in binary size does not necessarily imply a reduction in TCB. Even if you compile a small verify binary and use it to get the Coordinator attestation doc and manifest, you still need to verify them. Things that come to mind:

- The guest image (and its build process!) is in the TCB, even if this is only reflected by a TrustedMeasurement hash.
- More or less the same goes for the coordinator/initializer OCI images.
- The expected workload policies are either from your own run of generate or from somebody else's, but either way generate is now part of your TCB.

Now you might ask yourself why the policies obtained by `verify` are not sufficient for verification. This is mostly due to how workload policies in Kata operate (I wrote a paragraph about that [here](https://github.com/edgelesssys/contrast/blob/v0.7.0/dev-docs/coco/policy.md#storage-rules)): given a bunch of yaml it is easy to verify that `generate` produces the same result as retrieved by `verify`, but working backwards from policies to k8s resource definitions is infeasible, if only because of the precalculated dm-verity hashes for the OCI layers. This may change in the future, depending on the image handling strategy decided by upstream CoCo, but we're stuck with it for now.
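To illustrate the "easy" forward direction, here is a minimal sketch that hashes locally generated policies and reports any retrieved policy without a local counterpart. It assumes policies are compared by the SHA-256 digest of their raw bytes; `policyHash` and `unexpectedPolicies` are hypothetical helper names, not functions from the Contrast code base.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// policyHash returns the hex-encoded SHA-256 digest of a policy document.
// Assumption: policies are identified by the digest of their raw bytes.
func policyHash(policy []byte) string {
	sum := sha256.Sum256(policy)
	return hex.EncodeToString(sum[:])
}

// unexpectedPolicies returns the hashes of retrieved policies that have no
// locally generated counterpart.
func unexpectedPolicies(generated, retrieved [][]byte) []string {
	known := make(map[string]bool, len(generated))
	for _, p := range generated {
		known[policyHash(p)] = true
	}
	var unknown []string
	for _, p := range retrieved {
		if h := policyHash(p); !known[h] {
			unknown = append(unknown, h)
		}
	}
	return unknown
}

func main() {
	// Placeholder byte strings stand in for real policy documents.
	generated := [][]byte{[]byte("policy-a"), []byte("policy-b")}
	retrieved := [][]byte{[]byte("policy-a"), []byte("policy-x")}
	fmt.Println(unexpectedPolicies(generated, retrieved))
}
```

The reverse direction has no such shortcut: a digest says nothing about the resource definitions that produced the policy.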

At the end of the day, the TCB is the transitive closure of all Contrast components and their dependencies. If the goal is to reduce the TCB, we should imho start by reducing dependencies overall.
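As a rough illustration of "transitive closure", the sketch below walks a dependency graph from a starting component and collects everything reachable from it. The edges in `main` are illustrative only and loosely follow the points above; they are not an actual inventory of Contrast's dependencies.

```go
package main

import (
	"fmt"
	"sort"
)

// tcb returns every component reachable from the roots in the dependency
// graph deps, including the roots themselves, in sorted order.
func tcb(deps map[string][]string, roots ...string) []string {
	seen := map[string]bool{}
	stack := append([]string{}, roots...)
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if seen[n] {
			continue // already visited, also guards against cycles
		}
		seen[n] = true
		stack = append(stack, deps[n]...)
	}
	out := make([]string, 0, len(seen))
	for n := range seen {
		out = append(out, n)
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical edges for illustration only.
	deps := map[string][]string{
		"verify":      {"grpc", "manifest"},
		"manifest":    {"generate", "guest-image", "coordinator-image"},
		"generate":    {"genpolicy", "k8s-client"},
		"guest-image": {"image-build-process"},
	}
	fmt.Println(tcb(deps, "verify"))
}
```

Even when starting from a small verify binary, the closure pulls in generate and the images as soon as the manifest has to be trusted.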

Regarding a Standalone Verify

That being said, I can imagine situations where a smaller verify binary would be useful, even if the total TCB is unaffected. Built with the correct reference values and somehow equipped with a manifest through a side-channel, this could make verification feasible even on very constrained systems.

On the other hand, we still want to support the verify subcommand in the main binary, and we're not really eager to maintain this functionality in two binaries. However, if this is useful to you and you see a low effort way to add and maintain that second binary, I'd be open to adding it to a contrib folder, for example.

Regarding Binary Size Overall

There are three main contributors as far as I'm aware of (see table below):

- The embedded genpolicy tool
- Kubernetes client libs
- gRPC

These are all used by contrast generate, so we can't avoid packaging them with the CLI.

```plain
$ gsa ./contrast
+-----------------------------------------------------------------------------+
| contrast                                                                    |
+---------+----------------------------------------------+--------+-----------+
| PERCENT | NAME                                         | SIZE   | TYPE      |
+---------+----------------------------------------------+--------+-----------+
| 38.59%  | .noptrdata                                   | 23 MB  | section   |
| 16.30%  | k8s.io/api                                   | 9.8 MB | vendor    |
| 12.47%  | .gopclntab                                   | 7.5 MB | section   |
| 10.55%  | .rodata                                      | 6.4 MB | section   |
| 2.80%   | k8s.io/client-go                             | 1.7 MB | vendor    |
| 1.82%   | github.com/google/gnostic-models             | 1.1 MB | vendor    |
| 1.62%   | google.golang.org/protobuf                   | 980 kB | vendor    |
| 1.48%   | k8s.io/apimachinery                          | 893 kB | vendor    |
| 1.46%   | crypto                                       | 879 kB | std       |
| 1.27%   | net                                          | 764 kB | std       |
| 0.98%   | google.golang.org/grpc                       | 589 kB | vendor    |
[...]
```
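To compare contributors by category rather than row by row, a small sketch like the following could sum the SIZE column per TYPE. It assumes the table layout shown above and decimal (SI) units for `kB` and `MB`; `parseSize` and `sizeByType` are hypothetical helpers, not part of gsa.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseSize converts a size such as "9.8 MB" or "980 kB" into bytes.
// Assumption: units are decimal (1 kB = 1000 B).
func parseSize(s string) (float64, error) {
	fields := strings.Fields(s)
	if len(fields) != 2 {
		return 0, fmt.Errorf("unexpected size %q", s)
	}
	n, err := strconv.ParseFloat(fields[0], 64)
	if err != nil {
		return 0, err
	}
	switch fields[1] {
	case "B":
		return n, nil
	case "kB":
		return n * 1e3, nil
	case "MB":
		return n * 1e6, nil
	}
	return 0, fmt.Errorf("unknown unit %q", fields[1])
}

// sizeByType sums the SIZE column per TYPE for rows of the form
// "| PERCENT | NAME | SIZE | TYPE |"; all other lines are skipped.
func sizeByType(table string) map[string]float64 {
	totals := map[string]float64{}
	for _, line := range strings.Split(table, "\n") {
		cols := strings.Split(strings.Trim(strings.TrimSpace(line), "|"), "|")
		if len(cols) != 4 {
			continue // separator, title row, or "[...]"
		}
		size, err := parseSize(cols[2])
		if err != nil {
			continue // header row or malformed size
		}
		totals[strings.TrimSpace(cols[3])] += size
	}
	return totals
}

func main() {
	table := `| 38.59%  | .noptrdata       | 23 MB  | section |
| 16.30%  | k8s.io/api       | 9.8 MB | vendor  |
| 2.80%   | k8s.io/client-go | 1.7 MB | vendor  |`
	for typ, bytes := range sizeByType(table) {
		fmt.Printf("%s: %.1f MB\n", typ, bytes/1e6)
	}
}
```

Note that the sizes in the report are already rounded, so the totals are only approximate.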

Cheers, Markus

blenessy · 2024-06-18T19:48:44Z

Thanks for the quick and very insightful response @burgerdev!

As you suspected, I was under the assumption that the evidence downloaded by contrast verify to the ./verify directory is relatively easy to process. But it does sound like it is not as straightforward as I anticipated :).

I will definitely start by exploring your concerns more in depth before moving forward with this.

blenessy · 2024-06-30T10:47:23Z

Hi @burgerdev! I get it (especially after reading through Life of a Confidential Container): verification is currently (0.7.1) not trivial and the contrast CLI does a lot of the heavy lifting for us.

Nevertheless, I still predict :) that you will eventually break up the contrast CLI into multiple CLIs, considering that it seems hard to keep contrast technology-agnostic. The currently bundled genpolicy is already an MSFT fork. What happens when other cloud providers want you to bundle their forks?

When/if that happens, it might be a good idea to get inspired by how git-lfs extends the git command: from a DevX perspective it shows up as the git lfs sub-command. Apart from the separation of uncommon code (MS vs other cloud providers, AMD SEV-SNP vs TDX), you would also get the following benefits:

- Less to audit: end users ("data owners") only audit the sub-commands that they use/care about and that have actually changed between releases. Assuming that the sub-commands are built reproducibly, you only need to look closer if a sub-command has a different SHA256.
- Possibility to optimise (performance, security, formal verification, portability) each sub-command differently by using different programming languages (Rust, shell, C/C++).
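For reference, the git-style extension mechanism can be sketched in a few lines: the main CLI looks up an executable named `<tool>-<subcommand>` on `PATH` and hands over the remaining arguments. This is a minimal sketch of the convention, not how contrast works today; `pluginName` and `dispatch` are hypothetical names.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// pluginName maps a subcommand to the executable name looked up on PATH,
// following the git convention ("git lfs" runs "git-lfs").
func pluginName(tool, sub string) string {
	return tool + "-" + sub
}

// dispatch runs the external executable for sub, forwarding args and the
// standard streams. It returns an error if no such executable is on PATH.
func dispatch(tool, sub string, args []string) error {
	path, err := exec.LookPath(pluginName(tool, sub))
	if err != nil {
		return fmt.Errorf("unknown subcommand %q: %w", sub, err)
	}
	cmd := exec.Command(path, args...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	return cmd.Run()
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: contrast <subcommand> [args...]")
		os.Exit(2)
	}
	if err := dispatch("contrast", os.Args[1], os.Args[2:]); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```

With a `contrast-verify` executable on `PATH`, `contrast verify ...` would run it, and each such executable could be built, hashed, and audited on its own.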

In the shorter term I do agree that minimising the overall size of the TCB of the AKS flow is important/urgent.
In particular, the following are a huge part of the TCB:

- kata-image: the 97 "CBL Mariner" RPMs. Can we leave some of them out of the Pod VM?
- contrast: minimise dependencies and maybe use protocols with a smaller footprint?
  - How attached are you to gRPC? Considering that the current gRPC APIs (user-api.proto and mesh-api.proto) are tiny and rarely invoked, I do not see why you need the full power (and overhead) of gRPC. Have you guys considered a traditional REST API with JSON?

katexochen added the feature request label Jun 18, 2024

katexochen assigned burgerdev Jun 18, 2024
