Reworked provenance verification. #246
Conversation
proto/verification_options.proto
Outdated
```proto
// For every digest, provides a hint about the format. Right now only
// "sha2-256" is supported. Empty string is not permitted.
repeated string formats = 1;

// The actual digests - must have the same length as the formats field.
// A hex-encoding is assumed for all digests. Empty string is not permitted.
repeated string digests = 2;
```
Not sure what the semantics of these two fields are. Say there are 3 items in `formats` and 4 items in `digests`, what would that mean? I think there should be another proto somewhere else to represent the digest, and use that as a repeated type here. Or just use the human-readable format as a single string anyways, since AFAICT you expect users to encode it in a textproto, so having a binary digest is probably not ideal.
len(formats) != len(digests) is invalid/undefined behavior. It is not a map but a multimap (the same key may occur more than once).
I agree. We should use this opportunity to unify the variants of the Digest proto into just one. Defining basic protos in the open source is a good idea because we depend on them both here and internally.
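To make the intended reading concrete, here is a minimal Go sketch (illustrative only, not code from this repository) of treating the two parallel fields as a multimap and rejecting mismatched lengths; all names are made up for the example.

```go
package digestpb

import "fmt"

// ToMultimap pairs formats[i] with digests[i]. A length mismatch is rejected
// as invalid input rather than being left as undefined behavior, and the
// result is a multimap because the same format hint may occur more than once.
func ToMultimap(formats, digests []string) (map[string][]string, error) {
	if len(formats) != len(digests) {
		return nil, fmt.Errorf("got %d formats but %d digests", len(formats), len(digests))
	}
	m := make(map[string][]string, len(formats))
	for i, format := range formats {
		if format == "" || digests[i] == "" {
			return nil, fmt.Errorf("empty format or digest at index %d", i)
		}
		m[format] = append(m[format], digests[i])
	}
	return m, nil
}
```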
Done.
proto/verification_options.proto
Outdated
```proto
// Verifies that the repository coincides with the specified one, for all
// available provenances.
message VerifyAllWithRepository {
  string repositoryUri = 1;
```
```diff
- string repositoryUri = 1;
+ string repository_uri = 1;
```
Done.
cmd/endorser/README.md
Outdated
If the verification options should be kept in a file (for length reasons), then use

```bash
...
--verification_options="$(cat /tmp/ver_opts.textproto)"
```
You could also use the shell itself to do this, no need for `cat`, something like:

```diff
- --verification_options="$(cat /tmp/ver_opts.textproto)"
+ --verification_options="$(</tmp/ver_opts.textproto)"
```
Done.
See [this comment](https://github.com/project-oak/oak/pull/4191#issuecomment-1643932356) as the source of the binary and provenance info.

If the verification options should be kept in a file (for length reasons), then use
TBH I think the common case is to keep the verification options in a file, so we can just point at the file name, or require setting an explicit --skip-verification flag for extra safety.
Who is going to try and embed a textproto in a flag in practice? Even if it is allowed, it seems very error prone :)
What if the binary digest is part of the verification? Then the textproto changes on every call. Also, it is likely called by another script, so we cut out the disk access and the temporary files. Yes, I think it's about calling this directly versus from another script: in the former case you'd prefer passing a file, in the latter case passing it directly.
It is a long command line if there are many checks, but I really liked verifying the counts directly because it is expressive (OTOH you see the number of URIs passed in the other argument):
`--verification_options="provenance_count_at_least { count: 2 }"`
--skip_verification is a good idea.
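For illustration, here is a small Go sketch of the two invocation styles discussed here (inline textproto vs. file) plus an explicit skip flag; the flag names and behavior are assumptions made for the example, not the endorser's actual interface.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	inline := flag.String("verification_options", "", "verification options as an inline textproto")
	fromFile := flag.String("verification_options_file", "", "path to a textproto file with verification options")
	skip := flag.Bool("skip_verification", false, "explicitly opt out of provenance verification")
	flag.Parse()

	var optsText string
	switch {
	case *skip:
		// Caller opted out explicitly, as suggested above.
	case *inline != "":
		optsText = *inline // e.g. passed by a calling script, no temporary file needed
	case *fromFile != "":
		b, err := os.ReadFile(*fromFile)
		if err != nil {
			fmt.Fprintln(os.Stderr, err)
			os.Exit(1)
		}
		optsText = string(b)
	default:
		fmt.Fprintln(os.Stderr, "need --verification_options, --verification_options_file, or --skip_verification")
		os.Exit(1)
	}
	_ = optsText // would then be parsed into the VerificationOptions message
}
```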
Done.
```diff
@@ -1,4 +1,4 @@
-// Copyright 2022 The Project Oak Authors
+// Copyright 2022-2023 The Project Oak Authors
```
I don't think these need to be updated at all
I read up: it is either "single year" or "year range", and somehow I was biased towards the year range. When using a single year, it is the year of publication. When using a year range, the first year is the publication and the second is when it was last revised.
I could find no recommendation as to when to use either one. So should we use the single-year variant throughout?
I have only ever seen a single year. https://goto.google.com/copyright#the-year says that the multi-year range is neither necessary nor encouraged, so I'd just leave everything with the initial year.
I'd like to take a step back for a moment.
Conceptually, is it correct to say that the tool does the following steps?
- download zero or more provenance files
- run some predicates / tests on such files (checking some properties about them, possibly related to some binary somewhere)
- if successful, generate an unsigned provenance file
- this file may then be passed to some other tool / process for the actual signing
The first and second bullet points are correct. For the third, I am confused by the term "unsigned provenance file". Shouldn't it be "unsigned endorsement"? The output file is endorsement.json, containing a header ("_type": "https://in-toto.io/Statement/v0.1", "predicateType": "https://github.com/project-oak/transparent-release/claim/v1"), identification of the underlying binary, endorsement timestamps, and a link to the provenances. The endorsement follows this spec: https://github.com/in-toto/attestation/tree/v0.1.0/spec#statement
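As a hedged illustration of that shape, the statement layout could be modeled roughly like this in Go; the top-level fields follow the linked in-toto Statement v0.1 spec, while the predicate field names are assumptions for the example rather than the authoritative claim schema.

```go
package endorsement

// DigestSet maps an algorithm name to a hex-encoded value, e.g. {"sha256": "<hex>"}.
type DigestSet map[string]string

// Subject identifies the underlying binary.
type Subject struct {
	Name   string    `json:"name"`
	Digest DigestSet `json:"digest"`
}

// Statement mirrors the in-toto Statement v0.1 layout used by endorsement.json.
type Statement struct {
	Type          string         `json:"_type"`         // "https://in-toto.io/Statement/v0.1"
	Subject       []Subject      `json:"subject"`
	PredicateType string         `json:"predicateType"` // the claim/v1 predicate type
	Predicate     ClaimPredicate `json:"predicate"`
}

// ClaimPredicate holds the endorsement timestamps and provenance links;
// the field names here are assumptions for illustration only.
type ClaimPredicate struct {
	IssuedOn    string   `json:"issuedOn,omitempty"`
	ValidUntil  string   `json:"validUntil,omitempty"`
	Provenances []string `json:"provenances,omitempty"`
}
```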
Ah yes, that's my bad, I certainly meant unsigned endorsement (not provenance). Anyways, the point I'm getting at is that there are really 3 steps, and the only reason they currently are in the same binary is that the first step determines whether the rest of the steps should run (which is a good reason, so nothing gets signed unless it's been verified etc.). But this has its own problems, e.g. flags to skip one or more of the steps, or change their semantics. Or if we wanted to run the checks as a stand-alone test in google3. In fact, will we ever use this externally? AFAICT we will only need it internally, and probably as part of a script to run on someone's workstation or a Rapid workflow or something like that. Is that correct?
I assumed that it is only on GitHub, written in Go, and all three steps in one binary, in order to enhance/harden the story around security. If it were not for the security story, we could give up the entire transparent-release repository. More important are the data formats. I wonder if JSON is a good choice. It is flexible but rather fuzzy. How much should we lean on in-toto? Probably in-toto exists somewhere internally, but I haven't checked. The Go-Python interface used internally is just to call the Go binary from Python, which is OK but not great. Summarizing, it would be much more convenient to have it all internal (and maybe not in Go).
A couple of notes: We still need the SLSA provenance generation in open source (not sure how much of that is in SLSA now vs here though). We also need the endorsement format to be a standard (in-toto seems sensible for that), since our other open source libraries will also need to parse it (e.g. as part of attestation verification). Plus in the future we may want to attach claims to endorsements (e.g. saying what we endorse a binary for, instead of just endorsing it globally).
But for the parts between provenance verification and endorsement generation, we have more flexibility. And AFAICT we don't expect anyone other than us to actually run those steps. Although it would be nice to one day have a website in which we explain why we endorsed each binary, with either a formal or informal description of the steps we took before endorsing it. But I don't think that's necessary in the short term, so we can keep developing it internally.
We should keep the idea of having a "policy" expressed via protobuf and getting a yes / no answer about a binary, so that eventually we can publish the policy as well as the policy verification, but we don't need to literally use the same binary internally and externally. We only need the policy to be inspectable externally, and possibly runnable in some way. So yes, let's make some pragmatic decisions about where the code should live based on how we plan to use it first, since that's the main priority.
Thoughts: If our basic protos (verification_options, but digest even more) move internal, no GitHub code can depend on them. In this regard we lock the door behind us. But it should be fine. About the programming language: it is nice to have Go here, but internally we have the readability burden, plus the language interoperability. Therefore, Python seems the better choice for internal code.
```proto
option go_package = "proto/oak/release";

// Contains various digest formats for the same underlying file.
message Digest {
```
The semantics is still not super clear IMO. Does this represent a single digest? Multiple digests that must all match? What if there are conflicting digests in binary and hex format, which one "wins"? IMO this is too flexible / ambiguous and will lead to bugs or vulnerabilities. I would just define a single message for a single digest using a single format, or at the very least have super clear comments describing all the steps to validate this message (but then that will have to be implemented in all the various languages).
I see that the comment actually says that they refer to the same file, but does it mean that one of them must match, or multiple ones? (this is not a new issue introduced by this PR, just something that we have not fully addressed yet)
A single Digest proto refers to a single file, so all digests contained in a single instance of Digest must be compatible in the sense that they come from the same file. Otherwise, it is undefined behavior. The type of digest and encoding is left open; one can specify more than one as needed.
The actual matching is addressed by having a repeated field further up, so if at least one digest in the repeated field matches, then this is a pass.
BTW I am planning to use this as the baseline proto for any Digest. There are a few equivalent versions around. This allows us to have reusable utilities for Digest, e.g. matching one Digest against the other.
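A minimal sketch of what such a reusable matching utility could look like, assuming the described semantics (the digests in one message describe the same file; two Digests agree if they share at least one format and do not conflict on any shared format). The struct below stands in for the generated proto type and is illustrative only.

```go
package digest

// Digest maps a format hint such as "sha2-256" to a hex-encoded value,
// standing in for the generated proto message in this sketch.
type Digest struct {
	Hashes map[string]string
}

// Match reports whether a and b could describe the same file: they must share
// at least one format, and every format they share must carry the same value.
func Match(a, b Digest) bool {
	shared := false
	for format, va := range a.Hashes {
		vb, ok := b.Hashes[format]
		if !ok {
			continue
		}
		if va != vb {
			return false
		}
		shared = true
	}
	return shared
}
```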
Made it more mechanical to better manage the caller's expectations: verification options are a sequence of straightforward checks on an array of provenances. The outcome is either pass or fail (with an error), as before.
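For illustration, the check-sequence shape described here could look like the following Go sketch; the types, check name, and error handling are assumptions for the example, not the PR's actual implementation.

```go
package verify

import "fmt"

// Provenance is a stand-in for the parsed provenance type used by the checks.
type Provenance struct {
	RepositoryURI string
	BinaryDigest  string
}

// Check inspects the whole array of provenances and fails with an error.
type Check func(provenances []Provenance) error

// ProvenanceCountAtLeast mirrors the provenance_count_at_least option quoted above.
func ProvenanceCountAtLeast(n int) Check {
	return func(ps []Provenance) error {
		if len(ps) < n {
			return fmt.Errorf("got %d provenances, want at least %d", len(ps), n)
		}
		return nil
	}
}

// RunChecks applies the checks in order; the outcome is pass (nil) or the first failure.
func RunChecks(ps []Provenance, checks []Check) error {
	for _, c := range checks {
		if err := c(ps); err != nil {
			return err
		}
	}
	return nil
}
```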