Add support for the SCAI predicate #112

Merged
mikhailswift merged 4 commits into in-toto:main from scai-predicate
Jan 11, 2023

marcelamelara
Contributor

@marcelamelara marcelamelara commented Oct 13, 2022

The Software Supply Chain Attribute Integrity, or SCAI (pronounced "sky"), data format is designed to capture functional attribute and integrity information about software artifacts and their supply chain. SCAI data can be associated with executable binaries, statically or dynamically linked libraries, software packages, container images, software toolchains, and compute environments.

Why do we need this? Existing supply chain data formats do not capture any information about the security functionality or behavior of the resulting software artifact, nor do they provide sufficient evidence to support any claims of integrity of the supply chain processes they describe. The SCAI data format is designed to bridge this gap.

This PR addresses #76. An example use case for SCAI is described at: #2 (comment)

},
"conditions": { /* object */ }, // optional
"evidence": { /* object */ } // optional
}]
Member

Thinking through this, we would want to include timestamps if there is an attribute that is only valid for a period of time.

Contributor Author

So far, I've been thinking of attributes being quite static, but this is a really interesting point. Do you have a specific example in mind where an attribute might expire?

Member

@pxp928 pxp928 Nov 10, 2022

I was thinking through this for GUAC and wanted to use SCAI to certify that an artifact has been scanned (whether or not it contains vulnerabilities). The attestation would be valid for a period of time, after which it expires and the artifact needs to be re-certified (re-checked for vulnerabilities).

Certifier: The certifier creates attestations on the knowledge graph that help translate queries about negative statements into queries about positive statements through certification. For example, instead of asking “Does artifact A have vulnerabilities?”, one asks “Has artifact A been certified by a vulnerability scanner?” The certifier takes in documents such as OSV, VEX, and bad actors and, periodically (or via a trigger), queries the graph to answer these questions and records a certification in the knowledge graph.
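
To make the negative-to-positive translation concrete, here is a minimal Python sketch. It assumes a hypothetical in-memory store in place of the knowledge graph; `CertificationStore`, `certify`, `is_certified`, and `findings` are illustrative names and not GUAC APIs.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CertificationStore:
    """Hypothetical stand-in for the knowledge graph: records who certified what."""

    # Maps (artifact digest, scanner id) -> finding IDs reported by that scan.
    _certs: dict[tuple[str, str], list[str]] = field(default_factory=dict)

    def certify(self, digest: str, scanner: str, findings: list[str]) -> None:
        # Recording a scan is a positive statement, even when nothing was found.
        self._certs[(digest, scanner)] = list(findings)

    def is_certified(self, digest: str, scanner: str) -> bool:
        # Positive query: "Has artifact A been certified by scanner S?"
        return (digest, scanner) in self._certs

    def findings(self, digest: str, scanner: str) -> list[str] | None:
        # None means "never scanned"; [] means "scanned, nothing found".
        return self._certs.get((digest, scanner))


store = CertificationStore()
store.certify("artifact-a-digest", "osv.dev", ["GHSA-jfh8-c2jp-5v3q"])
print(store.is_certified("artifact-a-digest", "osv.dev"))
print(store.findings("artifact-b-digest", "osv.dev"))
```

The point of the sketch is the distinction in `findings`: an absent certification and an empty result list are different answers, which is what lets the negative question be rephrased as a positive one.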

Member

@pxp928 pxp928 Nov 10, 2022

An example that I just made up and am testing with:

{
  "_type": "https://in-toto.io/Statement/v0.1",
  "subject": [
    {
      "name": "git://github.com/kubernetes/kubernetes",
      "digest": {"sha1": "5835544ca568b757a8ecae5c153f317e5736700e"}
    }
  ],
  "predicateType": "http://in-toto.io/attestation/scai/attribute-assertion/v0.1",
  "predicate": {
    "producer": {
      "type": "guac",
      "id": "guacsec/guac"
    },
    "attributes": [{
      "attribute": "scanned",
      "evidence": {
        "scanner": {
          "type": "OSV",
          "id": "osv.dev"
        },
        "results": [
          {
            "OSVID": "GHSA-jfh8-c2jp-5v3q"
          }
        ],
        "date": "2022-10-03T12:00:00Z"
      }
    }]
  }
}
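
A rough sketch of how a consumer might apply a validity period to a statement shaped like the one above, in Python. The 30-day and 3-day windows, the helper names `parse_utc` and `is_fresh`, and the decision to read the timestamp from `evidence.date` are all assumptions for illustration; nothing here is defined by the predicate.

```python
from datetime import datetime, timedelta, timezone


def parse_utc(value: str) -> datetime:
    # Handles only the "Z"-suffixed UTC form used in the example above.
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_fresh(statement: dict, max_age: timedelta, now: datetime) -> bool:
    """True if every "scanned" attribute was recorded within max_age of now."""
    attrs = statement["predicate"]["attributes"]
    scans = [a for a in attrs if a.get("attribute") == "scanned"]
    if not scans:
        # No scan recorded at all: treat as not fresh.
        return False
    return all(now - parse_utc(a["evidence"]["date"]) <= max_age for a in scans)


# Trimmed-down copy of the example statement, keeping only the fields used here.
statement = {
    "predicate": {
        "attributes": [
            {"attribute": "scanned", "evidence": {"date": "2022-10-03T12:00:00Z"}}
        ]
    }
}
now = datetime(2022, 10, 10, 12, 0, tzinfo=timezone.utc)
print(is_fresh(statement, timedelta(days=30), now))  # True
print(is_fresh(statement, timedelta(days=3), now))   # False
```

Here the expiry policy lives entirely on the consumer side; an explicit expiry field in the predicate would be a different design.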

Contributor Author

I see, thanks for this example. This kind of certification process is definitely something we want to be able to support. So if I understood correctly, it's not necessarily that a certifier's attestation expires after X amount of time. Rather, the question is: how do we indicate that one attestation is no longer valid because it has been replaced or superseded by a more recent one?
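
One way to picture the "superseded by a more recent one" reading is a consumer that keeps only the newest record per subject and attribute. The Python sketch below assumes statements have already been flattened into hypothetical records with `digest`, `attribute`, `timestamp`, and `results` keys; that record shape and the function name `latest_by_subject` are made up for illustration.

```python
from datetime import datetime, timezone


def latest_by_subject(records: list) -> dict:
    """Keep only the newest record for each (digest, attribute) pair."""
    newest = {}
    for rec in records:
        key = (rec["digest"], rec["attribute"])
        # A later timestamp supersedes whatever was stored for this key.
        if key not in newest or rec["timestamp"] > newest[key]["timestamp"]:
            newest[key] = rec
    return newest


records = [
    {
        "digest": "aaa",
        "attribute": "scanned",
        "timestamp": datetime(2022, 10, 3, 12, 0, tzinfo=timezone.utc),
        "results": ["GHSA-jfh8-c2jp-5v3q"],
    },
    {
        "digest": "aaa",
        "attribute": "scanned",
        "timestamp": datetime(2022, 11, 3, 12, 0, tzinfo=timezone.utc),
        "results": [],
    },
]
current = latest_by_subject(records)
print(current[("aaa", "scanned")]["results"])  # []
```

This only works if every attestation carries a comparable timestamp, which is one argument for a date field at some level of the predicate.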

Contributor Author

Looks good, thanks! I'd definitely like to hear others' thoughts or use cases for having an optional date or timestamp field at the predicate level, too.

Contributor

Adding a timestamp makes sense to me. The one tweak I'd suggest is being more specific about the name. date could mean a number of things; should it instead be named something like scannedOn, collectedOn, or generatedOn? This is discussed a bit here. Thoughts?

Member

Agreed. scannedOn makes more sense in this case and follows the convention.

Contributor Author

For a predicate-level field, I think something like generatedOn makes sense to keep it more generally applicable to non-scanning use cases.
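
As a rough illustration of what a predicate-level generatedOn could look like when produced, here is a small Python sketch. The helper name `with_generated_on`, the placement of the field at the top level of the predicate, and the UTC "Z"-suffixed format are assumptions, not decisions from this PR.

```python
from datetime import datetime, timezone


def with_generated_on(predicate: dict, when: datetime) -> dict:
    """Return a shallow copy of `predicate` with a top-level generatedOn field."""
    if when.tzinfo is None:
        raise ValueError("generatedOn needs a timezone-aware datetime")
    stamped = dict(predicate)
    # Normalize to UTC and emit a "Z"-suffixed timestamp string.
    stamped["generatedOn"] = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return stamped


predicate = {"attributes": [{"attribute": "scanned"}]}
stamped = with_generated_on(predicate, datetime(2022, 11, 10, 9, 30, tzinfo=timezone.utc))
print(stamped["generatedOn"])  # 2022-11-10T09:30:00Z
```

A use-case-specific name such as scannedOn could still live inside an attribute's evidence alongside this more general field.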

Member

Should we add a generatedOn field for this release and change it as necessary in later releases? What do you think?

@marcelamelara
Contributor Author

The latest commit incorporates feedback from everyone so far:

  • Need for producer information (this is part of the SCAI spec, but was previously omitted for simplicity)
  • Clearer examples showcasing the use of the target and evidence fields.

"conditions": { /* object */ }, // optional
"evidence": { /* object */ } // optional
}],
"producerAttributes": [{
Member

Should we not also capture the type/URI or ID of the producer to determine the schema of the evidence object? Based on our last call, we decided to tie the schema of the evidence object to the producer so that it does not remain arbitrary.

Contributor Author

@marcelamelara marcelamelara Nov 16, 2022

I think we discussed this a little on today's call. The evidence isn't actually necessarily tied to the producer of the SCAI predicate, as it may come from a third-party auditor/scanner etc.

That said, I do think it would be useful to have some sort of identifier field for the producer. The main reason I haven't added it explicitly is that I think different use cases will identify the producer differently. For some, the producer's public key may be enough; others may want a URI, as you said. I wasn't sure how to encode these different use cases in a general manner.

Contributor Author

@pxp928 I wanted to ping you about this again as I try to come up with a way to capture a type and ID for the producer. The producer of the subject is not necessarily the producer of the evidence, so we shouldn't encode this expectation into the predicate. But are there other use cases for having a type or ID for a producer? And what would the format of these fields be?

I have a few different ways to think of a producer of a SCAI predicate:

  • The producer is a build system that is making assertions about the attributes of subject artifacts. In this case, the type might be the URI of the build service (e.g. GHA), and the ID the specific build job.
  • The producer is the actual compiler that built the subject. In this case, the ID might describe the gcc binary itself. But I might also argue that we could describe the compiler in the target field of the producer.attributes object instead. In any case, what does the type/URI point to for a binary?

These are only two cases, but I would want to be able to capture these and other cases. And, I'm still trying to understand what a consumer of the predicate would do with this information.

Contributor Author

Here are a few examples. I also saw your example above with GUAC as the producer. Is GUAC a tool, a service or something else in this case?

Producer is GHA:

{
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [{
        "name": "my-app",
        "digest": { "sha256": "78ab6a8..." }
    }],
    "predicateType": "scai/attribute-report/v0.1",
    "predicate": {
        "subjectAttributes": [{
            "attribute": "WITH_STACK_PROTECTION",
            "conditions": { "build-flags": "-fstack-protector" }
        }],
        "producer": {
            "type": "https://url-to-gha",
            "id": "https://example-builder.com/user/repo/actions/runs/runID",
            "attributes": [{
                "attribute": "ATTESTED_BUILD",
                "evidence": {
                    "name": "my-app-slsa-provenance",
                    "digest": { "sha256": "4567890..." },
                    "locationURI": "http://example.com/rekor-instance",
                    "objectType": "application/vnd.in-toto+json"
                }
            }]
        }
    }
}

Producer is tool:

{
    "_type": "https://in-toto.io/Statement/v0.1",
    "subject": [{
        "name": "my-app",
        "digest": { "sha256": "78ab6a8..." }
    }],
    "predicateType": "scai/attribute-report/v0.1",
    "predicate": {
        "subjectAttributes": [{
            "attribute": "WITH_STACK_PROTECTION",
            "conditions": { "build-flags": "-fstack-protector" }
        }],
        "producer": {
            "type": "ELF binary",
            "id": {
                "name": "gcc9.3.0",
                "digest": { "sha256": "78ab6a8..." },
                "locationURI": "http://us.archive.ubuntu.com/ubuntu/pool/main/g/gcc-defaults/gcc_9.3.0-1ubuntu2_amd64.deb"
            },
            "attributes": [{
                "attribute": "ATTESTED_BUILD",
                "evidence": {
                    "name": "my-app-slsa-provenance",
                    "digest": { "sha256": "4567890..." },
                    "locationURI": "http://example.com/rekor-instance",
                    "objectType": "application/vnd.in-toto+json"
                }
            }]
        }
    }
}
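
To explore what a consumer might do with the two producer shapes above (a URI string ID versus a descriptor-object ID), here is a tentative Python sketch. The helper names `producer_identity` and `evidence_locations`, and the choice to reduce a descriptor to its alphabetically first digest, are assumptions for illustration only.

```python
from __future__ import annotations


def producer_identity(producer: dict) -> tuple[str, str]:
    """Reduce a producer object to a (type, identifier) pair for comparison."""
    pid = producer["id"]
    if isinstance(pid, str):
        # Service-style producer: the ID is already a URI string.
        return producer["type"], pid
    # Tool-style producer: the ID is a descriptor object, so fall back to
    # its digest (alphabetically first algorithm, for determinism).
    algo, value = sorted(pid["digest"].items())[0]
    return producer["type"], f"{algo}:{value}"


def evidence_locations(producer: dict) -> list[str]:
    """Collect locationURI values from the evidence of each producer attribute."""
    return [
        attr["evidence"]["locationURI"]
        for attr in producer.get("attributes", [])
        if "locationURI" in attr.get("evidence", {})
    ]


# Trimmed-down producers mirroring the two examples above.
service = {
    "type": "https://url-to-gha",
    "id": "https://example-builder.com/user/repo/actions/runs/runID",
    "attributes": [
        {
            "attribute": "ATTESTED_BUILD",
            "evidence": {"locationURI": "http://example.com/rekor-instance"},
        }
    ],
}
tool = {
    "type": "ELF binary",
    "id": {"name": "gcc9.3.0", "digest": {"sha256": "78ab6a8..."}},
}
print(producer_identity(service))
print(producer_identity(tool))
print(evidence_locations(service))
```

A consumer could compare the resulting pair against an allow-list of trusted producers, though whether that is the intended use of type and ID is exactly the open question here.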

Member

So my comment dates from when we discussed that the evidence field schema would be tied to a producer. In the later meeting, we discussed that this would not be the case. I agree the evidence field would differ based on the type of producer (tool, service, etc.). Maybe we hold off for v0.1, but what you have for now should be sufficient:

        "producer": {
            "type": "<TYPE URI>",

spec/predicates/scai.md (outdated, resolved)
@marcelamelara marcelamelara requested a review from pxp928 December 15, 2022 16:17
Signed-off-by: Marcela Melara <[email protected]>
Member

@pxp928 pxp928 left a comment

LGTM for v0.1! Thanks @marcelamelara for the hard work on this.

@mikhailswift mikhailswift merged commit dad0835 into in-toto:main Jan 11, 2023
@marcelamelara marcelamelara deleted the scai-predicate branch January 26, 2023 16:32
4 participants