-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TEP-0089] Enable the signing and verification of TR results and the TR status #6782
Conversation
Skipping CI for Draft Pull Request. |
/kind feature |
The following is the coverage report on the affected files.
|
@jagathprakash Can we add this PR to the tracking issue #6597? Thanks! |
Done. |
The following is the coverage report on the affected files.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
if tr.IsDone() { | ||
trs.TaskRunResults = append(trs.TaskRunResults, taskResults...) | ||
if spireEnabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to repeat this block after line 211 as well for status update after fetching results from sidecar logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thats right. But signing itself is not done for results from sidecar logs. As such if we verify here, it will surely fail.
Results from sidecarlogs will be handled in a separate PR after this PR is done.
- image: ubuntu | ||
script: | | ||
#!/usr/bin/env bash | ||
sleep 20 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do we sleep for 20 seconds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not needed but does no harm.
The TR hare has been copied from TestTaskRunModificationSpire where we need the 20sec sleep so that we can modify the TR while the TR is still sleeping and hence fail the task when SPIRE is enabled.
/assign |
|
||
NAME VALUE | ||
∙ RESULT_MANIFEST commit,url,SVID,commit.sig,url.sig | ||
∙ RESULT_MANIFEST.sig MEUCIQD55MMII9SEk/esQvwNLGC43y7efNGZ+7fsTdq+9vXYFAIgNoRW7cV9WKriZkcHETIaAKqfcZVJfsKbEmaDyohDSm4= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bit of an inconsistency here:
We document the result name format as:
Name must consist of alphanumeric characters, '-', '_', and must start and end with an alphanumeric character (e.g. 'MyName', or 'my-name', or 'my_name'
But the actual regex used allows for .
I'm not sure what happens if we try to consume a result with a .
in the name - i.e. results.foo.sig.path
, or a more confusing example with results.path.path.path
🤔
We should resolve this ambiguity before we start using the .sig
suffix. /cc @tektoncd/core-maintainers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to resolving this ambiguity. Though FWIW .sig works.
But still for this PR, we could change this _sig or -sig.
WDYT @wlynch
if tr.Status.Annotations == nil { | ||
tr.Status.Annotations = map[string]string{} | ||
} | ||
tr.Status.Annotations[spire.VerifiedAnnotation] = "no" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to status, should we include this annotation? 🤔
- I'm not sure if it gives more information that isn't already available in the status.
- Missing this annotation doesn't tell you if this is verified or not, and older tekton versions won't be setting this.
- This annotation can be modified to not match the underlying state.
pkg/pod/status.go
Outdated
@@ -159,7 +164,28 @@ func MakeTaskRunStatus(ctx context.Context, logger *zap.SugaredLogger, tr v1beta | |||
return *trs, merr.ErrorOrNil() | |||
} | |||
|
|||
func setTaskRunStatusBasedOnStepStatus(ctx context.Context, logger *zap.SugaredLogger, stepStatuses []corev1.ContainerStatus, tr *v1beta1.TaskRun, podPhase corev1.PodPhase, kubeclient kubernetes.Interface, ts *v1beta1.TaskSpec) *multierror.Error { | |||
func setTaskRunStatusBasedOnSpireVerification(ctx context.Context, logger *zap.SugaredLogger, tr *v1beta1.TaskRun, trs *v1beta1.TaskRunStatus, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a little tricky to know what this is doing / modifying from just the function signature.
Should we change this to return a condition rather than mutating via the pointer?
Otherwise consider adding godoc to explain what's going on.
pkg/pod/status.go
Outdated
@@ -159,7 +164,28 @@ func MakeTaskRunStatus(ctx context.Context, logger *zap.SugaredLogger, tr v1beta | |||
return *trs, merr.ErrorOrNil() | |||
} | |||
|
|||
func setTaskRunStatusBasedOnStepStatus(ctx context.Context, logger *zap.SugaredLogger, stepStatuses []corev1.ContainerStatus, tr *v1beta1.TaskRun, podPhase corev1.PodPhase, kubeclient kubernetes.Interface, ts *v1beta1.TaskSpec) *multierror.Error { | |||
func setTaskRunStatusBasedOnSpireVerification(ctx context.Context, logger *zap.SugaredLogger, tr *v1beta1.TaskRun, trs *v1beta1.TaskRunStatus, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tr *v1beta1.TaskRun, trs *v1beta1.TaskRunStatus
Does tr.Status == trs? If so, do we need to pass in both values? If not, when can these differ?
pkg/pod/status.go
Outdated
trs.SetCondition(&apis.Condition{ | ||
Type: apis.ConditionType(v1beta1.TaskRunConditionResultsVerified.String()), | ||
Status: corev1.ConditionTrue, | ||
Reason: v1beta1.TaskRunReasonResultsVerified.String(), | ||
Message: "Successfully verified all spire signed taskrun results", | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The status isn't really authoritative for verification since it could be tampered with. Instead of giving users a field that they may put more trust in than they should, should we just make it easy to verify the signatures from any client? e.g. if we wanted tkn to verify the signatures instead of the controller, how would we recommend people to do it?
3. Modify TaskRun to sign and verify TaskRunStatus using SPIRE (done). | ||
4. Enabling Chains to verify the TaskRun Results. | ||
|
||
When the TaskRun result attestations feature is [enabled](./spire.md#enabling-taskrun-result-attestations) all TaskRuns will produce a signature alongside its results, which can then be used to validate its provenance. For example, a TaskRun result that creates user-specified results `commit` and `url` would look like the following. `SVID`, `RESULT_MANIFEST`, `RESULT_MANIFEST.sig`, `commit.sig` and `url.sig` are generated attestations by the integration of SPIRE and Tekton Controller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if a user tries to emit a result that conflicts with any of these fields or a .sig
suffix?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Was discussing this with @chuangw6 and we thought we could a check at validation (validation webhook) to ensure that user does not add any results with name "*.sig", "RESULT_MANIFEST" and "SVID".
c.want.TaskRunStatusFields.Steps[0].ContainerState.Terminated.Message = s | ||
} | ||
|
||
logger, _ := logging.NewLogger("", "status") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You probably want https://pkg.go.dev/knative.dev/[email protected]/logging/testing#TestLogger - this will add any logs to the test context
if tr.Status.StartTime.Time != c.want.StartTime.Time { | ||
t.Errorf("Expected TaskRun startTime to be unchanged but was %s", tr.Status.StartTime) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider modifying the time Comparer instead of doing another check.
kubeclient := fakek8s.NewSimpleClientset() | ||
got, err := MakeTaskRunStatus(ctx, logger, tr, &c.pod, kubeclient, &v1beta1.TaskSpec{}, sc) | ||
if err != nil { | ||
t.Errorf("MakeTaskRunResult: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be Fatal (e.g. do you want the test to continue if we fail to make the status)?
@@ -507,6 +544,15 @@ func (c *Reconciler) reconcile(ctx context.Context, tr *v1beta1.TaskRun, rtr *re | |||
} | |||
|
|||
if podconvert.SidecarsReady(pod.Status) { | |||
if spireEnabled { | |||
// TTL for the entry is in seconds | |||
ttl := time.Duration(config.FromContextOrDefaults(ctx).Defaults.DefaultTimeoutMinutes) * time.Minute |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the purpose of this TTL? Does it need to match the pod timeout?
What happens if the pod/task timeout != the default?
…TR status. This PR enables the signing and verification of TR results and TR status. Before this change the spireAPIController object was injected into the TR reconciler but it was not used. After this change, - At the start of every reconcile run, the reconciler will verify if the signature on the status can be verified, else it will error out. - At the end of every reconcile run, the reconciler will sign the status and add it as an annotation. - When TR results are read from the termination message and converted into TR results, they will be verified. This commit is part of a series of PRs to implement TEP-0089. The implementation of TEP-0089 is tracked in the issue tektoncd#6597 SPIRE for non-falsifiable provenance.
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
The following is the coverage report on the affected files.
|
…TR status. This PR enables the signing and verification of TR results and TR status. Before this change the spireAPIController object was injected into the TR reconciler but it was not used. After this change, - At the start of every reconcile run, the reconciler will verify if the signature on the status can be verified, else it will error out. - At the end of every reconcile run, the reconciler will sign the status and add it as an annotation. - When TR results are read from the termination message and converted into TR results, they will be verified. This commit is part of a series of PRs to implement TEP-0089. The implementation of TEP-0089 is tracked in the issue tektoncd#6597 SPIRE for non-falsifiable provenance.
The following is the coverage report on the affected files.
|
@jagathprakash: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
The following is the coverage report on the affected files.
|
4. Modify Tekton Chains to verify the TaskRun Results. | ||
2. Add a configMap which initializes SPIRE (done). | ||
3. Modify TaskRun to sign and verify TaskRunStatus using SPIRE (done). | ||
4. Enabling Chains to verify the TaskRun Results. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we call out other TODOs here i.e. sign other CRD status, sidecar based results.
- Verify the content of `RESULT_MANIFEST` with the field `RESULT_MANIFEST.sig` with the SVID public key | ||
- Verify that there is a corresponding field for all items listed in `RESULT_MANIFEST` (besides SVID and `*.sig` fields) | ||
- Verify individual result fields | ||
- For each of the items in the results, verify its content against its associated `.sig` field |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this step, should we also verify the name of the result is in the verified RESULT_MANIFEST
list?
The spec and TaskRun annotations/labels are not signed when there are valid interactions from other controllers or users (i.e. cancelling taskrun). | ||
Editing the object annotations/labels or spec will not result in any unverifiable outcome of the status field. | ||
|
||
As the TaskRun progresses, the Pipeline Controller will reconcile the TaskRun object and continually verify the current hash against the `tekton.dev/status-hash-sig` before updating the hash to match the new status and creating a new signature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As the TaskRun progresses, the Pipeline Controller will reconcile the TaskRun object and continually verify the current hash against the `tekton.dev/status-hash-sig` before updating the hash to match the new status and creating a new signature. | |
As the TaskRun progresses, the Pipeline Controller will reconcile the TaskRun object and continually verify the current hash against the signature in the annotation `tekton.dev/status-hash-sig` before updating the hash to match the new status and creating a new signature. |
@@ -265,7 +295,7 @@ func createMessageFromResults(results []result.RunResult) (string, error) { | |||
// filterResults filters the RunResults and TaskResults based on the results declared in the task spec. | |||
// It returns a slice of any of the input results that are defined in the task spec, converted to TaskRunResults, | |||
// and a slice of any of the RunResults that don't represent internal values (i.e. those that should not be displayed in the TaskRun status. | |||
func filterResults(results []result.RunResult, specResults []v1.TaskResult) ([]v1.TaskRunResult, []result.RunResult) { | |||
func filterResults(results []result.RunResult, specResults []v1.TaskResult, spireEnabled bool) ([]v1.TaskRunResult, []result.RunResult) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am puzzled by the function itself and the doc. Is the first returned value supposed to be the results that users have declared and should show up in status, whereas the second returned value is supposed to be the internal results and not be displayed in the status?
If so, the description of 2nd returned value is confusing " and a slice of any of the RunResults that don't represent internal values (i.e. those that should not be displayed in the TaskRun status.".
if spireEnabled { | ||
if r.Key == spire.KeySVID || r.Key == spire.KeyResultManifest || strings.HasSuffix(r.Key, spire.KeySignatureSuffix) { | ||
filteredResults = append(filteredResults, r) | ||
continue | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on what you mentioned in the doc, "However, the verification materials are removed from the final results as part of the TaskRun status. It is stored in the termination messages (more details below):".
Sounds like those results that we generated for verification purpose are more like InternalTektonResultType
. I am curious if we should have set those ResultType
as InternalTektonResultType
during the signing process
@@ -33,13 +33,13 @@ import ( | |||
|
|||
"github.com/pkg/errors" | |||
"github.com/spiffe/go-spiffe/v2/workloadapi" | |||
"github.com/tektoncd/pipeline/pkg/apis/pipeline/v1beta1" | |||
v1 "github.com/tektoncd/pipeline/pkg/apis/pipeline/v1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we don't need alias here.
@@ -159,7 +165,28 @@ func MakeTaskRunStatus(ctx context.Context, logger *zap.SugaredLogger, tr v1.Tas | |||
return *trs, merr.ErrorOrNil() | |||
} | |||
|
|||
func setTaskRunStatusBasedOnStepStatus(ctx context.Context, logger *zap.SugaredLogger, stepStatuses []corev1.ContainerStatus, tr *v1.TaskRun, podPhase corev1.PodPhase, kubeclient kubernetes.Interface, ts *v1.TaskSpec) *multierror.Error { | |||
func setTaskRunStatusBasedOnSpireVerification(ctx context.Context, logger *zap.SugaredLogger, tr *v1.TaskRun, trs *v1.TaskRunStatus, | |||
filteredResults []result.RunResult, spireAPI spire.ControllerAPIClient) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does filteredResults mean here?
if tr.IsSuccessful() && spireAPI != nil && | ||
((tr.Status.TaskSpec != nil && len(tr.Status.TaskSpec.Results) >= 1) || len(filteredResults) >= 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This condition is a bit hard to understand. Can we break it down a bit? Maybe something that can be reused on line 183.
i.e.
hasResults := tr.Status.TaskSpec != nil && len(tr.Status.TaskSpec.Results) >= 1
Hey @jagathprakash , |
@@ -6,13 +6,58 @@ weight: 1660 | |||
--> | |||
⚠️ This is a work in progress: SPIRE support is not yet functional |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this still the case after this PR?
/close @chuangw6 and I are hoping to pick this up soon thank you @jagathprakash |
@jerop: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
Enable the signing and verification of TR results and the TR status.
This PR enables the signing and verification of TR results and TR status.
Before this change the spireAPIController object was injected into the TR reconciler but it was not used.
After this change,
This commit is part of a series of PRs to implement TEP-0089. The implementation of TEP-0089 is tracked in the issue #6597 SPIRE for non-falsifiable provenance.
Changes
Submitter Checklist
As the author of this PR, please check off the items in this checklist:
/kind <type>
. Valid types are bug, cleanup, design, documentation, feature, flake, misc, question, tepRelease Notes