-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
TEP-0004: Tekton Integrations - Annotations and Statuses #147
Conversation
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
TEP proposal for configuring post-pipeline integrations for Tekton Pipelines via annotations and statuses.
Update from discussion offline: have been discussing similarities b/w this and tektoncd/pipeline#1740, might make sense to have this proposal directly tackle notification related requirements |
Do we need to do something to ensure that updating the Run with new integration update status doesn't trigger new reconciliations for all watchers? For example, a Run starts, and the GitHub status reconciler is notified, and it POSTs to GitHub and updates |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing that is "half" clear for me is what this TEP propose overall:
- standard reserved annotation
*.integration.tekton.dev
- a new integration status field (optional)
- more than this ? (who is responsible for what, …)
As commented (below), I don't see too much value in the integration status part of the *Run
status. Or at least I don't see its value — other than "from a *Run
what integration did run ; but I don't see too much use case for this (enlighten me please 👼), I don't see why we would "couple" integration with the *Run
s.
More generic comments:
- We could use
we
instead ofI
😛 - I think we can, most of the time, use the present tense (instead of the future), as, if the TEP gets accepted and
implementable
, it will be "the present" (aka what it's doing).
**Examples:** | ||
|
||
- GitHub | ||
``` | ||
github.integrations.tekton.dev/owner: wlynch | ||
github.integrations.tekton.dev/repo: test | ||
github.integrations.tekton.dev/sha: 4d2f673c600a0f08c3f6737bc3f5c6a6a15d62e6 | ||
``` | ||
- Webhook | ||
``` | ||
webhook.integrations.tekton.dev/url: https://example.com:8080 | ||
``` | ||
- Slack | ||
``` | ||
slack.integrations.tekton.dev/url: https://tektoncd.slack.com | ||
slack.integrations.tekton.dev/channel: build-cop | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Who is responsible for adding those ? Are those annotation provided by the user (or something like triggers
, or both) or by the integration, or by both ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially, my expectation is the integration would provide the annotation it supports in its documentation, then users would set these fields in their Runs (filling them in with params / TriggerTemplates).
Eventually, I'd like to get to a point where Triggers can do this automatically for users. Initially via ObjectMeta templating, but eventually via interceptors (or possibly some other mechanism) that can also be provided by an integration. e.g. If we know that a Trigger is for GitHub events, and we can filter for these types of events, we should be able to provide the relevant annotations to configure related integration functionality automatically. If users want to configure one-off builds for testing, they can still set the annotations manually.
This also designed in a way that other non-Tekton Trigger mechanisms could also leverage this if they wanted, or we can even mix and match mechanisms to trigger builds and related integrations (e.g. want to build a slack bot that retests pull requests? just set these annotations and you're good to go).
I think the exact details of how Triggers implements these features is a bit hand-wavy at the moment since a lot of it is work actively being discussed / designed, so I didn't want to go too deep in this proposal. The main goal here is to carve out the areas within the Runs API to define this data.
- What integration the status is for (e.g. `github.integrations.tekton.dev`). | ||
- What object generation the status is for. | ||
- What was the outcome of the integration for this object (e.g. success, | ||
failure). | ||
- (optional) More details of status outcome. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to keep in mind is that TaskRun
and PipelineRun
are more or less "immutable", so I am not sure how fully useful the object generation will be as… except from the cancel
field in spec, the rest of the spec won't change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For TaskRun and PipelineRun spec
, I 100% agree. status
is where I see room for flexibility (since these are mutated during Pipeline execution anyway).
Re object/observed generation: I'm not particularly tied to this. I'm just borrowing this from our existing TaskRunStatus type which uses the Knative Status duck type.
1. Push a commit, notice pull request wasn't updated for a run. | ||
2. Look for all Runs with Integration annotation for the repo / commit (if | ||
empty, then you know there was no build kicked off). | ||
3. Find latest Run, `kubectl get` it, inspect integrations status field(s) for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This only works when you got access to the cluster (which, might "usually" not be the case)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, but I think it's a reasonable assumption to say that users should be able to get at the Run object (i.e. if not through the k8s API, by Tekton Results), and that this will generally be a common pattern for debugging.
Updated to mention results / alternative mechanisms.
pipeline run itself. This gives us an easy mechanism for users to do the | ||
following: | ||
|
||
1. Push a commit, notice pull request wasn't updated for a run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Independently of this (but maybe related), an integration should ensure to do the proper thing whenever the *Run
is created. e.g. for GitHub, as soon as the *Run
is created, the checks that this *Run
provides should be reseted (to pending
). But then, the question is how do you know which checks
this *Run
provides ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an integration should ensure to do the proper thing whenever the *Run is created
Agreed. There should definitely be some common behavior here to reset the state. As a first pass, I think clearing out the integration data should be sufficient, and should allow integrations to pick up a Run as if it was new.
But then, the question is how do you know which checks this *Run provides ?
I don't think we can or should enforce that the Run must declare the integrations it provides (though it declare some integrations in line if it wants to). This is to support integrations that are enforced automatically for users (e.g. CloudEvents, Results).
retryCount: 1 | ||
``` | ||
|
||
IntegrationStatus: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the first tmie we introduce this type / field — the detail about it is lower, so it feels a bit weird (because at that point of reading, I don't know what is this type, where it is and what's is "goal".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I agree this is weird. Rearranged this a bit to not reference the type exactly until later. Let me know if this makes more sense or if I should make more changes.
Similar to `TaskRunResults` and `PipelineResourceResults`, integration statuses | ||
are another form of external artifact generated as a result of the Pipeline. | ||
Because of this, it makes sense to store these statuses as a part of the | ||
Pipeline/TaskRun status, so that it is easy to find and see this data when | ||
something goes wrong. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not entirely sure about this. TaskRunResults
and PipelineResourceResults
are, by definition, results that are coming from the execution of the Task/Pipeline (aka what the task is supposed to do, like clone a git repository, …) through TaskRun/PipelineRun) This would not be the case with integration, as they do not necessary result from the execution of it (aka what the task/pipeline does), but only if it executed or not, …
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I felt that this section was redundant with User Stories, so I took it out for the time being.
The reason why I feel this belongs in the Run status is because these integration actions are directly caused by the Run itself. Storing related data with it helps us couple the data to the object lifecycle, and makes it easy for users to get this information regardless of the tool they use.
I found out that this actually models an existing pattern of subresources (https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/api-conventions.md#types-kinds), so I added another section about this under design details.
|
||
## Design Details | ||
|
||
### Integration Status |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a little bit of trouble around this new field in status and how useful they are from the PipelineRun
/TaskRun
perspective. I would "tend" to think, that I would need to look at the "integration" service/dashboard/… to see what failed and dig from there.
In my head, I am comparing this to the "food supply tracability chain" 🙃. When you buy your cereals, you can trace back to where/when/how each element of your cereals got in (aka where was it planted, where was it transformed, by whom it was shipped, …). From the top, you can get to the bottom — but from the bottom you don't really have a way (nor the need) to get to the top. I see PipelineRun
and TaskRun
as "the bottom" in there (from the integration perspective).
Let's take GitHub status checks as examples, and let's assume a Github integration (à-la commit-status-tracker
). I submit my PR, the checks appear as pending. If there is any failure (before the pod execution, during, …), it's the integration responsability to give the user the information back, pointing to what went wrong — and in there, which PipelineRun it is, it's state, how to dig into it, … The PipelineRun itself doesn't need to know what integration is watching it, and how it handles thing.
In addition, writing the status from the integration "part" means possible race, possible reconciling loop for all the watchers, etc…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I actually agree that users will generally start from the integration dashboard and working their way back to the Run.
That said, I think providing information the other way is equally as important for investigating when things do not work as expected. For example, say the access token we are using for GitHub exceeded quota (or the token was revoked, GitHub is down, etc)- this could manifest as statuses never showing up in GitHub, so we lose that link from the integration back to the Run. We currently don't have a mechanism to let users know that something is wrong.
Today our answer is "go look at the controller logs for errors", but this isn't really a good solution if the user doesn't have access to the pipeline controller (and they likely won't if this is a multitenant environment). Having a place to say: "we tried to update the status for , but we got this error" within a Run helps integrations give users transparency about what is happening as a result of their Task/PipelineRun, even though these things aren't necessarily executing as part of their Run, regardless of the state of the external integration.
I concede that in cases where everything worked this is less meaningful, but I think still provides benefit for consistency to help users verify that something actually happened.
To use your cereal analogy - this is like distributors keeping a log of outgoing shipments to grocery stores, so that if stores claim they never received a shipment, they can prove when the shipment left the warehouse, what was inside, tracking numbers, etc.
### Automatically injecting Tasks | ||
|
||
This alternative keeps the idea of running user tasks inside of the Pipeline, | ||
but we make it easier for users by automatically appending integration tasks to | ||
the end of their pipeline / adding them as finally tasks. | ||
|
||
This has a few drawbacks: | ||
|
||
- We can't call tasks if the pipeline never runs. | ||
- The user pipeline needs to have access to all secrets, which limits what we | ||
can do with shared multitenant secrets. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This describes what PipelineResource are doing 👼
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but with the caveats:
- PipelineResources can't do anything if the Pipeline fails to start (e.g. invalid TaskRefs)
- They can't do anything in response to "accepted, but not running yet" (e.g. GitHub CheckRuns make a distinction between
Queued
andIn Progress
pending statuses) - Requires users to insert all integration lifecycle steps (the eventual goal is for Triggers to make this easier for users).
As an alternative to storing integration status information within a | ||
Task/PipelineRun status, we could store this information in another CRD type | ||
that references back to the run. The advantage of this is that it avoids making | ||
an API change to the Task/PipelineRun object directly and the consequences that | ||
may arise. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need another type for this. It would be on the integration itself to decide how it wants to store/display/… the status.
What do you think if we were to address the notifications feature before we implement this TEP and then revisit it? This TEP is aiming to solve these problems:
My understanding of the notifications feature is that the notifications feature is all about creating integrations with external systems, but not just integrations: integrations with a level of importance greater than other integrations, such as Tasks within a Pipeline. Since notifications are about "notifying", i.e. telling the outside world about something that is happening, the problems you listed above feel to me like legit requirements for the notifications feature, e.g.:
|
@n3wscott You might be interested in this proposal, given your involvement in kubernetes/kubernetes#72637 (comment) and https://docs.google.com/document/d/1b1hRpJpSUE3xIG0v2IHFD0NBaxvRBTBnw3An7oZBV9I 😄 |
Depends on how notifications intend on handling status / receipt of delivery data. To me notifications is just a type of integration, so there's an argument to be made that this should happen soon rather than later so it's consistent with other integrations we add later on. I need to learn more about the current state/plans of the notifications effort to have a more informed opinion about this.
I would say this TEP is aiming to define a space within the Run API to enable other tools to address these problems, but not necessarily solve them directly (but I 100% agree that we should solve them). Even if we're only focused on in-pipeline notifications/actions for the time being, I think there's a benefit for having a space within the Run status / subresource to allow writing of status information and its effect on external services. I'll look into adding a subsection to detail a more concrete example that can leverage this design (likely PR status updates and/or uploading of Tekton Results), but I don't want to get too bogged down by the exact details of the example. |
Without a specific "integration" to talk about, I find it very hard to endorse this proposal. If you start looking at other parts of Tekton Pipelines there are a lot of pieces you could call "integrations", for example:
So this proposal being about "integrations" in general isn't clear to me. I don't think this is a direction we should go in until we have a clear need for it, which I don't see. Any tools outside of Tekton can store information in whatever way they see fit; I'm not convinced they need to be able to mutate Tekton types to do that. One situation which I think would convince me is if we built a tool inside of Tekton that needs it, for example notifications, and I think you've highlighted some very important requirements for that feature, but I don't think it makes sense to address those requirements on their own. |
Side note that the annotations part of the proposal is something that a tool can do today, without requiring us to explicitly accept a TEP about it. |
update Tasks into their Pipelines. In practice, this ends up not being a good | ||
mechanism for ensuring that external integration notifications always occur, for | ||
example: | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+100 to all of these problems!
Annotations / how to run integrations is a separate problem, and can come later. This helps address shorter term ideas with existing Notifications / CloudEvent work, while still enabling additional integrations to build on top of it independently.
Hi everyone! After reading into more of the current CloudEvents implementation, I decided to table the annotations / reconciler bits included in this proposal and instead focus on just integration statuses and the discoverability of this information in Runs. The CloudEvents implementation actually already follows much of the reconciler pattern previously described for some of the concerns raised above (e.g. to run on Pipeline failure) since it's currently baked into the Pipelines controller, with a separate discussion to pull it out into its own controller. I think the piece still missing here is exposing and making this information discoverable to Runs in a way that can also be used for other integrations. I'll likely raise the GitHub annotations/reconciler bits again in the future as a different TEP, reframed towards improving UX for GitHub CI statuses. 😃 |
After some more discussion with @bobcatfish, I'm going to close this proposal. It's still early days for notifications / integrations, and it's not necessarily clear that modifying the Run status / using sub-resources is necessary / the best way to go about representing this. I'll continue to think about how we surface this kind of information (e.g. whether a webhook/status update happened successfully), and see where we can fit this in with other notification efforts. Stay tuned! |
TEP proposal for configuring post-pipeline integrations for Tekton
Pipelines via annotations and statuses.
/cc @afrittoli @dlorenc @imjasonh @bobcatfish @dibyom @bigkevmcd