Implement CloudEvents for Runs #4663

Merged 1 commit on Mar 15, 2022

afrittoli
Member

@afrittoli afrittoli commented Mar 11, 2022

Changes

Emit CloudEvents for Runs. This is achieved by:

  • add a new read-only controller for Runs
  • emit CloudEvents only (no k8s events) on every reconcile of a Run
  • use an ephemeral cache to store sent events across reconcile runs.
    This is required because the Runs controller only observes
    Runs, so it does not have the context to know what changed in the
    Run, and thus whether a new event is required.

The ephemeral cache logic is largely taken from the same
functionality implemented in tektoncd/experimental/cloudevents
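
As a rough illustration of the cache idea above (not the code in this PR, which relies on an LRU cache created with `lru.New`), the sketch below records a key for every event sent and skips events whose key is already present. The type `sentEvents`, the helper `eventKey`, the key format, and the event type string are all hypothetical names chosen for this example.

```go
package main

import "fmt"

// sentEvents is a hypothetical, unbounded stand-in for the LRU cache
// used in the PR. It is not safe for concurrent use.
type sentEvents struct {
	seen map[string]struct{}
}

func newSentEvents() *sentEvents {
	return &sentEvents{seen: make(map[string]struct{})}
}

// eventKey builds a cache key from the event type and the resource it
// refers to. The key format is an assumption made for this sketch.
func eventKey(eventType, namespace, name string) string {
	return eventType + "/" + namespace + "/" + name
}

// shouldSend reports whether the event is new, and records it as sent
// when it is.
func (s *sentEvents) shouldSend(key string) bool {
	if _, ok := s.seen[key]; ok {
		return false
	}
	s.seen[key] = struct{}{}
	return true
}

func main() {
	cache := newSentEvents()
	key := eventKey("run.started", "default", "my-run")
	fmt.Println(cache.shouldSend(key)) // first reconcile
	fmt.Println(cache.shouldSend(key)) // second reconcile, same state
}
```

Because the cache lives only in the controller's memory, a restart empties it and already-sent events may be sent again.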

Fixes #3862
Depends-on #4659

Signed-off-by: Andrea Frittoli [email protected]

/kind feature

Submitter Checklist

As the author of this PR, please check off the items in this checklist:

  • Docs included if any changes are user facing
  • Tests included if any functionality added or changed
  • Follows the commit message standard
  • Meets the Tekton contributor standards (including
    functionality, content, code)
  • Release notes block below has been filled in or deleted (only if no user facing changes)

Release Notes

Tekton Pipelines can now generate CloudEvents for Runs. 
This feature is disabled by default. It can be enabled by setting a sink URL and setting the new feature flag "send-cloudevents-for-runs" to "true".
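
A minimal sketch of the enablement rule described in this note, assuming the flag arrives as a string in a key/value map and the sink as a plain URL string. The helper `shouldEmit` is hypothetical and is not the parsing code in `pkg/apis/config/feature_flags.go`.

```go
package main

import (
	"fmt"
	"strconv"
)

// flagKey is the feature flag name given in the release note.
const flagKey = "send-cloudevents-for-runs"

// shouldEmit is a hypothetical helper: events for Runs are emitted only
// when a sink URL is set and the flag parses to true. A missing flag
// means the feature is disabled, matching the documented default.
func shouldEmit(sinkURL string, flags map[string]string) (bool, error) {
	raw, ok := flags[flagKey]
	if !ok {
		return false, nil
	}
	enabled, err := strconv.ParseBool(raw)
	if err != nil {
		return false, fmt.Errorf("invalid value %q for %s: %w", raw, flagKey, err)
	}
	return enabled && sinkURL != "", nil
}

func main() {
	ok, err := shouldEmit("http://sink.example", map[string]string{flagKey: "true"})
	fmt.Println(ok, err)
}
```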

@tekton-robot tekton-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. labels Mar 11, 2022
@afrittoli afrittoli added this to the Pipelines v0.34 milestone Mar 11, 2022
@afrittoli afrittoli mentioned this pull request Mar 11, 2022
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/feature_flags.go 87.8% 86.0% -1.8
pkg/reconciler/events/cache/cache.go Do not exist 84.6%
pkg/reconciler/events/cache/cacheclient.go Do not exist 7.7%
pkg/reconciler/events/cache/cachefakeclient.go Do not exist 14.3%
pkg/reconciler/events/cloudevent/cloud_event_controller.go 93.1% 82.9% -10.2
pkg/reconciler/events/cloudevent/cloudevent.go 91.2% 91.0% -0.2
pkg/reconciler/events/event.go 95.2% 66.7% -28.6
pkg/reconciler/run/controller.go Do not exist 100.0%
pkg/reconciler/run/run.go Do not exist 100.0%

@afrittoli
Member Author

This PR includes one commit from #4659 as it depends on it.
Once #4659 is merged I will remove the extra commit from here.

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/feature_flags.go 87.8% 86.0% -1.8
pkg/reconciler/events/cache/cache.go Do not exist 56.5%
pkg/reconciler/events/cache/cacheclient.go Do not exist 7.7%
pkg/reconciler/events/cache/cachefakeclient.go Do not exist 14.3%
pkg/reconciler/events/cloudevent/cloud_event_controller.go 93.1% 82.9% -10.2
pkg/reconciler/events/cloudevent/cloudevent.go 91.2% 91.0% -0.2
pkg/reconciler/events/event.go 95.2% 66.7% -28.6
pkg/reconciler/run/controller.go Do not exist 100.0%
pkg/reconciler/run/run.go Do not exist 100.0%

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/feature_flags.go 87.8% 86.0% -1.8
pkg/reconciler/events/cache/cache.go Do not exist 78.3%
pkg/reconciler/events/cache/cacheclient.go Do not exist 7.7%
pkg/reconciler/events/cache/cachefakeclient.go Do not exist 14.3%
pkg/reconciler/events/cloudevent/cloud_event_controller.go 93.1% 82.9% -10.2
pkg/reconciler/events/cloudevent/cloudevent.go 91.2% 91.0% -0.2
pkg/reconciler/events/event.go 95.2% 66.7% -28.6
pkg/reconciler/run/controller.go Do not exist 100.0%
pkg/reconciler/run/run.go Do not exist 100.0%

Member

@lbernick lbernick left a comment

I think this would be easier to review if split up into one PR that adds the run controller and one that adds the cloud events (sorry to create more work!)

// cacheKey is a way to associate the Cache from inside the context.Context
type cacheKey struct{}

func withCacheClient(ctx context.Context, cfg *rest.Config) context.Context {
Member

there's some duplication here with the withFakeCacheClient func below that could probably be pulled into a helper that accepts a buffer size.

Member Author

done

// d, where d represents the state of the system (existing resources) needed for the test.
func getRunController(t *testing.T, d test.Data) (test.Assets, func()) {
t.Helper()
names.TestingSeed()
Member

this line can probably be removed from the individual test cases

Member Author

good catch, done

@@ -28,6 +30,8 @@ func SetupFakeContext(t *testing.T) (context.Context, []controller.Informer) {
SendSuccessfully: true,
}
ctx = cloudevent.WithClient(ctx, &cloudEventClientBehaviour)
cacheClient, _ := lru.New(128)
Member

nit: could you please add an inline comment: lru.New(/*bufSize=*/ 128)

Member Author

Oh, in fact this is not needed anymore, it's handled by injection and I forgot to remove it here

@afrittoli
Member Author

I think this would be easier to review if split up into one PR that adds the run controller and one that adds the cloud events (sorry to create more work!)

Hi @lbernick - I'm not sure how you would like to split this further 🙏
The cloud events are added in #4659, but I had to include that commit in this PR because it depends on it - thanks to GitHub that does not support PRs with dependencies.
Once #4659 is merged, I will rebase this and it will only include adding the controller (with the cache).

The only new part in this PR is 749bdba

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/feature_flags.go 87.8% 86.0% -1.8
pkg/reconciler/events/cache/cache.go Do not exist 78.3%
pkg/reconciler/events/cache/cacheclient.go Do not exist 7.1%
pkg/reconciler/events/cache/cachefakeclient.go Do not exist 50.0%
pkg/reconciler/events/cloudevent/cloud_event_controller.go 93.1% 82.9% -10.2
pkg/reconciler/events/cloudevent/cloudevent.go 91.2% 91.0% -0.2
pkg/reconciler/events/event.go 95.2% 66.7% -28.6
pkg/reconciler/run/controller.go Do not exist 100.0%
pkg/reconciler/run/run.go Do not exist 100.0%

@afrittoli
Member Author

/test pull-tekton-pipeline-integration-tests

Member

@vdemeester vdemeester left a comment

One small comment, otherwise LGTM 🐯

logger := logging.FromContext(ctx)

cacheClient, err := lru.New(size)
logger.Infof("CACHE CLIENT %+v", cacheClient)
Member

Is it required in the logs as info or was it a "debug" ? 😛

Member Author

Well, since this happens once per controller start-up, I thought it would be useful to have it at info to know that the cache was initialised. If it shows up a second time it means something is very wrong!

@tekton-robot
Collaborator

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

@tekton-robot tekton-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Mar 14, 2022
@lbernick
Member

Hi @lbernick - I'm not sure how you would like to split this further 🙏

I was thinking of moving the contents of controller.go and most of run.go into a separate PR (to merge before adding the emitCloudEvents functionality); however, I'd be OK with lgtm-ing as is once you rebase this.

@tekton-robot tekton-robot removed the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Mar 14, 2022
@tekton-robot tekton-robot added the size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. label Mar 14, 2022
@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/feature_flags.go 87.8% 86.0% -1.8
pkg/reconciler/events/cache/cache.go Do not exist 78.3%
pkg/reconciler/events/cache/cacheclient.go Do not exist 7.1%
pkg/reconciler/events/cache/cachefakeclient.go Do not exist 50.0%
pkg/reconciler/events/cloudevent/cloud_event_controller.go 93.1% 82.9% -10.2
pkg/reconciler/events/event.go 95.2% 66.7% -28.6
pkg/reconciler/run/controller.go Do not exist 100.0%
pkg/reconciler/run/run.go Do not exist 100.0%

@afrittoli
Member Author

Hi @lbernick - I'm not sure how you would like to split this further 🙏

I was thinking of moving the contents of controller.go and most of run.go into a separate PR (to merge before adding the emitCloudEvents functionality); however, I'd be OK with lgtm-ing as is once you rebase this.

Oh, I see. The reason I put this together in one PR is that the only thing the controller does is invoke emitCloudEvents, so if I moved that into a separate PR I wouldn't be able to test much about the reconcile behaviour in the initial PR.

@lbernick
Member

Oh, I see. The reason I put this together in one PR is that the only thing the controller does is invoke emitCloudEvents, so if I moved that into a separate PR I wouldn't be able to test much about the reconcile behaviour in the initial PR.

I think that's ok-- no need to add more functionality for the purpose of being able to test something :) The main reason I suggested this was for ease of review. However again it's up to you.

@afrittoli
Member Author

/test pull-tekton-pipeline-alpha-integration-tests

@afrittoli
Member Author

Note that when leader election is enabled, enqueued resources are split into buckets using a hashing function which takes namespace/name as input (https://github.com/knative/pkg/blob/ce6f2877cef94d5c91bafe2ce4acba52ee22c1fe/controller/controller.go#L425-L431). This means that the same resource will be consistently re-enqueued to the same controller for every subsequent update, which implies that the local cloudevents cache works fine in the HA multi-master case too.

If leadership for a bucket is transferred to a different controller instance, the local cloudevent cache will be lost, similar to what happens in case of controller restart, hence the caveat in the docs.
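
The consistency argument above can be sketched as follows. This is only an illustration: FNV-1a and the helper `bucketFor` are assumptions made for the example, and the linked knative/pkg code remains the authority on how buckets are actually chosen.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// bucketFor maps a namespace/name key to one of n buckets (n must be > 0).
// The hash function is an assumption of this sketch; the point is only
// that the mapping is deterministic for a given key and bucket count.
func bucketFor(namespace, name string, n uint32) uint32 {
	h := fnv.New32a()
	h.Write([]byte(namespace + "/" + name))
	return h.Sum32() % n
}

func main() {
	// Every update to the same Run lands in the same bucket, so the
	// controller instance that owns the bucket also holds the Run's
	// cache entries.
	for i := 0; i < 3; i++ {
		fmt.Println(bucketFor("default", "my-run", 3))
	}
}
```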

@afrittoli
Member Author

Thanks @lbernick - it's rebased now - only one commit in the PR :)

@@ -137,11 +139,25 @@ func SendCloudEventWithRetries(ctx context.Context, object runtime.Object) error
if err != nil {
return err
}
// Events for Runs require a cache of events that have been sent
Member

would you mind adding some tests in cloud_event_controller_test.go to cover the new functionality?

Member Author

The way I tested that is in https://github.com/tektoncd/pipeline/pull/4663/files#diff-001bb78547301c75f05f989ea355d11512f0cfa73d378d0f0c75e498cdaa6214R249-R252 by running reconcile twice and checking that events are sent the first time and not the second one.
I don't think unit tests specific to SendCloudEventWithRetries would add much coverage, but I can try and add some if you'd like.
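
The reconcile-twice check described above follows roughly this shape. The `reconciler` and `fakeSink` types and the state strings are hypothetical stand-ins, not the types used in the PR's tests.

```go
package main

import "fmt"

// fakeSink counts the events it receives instead of sending them.
type fakeSink struct {
	sent int
}

// reconciler is a hypothetical stand-in for the Run reconciler: it emits
// one event per (run, state) pair and remembers what it has emitted.
type reconciler struct {
	sink *fakeSink
	seen map[string]bool
}

// reconcile emits an event for the given run state unless one was
// already emitted for that same state.
func (r *reconciler) reconcile(runKey, state string) {
	k := runKey + "/" + state
	if r.seen[k] {
		return
	}
	r.seen[k] = true
	r.sink.sent++
}

func main() {
	r := &reconciler{sink: &fakeSink{}, seen: map[string]bool{}}

	r.reconcile("default/my-run", "started")
	first := r.sink.sent

	// Reconciling the unchanged Run again should not emit anything new.
	r.reconcile("default/my-run", "started")
	second := r.sink.sent - first

	fmt.Println(first, second)
}
```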

Member

would it be possible to just add a test case similar to this one for sending cloud events for a run?

Member Author

Test added, I hope this looks ok.

Member Author

Added TestEmitCloudEvents as well now

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/feature_flags.go 87.8% 86.0% -1.8
pkg/reconciler/events/cache/cache.go Do not exist 78.3%
pkg/reconciler/events/cache/cacheclient.go Do not exist 7.1%
pkg/reconciler/events/cache/cachefakeclient.go Do not exist 50.0%
pkg/reconciler/events/cloudevent/cloud_event_controller.go 93.1% 88.6% -4.5
pkg/reconciler/events/event.go 95.2% 66.7% -28.6
pkg/reconciler/run/controller.go Do not exist 100.0%
pkg/reconciler/run/run.go Do not exist 100.0%

@tekton-robot
Collaborator

The following is the coverage report on the affected files.
Say /test pull-tekton-pipeline-go-coverage to re-run this coverage report

File Old Coverage New Coverage Delta
pkg/apis/config/feature_flags.go 87.8% 86.0% -1.8
pkg/reconciler/events/cache/cache.go Do not exist 78.3%
pkg/reconciler/events/cache/cacheclient.go Do not exist 7.1%
pkg/reconciler/events/cache/cachefakeclient.go Do not exist 50.0%
pkg/reconciler/events/cloudevent/cloud_event_controller.go 93.1% 88.6% -4.5
pkg/reconciler/events/event.go 95.2% 93.3% -1.9
pkg/reconciler/run/controller.go Do not exist 100.0%
pkg/reconciler/run/run.go Do not exist 100.0%

@afrittoli
Member Author

Uhm, interesting, it looks like a flake in an existing unit test :S

/test pull-tekton-pipeline-unit-tests

@afrittoli
Member Author

/test pull-tekton-pipeline-unit-tests

Emit CloudEvents for Runs. This is achieved by:
- add a new read-only controller for Runs
- emit CloudEvents only (no k8s events) on every reconcile of a Run
- use an ephemeral cache to store sent events across reconcile runs.
  This is required because the Runs controller only observes
  Runs, so it does not have the context to know what changed in the
  Run, and thus whether a new event is required.

The ephemeral cache logic is largely taken from the same
functionality implemented in tektoncd/experimental/cloudevents

Fixes tektoncd#3862

Signed-off-by: Andrea Frittoli <[email protected]>

@afrittoli
Member Author

/test pull-tekton-pipeline-alpha-integration-tests

@lbernick
Member

/lgtm

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Mar 15, 2022
@tekton-robot tekton-robot merged commit 45b7c37 into tektoncd:main Mar 15, 2022
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/feature Categorizes issue or PR as related to a new feature. lgtm Indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Development

Successfully merging this pull request may close these issues.

Cloud Events for Runs
4 participants