-
Notifications
You must be signed in to change notification settings - Fork 69
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor reconcilers and introduce v1beta2 API #311
Conversation
@darkowlzz please delete the v1alpha CRDs, we haven't kept them around for the others controllers, so it's time to remove them here too. |
For status:
canonicalImageName: ghcr.io/stefanprodan/podinfo
conditions:
- lastTransitionTime: "2022-09-12T11:59:49Z"
message: successful scan, found 34 tags
observedGeneration: 1
reason: Succeeded
status: "True"
type: Ready
lastScanResult:
scanTime: "2022-09-12T11:59:49Z"
tagCount: 34
latestTags: '6.2.0, 6.1.8, 6.1.7, 6.1.6, 6.1.5, 6.1.4, 6.1.3, 6.1.2, 6.1.1, 6.1.0, ...'
observedExclusionList:
- ^.*\.sig$
observedGeneration: 1 |
@darkowlzz does this PR solves #166? |
@stefanprodan yes. In image-reflector-controller/controllers/imagerepository_controller.go Lines 185 to 188 in 5b6015b
|
03166ea
to
2b8520b
Compare
Removed alpha APIs and fixed the fuzz test failure along with some other minor changes to reduce the verbosity of the test logs from badger and test registry server. Added a rough implementation of latest tags in ImageRepository status: status:
canonicalImageName: ghcr.io/stefanprodan/podinfo
conditions:
- lastTransitionTime: "2022-09-13T23:23:46Z"
message: successful scan, found 34 tags
observedGeneration: 1
reason: Succeeded
status: "True"
type: Ready
lastScanResult:
latestTags: latest, 6.2.0, 6.1.8, 6.1.7, 6.1.6, 6.1.5, 6.1.4, 6.1.3, 6.1.2, 6.1.1
...
scanTime: "2022-09-13T23:48:52Z"
tagCount: 34
observedExclusionList:
- ^.*\.sig$
observedGeneration: 1 Looks like we may have to remove Also added previous tag in the new ImagePolicy version events to address fluxcd/image-automation-controller#437 by including previous tag in the update event.
When there's an update, it results in event:
This requires introducing a new field in ImagePolicy, status:
conditions:
- lastTransitionTime: "2022-09-13T23:03:29Z"
message: Latest image tag for 'ghcr.io/stefanprodan/podinfo' updated from 5.2.0
to 5.2.1
observedGeneration: 1
reason: Succeeded
status: "True"
type: Ready
latestImage: ghcr.io/stefanprodan/podinfo:5.2.1
observedGeneration: 1
observedPreviousImage: ghcr.io/stefanprodan/podinfo:5.2.0 These are implemented in the last commit. Need some feedback and help in improving the messaging. |
No we don't, we just trim the list to max 10 items without altering them.
I suggest we do:
|
I don't see how this would enhance the data available through https://pkg.go.dev/github.com/fluxcd/image-automation-controller/pkg/update? But, I could very well be overlooking something given my very limited Go knowledge 😉 |
@tun0 you would setup alerting for |
The |
OK so that issue is wrongly placed here, @darkowlzz please move it or create a new one in IAC. |
Okay, my bad. I moved it to image-reflector yesterday because of the misunderstanding. I'll move it to IAC again and add some more context around it. Thanks @tun0 for the clarification. |
2b8520b
to
7d204f8
Compare
Did some polishing of the latest tags and previous - new tag implementation with tests. Everything looked fine on some testing. I need some more help with the emitted events for ImageRepository. A new ImageRepository which is reconciled for the first time emits two events:
The first event is only a K8s event and it's also logged to provide user a signal that reconciliation took place and when the next scan would take place. Since the scanning is done only at specific periods, I think it's useful to log when it'll actually scan whenever a reconciliation runs. The first part of both the events look redundant if one checks the k8s events. Maybe we can change one of them to be different? Also, when ImageRepository is triggered to reconcile before the scan time, maybe due to an update in the interval value or some other configuration, it does a no-op reconciliation and emits K8s event and logs:
Just highlighting this behavior to get any feedback on this. |
This is confusing to me, what configuration are we referring to? If this is a no-op because the tags in the cache match the ones from upstream, then let’s log only:
|
It's referring to the image URL and the exclusion list at the moment. When they are changed, it performs a scan even if it's not time for a scan based on previous results and resets the scan timer. Regarding the no-op reconciliation, saying |
What happens at the next scan if the same tags are fetched, do we still issue an event or do we log only? |
Then it logs and emits k8s only event: |
Let's make that "no new tags found, next scan in 5m" |
8d98d7f
to
77c39d1
Compare
The code should be ready for review now while I'm still working on the spec docs. Regarding the ImageRepository events discussed above, now it only emits single event per reconciliation with different messages. successful scan, found 211 tags This is also sent to the notification controller. On subsequent reconciliation at the scan interval with no new tags: no new tags found, next scan in 1h0m0s This is not sent to the notification controller. It's only logged and recorded as k8s event. On reconciliation triggered before the scan interval without any configuration change that affect the scan result: no change in repository configuration since last scan, next scan in 58m39.516114621s This is also not sent to the notification controller. It's only logged and recorded as k8s event. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add validation to all v1.Duration
fields, example here: #314
4ca8cbd
to
771aa16
Compare
I added a getter for the Exclusion List that's in the ImageRepository API so that we avoid the issues we had with notification-controller new API fields with defaults and did some upgrade testing to see if anything breaks. Sharing the details below so that we can observe and discuss about it. Manual upgrade testWith the current latest version of the controller and API, I created an ImageRepository and an ImagePolicy as below: ---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
metadata:
name: podinfo
namespace: default
spec:
image: ghcr.io/stefanprodan/podinfo
interval: 10m
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
metadata:
name: podinfo
namespace: default
spec:
imageRepositoryRef:
name: podinfo
policy:
semver:
range: 5.0.x This resulted in apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImageRepository
...
spec:
image: ghcr.io/stefanprodan/podinfo
interval: 10m
status:
canonicalImageName: ghcr.io/stefanprodan/podinfo
conditions:
- lastTransitionTime: "2023-02-07T18:15:54Z"
message: successful scan, found 41 tags
reason: ReconciliationSucceeded
status: "True"
type: Ready
lastScanResult:
scanTime: "2023-02-07T18:15:54Z"
tagCount: 41
observedGeneration: 1
---
apiVersion: image.toolkit.fluxcd.io/v1beta1
kind: ImagePolicy
...
spec:
imageRepositoryRef:
name: podinfo
policy:
semver:
range: 5.0.x
status:
conditions:
- lastTransitionTime: "2023-02-07T18:16:38Z"
message: 'Latest image tag for ''ghcr.io/stefanprodan/podinfo'' resolved to: 5.0.3'
reason: ReconciliationSucceeded
status: "True"
type: Ready
latestImage: ghcr.io/stefanprodan/podinfo:5.0.3
observedGeneration: 1 Updated the API and the controller, not the object manifests themselves, the objects became: apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImageRepository
...
spec:
exclusionList:
- ^.*\.sig$
image: ghcr.io/stefanprodan/podinfo
interval: 10m
provider: generic
status:
canonicalImageName: ghcr.io/stefanprodan/podinfo
conditions:
- lastTransitionTime: "2023-02-07T19:18:01Z"
message: 'successful scan: found 41 tags'
observedGeneration: 1
reason: Succeeded
status: "True"
type: Ready
lastScanResult:
latestTags:
- latest
- 6.3.3
- 6.3.2
- 6.3.1
- 6.3.0
- 6.2.3
- 6.2.2
- 6.2.1
- 6.2.0
- 6.1.8
scanTime: "2023-02-07T19:18:01Z"
tagCount: 41
observedExclusionList:
- ^.*\.sig$
observedGeneration: 1
---
apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
...
spec:
imageRepositoryRef:
name: podinfo
policy:
semver:
range: 5.0.x
status:
conditions:
- lastTransitionTime: "2023-02-07T19:18:01Z"
message: Latest image tag for 'ghcr.io/stefanprodan/podinfo' resolved to 5.0.3
observedGeneration: 1
reason: Succeeded
status: "True"
type: Ready
latestImage: ghcr.io/stefanprodan/podinfo:5.0.3
observedGeneration: 1 NOTE: Both the objects got converted to v1beta2 and some new default fields were added. In case of autologin, the user will have to make sure that the automatically populated Updated the ImagePolicy semver range to see some new status fields: apiVersion: image.toolkit.fluxcd.io/v1beta2
kind: ImagePolicy
...
spec:
imageRepositoryRef:
name: podinfo
policy:
semver:
range: 6.x
status:
conditions:
- lastTransitionTime: "2023-02-07T19:22:15Z"
message: Latest image tag for 'ghcr.io/stefanprodan/podinfo' updated from 5.0.3
to 6.3.3
observedGeneration: 2
reason: Succeeded
status: "True"
type: Ready
latestImage: ghcr.io/stefanprodan/podinfo:6.3.3
observedGeneration: 2
observedPreviousImage: ghcr.io/stefanprodan/podinfo:5.0.3 Previous image is being tracked in the status now which is used to construct update event message that shows the progression of the versions. |
- Removes SetImageRepositoryReadiness() and SetImagePolicyReadiness(). - Adds GetConditions() and SetConditions() to make the image API objects conditions.Setter. - Adds a default value in the ImageRepository.Spec.ExclusionList to exclude .sig artifacts and limits the number of max items in the list to 25 with API validation. - Adds GetExclusionList() for ExclusionList to get the exclusion list with default when empty. - Adds ImageRepository.Status.ObservedExclusionList to record the observed ExclusionList. Signed-off-by: Sunny <[email protected]>
The controllers package is filled with a lot of unrelated code. Organizing them by moving them out of controllers would help manage them and the controllers better. - Move the secrets related code to internal/secret package along with the secret related tests. These tests have testdata which are also moved to the new package. - Test registry server tests are moved to internal/test package. - tlsserver test server and tests are moved to internal/test package. Signed-off-by: Sunny <[email protected]>
Refactor the two reconcilers to add support for kstatus based status conditions and break down the reconciler into smaller easily testable units. It also fixes a lot of hidden bugs in the behavior of the reconcilers. A few of the fixed issues and change in behavior are: - Avoid writing to the ImageRepository spec for adding default exclusion list values. The default is now set via the CRD API. - Bug in the exclusion list filter. Multiple patterns in the list didn't filter the tags for all the patterns. - Updating image or the exclusion list of ImageRepository didn't force scan. It waited for the scan period before scanning. Now, if they are updated, scan is performed immediately. - For an existing ImageRepository, if the image is changed, the old scan result is now deleted as that no longer belongs to the new image. Also, silence badger and test registry server logs. Signed-off-by: Sunny <[email protected]>
Introduce new field in ImagePolicy status, observedPreviousImage, to store the previous image. This is needed to show image update version change in alerts. Introduce new field in ImageRepository status, latestTags, to store the 10 recent tags. This would help users debug image policy better. Signed-off-by: Sunny <[email protected]>
Signed-off-by: Sunny <[email protected]>
In alignment with OCIRepository API in source-controller, introduce per object spec.provider field for setting the contextual login provider option. Deprecate the autologin flags and error log when the flags are used. WARNING: This is a breaking change for the existing users with the flag set. Signed-off-by: Sunny <[email protected]>
- Update the image API go import. - Update the ImageRepository in tests to use spec.provider for setting cloud provider option. - Update test set up to use the CRDs from the current repo, patching the image CRDs downloaded along with the flux install manifest. Signed-off-by: Sunny <[email protected]>
771aa16
to
4346682
Compare
The way I read this, is that this is about latest from the point-of-view of the policy. Our existing Flux v1 notifications show the "actual" previous image, as in: what was actually configured in git for the specific workload. The difference is subtle, but especially for new workloads it could be confusing:
I understand this is a bit of a corner-case, but on the other hand I also kinda treat this as feature parity with Flux v1. |
@tun0 Sounds like a confusion we had before, earlier in the thread. This notification is only for the image policy, independent of the workload. We'll add similar notification in image-automation-controller that'll be specific to the workload and would show the previously used tag that was in git and the new tag. |
Ah, right. Makes sense! |
- ImageRepository spec doc - ImagePolicy spec doc Signed-off-by: Sunny <[email protected]>
Signed-off-by: Sunny <[email protected]>
Signed-off-by: Sunny <[email protected]>
4346682
to
391e876
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @darkowlzz 🥇
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
I tested this on GKE with GCR and Artifact Registry and it worked fine.
Thank you @darkowlzz 🌹
What's the status of this? |
@tun0 nothing as of now. We got busy with Flux GitOps GA which included GitRepository, Kustomization and Receiver API only. image-automation-controller was not a part of it. Since we are over that now, the other controllers will get more attention gradually. So, it's one of the next things, but there's no set plans for it yet. Since you're interested, we can request you to try it as we develop that in the coming months. |
@darkowlzz: Thanks for the update. Looking forward to the request 😉 I'll gladly assist where possible. |
This change refactors image-reflector-controller to implement the new unified standards of flux controllers.
It uses the result finalizer from fluxcd/pkg#329 for computing the status conditions of
ImageRepository and ImagePolicy objects such that they are kstatus compliant.
Following are some highlights of the changes:
conditions.Setter
API from https://pkg.go.dev/github.com/fluxcd/pkg/[email protected]/conditions#Setter, along with other minor validation checks in the API fields.controllers/
directory by moving the code that are not directly/strictly related to reconciliation operation into theinternal/
directory, along with their tests..spec.provider
in ImageRepository.The new status conditions under various scenarios can be seen in:
Instructions for testing
To test this, checkout this branch and build the container image with
make docker-build IMG=<image-name>
. Before deploying the container image, update/install the CRDs to v1beta2 API withmake install
or apply the CRD definition manifests inconfig/crd/bases
. Create image objects and test.Fix: #259
Fix: #166