Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make operator workable on disconnected clusters #1234

Merged
merged 15 commits into from
Nov 21, 2023

Conversation

muellerfabi
Copy link

@muellerfabi muellerfabi commented Sep 8, 2023

In order to make the operator useable on disconnected clusters it is required to use the image digest, instead of tag. [1]

Why?
OpenShift uses a so called ImageContentSourcePolicy object that "translate between the image references stored in Operator manifests and the mirrored registry." [2] This only works with image digests - not tags.

The lines I added in Makefile are from an empty operator-sdk skeleton.

[1] https://docs.openshift.com/container-platform/4.12/operators/admin/olm-restricted-networks.html
[2] https://docs.openshift.com/container-platform/4.12/installing/disconnected_install/installing-mirroring-installation-images.html#olm-mirror-catalog-manifests_installing-mirroring-installation-images

@muellerfabi muellerfabi changed the title Updates to Makefile to enable generation of image digests for disconnected clusters Make operator workable on disconnected clusters Sep 8, 2023
Copy link
Collaborator

@NissesSenap NissesSenap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great @muellerfabi , this is something that have been requested for a long time.
I'm not too sure about updating the default grafana image, yes digests is best from a security point of view, but it also lowers the readability by allot.
Since you can overwrite the grafana image on your own, I don't see this as a huge issue in a disconnected env.

We do get a bit of a chicken-and-egg issue though, we normally perform a manual prepare PR that cantinas the updated bundle/manifests file before creating the tag. When creating the tag the container image will be built and published.

We could change our way of working in to cutting the release first and then update the docs, but it will look a bit strange in the commit history.
I'm leaning towards the second way, building the image locally, get the digest and add it to the bundle/manifests file manually before creating the tag.

Could you document how to do this in https://github.com/grafana-operator/grafana-operator/blob/master/PREPARE_RELEASE.md?

Container question that I'm way too lazy to test my self, how does it work with different build CPU architectures? Does that give the same digest?

controllers/config/operator_constants.go Outdated Show resolved Hide resolved
@muellerfabi
Copy link
Author

muellerfabi commented Sep 12, 2023

I need to do some refactoring because spec.relatedImages in CSV should also contain grafana image so that oc-mirror also mirrors that one.
And the added GrafanaDigest is wrong

@muellerfabi muellerfabi force-pushed the enable_image_digest branch 2 times, most recently from f2949e7 to 0c826f5 Compare September 12, 2023 16:11
@muellerfabi
Copy link
Author

Alright, I added another target in Makefile bundle/redhat for generating manifests specially for OpenShift Marketplace with digests. Also the Grafana Image can now be overwritten using env var.

Cannot tell if it is a good idea to keep it as it is, i.e. keeping the CSV with image tags in master like before, but running make bundle/redhat and just copying the updated CSV to the redhat-openshift-ecosystem repo.

Container question that I'm way too lazy to test my self, how does it work with different build CPU architectures? Does that give the same digest?

The image digest should contain all available architectures, because:

$ skopeo inspect --override-os=linux docker://docker.io/grafana/grafana:9.1.6 | jq '.Digest'
"sha256:ff68ed4324e471ffa269aa5308cdcf12276ef2d5a660daea95db9d629a32a7d8"

$ skopeo inspect --raw --override-os=linux docker://docker.io/grafana/grafana:9.1.6
{
   "schemaVersion": 2,
   "mediaType": "application/vnd.docker.distribution.manifest.list.v2+json",
   "manifests": [
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2203,
         "digest": "sha256:f5518c6c6392bb767813b78d474ed3a228ca0673f1867770d8fd312067abc558",
         "platform": {
            "architecture": "amd64",
            "os": "linux"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2198,
         "digest": "sha256:892db948b4f33724208035d2abc547ccfc14184ff98de12b595b6c2fa9cb13b2",
         "platform": {
            "architecture": "arm64",
            "os": "linux",
            "variant": "v8"
         }
      },
      {
         "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
         "size": 2198,
         "digest": "sha256:9a11066127b9e8582a270505e9ffc7687ae95bf98358542ced46eac17640ef50",
         "platform": {
            "architecture": "arm",
            "os": "linux",
            "variant": "v7"
         }
      }
   ]
}

...
And bases: are deprecated by kustomize, so I replaced it in three files

@muellerfabi muellerfabi force-pushed the enable_image_digest branch 2 times, most recently from ae0c085 to 8513a19 Compare September 13, 2023 06:57
Copy link
Collaborator

@NissesSenap NissesSenap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay in review.
I have added a few comments, I also have a general question.

I can't decide either on how we should manage the tag vs OLM files.
@grafana-operator/maintainers what do you think?
Should we build the image locally to get the digest and then upload it before releasing the tag?
Or should we create the tag and do a separate PR after where we fix the digest and everything OLM related?

In reality, we don't even need to patch the bundle/manifests file in this repo. But at the same time, it feels good to keep them in sync with the config that we have with the OLM repo.

Makefile Outdated Show resolved Hide resolved
config/manager/manager.yaml Show resolved Hide resolved
@hubeadmin
Copy link
Collaborator

Should we build the image locally to get the digest and then upload it before releasing the tag?
Or should we create the tag and do a separate PR after where we fix the digest and everything OLM related?

This is quite the conundrum, On one hand, I don't like the idea of having our repo-based OLM files out of sync for a release, on the other hand, I don't think we really expect users to use our manifests directly right?

Most of the manifest related use cases are for installation through OLM/OperatorHub.
So maybe we can forgo updating the specific version in our repo based manifests, instead just updating them in the operator hub release repos to the image digest?

So, my proposal essentially is:

  1. Keep the OLM manifests up to date in the repository, but without referencing an operator image version/tag (or just use latest), But always keep them up to date with the current operator regardless (as in, the manifests should always reflect the operator featureset etc. on a release, but just not use a versioned image).
  2. Update the image digest after we cut a release on this repository, update the image digest locally, and send the PR's to OperatorHub/OLM with the image referenced appropriately

@NissesSenap
Copy link
Collaborator

Yes, I'm leaning towards that as well.

We could even create a GitHub actions that automatically runs after the image tag have been generated and create a PR to both the repos.
Then all we need to do is go in and approve it.

@NissesSenap
Copy link
Collaborator

NissesSenap commented Sep 26, 2023

@muellerfabi we had an internal discussion.
We decided that we won't update the digest in this repo anymore. But I can document the new way of working, I will send a new commit for that in this PR.

But if you can take a look at the comments that I made, that would be great.

@NissesSenap
Copy link
Collaborator

So I did a few changes to PREPARE_RELEASE.md but I'm a bit unsure.
Do we really need to run the make generate, make manifests and make bundle?

Aren't all this config generated when running make bundle/redhat?
If not, we probably should just call on one the missing one to make sure that everything is included.

It would also be nice to run make bundle/redhat as a part of this PR, at least on my client I generate lots and lots of changes, and it's probably just a bunch of spaces.

@NissesSenap
Copy link
Collaborator

Created #1261 to automate the OLM release processes after this PR gets merged.

@muellerfabi
Copy link
Author

So I did a few changes to PREPARE_RELEASE.md but I'm a bit unsure. Do we really need to run the make generate, make manifests and make bundle?

Aren't all this config generated when running make bundle/redhat? If not, we probably should just call on one the missing one to make sure that everything is included.

It would also be nice to run make bundle/redhat as a part of this PR, at least on my client I generate lots and lots of changes, and it's probably just a bunch of spaces.

Not sure if make generate is more a part of the releasing process or more development related.
manifests is also called when you run make bundle. So this could be skipped.
Regaring make bundle: I didn't get how you want to proceed. Keep manifests with tags as it is or with digest? My initial idea was to keep this repository as it is and only run make bundle/redhat right before copying the content to the community-operators repo.

Copy link
Collaborator

@NissesSenap NissesSenap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my very slow reply, currently I have lots of things to do.

First of all, great job with the PR.

The only issue I can find is when I run make bundle/redhat, the output shows me.
My guess is that manager in coming from config/default/manager_config_patch.yaml or something like that. Not that it really matters, but it looks ugly ;).
Is it some easy way to get rid of it?

+    - image: docker.io/grafana/grafana@sha256:ff68ed4324e471ffa269aa5308cdcf12276ef2d5a660daea95db9d629a32a7d8
+      name: grafana
+    - image: ghcr.io/grafana-operator/grafana-operator@sha256:54784cb7c79a70740ed48052561c567af84af21f11a6d258060f4c2689a934c4
+      name: manager
+    - image: ghcr.io/grafana-operator/grafana-operator@sha256:54784cb7c79a70740ed48052561c567af84af21f11a6d258060f4c2689a934c4
+      name: grafana-operator-54784cb7c79a70740ed48052561c567af84af21f11a6d258060f4c2689a934c4-annotation

Also, please run make bundle/redhat one time so when we do it locally in the future the diff isn't that big. When I do a release, I normally look at the diffs and see how they look. In general, it's only image and date that is changed, more or less.

@muellerfabi
Copy link
Author

I am afraid, but you probably have to accept the noisy output, as far as i understand the pullspec module does not accept the "-q". Even make -s build/redhat does not help.

The duplicate image entry was related to the containerImage annotation in the CSV. I think it is not required anyway.

While trying to figure out why there are duplicate entries under relatedImages, I stumbled over these things (that are not 100% related to this PR):

  • Is there any reason why CRDs are stored in config/ as well as in config/crds/bases? The only difference I see it that former include descriptions. But latter ones are being bundled.
  • The ControllerManagerConfig from config/manager is generated into a configmap by kustomize and bundled. But it is not mounted / used by the controller deployment, i.e. the patch manager_config_patch.yaml is commented in config/default/kustomization.yaml.
  • I completely deleted the bundle dir and run make bundle/redhat. It showed that the file bundle/manifests/grafana-operator-operator-metrics-service_v1_service.yaml` was no more generated.

@NissesSenap
Copy link
Collaborator

@muellerfabi really nice, @pb82 will give this a try inside OLM and other than that I think we are getting ready to merge it :)
Thanks for your great job, especially since we have been slow on our side.

About your questions that is not related to the PR.

  • Is there any reason why CRDs are stored in config/ as well as in config/crds/bases? The only difference I see it that former includes descriptions. But latter ones are being bundled.

If I remember correctly, we use the one with descriptions when we generate the API.md file, which is the base for our CRD docs. .
Sadly we can't include all these descriptions in the CRD that we deploy to the cluster due to the etcd limit, and we have included the entire deployment object in the CRD, so you can do any changes you want to it.
So we run in to the max limit really quickly.

  • The ControllerManagerConfig from config/manager is generated into a configmap by kustomize and bundled. But it is not mounted / used by the controller deployment, i.e. the patch manager_config_patch.yaml is commented in config/default/kustomization.yaml.

My guess is that it's part of the default yaml when you create an operator using operator-sdk. We just didn't think of it, so we can probably remove it.

  • I completely deleted the bundle dir and run make bundle/redhat. It showed that the file bundle/manifests/grafana-operator-operator-metrics-service_v1_service.yaml` was no more generated.

Probably same as above, part of the default, and I don't think we ship that service by default. But to be honest I don't know.

@smuda
Copy link
Contributor

smuda commented Oct 31, 2023

As a side-note, ImageContentSourcePolicy is deprecated in 4.13 and will be replaced by ImageDigestMirrorSet and ImageTagMirrorSet. It seems ImageTagMirrorSet would be able to handle the main problem?

@muellerfabi
Copy link
Author

What is the status? Any changes required from my side?
@NissesSenap, thanks for the clarification!

As a side-note, ImageContentSourcePolicy is deprecated in 4.13 and will be replaced by ImageDigestMirrorSet and ImageTagMirrorSet. It seems ImageTagMirrorSet would be able to handle the main problem?

That is true, the need to use digests in disconnected environments is now softened. Yet in enterprise environments there is still a desire having a reasonable image.

@NissesSenap
Copy link
Collaborator

That is a good question. Life is of course easier if we don't have to create digests, but as you say, they are never bad to have.

So I guess let's keep it as is. Please update the PR, and we can see if some of the redhat maintainers got time to verify that everything is working with the new OLM config. If so, we should be okay to merge as I see it.

Copy link
Collaborator

@pb82 pb82 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, thanks @muellerfabi
The generated bundle no longer contains a version, but a sha

@pb82
Copy link
Collaborator

pb82 commented Nov 14, 2023

@muellerfabi @NissesSenap we have now two Makefile targets to create a bundle (make bundle and make bundle/redhat). Is the idea to keep to versioned bundles and images? And would we use the digest image directly from the OpenShift internal registry?

@NissesSenap NissesSenap self-requested a review November 14, 2023 13:10
@NissesSenap
Copy link
Collaborator

LGTM, @pb82 will do a final verification, then we should be able to merge this.
Thanks for all your work @muellerfabi, this is one of our oldest requests, and it's awesome to see someone from the community fixing it.

@pb82 pb82 merged commit ade4362 into grafana:master Nov 21, 2023
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants