Support for OCI registries in Flux plugin #5007

Closed
gfichtenholt opened this issue Jul 1, 2022 · 12 comments · Fixed by #4947, #4831, #4932, #5032 or #5070
Assignees: gfichtenholt
Labels: component/plugin-flux (issue related to the Kubeapps plugin to manage Helm charts via Flux), kind/feature (an issue that reports an approved feature to be implemented)

Comments

@gfichtenholt (Contributor) commented Jul 1, 2022

This is a long-running issue I started a little over a month ago.
Flux CD recently added support for OCI registries:
fluxcd/flux2#2597

My work is to expose this functionality to Kubeapps end users via the Flux plugin.

Areas I have worked on so far:

Areas I have not started yet:

  • Support other kinds of repositories, such as Harbor, Azure CR, AWS CR, etc.
  • Support for k8s docker registry credentials, which OCI registries may use (a rough sketch follows after this list)
  • Support for the InsecureTLS flag
  • Unit tests. So far I have written one integration test that exercises the backend end-to-end,
    because I wanted to see what the flux engine really does on the server side.
    I know I still need to write the client-side tests.
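
For reference, a minimal, illustrative sketch of the docker registry credentials item above, assuming the plugin keeps using the source-controller v1beta2 API types: a kubernetes.io/dockerconfigjson Secret referenced from an OCI HelmRepository via spec.secretRef. All names, the registry host and the helper function are placeholders, not the actual Kubeapps code.

```go
// Illustrative only: registry credentials as a dockerconfigjson Secret,
// referenced by an OCI HelmRepository (source-controller v1beta2 types assumed).
package fluxocisketch

import (
	"encoding/base64"
	"encoding/json"
	"time"

	fluxmeta "github.com/fluxcd/pkg/apis/meta"
	sourcev1 "github.com/fluxcd/source-controller/api/v1beta2"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// ociRepoWithDockerCreds builds the Secret/HelmRepository pair. Names and the
// registry host (ghcr.io) are placeholders.
func ociRepoWithDockerCreds(user, password string) (*corev1.Secret, *sourcev1.HelmRepository) {
	// Same .dockerconfigjson layout that `kubectl create secret docker-registry` produces.
	auth := base64.StdEncoding.EncodeToString([]byte(user + ":" + password))
	cfg, _ := json.Marshal(map[string]any{
		"auths": map[string]any{
			"ghcr.io": map[string]string{"username": user, "password": password, "auth": auth},
		},
	})

	secret := &corev1.Secret{
		ObjectMeta: metav1.ObjectMeta{Name: "oci-repo-creds", Namespace: "default"},
		Type:       corev1.SecretTypeDockerConfigJson,
		Data:       map[string][]byte{corev1.DockerConfigJsonKey: cfg},
	}

	repo := &sourcev1.HelmRepository{
		ObjectMeta: metav1.ObjectMeta{Name: "my-oci-repo", Namespace: "default"},
		Spec: sourcev1.HelmRepositorySpec{
			Type:      "oci",
			URL:       "oci://ghcr.io/my-org/charts",
			Interval:  metav1.Duration{Duration: 10 * time.Minute},
			SecretRef: &fluxmeta.LocalObjectReference{Name: secret.Name},
		},
	}
	return secret, repo
}
```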

Currently outstanding issues that will require verification and/or code changes when fixed:

I plan to keep updating this issue as I make further progress with OCI support in the Flux plugin.

@gfichtenholt gfichtenholt added the kind/proposal An issue that reports a new feature proposal to be discussed label Jul 1, 2022
@kubeapps-bot kubeapps-bot moved this to 🗂 Backlog in Kubeapps Jul 1, 2022
@gfichtenholt gfichtenholt self-assigned this Jul 1, 2022
@gfichtenholt gfichtenholt added the component/plugin-flux Issue related to kubeapps plugin to manage Helm charts via Flux label Jul 1, 2022
@gfichtenholt gfichtenholt moved this from 🗂 Backlog to 🏗 In Progress in Kubeapps Jul 1, 2022
@gfichtenholt gfichtenholt linked a pull request Jul 6, 2022 that will close this issue
Repository owner moved this from 🏗 In Progress to ✅ Done in Kubeapps Jul 9, 2022
@gfichtenholt gfichtenholt reopened this Jul 9, 2022
Repository owner moved this from ✅ Done to 🗒 Todo in Kubeapps Jul 9, 2022
@ppbaena ppbaena moved this from 🗒 Todo to 🏗 In Progress in Kubeapps Jul 11, 2022
@gfichtenholt gfichtenholt linked a pull request Jul 13, 2022 that will close this issue
Repository owner moved this from 🏗 In Progress to ✅ Done in Kubeapps Jul 13, 2022
@gfichtenholt gfichtenholt reopened this Jul 13, 2022
Repository owner moved this from ✅ Done to 🗒 Todo in Kubeapps Jul 13, 2022
@ppbaena (Collaborator) commented Jul 15, 2022

Hi Greg, could you update the current status of this issue and the planned next steps?

@ppbaena ppbaena moved this from 🗒 Todo to 🏗 In Progress in Kubeapps Jul 15, 2022
@gfichtenholt (Contributor, Author) commented Jul 16, 2022

Current status: in progress.
Items left to do before I can mark this closed:

  • Update operation implementation for OCI repos, plus integration tests.
    Time estimate: 2 days
  • Production code cleanup (removing instrumentation via log.Info()) and other tidying up.
    Time estimate: 2 days
  • Integration test that creates a Helm release out of an OCI Helm repository. Time estimate: 2 days
  • Integration tests covering use cases involving TLS. Time estimate: 2 days
  • Regular unit tests (i.e. when k8s isn't running); see the sketch after this list.
    So far I have only written and relied on integration tests running locally (not in CI),
    which was fine up to a point. Now I need to write at least some rudimentary CRUD tests that will run on CI.
    Time estimate: 2 days minimum. The more time I can spend, the more tests I can write, including some edge scenarios.
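
A rough sketch of what such a CRUD unit test could look like without a running cluster, using controller-runtime's fake client. addOCIRepo is a hypothetical stand-in, not the real plugin API:

```go
// Illustrative CRUD-style unit test that needs no live k8s cluster.
package fluxplugin

import (
	"context"
	"testing"

	sourcev1 "github.com/fluxcd/source-controller/api/v1beta2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/runtime"
	"k8s.io/apimachinery/pkg/types"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/client/fake"
)

// addOCIRepo is a hypothetical placeholder for the plugin code under test.
func addOCIRepo(ctx context.Context, c client.Client, name, url string) error {
	repo := &sourcev1.HelmRepository{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: "default"},
		Spec:       sourcev1.HelmRepositorySpec{Type: "oci", URL: url},
	}
	return c.Create(ctx, repo)
}

func TestAddOCIRepo(t *testing.T) {
	scheme := runtime.NewScheme()
	if err := sourcev1.AddToScheme(scheme); err != nil {
		t.Fatal(err)
	}
	c := fake.NewClientBuilder().WithScheme(scheme).Build()

	if err := addOCIRepo(context.Background(), c, "podinfo", "oci://ghcr.io/stefanprodan/charts"); err != nil {
		t.Fatalf("create failed: %v", err)
	}

	var got sourcev1.HelmRepository
	if err := c.Get(context.Background(), types.NamespacedName{Namespace: "default", Name: "podinfo"}, &got); err != nil {
		t.Fatalf("get failed: %v", err)
	}
	if got.Spec.URL != "oci://ghcr.io/stefanprodan/charts" {
		t.Errorf("unexpected URL: %s", got.Spec.URL)
	}
}
```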

These items are optional and can be addressed at any point in the future:

@gfichtenholt (Contributor, Author):

  1. OCI Distribution Spec - https://github.com/opencontainers/distribution-spec/blob/main/spec.md#api
  2. Docker Registry HTTP API V2 Spec - https://docs.docker.com/registry/spec/api/#docker-registry-http-api-v2

Registry                                                                   (1)   (2)
ghcr.io                                                                    yes   yes
docker hub official 'registry' image (https://hub.docker.com/_/registry)   yes   yes
harbor                                                                     yes   no
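
To make the (1)/(2) distinction concrete, here is an illustrative sketch of the two list endpoints involved: /v2/<name>/tags/list is defined by both specs, while the repository catalog /v2/_catalog comes from the Docker Registry HTTP API V2 only (the OCI distribution spec does not define it). Hostnames are placeholders and authentication is omitted.

```go
// Illustrative only: probe the two listing endpoints discussed above.
package registryprobe

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// listTags hits /v2/<name>/tags/list, which both (1) and (2) define.
func listTags(host, repo string) ([]string, error) {
	resp, err := http.Get(fmt.Sprintf("https://%s/v2/%s/tags/list", host, repo))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("tags/list: unexpected status %s", resp.Status)
	}
	var out struct {
		Name string   `json:"name"`
		Tags []string `json:"tags"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Tags, nil
}

// listRepositories hits /v2/_catalog, which only (2) defines; a registry that
// implements just the OCI distribution spec need not serve it.
func listRepositories(host string) ([]string, error) {
	resp, err := http.Get("https://" + host + "/v2/_catalog")
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("_catalog: unexpected status %s", resp.Status)
	}
	var out struct {
		Repositories []string `json:"repositories"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return nil, err
	}
	return out.Repositories, nil
}
```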

@gfichtenholt (Contributor, Author) commented Jul 20, 2022

Current status:

Repository owner moved this from 🏗 In Progress to ✅ Done in Kubeapps Aug 16, 2022
@gfichtenholt gfichtenholt reopened this Aug 16, 2022
Repository owner moved this from ✅ Done to 🗒 Todo in Kubeapps Aug 16, 2022
@gfichtenholt (Contributor, Author) commented Aug 17, 2022

This PR is still in progress with lots of changes coming in, so I am adding this as a comment here rather than creating a new issue: the current Redis cache for Helm OCI repos is broken. It loads the initial state into the cache just fine, but any of the following changes on the remote:

  • a new chart is pushed to the repo
  • an existing chart is removed from the repo
  • a new chart version (aka tag) is pushed
  • an existing tag is removed from a chart

will result in an out-of-sync cache. Basically, after any change on the remote server the cache will be out of sync with the remote. The reason is that fluxcd chose not to implement a feature where the caller gets notified when the source on the remote changes (ref fluxcd/source-controller#839). I understand the seriousness of the issue and am currently considering different fixes, such as:

  • disable the cache for Helm OCI repositories
  • implement logic to poll the remote for changes and react (i.e. what flux is already doing); a rough sketch follows below
  • manual resync, including delete/create of the Helm OCI repo

I will update this issue as I narrow down the solution.
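
A very rough sketch of the polling option (the second bullet above). The interfaces and function names are hypothetical placeholders, not the actual Kubeapps cache code; the point is just to re-list tags on the remote on a timer and refresh stale cache entries.

```go
// Hypothetical sketch: periodic resync of cached OCI chart tags against the remote.
package fluxcache

import (
	"context"
	"reflect"
	"sort"
	"time"
)

// RegistryClient lists chart versions (OCI tags) for a chart reference.
type RegistryClient interface {
	Tags(ctx context.Context, chartRef string) ([]string, error)
}

// Cache is a minimal view of the per-chart cache entry.
type Cache interface {
	GetTags(chartRef string) ([]string, bool)
	SetTags(chartRef string, tags []string)
}

// PollForChanges re-checks the remote every interval and refreshes stale entries.
// Assumes SetTags stores the sorted tag list, so cached and remote compare cleanly.
func PollForChanges(ctx context.Context, reg RegistryClient, cache Cache, chartRefs []string, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			for _, ref := range chartRefs {
				remote, err := reg.Tags(ctx, ref)
				if err != nil {
					// Could not reach the registry: keep the cached entry, retry next tick.
					continue
				}
				sort.Strings(remote)
				cached, ok := cache.GetTags(ref)
				if !ok || !reflect.DeepEqual(cached, remote) {
					// Remote changed (tag added or removed): refresh the entry.
					cache.SetTags(ref, remote)
				}
			}
		}
	}
}
```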

@antgamdia antgamdia moved this from 🗒 Todo to 🏗 In Progress in Kubeapps Aug 18, 2022
Repository owner moved this from 🏗 In Progress to ✅ Done in Kubeapps Aug 22, 2022
@gfichtenholt gfichtenholt reopened this Aug 22, 2022
Repository owner moved this from ✅ Done to 🗒 Todo in Kubeapps Aug 22, 2022
@ppbaena ppbaena moved this from 🗒 Todo to 🏗 In Progress in Kubeapps Aug 23, 2022
Repository owner moved this from 🏗 In Progress to ✅ Done in Kubeapps Aug 26, 2022
@gfichtenholt gfichtenholt reopened this Aug 26, 2022
Repository owner moved this from ✅ Done to 🗒 Todo in Kubeapps Aug 26, 2022
@ppbaena ppbaena moved this from 🗒 Todo to 🏗 In Progress in Kubeapps Aug 29, 2022
@gfichtenholt (Contributor, Author) commented Sep 1, 2022

FYI, flux v0.33 adds HelmRepository.spec.provider. I will need to find out more about it and whether or not it affects our code.

Update Sept 7: the decision was to go ahead and add this as a feature. Implementation-wise, I was thinking:

  1. set up an integration scenario that requires this feature (probably on GCP, since I've already done a bunch of work there, but I am open to others)
  2. add code to the flux plugin to handle this setting; it should be simple, as a plugin-specific (custom) setting (see the sketch after this list)
  3. verify (2) is working with an integration test
  4. when we have cycles, expose (2) in the UI
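
As a concrete illustration of step (2), a minimal sketch of what the plugin could create, assuming the source-controller v1beta2 API types; names, namespace and URL are placeholders, not the actual plugin wiring.

```go
// Illustrative sketch: an OCI HelmRepository relying on contextual ("auto") login,
// i.e. spec.provider set and no secretRef.
package fluxplugin

import (
	"time"

	sourcev1 "github.com/fluxcd/source-controller/api/v1beta2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
)

// newOCIHelmRepository builds the repo object; provider is "generic", "aws", "azure" or "gcp".
func newOCIHelmRepository(name, namespace, url, provider string) *sourcev1.HelmRepository {
	return &sourcev1.HelmRepository{
		ObjectMeta: metav1.ObjectMeta{Name: name, Namespace: namespace},
		Spec: sourcev1.HelmRepositorySpec{
			Type:     "oci",
			URL:      url, // e.g. "oci://us-central1-docker.pkg.dev/my-project/charts"
			Provider: provider,
			Interval: metav1.Duration{Duration: 10 * time.Minute},
		},
	}
}
```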

@gfichtenholt (Contributor, Author):

Just FYI, the feature mentioned above appears to be something the flux authors call "auto-login" or
"contextual login", which basically means using a "secured" HelmRepository without
a secret ref. It is important to understand that in order for this scenario to work,
the workload (i.e. the flux controllers and Kubeapps) needs to actually run on a GKE cluster.
Another way to put it: "Workload Identity can't be used by Pods running on the host network"
(ref https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity#limitations).

If I had to summarize this approach in one sentence, it would be something like: instead of
configuring secrets based on service access tokens or service JSON keys per Helm repository,
one configures the flux source-controller pod with a certain 'workload identity',
which flux presents to GKE when authentication is needed.

@castelblanque (Collaborator):

Thanks for the info, Greg, that makes it more understandable. The term "contextual login" or "contextual auth" looks like a good fit for referring to it.

Wondering how we can manage the feature from the UI point of view. For example, a user creating a HelmRepository from the Kubeapps UI could select the provider from e.g. a dropdown. Initially it could be a fixed list of the options in the provider spec.
But for the experience to be optimal, should the allowed options in the dropdown be only two: 1) generic and 2) the provider on which the Flux machinery runs (e.g. gcp)?
Is the cloud provider something that we could find out from within Kubeapps?
In the multicluster scenario, the UI from which the repo is created might not have access to the backend of the Flux instance that is actually running.

@gfichtenholt (Contributor, Author):

Haven't thought this far ahead yet. As a first cut I was thinking of having users pick from a fixed list. I haven't done enough research to see if it's possible to find out automatically. The Flux guys chose to put the burden on the user, so there is that.

@gfichtenholt (Contributor, Author) commented Sep 17, 2022

Status update: it took 3 full days, but I have now verified that the feature works as advertised in flux. That is, having the field spec.provider set to "gcp" is sufficient to make the HelmRepository work without a secret ref on GCP. Pretty neat.
The caveat is that you have to set up your GKE cluster for workload identity, which is non-trivial and took the vast majority of the time. I am thinking about how to set up a flux plugin integration test for this. It will probably be guarded by a flag, unset by default; if set, the test will rely on an existing GKE cluster that is pre-created and pre-configured prior to running the test (see the sketch below).
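
A sketch of the guard just described, with a made-up environment variable name: the integration test is skipped unless the caller opts in and points it at a pre-configured GKE cluster.

```go
// Illustrative only: opt-in guard for the GKE workload-identity integration test.
package integration

import (
	"os"
	"testing"
)

func TestOCIRepoGCPAutoLogin(t *testing.T) {
	// TEST_GCP_AUTOLOGIN is a hypothetical flag name, unset by default.
	if os.Getenv("TEST_GCP_AUTOLOGIN") == "" {
		t.Skip("skipping: set TEST_GCP_AUTOLOGIN and point kubeconfig at a GKE cluster pre-configured for workload identity")
	}
	// ... exercise a HelmRepository with spec.provider=gcp and no secretRef here ...
}
```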

@gfichtenholt (Contributor, Author):

I believe this issue can now be closed. All planned features have been implemented. If any new features arise, we shall open separate PRs targeting them.
