Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update prometheus-federator to use dependencies from k8s 1.32.X+ #141

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

alexandreLamarre
Copy link
Contributor

@alexandreLamarre alexandreLamarre commented Dec 27, 2024

No description provided.

Comment on lines 61 to 60
if os.Getenv("MANAGE_CRD_UPDATES") == "true" {
updateCRDs = true
}
managedCRDs := common.ManagedCRDsFromRuntime(f.RuntimeOptions)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I simplified the management of CRDs here by reading from the runtime configuration file.

  • We manage (aka update) helm-locker CRDs IFF we are running an embedded helm-locker
  • We manage (aka update) helm-controller CRDs IFF we are running an embedded helm-controller

IMO this was an oversight from the original maintainers of the operator, that they blindly redeployed all CRDs even if we weren't running the matching embedded controller

Copy link
Contributor Author

@alexandreLamarre alexandreLamarre Jan 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can extend the tracker abstraction to customize the CRD objects before they are created, but I didn't see a need to do this yet, it becomes messy if there needs to be breaking changes to what fields are indicative of the CRD being managed by us or someone else

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only major concern with the summary provided is, if I'm a user who:

  • Wants to enable PromFeds helm-controller, but
  • NOT have prom-fed manage the CRDs

Is that still possible? Or are we drawing a line in the sand with this PR, so if you use PromFeds helm-controller then PromFed will always do the CRDs? In result, for k3s/rke2 users they should always disable helm-controller?

I understand that doing this makes our compatibility story much more simple. So that's fine if that's the case, we just need to make sure that's clear in release notes (and values.yaml descriptions) probably.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect it's probably rare that customers have multiple helm-controllers for necessary performance reasons. So maybe it's not a big deal to eliminate the option to use the PromFed helm-controller in an environment where another one exists?

And even then, maybe the correct solution is that customers with helm-controller external to PromFed needing 2, should just run 2 external ones of the same version.

Since that's much more simple than having to consider a crazy huge compatabiity matrix of what versions of PromFed work with what versions of helm-controller (and there by versions of k3s/rke2).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added back the crd flags you introduced in the charts, and they work the same as they did before for compatibility reasons.

The main difference is that we discourage the use of those flags now, since helm-controller should be disabled in rke2/k3s and we don't manage crds for things we don't run

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The managed-by annotation doesn't work as you would think it does -- whichever helm controller picks up the object first controls the object but it was never fleshed out beyond hardening currently picked up resources

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that annotation was a stepping stone towards allowing for concurrent helm-controllers/ helm-lockers / helm-project-operators but you can set up some tests to see it won't behave consistently

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 Yeah I have studied the code and tested it many times so I'm confused about the disconnect. Maybe it would help if we pair on this tomorrow and I can show how it works in my lab. As far as proving this via testing, I didn't think that was the scope of the discussion. I'm open to us trying to cover that in e2e testing...but also feels tangential to my current point.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ohh - I may see where I'm confused now. The ground work is there but the locks may be blocking that functionality. I'll sync up with @jbiers on this to see what we can sort out.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We really don't want to make it a non-singleton since rancher-monitoring is a singleton, and its companion app prometheus-federator is also a singleton

Users have been using it in this manner for a while, and the edge cases where we want a non-singleton are only clear to me if we use the underlying helm-project-operator to support multiple charts which is IMO not likely to happen at all since it isn't flexible and the tenancy around projects and specifically the helm-project-operator is hacky at best

@alexandreLamarre alexandreLamarre force-pushed the full-update branch 5 times, most recently from 0c2cdc4 to 2babdd2 Compare January 8, 2025 16:36
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
slightly rework crd management

Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
run go mod tidy

Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
alexandreLamarre and others added 3 commits January 13, 2025 14:55
Comment on lines +14 to +20
func Data(filename string) []byte {
data, err := fs.ReadFile(dataFs, path.Join("build", filename))
if err != nil {
panic(err)
}
return data
}
Copy link
Contributor Author

@alexandreLamarre alexandreLamarre Jan 14, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Intentionally making this public in a public package so we can import a specific version of prometheus-federator and see the charts vendored within it.

It doesn't impact the immutability of the charts build into the binary, but allows alternate visibility into what chart is built into what commit of prometheus federator without having to deploy an image.

I'm open to reverting this improvement since it isn't necessary but thought it might be a useful change long term.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See pkg/chart/data.go, but I think the overhead of ~12Kb files with hopefully small diffs aside from major version bumps of project-monitoring should not impact the usual git operations too much based on my experience...

Especially since we plan on removing the chart hosting to a separate repository, I think the net impact will still be negative on the size/ amount of objects managed by git

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ehh. I'll have to digest this for a bit. My instinct is that I'll probably revert this with chart-split PR anyway. But maybe I can find a mental model where this makes sense.

Big picture the expectation we should have with rancher-project-monitring is that anytime rancher-monitoring changes, then rancher-porject-monitoring will automatically be changing too.

And if we ship rancher-monitoring we should also be shipping PromFed+RPM at the same cycle. This way we directly eliminate the lag between the two which causes us to use a mixture of images and generally skews CVE numbers.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When shipping an image or application logic it is more useful to see and stage the diffs and manually update them or open an automation PR with the diffs visible

Having the go:embed files ignored is bad practice IMO

Copy link
Contributor Author

@alexandreLamarre alexandreLamarre Jan 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also I don't think automation will ever get to the point where rebasing monitoring will out of the box rebase project-monitoring since the resources tend to change alot between 5-10 major versions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The point of this is not to get in the way of any chart build or automation improvements but to have clear tracking of what gets built into the applications)

Signed-off-by: Alexandre Lamarre <[email protected]>
Comment on lines +32 to +37
# FIXME: causes a git diff because we add git tagged versions to the helm-chart metadata. I think we can omit this
# from builds since the helm-project-operator is a sub-chart and end-users shouldn't look to
# this metadata for clarifying information since the version of the prometheus-federator version chart is the one
# indicating changes from now on
# edit-charts "./build/charts/${BUILD_TARGET}/Chart.yaml" "${HELM_CHART_VERSION}" "${HELM_IMAGE_TAG}"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% about this, but I believe it is better to edit the version values of the embedded helm-charts before building the images (in a separate commit/PR) for clarity.

The prometheus-federator chart itself can have sed/perl auto-inject versions when building though, that I don't mind

Comment on lines +34 to +39
# FIXME: causes a git diff because we add git tagged versions to the helm-chart metadata. I think we can omit this
# from builds since the helm-project-operator is a sub-chart and end-users shouldn't look to
# this metadata for clarifying information since the version of the prometheus-federator version chart is the one
# indicating changes from now on

# edit-charts "build/charts/${CHART}/Chart.yaml" "${HELM_CHART_VERSION}" "${HELM_IMAGE_TAG}"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% about this, but I believe it is better to edit the version values of the embedded helm-charts before building the images (in a separate commit/PR) for clarity.

The prometheus-federator chart itself can have sed/perl auto-inject versions when building though, that I don't mind

Signed-off-by: Alexandre Lamarre <[email protected]>
@alexandreLamarre alexandreLamarre changed the title Update prometheus-federator to use dependencies from k8s 1.30+ Update prometheus-federator to use dependencies from k8s 1.32.X+ Jan 14, 2025
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
Signed-off-by: Alexandre Lamarre <[email protected]>
@alexandreLamarre alexandreLamarre marked this pull request as ready for review January 15, 2025 15:39
@alexandreLamarre alexandreLamarre requested a review from a team as a code owner January 15, 2025 15:39
go.mod Show resolved Hide resolved
github.com/onsi/ginkgo/v2 v2.22.2
github.com/onsi/gomega v1.36.2
github.com/rancher/lasso v0.0.0-20241202185148-04649f379358
github.com/rancher/wrangler/v3 v3.2.0-rc.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, this needs to use rc.2 and the go version thing will go back to .0

@@ -190,6 +189,9 @@ func Register(ctx context.Context, systemNamespace string, cfg clientcmd.ClientC
chart.Register(ctx,
systemNamespace,
opts.ControllerName,
// this has to be cluster-admin for k3s reasons
"cluster-admin",
"6443",
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also bumping the embedded helm controller requires a reference to the api server port. Since the embedded helm-controller runs within k3s it has access to the apiserver port on localhost. I doubt it works as intended when its run embedded and I didn't have the bandwidth to test that.

github.com/caarlos0/env/v11 v11.1.0
github.com/google/uuid v1.6.0
github.com/hashicorp/go-multierror v1.1.1
github.com/k3s-io/helm-controller v0.16.6-0.20241210112214-b40937ee695b
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be v0.16.5 as that matches the version used by the stable release of k8s 1.32 for k3s/rke2.

@mallardduck
Copy link
Member

BTW - I just realized that since this is targeting 1.32 we should ideally wait to ship this along with Rancher 2.11 that officially supports 2.11. And we're using RC libraries intended for Rancher 2.11 which feels wrong to include this soon. Because we don't do branching right now we can't merge this into main and expect it to not affect the February release cycle.

So what if you changed this to target 1.31 and Rancher 2.10 based libraries that way it doesn't have to be something we only do in 2.11. Otherwise we have to some how implement branch strategy and do partial reverts before the February release cycle.

@alexandreLamarre
Copy link
Contributor Author

@mallardduck The intention was always to bump dependencies as aggressively as possible and branch off the main branch, and downgrade dependencies selectively

@mallardduck
Copy link
Member

@alexandreLamarre - I guess that wasn't super clear. Either way works in the end I guess.

However after merging this we need to be careful about what tag is used to test main. If main needs to match rancher main, then that means the tag needs to logically match that version and leave a gaps. That way we have either a major/minor that correlates to the Rancher version, but versions under it that we can use to fill in gaps later.

i.e. After merging this the tag for main becomes 6.x by default, and covers 2.11; so in turn 5.x is for 2.10 and so on.

I think my misunderstanding there also may mean my opinion in change ordering will be different here. Specifically I think we do need to sort out my effort to split charts+image code first. Since having that complete will make us more prepared for the branching once implemented here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants