-
Notifications
You must be signed in to change notification settings - Fork 103
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce High Level MR metrics #683
Conversation
Signed-off-by: Dr. Stefan Schimanski <[email protected]>
Signed-off-by: ezgidemirel <[email protected]>
Signed-off-by: ezgidemirel <[email protected]>
pkg/statemetrics/xr_state_metrics.go
Outdated
func (r *XRStateRecorder) Record(ctx context.Context, gvk schema.GroupVersionKind) { | ||
l := &unstructured.UnstructuredList{} | ||
l.SetGroupVersionKind(gvk) | ||
err := r.client.List(ctx, l) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't list from a client. List from a cache, i.e. pass in a cache here. Listing can be expensive, imagine 10000 objects.
pkg/reconciler/managed/reconciler.go
Outdated
} | ||
|
||
for _, ro := range o { | ||
ro(r) | ||
} | ||
|
||
// State recorder is started in the background to record MR states. | ||
go r.stateRecorder.Run(context.Background(), schema.GroupVersionKind(of)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never leak go routines from a constructor. This must be started outside with an availablectx
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made state recorders Runnable
and registered to manager to manage go routine's lifecycle with it.
pkg/statemetrics/mr_state_metrics.go
Outdated
func (r *MRStateRecorder) Record(ctx context.Context, gvk schema.GroupVersionKind) { | ||
l := &unstructured.UnstructuredList{} | ||
l.SetGroupVersionKind(gvk) | ||
err := r.client.List(ctx, l) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also here: pass a cache please to make sure this does not hit the apiserver. This can be a very heavy operation, listing thousands of MRs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be sufficient to accept client.Reader
and add a comment letting the caller know it must be cache-backed?
I don't feel strongly - mostly wondering if there's some way to do this up at the controller-runtime level of abstraction without moving down into client-go things. Would be nice, but not critical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about passing the following option to the NewManager
function in provider's main.go
?
mgr, err := ctrl.NewManager(ratelimiter.LimitRESTConfig(cfg, *maxReconcileRate), ctrl.Options{
Client: client.Options{Cache: &client.CacheOptions{Unstructured: true}},
...
})
This makes shouldByPassCache
method to return false in the List call.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I no longer use unstructured list, so we'll use client's cache.
7c1aaab
to
54a8995
Compare
Thanks @ezgidemirel! Still needs some work per @sttts's comments but I like this direction much better. |
597a61f
to
535d7fa
Compare
Signed-off-by: ezgidemirel <[email protected]>
Signed-off-by: ezgidemirel <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"gvk"
seems to be misplaced in these instances.
@ezgidemirel Your PR description mentions some |
I was planning to add the state metrics with this PR, but then removed with this commit. I'll update the PR description. |
Signed-off-by: ezgidemirel <[email protected]>
|
||
mrs := mrList.GetItems() | ||
if len(mrs) == 0 { | ||
return nil |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Issue: This early return seems to prevent setting the metric to zero when all resources are deleted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what we can do at this stage. If we don't have any MRs left, we cannot send the value with the right gvk
. I think it's ok to say that, within a time framed monitoring, we'll display the correct metrics. If there are no MRs with a specific GVK in the last 15 mins, users won't see any metrics for that GVK.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @ezgidemirel 🙏
@negz I see all your comments were resolved and we need to get this in to be able ship this functionality with the upcoming provider releases (i.e. tomorrow), so, I'll merge it now. Feel free add more comments if there are still open points and we can get them resolved with a follow up PR.
@mergenci, similarly, feel free to open a follow up PR regarding your comment on early return.
Hi @ezgidemirel I am running crossplane on 1.17 and I still can't see the
|
@eduardo-diaz-em, this is a crossplane-runtime PR. crossplane-runtime is used by Crossplane providers. Your provider version and which version of crossplane-runtime it uses is important, rather than your Crossplane version. For instance, provider-upjet-aws uses crossplane-runtime v1.16.0-rc.2.0. I acknowledge that it would be helpful if the documentation clarified distinction between Crossplane and Crossplane provider metrics. |
I'm using same xpkg.upbound.io/upbound/provider-aws-sqs:v1.9.0 which is using crossplane-runtime v1.16.0-rc.2.0.2024 along with crossplane v1.16 same happens to me, I can see all metrics but not the crossplane_managed_resources_* ones @eduardo-diaz-em managed to get this solved? This is my v1.16 chart config : extraObjects:
|
Please ignore, I found the problem was I was looking at crossplane pod metrics, but not at the provider pods ones :) |
Thanks for the quick follow up @pablomdc - if anyone else stops by this PR, one helpful resource is also this demo to use the high level metrics that I had put together a few months back with help from @ezgidemirel: https://github.com/jbw976/demo-metrics |
Buena esa @pablomdc I was also looking at crossplane pods rather than provider ones. Gracias! |
Description of your changes
This pull request introduces the following high level metrics in the MR reconciler and cherry-picks the relevant commit in this PR.
The
crossplane_managed_resource_time_to_first_reconcile_seconds
andcrossplane_managed_resource_first_time_to_readiness_seconds
metrics are pushed only once for each resource, capturing the initial reconciliation time after creation and the first instance the resource becomes ready, respectively. It's important to note that these metrics won't be recorded in the event of a pod restart. One potential solution to this issue is to introduce an additional status subresource, such as "observations," where these values can be stored persistently.Example provider PR which consumes changes-> crossplane-contrib/provider-kubernetes#224
Fixes #
I have:
make reviewable test
to ensure this PR is ready for review.