-
Notifications
You must be signed in to change notification settings - Fork 240
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for optional cluster type labels to cluster deployment and cluster provision metrics #1925
Conversation
/retest |
pkg/controller/clusterdeployment/clusterdeployment_controller.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Configs to hiveConfig can be made as following:
"spec": {
"logLevel": "debug",
"metricsConfig": {
"metricsWithClusterTypeLabels": {
"managed_vpc": "api.openshift.com/managed-vpc",
"private_link": "api.openshift.com/private-link",
"sts": "api.openshift.com/sts"
}
},
"targetNamespace": "hive"
},
Screenshot of example metric to ensure it works well with and without optional labels being provided:
@2uasimojo ready for review
Codecov Report
Additional details and impacted files@@ Coverage Diff @@
## master #1925 +/- ##
==========================================
+ Coverage 40.50% 40.71% +0.20%
==========================================
Files 366 367 +1
Lines 34090 34239 +149
==========================================
+ Hits 13809 13939 +130
- Misses 19077 19091 +14
- Partials 1204 1209 +5
|
/retest |
/test e2e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't delve into the low-level details yet -- can do that once we've agreed on the more major things discussed inline.
pkg/controller/clusterdeployment/clusterdeployment_controller.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2uasimojo I think this addresses all of your comments. PTAL
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Partial review, will continue tomorrow.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another chunk of comments. Still not done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, finally done! Sorry it took so long!
pkg/controller/clusterdeployment/clusterdeployment_controller.go
Outdated
Show resolved
Hide resolved
pkg/controller/clusterdeployment/clusterdeployment_controller.go
Outdated
Show resolved
Hide resolved
// Register the metrics to avoid panics during testing | ||
registerMetrics(mapClusterTypeLabelToValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So actually, only TestClusterDeploymentReconcile
(the first suite) panics if metrics aren't registered.
and
You can register them via this call at the top of that function and it works fine.
so
If you intend to add any kind of UT around the metrics configuration/labeling, you would want to be able to tailor the config accordingly in that test -- which you could then do by passing in a custom map
(or, if taking the suggestion above, a custom MetricsConfig
).
If you're not planning on adding that UT now, perhaps just beef up the comment here so future-we don't have to rediscover the above.
That said, I think you could -- and should -- add some unit tests for this PR. This would entail generating mocks for the following interfaces:
- Observer (for
HistogramVec.Observe()
) - ObserverVec (for
HistogramVec.CurryWith()
) - Counter (for
CounterVec.Inc()
) - Aww, shiiit, really?
CounterVec.CurryWith()
isn't associated with an interface??
Welp, I guess you can't UT the CounterVecs 🤷 (I even checked whether a more recent version of the prom golang client lib might have added that interface, but nope. This seemed such a silly omission that I opened an issue for it.)
I mean, I guess we could write our own interface in the hive project. We do this e.g. for the AWS client.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metrics do not need to be registered, but they have to be defined at least - because code never checks whether a metric is registered, it always attempts to observe it, and prometheus libraries handle the rest.
I've changed the wording a bit, and I do intend to implement the unit testing in a follow-up PR
pkg/controller/clusterprovision/clusterprovision_controller_test.go
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2uasimojo Ready for review!
pkg/controller/clusterdeployment/clusterdeployment_controller.go
Outdated
Show resolved
Hide resolved
pkg/controller/clusterdeployment/clusterdeployment_controller.go
Outdated
Show resolved
Hide resolved
// Register the metrics to avoid panics during testing | ||
registerMetrics(mapClusterTypeLabelToValue) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metrics do not need to be registered, but they have to be defined at least - because code never checks whether a metric is registered, it always attempts to observe it, and prometheus libraries handle the rest.
I've changed the wording a bit, and I do intend to implement the unit testing in a follow-up PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me know if you want to live chat about the threading concern.
d210af8
to
ab05f82
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2uasimojo Ready for review!
PS: I squashed some of the commits, but I did push my changes before the squash - for ease of review
/test e2e ...while I review this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very, very close.
I was gonna say we can land it as is and address my comments in a followon... but the pointer-to-the-map thing would actually affect the API, so we have to get it right out of the gate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2uasimojo Ready for review!
} | ||
} | ||
|
||
// getLabelList combines fixed and optional labels and returns the list without duplicates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While doing this, I thought about how it would be easier to have an early feedback - if a user tries to cause an overlap between these labels. However, there is not simpler way to do it, and we wouldn't want to error out because of metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so a couple things:
- I think we should do this check. We could do it during
Register()
, or we could do it from withinNew*VecWithDynamicLabels()
itself. - ...and I think we should panic if there is an overlap (MustRegister() panics if something is wrong, so it fits the pattern).
- Since it is going to be possible for a hive upgrade to trigger this panic (we add a fixed label that the user already had configured) we should make the message very clear: "HiveConfig.Spec.AdditionalClusterDeploymentLabels[%q] conflicts with a fixed label for metric %s. Please rename your label." kind of thing.
- You may want to use the convenient Set class provided by apimachinery rather than rolling your own via
map[string]bool
.
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a lovely iteration. I think we're really getting somewhere with the design of the objects/methods.
Most of my comments suggest refactoring, consolidation, simplification, cleanup. However, the couple of things still holding this from merging are:
- We need to decide whether we ever allow the empty string as a label value. You have previously asserted that that somehow risks defining a metric with too few labels, and we agreed to always default.
- I like the idea of panicking if the user tries to configure an optional label with the same key as a fixed one.
See inline for details.
Thanks!
pkg/controller/clusterdeployment/clusterdeployment_controller.go
Outdated
Show resolved
Hide resolved
} | ||
} | ||
|
||
// getLabelList combines fixed and optional labels and returns the list without duplicates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so a couple things:
- I think we should do this check. We could do it during
Register()
, or we could do it from withinNew*VecWithDynamicLabels()
itself. - ...and I think we should panic if there is an overlap (MustRegister() panics if something is wrong, so it fits the pattern).
- Since it is going to be possible for a hive upgrade to trigger this panic (we add a fixed label that the user already had configured) we should make the message very clear: "HiveConfig.Spec.AdditionalClusterDeploymentLabels[%q] conflicts with a fixed label for metric %s. Please rename your label." kind of thing.
- You may want to use the convenient Set class provided by apimachinery rather than rolling your own via
map[string]bool
.
func NewCounterVecWithDynamicLabels(counterOpts *prometheus.CounterOpts, | ||
labelOptions ...func(labels *dynamicLabels)) *CounterVecWithDynamicLabels { | ||
counterVecMetric := &CounterVecWithDynamicLabels{ | ||
CounterOpts: counterOpts, | ||
dynamicLabels: &dynamicLabels{}, | ||
} | ||
for _, o := range labelOptions { | ||
o(counterVecMetric.dynamicLabels) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This paradigm seems overcomplicated to me. Why not just
func NewCounterVecWithDynamicLabels(counterOpts *prometheus.CounterOpts, | |
labelOptions ...func(labels *dynamicLabels)) *CounterVecWithDynamicLabels { | |
counterVecMetric := &CounterVecWithDynamicLabels{ | |
CounterOpts: counterOpts, | |
dynamicLabels: &dynamicLabels{}, | |
} | |
for _, o := range labelOptions { | |
o(counterVecMetric.dynamicLabels) | |
} | |
func NewCounterVecWithDynamicLabels(counterOpts *prometheus.CounterOpts, | |
fixedLabels []string, optionalLabels map[string]string) *CounterVecWithDynamicLabels { | |
counterVecMetric := &CounterVecWithDynamicLabels{ | |
CounterOpts: counterOpts, | |
dynamicLabels: &dynamicLabels{ | |
fixedLabels: fixedLabels, | |
optionalLabels: optionalLabels, | |
}, | |
} |
Then we don't need the With*()
functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused - didn't you want the Opts pattern here so we can allow for future expansion of this interface if needed? Or did I just completely misunderstand why Opts pattern was needed at all?
Nevertheless, I'm reverting to the original way which you suggest here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the original ask, you only asked for more of a CounterOpts
and HistogramOpts
, and when I was learning about the Opts pattern and how to implement it, I decided to allow for future expansion by implementing a more flexible option. Anyways, I'm over-explaining here. Implemented your change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I see. I hadn't been thinking about the options pattern. Now that I've looked at it in that light, I get the advantages. In particular, if we wanted to add fields to our fancy structure, we would have to add arguments to the constructor and then change all the calls to it. In this case, short term loss (more, and more complicated, code) but long term gain (easier to enhance in the future). If you want to put it back in a fup, I'm down. But for now this is good.
This commit adds labels to indicate if cluster is of type STS, Private-Link or Managed-VPC for ClusterProvision controller metrics
This commit adds labels to indicate if cluster is of type STS, Private-Link or Managed-VPC for ClusterDeployment controller metrics. These labels are added only to those metrics which already report "cluster_type" label.
Allow users to add more cluster-type labels for metrics logging. These labels should be applied to all ClusterDeployment and ClusterProvision metrics that are logged per cluster-type. Optional labels can be provided via metricsConfig section in the hiveConfig. Users can name the label on the metric and assign it to a corresponding clusterDeployment (or provision) label. - While working on these changes, 2 metrics from ClusterProvision controller (metricClusterProvisionsTotal and metricInstallErrors) were moved to clusterProvision\metrics to collate all the relevant metrics and support was added for optional labels. - Registering of metrics was moved later instead of init() time because the latter is executed before we could read the hiveConfig. Consequently the metrics needed to be explicitly registered for unit test purposes to avoid nil panics
- Create a new interface for metrics with dynamic/optional labels and implement observing methods to simplify declaring and logging of the metric - Make cluster_type label a part of optional label logic, so it can eventually be made opt-in via hiveconfig - Rename MetricsWithClusterTypeLabels to AdditionalClusterDeploymentLabels - Always add optional labels (while currying) in sorted order - Remove DefaultClusterType constant from clusterdeployment_types because it was only used at one place that was removed as a part of this commit - Add documentation for metrics - Rename files and package names as per Golang naming conventions - Add Hive Operator metrics in documentation - Add omitempty to metricsConfig fields - Set cluster_type only if config is empty - Ensure metric writes/observes are protected from race conditions
This commit implements some of the changes recommended based on PR comments. Main change is applying Opts pattern to the interface for metrics with dynamic labels and apply a buildLabels helper to ensure there is no overlap between fixed and optional labels as we define and observe the corresponding metrics. A side effect to the buildLabels helper is that we no longer need to build a curried vector for a metric before observing it - we instead build the map for labels to value and observe them via `With` (or `GetMetricWith`) method, which is similar to how we observe metrics without optional label support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@2uasimojo Ready for review
func NewCounterVecWithDynamicLabels(counterOpts *prometheus.CounterOpts, | ||
labelOptions ...func(labels *dynamicLabels)) *CounterVecWithDynamicLabels { | ||
counterVecMetric := &CounterVecWithDynamicLabels{ | ||
CounterOpts: counterOpts, | ||
dynamicLabels: &dynamicLabels{}, | ||
} | ||
for _, o := range labelOptions { | ||
o(counterVecMetric.dynamicLabels) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused - didn't you want the Opts pattern here so we can allow for future expansion of this interface if needed? Or did I just completely misunderstand why Opts pattern was needed at all?
Nevertheless, I'm reverting to the original way which you suggest here.
func NewCounterVecWithDynamicLabels(counterOpts *prometheus.CounterOpts, | ||
labelOptions ...func(labels *dynamicLabels)) *CounterVecWithDynamicLabels { | ||
counterVecMetric := &CounterVecWithDynamicLabels{ | ||
CounterOpts: counterOpts, | ||
dynamicLabels: &dynamicLabels{}, | ||
} | ||
for _, o := range labelOptions { | ||
o(counterVecMetric.dynamicLabels) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I checked the original ask, you only asked for more of a CounterOpts
and HistogramOpts
, and when I was learning about the Opts pattern and how to implement it, I decided to allow for future expansion by implementing a more flexible option. Anyways, I'm over-explaining here. Implemented your change.
pkg/controller/metrics/metrics.go
Outdated
if typeStr, ok := obj.GetLabels()[hivev1.HiveClusterTypeLabel]; ok { | ||
// GetLabelValue returns the value of the label if set, otherwise a default value. | ||
func GetLabelValue(obj metav1.Object, label string) string { | ||
if typeStr, ok := obj.GetLabels()[label]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, so I do not remember when I discovered a problem - I distinctly remember us debating what a default value should be (like N/A or unknown). Nevertheless, I decided to track down the actual problem in the library because just 1 person remembering something like this is not ideal for future-proofing.
Okay, so With
or WithLabelValues
internally calls GetMetricWith
from vec.go, and the problem comes when getOrCreateMetricWithLabels
creates the map with size equivalent to labels
passed instead of reading from metric.Desc
to determine the correct size (Note - even the extractLabelValues
only relies on labels provided)
TL;DR
When the metric is created, it relies on the labels provided instead of metric definition as the dimension for creating the map. So if you provide empty string as label, it ignores it and creates the resultant metric map with less dimensions, consequently leading to always dropping some labels while observing.
Maybe With
might not lead to such a problem - WithLabelValues
definitely does. But if we know that happens, we don't want to treat them differently as a safety.
PS: These are the closest relates issues/discussions I found related to this topic:
variable labels
missing labels
flexible metric exposition
pkg/controller/metrics/metrics.go
Outdated
if typeStr, ok := obj.GetLabels()[hivev1.HiveClusterTypeLabel]; ok { | ||
// GetLabelValue returns the value of the label if set, otherwise a default value. | ||
func GetLabelValue(obj metav1.Object, label string) string { | ||
if typeStr, ok := obj.GetLabels()[label]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay; so then my comment still applies. If the label exists but happens to have an empty value, this code as written will cause it to be observed with an empty value rather than our default.
Now, based on the code you pointed to, it seems to me like that would be okay, because we would actually be passing in the empty string as an argument, as opposed to omitting the argument entirely, so the number of dimensions would still be correct. And if that's true, I would argue it's the appropriate thing to do: i.e. there's a difference in behavior for the user between the CR lacking the label ("unspecified") and the label existing with an empty value (the value is specified, and it's ""
).
Whichever way we decide to do it, it should be consistent. Right now the fixed labels always convert ""
to the default, but the dynamic ones don't.
}, | ||
} | ||
if repeatedLabel := counterVecMetric.getRepeatedLabel(); repeatedLabel != "" { | ||
log.Errorf("HiveConfig.Spec.AdditionalClusterDeploymentLabels[%q] conflicts with a fixed label for the metric %s. Please rename your label.", repeatedLabel, counterOpts.Name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to make this fatal, right?. (And if you turn this into a panic
, you don't need the log
arg.)
return finalList | ||
} | ||
|
||
// getRepeatedLabel detects and returns the first optional label it encounters, that overlaps with another label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This would be more useful if it returned all repeated labels so the consumer doesn't have to iterate. Can be done in a fup though.
Beautiful! /lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: 2uasimojo, suhanime The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
1 similar comment
/retest |
@suhanime: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
In addition to the cluster-type label, it would be helpful for users to be able to provide their own set of labels (like STS, Private Link, Managed VPC etc) to look for while logging metrics. Allow support for such optional labels for metrics of ClusterDeployment and ClusterProvision controller
xref: https://issues.redhat.com/browse/HIVE-2045
/assign @2uasimojo