Drop metrics CRD #934
Comments
One lesson from v1 was to separate the metrics from the experiment. This reverses that. What has changed? Or do we conclude that it just didn't make anything easier?
This change does not really address a problem with (non-builtin) metrics -- they are hard to define. In part, I think this is because there is no good way to test them. If each is its own object, I can imagine a tool that would let us test just a single metric. An authoring tool of some sort. By folding into an experiment, this seems harder to do. How do we assist users to define them?
If we go down this route, it seems like we should rethink the metrics (sub-)CRD. There would be a lot of repetition in it. For example, can we reuse secret, url, headers, authType, method, (proposed) versionInfo -- can we share it with the existing versionInfo -- even jqExpression? Clearly these can't be shared across all metrics, but is there value in grouping metrics with common properties like these?
Hmm... I agree with @kalantar's latest point, which really is where some thought needs to go in terms of the design. Re: authoring tool, we were discussing #865 in this context. This subcommand can be modified as follows:

iter8ctl debug -e <experiment-name> -m <metric-name> -a <Iter8 analytics URL>

So, I don't think much changes in that regard.
I am ok with leaving this issue open until ... I would really prefer if we can get to an agreement on this issue: nail the design for reuse of secrets, url, headers, authType, method, etc., and absorb it in ...
Yes, this is indeed my conclusion. I think the CRD fields for metrics are beautiful :) But I am not at all sure a separate CRD is helping... in fact, I don't think it is. Hence, the case for ditching the CRD and retaining the fields. Separating them out this way also enabled us to evolve the metric fields much quicker (with minimal changes to the controller), with much of the heavy lifting for metrics being done by the analytics service. But I think we do have an excellent handle now on the actual fields needed to fetch metrics (with examples for 4 different metrics DBs & builtins & mocks).
Re: metrics specification... a better attempt to factor out common fields is below.

metricDefaults: # optional; default values for metrics
  # <everything in the metrics CRD, except name, type, and versionInfo, can go here... all these fields are optional>
  # <the idea is to default any field in metrics that can be defaulted>
metrics:
- name: ...
  # <exactly the same set of fields as in the metrics CRD>
  # if a field is repeated here and in metricDefaults, it overrides the default
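For concreteness, a minimal sketch of defaults plus a per-metric override, assuming a Prometheus backend; the URL, secret, jqExpression, and query strings below are illustrative assumptions, not part of the proposal above.

metricDefaults:
  provider: prometheus
  url: http://prometheus.istio-system:9090/api/v1/query # assumed in-cluster Prometheus URL
  method: GET
  authType: Basic
  secret: iter8-system/prom-credentials # hypothetical secret
  jqExpression: .data.result[0].value[1] | tonumber
metrics:
- name: mean-latency
  type: gauge
  params:
  - name: query
    value: some prometheus latency query with $name and $namespace
- name: error-count
  type: gauge
  method: POST # overrides the default GET for this metric only
  params:
  - name: body
    value: some prometheus POST query with $name and $namespace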
My impression is that we will need something like:

metricsGroup:
- url: ... # <defaults for everything in the metrics CRD, except name, type, and versionInfo, can go here>
  metrics:
  - name: ...
- ... # another metricsGroup

to support the use of 2 metrics backends... which, it is my impression, is often the case for A/B testing: newrelic (reward) and prometheus (SLO validation). Is this not true?
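A rough sketch of the two-backend A/B case described above (newrelic for the reward, prometheus for SLO validation); the endpoints, secrets, and queries are assumptions for illustration only.

metricsGroup:
- provider: newrelic
  url: https://insights-api.newrelic.com/v1/accounts/<account-id>/query # assumed endpoint
  authType: APIKey
  secret: iter8-system/newrelic-token # hypothetical secret
  metrics:
  - name: user-engagement # reward metric
    type: gauge
    params:
    - name: nrql
      value: some NRQL query scoped to $name
- provider: prometheus
  url: http://prometheus.istio-system:9090/api/v1/query # assumed in-cluster URL
  metrics:
  - name: mean-latency # SLO validation metric
    type: gauge
    params:
    - name: query
      value: some prometheus latency query with $name and $namespace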
How about the following? Same idea as yours, with some name changes.

backends:
# in the default stanza, we expect to see the same fields as in the metrics CRD, except name, type, versionInfo
- default: <defaults specific to this backend>
  # in each metric, we only allow name, type, and versionInfo -- so no concept of overriding anymore
  metrics: <metrics using this backend>

In this design, if you have different defaults for metrics with the same metrics URL (an extremely unlikely scenario), it is still supported; you simply list them as two different backends.
Please see #942 (comment) for an example.
If we are saying this supersedes #930, we should indicate here that we still want to shorten the names of the fields.
Note: This issue also needs to address all concerns from #930. In particular, co-locate ...
kind: experiment
spec:
  metrics: # required
    # anywhere versionInfo appears, it is a list whose length needs to match the length of versionNames
    # name and namespace will be commonly used by Iter8's builtin metrics
    # interpretation of name and namespace is metric specific
    # conventions for the metrics / configmap yet to be sorted out
    versionInfo: # required
    - namespace:
      iter8-mock/user-engagement: "15.0"
      iter8-mock/mean-latency:
    - namespace:
      mean-latency:
  criteria:
    reward:
    - metric: iter8-mock/user-engagement
      preferredDirection: high # low for reward metrics
    objectives:
    - metric: iter8-openshift-route/mean-latency
      upperLimit: 200 # msec
  versionNames:
  - current
  - candidate

... params, body ...

kind: configmap
data:
  backends:
  - name: iter8-openshift-route
    secret: namespace/name # of the secret
    url: # take out templating for url for now
    headers: # are templates
    jqExpression:
    provider:
    method:
    authType:
    metrics:
    - name: mean-latency
      type: gauge
      params:
      - name: param
        value: |
          some complicated prometheus query with things like $name, $namespace, $serviceName, $elapsedTime
    - name: mean-latency-other
      type: gauge
      params:
      - name: body
        value: |
          some complicated prometheus POST query with things like $name, $namespace, $serviceName, $elapsedTime
Today ...
We can drop sampleSize. It was a noble concept when conceived (intended for use alongside confidence / other measures); we simply haven't used it yet (and it is adding complexity without value at the moment).
When we discussed this issue, we indicated that fields of a ...
Should ...
Except ...
You should be able to override everything within the experiment (or define new backends / metrics not pre-defined).
Given that we are not really going to be able to get rid of ...
kind: configmap
data:
  backends:
  - name: iter8-openshift-route
    secret: namespace/name # of the secret
    url: # take out templating for url for now
    headers: # are templates
    jqExpression:
    provider:
    method:
    authType:
    metrics:
    - name: mean-latency
      type: gauge
      params:
      - name: param
        value: some complicated prometheus query with things like $name, $namespace, $serviceName, $elapsedTime
        # stuff like $name, $namespace, and $serviceName will come from the `versionInfo` section of this metric in the experiment
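To make the substitution concrete, here is a hedged sketch of what the corresponding versionInfo might look like on the experiment side; the placement under metrics and the concrete values (my-app, prod, the service names) are illustrative assumptions, and the exact placement of versionInfo is still debated later in this thread.

spec:
  versionNames:
  - current
  - candidate
  metrics:
    versionInfo: # a list whose length matches the length of versionNames
    - name: my-app # substituted for $name when querying the current version
      namespace: prod # substituted for $namespace
      serviceName: my-app-svc # substituted for $serviceName
    - name: my-app-candidate
      namespace: prod
      serviceName: my-app-candidate-svc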
Now that I look at the definition of ...
I am unsure that I like the use of ...
Sorry... I am looking at your example (and see others) with ...
We do not make any assumptions in terms of breakdown between configmap and experiment metrics...
As far as the schema goes... for both, we will use the same schema as above (a list of backends, each with a list of metrics).
Can there be more than one configmap? Do we use a fixed name for the configmap? Or is there a reference in the experiment? Or via an annotation in the configmap?
Could we please start with a single configmap that the analytics service is bootstrapped with to begin with? The name of the configmap can be fixed in analytics' deployment spec (see the sketch below). Definitely no reference to the configmap in the experiment to start with (if we feel the need for it, we can annotate the experiment subsequently). The idea is that Iter8 will come with a few pre-baked backends and metrics for many domains which the user can start using (with almost no changes, except perhaps the URL). Of course, if the user wants to extend the backends or metrics within a given backend, they can do so with individual experiments to begin with, and change the configmap subsequently (we can document this process). Moved ...
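A hedged sketch of how a fixed configmap name could be wired into the analytics deployment; the names here (iter8-analytics, iter8-metrics-config, the mount path) are hypothetical and not defined by this issue.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: iter8-analytics # hypothetical deployment name
  namespace: iter8-system
spec:
  selector:
    matchLabels: {app: iter8-analytics}
  template:
    metadata:
      labels: {app: iter8-analytics}
    spec:
      containers:
      - name: iter8-analytics
        image: iter8/iter8-analytics # hypothetical image
        volumeMounts:
        - name: metrics-config
          mountPath: /config/metrics # analytics reads backend / metric definitions from here
      volumes:
      - name: metrics-config
        configMap:
          name: iter8-metrics-config # the single configmap, fixed at install time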
It also needs to be configured in iter8ctl.
We should also have it configured in etc3; it should be validated if users may modify it. The sooner a problem is found, the better.
The validation can happen in analytics. I see no reason why every Iter8 component needs to be exposed to this configmap (at least to begin with). If the configmap is invalid, analytics will simply not start (and will quit with an appropriate error message). Analytics already does the heavy lifting when it comes to (fetching) metrics. It can do so when it comes to validation of metrics definitions (in the configmap) as well. When ... Validation in Python will use Pydantic models.
At least initially, the configmap will be an install-time artifact. So, the problem will be found at the soonest possible instant -- as soon as Iter8 is installed. Eventually, we can decide how to let users edit the configmap more dynamically (if we find that this is what users actually want). The user always has the option of ignoring / overriding what is in the configmap and directly specifying the metrics conf in the experiment (which is typed and validated at the CRD level); a sketch of such an override is below.
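A hedged sketch of an in-experiment override, reusing the backends / metrics shape discussed above; the backend name, URL, query, and limit are illustrative assumptions.

kind: experiment
spec:
  backends: # defined inline; ignores / extends whatever the configmap provides
  - name: my-prometheus # hypothetical backend not present in the configmap
    url: http://prometheus.monitoring:9090/api/v1/query # assumed URL
    provider: prometheus
    method: GET
    jqExpression: .data.result[0].value[1] | tonumber
    metrics:
    - name: p95-latency
      type: gauge
      params:
      - name: query
        value: some prometheus quantile query with $name and $namespace
  criteria:
    objectives:
    - metric: my-prometheus/p95-latency
      upperLimit: 500 # msec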
We will also defer validation that any criteria's reference to a metric is valid to analytics?
Yes, that's correct. Analytics can simply complain (in an ...). In other words, when the user does ... Anyway, this is one of the use-cases we identified for ...
This doesn't seem intuitive, especially if it is a configmap. Perhaps that is just a poor choice and it would be only in an experiment. From closest to furthest use. The closest would be in the reward/objective itself:

criteria:
  rewards:
  - metric:
    versionInfo:
  objectives:
  - metric:
    versionInfo:

It is my observation that if there is more than 1 reward/criteria, it is likely that much of the ...

criteria:
  versionInfo:
  rewards:
  objectives:

If we define metrics in the experiment, it may make sense to include versionInfo here. In particular, we would expect the use of the variable to be here in the definition of the metric. When this is in the experiment, it makes sense. However, when in a configmap, less so, since we don't know what the versions are. A scenario that uses fixed names would work though.

metrics:
  versionInfo:
- name: metric_name
  type: Gauge
  params:
- name: another_metric
criteria:
  rewards:
  objectives:

Furthest is the ... Does this make sense? I am proposing to not include them in reward/objective but do allow in criteria and metrics.
If you take a specific backend, for example, the Prometheus add-on supplied by Istio, and look at the metrics exported by it, there are specific label values associated with different versions. And this association is the same across metrics exported by Istio. For example, you will see ... The above situation is the same with Linkerd, Sysdig (used in Code Engine), KFServing (the Prometheus add-on I created for the KFServing project), and so on. The expectation is that ...

metrics:
  versionInfo:
- name: metric_name
  type: Gauge
  params:
- name: another_metric

Above yaml seems to be mixing a list with a map key, which is invalid (unless the idea is to put metric definitions under versionInfo, which is way too complicated). I am assuming the intent is more like ...

metrics:
  versionInfo: <a map here>
  metricInfo:
  - name: metric_name
    type: Gauge
    params:
  - name: another_metric

This is conceptually equivalent to putting the versionInfo under backends, with the only difference being that the user has more typing to do and the overall CR looks more complicated because of the needless extra field under metrics (which used to be simply a list of metrics). IMO, ... IMO, none of the above discussion changes whether we use a configmap or a CRD for describing backends. The discussion re: where we put versionInfo within the experiment remains the same.
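To illustrate the point about version-identifying label values being shared across all metrics from a backend, here is a hedged sketch with versionInfo placed under the backend; the Istio-style labels (destination_workload, destination_workload_namespace) and the values are assumptions for illustration.

backends:
- name: istio-prometheus # hypothetical backend name
  url: http://prometheus.istio-system:9090/api/v1/query # assumed URL
  versionInfo: # one entry per version; shared by every metric fetched from this backend
  - destination_workload: my-app-v1
    destination_workload_namespace: prod
  - destination_workload: my-app-v2
    destination_workload_namespace: prod
  metrics:
  - name: mean-latency
    type: gauge
    params:
    - name: query
      value: some prometheus latency query using $destination_workload and $destination_workload_namespace
  - name: error-rate
    type: gauge
    params:
    - name: query
      value: some prometheus error-rate query using the same labels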
Is your feature request related to a problem? Please describe the problem.
Iter8 has two CRDs at the moment, experiment and metric. Here are the disadvantages of two CRDs (see below for the lone advantage): ...
Describe the feature/solution you'd like
Note: The metric objects below will mirror the structure of the new metrics CRD as described in #933 (i.e., they will include versionInfo). They will not, however, include sampleSize; it can be reintroduced at a later stage (non-breaking CRD changes are always ok).
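The inline example referenced above did not survive extraction; the following is a hedged sketch of what such inline metric objects could look like, folding the fields discussed in this thread (provider, url, jqExpression, params, versionInfo) into the experiment spec and omitting sampleSize. The concrete values are assumptions.

kind: experiment
spec:
  metrics:
  - name: mean-latency
    type: gauge
    provider: prometheus
    url: http://prometheus.istio-system:9090/api/v1/query # assumed URL
    method: GET
    jqExpression: .data.result[0].value[1] | tonumber
    params:
    - name: query
      value: some prometheus latency query with $name and $namespace
    versionInfo: # one entry per version, in the order of versionNames
    - name: my-app
      namespace: prod
    - name: my-app-candidate
      namespace: prod
  criteria:
    objectives:
    - metric: mean-latency
      upperLimit: 200 # msec
  versionNames:
  - current
  - candidate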
Will this feature/solution bring new benefits for Iter8 users? What are they?
See the problems created by two CRDs, listed above.
Will this feature/solution bring new benefits for Iter8 developers? What are they?
Simplifies code in all Iter8 components. Metrics will no longer be in the `status` section of the experiment, but in `spec`.
Does this issue require a design doc/discussion? If there is a link to the design document/discussions, please provide it below.
Yes, see above. There is one disadvantage of this approach: the experiment spec just got bigger. Considering that the metrics are simply moving from status to spec, it is not really that big of a disadvantage.
How will this feature be tested?
How will this feature be documented?
Additional context
Assuming this feature makes it into `v2alpha3`, we can close #912 without having to change anything. Iter8 install will automatically become simplified (one-liner) because of this change, because there won't be any builtin metrics objects to install anymore.