
Common metrics registry #1832

Closed
na-- opened this issue Feb 1, 2021 · 7 comments · Fixed by #2071
Labels: breaking change, bug, enhancement, high prio, refactor

na-- commented Feb 1, 2021

Currently, k6 metrics (both built-in and custom ones), as well as any measurements that are made for them, are very decoupled from the places that consume them in the Engine (i.e. thresholds, the end-of-test summary, and outputs). At first glance, this might seem like a good architectural choice, but the reality is that instead of being loosely coupled, they are very tightly coupled at a distance. This brings far more issues than it solves, if it even solves any... 😞

With the current architecture, k6 doesn't "know" all of the built-in metrics that actually exist. They are defined as global variables in lib/metrics/metrics.go, but they are unconnected to anything else. And while there's a separate issue just to refactor that ugliness away (#572), not knowing the set of built-in metrics at the time the init context is executed (and custom metrics can be defined) makes it possible for users to define custom metrics with the same names as the built-in ones, but with different metric and value types.

Compounding the issue, custom metrics themselves are quite problematic. There is absolutely no validation (#1435), so custom metrics from different VUs (or even the same VU) can redefine both built-in metrics and other custom metrics with different metric and value types! Currently, this is not going to throw an exception in k6:

import { Rate, Trend, Gauge } from 'k6/metrics'

let metric1 = new Rate('same_metric_name');
let metric2 = new Trend('same_metric_name', true);
let metric3 = new Trend('same_metric_name', false);
let metric4 = new Gauge('http_req_duration'); // same name as built-in metric, but completely different type! 

The metrics constructor doesn't really do validation or create anything in a centralized place. It just returns something like a metric "template" that is attached to every metric Sample later generated by the JS .add() method. So we don't really know the custom metrics and their types when the Engine starts running the thresholds. It only realizes a metric exists when it "sees" a sample from it for the first time, and copies whatever "template" it first sees for the metric name 😭 https://github.com/loadimpact/k6/blob/46d53d99deba98add23079df93579408fa7ea094/core/engine.go#L406-L413
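
To make that more concrete, here is a rough sketch (with made-up, simplified types, not the actual engine code) of that "first template wins" behavior:

type Metric struct {
    Name string
    Type string // e.g. "gauge" or "trend", simplified for the example
}

type Sample struct {
    Metric *Metric // the "template" attached by the JS .add() method
    Value  float64
}

type Engine struct {
    Metrics map[string]*Metric
}

// The Engine only learns that a metric exists when it sees the first Sample
// carrying its template; whatever template arrives first "wins", regardless
// of what other VUs (or the built-in metrics) say its type should be.
func (e *Engine) processSamples(samples []Sample) {
    for _, s := range samples {
        if _, ok := e.Metrics[s.Metric.Name]; !ok {
            e.Metrics[s.Metric.Name] = s.Metric
        }
        // ... the sample value is then aggregated into that (possibly
        // mismatched) metric's sink
    }
}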

So, not only will this not produce an error:

import http from 'k6/http';
import { Gauge } from 'k6/metrics'

let blah = new Gauge('http_req_duration');

export default function () {
    blah.add(100); // mess up everything
    http.get("https://httpbin.test.k6.io/");
    http.get("https://httpbin.test.k6.io/");
}

It will also produce a messed-up summary like this:

...
     http_req_connecting........: avg=70ms     min=0s       med=70ms     max=140.01ms p(90)=126.01ms p(95)=133.01ms
     http_req_duration..........: 204.468661 min=100 max=410.000517
     http_req_receiving.........: avg=213.77µs min=196.04µs med=213.77µs max=231.49µs p(90)=227.95µs p(95)=229.72µs
...

The current architecture is blocking all of these issues and probably others as well:

I've opened this issue as the first prerequisite step for solving all of the ones listed above. I think the way to do that would be to have something like a single central "metrics registry" in k6. A list (or, rather, a map) of metric names with their types - both the built-in ones and all custom ones.

Any attempts at re-defining built-in or previously defined custom metrics should check the metric and value types of the new metric. If they don't match the already existing metric, an exception should be raised. And if they do match, instead of creating a new object, we should just return a *Metric pointer to the already existing metric object. To reduce undefined behavior even further, and to save us from having to ensure thread-safety, we should probably restrict the creation of custom metrics only to the initial init context (from which we get the exported script options).
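
As a rough illustration of those semantics (the Registry and NewMetric names here are hypothetical, and the Metric fields are assumed to mirror the existing Name/Type/Contains ones in the stats package), registration could look something like this:

type Registry struct {
    metrics map[string]*stats.Metric
}

func (r *Registry) NewMetric(name string, typ stats.MetricType, valueType stats.ValueType) (*stats.Metric, error) {
    if old, ok := r.metrics[name]; ok {
        if old.Type != typ || old.Contains != valueType {
            // re-definition with a different metric or value type is an
            // error, which the JS layer can turn into an exception
            return nil, fmt.Errorf("metric '%s' already exists, but with a different type", name)
        }
        return old, nil // same definition, just return the existing object
    }
    m := stats.New(name, typ, valueType)
    r.metrics[name] = m
    return m, nil
}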


mstoykov commented Jun 3, 2021

A thing that isn't mentioned in the above is that, now that k6 has extensions, extensions might also need to emit metrics, and currently their metrics get defined in the same global way as the built-in ones (roughly as in the sketch below).

This and the connected issues more or less require that we break this, and all extensions depending on it, unless we want stats.New to keep adding to some global instance. I guess that's fine for the k6 application itself, but it will be kind of hard to test and is IMO a bad design decision, so this seems like it will require some additions to the extension support.
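
For context, an extension today defines its metrics roughly like this (from memory, so the exact call may differ slightly): a package-level global created via stats.New, with no validation against anything else:

import "go.k6.io/k6/stats"

// Package-level global, defined the same way as k6's own built-in metrics.
var MyExtensionMetric = stats.New("my_extension_metric", stats.Counter)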

WDYT @na-- @imiric


imiric commented Jun 3, 2021

I think it's fine to break this since a) it was never documented, and b) this is why xk6 is still considered to be in an alpha state ;) So I think we're in the clear to make changes like this that will ultimately result in a cleaner and safer API.


na-- commented Jun 3, 2021

Agree with @imiric, plus the breaking change won't be that big, especially compared to the module path change. IIRC, only a few extensions use custom metrics for now, so the sooner we make a better API for that, the better.


mstoykov commented Jun 9, 2021

I did look at the Prometheus code, more specifically at the Go client and the registry there, since that seems to be what the rest of Prometheus's Go code uses.

Some overview notes from what I understood:

  1. The registry has 2 relevant methods:
    1.1 Register (interface), which registers a new Collector (more on that later).
    1.2 Gather (interface), which goes through all the Collectors and calls Collect on them, providing a channel on which they send metrics; Gather then writes those into a common Metric struct (which is protobuf-tagged and seems to be mostly designed to be serialized to protobuf). There are additional Descriptions of the metrics being passed around, but that is beside the point. In the end, Gather returns a slice of structs containing the metric families, which in turn contain the Metric structs that were "written". The Metric type itself has pointers to the 5 types of metrics (Counter, Gauge, Histogram, Summary, Untyped), and whichever one is populated determines what the metric is. (See the sketch after this list.)
  2. Collectors are, in general, what the actual metric types are; they have both the Collect method and a Write method, which writes them to (the correct field in) the common Metric struct from above. They additionally have specific methods to change their values.
  3. Prometheus doesn't have Trends; it has histograms, which are not the same thing. It also has Summary, which uses an external streaming quantile estimation package. There are some examples of how it is not very accurate and apparently not aggregatable in Prometheus, which I think is a deal breaker for k6, as I don't think we want to drop our Trends.
  4. It also doesn't have composite metrics such as the HTTP Trail one, although that can probably be done ... somehow 🤷, though I guess not with this exact same setup, as the Metric type can only have 1 thing inside (and the HTTP Trail has multiple Trends + a counter + a rate).
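
For reference, the Register/Gather flow from points 1 and 2 looks roughly like this when using client_golang directly (simplified, with most error handling skipped):

package main

import (
    "fmt"

    "github.com/prometheus/client_golang/prometheus"
)

func main() {
    // Register: the registry only knows about the Collectors it was given.
    registry := prometheus.NewRegistry()
    requests := prometheus.NewCounter(prometheus.CounterOpts{
        Name: "my_requests_total",
        Help: "Total requests made.",
    })
    registry.MustRegister(requests)

    requests.Add(3)

    // Gather: walks all registered Collectors, collects their metrics over a
    // channel and returns them as protobuf-tagged MetricFamily structs;
    // unregistered metrics are simply never collected.
    families, err := registry.Gather()
    if err != nil {
        panic(err)
    }
    for _, mf := range families {
        fmt.Println(mf.GetName(), mf.GetMetric()[0].GetCounter().GetValue())
    }
}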

Given the above, the direct usage of the prometheus code seems impossible without either big changes to it (defeating the point) or breaking changes to k6 ...

On the other hand, the architecture looks interesting, but I would really prefer it if at least the first PR on this is less overwhelming and, if possible, that we don't emit metrics that won't be collected (see #1321).

I would probably just prefer it if we had a Registry that, instead of having stuff registered in it ... creates the metrics and returns them, and we leave the emitting of the metrics as is (using PushIfNotDone), at least for now.

This will still let us have:

  1. multiple registries (for easier testing, at least)
  2. prevention of rogue metric creation and emission, as long as after pushing we check that the metrics were created by the registry. Although something like sealed interfaces would probably be better at preventing this ;)


na-- commented Jun 9, 2021

I would probably just prefer it if we had a Registry that, instead of having stuff registered in it ... creates the metrics and returns them, and we leave the emitting of the metrics as is (using PushIfNotDone), at least for now.

What do we gain by having the registry create the metrics? Why not just have a Register() method that accepts their interface type and a Get() method that returns it? For the built-in metrics, if we want to, we can also have the concrete metric objects themselves accessible as properties of a normal struct, with their specific non-interface types. That way, we won't need to make a map lookup and a type assertion every time we emit a metric.

So, instead of this: https://github.com/k6io/k6/blob/0e1f44192f799005f77c5eabe5ba84a6b724ef83/lib/metrics/metrics.go#L32-L36

Have something like this:

type MetricsRegistry interface {
    Get(name string) Metric
    Register(name string, metric Metric) error
}

type BuiltInMetrics struct {
    VUs               Gauge
    VUsMax            Gauge
    Iterations        Counter
    IterationDuration Trend
    // ...
}

func RegisterBuiltInMetrics(registry MetricsRegistry) *BuiltInMetrics {
    builtin := &BuiltInMetrics{
        VUs:               NewGauge("vus"),
        VUsMax:            NewGauge("vus_max"),
        Iterations:        NewCounter("iterations"),
        IterationDuration: NewTrend("iteration_duration", true),
        // ...
    }

    // This can probably be easily automated with some light reflection
    // (error handling omitted in this pseudo-code).
    registry.Register(builtin.VUs.Name(), &builtin.VUs)
    registry.Register(builtin.VUsMax.Name(), &builtin.VUsMax)
    // ...

    return builtin
}

This results in a type-safe struct that can be used internally in k6, without any globals (#572) or map lookups or locking, but also in a registry that has a metric_name -> MetricObject relationship. That registry can then be used by custom metrics in the users' scripts, to add new ones and to validate that they aren't trying to register the same metric with a different type, and it could also be used by extensions.

Edit: if Metric has a Name() string method, then we probably don't even need the first parameter of Register()
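
i.e. something like this (still just pseudo-code):

type Metric interface {
    Name() string
    // ...
}

type MetricsRegistry interface {
    Get(name string) Metric
    Register(metric Metric) error // uses metric.Name() as the key
}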


mstoykov commented Jun 9, 2021

I don't see anything preventing us from just ... not registering something (in an extension for example) and emitting the metric.

One of the things the Prometheus architecture does is make the registry the way you gather metrics, and it only gathers from registered metrics, so if you don't register your metrics, they don't get gathered.

Having the Registry be the place that has NewGauge and co. lets us return Metric types that can't be created in any other way, effectively guiding whoever writes the code so that they can only make a metric, and emit samples for it, if it is registered.
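
One way to sketch that in Go (purely illustrative, none of this is decided): keep the metric's fields unexported, so a value constructed outside the metrics package is useless, and make the Registry constructors the only way to get a properly initialized one:

package metrics

type MetricType int

const (
    Gauge MetricType = iota
    Counter
    Rate
    Trend
)

type Metric struct {
    name string
    typ  MetricType
    // unexported fields: a Metric created outside this package via &Metric{}
    // has no usable name or type, so it can't masquerade as a registered one
}

func (m *Metric) Name() string { return m.name }

type Registry struct {
    metrics map[string]*Metric
}

func NewRegistry() *Registry {
    return &Registry{metrics: make(map[string]*Metric)}
}

// NewGauge and friends both create and register the metric, so every usable
// *Metric is known to some Registry by construction.
func (r *Registry) NewGauge(name string) *Metric {
    m := &Metric{name: name, typ: Gauge}
    r.metrics[name] = m
    return m
}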


na-- commented Jun 9, 2021

hmm yeah, good point 👍

and, thinking about it, it will also clean up my pseudo-code above, removing the need for repetition (or reflection):

func RegisterBuiltInMetrics(registry MetricsRegistry) *BuiltInMetrics {
    return &BuiltInMetrics{
        VUs:               registry.NewGauge("vus"),
        VUsMax:            registry.NewGauge("vus_max"),
        Iterations:        registry.NewCounter("iterations"),
        IterationDuration: registry.NewTrend("iteration_duration", true),
        // ...
    }
}
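
Usage in the k6 core would then be something like (names assumed):

registry := metrics.NewRegistry()
builtinMetrics := metrics.RegisterBuiltInMetrics(registry)
// builtinMetrics.VUs, builtinMetrics.Iterations, etc. are used internally
// without map lookups, while custom and extension metrics go through
// registry.NewCounter()/NewGauge()/... and get validated there.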

@na-- na-- modified the milestones: v0.33.0, v0.34.0 Jun 16, 2021
@mstoykov mstoykov self-assigned this Jun 21, 2021
@mstoykov mstoykov linked a pull request Jul 20, 2021 that will close this issue
@mstoykov mstoykov modified the milestones: v0.34.0, v0.35.0 Sep 1, 2021