Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Cleanup of the custom metrics #1435

Closed
sniku opened this issue May 7, 2020 · 6 comments
Closed

Cleanup of the custom metrics #1435

sniku opened this issue May 7, 2020 · 6 comments

Comments

@sniku
Copy link
Collaborator

sniku commented May 7, 2020

Custom metrics (Counter, Gauge, Trend, Rate) behave unexpectedly when non-numeric values are added. Some non-numeric values are casted to numeric values (true => 1) and some break the internal state of the metric.

Currently, any value passed to .add() is accepted and treated in some way towards the internal state.

Regardless of what is passed to .add(), k6 doesn't raise exceptions. This is the correct behavior of a load testing tool because the tool needs to be resilient against failing SUT.
SUT failures are expected and should not cause the test execution to be unreliable.
Non-numeric values should not be counted towards the internal state, but they should not raise exceptions either.

Rate

Rate considers undefined, NaN and "strings" as positive values and counts them towards the internal state.

It considers false and null as negative values and marshals both to 0. I think this is probably correct.

I have to say that it's rather funny that value -1 is considered positive and is marshaled to +1. Perhaps this is by design, but I believe that negative values should be counted towards the "negative" side of the metric.

All non-numeric values fail JSON marshaling when the JSON output is used k6 run -o json

WARN[0001] JSON: Sample couldn't be marshalled to JSON   error="json: unsupported value: NaN" filename=

Proposed refactoring

Positive values: true, 1,2,3..., 0.1,...,
Negative values: null, false, 0, -1, -2,..., -0.1,...
Error values: Nan, undefined, "string", any other non-numeric value

All positive values should be marshaled to 1, all negative values should be marshaled to 0.

When an error value is added to the metric, k6 should produce a warning, increment an invalid_value counter for the metric, and continue with the iteration.
k6 should not raise Exception and abort the iteration.

Example:

rate.add( response.json("crocodile.0.is_happy"); // no ifs/validation is required from the user, and the execution of the iteration will continue when the "is_happy" is "undefined". 
// same as rate.add(undefined);
WARN[000X] Rate metric: unsupported value: undefined

The result of this should look similar to:
image

I think this behavior makes k6 more resilient when executing stress tests. Ideally, the user should not need to worry about breaking iterations when the SUT is crashing.

Counter

The current default behavior of the Counter metric is not to count but to sum. I believe we should consider it a bug.

Example:

let c = Counter("my_counter");
c.add(-200);
c.add(200);

// c.count is `0` while it should be `2`.
// c.sum should be `0`

The original metrics implementation in k6 was partially based on Prometheus (#429 (comment)) which has a Counter type that is cumulative (https://prometheus.io/docs/concepts/metric_types/#counter).
As the WIP Openmetrics project is going to be some extension of the Prometheus format, it should be considered in the context of k6 as well (#858, although we don’t have to follow Prometheus of course if we don’t think it makes any sense).

The proposed change adds functionality to the Counter metric and fixes bugs, but doesn't break the loose compatibility with Prometheus definition of the metric.

proposed refactoring.

The Counter metric should have 2 default aggregator methods: count and sum.

count should be incremented on every call to .add().
sum should be modified on every numeric value .add(numeric_value).

image

 c.add(false);     // increment count, don't modify sum
 c.add(true);      // increment count, don't modify sum
 c.add(0);         // increment count, don't modify sum
 c.add(1);         // increment count, modify sum
 c.add(-1);        // increment count, modify sum
 c.add(0.1);        // increment count, modify sum
 c.add(null);     // increment count, don't modify sum
 c.add(undefined); // increment count, don't modify sum
 c.add(NaN);       // increment count, don't modify sum
 c.add("string");  // increment count, don't modify sum
 c.add(r);         // increment count, don't modify sum

Gauge

proposed refactoring.

 g.add(false);     // increment `invalid_value` counter. Reset gauge to 0.
 g.add(true);      // increment `invalid_value` counter. Reset gauge to 0.
 g.add(0);         // state: 0  // CORRECT // marshalls to 0
 g.add(1);         // state: 1  // CORRECT // marshalls to 1
 g.add(-1);        // state: -1 // CORRECT // marshalls to -1
 g.add(0.1);        // state: 0.1 // CORRECT // marshalls to 0.1
 g.add(null);      // increment `invalid_value` counter. Reset gauge to 0.
 g.add(undefined); //  increment `invalid_value` counter. Reset gauge to 0.
 g.add(NaN);       //  increment `invalid_value` counter. Reset gauge to 0.
 g.add("string");  //  increment `invalid_value` counter. Reset gauge to 0.
 g.add(r);         // increment `invalid_value` counter. Reset gauge to 0.

image

Trend

The trend metric is currently the most resilient metric type we have. It doesn't completely die when invalid values are added. Nevertheless, it exhibits incorrect behavior in several cases.

 t.add(false);      // increment `invalid_value` counter. Don't modify the internal state
 t.add(true);      // increment `invalid_value` counter. Don't modify the internal state
 t.add(0);         // state: 0  // CORRECT // marshalls to 0
 t.add(1);         // state: 1  // CORRECT // marshalls to 1
 t.add(-1);        // state: -1 // CORRECT // marshalls to -1
 t.add(null);     // increment `invalid_value` counter. Don't modify the internal state
 t.add(undefined); // increment `invalid_value` counter. Don't modify the internal state
 t.add(NaN);      // increment `invalid_value` counter. Don't modify the internal state
 t.add("string"); // increment `invalid_value` counter. Don't modify the internal state
 t.add(r);        // increment `invalid_value` counter. Don't modify the internal state

Current Behavior

import {sleep} from "k6"
import { Counter, Gauge, Trend, Rate } from "k6/metrics";

let r = Rate("my_rate");
let c = Counter("my_counter");
let g = Gauge("my_gauge");
let t = Trend("my_trend");

function testRate(){
 r.add(false);     // state -1 //    ?    // marshalls to 0
 r.add(true);      // state +1 //    ?    // marshalls to 1
 r.add(0);         // state -1 // CORRECT // marshalls to 0
 r.add(1);         // state +1 // CORRECT // marshalls to 1
 r.add(-1);        // state +1 // WAT?    // marshals to -1
 r.add("string");  // state +1 //    ?    // fails json marshaling
 r.add(undefined); // state +1 //  BUG    // fails json marshaling
 r.add(null);      // state -1 //    ?    // marshals to "0"
 r.add(NaN);       // state +1 //    ?    // fails JSON marshaling
 r.add(r);         // state +1 //  BUG    // fails JSON marshaling
}

function testCounter(){
 c.add(false);     // no change // CORRECT // marshalls to 0
 c.add(true);      // state +1  // CORRECT // marshalls to 1
 c.add(0);         // state -1  // CORRECT // marshalls to 0
 c.add(1);         // state +1  // CORRECT // marshalls to 1
 c.add(-1);        // state -1  // CORRECT // marshals to -1
 c.add(null);      // no change //   ?     // marshals to "0"
 c.add(undefined); // BUG - state change to NaN and the entire metric is broken
 c.add(NaN);       // BUG - the same
 c.add("string");  // BUG - the same
 c.add(r);         // BUG - the same
}

function testGauge(){
 g.add(false);     // state: 0  //   ?     // marshalls to 0 - should it?
 g.add(true);      // state: 1  //   ?     // marshalls to 1
 g.add(0);         // state: 0  // CORRECT // marshalls to 0
 g.add(1);         // state: 1  // CORRECT // marshalls to 1
 g.add(-1);        // state: -1 // CORRECT // marshals to -1
 g.add(null);      // state: 0  //   ?     // marshals to 0
 g.add(undefined); // BUG - state change to NaN and the entire metric is broken
 g.add(NaN);       // BUG - the same
 g.add("string");  // BUG - the same
 g.add(r);         // BUG - the same
}

function testTrend(){
 t.add(false);     // state: 0  //   invalid     // marshalls to 0 - should it?
 t.add(true);      // state: 1  //   invalid     // marshalls to 1
 t.add(0);         // state: 0  // CORRECT // marshalls to 0
 t.add(1);         // state: 1  // CORRECT // marshalls to 1
 t.add(-1);        // state: -1 // CORRECT // marshals to -1
 t.add(null);      // state: 0  //   invalid     // marshals to 0
 t.add(undefined); // BUG - avg changes to NaN - other aggregation functions seem unaffected?
 t.add(NaN);       // BUG - the same
 t.add("string");  // BUG - the same
 t.add(r);         // BUG - the same
}

export default function() {

 testRate();
 testCounter();
 testGauge();
 testTrend();

 sleep(0.1);
}

image

Prevent users from redefining built-in metrics

Example script.

import http from 'k6/http';
import {sleep} from 'k6';
import { Counter } from 'k6/metrics';
export let options = {
  iterations: 10,
  vus: 1,
};
let durationCounter = new Counter('http_req_duration');
export default function() {
  http.get('https://test-api.k6.io/public/crocodiles/1/');
  durationCounter.add(__ITER * 10000);
  sleep(1);
};

Produces Trend metric with values from both, built-in Trend metric and the custom Counter metric.
image
The k6 cloud backend is also confused, showing the values of the built-in Trend metric on the main chart and Counter values in the Analysis Panel.

Suggested error:
ERRO: you can't redefine the 'http_req_duration' built-in metric.

Additional options

(To be discussed)

Allow users to specify the initial value for metrics

new Counter('http_req_errors', {initial: 0, });
@sniku sniku added the bug label May 7, 2020
@sniku sniku changed the title WIP: Unexpected behavior of custom metrics Cleanup of the custom metrics May 8, 2020
@na--
Copy link
Member

na-- commented May 21, 2020

Found a previous issue of yours on the same topic 😅 #908

@na-- na-- added this to the v0.30.0 milestone Nov 3, 2020
@na--
Copy link
Member

na-- commented Nov 3, 2020

This is also causing issues when streaming metrics to the cloud (and probably other outputs), we get NaN as the value.

@na-- na-- added the high prio label Nov 3, 2020
@dgzlopes
Copy link
Member

dgzlopes commented Dec 9, 2020

OpenMetrics spec was released a few weeks ago.

Probably it's interesting to give it a look while working on this ticket! (Pawel said that k6 metrics implementation was more-or-less based on Prometheus format).

If our metrics adhere to the spec and we follow the best practices, there is less context switch for users working with k6 that come from OTel or Prometheus.

@na--
Copy link
Member

na-- commented Feb 1, 2021

#1832 (or something like it) is a prerequisite for the current issue.

@na-- na-- modified the milestones: v0.31.0, v0.32.0 Feb 24, 2021
@na-- na-- modified the milestones: v0.32.0, v0.33.0 Apr 14, 2021
@na-- na-- modified the milestones: v0.33.0, v0.34.0 Jun 16, 2021
@mstoykov mstoykov modified the milestones: v0.34.0, v0.35.0 Sep 1, 2021
@mstoykov
Copy link
Contributor

mstoykov commented Nov 2, 2021

With #1876 now all metrics are checked for NaN values and gets everything properly translated to numbers (true is 1, false is 0, everything else that isn't a number is an error, and numbers are just its values).

It also will either throw an exception if throw is enabled or log a warning.

This IMO closes most of this issue and specifically the more non-nuanced issues which were bricking the whole metric. The leftovers IMO should be added to separate new issues so they can be discussed separately.

@sniku
Copy link
Collaborator Author

sniku commented Nov 2, 2021

The most important bugs were fixed, so I'm closing the issue. Not all features were implemented, but those features can be added as separate issues in the future.
I'll open a small issue about WARN messages shortly.

@sniku sniku closed this as completed Nov 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants