Proposal for adding `http_req_errors` metric #1311

sniku · 2020-01-17T16:46:18Z

This is a proposal for adding a new, standard metric to k6 called http_req_errors that records the number of failed HTTP requests.

Some of this functionality can be achieved today with a custom metric, but it's so basic that I suggest it should be included by default.

Feature Description

Successful requests often have longer http_req_duration than failed requests. This metric would allow k6 to display a separate http_req_duration for successful requests and separate http_req_duration_errors for failed requests.

It's not possible to define which HTTP status codes should be considered errors because these codes are context-specific. For example, HTTP-404 is not an error if someone is testing for 404s.
This problem can be solved by adding additional configuration options, as described below.

The biggest advantage of this functionality is the UX improvement.
Currently, k6 doesn't distinguish between failed and successful requests, therefore the terminal output for a test that has only failed requests is not distinguishable from a successful test.

With this change, one could see at first glance that the test was/wasn't successful without adding checks and thresholds.

Suggested Solution (optional)

let options = {
   http_errors: ["status>399"],  // default setting
   http_errors_exclude: [404, 403, 1099]
};

As I mentioned, I think these shouldn't be global config options, they should probably be a part of the new JS HTTP API. And while I agree with the default value of considering everything above 399 an error by default, we probably need some better way of specifying the rule. Maybe some sort of a JS-callback that evaluates HTTP responses before k6 emits metrics, maybe a more powerful rule engine (along the lines of #1313)...

We also shouldn't forget that non-2xx HTTP statuses isn't the only way an HTTP request can fail, network/DNS/protocol/etc. problems are also a possible error cause. So, whatever way we pick to filter these out, we should probably be able to handle these as well...

Also, as I mentioned in #1298, maybe we're looking at this from the wrong angle. We have to consider both issues together, since it might be enough to just let users tag requests that they expect to fail with some new system tag - then, if k6 sees such a tag, it won't add a true value to the new http_req_errors metric?

sniku · 2020-01-27T16:03:38Z

I agree that most of this functionality can be achieved today, with the current version of k6 if

the end-user spends enough time to learn about the default capabilities of k6,
implemented custom Rate metric for errors,
custom Trend metric for response time, and
add an if condition for every http response.

It can be done, but strongly disagree that it should be.

For the sake of the argument, here's the simplest implementation I could come up with for a script with one batch and one regular request (without resorting to helper utils).

import http from "k6/http";
import { sleep } from "k6";
import { Rate, Trend } from "k6/metrics";

let req_success_dur = Trend("req_success_dur");
let my_httpx_errors = Rate("my_httpx_errors", false); // isTime=false
let test_trend = Trend("test_trend"); // isTime=false

export let options = {
  thresholds: {
    'my_httpx_errors': [ 
      'rate<0.1'
    ],
    'req_success_dur': [
      'p(95)<1000'
    ]
  },
  stages: [
    { duration: "2m", target: 100 }, // below normal load
    { duration: "5m", target: 100 }, 
    { duration: "2m", target: 200 }, // normal load
    { duration: "5m", target: 200 },
    { duration: "2m", target: 300 }, // around the breaking point
    { duration: "5m", target: 300 },
    { duration: "2m", target: 400 }, // beyond the breaking point
    { duration: "5m", target: 400 }, 
    { duration: "10m", target: 0 },  // scale down. Recovery stage.
  ]  
}

export default function() {
  let responses = http.batch(
    [
      ["GET", "https://httpbin.test.loadimpact.com/status/503"],
      ["GET", "https://httpbin.test.loadimpact.com/status/403"],
      ["GET", "https://httpbin.test.loadimpact.com/status/404"],
      ["GET", "https://httpbin.test.loadimpact.com/status/201"],
    ]
  )

  Object.values(responses).forEach(resp => {
    if (resp.status >= 400 && resp.status != 404 && resp.status != 403) {
      my_httpx_errors.add(true);
    }
    else {
      req_success_dur.add(Math.round(resp.timings.duration));
      my_httpx_errors.add(false);
    }
  });

  let resp = http.get('https://httpbin.test.loadimpact.com/status/200')

  if (resp.status >= 400 && resp.status != 404 && resp.status != 403) {
    my_httpx_errors.add(true);
  }
  else {
    req_success_dur.add(Math.round(resp.timings.duration));
    my_httpx_errors.add(false);
  }

  sleep(1);
}

This script works. It records http_req_success_duration and http_reqs_errors, but it's not pretty or easy to maintain.
I can't imagine recommending this approach to users without some discomfort 😅

I agree that http_req_duration_errors is unnecessary. Few people are interested in the response duration of failed requests. I retract my request for this metric. 👍

I agree that the http_req_errors should be a Rate rather than a Counter 👍

Here's the terminal output of the above script.

The new, correct metrics are highlighted in green. The http_req_blocking and friends are still accounting for successful+failed requests. That's not great.
One can replicate this logic for the remaining minor metrics, but at that point, the script and output would be too messy.

Again, I think that k6 should by default record only metrics for successful requests, and the manual workaround implemented above is not a viable solution.

In any case, I think http_req_duration is a bit of a red herring in this case. Correct me if I'm wrong, but it seems to me that the problem isn't that http_req_duration becomes too small when there are errors. Rather, the problem is that there are errors and we're not detecting them (i.e. our thresholds pass and the CI status is green). So, I think users should just be able to set a threshold for the HTTP error rate, for example thresholds: { "http_req_errors": ["rate<0.01"] }. Then the http_req_duration values don't matter, since this other threshold would be triggered and fail the test.

When doing stress testing, errors are expected to appear and you want to continue the test for an extended period of time to see when the system recovers. I believe that it's still useful to know what the http_req_duration is for successful requests when 90% of other requests are failing. It helps to determine the stress failure mode. If errors are included in http_req_duration the metric isn't meaningful.

Another reason for implementing something like this:

let options = {
   http_errors: ["status>399"],  // default setting
   http_errors_exclude: [404, 403, 1099]
};

is for the Cloud UI to be able to understand which requests should be considered successful/failed.

Today the cloud UI is assuming that status code >= 400 is a failure, and marks it red in the UI. This is clearly incorrect in many cases. Cloud should be able to understand these settings and make use of them for UI and performance alerts.

The error metric should be displayed in the cloud by default, with the same prominence as http_req_duration.

A complementary functionality would be to introduce a change to the HTTP API. I purposefully didn't mention it in the original issue description because that's a rabbit hole 😄
In any case, something like this would be ideal for exceptions that are not handled by the global options configuration.

http.get({url: "https://httpbin.test.loadimpact.com/status/503", expect: {status: 503}});

Even though this is a more flexible solution, I suggest we don't do that as a part of this work. I think that the options solution is solving the problem in 95% of the cases,
and the change to the HTTP API can still be implemented at a later date.

(...) we probably need some better way of specifying the rule. Maybe some sort of a JS-callback that evaluates HTTP responses before k6 emits metrics, maybe a more powerful rule engine (along the lines of #1313)...

Whatever format we decide on, the format needs to be readable by the cloud. It would be great if it didn't depend on JS.

We also shouldn't forget that non-2xx HTTP statuses isn't the only way an HTTP request can fail, network/DNS/protocol/etc. problems are also a possible error cause. So, whatever way we pick to filter these out, we should probably be able to handle these as well...

That's a good point. Perhaps we should rename the http_req_errors to req_errors to record all failures rather than just http failures. What do you think?

Also, as I mentioned in #1298, maybe we're looking at this from the wrong angle. We have to consider both issues together, since it might be enough to just let users tag requests that they expect to fail with some new system tag - then, if k6 sees such a tag, it won't add a true value to the new http_req_errors metric?

#1298 would benefit from the HTTP API change I'm suggesting above, but I think the options solution would also work for them. We could go deeper into discussing the HTTP change, but I'm afraid of 🐰 holes 😄

And if users want to only have a threshold for the response times of successful requests, they can mostly do it even now, by just specifying the threshold like this: thresholds: { "http_req_duration{status:200}": ["p(95)<100"] }. It's not very flexible, but if that's a big problem, we should just improve the threshold matching flexibility (#1313), not just add new metrics willy-nilly, since there would be no end of the metrics we'd like to add otherwise...

Errors are fundamental to load/stress testing. We should track and display errors by default in k6 UI and cloud UI. This should not be left for the users to define and implement on their end as a special case. I believe we should also have default thresholds based on the rate of errors.

I'm sorry about the length of this comment. I need to learn how to be more concise 😅

na-- · 2020-01-28T14:43:30Z

It can be done, but strongly disagree that it should be.

Completely agree.

For the sake of the argument, here's the simplest implementation I could come up with for a script with one batch and one regular request (without resorting to helper utils).

I'd say that something like this is a perfect place for helper utils 😉 Or, more precisely, a wrapper around http.batch() / http.request() that implements whatever logic is important to you. And, in general, if such logic is not easy or straightforward to write as a wrapper, it would be much more difficult for us to actually implement as a generic k6 feature... So here's how I would write that script, which I'd argue is very readable (especially if I move the wrappers in a separate library):

import http from "k6/http";
import { sleep } from "k6";
import { Rate, Trend } from "k6/metrics";

let req_success_dur = Trend("req_success_dur");
let my_httpx_errors = Rate("my_httpx_errors", false); // isTime=false

function checkResponse(resp) {
    if (resp.status >= 400 && resp.status != 404 && resp.status != 403) {
        my_httpx_errors.add(true);
        return false;
    }
    req_success_dur.add(Math.round(resp.timings.duration));
    my_httpx_errors.add(false);
    return true;
}

function myGet(url, params) {
    let resp = http.get(url, params);
    checkResponse(resp);
    return resp;
}

function myBatch(reqs) {
    var resps = http.batch(reqs);
    Object.values(resps).forEach(checkResponse);
    return resps;
}

export let options = {
    thresholds: {
        'my_httpx_errors': [
            'rate<0.1'
        ],
        'req_success_dur': [
            'p(95)<1000'
        ]
    },
    stages: [
        { duration: "2m", target: 100 }, // below normal load
        { duration: "5m", target: 100 },
        { duration: "2m", target: 200 }, // normal load
        { duration: "5m", target: 200 },
        { duration: "2m", target: 300 }, // around the breaking point
        { duration: "5m", target: 300 },
        { duration: "2m", target: 400 }, // beyond the breaking point
        { duration: "5m", target: 400 },
        { duration: "10m", target: 0 },  // scale down. Recovery stage.
    ]
}

export default function () {
    myBatch([
        ["GET", "https://httpbin.test.loadimpact.com/status/503"],
        ["GET", "https://httpbin.test.loadimpact.com/status/403"],
        ["GET", "https://httpbin.test.loadimpact.com/status/404"],
        ["GET", "https://httpbin.test.loadimpact.com/status/201"],
    ]);

    myGet('https://httpbin.test.loadimpact.com/status/200')

    sleep(1);
}

I can't imagine recommending this approach to users without some discomfort sweat_smile

While, as I said, I'd like to improve these parts of k6 to handle the most common use cases in a better way (while also allowing the handling of more complex ones with some work), I don't think something like the above wrapper is too onerous of a workaround until then.

I agree that http_req_duration_errors is unnecessary. Few people are interested in the response duration of failed requests. I retract my request for this metric. 👍
I agree that the http_req_errors should be a Rate rather than a Counter 👍

👍 🤝

Here's the terminal output of the above script.

The new, correct metrics are highlighted in green. The http_req_blocking and friends are still accounting for successful+failed requests. That's not great.
One can replicate this logic for the remaining minor metrics, but at that point, the script and output would be too messy.

Agree, the current output is messy. I'll respond to your other issue (#1319) later, but in general I agree that we shouldn't show most of the HTTP metrics we currently show, unless users explicitly request them or have a threshold on them. But we can't remove them until we have a way that allows users to add them back... So, I guess this is tracked in #570, for lack of a better issue.

Again, I think that k6 should by default record only metrics for successful requests, and the manual workaround implemented above is not a viable solution.

Maybe, I can see how this makes sense, though I think this should be configurable... For example, you're thinking in terms of application errors (i.e. HTTP status code >= 400), but there are also network/DNS/HTTPS/etc. issues, for which the duration measurement might be more important... And we shouldn't forget intentional/unavoidable errors, which we should also measure (#1298), and that decision has to be taken locally, not be a global option.

In any case, I can see use cases for both measuring "errors" and for not measuring them, so, in the interest of not locking out any users, I think this should be configurable... We can have a flag (and other options) in the new JS HTTP API that enables/disables this behavior. In that API, not measuring them can also be the default value, if we think that's the generally correct approach, but in the current API we should preserve the current behavior, I think...

In any case, I think http_req_duration is a bit of a red herring in this case. Correct me if I'm wrong, but it seems to me that the problem isn't that http_req_duration becomes too small when there are errors. Rather, the problem is that there are errors and we're not detecting them (i.e. our thresholds pass and the CI status is green). So, I think users should just be able to set a threshold for the HTTP error rate, for example thresholds: { "http_req_errors": ["rate<0.01"] }. Then the http_req_duration values don't matter, since this other threshold would be triggered and fail the test.

When doing stress testing, errors are expected to appear and you want to continue the test for an extended period of time to see when the system recovers. I believe that it's still useful to know what the http_req_duration is for successful requests when 90% of other requests are failing. It helps to determine the stress failure mode. If errors are included in http_req_duration the metric isn't meaningful.

Hmm you're right, this is a very valid use case. 👍

Another reason for implementing something like this:
let options = {
   http_errors: ["status>399"],  // default setting
   http_errors_exclude: [404, 403, 1099]
};
is for the Cloud UI to be able to understand which requests should be considered successful/failed.

Today the cloud UI is assuming that status code >= 400 is a failure, and marks it red in the UI. This is clearly incorrect in many cases. Cloud should be able to understand these settings and make use of them for UI and performance alerts.

As I mentioned earlier, I strongly think that such options aren't the best UX for this behavior. And while I wouldn't mind something like this in the new HTTP API, I'm especially against having them in the global options. Even if we ignore #833, it doesn't make sense to me from a UX perspective, since in a single script it's quite possible that you can have some 404/403 errors that are expected and that you want to measure, and others that you consider errors.

So, I think if we just tag the metrics from the expectedly erroneous requests (however those are defined) with a expected_error: true metric tag (or something like that), it will be even easier for the cloud (and everything else, like thresholds especially) to distinguish such requests...

The error metric should be displayed in the cloud by default, with the same prominence as http_req_duration.

I don't understand this point, sorry.

A complementary functionality would be to introduce a change to the HTTP API. I purposefully didn't mention it in the original issue description because that's a rabbit hole smile
In any case, something like this would be ideal for exceptions that are not handled by the global options configuration.
http.get({url: "https://httpbin.test.loadimpact.com/status/503", expect: {status: 503}});

This proposal I actually like 😅 Again, it probably should fall under the new HTTP API, but in my head, this (or something like this) is a nice way to handle it. Basically, the k6 HTTP client can emit expected_error: true for the generated metrics if it sees that the status it received matches the expected one, and the cloud/thresholds/etc. can safely consider the generated metrics.

Even though this is a more flexible solution, I suggest we don't do that as a part of this work. I think that the options solution is solving the problem in 95% of the cases,
and the change to the HTTP API can still be implemented at a later date.

I'm trying to argue that it will be both easier and better (i.e. 100% of cases) not to have global options 😉

(...) we probably need some better way of specifying the rule. Maybe some sort of a JS-callback that evaluates HTTP responses before k6 emits metrics, maybe a more powerful rule engine (along the lines of #1313)...

Whatever format we decide on, the format needs to be readable by the cloud. It would be great if it didn't depend on JS.

Again, we have per-request metrics, it would be much simpler for the cloud to process something it already understands, than to parse (probably with differences from k6...) some new global option.

We also shouldn't forget that non-2xx HTTP statuses isn't the only way an HTTP request can fail, network/DNS/protocol/etc. problems are also a possible error cause. So, whatever way we pick to filter these out, we should probably be able to handle these as well...

That's a good point. Perhaps we should rename the http_req_errors to req_errors to record all failures rather than just http failures. What do you think?

Not sure... My gut feeling is that we shouldn't mix these things, so http_req_errors seems better to me right now. However, this is a bigger discussion regarding how we want to handle different protocols in the future...

Also, as I mentioned in #1298, maybe we're looking at this from the wrong angle. We have to consider both issues together, since it might be enough to just let users tag requests that they expect to fail with some new system tag - then, if k6 sees such a tag, it won't add a true value to the new http_req_errors metric?

#1298 would benefit from the HTTP API change I'm suggesting above, but I think the options solution would also work for them. We could go deeper into discussing the HTTP change, but I'm afraid of rabbit holes smile

I disagree. As far as I understand, they want to consider only some 404 errors as expected. After their resource is created, I don't see why they'd want to ignore any other errors. Also, #1298 (comment) ... 😉

And if users want to only have a threshold for the response times of successful requests, they can mostly do it even now, by just specifying the threshold like this: thresholds: { "http_req_duration{status:200}": ["p(95)<100"] }. It's not very flexible, but if that's a big problem, we should just improve the threshold matching flexibility (#1313), not just add new metrics willy-nilly, since there would be no end of the metrics we'd like to add otherwise...

Errors are fundamental to load/stress testing. We should track and display errors by default in k6 UI and cloud UI. This should not be left for the users to define and implement on their end as a special case. I believe we should also have default thresholds based on the rate of errors.

I agree 👍

I'm sorry about the length of this comment. I need to learn how to be more concise sweat_smile

Me too 😅

ppcano · 2020-04-21T13:11:36Z

wow!! I will be brave and participate in this conversation.

I won't discuss the details because I just want to share the cases where I've experienced a default HTTP error metric could be useful:

Simple Threshold on HTTP errors

Define easily thresholds based on HTTP errors without having to define custom metrics.

thresholds: {
    'http_req_errors': ['rate<0.1'],
    'http_req_errors{custom-tag:hola}': ['rate<0.2']
},

Visualize HTTP errors in Datadog

Provide a general visualization for HTTP error rate or error counter in Datadog. This is not possible today because Datadog does not allow to query metric by tags using the OR operator.

status:200 OR status:201 OR status:202

When reviewing other Datadog integrations I found the following rigor error metrics:

rigor.real_browser.client_errors: gauge,Number of responses with a status code between 400 and 499.
rigor.real_browser.connection_errors: gauge,Number of responses where the status code is 504 or 0 (a browser-aborted request).
rigor.real_browser.server_errors: gauge,Number of responses where the status code is 500 or higher (excluding 504).
rigor.real_browser.errors: gauge,Total count of responses with status codes greater than or equal to 400.

sniku · 2020-04-28T11:11:41Z

I'm adding this screenshot from overloaded system to the conversation:

Currently, k6 includes HTTP-0 Response time in the end-of-test summary. This is probably not what we want. The HTTP-0 responses have either "0s" duration or "60s" duration (timeout) which skews the results for the requests that have actually finished.
In my opinion, HTTP-0 Responses are errors and should be excluded.

simskij · 2021-01-20T11:08:18Z

@sniku, @robingustafsson and @na--, can we bump http_reqs_errors to be included in v0.31.0? It would greatly improve the UX of test building, even if it's possible to do it manually today.

na-- · 2021-01-20T11:24:30Z

We still don't really have an agreement on what http_reqs_errors should be and how it should be configured, and we can't really prioritize it without consensus on these things. I thought @sniku's httpx wrapper on https://jslib.k6.io/ was an attempt to arrive at a nice design for this feature, and also a viable workaround until we have something ready to bake into k6?

And IIRC, that eventual design was tentatively intended to be included into the new k6 HTTP API, not the current one. Not only because we probably want #1321 beforehand and we don't want to make #883 worse with global options, but also because global options will likely be too restrictive for this feature. You might want to treat HTTP errors one way in one scenario and another way in the other scenarios. So, these distinctions need to be made on something like an HTTP-client level, not globally, which is what httpx currently emulates.

ppcano · 2021-01-22T04:03:16Z

Prioritization

We still don't really have an agreement

@sniku I think we should spend some time finding an agreement on this API.

And IIRC, that eventual design was tentatively intended to be included into the new k6 HTTP API.

I don't know the status of the new k6 HTTP API but it looks experimental and further in the future. I wonder what are the impediments to include this on k6/http.

API Design

Skipping cosmetics preferences, IMO, the global option together with the per-request expect tag/option solves all the cases with good UX. I believe the global option will be used mainly while the per-request tag will cover edge cases.

Breaking changes

I think we all agree that error response times should be excluded "somehow" from success responses. I raise the concern that if error response times are excluded from http_req_duration, it could break some existing thresholds.

sniku · 2021-01-22T11:31:39Z

@ppcano, we have discussed this feature during the last k6 sync call, and we have decided to work on the specification. The feature is not as simple as I initially thought, so it takes some time to work out all the edge cases, and address all the valid concerns raised above.

We plan to include this feature in v0.31 release. I'll update this issue or create a new one when the specification has been finalized—hopefully, today.

na-- · 2021-01-29T10:52:56Z

Closing this in favor of the new issue: #1828

sniku added the feature label Jan 17, 2020

na-- added the evaluation needed proposal needs to be validated or tested before fully implementing it in k6 label Jan 17, 2020

na-- mentioned this issue Jan 20, 2020

Enhance available threshold values for Rate, Gauge and Trend metrics #1312

Open

na-- mentioned this issue Jan 28, 2020

Add explicit tracking and ignoring of metrics and sub-metrics #1321

Open

na-- mentioned this issue Nov 10, 2020

Being able to edit metrics emitted by requests before that is done #1716

Closed

sniku mentioned this issue Jan 28, 2021

Proposal for adding a new standard http_req_failures metric #1828

Closed

na-- closed this as completed Jan 29, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Proposal for adding `http_req_errors` metric #1311

Proposal for adding `http_req_errors` metric #1311

sniku commented Jan 17, 2020

na-- commented Jan 20, 2020 •

edited

Loading

Suggested Solution (optional)

sniku commented Jan 27, 2020 •

edited

Loading

na-- commented Jan 28, 2020

ppcano commented Apr 21, 2020

sniku commented Apr 28, 2020

simskij commented Jan 20, 2021 •

edited

Loading

na-- commented Jan 20, 2021 •

edited

Loading

ppcano commented Jan 22, 2021

sniku commented Jan 22, 2021

na-- commented Jan 29, 2021

Proposal for adding http_req_errors metric #1311

Proposal for adding http_req_errors metric #1311

Comments

sniku commented Jan 17, 2020

Feature Description

Suggested Solution (optional)

na-- commented Jan 20, 2020 • edited Loading

Suggested Solution (optional)

sniku commented Jan 27, 2020 • edited Loading

na-- commented Jan 28, 2020

ppcano commented Apr 21, 2020

sniku commented Apr 28, 2020

simskij commented Jan 20, 2021 • edited Loading

na-- commented Jan 20, 2021 • edited Loading

ppcano commented Jan 22, 2021

sniku commented Jan 22, 2021

na-- commented Jan 29, 2021

Proposal for adding `http_req_errors` metric #1311

Proposal for adding `http_req_errors` metric #1311

na-- commented Jan 20, 2020 •

edited

Loading

sniku commented Jan 27, 2020 •

edited

Loading

simskij commented Jan 20, 2021 •

edited

Loading

na-- commented Jan 20, 2021 •

edited

Loading