
Efficient output of metrics to a binary file #321

Closed
ragnarlonn opened this issue Sep 20, 2017 · 9 comments

Comments

@ragnarlonn

  • The JSON output can generate a lot of data. It would be nice to also have a more compact CSV output option, or perhaps the option of limiting what type of data we write (maybe we only want to log http_req_duration and the response code for each request).

  • Some tools (e.g. Artillery) avoid writing data to disk during the test. This can sometimes be useful (e.g. when the internal representation is much smaller than the output JSON format, we're doing a fair number of transactions per second, and we have a decent amount of memory available). Maybe we should have an option that tells k6 to only write JSON/CSV output to disk after the test has completed? (A rough sketch of that idea follows below.)
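A rough sketch of the "write everything at the end" idea, purely as an illustration (this is not actual k6 code; the Sample type and the Collect/Close hooks are made up for the example):

```go
package output

import (
	"encoding/json"
	"os"
	"sync"
	"time"
)

// Sample is a stand-in for a single metric measurement.
type Sample struct {
	Metric string            `json:"metric"`
	Time   time.Time         `json:"time"`
	Value  float64           `json:"value"`
	Tags   map[string]string `json:"tags,omitempty"`
}

// BufferedOutput keeps all samples in memory while the test runs
// and only serializes them to disk once the test has finished.
type BufferedOutput struct {
	mu      sync.Mutex
	samples []Sample
}

// Collect is called during the test; it only appends to memory.
func (o *BufferedOutput) Collect(samples []Sample) {
	o.mu.Lock()
	o.samples = append(o.samples, samples...)
	o.mu.Unlock()
}

// Close writes everything out in one go after the test completes.
func (o *BufferedOutput) Close(filename string) error {
	f, err := os.Create(filename)
	if err != nil {
		return err
	}
	defer f.Close()

	enc := json.NewEncoder(f)
	for _, s := range o.samples {
		if err := enc.Encode(s); err != nil {
			return err
		}
	}
	return nil
}
```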

liclac (Contributor) commented Oct 3, 2017

JSON output was never meant to be used "for real", more for testing; it's horribly inefficient in its current form. If we want to log to disk in realtime, then JSON is a rubbish format for it; with the amount of data we generate, we'll want something binary (maybe even protobuf?).

ppcano (Contributor) commented Oct 24, 2017

@ragnarlonn good points 👍

I see two likely initial use cases:

  • users interested in all the reported metrics.
  • users interested in the final outcome, with the aggregated metric values.

k6 could provide either a new output reporter or extend the current JSON one. Some ideas:

k6 run -o json=debug.json
k6 run -o json_agg=debug.json

k6 run -o json=debug.json -json aggregation
k6 run -o json=debug.json -json realtime

micsjo (Contributor) commented Oct 24, 2017

The JSON variants could be handled by wrapping the original output and parsing it with jq (or, on Windows, PowerShell has JSON parsing built in) to get a different output. Not really convenient, but a stop-gap in the meantime.

liclac (Contributor) commented Oct 24, 2017

The first question is: do we want fast JSON output (keeping in mind that the JSON output format was designed for debugging or small tests in the first place), or do we want an efficient way to output large amounts of data?

If we want fast JSON output for the kinds of volumes a mid-size k6 test spits out, we're gonna have to compromise readability to do it, at which point the merits of using JSON in the first place seem dubious at best. It seems like a better idea to have a -o bin=samples.k6 option that writes in a binary format, then offer tools to convert that to JSON or whatever after the fact.

(We should also use this binary format for Insights, ingest seems to be a bottleneck there too.)
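Purely to illustrate the binary-log idea above (this is not k6 code; the record layout and types are invented for the example), a length-prefixed binary sample log could look roughly like this, with a separate converter tool later reading the same framing and emitting JSON or CSV:

```go
package binout

import (
	"bufio"
	"bytes"
	"encoding/binary"
	"encoding/gob"
	"os"
	"time"
)

// BinarySample is an invented record layout, just to show the idea.
type BinarySample struct {
	Metric string
	Time   time.Time
	Value  float64
	Tags   map[string]string
}

// WriteSamples appends gob-encoded, length-prefixed records to a file.
// A real implementation would reuse one encoder per stream; this version
// encodes each record separately to keep the framing obvious.
func WriteSamples(path string, samples []BinarySample) error {
	f, err := os.OpenFile(path, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0644)
	if err != nil {
		return err
	}
	defer f.Close()

	w := bufio.NewWriter(f)
	defer w.Flush()

	for _, s := range samples {
		var buf bytes.Buffer
		if err := gob.NewEncoder(&buf).Encode(s); err != nil {
			return err
		}
		// 4-byte little-endian length prefix, then the encoded record.
		if err := binary.Write(w, binary.LittleEndian, uint32(buf.Len())); err != nil {
			return err
		}
		if _, err := w.Write(buf.Bytes()); err != nil {
			return err
		}
	}
	return nil
}
```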

ppcano (Contributor) commented Oct 24, 2017

The binary format is a different issue than the one I referred to; I have created a new issue, #351, to track it individually.

danron commented Mar 26, 2018

I would like to suggest, as Ragnar points out, a way to allow only certain metrics to be logged, controlled by a command-line parameter. That, together with CSV output, will probably be fast enough for most situations. Almost every relevant tool/database can work with or import CSV, so it is a very good candidate format.

I do not think JSON is needed for anything other than debugging test scripts or k6 itself.

So basically I suggest these flags:

--out csv=myfile.csv --metric-whitelist http_req_duration

And maybe also a blacklist flag:

--metric-blacklist http_req_connecting
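A rough sketch of how the whitelist filtering could work (illustrative only, not k6's actual collector; the Sample type and column set are made up):

```go
package csvout

import (
	"encoding/csv"
	"os"
	"strconv"
	"time"
)

// Sample is a stand-in for one metric measurement.
type Sample struct {
	Metric string
	Time   time.Time
	Value  float64
}

// WriteFilteredCSV writes only whitelisted metrics as CSV rows;
// an empty whitelist means "write everything".
func WriteFilteredCSV(path string, whitelist []string, samples []Sample) error {
	allowed := make(map[string]bool, len(whitelist))
	for _, m := range whitelist {
		allowed[m] = true
	}

	f, err := os.Create(path)
	if err != nil {
		return err
	}
	defer f.Close()

	w := csv.NewWriter(f)
	defer w.Flush()

	if err := w.Write([]string{"timestamp", "metric", "value"}); err != nil {
		return err
	}
	for _, s := range samples {
		if len(allowed) > 0 && !allowed[s.Metric] {
			continue // metric not whitelisted, skip the row
		}
		record := []string{
			s.Time.Format(time.RFC3339Nano),
			s.Metric,
			strconv.FormatFloat(s.Value, 'f', -1, 64),
		}
		if err := w.Write(record); err != nil {
			return err
		}
	}
	return nil
}
```

A --metric-blacklist flag could be handled the same way, with the check inverted.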

We could take a look at this and do a PR. What do you think?

danron commented Mar 26, 2018

Another approach could be to have only the built-in metrics in a CSV file, with one row per HTTP operation. Something like this:

timestamp, iter, vu, proto, status, method, url, usertags, http_req_duration, http_req_connect, ...

With the ability to white/blacklist metrics.

This approach doesn't work well for custom metrics, though, since all the http_* columns would be empty for custom-metric rows, and vice versa.
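For illustration, a row in that format might look something like this (all values made up):

2018-03-26T10:15:00Z, 1, 3, HTTP/1.1, 200, GET, http://example.com/login, env=staging, 120.5, 10.2, ...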

na-- (Member) commented Jul 11, 2019

We probably need an efficient binary format for on-disk storage of metrics, and someone is currently working on a CSV output (#1067), but there's definitely room for improvement when it comes to the current JSON output. Probably the easiest optimization would be to use a faster JSON encoder (there are several third-party options), but there are other things that could be improved as well.

While recently reviewing the CSV output pull request (which seemed to be copied from the JSON one), I noticed that it wrote to the disk directly in the Collect() method. I checked, and the JSON output indeed does the same: https://github.com/loadimpact/k6/blob/c85438949d00c2ca8a89d04b2837bb087a3d3201/stats/json/collector.go#L112-L134

This is a problem, since the Engine currently calls Collect() synchronously every 50ms, and that happens as part of the same for-select loop that also collects samples from the local executor: https://github.com/loadimpact/k6/blob/c85438949d00c2ca8a89d04b2837bb087a3d3201/core/engine.go#L224-L248

This needs to be benchmarked, but my gut feeling here is that this is very sub-optimal and may be causing some performance issues.
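To illustrate the kind of decoupling that might help (a sketch with made-up types, not a patch against the actual collector): Collect() could only append to an in-memory buffer, while a separate goroutine does the actual disk writes.

```go
package asyncout

import (
	"encoding/json"
	"os"
	"sync"
	"time"
)

// Sample is a stand-in for one metric measurement.
type Sample struct {
	Metric string    `json:"metric"`
	Time   time.Time `json:"time"`
	Value  float64   `json:"value"`
}

// AsyncCollector buffers samples in Collect() and flushes them to disk
// from its own goroutine, so the caller's loop is never blocked on I/O.
type AsyncCollector struct {
	mu     sync.Mutex
	buffer []Sample
	file   *os.File
	done   chan struct{}
	wg     sync.WaitGroup
}

func NewAsyncCollector(path string) (*AsyncCollector, error) {
	f, err := os.Create(path)
	if err != nil {
		return nil, err
	}
	c := &AsyncCollector{file: f, done: make(chan struct{})}
	c.wg.Add(1)
	go c.flushLoop()
	return c, nil
}

// Collect is cheap: it only appends to the in-memory buffer.
func (c *AsyncCollector) Collect(samples []Sample) {
	c.mu.Lock()
	c.buffer = append(c.buffer, samples...)
	c.mu.Unlock()
}

func (c *AsyncCollector) flushLoop() {
	defer c.wg.Done()
	ticker := time.NewTicker(200 * time.Millisecond)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			c.flush()
		case <-c.done:
			c.flush() // final flush on shutdown
			return
		}
	}
}

func (c *AsyncCollector) flush() {
	c.mu.Lock()
	pending := c.buffer
	c.buffer = nil
	c.mu.Unlock()

	enc := json.NewEncoder(c.file)
	for _, s := range pending {
		_ = enc.Encode(s) // error handling elided in this sketch
	}
}

// Close stops the flush goroutine and closes the file.
func (c *AsyncCollector) Close() error {
	close(c.done)
	c.wg.Wait()
	return c.file.Close()
}
```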

na-- changed the title from "Improve performance when logging individual transactions" to "Efficient output of metrics to a binary file" on Jan 21, 2021
oleiade (Member) commented Oct 2, 2023

The team decided to close this issue.

If there is a need to address parts, or the whole, of this problem in the future, we would prefer that the concrete, smaller issues involved in its resolution be detailed in more specific GitHub issues.

oleiade closed this as completed on Oct 2, 2023