Efficient output of metrics to a binary file #321
JSON output was never meant to be used "for real", more for testing; it's horribly inefficient in its current form. If we want to log to disk in realtime, then JSON is a rubbish format for it; with the amount of data we generate, we'll want something binary (maybe even protobuf?).
@ragnarlonn good points 👍 I see two possible initial common cases:
The JSON variants could wrap the original output and parse it with jq (on Windows, PowerShell has JSON parsing built in) to get a different output. Not really convenient, but a stop-gap in the meantime.
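The filtering that jq would do can also be sketched in Go. This assumes the JSON output is newline-delimited objects with top-level `type` and `metric` fields (which matches k6's JSON output shape, though the exact schema should be checked against the docs):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// envelope mirrors the assumed top-level shape of one k6 JSON output line.
type envelope struct {
	Type   string `json:"type"`
	Metric string `json:"metric"`
}

// filterMetric keeps only the "Point" lines for the given metric name,
// roughly what `jq 'select(.metric == "...")'` would do on the file.
func filterMetric(input, want string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		var e envelope
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue // skip malformed lines rather than aborting
		}
		if e.Type == "Point" && e.Metric == want {
			out = append(out, sc.Text())
		}
	}
	return out
}

func main() {
	input := `{"type":"Point","metric":"http_req_duration","data":{"value":123}}
{"type":"Point","metric":"vus","data":{"value":10}}`
	fmt.Println(len(filterMetric(input, "http_req_duration")))
}
```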
The first question is: do we want fast JSON output (keeping in mind that the JSON output format was designed for debugging or small tests in the first place), or do we want an efficient way to output large amounts of data? If we want fast JSON output for the kinds of volumes a mid-size k6 test spits out, we're going to have to compromise readability to do it, at which point the merits of using JSON in the first place seem dubious at best. It seems like a better idea to have a dedicated binary format. (We should also use this binary format for Insights; ingest seems to be a bottleneck there too.)
I would like to suggest, as Ragnar points out, a way to allow only certain metrics to be logged, via a command line parameter. That, together with CSV output, will probably be fast enough for most situations. Almost every relevant tool or database can work with or import CSV, so it is a very good candidate for a format. I do not think JSON is needed for anything other than debugging test scripts or k6 itself. So basically I suggest these flags: `--out csv=myfile.csv --metric-whitelist http_req_duration`. And maybe also a blacklist flag: `--metric-blacklist http_req_connecting`. We could take a look at this and do a PR. What do you think?
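The whitelist/blacklist semantics proposed above could be sketched as a small predicate applied before a sample is written. The flag names come from the comment's proposal, not from shipped k6 options; the "empty whitelist keeps everything" rule is an assumption about how such flags would usually behave:

```go
package main

import "fmt"

// keepMetric decides whether a sample for the named metric should be
// written, per the proposed --metric-whitelist / --metric-blacklist flags.
// An empty whitelist means "keep everything that isn't blacklisted".
func keepMetric(name string, whitelist, blacklist map[string]bool) bool {
	if blacklist[name] {
		return false
	}
	if len(whitelist) == 0 {
		return true
	}
	return whitelist[name]
}

func main() {
	wl := map[string]bool{"http_req_duration": true}
	fmt.Println(keepMetric("http_req_duration", wl, nil))   // kept: whitelisted
	fmt.Println(keepMetric("http_req_connecting", wl, nil)) // dropped: not in whitelist
}
```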
Another approach could be to have only the built-in metrics in a CSV file, with one row per HTTP operation. Something like this: timestamp, iter, vu, proto, status, method, url, usertags, http_req_duration, http_req_connect, ... with the ability to white/blacklist metrics. This approach doesn't work well for custom metrics, though, since all the http_* columns would be empty for custom metric rows, and vice versa.
We probably need an efficient binary format for on-disk storage of metrics, and someone is currently working on a CSV output (#1067), but there's definitely room for improvement when it comes to the current JSON output. Probably the easiest optimization would be to use a faster JSON encoder (there are a bunch of those), but there are definitely other things that could be improved. I noticed something while I was reviewing the CSV output pull request recently, which seemed to be copied from the JSON one: it wrote to the disk directly in the … This is a problem, since the … This needs to be benchmarked, but my gut feeling here is that this is very sub-optimal and may be causing some performance issues.
The team decided to close this issue. If there were a need to address parts, or the whole, of this problem in the future, we would prefer the concrete, smaller issues involved in its resolution to be detailed in more specific GitHub issues.
The JSON output can generate a lot of data. It would be nice to have a more compact CSV output option as well, or perhaps the option of limiting what type of data we write (maybe we only want to log http_req_duration and the response code for each request).
Some tools (e.g. Artillery) avoid writing data to disk during the test. This can sometimes be useful (e.g. when the internal representation is much smaller than the output JSON format, we're doing a fair number of transactions per second and we have a decent amount of memory available). Maybe we should have an option that tells k6 to only write JSON/CSV output to disk after the test has completed?