Efficient output of metrics to a binary file #321
JSON output was never meant to be used "for real", more for testing; it's horribly inefficient in its current form. If we want to log to disk in realtime, then JSON is a rubbish format for it; with the amount of data we generate, we'll want something binary (maybe even protobuf?).
@ragnarlonn good points 👍 I see two possible initial common cases:
The JSON variants could wrap the original output and parse it with jq (on Windows, PowerShell has JSON parsing built in) to get a different output. Not really convenient, but a stop-gap in the meantime.
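The filtering that jq would do can also be sketched in Go. This assumes the JSON output is newline-delimited objects with top-level `type` and `metric` fields (which matches k6's JSON output shape, though the exact schema should be checked against the docs):

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// envelope mirrors the assumed top-level shape of one k6 JSON output line.
type envelope struct {
	Type   string `json:"type"`
	Metric string `json:"metric"`
}

// filterMetric keeps only the "Point" lines for the given metric name,
// roughly what `jq 'select(.metric == "...")'` would do on the file.
func filterMetric(input, want string) []string {
	var out []string
	sc := bufio.NewScanner(strings.NewReader(input))
	for sc.Scan() {
		var e envelope
		if err := json.Unmarshal(sc.Bytes(), &e); err != nil {
			continue // skip malformed lines rather than aborting
		}
		if e.Type == "Point" && e.Metric == want {
			out = append(out, sc.Text())
		}
	}
	return out
}

func main() {
	input := `{"type":"Point","metric":"http_req_duration","data":{"value":123}}
{"type":"Point","metric":"vus","data":{"value":10}}`
	fmt.Println(len(filterMetric(input, "http_req_duration")))
}
```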
The first question is: do we want fast JSON output (keeping in mind that the JSON output format was designed for debugging or small tests in the first place), or do we want an efficient way to output large amounts of data? If we want fast JSON output for the kinds of volumes a mid-size k6 test spits out, we're going to have to compromise readability to do it, at which point the merits of using JSON in the first place seem dubious at best. It seems like a better idea to have a dedicated binary format. (We should also use this binary format for Insights; ingest seems to be a bottleneck there too.)
I would like to suggest, as Ragnar points out, a way to allow only certain metrics to be logged, via a command line parameter. That, together with CSV output, will probably be fast enough for most situations. Almost every relevant tool or database can work with or import CSV, so it is a very good candidate for a format. I do not think JSON is needed for anything other than debugging test scripts or k6 itself. So basically I suggest these flags: `--out csv=myfile.csv --metric-whitelist http_req_duration`. And maybe also a blacklist flag: `--metric-blacklist http_req_connecting`. We could take a look at this and do a PR. What do you think?
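The whitelist/blacklist semantics proposed above could be sketched as a small predicate applied before a sample is written. The flag names come from the comment's proposal, not from shipped k6 options; the "empty whitelist keeps everything" rule is an assumption about how such flags would usually behave:

```go
package main

import "fmt"

// keepMetric decides whether a sample for the named metric should be
// written, per the proposed --metric-whitelist / --metric-blacklist flags.
// An empty whitelist means "keep everything that isn't blacklisted".
func keepMetric(name string, whitelist, blacklist map[string]bool) bool {
	if blacklist[name] {
		return false
	}
	if len(whitelist) == 0 {
		return true
	}
	return whitelist[name]
}

func main() {
	wl := map[string]bool{"http_req_duration": true}
	fmt.Println(keepMetric("http_req_duration", wl, nil))   // kept: whitelisted
	fmt.Println(keepMetric("http_req_connecting", wl, nil)) // dropped: not in whitelist
}
```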
Another approach could be to have only the built-in metrics in a CSV file, with one row per HTTP operation. Something like this: timestamp, iter, vu, proto, status, method, url, usertags, http_req_duration, http_req_connect, ... with the ability to white/blacklist metrics. This approach doesn't work well for custom metrics, though, since all the http_* columns would be empty for custom metric rows, and vice versa.
We probably need an efficient binary format for on-disk storage of metrics, and someone is currently working on a CSV output (#1067), but there's definitely room for improvement when it comes to the current JSON output. Probably the easiest optimization would be to use a faster JSON encoder (there are a bunch of those), but there are definitely other things that could be improved. I noticed something while I was reviewing the CSV output pull request recently, which seemed to be copied from the JSON one: it wrote to the disk directly in the … This is a problem, since the … This needs to be benchmarked, but my gut feeling here is that this is very sub-optimal and may be causing some performance issues.
The team decided to close this issue. If there were a need to address parts, or the whole, of this problem in the future, we would prefer the concrete, smaller issues involved in its resolution to be detailed in more specific GitHub issues.
The JSON output can generate a lot of data. It would be nice to have a more compact CSV output option as well, or perhaps the option of limiting what type of data we write (maybe we only want to log http_req_duration and the response code for each request).
Some tools (e.g. Artillery) avoid writing data to disk during the test. This can sometimes be useful (e.g. when the internal representation is much smaller than the output JSON format, we're doing a fair number of transactions per second and we have a decent amount of memory available). Maybe we should have an option that tells k6 to only write JSON/CSV output to disk after the test has completed?