Track and improve k6 memory usage and performance #1167

Open
na-- opened this issue Sep 20, 2019 · 19 comments
@na-- (Member) commented Sep 20, 2019

We currently don't have a good idea how changes in k6 affect its performance. In particular, we knew that core-js used a lot of memory (#1036 (comment)), but we never systematically tracked that, so we lack any baseline to compare against if we ever want to update our currently bundled version. I did some very unscientific measurements with /usr/bin/time on my laptop, and I've shared the formatted results below. I used a range of current and previous k6 versions to run a script with 200 VUs executing a million iterations of a default function that just makes a simple http.get() call against a localhost Go server. Here are the results:

| k6 version | Used memory | RPS [1] | Times | Inputs and pagefaults |
| --- | --- | --- | --- | --- |
| 0.16.0 | 271700k | 21739 | 0:47.80elapsed - 91.32user 21.64system (236%CPU) | 0major+70139minor, 0inputs+0outputs |
| 0.17.0 | 274496k | 18185 | 0:57.17elapsed - 90.21user 23.45system (198%CPU) | 0major+72619minor, 0inputs+0outputs |
| 0.17.1 | 272884k | 17857 | 0:58.07elapsed - 88.25user 24.53system (194%CPU) | 0major+72161minor, 0inputs+0outputs |
| 0.17.2 | 914512k [2] | 17241 | 1:04.02elapsed - 104.05user 26.49system (203%CPU) | 0major+314970minor, 0inputs+0outputs |
| 0.18.0 | 905072k | 15765 | 1:08.58elapsed - 109.52user 31.23system (205%CPU) | 0major+305646minor, 0inputs+0outputs |
| 0.18.2 | 904848k | 15745 | 1:08.75elapsed - 110.31user 31.64system (206%CPU) | 0major+307379minor, 0inputs+0outputs |
| 0.19.0 | 987608k | 14812 | 1:12.95elapsed - 118.03user 35.35system (210%CPU) | 0major+339240minor, 0inputs+0outputs |
| 0.20.0 | 1011588k | 14419 | 1:15.08elapsed - 122.31user 33.90system (208%CPU) | 103major+339912minor, 40232inputs+0outputs |
| 0.21.0 | 1006776k | 14170 | 1:16.20elapsed - 123.62user 33.52system (206%CPU) | 121major+335868minor, 42240inputs+0outputs |
| 0.21.1 | 988204k | 14026 | 1:17.11elapsed - 124.65user 33.28system (204%CPU) | 115major+334937minor, 41656inputs+0outputs |
| 0.22.0 | 1915272k [3] | 14910 | 1:17.75elapsed - 128.98user 33.49system (208%CPU) | 0major+616641minor, 0inputs+0outputs |
| 0.22.1 | 1928812k | 14949 | 1:18.53elapsed - 129.59user 33.54system (207%CPU) | 114major+622426minor, 29792inputs+0outputs |
| 0.23.0 | 1940520k | 14427 | 1:20.25elapsed - 131.79user 33.97system (206%CPU) | 115major+627730minor, 29968inputs+0outputs |
| 0.23.1 | 1950936k | 15220 | 1:17.29elapsed - 128.74user 33.22system (209%CPU) | 131major+615768minor, 34048inputs+0outputs |
| 0.24.0 | 1904352k | 14202 | 1:21.48elapsed - 133.21user 33.30system (204%CPU) | 139major+570873minor, 35696inputs+0outputs |
| 0.25.0 | 1096308k [4] | 14050 | 1:16.88elapsed - 131.29user 32.66system (213%CPU) | 145major+341927minor, 37144inputs+0outputs |
| 0.25.1 | 1129756k | 14882 | 1:12.81elapsed - 127.30user 31.59system (218%CPU) | 10major+354609minor, 1832inputs+0outputs |
| latest master | 1115896k | 14760 | 1:13.01elapsed - 122.58user 31.42system (210%CPU) | 0major+343543minor, 0inputs+0outputs |
| new executors | 1168584k | 14809 | 1:12.07elapsed - 132.81user 34.58system (232%CPU) | 0major+349069minor, 0inputs+0outputs |
| new executors, no core-js [5] | 346344k | 16683 | 1:01.75elapsed - 101.36user 29.45system (211%CPU) | 0major+63926minor, 0inputs+0outputs |
| new executors, no core-js, no babel [5] | 337876k | 17737 | 0:58.24elapsed - 101.83user 29.48system (225%CPU) | 0major+62637minor, 0inputs+0outputs |
| new executors, core-js updated from 2.5.1 to 3.2.1 | 1295056k | 15331 | 1:10.01elapsed - 129.55user 35.52system (235%CPU) | 0major+385981minor, 0inputs+0outputs |

[1] Not actually very representative, because these tests were run against a localhost server on my laptop, but still useful as a measure of the overhead that the rest of k6 introduces. As a comparison, I created this simple Go script that just hammers a URL, and I got ~42000 requests per second out of it on my machine.

I'm currently willing to chalk up most of the difference between this simple script and k6 to the overhead of JS execution: when I replaced the JS VUs with a lib.MiniRunnerVU, so that each iteration didn't have any JS execution and just made a single request to the localhost server, I also got ~40000 RPS out of k6.

And to repeat, none of that is representative or matters much in the real world: when I tested a server on the local network, both k6 and the simple Go script achieved the same result of ~7000 RPS, likely because the test was then IO- and not CPU-bound. Probably the only use for tracking that metric is to monitor whether it suddenly drops, since that would likely mean we introduced something that significantly worsened performance.

[2] This memory increase was caused by the addition of core-js in every VU.

[3] Again, this was caused by a core-js and babel update.

[4] The huge drop was caused by #1038 🎉, but again, core-js was responsible.

[5] I just commented out these lines, but this is useful to know, because we plan to allow users to choose not to use core-js: #1049 (comment)

So yeah, in summary, we should have some sort of CI task that monitors the memory consumption and performance of k6, so that we can spot sudden changes in either. Whether that runs on every PR, once daily, or on some other schedule needs more discussion and investigation. And we should probably avoid CircleCI and do this either in Azure Pipelines or GitHub Actions, whichever turns out to be better (#959). For the performance part, if we track it through the iterations per second metric, we'd need #355, which is currently in progress.

na-- added this to the v0.26.0 milestone Sep 20, 2019
@ragnarlonn commented:

Do you have any idea why the RPS numbers are dropping from 21k to 15-16k over time?

Btw, I just tested (with k6 0.25.1) whether starting new iterations involves any kind of large overhead. I compared a script doing a single HTTP request per iteration with one doing a hundred:

```javascript
import http from "k6/http";

const URL = "http://localhost:8080/"; // placeholder for the tested endpoint

export default function () {
  for (var i = 0; i < 100; i++) {
    http.get(URL);
  }
}
```

At least for k6 0.25.1, the difference doesn't seem to be big: the above script that loops through a hundred requests results in 5-10% higher RPS numbers.

@na-- (Member, Author) commented Sep 20, 2019

No, I currently don't have a good idea why the iterations per second (and thus, the RPS) count dropped from 21k to 15-16k. The biggest drop was between 0.16.0 and 0.17.0, and I tried to spot something suspicious in the diff, but there are just too many changes to tell that way, so we'd have to do it commit by commit... 😞

I initially thought it might be some scheduling change, since it turns out that k6 v0.16.0 was the last k6 version that still used iterations per VU. In v0.17.0 the iterations were switched to be shared across all VUs. But I tried the new per-vu-iterations executor from #1007, which I know doesn't have any unnecessary synchronization overhead, and I still got only 16.5-17.9k iterations per second (when I disabled core-js), so at best it's a slight improvement.

Regarding multiple requests per iteration, I haven't tested that yet either; I planned to explore it today. But I think http.batch() should have better performance than a for loop with http.get().
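
For illustration, here's a minimal sketch of what the http.batch() variant of that loop could look like - the target URL is a placeholder, not the one from the actual tests:

```javascript
import http from "k6/http";

const URL = "http://localhost:8080/"; // placeholder target URL

export default function () {
  // Build 100 GET requests and issue them in one http.batch() call,
  // which executes them concurrently instead of one after another.
  var requests = [];
  for (var i = 0; i < 100; i++) {
    requests.push(["GET", URL]);
  }
  http.batch(requests);
}
```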

@na-- (Member, Author) commented Sep 20, 2019

Just to confirm @ragnarlonn's results: tweaking the script in various ways (for loops inside the default function, http.batch(), reducing the number of VUs, a bigger metricSamplesBufferSize, enabling discardResponseBodies) produces 5-15% improvements at best.
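
For context, a couple of those tweaks can be expressed as exported script options; here's a minimal sketch, where the target URL and the buffer size value are placeholders:

```javascript
import http from "k6/http";

export let options = {
  vus: 200,
  iterations: 1000000,
  // Don't read response bodies into memory at all.
  discardResponseBodies: true,
  // Buffer more metric samples between flushes (placeholder value).
  metricSamplesBufferSize: 10000,
};

export default function () {
  http.get("http://localhost:8080/"); // placeholder target URL
}
```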

@na-- (Member, Author) commented Sep 20, 2019

@mstoykov, I ran the same benchmark with an updated core-js library and added it at the bottom of the table. As you can see, there's a ~600KB increase in memory usage per VU 😞

@na-- (Member, Author) commented Sep 24, 2019

For posterity/comparison, here's an issue about a previous performance investigation: #540

na-- modified the milestones: v0.26.0, v0.27.0 Oct 10, 2019
ppcano pinned this issue Oct 11, 2019
ppcano unpinned this issue Oct 14, 2019
@hassanbakerrts commented:

Just to add more statistics: I did a test with 1k VUs for 30 minutes and it needed 30 GB to run. Otherwise, I get "dial tcp XX.XXX.XXX.XX:80: socket: too many open files".

@na-- (Member, Author) commented Oct 25, 2019

@hassanbakerrts, can you give some more details about the script you ran? What was the size of the JS code you were executing (either in KB or lines of code)? Did you import some browserified library? Or open a big data file? I'm asking because 1k VUs taking 30GB of RAM means each VU used ~30MB - very different from what we've observed with simple scripts. 😕 Also, which k6 version and OS did you use, and did you run the k6 docker container or a native k6 executable?

Regarding too many open files, you can increase that limit at the OS level: https://stackoverflow.com/questions/34588/how-do-i-change-the-number-of-open-files-limit-in-linux

@ragnarlonn commented:

@na-- @hassanbakerrts Could it be the results storage that takes up the extra space? At least my tests have been 30-second ones, but if you run for 30 minutes, I guess it will collect a lot more results data.

@na-- (Member, Author) commented Oct 27, 2019

@ragnarlonn, it could be, though usually that becomes a problem for longer-running tests. Still, it mostly depends on the number of HTTP requests, so with 1k VUs and little sleep, it might be the cause... Here are the related issues: #763 and #1068

@hassanbakerrts commented:

@na-- @ragnarlonn I imported lodash during setup but not in the running VU code. Each VU iteration makes four GET requests, followed by a sleep(random(1,4)), where random returns a number of seconds. We had to run it on an ECU (AWS) cluster; I think the OS there is Linux, but I'm not sure which one (those things were handled by the DevOps team). I used the Docker image to run k6.
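
A minimal sketch of the described script might look like this - the URLs and the random() helper are assumptions based on the description above:

```javascript
import http from "k6/http";
import { sleep } from "k6";

// Assumed helper matching the described sleep(random(1,4)):
// returns a random number of seconds between min and max.
function random(min, max) {
  return min + Math.random() * (max - min);
}

export default function () {
  // Four GET requests per iteration (placeholder URLs).
  http.get("http://example.com/first");
  http.get("http://example.com/second");
  http.get("http://example.com/third");
  http.get("http://example.com/fourth");
  sleep(random(1, 4));
}
```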

@na-- (Member, Author) commented Oct 29, 2019

Ah, given that each k6 VU is its own separate JS runtime, just importing lodash would add significant memory overhead to all VUs, even if you only use it in the setup() function. Since you imported it in the global scope (also called the init context in k6), each VU would load and evaluate that JS code, even if it doesn't subsequently use it. And unfortunately, you currently can't import lodash only in the setup function 😞
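
To illustrate, in a script like the following sketch (the lodash path is a placeholder), the import lives in the init context, so every VU loads and evaluates it even though only setup() uses it:

```javascript
// Init context: this code runs once in every VU, so each of the
// 1000 VUs pays the full cost of parsing and evaluating lodash.
import _ from "./lodash.js"; // placeholder path to a bundled lodash

export function setup() {
  // lodash is only actually used here...
  return { ids: _.range(10) };
}

export default function (data) {
  // ...but the VU code still carries its memory overhead.
}
```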

@ragnarlonn commented:

@na-- That sounds like a potentially interesting feature: being able to import & use things in the setup() context only, so that they don't affect memory usage during the test. Would it be hard to implement? (Although it would perhaps be necessary to stagger the execution of the setup function across the VUs, as otherwise peak memory usage would not change - it would just no longer be constantly high, like it is when lodash is present in each VU context all the time.)

@na-- (Member, Author) commented Oct 30, 2019

@ragnarlonn, I thought about this some yesterday, but there's a big problem with having different imports/dependencies for setup() or anything else: cloud/distributed execution.

As you probably recall, one of the first things k6 cloud script.js does when you run it on your machine is execute the init context of your script once. This is done in order to get the exported script options, but also to build a .tar archive of your script and all of its dependencies (similar to k6 archive script.js). Crucially, it doesn't and can't execute setup() at that point - that's executed on a single machine after we start the test in our cloud. So, yes, it would be very hard to implement different imports for setup(): in order to do it, we'd have to figure out some completely different way of defining setup than the current user-friendly export function setup() { /* ... */ }, and that way would have to allow explicit declaration of dependencies without actually executing them.

Unless we can think of a clever and easy solution, I think we shouldn't spend a lot of time on this compared to the other memory-saving improvements we want to add to k6 - HDR histograms, reduction of core-js, goja improvements, shared read-only data files, streaming reading of big files, etc. The reason is that this seems like it won't deliver very big benefits, but would require substantial changes.

After all, I'd argue that most people who import hefty JS libraries don't use them only in their setup. Also, I forgot to mention yesterday, but I tried running a script that imports lodash (this version) with k6 run --paused --vus 1000 script.js, and it takes only around 4GB of RAM. For comparison, running the exact same script without importing lodash takes ~2.4GB... So the majority of the memory usage in @hassanbakerrts's case wasn't caused by that library...

@ragnarlonn commented:

@na-- Oh yeah, I didn't think about the distributed execution... And I can't think of a simple and user-friendly way to do this either. The only thing that comes to mind is adding some special import statement to init() that lets you specify which imports you want available in other contexts (i.e. export function init() { import setup::lodash.js }), which may not be very user-friendly/intuitive. Sounds like a wise decision to focus on the other improvements you mention.

@na-- (Member, Author) commented Oct 30, 2019

The problem with custom imports/exports, or custom syntax of any kind, is that it would run afoul of ECMAScript parsers (in our case, specifically Babel). We could do it with an exported method that returns a simple JS string array of extra files that need to be included in any archive bundle, or with a property in the exported options, but that again doesn't solve the issue of how to then import one of those files only in the VU that's executing setup(). To be clear, none of these issues are unsolvable if we spend enough time and accept some sub-optimal UX (or rather, developer experience) at times; I just don't think this should be high on the optimization priority list right now.
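
Purely as a hypothetical sketch (nothing like this exists in k6), such an exported property might look like the following - extraArchiveFiles is an invented name:

```javascript
// Hypothetical, not a real k6 API: declare extra files that should be
// bundled into the archive without importing them in the init context.
export let options = {
  vus: 1000,
  duration: "30m",
  extraArchiveFiles: ["./lodash.js"], // invented property name
};

export function setup() {
  // The unsolved part described above: how would only the VU that
  // executes setup() actually get to import "./lodash.js" here?
}

export default function () { /* ... */ }
```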

@na-- (Member, Author) commented Nov 14, 2019

Now that #1206 was merged in master, and somewhat prompted by https://community.k6.io/t/k6-cpu-use-during-a-test/303, I decided to check how performance differed for a simple script with the two compatibility modes. Here's the ultra-simple script I used:

```javascript
var http = require("k6/http");
var sleep = require("k6").sleep;

exports.default = function () {
    http.get("https://httpbin.test.loadimpact.com/get");
    sleep(Math.random() * 2 + 3);
};
```

Running it with `/usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=extended` produced:

```
100.54user 10.45system 1:53.56elapsed 97%CPU (0avgtext+0avgdata 9113532maxresident)k
0inputs+0outputs (0major+2869751minor)pagefaults 0swaps
```

and `/usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base` produced:

```
21.59user 6.59system 1:01.27elapsed 46%CPU (0avgtext+0avgdata 1147084maxresident)k
0inputs+0outputs (0major+335435minor)pagefaults 0swaps
```

So, roughly an 8-fold memory consumption improvement (from ~9.11GB to ~1.14GB), much less CPU usage and a lot faster initialization.

The initialization time was actually the most noticeable difference, if you have RAM to spare. Running the same commands again, but with --paused, and immediately aborting k6 as soon as the status switched from init to paused (i.e. as soon as all VUs were initialized) produced very stark results. For `/usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=extended --paused`:

```
72.90user 5.37system 0:54.90elapsed 142%CPU (0avgtext+0avgdata 7855780maxresident)k
0inputs+0outputs (0major+2502677minor)pagefaults 0swaps
```

And for `/usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base --paused`:

```
1.80user 0.55system 0:02.28elapsed 103%CPU (0avgtext+0avgdata 483576maxresident)k
0inputs+0outputs (0major+147906minor)pagefaults 0swaps
```

Almost immediate initialization of 3500 distinct JS runtimes is quite impressive 😄 But yeah, for the context of this particular issue, we should have CI tasks that monitor both the extended (i.e. Babel + CoreJS) and base performance, since they're completely different beasts.

@mstoykov (Contributor) commented Nov 14, 2019

I decided to put my test webpack packager for k6 through a test.
I tested the simple script above (custom_main.js), the same script using ES6 import/export syntax (main.js), and the webpack transform of the latter (dist/app.bundle.js). Here are the results:

```
$ /usr/bin/time k6 run --vus 3500 --duration 60s custom_main.js
http_reqs..................: 47070  784.498669/s
147.14user 21.68system 2:09.00elapsed 130%CPU (0avgtext+0avgdata 9650724maxresident)k
0inputs+0outputs (0major+3015214minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base custom_main.js
http_reqs..................: 47431  790.515908/s
50.68user 23.59system 1:01.78elapsed 120%CPU (0avgtext+0avgdata 1333832maxresident)k
168inputs+0outputs (4major+390294minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s main.js
http_reqs..................: 48057  800.949499/s
151.07user 20.54system 2:12.46elapsed 129%CPU (0avgtext+0avgdata 9279780maxresident)k
1416inputs+0outputs (2major+2909761minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s build/app.bundle.js
http_reqs..................: 45230  753.832637/s
142.08user 20.62system 2:11.20elapsed 124%CPU (0avgtext+0avgdata 9676844maxresident)k
63600inputs+0outputs (517major+3031793minor)pagefaults 0swaps

$ /usr/bin/time k6 run --vus 3500 --duration 60s --compatibility-mode=base build/app.bundle.js
http_reqs..................: 45808  763.461697/s
51.19user 17.97system 1:01.87elapsed 111%CPU (0avgtext+0avgdata 1432368maxresident)k
0inputs+0outputs (0major+417146minor)pagefaults 0swaps
```

Table:

| test script | time elapsed | memory (maxresident KB) | RPS |
| --- | --- | --- | --- |
| custom_main.js + corejs & babel | 2:09.00 | 9650724 | 784 |
| custom_main.js + base | 1:01.78 | 1333832 | 790 |
| main.js + corejs & babel | 2:12.46 | 9279780 | 800 |
| main.js + webpack + corejs & babel | 2:11.20 | 9676844 | 753 |
| main.js + webpack + base | 1:01.87 | 1432368 | 763 |

All the numbers fluctuate, but they generally stay in the same ballpark, and the RPS is apparently not affected in this particular case.

For me this confirms that the webpack setup I propose is now a viable way to make k6 scripts much more performant when using --compatibility-mode=base
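
For context, a minimal webpack configuration along those lines might look like the sketch below - the entry point, output paths, and loader choice are assumptions, not necessarily the exact proposed setup:

```javascript
// webpack.config.js - a sketch of bundling a k6 script for base mode.
const path = require("path");

module.exports = {
  mode: "production",
  entry: { app: "./main.js" },
  output: {
    path: path.resolve(__dirname, "build"),
    filename: "[name].bundle.js",
    // k6 loads the bundle as a CommonJS module in base mode.
    libraryTarget: "commonjs",
  },
  module: {
    rules: [
      {
        test: /\.js$/,
        // Transpile ES6+ ahead of time so goja can run it without Babel.
        use: { loader: "babel-loader" },
      },
    ],
  },
  // k6's built-in modules (k6, k6/http, ...) must stay external.
  externals: /^k6(\/.*)?$/,
};
```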

@mstoykov (Contributor) commented Aug 12, 2020

I tested the latest goja, with some ES6 support, with and without core-js.
There is... an increase in binary size, which is definitely because of goja, which also grew by more than 10k lines of code:

```
29836   k6-noes6
31640   k6-es6
31552   k6-es6-nocorejs
```

The tested script was pretty simple:

```javascript
import http from "k6/http";
import { sleep } from "k6";

export default function () {
    http.get("https://httpbin.test.loadimpact.com/get");
    sleep(Math.random() * 2 + 3);
}
```

With `--vus 3500 --duration 60s`, the results were:

| k6 binary | run time | KB memory per VU | CPU used | RPS |
| --- | --- | --- | --- | --- |
| k6-noes6 | 1:36.21 | 2871 | 270% | 759 |
| k6-es6 | 1:26.18 | 2673 | 232% | 743 |
| k6-es6-nocorejs | 1:07.37 | 531 | 95% | 734 |

The numbers move around somewhat, but they stay at those magnitudes. As the script itself always takes 1m05.2s (the sleep is apparently pretty unrandom ;) ), it looks like the no-core-js version takes only ~2 seconds to initialize, while the others take ~20 and ~30 seconds respectively - which for 3500 VUs isn't bad, but isn't as instant.
The memory difference of 5x is probably the most significant benefit.
The CPU difference is probably (I haven't had time to test this extensively) due to the faster initialization.

na-- modified the milestones: v1.0.0, TBD Nov 9, 2022
@imiric (Contributor) commented Feb 15, 2023

Hey all, an update on this old issue: we recently updated our "Running large tests" guide and added benchmark results for v0.42.0 in our k6-benchmarks repository.

We plan to automate benchmarking soon™, so that we can continually track k6 performance for all releases.

codebien removed this from the TBD milestone Sep 27, 2023