Performance benchmark: Express #2417
I've tried to get a basic idea for how long this benchmark would take (on my pc anyway) and I actually ran into a very obscure bug. It's so obscure that I'm not even sure if it should be fixed. I've detailed it in #2418 |
Thanks a lot for wanting to add the benchmark! And the issue is also a juicy one we should fix. Beers for everyone! 🍻 |
But not yet in file paths! 🍻 |
The issue should be fixed @Lakitna so feel free to proceed 😉 Do you want me to release a new beta version? |
Awesome! I'll make it work without a new release. Though it is a shame that we can't make this benchmark work for Stryker@3 as a comparison. |
Hmm good point. Maybe we can remove the tests that use the fixture? |
Or we can backport this fix, your choice. |
Yeah... I had a bit of a brainfart there... I'm so used to never skipping tests that I didn't even consider it. But for a benchmark that's not an issue. |
I'm running into an issue here. It feels like I'm missing a symlink or something. I've created my branch in
See also full logs: Am I wrong in assuming that the master branch contains correct code? |
I see you've figured it out already (PR). Question: what are your thoughts on running these tests? Just from time to time on our own (dev) PCs? Or on GitHub Actions hardware? Want to use tools for it? Any guidance here would be excellent; I don't have a lot of experience in this field. |
I sadly didn't. The PR is WIP because it currently pulls Stryker from NPM. I also don't have a lot of experience with performance testing, especially in an open-source setting. We can't enforce a periodical performance test here. Not the way you can do it when you have a dedicated team anyway. However, this is a topic I'm interested in. It might be good to create a quick perf test at some point that can be used to ensure that a new PR does not introduce a major slowdown. After that, I think we should test at least before every release. But that would either take a long time (run perf tests on the current version (from NPM) and on the new version (release candidate)) or require identical hardware every time. In any case, I'll read up on the subject and come back to you. |
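For illustration, a coarse timing harness along these lines could catch big slowdowns before a release. This is a hypothetical sketch only; the script name, the config file names, and the 20% threshold are assumptions, not anything Stryker ships:

```js
// perf-check.js: hypothetical, coarse performance regression check.
// Runs the same benchmark with two Stryker config files and compares wall-clock time.
const { execSync } = require('child_process');

function timeRun(label, command) {
  const start = Date.now();
  execSync(command, { stdio: 'inherit' }); // keep Stryker's progress output visible
  const seconds = (Date.now() - start) / 1000;
  console.log(`${label}: ${seconds.toFixed(1)}s`);
  return seconds;
}

// Assumed config files: one pointing at the released package, one at the local build.
const baseline = timeRun('released version', 'npx stryker run stryker.baseline.conf.js');
const candidate = timeRun('release candidate', 'npx stryker run stryker.candidate.conf.js');

// Flag the run if the candidate is more than 20% slower than the baseline.
if (candidate > baseline * 1.2) {
  console.error('Possible performance regression detected');
  process.exit(1);
}
```

On shared CI hardware the absolute numbers will be noisy, so a relative threshold like this is about as much as can reasonably be asserted.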
I'll be working on this today @Lakitna. I'll be investigating if it makes sense to use git submodules for our performance test. I was pointed in this direction by @hugo-vrijswijk . It saves disk space, but also makes it easier to keep them up-to-date. This will be my first endeavour with git submodules 🤞. |
Git submodules would not work in this case, because we have to skip that one test that fails in Stryker@3. Other than that, it would be a good improvement.
I'm not sure we should want to update benchmarks. Benchmarks are most useful if they're always the same. This is why I mention the specific commit on which the benchmark is based https://github.com/stryker-mutator/stryker/pull/2431/files?file-filters%5B%5D=.js#diff-c10b5c0440a9a6b469ebef610ebc860aR4 |
Way ahead of you #2433 😎 (either that or initial test run failed locally and took me a while to figure out why and had a personal facepalm moment 🤦♂️) |
My results:
Baseline results
Hardware
DELL Intel [email protected]
[email protected]
So a performance improvement for me 😎 |
Wait. Your result:
My result:
Looks like I had a lot of timeouts where you did not. I'll take a look later today to see if I can pinpoint this. I think it might be |
It might have to do with that, indeed. Old-fashioned process starvation. What did you do during testing? For example, browsing the internet is usually enough to screw up the run when running at max concurrency. I brought an additional laptop to work today 😎 |
I've updated my comment with the results. So 00:07:42 with [email protected], a significant performance improvement. This is unfortunate in a way; I was hoping to reproduce your experience @Lakitna, but alas. @Lakitna, could you maybe also try it out with the new instructions in #2402? |
I tried to run but ran into an issue with the submodule. By now you might have noticed that I think you're working on the branch
then:
Now the submodule is pulled over HTTPS, making it easier for everyone to clone Stryker. Also: I got build errors on
Edit: I found the global |
Hardware
I also have a pretty powerful GPU, but Stryker does not make use of it. Maybe an interesting one for the far future. :) Though I'm not sure how well it would work.
[email protected]
That's some speed improvement! There are still timeouts, though I'm definitely not running on a clean environment. When running with
Edit: I can get rid of the timeouts when I increase the timeout |
Have to admit, a bit jealous 😉
It has since been merged. Great to see you're having the same performance improvement. I'm curious to know how your specific use case differs from express 🤷♂️. Maybe very large source files? Which get even bigger with mutation switching? I can't really explain the timeouts in your setup. Stryker's own calculation for the timeout per mutant should work: |
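For reference, Stryker's documented per-mutant timeout calculation is roughly the net test time multiplied by `timeoutFactor` plus `timeoutMS` (newer versions also add a per-process overhead term). A small illustration with the default settings:

```js
// Rough illustration of Stryker's per-mutant timeout calculation.
// Defaults: timeoutMS = 5000, timeoutFactor = 1.5 (an overhead term may also be added).
const timeoutMS = 5000;
const timeoutFactor = 1.5;

function timeoutForMutant(netTimeMs) {
  return netTimeMs * timeoutFactor + timeoutMS;
}

// Example: with 2000 ms of net test time, a mutant gets 2000 * 1.5 + 5000 = 8000 ms
// before it is reported as a timeout.
console.log(timeoutForMutant(2000)); // 8000
```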
I know, right! I had to endure a few years of maddening slowness before I got it though 😒 Compared to my (closed source) project, express has a lot larger files.
Are any of these numbers triggering anything for you? |
As expected, the total runtime gradually increases. I am surprised how high I had to crank it, though. I've never encountered anything remotely like this in my project. Has anyone? Edit: I've re-run beta on my project, and there too I got more timeouts. Though I only got 2 extras. They appeared in this code:
const component = _.omit(someObject, [
'a',
'b',
'c',
'd',
'e',
'f',
'g',
'h',
'i',
'j',
'k',
'l',
'm',
'n',
]);
There are two timeouts reported here. One on the mutation |
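For context, the mutants Stryker would typically generate on that snippet are string-literal and array-declaration replacements. The following is a hypothetical illustration (the actual mutants reported in the run aren't shown here):

```js
// Hypothetical illustration of typical Stryker mutants for the snippet above.
const _ = require('lodash');

const someObject = { a: 1, b: 2, n: 14 };

// Original: omit a list of keys.
const component = _.omit(someObject, ['a', 'b', 'n']);

// StringLiteral mutator: one key becomes an empty string, so it is no longer omitted.
const stringLiteralMutant = _.omit(someObject, ['', 'b', 'n']);

// ArrayDeclaration mutator: the key list is emptied, so nothing is omitted at all.
const arrayDeclarationMutant = _.omit(someObject, []);

console.log(component, stringLiteralMutant, arrayDeclarationMutant);
```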
Do you have a hard disk by any chance (instead of an SSD)? Maybe the disk is busy with I/O and that is causing the timeouts? Maybe there are other reasons why timeouts appear? Encoding 4K video in the background? Mining bitcoin? There can be many reasons for these timeouts. Maybe there is tooling that can monitor your PC better? Running with higher timeout settings is fine for Stryker; you're just waiting a bit longer than you strictly need to. We will be implementing #1472 soon after Stryker 4, so you will be able to exclude specific mutations with a comment. Not ideal, but better than nothing. |
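The comment directives that eventually shipped for this (in later Stryker versions, so the exact syntax here is an approximation) look something like:

```js
const _ = require('lodash');
const someObject = { a: 1, b: 2, c: 3 };

// Stryker disable next-line all: this mutant can only ever time out
const component = _.omit(someObject, ['a', 'b', 'c']);

console.log(component);
```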
I'm sporting an SSD, and it's barely in use during a run (max 4%). So that would be unlikely. I did observe the following in my project though. It is not as pronounced in the Express bench. You're looking at available memory (RAM) during the run. It looks like something is causing a memory leak. You can also see that something panicked when it couldn't get more memory and quickly cleaned up at about 90% into my run. The vertical line at the end is the end of the run. However, I do think this is unrelated to the timeouts issue. |
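A chart like that can be produced with a trivial sampler running next to Stryker. A minimal sketch, assuming a one-second sample interval and a `memory.csv` output file (neither of which comes from the thread):

```js
// mem-sample.js: hypothetical helper that logs free system memory to a CSV
// once per second, so it can be charted alongside a Stryker run.
const fs = require('fs');
const os = require('os');

const out = fs.createWriteStream('memory.csv');
out.write('timestamp,freeMemMB\n');

setInterval(() => {
  const freeMb = Math.round(os.freemem() / 1024 / 1024);
  out.write(`${new Date().toISOString()},${freeMb}\n`);
}, 1000);
```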
I'm not running on a dedicated machine. I definitely have other programs open. But hardware usage is minimal. Below is the almost idle baseline in which I run Stryker. With my kind of hardware and this much of it available I shouldn't have to worry about timeouts. |
I didn't expect a lot of difference between beta.4 and beta.5. Just to be sure, some runs:
Not sure what happened in the first run... But overall it seems to be pretty much the same result. Currently, I'm running 15 runs with increasing concurrency. Might help with the default concurrency setting. |
I've run the Express bench in
I created a Google Sheet to be able to draw charts out of this. https://docs.google.com/spreadsheets/d/11dqDoxqbXVCQiBVtMq_eZpgMljTMPL-voI1MQdA-gDA/edit?usp=sharing There is some interesting stuff in there I think. |
Yeah, we should try to figure out why there are differences. I think the previous runs are influencing new runs in the same process: Mocha attaches global handlers to the process, reporting the current run as failed. @Lakitna I'm curious what happens if you use the |
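The suspected mechanism, as a minimal sketch (this is not Mocha's actual source, just an illustration of how handlers left on the shared `process` object can make one run report another run's failure):

```js
// Sketch: global handlers registered by earlier in-process runs keep firing later.
function fakeTestRun(name) {
  // Each "run" installs its own uncaughtException handler that marks it as failed.
  process.on('uncaughtException', (err) => {
    console.log(`[${name}] marking run as failed:`, err.message);
  });
}

fakeTestRun('run 1');
fakeTestRun('run 2');

// A late async error is seen by BOTH handlers, so the handler from run 1
// also fires while run 2 is the "current" run.
setTimeout(() => {
  throw new Error('late async failure');
}, 10);
```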
Let's find out 😄 I'll run a bunch of times with a few different concurrencies. I'll probably do that tonight after work. The quest for stable runs continues! Edit: I have a command lined up for 16 runs. 2 each for the following concurrencies: |
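For what it's worth, a sweep like that can be scripted. The sketch below is hypothetical; the actual concurrencies queued up and the exact command aren't shown above:

```js
// sweep.js: hypothetical script that runs the benchmark a fixed number of times
// per concurrency value and records the wall-clock time of each run.
const { execSync } = require('child_process');

const concurrencies = [4, 6, 8, 10]; // assumed values, not the ones from the thread
const runsPerSetting = 2;

for (const concurrency of concurrencies) {
  for (let run = 1; run <= runsPerSetting; run++) {
    const start = Date.now();
    execSync(`npx stryker run --concurrency ${concurrency}`, { stdio: 'inherit' });
    const minutes = ((Date.now() - start) / 60000).toFixed(1);
    console.log(`concurrency=${concurrency} run=${run}: ${minutes} min`);
  }
}
```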
@Lakitna you have 8 cores, right? Looking at it, 8 is a threshold for "stable" tests. So I guess instead of logical cores - 1 we should use physical ones |
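Node only exposes logical cores directly, so a physical-core default would have to be approximated. A sketch of the idea, assuming hyper-threading (two logical cores per physical core):

```js
const os = require('os');

// os.cpus() reports logical cores. With hyper-threading that is usually twice
// the number of physical cores, so halving it is a rough approximation.
const logicalCores = os.cpus().length;
const approxPhysicalCores = Math.max(1, Math.floor(logicalCores / 2));

// The default discussed above: logical cores - 1.
const currentDefault = Math.max(1, logicalCores - 1);

console.log({ logicalCores, approxPhysicalCores, currentDefault });
```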
This triggered me. I wonder if Mocha's
I still wouldn't call it fully stable, but it is a lot better. |
"I still wouldn't call it fully stable, but it is a lot better." |
Exactly, if the score is stable I can live with some instabilities at this point. We've been working on this for quite a while now. But there is a 4.5 percentage point difference between the highest scoring run and the lowest scoring run. That's a significant difference. As a user I would find it annoying that the same line of code can be |
Oh man, runs with the command runner take a very long time. Here are the results:
{
"$schema": "../../../packages/core/schema/stryker-schema.json",
"testRunner": "command"
}
I've updated the Google sheet accordingly. See the second tab https://docs.google.com/spreadsheets/d/11dqDoxqbXVCQiBVtMq_eZpgMljTMPL-voI1MQdA-gDA/edit?usp=sharing In this setup, the run becomes useless with >= 9 concurrency. After that, it becomes slightly more useful with every step down in concurrency. Timeouts stabilize with <= 5 concurrency, but survived doesn't. At concurrency 1 things became less stable, but the system was also used more during the run. Execution times are also interesting here. There is a huge gap between concurrency 5 and anything below that. I'll queue up some concurrency 4 later to see how long those take. This one can have a major impact on CI. Update: Concurrency 4 actually clocks in smack in the middle of 3 and 5. The trendline seems to be pretty accurate. I think it's weird we're not seeing the expected stability here. According to these numbers, the Mocha runner is actually more stable across concurrencies. It is notable that running multiple times with the same concurrency is very stable. Finally, a fun fact: these numbers correspond to a total runtime of 09:30:51. My poor CPU 😄 |
I've removed the milestone. We still want to find the root cause of the discrepancies in timeouts. Pretty sure it's something specific in express tests, but we don't want to block the 4.0 milestone for this. Hope you agree @Lakitna |
https://www.npmjs.com/package/mmap-object These two look promising for now, @nicojs |
Hmm I get where you're coming from. I don't like it though. The simple fact that this is possible feels iffy to me. I at least want to verify that we can make things work properly with another project. How about https://github.com/tj/commander.js? It's CommonJS with Jest. 358 tests running in 7.369 seconds. |
@Lakitna I took a look at our serialization. Could you please run flamegraph again on branch https://github.com/kmdrGroch/stryker/tree/test? I would like to compare these 2. |
I had some busy times last week, so it's a bit later than usual. Here we go:
I'm not familiar enough with your changes to make any judgements here ;) |
And I see you ran code on beta 10; I think beta 9 is more relevant as a base test :) |
Yeah, I feel like it would be worth it to get profiling working on child processes. However, it would involve spawning the child processes in a different way. That stuff is finicky. I can get the child processes to show up in the Chrome inspector, but it won't actually create a profiling report due to the short life of the processes. I've also added beta.9 to the results above ;) |
I've actually managed to get some CPU profiles from child processes! 😃 When running node with Do the runner children exit with Download, extract, and open with Chrome inspector: child-process-289.zip |
I think we had some issue where child processes don't exit on Linux (or at least in some runner cases). I'm not sure about exiting, though. From what I saw previously, unless there is a "timeout" in the process, the child process doesn't exit! It restarts when there is a timeout (the ones you can see after running mutation tests). |
And I'm not exactly sure how this inspector works. It's just messy for me :/ Are you maybe free in a few days to give me a tour of these reports? |
Stryker itself will exit child processes when it runs to the end (with or without an error). However, I noticed hanging Chrome browsers after running the karma integration tests on Linux, causing me to open #2519, but Gareth actually mentioned to me that he hasn't noticed any dangling processes, so that needs further investigation.
Yes. Stryker uses |
I wonder if it's as simple as using it. I don't have access to a Linux machine at the moment, though. Is this a thing that can be tested in CI? |
If you have spare time you could always try WSL, it's faster to set up than pure Linux (unless you work on a Mac) |
I've got it working! 🎉 These are all the CPU profiles I managed to generate with the Express bench. Notice how not every process generated a profile, probably due to crashing processes. Whoops, that's a big file! It's 38 MB! GitHub doesn't like that, so here is a WeTransfer link: https://we.tl/t-V0M8XJ9WEF. Since I didn't have to change |
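For anyone trying to reproduce this: one way to get profiles out of short-lived child processes is Node's built-in `--cpu-prof` flag (Node 12+), which writes a `.cpuprofile` file when the process exits cleanly. A sketch of passing it to a forked child (not Stryker's actual spawning code; `child.js` is a placeholder):

```js
// parent.js: fork a child process with CPU profiling enabled.
// --cpu-prof and --cpu-prof-dir are built-in Node flags; the child only writes
// its .cpuprofile if it exits cleanly (not when it is killed).
const { fork } = require('child_process');

const child = fork('./child.js', [], {
  execArgv: ['--cpu-prof', '--cpu-prof-dir=./profiles'],
});

child.on('exit', (code) => {
  console.log(`child exited with code ${code}; profile (if any) is in ./profiles`);
});
```

The resulting `.cpuprofile` files can then be loaded into the Chrome inspector's JavaScript profiler.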
I'm also not an expert in any way. You might be better off looking for something on YouTube. However, the new profiles are a lot easier to read. There is actually stuff in there that I recognize ;) Edit: It should be possible to make a |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
As discussed here #1514 (comment)
To get a better feel for the performance impact of changes in Stryker we should add some more benchmarks.
For this benchmark, we'll use https://github.com/expressjs/express.
This is a simple one for a straightforward node case:
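A configuration for such a benchmark might look roughly like the following. This is a sketch under assumptions; the file globs, reporters, and options are illustrative, not the actual benchmark config:

```js
// stryker.conf.js: rough sketch of a benchmark configuration for express.
module.exports = {
  mutate: ['lib/**/*.js'],      // mutate express' own source files
  testRunner: 'mocha',
  mochaOptions: {
    spec: ['test/**/*.js'],     // express' mocha test suite
  },
  reporters: ['progress', 'html'],
  coverageAnalysis: 'perTest',  // per-test coverage keeps runs fast
};
```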
Baseline results
[email protected]
[email protected]
With "maxConcurrentTestRunners": 8: