Ghost-based benchmark has worse performance on node v20.3.0/v20.3.1 #48654
Comments
What do those benchmarks test? Can you post profiles or flamegraphs of v20.2.0 and v20.3.0? Even better would be to test the commits between libuv v1.44.2 and v1.45.0 (there are about 100) to pinpoint the exact commit. If your benchmarks are amenable to something that works with `git bisect run`, a driver script along the lines of the sketch below works well; it assumes libuv is checked out in a directory adjacent to node.
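A minimal sketch of such a bisect driver, under stated assumptions: the libuv and node checkouts sit side by side, node vendors libuv under deps/uv, and a hypothetical run-benchmark.sh prints the achieved requests/sec as the last word of its output.

```bash
#!/bin/sh
# Hypothetical bisect driver -- run from the libuv checkout with:
#   git bisect start v1.45.0 v1.44.2
#   git bisect run ../bisect-libuv.sh
set -e

NODE_DIR=../node          # node checkout next to the libuv checkout
BASELINE_RPS=1000         # requests/sec measured with libuv v1.44.2 (placeholder)

# Drop this libuv revision into node's vendored copy and rebuild node.
rm -rf "$NODE_DIR/deps/uv/src" "$NODE_DIR/deps/uv/include"
cp -r src include "$NODE_DIR/deps/uv/"
( cd "$NODE_DIR" && ./configure >/dev/null && make -j"$(nproc)" >/dev/null )

# run-benchmark.sh (hypothetical) starts the Ghost server with the freshly
# built node, drives load, and prints the achieved requests/sec last.
RPS=$(./run-benchmark.sh "$NODE_DIR/out/Release/node" | awk 'END { print $NF }')
echo "this revision: $RPS req/s (baseline $BASELINE_RPS)"

# Exit non-zero (bad commit) when throughput drops below 90% of the baseline.
awk -v r="$RPS" -v b="$BASELINE_RPS" 'BEGIN { exit (r < 0.9 * b) }'
```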
Benchmark
About the benchmark: I created a ghost.js server based on node in a docker container and used ab to send requests to measure the throughput of the server.

Flamegraph
Comparing the two versions, the flame graph of node v20.3.0 clearly differs from that of v20.2.0.

Rollback Test
I switched to the commit that upgraded libuv to 1.45.0 and built node before and after it. Testing with 1 instance, I found that the throughput after this commit was about 20% lower than before.
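For reference, a minimal sketch of a setup like the one described, with the container image name, port, and request counts as assumptions rather than the values actually used:

```bash
# Start the Ghost/node server in a container (image name and port are placeholders).
docker run -d --name ghost-bench -p 2368:2368 ghost-bench:node-v20.3.0

# Drive load with ApacheBench: -k reuses connections, -l tolerates responses
# of varying length (which dynamic pages produce), -c/-n set concurrency and
# total request count.
ab -k -l -c 100 -n 50000 http://127.0.0.1:2368/
```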
Can you test outside of docker? There's a known docker issue that will be fixed in the next libuv release.

Replace ab with (for example) wrk if at all possible. ab is notoriously unreliable.

How do you know for sure? Did you bisect, or are you guessing?
Thanks, I'm following your suggestions and retesting my benchmark. The results should be ready soon and I will update here. By the way, why is ab unreliable?
For one, it's too easy to misuse: it gets confused when the server sends back responses with different sizes unless you pass `-l`. It has plenty of other bugs and quirks, even though it's over two decades old.
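A minimal wrk run against the same kind of endpoint might look like the following; the port, thread count, connection count, and duration are assumptions, not the values used in this thread.

```bash
# 8 threads, 100 open connections, 60-second run; wrk prints requests/sec
# and, with --latency, a latency distribution at the end.
wrk -t8 -c100 -d60s --latency http://127.0.0.1:2368/
```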
Thank you for the explanation about ab.

Inside the docker container
I just ran 1 instance. The wrk results for the two versions still differ noticeably.

Outside the docker container
Whether multi-instance or single-instance, the wrk results show that the performance of the two versions is similar.

Results
Maybe you are right; there may indeed be a bug related to docker, and it seems that ab really is not reliable. I saw that libuv has been upgraded to 1.46.0 with a docker-related bug fix, and that the latest version of node (v20.4.0) also upgraded libuv to 1.46.0. I'll retest our benchmark on node-v20.4.0 to see if that fixes the slowdown.

Additional
I also noticed that the CPU utilization of node v20.3.0 is much higher than that of v20.2.0 during the runs; I suspect the extra load comes from the io_uring sqpoll thread.

How I Tested
About the question of how I know for sure: I rolled back commits of node-v20.3.0 in the container. When rolling back past the commit that upgraded libuv to 1.45.0, the performance changed dramatically before and after that commit, so I'm confident the libuv upgrade is what hurt our benchmark.
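A sketch of that kind of rollback test, assuming a node source checkout and that the libuv bump landed as a single commit touching deps/uv; the benchmark script is the hypothetical one from above.

```bash
# Locate the commit that bumped the vendored libuv to 1.45.0.
git -C node log --oneline -- deps/uv | head

# Build node just before and just at the bump, keeping both binaries.
UV_BUMP=replace-with-the-bump-commit-hash
for rev in "$UV_BUMP^" "$UV_BUMP"; do
  git -C node checkout "$rev"
  ( cd node && ./configure && make -j"$(nproc)" )
  cp node/out/Release/node "node-at-$rev"
done

# Benchmark each binary and compare throughput.
./run-benchmark.sh "./node-at-$UV_BUMP^"
./run-benchmark.sh "./node-at-$UV_BUMP"
```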
Thanks for testing, that's very helpful. I'm also very interested in the results for v20.4.0.
That seems likely, but can you check with `perf` which thread is actually consuming the CPU? If that does indeed show it's the sqpoll thread, is it an option for you to test with a v6.x kernel?
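One quick way to check that (a sketch; the process match pattern is a placeholder): io_uring's SQPOLL workers show up as separately named threads, so per-thread CPU accounting makes them easy to spot.

```bash
# Per-thread view of the node process; an io_uring SQPOLL worker shows up
# as "iou-sqp-<pid>" and will dominate %CPU if it is the culprit.
top -H -p "$(pgrep -f 'node .*server.js' | head -n1)"

# Or list all threads whose name marks them as io_uring sqpoll workers.
ps -eLo pid,tid,comm,pcpu | grep iou-sqp
```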
Performance of Node v20.4.0
On our benchmark, the performance of node-v20.4.0 is similar to that of node-v20.3.0, and both are much worse than node-v20.2.0. The libuv upgrade to 1.46.0 doesn't seem to help. Maybe there is another bug?

Perf Data of v20.2.0 and v20.3.0
Comparing the perf data of the two versions shows a clear difference in the v20.3.0 profile (perf output was collected for both Node v20.2.0 and Node v20.3.0).
Can you test with a newer kernel? I think you're looking at an io_uring performance bug in the 5.x kernels that's been fixed in newer kernels. Another thing you may want to check is whether the profile looks different with io_uring disabled (UV_USE_IO_URING=0).
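A sketch of that comparison, assuming the hypothetical benchmark script from earlier, a placeholder server.js entry point, and that the libuv build in use honors UV_USE_IO_URING:

```bash
# Run with io_uring enabled (the default) and profile the server for 60 s
# while the benchmark drives load against it.
node server.js & SERVER=$!
perf record -g -p "$SERVER" -o perf-iouring.data -- sleep 60 & PERF=$!
./run-benchmark.sh
wait "$PERF"; kill "$SERVER"

# Repeat with io_uring disabled.
UV_USE_IO_URING=0 node server.js & SERVER=$!
perf record -g -p "$SERVER" -o perf-no-iouring.data -- sleep 60 & PERF=$!
./run-benchmark.sh
wait "$PERF"; kill "$SERVER"

# Compare where the cycles go in the two profiles.
perf report -i perf-iouring.data
perf report -i perf-no-iouring.data
```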
I have updated the kernel to v6.4.3 and am retesting all the data. It will take a day or two; I will update my data as soon as possible.
After updating the kernel to v6.4.3, here are the newest results.

Worse Performance on benchmark

Inside the docker container - Results of wrk
Here I only tested 1 instance; the results were collected manually by running the server and wrk in docker. With node-v20.2.0 as the baseline (100%), v20.3.0 and v20.4.0 still come in noticeably lower.

Outside the docker container
All results were tested and collected with wrk.

1 instance
The performance of v20.2.0, v20.3.0 and v20.4.0 is about the same. Everything looks normal.

full instance (the number is 128 on our ICX server)
I keep the CPU close to full load and start an additional instance at this point.

Abnormal CPU load

Results of Benchmark
I ran 1 instance outside the docker container. The CPU utilization of node v20.4.0 and node v20.3.0 is similar; both are still about 80% higher than node v20.2.0 on kernel v6.4.3, but the throughput of the three versions is very close.

Results of perf
I will update the results of perf later.
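A sketch of how per-version CPU utilization can be collected for a run like this (pidstat from sysstat; the sampling window and placeholder names are assumptions):

```bash
# Start the server, then sample its CPU usage once per second for 60 s
# while the benchmark runs.
node server.js & SERVER=$!
pidstat -u -p "$SERVER" 1 60 > cpu-node-version.txt &
./run-benchmark.sh
kill "$SERVER"

# The "Average:" line gives %usr/%system/%CPU for this node version;
# repeat for each version under test and compare.
tail -n 3 cpu-node-version.txt
```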
Thanks for the update. For my understanding: what does "full instance" (vis-a-vis "1 instance") mean?
Single/1 instance means running just one Ghost server process. Full instance means starting as many server instances as there are logical cores, which is 128 on our ICX server, so the CPU is close to fully loaded.
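As an illustration of the full-instance setup (a sketch; the entry point, port scheme, and per-instance configuration are assumptions):

```bash
# Start one server instance per logical core, each on its own port,
# which drives the machine close to full CPU load.
NPROC=$(nproc)                          # 128 on the ICX server described above
for i in $(seq 0 $((NPROC - 1))); do
  PORT=$((3000 + i)) node server.js &   # server.js and PORT are placeholders
done
wait
```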
Clear, thanks. Two additional questions if you don't mind:
I ran into this upgrading from 18 to 20.10.0. I have a micro-service situation: about 20 node.js processes that communicate with each other using HTTP, all on the same machine. They also do some reading and writing to disk. Usually very low CPU utilization, also in production.

With Node 20 I started getting HTTP connection errors (timeouts, I suppose) between node processes, so I quickly went back to 18.

I have an automated test suite that I run. On one of my development machines (Intel i5 4250U, Debian 12), the time command gives me roughly 20s user CPU and 3s system CPU time to complete the tests using Node 18. On Node 20, however, I still get about 20s user CPU time, but 40s system time! Yes, more than 10x the system CPU time.

Node 18 - all versions good. macOS/Windows 11 - 20.10.0 slightly faster than 18. So this seems to be Linux specific. Please let me know if I can provide more useful information.
@zo0ok I'm not sure if your situation is similar to mine. As far as I know, the abnormal CPU utilization is related to io_uring when the node version is >= v20.3.0. Referring to this comment #49937 (comment), you can try setting the environment variable UV_USE_IO_URING=0 to disable io_uring and see whether the behavior changes.
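For example, assuming the service is started directly from a shell (app.js is a placeholder):

```bash
# Disable libuv's io_uring backend for this process and its children.
UV_USE_IO_URING=0 node app.js

# Or export it so everything started from this shell inherits it.
export UV_USE_IO_URING=0
```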
@Septa2112, yes, UV_USE_IO_URING=0 helped and Node 20 behaves as expected now. Thank you!
We hit this as well on a 5.10-era kernel; UV_USE_IO_URING=0 fixed it.
Version
v20.3.0/v20.3.1
Platform
Intel ICX Linux 5.15.0-72-generic
Intel SPR Linux 5.15.0-72-generic
Subsystem
No response
What steps will reproduce the bug?
Benchmark - An internal benchmark based on Ghost.js
Test version of node - v20.2.0 and v20.3.0/v20.3.1
OS - Linux
Test Platform - Intel ICX and SPR
We used the above two versions of node to test our benchmark and found that the performance of node v20.3.0 was not as good as that of v20.2.0.
How often does it reproduce? Is there a required condition?
No response
What is the expected behavior? Why is that the expected behavior?
No response
What do you see instead?
The measured value is RPS (requests per second); node-v20.3.0 comes in nearly 40% lower than node-v20.2.0.
After a rollback test, we found that this performance impact is related to the upgrade of libuv to 1.45.0; the change can be traced to #48078.
Additional information
Any thoughts on this issue?