wrk is the bottleneck for plaintext test and json test #5207
Comments
An option we have tried successfully is to use the database server as a second load generator for plaintext and json. It requires synchronization, but it works, and it showed that ulib is more than twice as fast as the current numbers indicate.
@Xudong-Huang See #3538 and #4480. The 10 Gbps network is basically the bottleneck currently.
@zloster Thanks for the info. I just tested in the Azure cloud and noticed that the wrk server is at 100% CPU while the HTTP server is only at about 50%.
In any case, the benchmark is unable to actually benchmark the server implementations. In the plaintext benchmark on physical hardware there does seem to be a hard limit (the network?), which means that the differences between the top candidates are probably down to things like varying response length. In the json benchmark, last I looked, wrk seemed to be the bottleneck.
We are still discussing the load generation bottleneck. We are considering a few different options:
My 2 cents from running into this issue.
@talawahtech The wrk limitation on plaintext is probably coming from the fact that pipelining is handled by a Lua script. In its early versions this was a feature that was handled natively. Earlier this year the script was optimized, and when the switch was made I could see much better performance, though still bottlenecked. Based on that, I think we should try to update wrk to add native pipeline support. However, this won't solve the json scenario. Our current approach in the ASP.NET team is to use two client machines (as mentioned by @bhauer too), which in this case solves every non-database scenario ... except for Ulib, which surprisingly still puts two clients on their knees. As for the simplest solution, I think it's definitely to decrease the number of cores available to the server, as docker supports that out of the box. But I haven't gathered numbers yet.
@sebastienros I may be mistaken, but I am not seeing a version of wrk that has native support for pipelining. The example that they provide in their scripts directory is pretty much the same approach as TechEmpower's. According to the docs, since the work involved in setting up the pipelined request happens in the init() function (which is only called once), the performance impact should be minimal.
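For reference, here is a minimal sketch of that approach, following the pattern of the pipelining example in wrk's scripts directory (the depth of 16 and the use of the default request for the target URL are illustrative, not the exact TechEmpower script):

```lua
-- init() is called once per thread, so the cost of building the pipelined
-- request is paid only at startup; request() just returns the cached bytes.
local req

init = function(args)
  local depth = tonumber(args[1]) or 16   -- pipeline depth passed after "--"
  local parts = {}
  for i = 1, depth do
    parts[i] = wrk.format()               -- default request for the target URL
  end
  req = table.concat(parts)               -- 'depth' requests sent back-to-back
end

request = function()
  return req
end
```

Invoked as something like `wrk -t<threads> -c<connections> -d<duration> -s pipeline.lua <url> -- 16`, so whatever per-request overhead remains comes from wrk itself rather than from the script.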
@sebastienros how do you guys aggregate the results from the two different machines running wrk? If it is something that you can share I would love to take a look.
We have a jobs queue on both machines, and a "driver" app that orchestrates them. Once the two jobs are ready to start, we send the command simultaneously, then aggregate the results from wrk. The obvious issue is that we need to trust that both results were produced in the same time frame, so it's not as precise as having a single instance, but the results were consistent when I tried it on different target frameworks.
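(Not the actual driver, just to illustrate the idea: a rough Lua sketch that starts wrk on two client machines at roughly the same moment over ssh and sums the Requests/sec each reports. The host names, wrk arguments, and target URL are made up.)

```lua
-- Hypothetical hosts and wrk invocation; adjust to the real environment.
local hosts = { "load-gen-1", "load-gen-2" }
local cmd   = "wrk -t16 -c256 -d15s http://10.0.0.10:8080/json"

-- io.popen returns immediately, so both remote runs start close together.
local pipes = {}
for i, host in ipairs(hosts) do
  pipes[i] = io.popen(("ssh %s %q"):format(host, cmd))
end

-- Reading blocks until each wrk run finishes; parse and sum Requests/sec.
local total = 0
for _, p in ipairs(pipes) do
  local out = p:read("*a")
  p:close()
  total = total + (tonumber(out:match("Requests/sec:%s*([%d%.]+)")) or 0)
end

print(("Aggregate Requests/sec: %.2f"):format(total))
```

The caveat above still applies: the summed number is only meaningful if both runs really cover the same time window.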
There are several tools on Linux that provide this functionality.
One important point is that the node creating the load (currently wrk) should have more resources than the server. Lowering the number of cores should work, but care should be taken to choose the same cores each run, and perhaps to avoid sharing the same physical CPU between logical CPUs (hyperthreads), and so forth. Multiple worker nodes make a lot of sense. To avoid overlap timing issues, one could add a margin of one second or so when starting and stopping, and exclude that margin from the result. I measured libreactor a while back and needed two load-generating nodes with the same spec as the server to saturate it, so 2:1 makes sense.
FYI Fredrik also created a high-performance benchmarking tool called pounce. I've only done some preliminary testing, but I saw a 20-25% improvement compared to wrk.
I would recommend using rewrk for other reasons as well --
As we can see, the top plaintext and json results are so close that wrk has become the bottleneck for those tests.
I think we should saturate the server, not the client, so that we can measure the server's real capability.
To achieve that, should we give more CPU cores to wrk?
Another solution could be to simply run wrk and the server on the same host, so that the whole system is kept busy.