
Use multiple cores by the API #224

Closed
alexey-yarmosh opened this issue Nov 7, 2022 · 14 comments · Fixed by #322

@alexey-yarmosh
Member

Currently we have a 4-core CPU.
Node effectively uses only one of them.
We should consider options to load the idle cores: cluster, worker_threads, etc.

[screenshot: server CPU usage]

@patrykcieszkowski
Contributor

I'm not seeing the core usage on this screenshot, but generally speaking there's no benefit to using worker_threads for I/O operations, and that's 90% (if not more) of what this API service is doing. I would consider spawning new threads for CPU-intensive work, like hashing, encryption, or parsing, and leaving I/O operations to the main event loop.
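A minimal sketch of that split, assuming ESM and a hypothetical CPU-bound SHA-256 task (the file names and payload are illustrative, not from this codebase); in practice a worker pool such as piscina would avoid paying thread startup costs per task:

```js
// main.js - I/O stays on the main event loop; CPU-bound work is offloaded.
import { Worker } from 'node:worker_threads';

function hashInWorker (payload) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(new URL('hash-worker.js', import.meta.url), {
      workerData: payload,
    });
    worker.once('message', resolve);
    worker.once('error', reject);
  });
}

// hash-worker.js - runs on a separate thread, so a slow hash
// doesn't block the event loop that serves I/O.
import { parentPort, workerData } from 'node:worker_threads';
import { createHash } from 'node:crypto';

parentPort.postMessage(createHash('sha256').update(workerData).digest('hex'));
```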

@alexey-yarmosh
Member Author

worker_threads is only one of the ideas to consider. The key here is running 4 API instances on a 4-core machine to increase I/O throughput, and that most likely should be done either by Node's cluster module or, at a higher level, by Docker Swarm.
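For reference, a minimal sketch of the cluster variant, assuming the app's bootstrap lives in a hypothetical ./server.js (the cluster module shares the listening socket and distributes incoming connections across workers):

```js
import cluster from 'node:cluster';
import { cpus } from 'node:os';

if (cluster.isPrimary) {
  // Fork one worker per core; the primary only supervises.
  for (let i = 0; i < cpus().length; i++) {
    cluster.fork();
  }

  cluster.on('exit', (worker) => {
    console.log(`worker ${worker.process.pid} died, forking a replacement`);
    cluster.fork();
  });
} else {
  // Each worker runs a full API instance.
  await import('./server.js'); // hypothetical bootstrap module
}
```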

@jimaek
Member

jimaek commented Nov 10, 2022

> high level - by docker swarm.

Not possible, we use host networking, meaning only one container can run on a VM at any moment. It would also probably increase pub/sub traffic and make syncing of probes more difficult. So ideally we want to move only specific operations to other threads, not run the whole API per thread.

@alexey-yarmosh
Member Author

We don't have intensive sync work to move off the main thread. So when 2 VMs are no longer enough, we should choose between adding new VMs and running an instance per thread.

@MartinKolarik
Member

MartinKolarik commented Nov 10, 2022

cluster seems like the right thing to do; we just need to check how the messages are routed. We want them always delivered to only one client (app instance), not all clients.

@alexey-yarmosh
Member Author

alexey-yarmosh commented Mar 9, 2023

The key thing here is to set up the socket.io communication. Currently the only way to implement it is by using @socket.io/redis-adapter in every worker. While that works correctly, it means Redis pub/sub traffic will increase dramatically:
currently - 1 fetchSockets call with 2 instances of 1 worker each generates 2 Redis messages;
with 4 workers - 1 fetchSockets call with 2 instances of 4 workers each generates 8 Redis messages.

As Redis is our current bottleneck, I am not sure we should proceed with the solution described above, at least until #269 is implemented.

I've posted a question about combining @socket.io/redis-adapter with @socket.io/cluster-adapter; maybe that is doable. More details can be found there: socketio/socket.io#4659
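For context, the per-worker setup in question follows the standard @socket.io/redis-adapter pattern; a minimal sketch (the Redis URL is a placeholder):

```js
import { createServer } from 'node:http';
import { Server } from 'socket.io';
import { createClient } from 'redis';
import { createAdapter } from '@socket.io/redis-adapter';

const pubClient = createClient({ url: 'redis://localhost:6379' }); // placeholder URL
const subClient = pubClient.duplicate();
await Promise.all([pubClient.connect(), subClient.connect()]);

const io = new Server(createServer());
io.adapter(createAdapter(pubClient, subClient));

// With N adapter instances subscribed, each fetchSockets() call fans out
// over Redis pub/sub to all of them - which is why going from 1 to 4
// workers per instance multiplies the message count.
const sockets = await io.fetchSockets();
```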

@alexey-yarmosh
Member Author

Linked issue:
socketio/socket.io-cluster-adapter#1

@MartinKolarik
Member

The fetchSockets() calls are cached now, aren't they? The effect should not be that significant.

@alexey-yarmosh
Member Author

That is true, it is cached. And I don't think the effect will be that big.

But anyway, our servers usually run at ~15% load even under perf tests, while Redis barely handles the load. So we shouldn't make the bottleneck even narrower just to tune the API servers.

@alexey-yarmosh
Member Author

alexey-yarmosh commented Mar 21, 2023

So what we can see in the image is that currently our bottleneck is the performance of the Node process itself (Redis CPU is ~3%, while API CPU is ~100%).
[screenshot: API vs. Redis CPU usage]

From the flamegraph we can see that there are some sync operations done mostly by npm packages:

  • socket.io request/response encoding/decoding
  • JSON.stringify-ing of Redis data before sending it to the DB
  • HTTP validation
  • a lot of small newrelic operations (not sure what they are)

[screenshot: flamegraph]

We can't get rid of most of these operations, but we should benefit from doing them in parallel, so applying cluster to the API seems to make sense now.

Other possible optimizations:

  • the IPv4 check is worth caching (it is the heaviest part of the HTTP validation) - see the sketch after this list
  • Redis operations are already pipelined (there is an internal queue, and its size grows and shrinks as we create new redis.json.set promises). I'll additionally check that it is used in all cases.
  • maybe we can benefit from Redis Lua scripts to move some set logic from Node to Redis.
  • socket.io perf doesn't seem to be a problem (the API's CPU is at 100% while there are ~4000 RPS going through socket.io; even old benchmarks showed that socket.io can deal with ~10000 RPS).
  • there are a lot of newrelic operations; as I remember, the API itself sends logs to NR, so that can probably be changed, but seemingly not much more than that.
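A sketch of the caching idea from the first bullet, using net.isIPv4 as a stand-in for the real (heavier) validation; the wrapper name and cache bound are illustrative:

```js
import { isIPv4 } from 'node:net';

const cache = new Map();
const MAX_ENTRIES = 10_000; // illustrative bound

function isIpV4Cached (value) {
  if (cache.has(value)) {
    return cache.get(value);
  }

  const result = isIPv4(value); // stand-in for the heavier validation
  if (cache.size >= MAX_ENTRIES) {
    // Evict the oldest entry; Map iterates in insertion order.
    cache.delete(cache.keys().next().value);
  }

  cache.set(value, result);
  return result;
}
```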

P.S.
Redis load generated by mtr (5 RPS * 4 seconds * 500 probes):
5 req * 4 seconds * (3 + 3 + 1 + 1 + 1 + 500 * (1 + 2 + 4 * 2 + 1 + 1 + 2) + 4) = 150260 Redis writes
150260 / 20 seconds of measurement time = ~7500 Redis operations per second.

Socket.io load generated by mtr (5 RPS * 4 seconds * 500 probes):
5 req * 4 seconds * 500 probes * (1 request + 1 ack + 4 progress + 1 result) = 70000 socket.io messages
70000 / 20 seconds of measurement time = 3500 socket.io messages per second.

@MartinKolarik
Member

Can you explain the 500 * (1 + 2 + 4 * 2 + 1 + 1 + 2) part? Does each probe make 15 writes per second per test (since this is multiplied by requests and time in the beginning)?

@alexey-yarmosh
Member Author

alexey-yarmosh commented Mar 21, 2023

Not 15 writes per second, just 15 writes per probe per measurement. So it is ~150000 writes overall, and since such measurements usually take ~20 seconds to finish, it is 150000 / 20 = 7500 writes per second.

#240 (comment)

@MartinKolarik
Member

Oh ok, so the first part "5 req * 4 seconds" means 5 RPS for 4 seconds = 20 measurements overall, right?

@alexey-yarmosh
Member Author

Yeah, that is the case when CPU gets to 100% after a few mtr requests.
