Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High number of redis commands per second #240

Closed
alexey-yarmosh opened this issue Nov 21, 2022 · 4 comments
Closed

High number of redis commands per second #240

alexey-yarmosh opened this issue Nov 21, 2022 · 4 comments
Assignees
Labels
info Useful information inside performance

Comments

@alexey-yarmosh
Copy link
Member

alexey-yarmosh commented Nov 21, 2022

New Relic redis agent shows too many "commands per second":

E.g. 100 seconds, 1 RPS, 100 probes. Based on code analysis it should be:

Total commands:
100 seconds * 1 RPS * (2 rate limit + 3 create measurement + 100 probes * (3 geoip + 2 ack + 3 * 2 progress + 3 result + 1 if finished + 1 measurement time) + 3 mark finished) = ~160800
Commands per second:
160800 / 100 = ~1608

At the same time redis agent shows ~4000 commands per second:
image

At the same time redis is also used as adapter for socket.io and that should add to the final CPS number, but anyway it is too big.

We need to make sure that there are no unnecessary commands we are sending to redis as this is our main bottlneck.

@alexey-yarmosh
Copy link
Member Author

The reason for such a big number of redis commands is high number of probe:measurement:progress and subsequent redis writes. Redis profiler shows that a single ping request to 1 probe tooks 17 progress messages before final result message. Every line was sent through websockets and written to redis separately.
Profiler log:

1671545286.835788 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"PING google.com (216.58.215.174) 56(84) bytes of data.\\\\n\""
1671545286.859462 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=1 ttl=122 time=1.73 ms\\\\n\""
1671545287.038700 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=2 ttl=122 time=1.01 ms\\\\n\""
1671545287.239136 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=3 ttl=122 time=0.977 ms\\\\n\""
1671545287.440352 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=4 ttl=122 time=0.909 ms\\\\n\""
1671545287.641530 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=5 ttl=122 time=0.938 ms\\\\n\""
1671545287.841793 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=6 ttl=122 time=1.01 ms\\\\n\""
1671545288.042048 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=7 ttl=122 time=0.940 ms\\\\n\""
1671545288.243631 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=8 ttl=122 time=0.925 ms\\\\n\""
1671545288.444014 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=9 ttl=122 time=1.04 ms\\\\n\""
1671545288.645291 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=10 ttl=122 time=0.893 ms\\\\n\""
1671545288.846174 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=11 ttl=122 time=0.939 ms\\\\n\""
1671545289.048380 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=12 ttl=122 time=0.953 ms\\\\n\""
1671545289.248041 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=13 ttl=122 time=1.05 ms\\\\n\""
1671545289.448929 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=14 ttl=122 time=1.02 ms\\\\n\""
1671545289.649714 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=15 ttl=122 time=0.984 ms\\\\n\""
1671545289.851647 "JSON.STRAPPEND" "gp:measurement:CcwvqVtL8Auc9Ybe" "$.results.qNrO77Aps6IrYgTf.result.rawOutput" "\"64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=16 ttl=122 time=0.945 ms\\\\n\\\\n--- google.com ping statistics ---\\\\n16 packets transmitted, 16 received, 0% packet loss, time 3012ms\\\\nrtt min/avg/max/mdev = 0.893/1.016/1.734/0.190 ms\\\\n\""

Final message:

PING google.com (216.58.215.174) 56(84) bytes of data.
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=1 ttl=122 time=1.73 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=2 ttl=122 time=1.01 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=3 ttl=122 time=0.977 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=4 ttl=122 time=0.909 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=5 ttl=122 time=0.938 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=6 ttl=122 time=1.01 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=7 ttl=122 time=0.940 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=8 ttl=122 time=0.925 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=9 ttl=122 time=1.04 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=10 ttl=122 time=0.893 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=11 ttl=122 time=0.939 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=12 ttl=122 time=0.953 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=13 ttl=122 time=1.05 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=14 ttl=122 time=1.02 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=15 ttl=122 time=0.984 ms
64 bytes from mad41s07-in-f14.1e100.net (216.58.215.174): icmp_seq=16 ttl=122 time=0.945 ms

--- google.com ping statistics ---
16 packets transmitted, 16 received, 0% packet loss, time 3012ms
rtt min/avg/max/mdev = 0.893/1.016/1.734/0.190 ms

Based on that, suggestion from @MartinKolarik seems extra valid:

We can, if it helps, throttle the partial updates a bit using the following logic: send the first part of output right away, then an update at most once a second, and then the final result as soon as the measurement is fully done.

We should agree on the throttle parameters and apply that.

@alexey-yarmosh
Copy link
Member Author

Other redis commands are valid and synced with the logic of the service.

@jimaek
Copy link
Member

jimaek commented Dec 20, 2022

There's a task open for that already #225

Please close either this one or that one and let's proceed with the idea.

To me it's important for the output to remain feeling real-time. So it needs some local tests to see what would feel as lag or delay to a human

@alexey-yarmosh
Copy link
Member Author

Saving valid N of operations calculation here. 100 seconds * 1 RPS * 10 probes * 17 progress messages:
100 seconds * 1 RPS * (3 rate limit + 3 fetch probes + 1 create probes awaiting + 1 create measurement + 1 expire measurement + 10 probes * (1 measurement request + 2 ack + 17 progress messages * 2 progress writes + 1 probes awaititng + 1 decr + 2 result) + 4 mark finished)

=> 100 * 1 * (3 + 3 + 1 + 1 + 1 + 10 * (1 + 2 + 17 * 2 + 1 + 1 + 2) + 4) = 42300

@alexey-yarmosh alexey-yarmosh added the info Useful information inside label Mar 22, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
info Useful information inside performance
Projects
None yet
Development

No branches or pull requests

2 participants