
Dead lettering en masse can overload the DLQ #5312

Closed
illotum opened this issue Jul 21, 2022 · 38 comments · Fixed by #6467
Comments

@illotum

illotum commented Jul 21, 2022

We observe a potential bug in our brokers.

A classic mirrored DLQ replica, at a certain load, flips into CPU and memory churn and cannot recover. This was first reported here and later confirmed on multiple RabbitMQ versions.

With a sufficiently large data set the instance is OOM-killed, but even with smaller sets I had the broker crossing into the red and back for a long time after the test.

[Screenshot: Screen Shot 2022-07-20 at 7.58.47 PM]

Reproduction

Environment

> cat ../rmq1.conf
listeners.tcp.default = 127.0.0.1:5671
management.tcp.port = 15671
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rmq2@3c22fb6527ee

> cat ../rmq2.conf
listeners.tcp.default = 127.0.0.1:5672
management.tcp.port = 15672
cluster_formation.peer_discovery_backend = classic_config
cluster_formation.classic_config.nodes.1 = rmq1@3c22fb6527ee

> echo $RABBITMQ_ENABLED_PLUGINS
rabbitmq_management

> alias brun "bazel run --config=local"

Two brokers

brun broker RABBITMQ_CONFIG_FILE=(realpath ../rmq1.conf) RABBITMQ_NODENAME=rmq1 RABBITMQ_DIST_PORT=25671
brun broker RABBITMQ_CONFIG_FILE=(realpath ../rmq2.conf) RABBITMQ_NODENAME=rmq2 RABBITMQ_DIST_PORT=25672
brun -- rabbitmqctl -n rmq1@3c22fb6527ee set_policy ha "." '{"ha-mode":"all","ha-sync-mode":"automatic","queue-mode":"lazy"}' --apply-to queues

Long message backlog nacked between queues

./bin/runjava com.rabbitmq.perf.PerfTest -u "input" -h amqp://localhost:5671 --auto-delete false -f persistent --queue-args x-dead-letter-exchange=,x-dead-letter-routing-key=dlq -x 4 -y 0 -s 1000 -pmessages 250000

# "dlq" created beforehand, say in web console
./bin/runjava com.rabbitmq.perf.PerfTest -u "input" -h amqp://localhost:5671 --auto-delete false -f persistent --queue-args x-dead-letter-exchange=,x-dead-letter-routing-key=dlq -x 0 -y 10 -na -re false

Observations

The slave replica's memory footprint stays abnormally high, oscillating by +/- 2 GB, for many hours (so far up to 4) after the test. There is a significant uptick in CPU as well.
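
A breakdown like the one below can be captured with rabbitmq-diagnostics on the affected node; the node name here is an assumption taken from the reproduction environment above:

```
# Presumed invocation for the memory breakdown below; adjust the node name as needed.
rabbitmq-diagnostics -n rmq2@3c22fb6527ee memory_breakdown --unit gb
```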

queue_slave_procs: 13.3358 gb (62.38%)
binary: 6.2803 gb (29.37%)
allocated_unused: 1.6457 gb (7.7%)
other_proc: 0.0529 gb (0.25%)
code: 0.0348 gb (0.16%)
other_system: 0.0195 gb (0.09%)
other_ets: 0.0037 gb (0.02%)
plugins: 0.0031 gb (0.01%)
atom: 0.0014 gb (0.01%)
metrics: 0.0014 gb (0.01%)
mgmt_db: 0.001 gb (0.0%)
connection_other: 0.0001 gb (0.0%)
mnesia: 0.0001 gb (0.0%)
msg_index: 0.0 gb (0.0%)
quorum_ets: 0.0 gb (0.0%)
queue_procs: 0.0 gb (0.0%)
quorum_queue_dlx_procs: 0.0 gb (0.0%)
stream_queue_procs: 0.0 gb (0.0%)
stream_queue_replica_reader_procs: 0.0 gb (0.0%)
connection_readers: 0.0 gb (0.0%)
connection_writers: 0.0 gb (0.0%)
connection_channels: 0.0 gb (0.0%)
quorum_queue_procs: 0.0 gb (0.0%)
stream_queue_coordinator_procs: 0.0 gb (0.0%)
reserved_unallocated: 0.0 gb (0.0%)

From the couple of hours I spent on this:

  1. Confirmed with as few as 1M messages. On my laptop, 3M is enough to trigger the memory alarm.
  2. Confirmed for 1-20 consumers.
  3. Confirmed for messages as small as 1 byte.
  4. The broker recovers after 1-2 minutes of churn:
    • at 250k messages
    • with non-lazy queues
    • after deleting "dlq"
@illotum
Author

illotum commented Jul 21, 2022

And a note from @thuandb (the queue depth mismatch is due to a different environment):

It is confirmed that IO is the bottleneck causing high memory utilization during queue sync.
Using gp3 with maximum throughput (1000MB/s) helps with large messages.
Message size: 640k bytes
Queue depth: 115k messages
Sync batch size: 256.
Memory only slightly increases at the mirrors.

@lukebakken
Collaborator

This may or may not be related; the symptoms here remind me of this message: https://groups.google.com/g/rabbitmq-users/c/-_J_K4T7yTI/

@michaelklishin
Member

What RabbitMQ version was used?

@illotum
Author

illotum commented Jul 21, 2022

My test was run from a somewhat stale master, at e2edb1b. In prod this was observed on 3.8.22 and 3.8.27.

@illotum
Author

illotum commented Jul 21, 2022

Updated the description to remove the mention of allocs: I haven't actually looked at allocations yet. Memory may fluctuate because of paging.

@lukebakken
Collaborator

lukebakken commented Jul 23, 2022

I have put together a reproduction project here:

https://github.com/lukebakken/rabbitmq-server-5312

It uses RMQ 3.10.6 and the latest PerfTest.

I can reproduce the issue as described.

Workarounds that address the issue:

@michaelklishin
Member

Using CQv2 or quorum queues solves the problem => closing.
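
For reference, a minimal sketch of switching the reproduction's queues to CQv2 via policy; the queue-version policy key is assumed here, and since only one policy applies per queue, it is folded into the "ha" policy from the reproduction steps:

```
# Hypothetical: replace the "ha" policy from the reproduction steps so that the
# matching classic queues run as version 2 (CQv2).
rabbitmqctl -n rmq1@3c22fb6527ee set_policy ha "." '{"ha-mode":"all","ha-sync-mode":"automatic","queue-version":2}' --apply-to queues
```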

@johanrhodin
Contributor

We still have this issue with v2 classic queues. Will try to reproduce next week.

@lukebakken
Collaborator

Putting this here since the issue seems to be the same - https://groups.google.com/g/rabbitmq-users/c/Kr2QZx1TREA

@illotum
Author

illotum commented Sep 7, 2022

So in total, two reports of this still happening with CMQv2? @johanrhodin, looking forward to the reproduction steps as well.

@lukebakken lukebakken self-assigned this Sep 7, 2022
@lukebakken lukebakken reopened this Sep 7, 2022
@danielllek

I've also found another workaround: switching queue-mode from normal to lazy for the queues that were causing problems solved the issue for me.
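
A minimal sketch of that workaround applied via policy, without redeclaring the queues (the policy name and queue pattern below are placeholders):

```
# Hypothetical: switch the matching queues to lazy mode at run time.
# Only one policy applies per queue, so an existing policy may need to gain
# the "queue-mode" key instead of defining a new one.
rabbitmqctl set_policy lazy-queues "^problematic\." '{"queue-mode":"lazy"}' --apply-to queues
```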

@lukebakken
Collaborator

lukebakken commented Oct 20, 2022

FWIW, I just revisited this using the latest main code for RabbitMQ and PerfTest. It turns out I can now reproduce this using mirrored classic v2 queues:

[screenshot]

As expected quorum queues do not show this issue (using the latest main for RabbitMQ and PerfTest as well).

cc @michaelklishin any reason to keep this open at this point?

@michaelklishin
Member

We can leave it open if some CMQ user would like to investigate and contribute a fix.

What immediately stands out on the screenshot is the gen_server2 buffer size for that CMQ mirror replica.

@mkuratczyk
Contributor

I believe the problem comes from the way at-most-once dead lettering works. For quorum queues, we added an at-least-once strategy that uses publisher confirms internally, but classic queues don't support that. When a lot of messages are dead-lettered, we are effectively publishing to a mirrored queue without any flow-control mechanism, which overloads the DLQ process and causes a memory spike.

Assuming that's true, we would need classic queues to support at-least-once dead-lettering, which is something CQv2 should probably support at some point; but whether the same implementation would work for CQv1, and how much work would be needed to support it with mirroring, I'm not sure.

Alternatively, perhaps it's possible to apply back-pressure using some other mechanism to prevent the DLQ from getting overloaded.

Feel free to have a look at the QQ at-least-once support PR for inspiration:
#3100
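
For illustration, a hedged sketch of what the quorum-queue at-least-once setup looks like when plugged into this issue's PerfTest reproduction; the queue arguments are assumptions, and at-least-once dead-lettering requires the source queue to use reject-publish overflow:

```
# Hypothetical variant of the repro with a quorum input queue and at-least-once
# dead-lettering; "dlq" is still created beforehand, as in the original steps.
./bin/runjava com.rabbitmq.perf.PerfTest -u "input" -h amqp://localhost:5671 \
  --auto-delete false -f persistent \
  --queue-args x-queue-type=quorum,x-overflow=reject-publish,x-dead-letter-strategy=at-least-once,x-dead-letter-exchange=,x-dead-letter-routing-key=dlq \
  -x 4 -y 0 -s 1000 -C 250000
```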

@lukebakken
Collaborator

@michaelklishin @mkuratczyk interestingly enough, the gen_server2 internal priority queue (gen_server2 buffer in the top UI) does eventually drain, restoring memory usage to normal levels.

@mkuratczyk
Contributor

mkuratczyk commented Oct 21, 2022

Sorry, I think I jumped to a conclusion here. The process that takes so much memory is the input queue, not the dlq. Given that, my theory that we are overloading the dlq by publishing to it without any flow control is wrong. Which doesn't mean the current naive CQ dead-lettering process isn't the culprit, but it's definitely not what I thought it was.

Scratch that. I must have run the test incorrectly (I guess I was still publishing to the input queue). Now I see the dlq queue using lots of memory, which is what I'd expect.

mkuratczyk added a commit that referenced this issue Oct 24, 2022
This doubles the publish rate using the repro steps from
#5312
and somewhat reduces the memory usage.

```
rabbitmqctl -n rabbit-0 set_policy --apply-to queues ha "cmq.*" '{"ha-mode":"exactly", "ha-params": 3, "ha-sync-mode": "automatic"}'

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=1 -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v1
BEFORE: id: test-164723-973, sending rate avg: 6156 msg/s
AFTER: id: test-165708-401, sending rate avg: 11572 msg/s

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=1,x-queue-mode=lazy -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v1lazy
BEFORE: id: test-165020-228, sending rate avg: 6239 msg/s
AFTER: id: test-165848-489, sending rate avg: 11758 msg/s

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=2 -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v2
BEFORE: id: test-165314-597, sending rate avg: 6818 msg/s
AFTER: id: test-170027-308, sending rate avg: 13891 msg/s
```
mkuratczyk added a commit that referenced this issue Oct 25, 2022
This seems to be a large contributor to memory spikes observed in
#5312.
Initial testing shows significantly lower memory usage with the issue 5312 workload.
@mkuratczyk
Contributor

@illotum @johanrhodin could you please test the cmq-optimisations branch, also available as an OCI image: pivotalrabbitmq/rabbitmq:cmq-optimisations-otp-max-bazel?

For me it roughly doubles the publish rate during the first phase of the test and, I assume, when messages are published internally to the dlq. Thanks to this, it also seems to lower the memory usage by at least half. In my system, that's the difference between an OOM kill and smooth operation, albeit still with relatively high memory usage. Is such an improvement sufficient?

Note: when using CQv2, it's also useful to increase dist_buf_busy_limit. CQv2 is so much faster that it can easily overload the distribution channel, and then we get a "hiccup" during the consumption phase.
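
For reference, a sketch of raising that limit; dist_buf_busy_limit maps to the distribution buffer size environment variable (in kilobytes), and the value below is an arbitrary example:

```
# Hypothetical: raise the inter-node distribution buffer from the default
# 128000 kB to roughly 1 GB before starting the node.
RABBITMQ_DISTRIBUTION_BUFFER_SIZE=1024000 rabbitmq-server
```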

Tested with CQv1, CQv1 lazy, and CQv2. Each of these hit a memory alarm/OOM kill using the main branch, while the results with this branch are better (although, again, the memory used is still high):
[Screenshot 2022-10-25 at 15.59.11]

@mkuratczyk mkuratczyk changed the title from "Churn in classic mirrored DLQ" to "Dead lettering en masse can overload the DLQ" Oct 26, 2022
@mkuratczyk
Contributor

mkuratczyk commented Oct 26, 2022

One more thing worth sharing: ultimately the problem is that we don't use publisher confirms when internally publishing to classic queues (which is what happens during dead-lettering). I came up with a similar scenario without mirroring: put the input queue on one node and the dlq on another node and follow the same steps, just without the mirroring policy. Since mirroring makes this even worse (it is responsible for part of the memory usage), you will need to increase the number/size of the messages, but there are combinations of values that will lead to the dead-letter queue being overloaded even without mirroring.
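
A rough sketch of that cross-node variant, reusing the two-broker environment from the reproduction steps; the message count is an arbitrary guess at "increase the number/size of the messages", and "dlq" is assumed to be created beforehand against the second node (e.g. via its management UI on port 15672):

```
# Hypothetical non-mirrored variant: no ha policy is set. With client-local
# queue placement, "input" lives on the node behind port 5671 while "dlq"
# (declared against the other node) lives behind port 5672, so dead-lettering
# publishes across nodes.
./bin/runjava com.rabbitmq.perf.PerfTest -u "input" -h amqp://localhost:5671 \
  --auto-delete false -f persistent \
  --queue-args x-queue-master-locator=client-local,x-dead-letter-exchange=,x-dead-letter-routing-key=dlq \
  -x 4 -y 0 -s 1000 -C 1000000

# Then drain with nacks, as in the original reproduction:
./bin/runjava com.rabbitmq.perf.PerfTest -u "input" -h amqp://localhost:5671 \
  --auto-delete false -f persistent \
  --queue-args x-queue-master-locator=client-local,x-dead-letter-exchange=,x-dead-letter-routing-key=dlq \
  -x 0 -y 10 -na -re false
```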

I think all of this just proves what we knew: we need publisher confirms / flow control for internal publishing/dead-lettering to classic queues, just like we now have for quorum queues.

@illotum
Author

illotum commented Oct 26, 2022

@mkuratczyk your branch shows significant improvement across all my tests. I can still see the spike, but it's much harder to cross into alarm now.

@mkuratczyk
Contributor

I have made one more commit to the branch to bring back some parts I removed previously, as the change showed a performance regression with larger (non-embedded) messages. I will continue testing next week, but I'd also appreciate another round of testing on your side, especially with workloads that differ from the reproduction steps of this issue. The memory usage reduction in this scenario is significant and consistent, but the change affects mirroring in general, so I want to be careful with it, as lots of users may be affected.

@illotum
Author

illotum commented Nov 2, 2022

Not too rigorous, but I've been running various sync scenarios over the last couple of days: cluster member loss, DLQ, lazy and not, various message sizes. Tests show a similar or better profile in all cases (the difference is often insignificant); so far I see no degradation.

@illotum
Author

illotum commented Nov 22, 2022

@mkuratczyk is there anything else I can help with regarding your performance branch?

@michaelklishin
Member

@illotum the core team will discuss what's ahead for that branch. Thank you for the details and for not letting this slip.

@mkuratczyk
Contributor

@illotum What versions would you expect this to be back-ported to? While we didn't find any regressions, we know RabbitMQ is used in so many ways/configurations that the change could have a negative impact on some of them, so we are wary of back-porting this and releasing it in a patch release (especially for "older" versions like 3.9, where very few changes happen at this point).

@illotum
Author

illotum commented Nov 23, 2022

I'd be happy with 3.10, and fine with 3.11. We are only now rolling out 3.11, so this would be a good release to pick first.

@michaelklishin
Member

3.11 is what the core team had in mind.

@danielllek

If it's possible for 3.10, I'd be very happy :)

@mkuratczyk
Contributor

@illotum My suggestion would be to ship it in 3.11 for now. Start using it, keep an eye on these instances and then, if you see the benefits and don't see regressions, we could consider backporting to 3.10.

mkuratczyk added a commit that referenced this issue Nov 24, 2022
This doubles the publish rate using the repro steps from
#5312
and somewhat reduces the memory usage.

```
rabbitmqctl -n rabbit-0 set_policy --apply-to queues ha "cmq.*" '{"ha-mode":"exactly", "ha-params": 3, "ha-sync-mode": "automatic"}'

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=1 -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v1
BEFORE: id: test-164723-973, sending rate avg: 6156 msg/s
AFTER: id: test-165708-401, sending rate avg: 11572 msg/s

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=1,x-queue-mode=lazy -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v1lazy
BEFORE: id: test-165020-228, sending rate avg: 6239 msg/s
AFTER: id: test-165848-489, sending rate avg: 11758 msg/s

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=2 -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v2
BEFORE: id: test-165314-597, sending rate avg: 6818 msg/s
AFTER: id: test-170027-308, sending rate avg: 13891 msg/s
```
mkuratczyk added a commit that referenced this issue Nov 24, 2022
This seems to be a large contributor to memory spikes observed in
#5312.
Initial testing shows significantly lower memory usage with the issue 5312 workload.
@mkuratczyk
Contributor

I've opened a PR and shared some results of today's tests #6467

mergify bot pushed a commit that referenced this issue Nov 25, 2022
This doubles the publish rate using the repro steps from
#5312
and somewhat reduces the memory usage.

```
rabbitmqctl -n rabbit-0 set_policy --apply-to queues ha "cmq.*" '{"ha-mode":"exactly", "ha-params": 3, "ha-sync-mode": "automatic"}'

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=1 -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v1
BEFORE: id: test-164723-973, sending rate avg: 6156 msg/s
AFTER: id: test-165708-401, sending rate avg: 11572 msg/s

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=1,x-queue-mode=lazy -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v1lazy
BEFORE: id: test-165020-228, sending rate avg: 6239 msg/s
AFTER: id: test-165848-489, sending rate avg: 11758 msg/s

$ perf-test -ad false -f persistent -qa x-dead-letter-exchange=,x-dead-letter-routing-key=cmq-dlq,x-queue-master-locator=client-local,x-queue-version=2 -x 4 -y 0 -s 1000 -C 250000 -u cmq-input-v2
BEFORE: id: test-165314-597, sending rate avg: 6818 msg/s
AFTER: id: test-170027-308, sending rate avg: 13891 msg/s
```

(cherry picked from commit b5226c1)
mergify bot pushed a commit that referenced this issue Nov 25, 2022
This seems to be a large contributor to memory spikes observed in
#5312.
Initial testing shows significantly lower memory usage with the issue 5312 workload.

(cherry picked from commit d7d129c)
@michaelklishin michaelklishin added this to the 3.11.4 milestone Nov 25, 2022
@SimonUnge
Member

@illotum My suggestion would be to ship it in 3.11 for now. Start using it, keep an eye on these instances and then, if you see the benefits and don't see regressions, we could consider backporting to 3.10.

@mkuratczyk I would like to revisit the idea of back-porting this to the 3.10 branch! As for getting feedback from us (AWS) on regressions (and benefits), that feedback will be easier to provide if this lands in the 3.10 branch, since we currently have much larger adoption of 3.10 in our fleet of brokers.

Is there anything in particular you require in terms of feedback from us in order to backport to 3.10 sooner rather than later?

@mkuratczyk
Contributor

I understand you have more deployments on 3.10, but do you also already have 3.11.4+ running and behaving well?

@SimonUnge
Member

@mkuratczyk We actually do not have any brokers on 3.11 yet, as we currently only support 3.10.*, hence the difficulty collecting data. We will go live with 3.11 early next year, but as of now, if we want live data, it will need to come from a 3.10 version. We have not seen any issues in our local testing, but we cannot collect live data.

Hence the wish for a 3.10 backport, so we can get feedback more quickly and, of course, address some of the issues we see on our brokers.

@SimonUnge
Member

Hi, just wanted to give this issue a bump in the hope of getting it back-ported to 3.10. In our fleet we weekly encounter issues that we think will be resolved by this fix, and we would be able to provide feedback quickly, but it would need to be back-ported to a 3.10 release before we can do that. So I hope a 3.10 backport before real-world data feedback is available would be an option?

We still have seen no regression issues with 3.11, but that is based only on local testing.
@mkuratczyk, what would be a good, convincing argument from us to get this backported to 3.10 sooner rather than later? :)

@michaelklishin
Member

#6467 cherry-picks cleanly to v3.10.x. The changes seem node-local, so I don't find them risky.

@michaelklishin
Member

Another observation: #6467 relies on sets v2, which were introduced in Erlang/OTP 24. RabbitMQ 3.10.x already requires 24.x, and later this year it will require 25.

@michaelklishin
Member

#6867

@michaelklishin
Member

Shipped in 3.10.14.

michaelklishin added a commit that referenced this issue Jan 17, 2023
michaelklishin added a commit that referenced this issue Jan 17, 2023
michaelklishin added a commit that referenced this issue Jan 17, 2023
(cherry picked from commit 4419707)
michaelklishin added a commit that referenced this issue Jan 17, 2023
(cherry picked from commit e3a0304)
michaelklishin added a commit that referenced this issue Jan 17, 2023
(cherry picked from commit 4419707)
@SimonUnge
Member

Shipped in 3.10.14.

Awesome, thank you!
