Connection churn and disk usage of Raft WAL segments #6447

Closed
kjnilsson opened this issue Nov 23, 2022 Discussed in #5258 · 1 comment · Fixed by #6468

@kjnilsson
Contributor

Discussed in #5258

Originally posted by dhxgit July 18, 2022
Hi,

I have found a reproducible issue that affects RabbitMQ versions up to and including 3.10.6.

What happens:

  • Every connection attempt seems to be stored/logged in the quorum segment files
    • The required storage keeps growing until the disk is full
      • (The queue's segment files get cleaned up if a message flows through the queue)

Reproduction Setup: ( see https://github.com/dhxgit/rmq-storage-issue )

  • We start the official RabbitMQ container (see docker-compose.yaml)
  • We start a Python test container (also in my repo) which runs many workers, each of which, in a loop:
    • connects to a queue with basic_consume
    • kills basic_consume after a short timeout of 2 to 3 seconds
  • After a few minutes of runtime we can restart the RabbitMQ container
    • this will write the segment files out into the quorum directory
    • this will also happen without a restart when the WAL file gets rotated
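
For readers who do not want to open the linked repository, the worker loop described above might look roughly like the following Python sketch. It is not the script from the repo. It assumes the `pika` client library and a queue that already exists, and the names `pika_connect`, `churn_once` and `churn` are hypothetical. The connection factory is passed in as a parameter so the loop can be exercised without a broker.

```python
from typing import Any, Callable, List


def pika_connect(host: str = "localhost") -> Any:
    """Open a blocking AMQP connection using pika (assumed to be installed)."""
    import pika  # third-party client, imported lazily so the rest runs without it

    return pika.BlockingConnection(pika.ConnectionParameters(host=host))


def churn_once(connect: Callable[[], Any], queue: str, hold_seconds: float) -> str:
    """Register a consumer, hold it briefly, then drop the whole connection."""
    connection = connect()
    try:
        channel = connection.channel()
        consumer_tag = channel.basic_consume(
            queue=queue,
            on_message_callback=lambda ch, method, properties, body: None,
            auto_ack=True,
        )
        # Keep the connection serviced for a moment; the queue is expected to stay empty.
        connection.sleep(hold_seconds)
        return consumer_tag
    finally:
        # This sketch never issues basic_cancel: the consumer goes away with the connection.
        connection.close()


def churn(connect: Callable[[], Any], queue: str, iterations: int,
          hold_seconds: float = 2.5) -> List[str]:
    """Repeat the connect / consume / disconnect cycle a fixed number of times."""
    return [churn_once(connect, queue, hold_seconds) for _ in range(iterations)]
```

Against a real broker this could be started as `churn(pika_connect, "some-quorum-queue", iterations=1000)`, where the queue name is a placeholder. Running several such workers in parallel would approximate the "many workers" part of the setup.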

Result:

  • Given enough time * clients * frequency of connections * queues
    • -> Server storage will fill up, even though the queues are completely empty.
      (The segment files are cleaned up if any message flows through the queue, but if that never happens the storage grows until it is completely full.)
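
The product above can be turned into a rough back-of-the-envelope estimate. The Python sketch below only multiplies the factors together. The function name is hypothetical, and `bytes_per_cycle` (bytes retained on disk per connect/disconnect cycle) is a made-up placeholder, not a measured value, so the printed number illustrates the arithmetic rather than real broker behaviour.

```python
def estimated_retained_bytes(runtime_s: int, clients: int, reconnect_interval_s: int,
                             queues: int, bytes_per_cycle: int) -> int:
    """Estimate retained bytes as time * clients * connection frequency * queues."""
    # Number of connect/disconnect cycles a single client performs per queue.
    cycles_per_client = runtime_s // reconnect_interval_s
    return cycles_per_client * clients * queues * bytes_per_cycle


# Hypothetical inputs: one day, 50 clients, a reconnect every 3 seconds,
# 10 idle queues, and an assumed 500 bytes retained per cycle.
one_day = estimated_retained_bytes(
    runtime_s=24 * 60 * 60,
    clients=50,
    reconnect_interval_s=3,
    queues=10,
    bytes_per_cycle=500,
)
print(one_day)  # 7200000000
```

The estimate is linear in every factor, so doubling the number of clients or queues doubles the projected disk usage.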

Even though this is a somewhat unusual case (unused queues that are checked by consumers that constantly restart consuming), I believe it should not fill up the storage with segment files that seem to contain nothing but consumer tags.

@kjnilsson kjnilsson added the bug label Nov 23, 2022
@kjnilsson
Contributor Author

It is likely that this patch is all that is needed:

diff --git a/deps/rabbit/src/rabbit_fifo.erl b/deps/rabbit/src/rabbit_fifo.erl
index b3defbfd23..2ef2ba5076 100644
--- a/deps/rabbit/src/rabbit_fifo.erl
+++ b/deps/rabbit/src/rabbit_fifo.erl
@@ -549,9 +549,10 @@ apply(#{system_time := Ts, machine_version := MachineVersion} = Meta,
     Effects = [{monitor, node, Node} | Effects1],
     checkout(Meta, State0, State#?MODULE{enqueuers = Enqs,
                                          last_active = Ts}, Effects);
-apply(Meta, {down, Pid, _Info}, State0) ->
-    {State, Effects} = handle_down(Meta, Pid, State0),
-    checkout(Meta, State0, State, Effects);
+apply(#{index := Idx} = Meta, {down, Pid, _Info}, State0) ->
+    {State1, Effects1} = handle_down(Meta, Pid, State0),
+    {State, Reply, Effects} = checkout(Meta, State0, State1, Effects1),
+    update_smallest_raft_index(Idx, Reply, State, Effects);
 apply(Meta, {nodeup, Node}, #?MODULE{consumers = Cons0,
                                      enqueuers = Enqs0,
                                      service_queue = _SQ0} = State0) ->
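
The idea behind the patch can be illustrated with a toy model. The Python sketch below is not RabbitMQ or Ra code. `ToyQueueLog` and its methods are hypothetical names, and the model assumes that releasing a cursor at a given index lets every log entry up to that index be discarded as long as the queue holds no messages. The `release_on_down` flag stands in for whether the `down` command triggers such a release.

```python
class ToyQueueLog:
    """Toy model of a replicated log that only shrinks when a cursor is released."""

    def __init__(self, release_on_down: bool):
        self.release_on_down = release_on_down
        self.entries = []     # retained (index, command) pairs
        self.next_index = 1
        self.messages = 0     # the queue stays empty in this scenario

    def _append(self, command):
        index = self.next_index
        self.next_index += 1
        self.entries.append((index, command))
        return index

    def _release_cursor(self, index):
        # Discard every entry up to and including the released index.
        self.entries = [(i, c) for (i, c) in self.entries if i > index]

    def checkout(self, consumer):
        # A consumer registers; nothing is released at this point.
        self._append(("checkout", consumer))

    def down(self, consumer):
        # The consumer's process went away without an explicit cancel.
        index = self._append(("down", consumer))
        if self.release_on_down and self.messages == 0:
            self._release_cursor(index)


def retained_after_churn(cycles: int, release_on_down: bool) -> int:
    """Run connect/disconnect cycles on an empty queue and count retained entries."""
    log = ToyQueueLog(release_on_down)
    for n in range(cycles):
        log.checkout(n)
        log.down(n)
    return len(log.entries)


print(retained_after_churn(1000, release_on_down=False))  # 2000
print(retained_after_churn(1000, release_on_down=True))   # 0
```

In this model the log grows without bound when nothing releases a cursor, which matches the reported symptom, and stays empty once the `down` path releases one.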

michaelklishin added a commit that referenced this issue Nov 24, 2022
Resolves a pathological case where consumers are added but never
explicitly cancelled, and no messages are consumed for prolonged
periods of time.

Closes #6447.

Authored-by: Karl Nilsson <[email protected]>
Committed-by: Michael Klishin <[email protected]>
@michaelklishin michaelklishin added this to the 3.11.4 milestone Nov 24, 2022
mergify bot pushed a commit that referenced this issue Nov 24, 2022
(cherry picked from commit bcea35d)
mergify bot pushed a commit that referenced this issue Nov 24, 2022
(cherry picked from commit bcea35d)
(cherry picked from commit 3901c6a)