messaging: duplicate message delivery #5796

derekperkins · 2020-02-06T16:46:05Z

We're seeing significant duplicate message delivery on queues that have a backlog. When a queue is running in near real time, we see 0 duplicates. When there is a backlog, the queue cache appears to behave differently and 20-30% of message deliveries are duplicates. The queue is processing quickly, so we are not exceeding the queue server timeout (vt_ack_wait=300).

Here's a chart for the last 24 hours showing duplicate rates for a queue with significant backlog.

Over the same period, another queue processed more messages with 0 duplicates until we stopped our queue consumer for 15 minutes to build up a backlog. Once it worked through that backlog, there were no more duplicates.

Possibly related to this, we see weird behavior when new consumers connect to the message manager. For context on the chart:

Ready to run: time_next <= NOW()
Waiting to run: time_next > NOW()
Failed: time_next = MaxInt64 (this is our internal usage)

You can see the status of 1M messages jump at the same moment we increased the number of consumers. We see this every time the set of consumers changes, and because these metrics are collected via an out-of-band query, the table itself must be changing. I can't explain what is happening. We see similar behavior if we run a query to reschedule messages that have failed or are already acked.

My gut feeling is that something is happening in the message cache, but I don't have any data yet to support that.

sougou · 2020-02-22T17:34:37Z

Most likely, you're hitting the situation where sends are failing. If a send fails, then the message is retried almost immediately: https://github.com/vitessio/vitess/blob/master/go/vt/vttablet/tabletserver/messager/message_manager.go#L120-L126.

We could change this behavior to postpone a message even if the send failed, but that may cause unnecessary postponements.

derekperkins mentioned this issue Mar 3, 2020: Messaging Wishlist #5882 (Open, 12 tasks)

GuptaManan100 added Component: Cluster management P3 Type: Bug labels Dec 1, 2020

ajm188 removed the P3 label Mar 9, 2023
