reader: don't decrement total_rdy on message receipt #179

alpaker · 2017-05-10T16:18:11Z

This brings reader behavior into agreement with nsqd behavior (after nsqio/nsq#404) and removes an opportunity for max_in_flight violations (#177).

This block was removed because it's redundant given _redistributed_rdy() and it's easier to maintain invariants without it.

@mreiferson

mreiferson · 2017-05-10T17:40:27Z

Thanks! I'm gonna read through the diff more carefully, but in the meantime tests are failing on Python 3.5 https://travis-ci.org/nsqio/pynsq/jobs/230827968

alpaker · 2017-05-10T18:29:26Z

D'oh! Fixed.

mreiferson · 2017-05-11T17:05:09Z

nsq/reader.py

@@ -358,7 +344,7 @@ def _maybe_update_rdy(self, conn):
        if self.backoff_timer.get_interval() or self.max_in_flight == 0:
            return

-        if conn.rdy <= 1 or conn.rdy < int(conn.last_rdy * 0.25):
+        if conn.rdy == 1:


I think I understand this change but want to make sure — this is because we initially set RDY 1 and we want to make sure we adjust to an appropriate per-connection max-in-flight, right?

Perhaps it's worth a comment since it took me a little while to figure this out.

Right. Updating RDY here used to serve two purposes: periodically re-upping RDY when it dipped too low, and going from the initial throttled RDY 1 on startup to the full per-connection max-in-flight. Now only the second case is relevant.
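For readers following along, here is a minimal sketch of the remaining behavior described above. It is not the pynsq implementation: `Conn`, `connection_max_in_flight` and the even split of the budget across connections are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical stand-in for a connection; only tracks the RDY count."""
    rdy: int = 1  # a new connection starts throttled at RDY 1


def connection_max_in_flight(max_in_flight, conn_count):
    # Assumed even split of the reader-wide budget, at least 1 per connection.
    return max(1, max_in_flight // max(1, conn_count))


def maybe_update_rdy(conn, backoff_interval, max_in_flight, conn_count):
    """Return the RDY value to send, or None if nothing should be sent."""
    if backoff_interval or max_in_flight == 0:
        return None
    if conn.rdy == 1:
        # Only remaining purpose: move from the initial RDY 1 to the full share.
        return connection_max_in_flight(max_in_flight, conn_count)
    return None
```

With `max_in_flight=10` and two connections, a connection still at RDY 1 would be raised to 5, while a connection already at RDY 5 is left alone.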

mreiferson · 2017-05-11T17:09:02Z

nsq/reader.py

@@ -665,10 +646,21 @@ def _redistribute_rdy_state(self):
                    logger.info('[%s:%s] idle connection, giving up RDY count', conn.id, self.name)
                    self._send_rdy(conn, 0)

+            conns = self.conns.values()
+
+            in_flight = [c for c in conns if c.in_flight]


I still don't quite understand why we need to use in_flight when total_rdy should represent the same thing. The accounting for both total_rdy and in_flight used to happen in two consecutive lines in the same block.

This might be further complicated because the max_in_flight variable below is terribly named. Perhaps a better name would be available_rdy?

mreiferson · 2017-05-11T17:11:19Z

nsq/reader.py

@@ -677,7 +669,7 @@ def _redistribute_rdy_state(self):
            # We also don't attempt to avoid the connections who previously might have had RDY 1
            # because it would be overly complicated and not actually worth it (ie. given enough
            # redistribution rounds it doesn't matter).
-            possible_conns = list(self.conns.values())
+            possible_conns = [c for c in conns if not (c.in_flight or c.rdy)]


Same comment here re: use of in_flight

Consider a case where max_in_flight is less than the connection count and a task takes longer than low_rdy_idle_timeout. Then we'll set RDY 0 on a connection even while it's still processing a message (https://github.com/nsqio/pynsq/blob/master/nsq/reader.py#L664). So we can end up in a situation where total_rdy is strictly less than the total count of in-flight messages. If we go from RDY 0 to RDY 1 on a connection that has messages available for delivery, we guarantee a max-in-flight violation.

The last sentence above should read:

If we go from RDY 0 to RDY 1 on a connection (other than the one just set to RDY 0) that has messages available for delivery, we guarantee a max-in-flight violation.

Note that this is not difficult to trigger. All you need is:

- max_in_flight less than the number of connections
- typical task duration longer than low_rdy_idle_timeout
- msgs available on each connection
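The scenario can be walked through with a toy model. This is only an illustrative sketch: `Conn` and `simulate` are hypothetical names, and the redistribution step is reduced to a budget computed from RDY alone, as in the pre-change accounting discussed here.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical stand-in for a reader connection."""
    name: str
    rdy: int = 0
    in_flight: int = 0


def simulate(max_in_flight=1):
    a, b = Conn("a", rdy=1), Conn("b")
    conns = [a, b]

    # nsqd delivers one message on a; the handler is slow.
    a.in_flight += 1

    # The handler outlasts low_rdy_idle_timeout, so the reader sends RDY 0
    # on a while its message is still being processed.
    a.rdy = 0

    # A budget computed from RDY alone ignores the message still in flight.
    total_rdy = sum(c.rdy for c in conns)
    available = max_in_flight - total_rdy

    # Redistribution hands the "free" RDY to b, which has a message waiting.
    if available > 0:
        b.rdy = 1
        b.in_flight += 1

    return total_rdy, sum(c.in_flight for c in conns)


total_rdy, in_flight = simulate()
print(total_rdy, in_flight)  # 0 2
```

With max_in_flight set to 1, the model ends with two messages in flight, which is the violation described above.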

OK, I think that makes sense, but what about the section above? The sum of in_flight can also be less than total_rdy, in which case we'd enter this loop below thinking we have more than we actually do to give out?

Sorry, I think I misread our earlier discussion as saying that this was the kind of case you weren't worried about. The possibility you mention would be taken care of with:

    in_flight_or_rdy = sum(max(c.in_flight, c.rdy) for c in conns)
    if backoff_interval:
        max_in_flight = max(0, 1 - in_flight_or_rdy)
    else:
        max_in_flight = self.max_in_flight - in_flight_or_rdy

That's essentially the logic I proposed in #177 (comment).
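As a standalone sketch of that proposal, the calculation can be pulled into a free function. `Conn` and the function name `available_rdy` are assumptions for illustration; the arithmetic follows the snippet as proposed at this point in the thread.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical connection holding only the two counters used here."""
    rdy: int = 0
    in_flight: int = 0


def available_rdy(conns, max_in_flight, backoff_interval=0):
    # Count each connection by whichever is larger: messages it is still
    # processing, or messages it has announced it is ready to receive.
    in_flight_or_rdy = sum(max(c.in_flight, c.rdy) for c in conns)
    if backoff_interval:
        # In backoff, at most one RDY is handed out across all connections.
        return max(0, 1 - in_flight_or_rdy)
    return max_in_flight - in_flight_or_rdy


# One connection was set to RDY 0 while still processing a message.
conns = [Conn(rdy=0, in_flight=1), Conn(rdy=0, in_flight=0)]
print(available_rdy(conns, max_in_flight=1))  # 0
```

A budget based on total_rdy alone would have reported 1 for the same state, so the second connection would have been given RDY 1.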

With the proposed changes, the full diff would look like https://github.com/nsqio/pynsq/compare/master...alpaker:no-decr-total-rdy-2?diff=unified&expand=1&name=no-decr-total-rdy-2#diff-2a8bc85bf9c95f482da1eb0490a60251R654

Cool, let's update this PR with that change.

As per above, I also think we should prefix the max_in_flight variable in this method with available_.

Thanks!

alpaker · 2017-05-12T20:30:35Z

@mreiferson Updated as discussed. (I went with your earlier suggestion of available_rdy for the variable name in _redistribute_rdy().)

mreiferson · 2017-05-13T13:49:24Z

nsq/reader.py

+                    c = random.choice([c for c in conns if c.in_flight])
+                    logger.info('[%s:%s] too many msgs in flight, giving up RDY count', c.id, self.name)
+                    self._send_rdy(c, 0)
+                except IndexError:


Prefer not using exceptions for flow control here.
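One way to address this, sketched under the assumption that the `IndexError` comes from `random.choice` on an empty list, is to check for candidates before choosing. `pick_in_flight_conn` and `Conn` are hypothetical names, not part of pynsq.

```python
import random
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical connection with an in-flight counter."""
    id: str
    in_flight: int = 0


def pick_in_flight_conn(conns):
    """Return a random connection with messages in flight, or None."""
    candidates = [c for c in conns if c.in_flight]
    if not candidates:
        # Explicit empty check instead of catching IndexError.
        return None
    return random.choice(candidates)
```

The caller can then branch on `None` rather than wrapping the selection in a `try` block.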

mreiferson · 2017-05-13T13:49:50Z

nsq/reader.py

            else:
-                max_in_flight = self.max_in_flight - self.total_rdy
+                available_rdy = self.max_in_flight - in_flight_or_rdy


Now that we're using in_flight and rdy, I think this needs a max(0, ...) too

Both issues fixed.
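A small sketch of why the clamp matters. It assumes the redistribution loop tests the remaining budget for truthiness; `hand_out` is a hypothetical stand-in for that loop, not the pynsq code.

```python
def hand_out(budget, candidates):
    """Grant RDY to candidates while the budget is truthy (assumed loop shape)."""
    granted = []
    while candidates and budget:
        budget -= 1
        granted.append(candidates.pop())
    return granted


max_in_flight = 1
in_flight_or_rdy = 2  # e.g. two connections each still processing a message

unclamped = max_in_flight - in_flight_or_rdy       # -1, which is truthy
clamped = max(0, max_in_flight - in_flight_or_rdy)  # 0

print(hand_out(unclamped, ["b", "c"]))  # ['c', 'b']
print(hand_out(clamped, ["b", "c"]))    # []
```

Under that assumption, a negative budget never reaches zero by decrementing, so every candidate would receive RDY; clamping to zero hands out nothing.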

mreiferson

LGTM

alpaker force-pushed the no-decr-total-rdy branch from 9d2f54c to fa348cf Compare May 10, 2017 17:55

mreiferson added the bug label May 10, 2017

mreiferson reviewed May 11, 2017

alpaker force-pushed the no-decr-total-rdy branch 3 times, most recently from d7867f4 to ba6c2b4 Compare May 12, 2017 19:28

mreiferson requested changes May 13, 2017

reader: Don't decrement total_rdy on message receipt. Adjust RDY redistribution logic accordingly. This brings reader behavior into agreement with nsqd behavior (compare nsqio/nsq#404) and removes an opportunity for max_in_flight violations (nsqio#177).

47d1693

alpaker force-pushed the no-decr-total-rdy branch from ba6c2b4 to 47d1693 Compare May 13, 2017 16:25

mreiferson approved these changes May 13, 2017

mreiferson merged commit d24ee1c into nsqio:master May 13, 2017

alpaker deleted the no-decr-total-rdy branch May 13, 2017 16:43

mreiferson mentioned this pull request May 13, 2017

reader: max_in_flight check in _send_rdy() doesn't take in-flight msgs into account #177

Closed

mreiferson mentioned this pull request Jun 3, 2017

consumer: redistribute RDY when connections are active nsqio/go-nsq#208

Merged

mreiferson mentioned this pull request Mar 8, 2019

consumer: cleanup RDY handling; fix pausing nsqio/go-nsq#249

Merged

alpaker mentioned this pull request Sep 14, 2020

reader: don't send RDY 1 on a new connection if that would violate max-in-flight #254

Merged
