reader: don't send RDY 1 on a new connection if that would violate max-in-flight #254
It might be worth elaborating on the above. Before sending RDY on the new connection,
After this adjustment, we know that
where If
The two above displayed inequalities together imply
It follows that if
Thanks @alpaker It looks like as of the commit I originally added the comment block in this method, we were checking for max in flight violations in Trying to find when/why we dropped that check...
47d1693#diff-2a8bc85bf9c95f482da1eb0490a60251 and then ultimately removed in d94d2e5#diff-2a8bc85bf9c95f482da1eb0490a60251
Looks like we just missed a call site...
So, my doing 😬. Sorry about that. On a quick scan, these other call sites seem like they might also need checks: Reader._maybe_update_rdy() Should I work something up for those as well?
I'm not sure, need to get all this logic back in my head 😄 If we need to put that check at every call site, why wouldn't we put it back in
I can't think of a functional reason to prefer one approach. (I find it easier to think through various cases if If we do restore the max-in-flight check to
where
Would be great to prove this by adding tests that fail and then pass when we fix these up (for
I'm working up the test cases. One question about what you had in mind here:
If we just make this change:
The behavior when
Is that what you were thinking?
Yes, the latter.
Am going on vacation for a couple of days but should have something by the end of the weekend.
9d04289 to cec3349
@mreiferson I've made the changes we discussed and added relevant unit tests in 6f2e662 and 0dcb417. There's an edge case I encountered while bulking out the tests that appears unrelated to the series of changes I started in #179, but I included a fix because it interacts with the changes made here: When An unrelated issue beyond RDY management that I noticed, although I'm not sure it'd be considered a bug: When probing a single connection in backoff, a
These tests are awesome, great work, thanks! 💯 😍
Left a few questions...
nsq/reader.py
@@ -626,6 +632,10 @@ def set_max_in_flight(self, max_in_flight):
self._send_rdy(conn, 0)
self.total_rdy = 0
else:
for conn in self.conns.values():
Should we just reset `rdy_timeout` on all conns regardless of `max_in_flight`?
I realize that we only need it when `max_in_flight > 0` because we reset `rdy_timeout` in `_send_rdy`, which is forcibly called on all connections when `max_in_flight == 0`, but that's mostly true for `_redistribute_rdy_state` too. To me, this is a readability thing for our future selves...
Sure. Will make that change.
@@ -660,6 +670,17 @@ def _redistribute_rdy_state(self):
if self.need_rdy_redistributed:
self.need_rdy_redistributed = False

if self.total_rdy > self.max_in_flight:
Should this block instead just set RDY 0 for all connections and let the rest of the logic below identify conns to receive non-zero ready?
With the current redistribution logic I think that change could make it unsafe to call `set_max_in_flight()` with messages in flight. Right now redistribution avoids giving positive RDY to a connection with a message in flight, so something like the following could happen:
- Let max-in-flight be > `len(conns)`.
- With a message in flight on conn `c`, reduce m-i-f but keep it >= `len(conns)`.
- `_redistribute_rdy_state()` sets RDY 0 on all connections, including `c`.
- Because `c` has something in flight, this redistribution round leaves `c` at RDY 0.
- Subsequent redistribution rounds see m-i-f >= `len(conns)` and so don't redistribute RDY, leaving `c` starved.
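The sequence above can be walked through with a toy model. This is only a sketch: `Conn`, `zero_then_redistribute()` and `needs_more_rounds()` are hypothetical stand-ins, not pynsq code, and the rule that later rounds only run when m-i-f < `len(conns)` is taken from the last bullet rather than from the library.

```python
import random
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical stand-in for a connection; tracks only RDY and in-flight counts."""
    name: str
    rdy: int = 0
    in_flight: int = 0


def zero_then_redistribute(conns, max_in_flight):
    # Proposed simplification: set RDY 0 everywhere, including conns with messages in flight.
    for c in conns:
        c.rdy = 0
    # Same round: hand RDY 1 only to conns with nothing in flight and no RDY.
    available = max_in_flight
    possible = [c for c in conns if not (c.in_flight or c.rdy)]
    while possible and available > 0:
        available -= 1
        possible.pop(random.randrange(len(possible))).rdy = 1


def needs_more_rounds(conns, max_in_flight):
    # Assumption from the list above: later rounds only happen when m-i-f < len(conns).
    return max_in_flight < len(conns)


# m-i-f was 6 (> len(conns)); conn "c" has one message in flight.
conns = [Conn("a", rdy=2), Conn("b", rdy=2), Conn("c", rdy=2, in_flight=1)]

# Reduce m-i-f to 3, which is still >= len(conns).
zero_then_redistribute(conns, max_in_flight=3)

print({c.name: c.rdy for c in conns})
print("further rounds:", needs_more_rounds(conns, 3))
```

In this model `a` and `b` end up at RDY 1 while `c` stays at RDY 0, and no further round is scheduled to correct it.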
We could make the simplification you suggest if we allow redistribution to give RDY to a conn with something in flight:
--- a/nsq/reader.py
+++ b/nsq/reader.py
@@ -720,7 +720,7 @@ class Reader(Client):
# We also don't attempt to avoid the connections who previously might have had RDY 1
# because it would be overly complicated and not actually worth it (ie. given enough
# redistribution rounds it doesn't matter).
- possible_conns = [c for c in conns if not (c.in_flight or c.rdy)]
+ possible_conns = [c for c in conns if not c.rdy]
while possible_conns and available_rdy:
available_rdy -= 1
conn = possible_conns.pop(random.randrange(len(possible_conns)))
which (I think) is fine from a correctness standpoint and will at worst increase the variance in how long it takes to service all connections equally.
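To make the effect of that one-line change concrete, here is a minimal sketch comparing the two candidate filters. `Conn` and `candidates()` are hypothetical names used for illustration; only the two list-comprehension conditions come from the diff.

```python
from collections import namedtuple

# Hypothetical connection record with just the fields the filter looks at.
Conn = namedtuple("Conn", ["name", "in_flight", "rdy"])


def candidates(conns, allow_in_flight):
    """Return the conns eligible to receive RDY 1 in a redistribution round."""
    if allow_in_flight:
        # Relaxed filter from the diff: only skip conns that already have RDY.
        return [c for c in conns if not c.rdy]
    # Current filter: also skip conns with a message in flight.
    return [c for c in conns if not (c.in_flight or c.rdy)]


conns = [
    Conn("a", in_flight=0, rdy=0),
    Conn("b", in_flight=0, rdy=1),
    Conn("c", in_flight=1, rdy=0),
]

strict = [c.name for c in candidates(conns, allow_in_flight=False)]
relaxed = [c.name for c in candidates(conns, allow_in_flight=True)]
print(strict, relaxed)
```

With the relaxed filter, `c` becomes eligible even though it has a message in flight, which is what would keep it from being starved in the scenario above.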
Good catch, but this code is already complicated enough :)
cec3349 to 41c1b4a
@alpaker want to squash that rebase commit and we can land this?
@alpaker @mreiferson any way we can get this merged? this hits us every now and again and makes us sad. this PR looks ready.
…e max-in-flight constraint.
c499512 to 16abeb7
@mreiferson Squashed and GTG.
@alpaker thanks very much, as soon as this lands i'll roll it out.
Fixes #252.
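Here we add a condition to `Reader._on_connection_ready()` so that we only send RDY 1 on a new connection if doing so won't lead to a max-in-flight violation. Because we earlier in `_on_connection_ready()` ensure that all connections have RDY less than or equal to the new per-connection max in flight, this condition will only prevent us from sending RDY 1 on a new connection when `max_in_flight` is less than the connection count, in which case we know that RDY redistribution will give RDY to this connection when it's safe.

(Closed the previous PR and opened a new one using a different branch for the sake of cleaner git history.)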
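
The guard described above can be sketched roughly as follows. This is an illustrative model, not the code in `nsq/reader.py`: `Conn` and `on_connection_ready()` are hypothetical names, and the per-connection limit `max(1, max_in_flight // len(conns))` is an assumption about how the per-connection max in flight is computed.

```python
from dataclasses import dataclass


@dataclass
class Conn:
    """Hypothetical stand-in for a connection; tracks only its RDY count."""
    rdy: int = 0


def on_connection_ready(conns, new_conn, max_in_flight):
    """Register new_conn and send RDY 1 only if that keeps total RDY <= max_in_flight.

    Returns True if RDY 1 was sent on the new connection.
    """
    conns.append(new_conn)
    # Assumed formula for the per-connection max in flight.
    per_conn_max = max(1, max_in_flight // len(conns))
    # Clamp existing connections to the new per-connection limit first.
    for c in conns:
        if c.rdy > per_conn_max:
            c.rdy = per_conn_max
    total_rdy = sum(c.rdy for c in conns)
    # The added condition: skip RDY 1 if it would violate max-in-flight.
    if total_rdy + 1 <= max_in_flight:
        new_conn.rdy = 1
        return True
    return False


# max_in_flight below the connection count: the new conn must wait for redistribution.
crowded = [Conn(rdy=1), Conn(rdy=1)]
sent_crowded = on_connection_ready(crowded, Conn(), max_in_flight=2)

# max_in_flight at or above the connection count: RDY 1 is always safe after clamping.
roomy = [Conn(rdy=5), Conn(rdy=5)]
sent_roomy = on_connection_ready(roomy, Conn(), max_in_flight=10)
print(sent_crowded, sent_roomy, [c.rdy for c in roomy])
```

Under the assumed formula, when `max_in_flight` is at least the connection count, the existing connections hold at most `max_in_flight - 1` RDY in total after clamping, so the check only ever blocks RDY 1 in the crowded case.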
is less than the connection count, in which case we know that RDY redistribution will give RDY to this connection when it's safe.(Closed the previous PR and opened a new one using a different branch for the sake of cleaner git history.)