-
-
Notifications
You must be signed in to change notification settings - Fork 279
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix on_conflict
strategy handling
#629
Fix on_conflict
strategy handling
#629
Conversation
3687a7c
to
34182dd
Compare
Sorry, I have no idea why the specs are failing on CI. They pass locally using ruby 2.6.6 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for reporting the issue and taking the time to try and fix it.
I believe this is not the fix that we need though. It is mixing up the concepts a bit.
This is likely the fault of the documentation (which leaves a lot to be wished for, sorry about that).
I'll take some time to try and understand your problem later. I do believe you are misusing the strategy.
@@ -23,8 +23,8 @@ class UntilAndWhileExecuting < BaseLock | |||
# @yield to the caller when given a block | |||
# | |||
def lock(origin: :client) | |||
return lock_failed(origin: origin) unless (token = locksmith.lock) | |||
return yield token if block_given? | |||
token = locksmith.lock || lock_failed(origin: origin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The replace strategy should only be used for either the server or the client process if I remember correctly.
Secondly, the strategy should run instead of the lock. Any return value from the strategy is coincidental and should not cause the job to execute which this implementation does.
In other words, this is not the right solution.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I had a look now, you can get this information if you validate your worker.
expect(MyWorker).to have_valid_sidekiq_options
sidekiq_options lock: :until_and_while_executing,
on_conflict: {
client: :replace,
server: :reschedule
}
Or if you rather raise and let the job be retried when it is able to get the lock:
sidekiq_options lock: :until_and_while_executing,
on_conflict: {
client: :replace,
server: :raise
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I messed up the formatting of the commit message. For us this is a regression caused by 8c8d54c. Before this change the test I added worked without an exception and behaved like we expected. We might have used a non-feature that wasn't intended to work. Since we are using :until_and_while_executing
I don't think that the server
on_conflict
strategy will ever trigger. We are scheduling jobs on the client side to basically de-bounce events. For a group of events we only want one job to run X seconds after the last of the grouped events fired. This worked fine before and should never need a server
on_conflict
strategy because the jobs are scheduled to run in the future and should always only have one job per lock_args
in the queue.
The exception I posted happens because Middleware::Client#lock lock_instance.lock
now doesn't yield but returns the return value of BaseLock#lock_failed
when the job was locked by strategy. This breaks the Sidekiq middleware interface because the middleware is supposed to either yield
or return an item
or nil/false
. I believe the change restores the previous behavior if yielding inside lock
when a strategy was successful:
return call_strategy unless (locked_token = locksmith.lock(&block)) |
if (_token = lock_instance.lock) |
I just saw the current documentation of lock_failed
which mentions it's supposed to return void
, but I belive it still returns what it did before (String, nil
) so the change might work for our use case by accident 😄
Maybe the proper fix would be to pass the block into the on_conflict
strategies and yield there if they were successful?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Went through the code again and i now believe the return value of OnConflict::Strategy#call
isn't documented correctly. It should have the same as BaseLock#lock
. I think the change works (for our use case) because for on_conflict: replace
, lock_failed
basically behaves like lock
:
def lock_failed(origin: :client) |
client_strategy.call { lock if replace? } |
block&.call |
Which then calls BaseLock#lock
again and returns String, nil
.
WDYT about changing the return type of the strategies to String, nil
and make all but replace
return nil
explicitly instead of implicitly like it's done for most of them currently?
👋 FYI I am having the exact same issue. Using the suggested fix doesn't change anything, same error : on_conflict: {
client: :replace,
server: :reschedule
} |
@Intrepidd I just updated the specs with your (and the generally suggested |
Thanks, that's odd. The exception I get is the same : Maybe worth noting : I'm enqueuing the job in the future with |
@dsander I'm having a similar issue as well with the same backtrace as in the original post To Reproduce: Versions:
Backtrace is exactly the same. It fails in here https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/client.rb#L196 |
2337952
to
2bea226
Compare
@Intrepidd @TheSmartnik I have added tests for both of your use cases and the pass locally, I still have no clue why CI fails though. |
d1762f9
to
e1610b5
Compare
That took some digging, turned out that Sidekiq 6.2.2 broke the specs., |
If you all upgrade to sidekiq-unique-jobs v7.1.6 you should be golden. Sorry about the extra long response time, I have had a rough couple of months. |
e1610b5
to
f7cce0e
Compare
This fixes the exception posted below when using `on_conflict: replace` and potentially other bugs related to `on_conflict`. Before [this change][1] `BasicLock.lock` always returned either the acquired lock or the return value of `call_strategy`. Since `Middleware::Client#lock` now only yields when `lock_instance.lock` yields we have to also `yield` inside the lock instances when the lock was acquired via `lock_failed` (i.e. a `on_conflict` strategy). ``` NoMethodError: undefined method `key?' for "f5d69f8fd2e1f3dde8cee02e":String from sidekiq/client.rb:197:in `atomic_push' from sidekiq/client.rb:190:in `block (2 levels) in raw_push' from redis.rb:2489:in `block in multi' from redis.rb:69:in `block in synchronize' from monitor.rb:202:in `synchronize' from monitor.rb:202:in `mon_synchronize' from redis.rb:69:in `synchronize' from redis.rb:2483:in `multi' from sidekiq/client.rb:189:in `block in raw_push' from connection_pool.rb:63:in `block (2 levels) in with' from connection_pool.rb:62:in `handle_interrupt' from connection_pool.rb:62:in `block in with' from connection_pool.rb:59:in `handle_interrupt' from connection_pool.rb:59:in `with' from sidekiq/client.rb:188:in `raw_push' from sidekiq/client.rb:74:in `push' from sidekiq/worker.rb:240:in `client_push' from sidekiq/worker.rb:215:in `perform_in' ``` mhenrixon#590 [1]: s://github.com/mhenrixon/sidekiq-unique-jobs/commit/8c8d54c8b9dea363a7d8b8aeaceb2e82966b8503
When using the `on_conflict` `replace` strategy `base_lock` has to return the return value of the conflict strategy. The replace strategy removes the previous job from the scheduled/rety set or the queue. It is then re-queued inside `lock_failed`/`call_strategy`, thus we have to return the return value of which came from `lock` inside the block of ` client_strategy.call`.
31195e9
to
b3255e1
Compare
@mhenrixon Thanks! That allowed me to get rid of the sidekiq version lock, while it "fixed" the exception, it still looks like it did not fix the I removed [this change]((v7.1.5...v7.1.6#diff-3744df1729b19fd14c4d3ff1cbedc47bb8c3cc13131595a4ec579e03a2713fc8R112) in my last commit which made the specs pass again. |
Close #629 The problem originally was that in some situations, a string was returned (for the lock). This should never happen, any conflict strategy must return nil to avoid pretending to be successful.
Hi ! Thanks for looking into it ! On my end I don't see the error again but am facing an odd behaviour : I'm using As mentionned before, I'm using |
@Intrepidd that's expected, the replace strategy deletes the old job and pushes the new one in. Which is why it should only be used with |
I'm sorry I wasn't clear. No job end up in the queue, leaving it empty And I am using
|
@Intrepidd @mhenrixon This happens because in #640 the middleware always returns
|
Hey sorry about that @Intrepidd and @dsander. I was mistaking replace for reschedule. I'll see if I can replicate this with a middleware test first :) |
@mhenrixon You can use those: projectivetech@23dc652. Not sure how well they fit into the test suite though, I always like to test things from the outside when I am not super familiar with a library 😄 |
* OnConflict, return nil Close #629 The problem originally was that in some situations, a string was returned (for the lock). This should never happen, any conflict strategy must return nil to avoid pretending to be successful. * Refactor (thanks reek for pointing it out) * Ensure the replace strategy yields This should be considered a success * Adds documentation
Hi! 👋
we found a bug related to the
on_conflict
option. The change fixes it for our use case and hopefully as well for the rest of the uniqueness strategies.This fixes the exception posted below when using
on_conflict: replace
and potentially other bugs related to
on_conflict
. Before this changeBasicLock.lock
always returned either the acquired lock or the returnvalue of
call_strategy
. SinceMiddleware::Client#lock
now onlyyields when
lock_instance.lock
yields we have to alsoyield
insidethe lock instances when the lock was acquired via
lock_failed
(i.e. aon_conflict
strategy).#590