Fix `on_conflict` strategy handling #629

dsander · 2021-08-02T16:16:47Z

Hi! 👋

we found a bug related to the on_conflict option. The change fixes it for our use case and hopefully as well for the rest of the uniqueness strategies.

This fixes the exception posted below when using on_conflict: replace
and potentially other bugs related to on_conflict. Before this change
BasicLock.lock always returned either the acquired lock or the return
value of call_strategy. Since Middleware::Client#lock now only
yields when lock_instance.lock yields we have to also yield inside
the lock instances when the lock was acquired via lock_failed (i.e. a
on_conflict strategy).

NoMethodError: undefined method `key?' for "f5d69f8fd2e1f3dde8cee02e":String
  from sidekiq/client.rb:197:in `atomic_push'
  from sidekiq/client.rb:190:in `block (2 levels) in raw_push'
  from redis.rb:2489:in `block in multi'
  from redis.rb:69:in `block in synchronize'
  from monitor.rb:202:in `synchronize'
  from monitor.rb:202:in `mon_synchronize'
  from redis.rb:69:in `synchronize'
  from redis.rb:2483:in `multi'
  from sidekiq/client.rb:189:in `block in raw_push'
  from connection_pool.rb:63:in `block (2 levels) in with'
  from connection_pool.rb:62:in `handle_interrupt'
  from connection_pool.rb:62:in `block in with'
  from connection_pool.rb:59:in `handle_interrupt'
  from connection_pool.rb:59:in `with'
  from sidekiq/client.rb:188:in `raw_push'
  from sidekiq/client.rb:74:in `push'
  from sidekiq/worker.rb:240:in `client_push'
  from sidekiq/worker.rb:215:in `perform_in'

#590

dsander · 2021-08-02T16:29:11Z

Sorry, I have no idea why the specs are failing on CI. They pass locally using ruby 2.6.6

mhenrixon

Thank you for reporting the issue and taking the time to try and fix it.

I believe this is not the fix that we need though. It is mixing up the concepts a bit.

This is likely the fault of the documentation (which leaves a lot to be wished for, sorry about that).

I'll take some time to try and understand your problem later. I do believe you are misusing the strategy.

mhenrixon · 2021-08-03T05:52:21Z

lib/sidekiq_unique_jobs/lock/until_and_while_executing.rb

@@ -23,8 +23,8 @@ class UntilAndWhileExecuting < BaseLock
      # @yield to the caller when given a block
      #
      def lock(origin: :client)
-        return lock_failed(origin: origin) unless (token = locksmith.lock)
-        return yield token if block_given?
+        token = locksmith.lock || lock_failed(origin: origin)


The replace strategy should only be used for either the server or the client process if I remember correctly.

Secondly, the strategy should run instead of the lock. Any return value from the strategy is coincidental and should not cause the job to execute which this implementation does.

In other words, this is not the right solution.

So I had a look now, you can get this information if you validate your worker.

expect(MyWorker).to have_valid_sidekiq_options

sidekiq_options lock: :until_and_while_executing, on_conflict: { client: :replace, server: :reschedule }

Or if you rather raise and let the job be retried when it is able to get the lock:

sidekiq_options lock: :until_and_while_executing, on_conflict: { client: :replace, server: :raise }

Sorry I messed up the formatting of the commit message. For us this is a regression caused by 8c8d54c. Before this change the test I added worked without an exception and behaved like we expected. We might have used a non-feature that wasn't intended to work. Since we are using :until_and_while_executing I don't think that the server on_conflict strategy will ever trigger. We are scheduling jobs on the client side to basically de-bounce events. For a group of events we only want one job to run X seconds after the last of the grouped events fired. This worked fine before and should never need a server on_conflict strategy because the jobs are scheduled to run in the future and should always only have one job per lock_args in the queue.

The exception I posted happens because Middleware::Client#lock lock_instance.lock now doesn't yield but returns the return value of BaseLock#lock_failed when the job was locked by strategy. This breaks the Sidekiq middleware interface because the middleware is supposed to either yield or return an item or nil/false. I believe the change restores the previous behavior if yielding inside lock when a strategy was successful:

sidekiq-unique-jobs/lib/sidekiq_unique_jobs/lock/base_lock.rb

Line 45 in f4f4ce3

return call_strategy unless (locked_token = locksmith.lock(&block))

sidekiq-unique-jobs/lib/sidekiq_unique_jobs/middleware/client.rb

Line 28 in f4f4ce3

if (_token = lock_instance.lock)

I just saw the current documentation of lock_failed which mentions it's supposed to return void, but I belive it still returns what it did before (String, nil) so the change might work for our use case by accident 😄

Maybe the proper fix would be to pass the block into the on_conflict strategies and yield there if they were successful?

Went through the code again and i now believe the return value of OnConflict::Strategy#call isn't documented correctly. It should have the same as BaseLock#lock. I think the change works (for our use case) because for on_conflict: replace, lock_failed basically behaves like lock:

sidekiq-unique-jobs/lib/sidekiq_unique_jobs/lock/base_lock.rb

Line 109 in da53bb2

def lock_failed(origin: :client)

sidekiq-unique-jobs/lib/sidekiq_unique_jobs/lock/base_lock.rb

Line 119 in da53bb2

client_strategy.call { lock if replace? }

sidekiq-unique-jobs/lib/sidekiq_unique_jobs/on_conflict/replace.rb

Line 45 in da53bb2

block&.call

Which then calls BaseLock#lock again and returns String, nil.

WDYT about changing the return type of the strategies to String, nil and make all but replace return nil explicitly instead of implicitly like it's done for most of them currently?

Intrepidd · 2021-09-01T14:21:40Z

👋 FYI I am having the exact same issue.

Using the suggested fix doesn't change anything, same error :

on_conflict: { 
  client: :replace, 
  server: :reschedule 
}

dsander · 2021-09-01T16:21:13Z

@Intrepidd I just updated the specs with your (and the generally suggested on_conflict server side configuration and it works without issues. Can you post the exception you are getting?

Intrepidd · 2021-09-01T16:52:14Z

Thanks, that's odd.

The exception I get is the same : NoMethodError: undefined method 'key?' for "xxxxxxxxxxxx":String
Unfortunately I don't have a full backtrace, do you know how I could get one ? (Standard rails app)

Maybe worth noting : I'm enqueuing the job in the future with perform_in, effectively (trying to) using sidekiq unique jobs as a debouncer.

TheSmartnik · 2021-09-08T12:25:55Z

@dsander I'm having a similar issue as well with the same backtrace as in the original post

To Reproduce:

Versions:
sidekiq (6.2.2)
sidekiq-unique-jobs (7.1.5)

class MyWorker
  include Sidekiq::Worker
  sidekiq_options lock: :until_and_while_executing, on_conflict: :reject, retry: false # same with { server: :reject, client: :reject }

  def perform(id)
  end
end

irb(main):001:0> MyWorker.perform_at(3.hours.from_now, 1)
=> "7c7a3f1c1bb909802c58efba"
irb(main):002:0> MyWorker.perform_at(3.hours.from_now, 1)
2021-09-08T12:22:08.816Z pid=22291 tid=fgb uniquejobs=client until_and_while_executing=uniquejobs:0d1124b8750bfa013beb1f52f6b608a7 INFO: Adding dead MyWorker job 819e5093ad0174935916b0d4
Traceback (most recent call last):
        1: from (irb):2
NoMethodError (undefined method `key?' for true:TrueClass)

Backtrace is exactly the same. It fails in here https://github.com/mperham/sidekiq/blob/master/lib/sidekiq/client.rb#L196

dsander · 2021-09-15T09:59:04Z

@Intrepidd @TheSmartnik I have added tests for both of your use cases and the pass locally, I still have no clue why CI fails though.

dsander · 2021-09-15T11:18:10Z

That took some digging, turned out that Sidekiq 6.2.2 broke the specs.,

mhenrixon · 2021-09-24T15:47:32Z

If you all upgrade to sidekiq-unique-jobs v7.1.6 you should be golden.

Sorry about the extra long response time, I have had a rough couple of months.

This fixes the exception posted below when using `on_conflict: replace` and potentially other bugs related to `on_conflict`. Before [this change][1] `BasicLock.lock` always returned either the acquired lock or the return value of `call_strategy`. Since `Middleware::Client#lock` now only yields when `lock_instance.lock` yields we have to also `yield` inside the lock instances when the lock was acquired via `lock_failed` (i.e. a `on_conflict` strategy). ``` NoMethodError: undefined method `key?' for "f5d69f8fd2e1f3dde8cee02e":String from sidekiq/client.rb:197:in `atomic_push' from sidekiq/client.rb:190:in `block (2 levels) in raw_push' from redis.rb:2489:in `block in multi' from redis.rb:69:in `block in synchronize' from monitor.rb:202:in `synchronize' from monitor.rb:202:in `mon_synchronize' from redis.rb:69:in `synchronize' from redis.rb:2483:in `multi' from sidekiq/client.rb:189:in `block in raw_push' from connection_pool.rb:63:in `block (2 levels) in with' from connection_pool.rb:62:in `handle_interrupt' from connection_pool.rb:62:in `block in with' from connection_pool.rb:59:in `handle_interrupt' from connection_pool.rb:59:in `with' from sidekiq/client.rb:188:in `raw_push' from sidekiq/client.rb:74:in `push' from sidekiq/worker.rb:240:in `client_push' from sidekiq/worker.rb:215:in `perform_in' ``` mhenrixon#590 [1]: s://github.com/mhenrixon/sidekiq-unique-jobs/commit/8c8d54c8b9dea363a7d8b8aeaceb2e82966b8503

When using the `on_conflict` `replace` strategy `base_lock` has to return the return value of the conflict strategy. The replace strategy removes the previous job from the scheduled/rety set or the queue. It is then re-queued inside `lock_failed`/`call_strategy`, thus we have to return the return value of which came from `lock` inside the block of ` client_strategy.call`.

dsander · 2021-09-26T09:38:31Z

@mhenrixon Thanks! That allowed me to get rid of the sidekiq version lock, while it "fixed" the exception, it still looks like it did not fix the on_conlifct: :replace client strategy: https://github.com/projectivetech/sidekiq-unique-jobs/runs/3712228560?check_suite_focus=true#step:6:22

I removed [this change]((v7.1.5...v7.1.6#diff-3744df1729b19fd14c4d3ff1cbedc47bb8c3cc13131595a4ec579e03a2713fc8R112) in my last commit which made the specs pass again.

Close #629 The problem originally was that in some situations, a string was returned (for the lock). This should never happen, any conflict strategy must return nil to avoid pretending to be successful.

mhenrixon · 2021-09-27T08:49:32Z

@dsander could you check if #640 fixes the issue for you? The problem I see with your code is that you actually changed the internal API to be invalid. There should be no change from [void] to [String, void]. The API is expected to support void or nil only.

Intrepidd · 2021-09-27T09:39:01Z

Hi ! Thanks for looking into it !

On my end I don't see the error again but am facing an odd behaviour :

I'm using on_conflict: :replace but when queuing a conflicting job, the previous one just disappears from the queue.

As mentionned before, I'm using perform_in so I am talking about the scheduled queue

mhenrixon · 2021-09-27T09:41:55Z

@Intrepidd that's expected, the replace strategy deletes the old job and pushes the new one in. Which is why it should only be used with on_conflict: { server: :client }.

Intrepidd · 2021-09-27T09:44:22Z

I'm sorry I wasn't clear.

No job end up in the queue, leaving it empty

And I am using

on_conflict: {
      client: :replace,
      server: :reschedule
    }

dsander · 2021-09-27T09:47:15Z

@Intrepidd @mhenrixon This happens because in #640 the middleware always returns nil. From the sidekiq middleware docs:

Not calling yield or returning anything other than job will result in no further middleware being called and the job will not be pushed to the queue. (Note that yield will return an equivalent value to job)

mhenrixon · 2021-09-27T09:52:17Z

Hey sorry about that @Intrepidd and @dsander. I was mistaking replace for reschedule. I'll see if I can replicate this with a middleware test first :)

dsander · 2021-09-27T09:54:47Z

@mhenrixon You can use those: projectivetech@23dc652. Not sure how well they fit into the test suite though, I always like to test things from the outside when I am not super familiar with a library 😄

* OnConflict, return nil Close #629 The problem originally was that in some situations, a string was returned (for the lock). This should never happen, any conflict strategy must return nil to avoid pretending to be successful. * Refactor (thanks reek for pointing it out) * Ensure the replace strategy yields This should be considered a success * Adds documentation

dsander closed this Aug 2, 2021

dsander reopened this Aug 2, 2021

dsander force-pushed the feature/fix-on-conflict-handling branch 2 times, most recently from 3687a7c to 34182dd Compare August 2, 2021 16:28

mhenrixon reviewed Aug 3, 2021

View reviewed changes

mhenrixon added bug v7.x labels Sep 3, 2021

dsander force-pushed the feature/fix-on-conflict-handling branch from 2337952 to 2bea226 Compare September 15, 2021 09:55

dsander force-pushed the feature/fix-on-conflict-handling branch 3 times, most recently from d1762f9 to e1610b5 Compare September 15, 2021 11:12

dsander force-pushed the feature/fix-on-conflict-handling branch from e1610b5 to f7cce0e Compare September 26, 2021 09:27

dsander added 5 commits September 26, 2021 11:35

Update return types for `OnConflict#call methods

3c08b5b

Add spec testing on_conflict replace using perform_in

10359c2

Add spec for on_conflict reject

66fb5cc

dsander force-pushed the feature/fix-on-conflict-handling branch from 31195e9 to b3255e1 Compare September 26, 2021 09:36

Test the on_conflict: :replace returns the new job id

630ac09

mhenrixon mentioned this pull request Sep 27, 2021

OnConflict::Replace: yield when lock was achieved #640

Merged

mhenrixon closed this in #640 Sep 27, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix `on_conflict` strategy handling #629

Fix `on_conflict` strategy handling #629

dsander commented Aug 2, 2021 •

edited

Loading

dsander commented Aug 2, 2021

mhenrixon left a comment

mhenrixon Aug 3, 2021

mhenrixon Aug 3, 2021 •

edited

Loading

dsander Aug 3, 2021

dsander Aug 3, 2021

Intrepidd commented Sep 1, 2021

dsander commented Sep 1, 2021

Intrepidd commented Sep 1, 2021 •

edited

Loading

TheSmartnik commented Sep 8, 2021

dsander commented Sep 15, 2021

dsander commented Sep 15, 2021

mhenrixon commented Sep 24, 2021

dsander commented Sep 26, 2021

mhenrixon commented Sep 27, 2021

Intrepidd commented Sep 27, 2021

mhenrixon commented Sep 27, 2021

Intrepidd commented Sep 27, 2021 •

edited

Loading

dsander commented Sep 27, 2021

mhenrixon commented Sep 27, 2021

dsander commented Sep 27, 2021

Fix on_conflict strategy handling #629

Fix on_conflict strategy handling #629

Conversation

dsander commented Aug 2, 2021 • edited Loading

dsander commented Aug 2, 2021

mhenrixon left a comment

Choose a reason for hiding this comment

mhenrixon Aug 3, 2021

Choose a reason for hiding this comment

mhenrixon Aug 3, 2021 • edited Loading

Choose a reason for hiding this comment

dsander Aug 3, 2021

Choose a reason for hiding this comment

dsander Aug 3, 2021

Choose a reason for hiding this comment

Intrepidd commented Sep 1, 2021

dsander commented Sep 1, 2021

Intrepidd commented Sep 1, 2021 • edited Loading

TheSmartnik commented Sep 8, 2021

dsander commented Sep 15, 2021

dsander commented Sep 15, 2021

mhenrixon commented Sep 24, 2021

dsander commented Sep 26, 2021

mhenrixon commented Sep 27, 2021

Intrepidd commented Sep 27, 2021

mhenrixon commented Sep 27, 2021

Intrepidd commented Sep 27, 2021 • edited Loading

dsander commented Sep 27, 2021

mhenrixon commented Sep 27, 2021

dsander commented Sep 27, 2021

Fix `on_conflict` strategy handling #629

Fix `on_conflict` strategy handling #629

dsander commented Aug 2, 2021 •

edited

Loading

mhenrixon Aug 3, 2021 •

edited

Loading

Intrepidd commented Sep 1, 2021 •

edited

Loading

Intrepidd commented Sep 27, 2021 •

edited

Loading