
Unreleased locks when Resque terminates using the loner option #26

Closed
tjsousa opened this issue Jan 22, 2015 · 5 comments


tjsousa commented Jan 22, 2015

Hi guys,

I've found an odd behaviour when using the loner option: unreleased locks (with no timeout) can be left stuck in Redis when a process is terminated while Resque is enqueueing a job (a Heroku deploy environment, in our case).

After taking a look at the code, I believe the issue is that the lock is acquired in a before_enqueue hook. During a process termination, that hook can complete successfully while the actual Resque enqueue operation does not (the two are not transactional, as far as I know). After that, no process can enqueue a similar job, because the lock is already in Redis, and subsequent jobs of the same kind are inhibited forever.

I was wondering whether this could be handled with a two-phase process, where an after_enqueue hook acquires the lock only after the actual job enqueue operation has completed.
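To make the suggestion concrete, here is a minimal sketch of the two-phase idea, using an in-memory lock store in place of Redis. All names here (`LockStore`, the hook method bodies) are illustrative, not the plugin's actual API: the point is that the before_enqueue hook only *checks* the lock, and the write happens after the enqueue succeeds, so a crash between the two phases leaves no stale key behind.

```ruby
require 'set'

# Stand-in for the Redis keys the loner option would use.
class LockStore
  def initialize
    @locks = Set.new
  end

  # Phase 1 (before_enqueue): only check, do not write. If the process
  # dies after this point, nothing is left in the store.
  def enqueueable?(key)
    !@locks.include?(key)
  end

  # Phase 2 (after_enqueue): acquire the lock once Resque has actually
  # pushed the job onto the queue.
  def acquire(key)
    @locks.add(key)
  end

  def release(key)
    @locks.delete(key)
  end
end

store = LockStore.new
key = 'loner:MyJob:[42]'

if store.enqueueable?(key)    # before_enqueue hook
  # Resque.enqueue(MyJob, 42) would run here
  store.acquire(key)          # after_enqueue hook
end
```

The trade-off is a small race window: two producers can both pass the phase-1 check before either reaches phase 2, so a duplicate job becomes possible where a stuck lock was possible before.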

lantins (Owner) commented Jan 22, 2015

Hey @tjsousa

I don't use loner myself, so it's difficult for me to judge how this should play out.

When the process is terminated and the lock gets 'stuck', do you know HOW it's killed?

I'd love to see a PR on how this could be fixed :)
Or we can discuss it some more here and figure out a way together.

edjames commented Jul 6, 2015

Hi

See my reply to another issue, which describes pretty much the same behaviour as here: #17

Hope that helps.

tjsousa (Author) commented Jul 6, 2015

Thank you @edjames !

Although I have worked around this problem, it still persists. Looking at #17, my particular scenario does not use a hash as a parameter, so I'm not sure the two share a common cause.

@lantins Answering your question: I know it happens when our app running on Heroku sends KILL signals to our Resque processes, although I wasn't able to pinpoint the exact cause (it only happens sometimes).

lantins (Owner) commented Aug 4, 2015

This reminds me of this issue: lantins/resque-retry#61

tjsousa (Author) commented Sep 7, 2015

@lantins, thanks to that link I was finally able to dedicate some time to replicating the issue and, in fact, confirmed that the lock is not removed in the case of a dirty exit (e.g. a SIGKILL sent to the child job).

Using a similar approach, we can do the lock cleanup from the worker process through an on_failure hook.
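A rough sketch of that cleanup, in the spirit of how resque-retry handles its retry keys. The key format and method names below are hypothetical, and a plain hash stands in for Redis; the real plugin would derive the lock key from the job class and its arguments. The premise (which matches what was observed above) is that on a dirty exit the parent worker records the failure and runs the job's on_failure hooks, so the lock can be released from the worker side even though the enqueue-side release never ran.

```ruby
# Stand-in for the Redis connection.
FAKE_REDIS = {}

module LonerCleanup
  # Hypothetical key derivation: the real plugin computes this from the
  # job class name and its arguments.
  def loner_lock_key(*args)
    "loner:lock:#{name}:#{args.inspect}"
  end

  # on_failure hooks run in the worker when the job fails, including
  # (via Resque's DirtyExit handling) when the child is killed, so the
  # stale lock is removed here.
  def on_failure_release_loner_lock(_exception, *args)
    FAKE_REDIS.delete(loner_lock_key(*args))
  end
end

class MyJob
  extend LonerCleanup
  @queue = :default

  def self.perform(*args)
    raise 'boom'
  end
end

# Simulate: lock set at enqueue time, job dies, hook cleans up.
key = MyJob.loner_lock_key(42)
FAKE_REDIS[key] = true
MyJob.on_failure_release_loner_lock(RuntimeError.new('boom'), 42)
```

One caveat worth checking before relying on this: the hook only fires if the *worker* survives to report the failure; if the whole dyno is killed at once, a lock timeout is still the safety net.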
