
[5.4] Redis Queue: Fix Job having timeout set different from the one of the queue #18919

Closed
wants to merge 8 commits

Conversation

Belphemur
Contributor

In the previous version, Laravel added the ability for each job to set its own timeout independently of the queue timeout.

This feature hasn't been implemented in the Redis queue, and using it leads to job duplication.

Scenario

Queue timeout: 60 seconds
Job Timeout: 300 Seconds
Workers on queue: 2

Current way the queue works

Before taking a job from the Redis queue, the worker migrates expired jobs from queue:reserved. This check is done by verifying whether any score in the reserved queue is lower than the current timestamp; any job meeting this condition is put back in the working queue.

After the migration, the worker takes the job, puts it in the reserved queue, and sets its score to NOW() + the timeout of the queue.
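
To make the scoring concrete, here is a rough sketch of that reservation step (illustrative PHP using the phpredis extension; $retryAfter stands in for the queue-level timeout, and the key names are assumptions, not the framework's actual code):

// Sketch only; assumes the phpredis extension and a local Redis server.
$redis = new Redis();
$redis->connect('127.0.0.1', 6379);

$retryAfter = 60; // the queue-level timeout from the connection config

$job = $redis->lPop('queues:default');

if ($job !== false) {
    $payload = json_decode($job, true);
    $payload['attempts'] = ($payload['attempts'] ?? 0) + 1;

    // Score = NOW() + queue timeout; the job's own timeout is ignored here.
    $redis->zAdd('queues:default:reserved', time() + $retryAfter, json_encode($payload));
}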

Problem

The job with a timeout of 300 is taken by a worker with a timeout of 60, so its score in the reserved queue will be NOW() + 60 sec. The second worker comes along; the job hasn't finished yet (by its 300-second timeout) but has passed the queue timeout, so the second worker takes it as well. The job is then duplicated in the queue.

Solution

This PR checks whether a job has its own timeout set when popping it from the queue and putting it into the reserved queue. If it does, as per the documentation, that timeout takes precedence over the queue's timeout.
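
Conceptually, the change amounts to something like the following (continuing the sketch above, and assuming the job payload carries a timeout field; this is not the exact diff):

// Continuing the sketch above: prefer the job's own timeout, when present,
// over the queue-level $retryAfter when computing the reserved score.
$payload = json_decode($job, true);
$timeout = $payload['timeout'] ?? $retryAfter;

$redis->zAdd('queues:default:reserved', time() + $timeout, json_encode($payload));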

Antoine Aflalo added 2 commits April 24, 2017 11:15
Since a job can have a timeout set that is different from the queue's, we can't rely only on the queue timeout to set the score of the job in the reserved queue. The score needs to reflect the real timeout of the job.
@m1guelpf
Contributor

Seems to be related to #16257

@Belphemur
Contributor Author

@m1guelpf yes, it's directly related to it. I can't finish the Redis part correctly; I keep getting a weird NIL error.

Maybe @taylorotwell would have a better idea how to resolve this.

@m1guelpf
Contributor

m1guelpf commented Apr 24, 2017

@Belphemur Could you fix the tests and add a new one to prevent things like this from happening in the future?

Test working
@Belphemur
Contributor Author

Belphemur commented Apr 24, 2017

@m1guelpf I finally got the Lua part right and added a new test to prove it works.


if(job ~= false) then
-- Increment the attempt count and place job on the reserved queue...
reserved = cjson.decode(job)
@tillkruss
Contributor


Can you make sure you don't have extra whitespace at the line endings?

@Belphemur
Contributor Author

@tillkruss done, thanks for spotting it

@themsaid
Member

themsaid commented Apr 24, 2017

Quoting the docs:

The --timeout value should always be at least several seconds shorter than your retry_after configuration value. This will ensure that a worker processing a given job is always killed before the job is retried. If your --timeout option is longer than your retry_after configuration value, your jobs may be processed twice.

The behaviour you described is not Redis-specific: with any queue driver, your jobs will be processed multiple times if the timeout value is greater than the retry_after value.
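
For reference, the two values being compared look roughly like this (illustrative values only, not taken from this thread):

// config/queue.php, Redis connection (illustrative values)
'redis' => [
    'driver'      => 'redis',
    'connection'  => 'default',
    'queue'       => 'default',
    'retry_after' => 90, // keep this above any job or worker timeout
],

// Worker started with a shorter timeout, for example:
// php artisan queue:work redis --timeout=60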

@Belphemur
Contributor Author

Belphemur commented Apr 24, 2017

I don't understand its usage. It feels like we then have two different ways to set a timeout for a job: only one of them will kill it, and the other will duplicate it.

Do you have an example of a job you'd like to duplicate because it hasn't finished in the given amount of time instead of killing it and restarting it?

Edit: I re-read the upgrade guide; the rename and its explanation are present in 5.3, so I removed that part of the comment.

@themsaid
Member

@Belphemur that configuration has been there since the beginning, but it was renamed to retry_after in the 5.3 release, and that change is documented.

@Belphemur
Contributor Author

@themsaid I apologise, I edited my comment. With the timeout that can be set on the job, retry_after gets a little dangerous: it's easy to forget it and set a job timeout that is much bigger than retry_after.

@taylorotwell
Member

Closing since this is not a bug, as explained by @themsaid. It's true that your retry_after must always be longer than your longest timeout.
