-
Notifications
You must be signed in to change notification settings - Fork 1
Conversation
dd9dbae
to
3fccfd3
Compare
1b03b29
to
3fccfd3
Compare
3a6d8cb
to
939ae3e
Compare
…ed for some work in the guard loop. Also removes the double duty of timer as a timer and a status, status is now a dedicated enum
939ae3e
to
b33b4a5
Compare
…n in the case of exception
b33b4a5
to
b58fc7d
Compare
@paperless-ngx/backend when folks have time, this would be nice to have reviewed and merged. |
@stumpylog this and the other PRs seem great, Im a little worried that you are the only person at the moment who knows the code well enough to have a meaningful review. Im not the right person for that but Im not even sure how best to test this either, I could at least do that. |
I'm helped by the original having some pretty decent documentation of how the pieces all fit together: https://django-q.readthedocs.io/en/latest/architecture.html. Plus rightly or not, I'm trusting the original suite of tests to be pretty good and will catch it if something majorly breaks. Another eventual option is ditching django-q for something like celery or rq which is maintained. That's a major effort though. Good point about testing. I'm not sure where I could document this, but I do:
|
Cool thanks. Well then, at least in my testing this works a-ok, I cant actually get it to timeout (set |
I used a setting of 30 with this pdf: https://health.mo.gov/lab/pdf/PublicWaterMassMailing.pdf. It's scanned with no digital layer, so it takes a while, at least on my machine. |
Weird, im now getting
|
Weird, I'm not sure how I never ran into that, but I believe it is fixed now. |
Sweet, that fixed that weird one. I still cant get it to timeout, this is with
or an even longer one:
|
Timeout of 300 or 30? It looks like the second document finished in about 3 minutes, so that would align with a 300s timeout not triggering. |
Doh, stupid mistake sorry that setting is seconds, not ms. Ok progress! But Im a bit confused because after that it still consumed:
|
Ah, very interesting. Looks like the timeout happened around here, then thanks to this TODO catching everything, it was just thought of as a pdfminer exception, and onward goes the consumption. Hm, I'll have to think about that. |
…ss likely to catch it
I'll have to test this, but I believe it will work for even crazy low timeouts as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bravo again
This pull request updates how django-q handles its timeout checking to be more effecient, while making the status of a worker more clear.
Previous Architecture
Previously, the Sentinel guard loop decremented a shared
Value
, then terminated the process if the shared value reached 0, as this value indicated a timeout. Somewhat confusingly, this same timer value was used to reflect worker status, where certain magic numbers indicated the worker was idle (-1) or had ceased working due to reaching a limit for recycling (-2).New Architecture
Now, these have been split up and the worker is responsible for setting its own status, while also handling its own timeout.
Worker Status
Instead of magic numbers, a
IntEnum
is utilized alongside aValue
. This is still shared between the Sentinel and the worker, but thanks to the enum, it is more clear what statuses are possible and what they mean. The value is anI
or unsigned integer, which is why anIntEnum
was chosen.Worker Timeout
Directly before beginning the task given to it, the worker utilizes
signal.alarm
to schedule an exception to be raised in a given number of seconds. If the work completes in time, the alarm is canceled. Otherwise, the task fails and the worker status will be set to indicate a timeout, causing the sentinel to reincarnate the worker.In the case of no timeout, the alarm is passed a value of 0, which causes no alarm to be scheduled, ending up with infinite time to do the work.
As mentioned in the file, this technique is borrowed from rq and seems pretty straightforward. After testing with paperless, this is actually quite useful, as the timeout exception makes it much more clear a task has failed due to timing out. However, if a task catches
Exception
orBaseException
and doesn't re-raise something, a worker could keep trying to work forever. Nothing in a paperless async task should do that though, so these changes are alright for us.