Fix handling completed job with expired result when work horse dies #2154

Open: fancyweb wants to merge 1 commit into master from fix/get-status-raises


@fancyweb fancyweb commented Nov 22, 2024

Since #2039, Job.get_status() doesn't return None but raises InvalidJobOperation when refresh=True.

This change was not handled properly in Worker.monitor_work_horse().

I ran into this in a use case where a job with result_ttl=0 succeeds but the work horse exits with a non-zero code (because Worker.perform_job() is wrapped). Worker.monitor_work_horse() then calls Job.get_status(), which raises an uncaught InvalidJobOperation.

There's no existing test for the behavior and I wasn't able to produce one.
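
To illustrate the gap described above, here is a minimal sketch that does not import rq. `FakeJob`, `status_or_none`, and `should_handle_failure` are hypothetical names introduced for illustration, and the local `InvalidJobOperation` class is a stand-in for the exception named in the description. The status check is a simplified assumption about what a caller like Worker.monitor_work_horse() needs, not the code in this PR.

```python
class InvalidJobOperation(Exception):
    """Stand-in for the InvalidJobOperation error named in the description."""


class FakeJob:
    """Hypothetical stand-in for a job; not rq's Job implementation."""

    def __init__(self, status=None, expired=False):
        self._status = status
        self._expired = expired

    def get_status(self, refresh=True):
        # Models the behavior described since #2039: raise instead of
        # returning None when the job data is gone and refresh=True.
        if refresh and self._expired:
            raise InvalidJobOperation("job data is no longer available")
        return self._status


def status_or_none(job):
    """Map the new exception back to the old 'None means expired' contract."""
    try:
        return job.get_status()
    except InvalidJobOperation:
        return None


def should_handle_failure(job):
    """Assumed simplification: only a job that still exists and is neither
    finished nor failed needs failure handling after the work horse dies."""
    return status_or_none(job) not in (None, "finished", "failed")
```

With this sketch, a completed job whose data has already expired is skipped rather than raising, while a job still marked "started" is treated as a failure.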

fancyweb force-pushed the fix/get-status-raises branch from 17d8e7c to 4a344b0 on November 22, 2024, 18:23

codecov bot commented Nov 22, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.46%. Comparing base (2de9491) to head (b82912d).
Report is 106 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2154      +/-   ##
==========================================
- Coverage   93.61%   93.46%   -0.16%     
==========================================
  Files          28       30       +2     
  Lines        3760     4114     +354     
==========================================
+ Hits         3520     3845     +325     
- Misses        240      269      +29     


@fancyweb
Author

Ref DataDog/dd-trace-py#11512

Here's the scenario as I understood it:

  1. A job with result_ttl=0 is enqueued
  2. This job is executed and completed
  3. DataDog's Worker.perform_job() wrapper calls get_status(), which raises the new error and makes the work horse process fail
  4. rq calls get_status() again, which raises the new error too 😅
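
The four steps above can be replayed with a small in-memory model. Everything here is an assumption made for illustration: `store` replaces Redis, `perform_job` and `wrapped_perform_job` are hypothetical stand-ins for the worker method and the tracing wrapper, and deleting the job as soon as it finishes is a simplified model of result_ttl=0. None of this is rq or dd-trace-py code.

```python
class InvalidJobOperation(Exception):
    """Stand-in for the error raised when a job's data cannot be found."""


store = {}  # hypothetical in-memory replacement for Redis


def get_status(job_id):
    """Return the stored status, or raise if the job data is gone."""
    try:
        return store[job_id]["status"]
    except KeyError:
        raise InvalidJobOperation(f"no such job: {job_id}") from None


def perform_job(job_id, result_ttl):
    """Complete the job; with result_ttl=0 its data is dropped right away."""
    store[job_id]["status"] = "finished"
    if result_ttl == 0:
        del store[job_id]


def wrapped_perform_job(job_id, result_ttl):
    """Models a wrapper that reads the status after the job has run."""
    perform_job(job_id, result_ttl)
    return get_status(job_id)  # step 3: raises for an expired job


def run_scenario():
    events = []
    store["job-1"] = {"status": "queued"}  # step 1: enqueue
    try:
        wrapped_perform_job("job-1", result_ttl=0)  # steps 2 and 3
        exit_code = 0
    except InvalidJobOperation:
        events.append("work horse: InvalidJobOperation")
        exit_code = 1  # the work horse dies with a non-zero code
    if exit_code != 0:
        try:
            get_status("job-1")  # step 4: the parent worker checks the job
        except InvalidJobOperation:
            events.append("worker: InvalidJobOperation")
    return exit_code, events
```

Calling `run_scenario()` returns a non-zero exit code together with two recorded errors, one from the wrapper inside the work horse and one from the parent worker's follow-up check.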

@selwin
Collaborator

selwin commented Nov 28, 2024

Mind adding a test for this?

fancyweb force-pushed the fix/get-status-raises branch from 4a344b0 to 0975c00 on November 28, 2024, 13:28
@fancyweb
Author

@selwin After looking at it again, I was able to add a test demonstrating the issue.

fancyweb force-pushed the fix/get-status-raises branch from 0975c00 to 1c7b3da on November 28, 2024, 14:57
@selwin
Collaborator

selwin commented Nov 30, 2024

@fancyweb do you mind checking why the tests failed on Python 3.9?

fancyweb force-pushed the fix/get-status-raises branch from 1c7b3da to 5e6baa9 on December 6, 2024, 15:56
fancyweb force-pushed the fix/get-status-raises branch from 5e6baa9 to b82912d on December 6, 2024, 16:01