Fix handling completed job with expired result when work horse dies #2154

fancyweb · 2024-11-22T18:18:57Z

Since #2039, Job.get_status() doesn't return None but raises InvalidJobOperation when refresh=True.

This change was not handled properly in Worker.monitor_work_horse().

I stumbled upon this with a use-case where a job with result_ttl=0 succeeds but the work horse doesn't exit with 0 (because it's wrapped). Job.get_status() ends up being called and raises an uncaught error.
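To make the failure mode concrete, here is a minimal sketch of the pattern. It assumes nothing about rq's internals: `InvalidJobOperation` is redefined locally as a stand-in, and `FakeJob` and `status_or_none` are hypothetical names used only for illustration. It shows a caller that used to rely on a `None` return and now has to catch the exception.

```python
class InvalidJobOperation(Exception):
    """Local stand-in for the exception named above; not imported from rq."""


class FakeJob:
    """Hypothetical job whose stored data may already have expired."""

    def __init__(self, status, expired=False):
        self._status = status
        self._expired = expired

    def get_status(self, refresh=True):
        # Mimics the behavior described above: refreshing a job whose
        # data is gone raises instead of returning None.
        if refresh and self._expired:
            raise InvalidJobOperation("job data no longer exists")
        return self._status


def status_or_none(job):
    """Return the job status, or None if the job data has expired."""
    try:
        return job.get_status(refresh=True)
    except InvalidJobOperation:
        # Assumption: an expired job is treated as "nothing to report".
        return None
```

A caller that previously checked `if job_status is None` could keep that check by going through a helper like `status_or_none`; whether the actual change does this or handles the exception inline is not shown here.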

There's no existing test for the behavior and I wasn't able to produce one.

codecov · 2024-11-22T18:29:10Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.46%. Comparing base (2de9491) to head (b82912d).
Report is 106 commits behind head on master.

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2154      +/-   ##
==========================================
- Coverage   93.61%   93.46%   -0.16%     
==========================================
  Files          28       30       +2     
  Lines        3760     4114     +354     
==========================================
+ Hits         3520     3845     +325     
- Misses        240      269      +29


fancyweb · 2024-11-22T19:17:42Z

Ref DataDog/dd-trace-py#11512

Here's the scenario as I understood it:

A job with result_ttl=0 is enqueued
This job is executed and completed
DataDog's Worker.perform_job() wrapper calls get_result(), raises the new error, and makes the horse process fail
rq calls get_result() again and raises the new error too 😅
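The sequence above can be simulated without Redis or rq. In this rough sketch every name is hypothetical: a plain dict stands in for Redis, `wrapped_perform_job` stands in for the third-party wrapper, and `monitor` stands in for the parent worker's check. The lookup is modeled as a status read that raises once the job data is gone, which is an assumption based on the PR description.

```python
class InvalidJobOperation(Exception):
    """Local stand-in for the exception discussed in this PR."""


store = {}  # hypothetical stand-in for Redis: job_id -> status


def get_status(job_id):
    # Raises when the job data has already been deleted.
    if job_id not in store:
        raise InvalidJobOperation(f"no such job: {job_id}")
    return store[job_id]


def perform_job(job_id, result_ttl):
    store[job_id] = "finished"
    if result_ttl == 0:
        # A TTL of 0 means the job data is dropped right after success.
        del store[job_id]


def wrapped_perform_job(job_id, result_ttl):
    perform_job(job_id, result_ttl)
    get_status(job_id)  # the wrapper inspects the job after it ran


def run_horse(job_id, result_ttl):
    """Return the simulated exit code of the work horse."""
    try:
        wrapped_perform_job(job_id, result_ttl)
        return 0
    except InvalidJobOperation:
        return 1


def monitor(job_id, exit_code):
    """Parent-side check after the horse exits."""
    if exit_code == 0:
        return "ok"
    try:
        status = get_status(job_id)
    except InvalidJobOperation:
        # Without this handler the second lookup would crash the parent.
        return "job data expired"
    return f"horse died while job was {status}"
```

Running `monitor("j1", run_horse("j1", result_ttl=0))` walks through all four steps; removing the `try`/`except` in `monitor` reproduces the uncaught error in the parent.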

selwin · 2024-11-28T12:40:44Z

Mind adding a test for this?

fancyweb · 2024-11-28T13:31:48Z

@selwin After looking at it again, I was able to add a test demonstrating the issue.

selwin · 2024-11-30T12:30:16Z

@fancyweb do you mind checking why the tests failed on Python 3.9?

fancyweb force-pushed the fix/get-status-raises branch from 17d8e7c to 4a344b0 Compare November 22, 2024 18:23

fancyweb mentioned this pull request Nov 22, 2024

fix(rq): handle new Job.get_status() exception DataDog/dd-trace-py#11512

Open

2 tasks

fancyweb force-pushed the fix/get-status-raises branch from 4a344b0 to 0975c00 Compare November 28, 2024 13:28

fancyweb force-pushed the fix/get-status-raises branch from 0975c00 to 1c7b3da Compare November 28, 2024 14:57

fancyweb force-pushed the fix/get-status-raises branch from 1c7b3da to 5e6baa9 Compare December 6, 2024 15:56

Fix handling completed job with expired result when work horse dies

b82912d

fancyweb force-pushed the fix/get-status-raises branch from 5e6baa9 to b82912d Compare December 6, 2024 16:01
