[cuebot] Fix the issue with auto-retrying killed frames (#1444)
Auto-retry was not working as expected for frames killed automatically
by the OOM-kill logic. RQD is currently not able to report exit_signal=9
when a frame is killed by the OOM logic. The current solution sets
exitStatus to Dispatcher.EXIT_STATUS_MEMORY_FAILURE before killing the
frame. This enables auto-retrying the affected frames when they report
with a frameCompleteReport.
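
The override described above can be sketched in isolation. This is a minimal illustration, not the cuebot implementation: the class name, the `resolveExitStatus` helper, and the constant's numeric value are assumptions made for the example. Only the comparison against `Dispatcher.EXIT_STATUS_MEMORY_FAILURE` comes from the commit.

```java
class ExitStatusOverrideSketch {
    // Placeholder standing in for Dispatcher.EXIT_STATUS_MEMORY_FAILURE.
    // The numeric value is an assumption; the commit does not show it.
    static final int EXIT_STATUS_MEMORY_FAILURE = 33;

    /**
     * Picks the exit status to record when a frame complete report arrives.
     *
     * @param storedExitStatus   status already recorded for the frame
     *                           (frameDetail.exitStatus in the commit)
     * @param reportedExitStatus status carried by the report
     *                           (report.getExitStatus() in the commit)
     */
    static int resolveExitStatus(int storedExitStatus, int reportedExitStatus) {
        // If the OOM-kill logic already marked the frame, keep that mark so
        // the incoming report cannot overwrite it.
        if (storedExitStatus == EXIT_STATUS_MEMORY_FAILURE) {
            return storedExitStatus;
        }
        return reportedExitStatus;
    }

    public static void main(String[] args) {
        // Frame was marked by the OOM logic; the report claims exit status 1.
        System.out.println(resolveExitStatus(EXIT_STATUS_MEMORY_FAILURE, 1)); // prints 33
        // Frame was not marked; the reported status is used unchanged.
        System.out.println(resolveExitStatus(0, 1)); // prints 1
    }
}
```

Keeping the decision in one small function makes the precedence explicit: a memory-failure mark set before the kill always wins over whatever the report carries.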
DiegoTavares authored Aug 29, 2024
1 parent 87394e6 commit d5da19f
Showing 1 changed file with 12 additions and 1 deletion.
@@ -153,7 +153,18 @@ public void handleFrameCompleteReport(final FrameCompleteReport report) {
                 final String key = proc.getJobId() + "_" + report.getFrame().getLayerId() +
                         "_" + report.getFrame().getFrameId();
 
-                if (dispatchSupport.stopFrame(frame, newFrameState, report.getExitStatus(),
+                // rqd is currently not able to report exit_signal=9 when a frame is killed by
+                // the OOM logic. The current solution sets exitStatus to
+                // Dispatcher.EXIT_STATUS_MEMORY_FAILURE before killing the frame, this enables
+                // auto-retrying frames affected by the logic when they report with a
+                // frameCompleteReport. This status retouch ensures a frame complete report is
+                // not able to override what has been set by the previous logic.
+                int exitStatus = report.getExitStatus();
+                if (frameDetail.exitStatus == Dispatcher.EXIT_STATUS_MEMORY_FAILURE) {
+                    exitStatus = frameDetail.exitStatus;
+                }
+
+                if (dispatchSupport.stopFrame(frame, newFrameState, exitStatus,
                         report.getFrame().getMaxRss())) {
                     if (dispatcher.isTestMode()) {
                         // Database modifications on a threadpool cannot be captured by the test thread
