A test run with cloud output is not resilient to an external restart of the operator's pod. This happens mainly because the controller does not store its full state during cloud output execution. When the operator is restarted by an external actor, the controller's flow may be broken for any test run; for a test run with cloud output specifically, the test run may be started but never finalized.
More precisely, since f08da61, `FinishJobs` always finalizes by timeout, regardless of the state of the runner pods. But if the operator's pod is restarted, the test run ID is lost and the test can no longer be finalized. The full solution for such cases is to store the test run ID independently of the pod lifecycle, i.e. externally. Additionally, `FinishJobs` relies on the `cloud.InspectOutput.TotalDuration` field, which would also be lost on a restart.
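To make the proposal concrete, here is a minimal sketch of what storing this state externally could look like. It is not the operator's actual code: `CloudRunState`, `StateStore`, `memStore`, `SaveState`, `LoadState`, and `TimedOut` are all hypothetical names, and the in-memory store only stands in for whatever storage outlives the pod (for example, the custom resource's status). The sketch assumes the test run ID and the total duration are the only values that need to survive a restart.

```go
package main

import (
	"encoding/json"
	"errors"
	"time"
)

// CloudRunState is a hypothetical record of what the controller would need
// in order to finalize a cloud test run after the operator pod restarts.
type CloudRunState struct {
	TestRunID     string        `json:"testRunId"`
	TotalDuration time.Duration `json:"totalDuration"` // stored as nanoseconds
	StartedAt     time.Time     `json:"startedAt"`
}

// StateStore is a hypothetical abstraction over storage that outlives the
// operator pod. It is an assumption of this sketch, not an existing API.
type StateStore interface {
	Save(key string, data []byte) error
	Load(key string) ([]byte, error)
}

// ErrNotFound is returned when no state has been saved under a key.
var ErrNotFound = errors.New("state not found")

// memStore is an in-memory stand-in used only to exercise the sketch.
// A real implementation would have to write to external storage.
type memStore struct{ data map[string][]byte }

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

func (m *memStore) Save(key string, data []byte) error {
	m.data[key] = data
	return nil
}

func (m *memStore) Load(key string) ([]byte, error) {
	d, ok := m.data[key]
	if !ok {
		return nil, ErrNotFound
	}
	return d, nil
}

// SaveState serializes the state as JSON and writes it to the store.
func SaveState(s StateStore, key string, st CloudRunState) error {
	b, err := json.Marshal(st)
	if err != nil {
		return err
	}
	return s.Save(key, b)
}

// LoadState reads the state back, e.g. on the first reconcile after a restart.
func LoadState(s StateStore, key string) (CloudRunState, error) {
	var st CloudRunState
	b, err := s.Load(key)
	if err != nil {
		return st, err
	}
	err = json.Unmarshal(b, &st)
	return st, err
}

// TimedOut reports whether the run has reached StartedAt + TotalDuration,
// which is the point at which a timeout-based finalizer could act.
func (st CloudRunState) TimedOut(now time.Time) bool {
	return !now.Before(st.StartedAt.Add(st.TotalDuration))
}
```

With something along these lines, a restarted controller could call `LoadState` for the test run it is reconciling, recover the test run ID, and use `TimedOut` to decide whether finalization is due, instead of depending on values held only in memory.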