Make cloud output test runs resilient to operator's restarts #108

Closed
yorugac opened this issue May 13, 2022 · 1 comment
Labels
bug (Something isn't working), FSM (Connected to FSM), Type: Improvement

Comments

@yorugac
Collaborator

yorugac commented May 13, 2022

A test run with cloud output is not resilient to an external restart of the operator's pod. This happens mainly because the controller does not store its full state during cloud output execution. When the operator is restarted by an external actor, the controller's flow may break for any test run; for a test run with cloud output specifically, the test run may be started but never finalized.

More precisely, since f08da61, FinishJobs always finalizes by timeout, regardless of the state of the runner pods. But if the operator's pod is restarted, the test run ID is lost and the test can no longer be finalized. The full solution for such cases is to store the test run ID independently of the pod lifecycle, i.e. externally. Additionally, FinishJobs relies on the cloud.InspectOutput.TotalDuration field, which would also be lost in a restart.

yorugac added the bug and Type: Improvement labels on May 13, 2022
yorugac changed the title from "Make cloud output test runs resilient to restarts" to "Make cloud output test runs resilient to operator's restarts" on May 13, 2022
yorugac added the FSM label on Jul 15, 2022
@yorugac
Collaborator Author

yorugac commented Oct 13, 2023

This was resolved as part of #138.

yorugac closed this as completed on Oct 13, 2023