[E2E] Agent recovers gracefully when upgrade is interrupted unexpectedly #5217

Open
ycombinator opened this issue Jul 29, 2024 · 2 comments
Labels
Team:Elastic-Agent-Control-Plane, Testing

Comments

@ycombinator
Contributor

ycombinator commented Jul 29, 2024

Extracted from #2176:

Test that the agent reliably restarts with the current version when an upgrade to a new version is interrupted unexpectedly. For example, restart the Agent while it is downloading, or right before it switches the symlink, and ensure that the pre-upgrade Agent version is the one that comes up.
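The "ensure that the pre-upgrade Agent version is the one that comes up" step could be sketched as a small polling helper. This is only an illustration, not code from the repository: `waitForVersion` and the `getVersion` callback are hypothetical names, and how the test actually queries the running agent's version is left to the caller.

```go
package main

import (
	"fmt"
	"time"
)

// waitForVersion polls getVersion until it reports want, or gives up after
// the given number of attempts. getVersion is a hypothetical stand-in for
// however the test queries the version of the running agent.
func waitForVersion(getVersion func() (string, error), want string, attempts int, interval time.Duration) error {
	var last string
	var lastErr error
	for i := 0; i < attempts; i++ {
		last, lastErr = getVersion()
		if lastErr == nil && last == want {
			return nil // the expected (pre-upgrade) version is running
		}
		time.Sleep(interval)
	}
	if lastErr != nil {
		return fmt.Errorf("agent never reported version %q, last error: %w", want, lastErr)
	}
	return fmt.Errorf("agent reported version %q, want %q", last, want)
}
```

A test would interrupt the upgrade, restart the agent, and then call `waitForVersion` with the pre-upgrade version as `want`. Polling is used because the agent may take some time to come back up after the interruption.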

On Linux there is very likely a window here where failure is not recoverable; see the analysis in #4834.

The reason it is interesting that release() does not make a system call is that the watcher is then still a child of the agent process, meaning it dies when the agent dies. On Unix this cannot be changed without a system call. This leaves a window in which the agent is dead and the watcher is also dead, so nothing is able to roll back. I think we may need to use something like https://github.com/sevlyar/go-daemon?tab=readme-ov-file#how-it-works to fix this. I suspect our existing test for this case isn't precise enough: the watcher from the agent starting the upgrade would need to be killed, and the agent we upgraded to would have to exit before it launched the watcher. This is not how it works today. I don't think this PR makes anything any worse, but it doesn't fix this problem.
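To illustrate the point that detaching a child on Unix requires a system call, here is a minimal sketch that asks Go's `os/exec` to start the child in a new session via `setsid(2)`. It is not the go-daemon approach linked above and is not taken from the agent's code: `detachedCommand` is a hypothetical helper, and whether a new session alone is enough to keep a watcher alive (for example under a service manager) is not established here.

```go
package main

import (
	"os/exec"
	"syscall"
)

// detachedCommand builds a command whose child process calls setsid(2)
// before exec, so it starts in its own session and process group rather
// than in the parent's. Unix-only: the Setsid field does not exist on
// Windows. The helper name is hypothetical.
func detachedCommand(name string, args ...string) *exec.Cmd {
	cmd := exec.Command(name, args...)
	cmd.SysProcAttr = &syscall.SysProcAttr{Setsid: true}
	return cmd
}
```

The caller would still run `cmd.Start()` and decide what to do with the resulting process handle. The sketch only shows where the system call enters the picture.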

@cmacknz
Member

cmacknz commented Jul 30, 2024

Updated the description. I think there is an actual bug here on Linux that our tests aren't finding.

cmacknz added the Team:Elastic-Agent-Control-Plane label Jul 30, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent-control-plane (Team:Elastic-Agent-Control-Plane)
