
Add support for downloading upgrade artifacts async from checkin/action workflow #1706

Closed
Tracked by #1666
joshdover opened this issue Nov 10, 2022 · 4 comments · Fixed by #2205
Labels: 8.7-candidate, bug, Team:Elastic-Agent, Team:Elastic-Agent-Control-Plane

Comments

@joshdover
Contributor

We've observed some machines on slow connections downloading the upgrade artifact at speeds as low as 50Kb/s. Our largest artifact, for Linux, is around 370MB. Because of this, upgrades can fail repeatedly, even when the download would complete if given enough time.

We cannot simply increase the download timeout (currently 10 minutes), because while it processes the upgrade action, Agent stops checking in with Fleet Server and does not receive other actions or policy updates.

To avoid this, we should split the upgrade sequence into two steps: download the agent binary while Agent continues to check in and receive updates, then resume the upgrade once the download is complete.

Blocks:

@joshdover
Contributor Author

cc @cmacknz @pierrehilbert as discussed earlier this week. Dropping into the next sprint, but feel free to move it.

@amitkanfer
Contributor

@jlind23 marking this as P1

jlind23 added the Team:Elastic-Agent-Control-Plane and Team:Elastic-Agent labels Dec 16, 2022
@cmacknz
Member

cmacknz commented Jan 5, 2023

@blakerouse did we address this as part of the changes in V2? The notes on #2410 suggest we have.

@blakerouse
Contributor

@cmacknz No, I do not believe so. The Fleet gateway code still uses a single loop to run actions, so while the upgrade action is running, other actions (like checkin) are blocked.

leehinman added a commit to leehinman/elastic-agent that referenced this issue Jan 27, 2023
- handle upgrade in separate go routine
- increase download timeout to 2 hours
- add KeepAliveSettings
- set idle connection timeout to 30 sec

Closes elastic#1706
Duplicate elastic#1666
leehinman added further commits to leehinman/elastic-agent that referenced this issue on Feb 7, Mar 7, Mar 8, Mar 9, Mar 10 (twice), and Mar 16, 2023, each with the same commit message as above.
leehinman added a commit that referenced this issue Mar 21, 2023
- handle upgrade in separate go routine
- increase download timeout to 2 hours
- add KeepAliveSettings
- set idle connection timeout to 30 sec
- skip duplicate upgrade requests
- cancel current upgrade if upgrade comes in for a newer version, proceed with newer upgrade
- added unit tests
- prevent multiple upgrades from command line

Closes #1706
Duplicate #1666