Add support for downloading upgrade artifacts async from checkin/action workflow #1706

joshdover · 2022-11-10T18:51:49Z

We've observed some machines on slow connections downloading the upgrade artifact at speeds as low as 50Kb/s. Our largest artifact is the Linux one, at around 370MB. Because of this, upgrades can fail repeatedly, even when the download would complete if given enough time.

We cannot simply increase the download timeout (currently 10 minutes) because, while processing the upgrade action, Agent stops checking in with Fleet Server and does not receive other types of actions or policy updates.

We should break the upgrade sequence into two steps to avoid this problem: download the agent binary while the agent continues to check in and receive updates, then resume the upgrade once the download is complete.

Blocks:

Increase upgrade download timeout to 2 hours #1666

joshdover · 2022-11-10T18:52:34Z

cc @cmacknz @pierrehilbert as discussed earlier this week. Dropping into the next sprint, but feel free to move.

amitkanfer · 2022-11-28T19:30:22Z

@jlind23 marking this as P1

cmacknz · 2023-01-05T22:24:31Z

@blakerouse did we address this as part of the changes in V2? The notes on #2410 suggest we have.

blakerouse · 2023-01-09T15:15:16Z

@cmacknz No, I do not believe so. The Fleet gateway code still uses a single loop to run actions, so the upgrade action will run, and while it's running other actions (like checkin) will be blocked.

- handle upgrade in separate go routine
- increase download timeout to 2 hours
- add KeepAliveSettings
- set idle connection timeout to 30 sec

Closes elastic#1706
Duplicate elastic#1666

- handle upgrade in separate go routine
- increase download timeout to 2 hours
- add KeepAliveSettings
- set idle connection timeout to 30 sec
- skip duplicate upgrade requests
- cancel current upgrade if upgrade comes in for a newer version, proceed with newer upgrade
- added unit tests
- prevent multiple upgrades from command line

Closes #1706
Duplicate #1666

joshdover added the bug and 8.7-candidate labels Nov 10, 2022

joshdover mentioned this issue Nov 10, 2022

Increase upgrade download timeout to 2 hours #1666

Closed

7 tasks

jlind23 added the Team:Elastic-Agent-Control-Plane and Team:Elastic-Agent labels Dec 16, 2022

pierrehilbert assigned leehinman Jan 3, 2023

leehinman mentioned this issue Jan 27, 2023

make download of upgrade artifacts async #2205

Merged

5 tasks

leehinman closed this as completed in #2205 Mar 21, 2023
