[devworkspace] Investigate the scale-to-zero after a period of inactivity #16683

sleshchenko · 2020-04-21T11:37:59Z

Is your enhancement related to a problem? Please describe.

CloudShell needs to scale to zero after a period of inactivity.
We need to figure out which option is the best, or at least good enough, to go with.

Below are some options, but the implementer should feel free to look for other ones:

1. CloudShell is responsible for tracking activity and scaling itself to 0.
Pros:
does not seem to be difficult to implement;
Cons:
the workspace service account needs permission to update the workspace so that it can stop itself if the user never logs in or the most recently used token has expired;
in the future, each editor (Theia, for example) would have to implement scale-to-zero itself;

2. CloudShell tracks activity and sends it to a dedicated activity manager (possibly embedded in the controller). The activity manager stops the workspace after the inactivity period.
Pros:
each editor only has to report the activity that happens, and the scale-to-zero mechanism is reused;
Cons:
it seems to be a more difficult option: a new component with a REST API (not sure if that is typical for operators), which possibly needs a database.
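
To make option 2 a bit more concrete, here is a minimal in-memory sketch of what such an activity manager might track. Everything in it is an assumption for illustration: the `ActivityManager` class, its method names, and the `stop` callback are hypothetical and not part of any existing Che or controller API, and a real component would sit behind the REST API mentioned above.

```python
import time
from typing import Callable, Dict, List


class ActivityManager:
    """Hypothetical activity manager: remembers the last activity per workspace."""

    def __init__(self, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._last_seen: Dict[str, float] = {}

    def report_activity(self, workspace_id: str) -> None:
        """Record activity; an editor would call this whenever the user does something."""
        self._last_seen[workspace_id] = self._clock()

    def expired(self) -> List[str]:
        """Return the workspaces whose last activity is older than the timeout."""
        now = self._clock()
        return sorted(ws for ws, seen in self._last_seen.items()
                      if now - seen >= self.idle_timeout_s)

    def stop_expired(self, stop: Callable[[str], None]) -> List[str]:
        """Invoke the (hypothetical) stop callback for every expired workspace."""
        stopped = self.expired()
        for ws in stopped:
            stop(ws)
            del self._last_seen[ws]
        return stopped
```

In this sketch a workspace is only tracked once it has reported at least once, so whoever starts the workspace would also need to call `report_activity` at start time; otherwise a workspace the user never logs in to would never be stopped.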

amisevsk · 2020-04-21T20:22:50Z

Another con for option 2 is that it requires workspaces to be aware of the operator that controls them. Currently, once a workspace is provisioned, it does not care about the existence of the workspace controller at all -- it is more or less entirely self-contained. If we require workspaces to report back to the controller in some way, it will complicate both workspaces and the controller significantly.

Are we sure that this is a controller feature? I would expect the workspace creator to be responsible for shutting down their workspace (cf. a user-defined deployment, for example). Similarly, for examples like cloud shell, I would expect the console to shut down cloud shells when they are no longer needed.

In terms of implementation, I think the cleanest way to do it would be to separate concerns between the workspace and the controller:

Activity monitor runs in workspace (perhaps as a small container?)
Workspace deployment has a health check for the activity monitor
Controller detects when health check fails (activity timeout reached) and scales down deployment.
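
The three steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: `ActivityMonitor` and `desired_replicas` are hypothetical names, and how the health check is actually exposed to the controller is left open.

```python
import time
from typing import Callable


class ActivityMonitor:
    """Hypothetical in-workspace monitor whose health check fails after the idle timeout."""

    def __init__(self, idle_timeout_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.idle_timeout_s = idle_timeout_s
        self._clock = clock
        # Starting the workspace counts as the first activity.
        self._last_activity = clock()

    def touch(self) -> None:
        """Record user activity (a request, a keystroke in a terminal, ...)."""
        self._last_activity = self._clock()

    def healthy(self) -> bool:
        """Health check: True while the idle timeout has not been reached."""
        return self._clock() - self._last_activity < self.idle_timeout_s


def desired_replicas(monitor_healthy: bool, current_replicas: int) -> int:
    """Controller side: scale the deployment to zero once the health check fails."""
    return current_replicas if monitor_healthy else 0
```

With this split the workspace never needs to know about the controller; it only answers the health check, and the controller decides what a failed check means.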

Note that this issue would depend on #16696.

amisevsk · 2020-04-21T20:24:06Z

One convenient way to implement activity monitoring, if we go down the route of customizing the oauth-proxy to suit our needs, would be to do the activity monitoring there, since all user requests are processed there already. This would have the benefit of requiring no additional changes to editors, etc.

sleshchenko · 2020-04-22T05:58:35Z

> if we go down the route of customizing the oauth-proxy to suit our needs, would be to do the activity monitoring there, since all user requests are processed there already.

I assume it might be possible, but I'm not sure it's that easy with WebSocket connections... Typically a WebSocket connection keeps exchanging ping/pong messages while the tab is open, even when there is no activity. And the OpenShift OAuth Proxy only makes sure that the HTTP upgrade request is authorized; the messages themselves are not validated, and I'm not sure that is even possible. Putting it here for further investigation; I like the idea in general.
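
To illustrate the ping/pong concern: if a proxy could see individual WebSocket frames, it would have to ignore control frames when deciding whether the user is active. The sketch below classifies a frame by its opcode, which RFC 6455 places in the low four bits of the first byte. The function names are hypothetical, and whether the OAuth proxy can be extended to inspect frames at all is exactly the open question above.

```python
# Opcodes as defined in RFC 6455, section 5.2.
OPCODE_CONTINUATION = 0x0
OPCODE_TEXT = 0x1
OPCODE_BINARY = 0x2
OPCODE_CLOSE = 0x8
OPCODE_PING = 0x9
OPCODE_PONG = 0xA

# Only data frames are treated as activity; close/ping/pong are control frames.
DATA_OPCODES = {OPCODE_CONTINUATION, OPCODE_TEXT, OPCODE_BINARY}


def frame_opcode(frame: bytes) -> int:
    """Return the opcode stored in the low 4 bits of the first frame byte."""
    if not frame:
        raise ValueError("empty frame")
    return frame[0] & 0x0F


def counts_as_activity(frame: bytes) -> bool:
    """Hypothetical rule: keep-alive control frames do not reset the idle timer."""
    return frame_opcode(frame) in DATA_OPCODES
```

Even with such a filter, an editor that sends its own keep-alive messages as ordinary text or binary frames would still look active, so this alone would not settle the question.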

metlos · 2020-04-22T11:33:12Z

Not sure if it would be applicable in this particular case, but wouldn't a general-purpose scale-to-0 solution, like https://github.com/deislabs/osiris, be more appropriate for the users?

amisevsk · 2020-04-22T16:41:45Z

@metlos It depends on how osiris is implemented -- what metric is used for an "idle" pod? I can imagine, e.g., editing text in Theia without compiling appearing as "idle", whereas a cryptocurrency miner left running in the background would appear "busy".
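
A toy comparison shows why the choice of metric matters. The two heuristics and all numbers below are made up for illustration and say nothing about what osiris actually measures.

```python
from dataclasses import dataclass


@dataclass
class PodSample:
    """Hypothetical measurements for one sampling window."""
    cpu_millicores: int   # average CPU usage in the window
    user_requests: int    # user-initiated requests seen in the window


def idle_by_cpu(sample: PodSample, threshold_millicores: int = 50) -> bool:
    """Resource-based heuristic: low CPU usage means idle."""
    return sample.cpu_millicores < threshold_millicores


def idle_by_requests(sample: PodSample) -> bool:
    """Interaction-based heuristic: no user requests means idle."""
    return sample.user_requests == 0


# Made-up samples matching the two scenarios in the comment above.
editing_text = PodSample(cpu_millicores=20, user_requests=120)
background_miner = PodSample(cpu_millicores=900, user_requests=0)
```

Under these assumed numbers the CPU heuristic would stop the workspace of someone who is actively editing and keep the miner running, while the request-based heuristic does the opposite.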

amisevsk · 2020-05-13T21:55:44Z

In the context of our current work on including cloud terminals directly in the console, shell access will be provided by exec directly -- this pushes the burden of activity monitoring/scale-to-zero onto the workspace creator (the OpenShift console in this case).

sleshchenko added the kind/enhancement and engine/devworkspace labels Apr 21, 2020

l0rd mentioned this issue Apr 21, 2020

Cloud Shell Che Workspace #15434

Closed

38 tasks

che-bot added the status/need-triage label Apr 21, 2020

vzhukovs removed the status/need-triage label Apr 21, 2020

amisevsk mentioned this issue Apr 21, 2020

[workspace-controller] Implement "stopped" state for deployed workspaces #16696

Closed

sleshchenko self-assigned this May 22, 2020

This was referenced May 26, 2020

Make che-machine-exec track activity and stop workspace CR by idle timeout eclipse-che/che-machine-exec#106

Merged

Add stopping by idle timeout for command-line-terminal devfile/devworkspace-operator#84

Merged

sleshchenko closed this as completed in devfile/devworkspace-operator#84 May 27, 2020
