Feature Request: Long running HTTP-based requests #176

Closed
duglin opened this issue Apr 8, 2022 · 7 comments
Labels: enhancement New feature or request

Comments

duglin commented Apr 8, 2022

This is a bit of a question and a bit of a feature request, if my assumptions are correct...

Based on the ACA Deep Dive video from 2022/12/01, the HTTP auto-scaling of apps is based on the number of incoming requests to the app. The video implied that metrics from Envoy are used to determine how many instances of the app should be running and, I assume, which instance each request is routed to.

This model works fine for short-lived request processing. However, it will run into issues for long-running processing, or for asynchronous request processing where the client (or the app) does not need to worry about any possible response and a 202 is expected to be returned as quickly as possible while the processing continues in the background.

There are customers who will have scenarios where they generate a large number of HTTP requests, and the amount of time needed to process each one can vary from very quick to hours. Assuming that the inbound connection can remain active for the duration of those long-running processes is risky. Plus, once the connection is broken, the HTTP infrastructure of ACA will assume that instance of the app is idle and either send a second request to it (which could violate any concurrency policy defined, e.g. max concurrency of 1) or, worse, scale down that instance and kill the in-flight processing.

There are scenarios where either side of the connection (client vs. app/server) knows whether or not to immediately return a 202. For example, the client could know that the request it's about to send is either long running, or that it doesn't care about waiting for a response, so it might include some "flag" in the request to indicate its desire for asynchronous processing; how the results are retrieved is a separate issue. Likewise, the app/server could know that its processing is such that clients do not need to wait for it to complete, and it'll want to return a 202 immediately while background processing continues.
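To make the "flag" idea concrete, here is a minimal Go sketch of the app-side pattern. This is not an ACA feature: the `/submit` path and `process` function are hypothetical, and the RFC 7240 `Prefer: respond-async` header is just one existing convention a client could use to signal that it does not want to wait.

```go
package main

import (
	"net/http"
	"strings"
)

// process stands in for the app's real work; it could take seconds or hours.
func process(job string) {
	// ... long-running work ...
}

func handler(w http.ResponseWriter, r *http.Request) {
	job := r.URL.Query().Get("job")

	// RFC 7240 lets a client send "Prefer: respond-async" to signal it
	// does not want to wait for the result.
	if strings.Contains(r.Header.Get("Prefer"), "respond-async") {
		go process(job)                    // processing continues in the background
		w.WriteHeader(http.StatusAccepted) // 202: accepted, not yet done
		return
	}

	// Otherwise process synchronously, holding the connection open.
	process(job)
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/submit", handler)
	http.ListenAndServe(":8080", nil)
}
```

Note that on the 202 path the handler returns before the work finishes, which is exactly the case where the platform can no longer rely on an open connection to know the instance is busy.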

Aside from the aspects mentioned above, there's also the infrastructure to consider. If there are a large number of incoming requests, the infrastructure would need to scale up at the same time as the app. So Envoy, the activators, etc. might all need to scale to support persistent, long-lived (hours?) connections despite those connections not actually being needed beyond the initial delivery of the request.

Regardless of the reasons, or the mechanisms used to signal the desired semantics to the infrastructure, it would be good for ACA to support the idea of an incoming HTTP request being used to initiate an asynchronous long-running process.

There are many ways to achieve this, and many aspects to consider (e.g. whether to support QoS features such as retries), but even w/o the advanced features I think we need to consider some level of support.

duglin added the enhancement label Apr 8, 2022
torosent self-assigned this Apr 14, 2022
chriswue commented Aug 8, 2022

Seems very similar to #24

duglin (Author) commented Aug 9, 2022

I think #24 is different: that one is more about having a manual-invoke type of scaling flag in KEDA. This issue is more about changing how we determine when the processing of a request is complete; assuming "done" == "connection is closed" is risky in some cases.

chriswue commented Aug 9, 2022

@duglin Technically maybe, but as far as I can see the underlying use case is the same: prevent pods from being killed by the autoscaler while they are still processing.
Maybe there should be a "busy" probe or something, so that pods that report busy don't get killed.
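A minimal sketch of what such a probe could look like from the app's side, assuming a hypothetical `/busy` endpoint that the platform would poll before scale-down (no such probe exists in ACA today; this only illustrates the shape of the idea):

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// inFlight counts the pieces of work this instance is currently processing;
// the app would increment it when work starts and decrement when it ends.
var inFlight atomic.Int64

func main() {
	// Hypothetical "busy" probe: the platform would poll this endpoint and
	// exempt any instance that reports busy from scale-down decisions.
	http.HandleFunc("/busy", func(w http.ResponseWriter, r *http.Request) {
		if inFlight.Load() > 0 {
			w.WriteHeader(http.StatusOK) // 200: busy, do not kill this instance
			return
		}
		w.WriteHeader(http.StatusNoContent) // 204: idle, safe to scale down
	})
	http.ListenAndServe(":8080", nil)
}
```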

ahmelsayed (Member) commented
@chriswue There was indeed an issue where applications were marked as inactive despite having active connections, if a connection stayed open for longer than 5 minutes. That should be fixed now: an application is always considered active if it has any active connections, regardless of their length. The fix should start rolling out next week.

chriswue commented
@ahmelsayed While I appreciate a technical fix for a specific issue, I think the bigger-picture use case still needs a robust solution: don't kill pods that are processing a chunk of work. Think of pods that pull items off a work queue and may take many minutes or even hours per item, or a client that submits a request it knows will take a long time to process and just comes back periodically to check the progress. Relying on the presence of an established connection from a client is not a robust solution.
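For the work-queue scenario, a sketch under the same hypothetical busy-probe assumption: the worker stays marked busy for the full duration of each item, so the scale-down decision never depends on an open client connection. Names and timings are illustrative only.

```go
package main

import (
	"sync/atomic"
	"time"
)

// inFlight plays the same role as the counter behind the hypothetical
// /busy probe sketched earlier in this thread.
var inFlight atomic.Int64

// handleItem stands in for processing one queue item; the duration is
// illustrative and could be minutes or hours.
func handleItem(item string) {
	time.Sleep(10 * time.Minute)
}

// worker pulls items off a work queue, marking itself busy around each
// item so an idle-looking connection state never triggers a kill mid-work.
func worker(queue <-chan string) {
	for item := range queue {
		inFlight.Add(1)
		handleItem(item)
		inFlight.Add(-1)
	}
}

func main() {
	q := make(chan string, 2)
	q <- "job-1"
	q <- "job-2"
	close(q)
	worker(q)
}
```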

duglin (Author) commented Aug 18, 2022

my point exactly! :-)

simonjj (Collaborator) commented Jul 18, 2024

Closing this for now. Please reopen if you feel strongly.

simonjj closed this as completed Jul 18, 2024