Feature Request: Long running HTTP-based requests #176

Closed
duglin opened this issue Apr 8, 2022 · 7 comments
Labels: enhancement New feature or request

Comments

duglin commented Apr 8, 2022

This is a bit of a question and a bit of a feature request, if my assumptions are correct...

Based on the ACA Deep Dive video from 2022/12/01, the HTTP auto-scaling of apps is based on the number of incoming requests to the app. The video implied that metrics from Envoy are used to determine how many instances of the app should be running and, I assume, which instance each request is routed to.

This model works fine for short-lived request processing. However, it will run into issues for long-running processing, or for asynchronous request processing where the client (or the app) does not need to worry about any possible response and a 202 is expected to be returned as quickly as possible while the processing continues in the background.

There are customers who will have scenarios where they generate a large number of HTTP requests, and the amount of time needed to process each one can vary from very quick to hours. Assuming that the inbound connection can remain active for the duration of those long-running processes is risky. Plus, once the connection is broken, the HTTP infrastructure of ACA will assume that instance of the app is idle and either send a second request to it (which could violate any concurrency policy defined, e.g. max concurrency of 1) or, worse, scale down that instance and kill the in-flight processing.

There are scenarios where either side of the connection (client vs. app/server) knows whether or not to immediately return a 202. For example, the client could know that the request it's about to send is either long running, or that it doesn't care about waiting for a response, so it might include some "flag" in the request to indicate its desire for asynchronous processing; how the results are retrieved is a separate issue. Likewise, the app/server could know that its processing is such that clients do not need to wait for it to complete, and it'll want to return a 202 immediately while background processing continues.
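To make the "flag" idea concrete, here is a minimal Go sketch of the app-side pattern. This is not an ACA feature: the `/submit` path and `process` function are hypothetical, and the RFC 7240 `Prefer: respond-async` header is just one existing convention a client could use to signal that it does not want to wait.

```go
package main

import (
	"net/http"
	"strings"
)

// process stands in for the app's real work; it could take seconds or hours.
func process(job string) {
	// ... long-running work ...
}

func handler(w http.ResponseWriter, r *http.Request) {
	job := r.URL.Query().Get("job")

	// RFC 7240 lets a client send "Prefer: respond-async" to signal it
	// does not want to wait for the result.
	if strings.Contains(r.Header.Get("Prefer"), "respond-async") {
		go process(job)                    // processing continues in the background
		w.WriteHeader(http.StatusAccepted) // 202: accepted, not yet done
		return
	}

	// Otherwise process synchronously, holding the connection open.
	process(job)
	w.WriteHeader(http.StatusOK)
}

func main() {
	http.HandleFunc("/submit", handler)
	http.ListenAndServe(":8080", nil)
}
```

Note that on the 202 path the handler returns before the work finishes, which is exactly the case where the platform can no longer rely on an open connection to know the instance is busy.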

Aside from the aspects mentioned above, there's also the infrastructure to consider. If there are a large number of incoming requests, the infrastructure would need to scale up at the same time as the app. So Envoy, the activators, etc. might all need to scale to support persistent, long-lived (hours?) connections despite those connections not actually being needed beyond the initial delivery of the request.

Regardless of the reasons, or the mechanisms used to signal the desired semantics to the infrastructure, it would be good for ACA to support the idea of an incoming HTTP request being used to initiate an asynchronous long-running process.

There are many ways to achieve this, and many aspects to consider (e.g. whether to support QoS features such as retries), but even w/o the advanced features I think we need to consider some level of support.

duglin added the enhancement label Apr 8, 2022
torosent self-assigned this Apr 14, 2022
chriswue commented Aug 8, 2022

Seems very similar to #24

duglin (Author) commented Aug 9, 2022

I think #24 is different: that one is more about having a manual-invoke type of scaling flag in KEDA. This issue is more about changing how we determine when the processing of a request is complete; assuming "done" == "connection is closed" is risky in some cases.

chriswue commented Aug 9, 2022

@duglin Technically maybe, but as far as I can see the underlying use case is the same: prevent pods from being killed by the autoscaler while they are still processing.
Maybe there should be a "busy" probe or something, so that pods that report busy don't get killed.
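A minimal sketch of what such a probe could look like from the app's side, assuming a hypothetical `/busy` endpoint that the platform would poll before scale-down (no such probe exists in ACA today; this only illustrates the shape of the idea):

```go
package main

import (
	"net/http"
	"sync/atomic"
)

// inFlight counts the pieces of work this instance is currently processing;
// the app would increment it when work starts and decrement when it ends.
var inFlight atomic.Int64

func main() {
	// Hypothetical "busy" probe: the platform would poll this endpoint and
	// exempt any instance that reports busy from scale-down decisions.
	http.HandleFunc("/busy", func(w http.ResponseWriter, r *http.Request) {
		if inFlight.Load() > 0 {
			w.WriteHeader(http.StatusOK) // 200: busy, do not kill this instance
			return
		}
		w.WriteHeader(http.StatusNoContent) // 204: idle, safe to scale down
	})
	http.ListenAndServe(":8080", nil)
}
```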

ahmelsayed (Member) commented
@chriswue There was indeed an issue where applications were marked as inactive despite having active connections, if a connection stayed open for longer than 5 minutes. That should be fixed now: an application is always considered active if it has any active connections, regardless of their length. The fix should start rolling out next week.

chriswue commented
@ahmelsayed While I appreciate a technical fix for a specific issue, I think the bigger-picture use case still needs a robust solution: don't kill pods that are processing a chunk of work. Think of pods that pull items off a work queue and may take many minutes or even hours per item, or a client that submits a request it knows will take a long time to process and just comes back periodically to check the progress. Relying on the presence of an established connection from a client is not a robust solution.
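For the work-queue scenario, a sketch under the same hypothetical busy-probe assumption: the worker stays marked busy for the full duration of each item, so the scale-down decision never depends on an open client connection. Names and timings are illustrative only.

```go
package main

import (
	"sync/atomic"
	"time"
)

// inFlight plays the same role as the counter behind the hypothetical
// /busy probe sketched earlier in this thread.
var inFlight atomic.Int64

// handleItem stands in for processing one queue item; the duration is
// illustrative and could be minutes or hours.
func handleItem(item string) {
	time.Sleep(10 * time.Minute)
}

// worker pulls items off a work queue, marking itself busy around each
// item so an idle-looking connection state never triggers a kill mid-work.
func worker(queue <-chan string) {
	for item := range queue {
		inFlight.Add(1)
		handleItem(item)
		inFlight.Add(-1)
	}
}

func main() {
	q := make(chan string, 2)
	q <- "job-1"
	q <- "job-2"
	close(q)
	worker(q)
}
```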

duglin (Author) commented Aug 18, 2022

my point exactly! :-)

simonjj (Collaborator) commented Jul 18, 2024

Closing this for now. Please reopen if you feel strongly.

simonjj closed this as completed Jul 18, 2024