Set timeout on a function #300

angel-ivanov · 2018-03-13T09:29:46Z

Bug
Feature
Enhancement

Detailed Description

A function run can sometimes hang, so it would be nice to be able to set a timeout in the function definition and have any run that exceeds it terminated.

tenczar · 2018-05-24T00:06:32Z

There are several considerations to weigh here, and I'm open to feedback from the team and anyone else who is interested.

Our current implementation for function invocation relies on a small web service in each function container that accepts calls from the FaaS to invoke a function. This means that for the life of the pod/container, a process is running that listens for HTTP requests and invokes the desired function in response. This makes termination a problem: a function may have opened resources that need to be released before it can be terminated cleanly.

I see two potential ways to address this problem. The first relies on function code that 'listens' for interrupts which are triggered by our base image implementations after the desired timeout. The downside is that the user must explicitly code their function to listen for these interrupts, clean up their resources and send an error. This is a lot of overhead for functions that should be focused on their business logic.

The other way to make sure that resources are cleaned up in the event of a timeout would be to kill the whole process. That could mean killing and restarting the web service, or even killing and restarting the pod. The problem with this approach is that we need to guarantee that the FaaS implementation will never invoke the same function more than once concurrently on the same pod, or we will end up terminating an invocation unrelated to the one that hit the timeout.

One final, unsavory solution could be to ignore resource cleanup. If a function is run many times and hits the timeout on most of those runs without cleaning up resources, the pod could become slow and unreliable. Eventually the healthz checks will fail and the pod will be terminated and restarted. If we combine this with a regular pruning check for all the pods, we might be able to get away with sticking our heads in the sand on this issue.

How does everyone feel about these options?

imikushin · 2018-05-25T01:32:05Z

I'd suggest we pass a timeout (or, as found in Go context, a deadline, which is an absolute timestamp) in the context and expect the function to respect it.

In function-server (in the base-images), we can watch if the timeouts are being respected and set the pod's health status accordingly. Unhealthy instances will get replaced.

imikushin · 2018-06-08T22:28:35Z

Let's use timeouts in our CLI and API, and deadlines internally, in the function context. See dispatchframework/java-base-image#14 (comment)

tenczar · 2018-08-13T18:22:01Z

This work has been completed in all of the base images and within Dispatch itself. Timeouts can now be set at function creation.

berndtj added this to the June Preview Release milestone May 8, 2018

berndtj assigned tenczar May 11, 2018

berndtj modified the milestones: June Preview Release, July Preview Release Jun 5, 2018

berndtj modified the milestones: July Preview Release, August Milestone Jul 2, 2018

tenczar closed this as completed Aug 13, 2018
