This repository has been archived by the owner on Nov 16, 2020. It is now read-only.

Set timeout on a function #300

Closed
1 of 3 tasks
angel-ivanov opened this issue Mar 13, 2018 · 4 comments
Comments

@angel-ivanov

  • Bug
  • Feature
  • Enhancement

Detailed Description

Sometimes a function run might hang, so it would be nice if we could set a timeout in the function definition and have any run that exceeds it terminated.

@berndtj berndtj added this to the June Preview Release milestone May 8, 2018
@tenczar
Contributor

tenczar commented May 24, 2018

There are several considerations that need to be made and I'm open to feedback from the team and whoever may be interested.

Our current implementation for function invocation relies on a small web service in each function container that accepts calls from the FaaS to invoke a function. This means that for the life of the pod/container a process is running that listens for HTTP requests and invokes the desired function in response to those requests. This presents a problem for terminating functions that may have opened resources that need to be released to terminate cleanly.

I see two potential ways to address this problem. The first relies on function code that 'listens' for interrupts which are triggered by our base image implementations after the desired timeout. The downside is that the user must explicitly code their function to listen for these interrupts, clean up their resources and send an error. This is a lot of overhead for functions that should be focused on their business logic.
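To make the first option concrete, here is a minimal sketch of the cooperative approach. The names (`handle`, `invoke`, `FunctionTimeout`) and the use of a `threading.Event` as the interrupt are assumptions for illustration, not what the base images actually do. It shows how the polling and cleanup end up in the user's function:

```python
import threading
import time


class FunctionTimeout(Exception):
    """Raised by user code when it notices the interrupt (hypothetical name)."""


def handle(payload, interrupted):
    """User function: it has to poll `interrupted` and clean up on its own."""
    resource = {"open": True}  # stand-in for a connection or file handle
    try:
        for _ in range(payload["steps"]):
            if interrupted.is_set():
                raise FunctionTimeout("interrupted by base image")
            time.sleep(0.01)  # one unit of business logic
        return "done"
    finally:
        resource["open"] = False  # cleanup is the user's responsibility


def invoke(fn, payload, timeout_s):
    """Base-image side: arm a timer that sets the interrupt flag."""
    interrupted = threading.Event()
    timer = threading.Timer(timeout_s, interrupted.set)
    timer.start()
    try:
        return fn(payload, interrupted)
    except FunctionTimeout as exc:
        return f"error: {exc}"
    finally:
        timer.cancel()  # no-op if the timer has already fired
```

A function that never checks `interrupted` would simply keep running, which is the weakness of this option.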

The other way to make sure that resources are cleaned up in the event of a timeout would be to kill the whole process. That could mean killing and restarting the web service or even killing and restarting the pod. The problem with this approach is that we need to guarantee that the FaaS implementation will never try to invoke the same function more than once concurrently on the same pod, or we will end up terminating a function unrelated to the one that hit the timeout.
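As a rough illustration of the kill-the-process option, the sketch below runs each invocation in its own child interpreter and relies on `subprocess.run` killing the child when the timeout expires. Note the assumption: one process per invocation, which is not how the long-lived web service described above works. `run_isolated` is a hypothetical helper name.

```python
import subprocess
import sys


def run_isolated(source, timeout_s):
    """Run `source` in a fresh interpreter; kill it if it exceeds timeout_s."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising, so the
        # OS reclaims whatever the function left open.
        return {"status": "timeout", "output": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "output": proc.stdout.strip()}
```

Because every invocation owns its process, a kill cannot hit an unrelated invocation. The cost is a process start per call.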

One final, unsavory, solution could be to ignore the resource clean up. If a function is run many many times and hits the timeout on most of those runs without cleaning up resources then the pod could become slow and unreliable. Eventually the healthz checks will fail and the pod will be terminated and restarted. If we combine this with a regular pruning check for all the pods we might be able to get away with sticking our heads in the sand on this issue.

How does everyone feel about these options?

@imikushin
Contributor

I'd suggest we pass a timeout (or, as found in Go context, a deadline, which is an absolute timestamp) in the context and expect the function to respect it.

In function-server (in the base-images), we can watch if the timeouts are being respected and set the pod's health status accordingly. Unhealthy instances will get replaced.
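A minimal sketch of what that could look like, with everything here assumed for illustration (the `DeadlineWatcher` class, the `context["deadline"]` key, and the rule that a single overrun marks the instance unhealthy are not taken from function-server):

```python
import threading
import time


class DeadlineWatcher:
    """Passes a deadline to each invocation and tracks whether it is respected."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in_flight = {}  # invocation id -> deadline
        self._next_id = 0
        self._overran = False

    def invoke(self, fn, payload, timeout_s):
        # Absolute deadline on the monotonic clock; only valid in this process.
        deadline = time.monotonic() + timeout_s
        with self._lock:
            inv_id = self._next_id
            self._next_id += 1
            self._in_flight[inv_id] = deadline
        try:
            return fn({"deadline": deadline}, payload)
        finally:
            with self._lock:
                del self._in_flight[inv_id]
                if time.monotonic() > deadline:
                    self._overran = True

    def healthy(self):
        """What a health endpoint could report: False after any overrun."""
        now = time.monotonic()
        with self._lock:
            late = any(now > d for d in self._in_flight.values())
            return not (self._overran or late)


def polite(context, payload):
    return payload  # finishes well before context["deadline"]


def rude(context, payload):
    time.sleep(0.05)  # ignores the deadline entirely
    return payload
```

Since `healthy()` also looks at invocations still in flight, a function that hangs forever is reported as unhealthy once its deadline passes, provided the health check runs on a different thread from the function.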

@berndtj berndtj modified the milestones: June Preview Release, July Preview Release Jun 5, 2018
@imikushin
Contributor

Let's use timeouts in our CLI and API, and deadlines - internally - in the function context. See dispatchframework/java-base-image#14 (comment)
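Roughly, the conversion at the boundary might look like the sketch below. The helper names and the choice of seconds as the unit are assumptions for illustration; the linked comment is the place to look for what was actually agreed.

```python
from datetime import datetime, timedelta, timezone


def to_deadline(timeout_s, now=None):
    """Convert a relative timeout (seconds) into an absolute UTC deadline."""
    now = now or datetime.now(timezone.utc)
    return now + timedelta(seconds=timeout_s)


def remaining_s(deadline, now=None):
    """Time left until the deadline, in seconds, never negative."""
    now = now or datetime.now(timezone.utc)
    return max(0.0, (deadline - now).total_seconds())
```

The user only ever sees the relative value. The absolute deadline is computed once when the invocation starts, so each layer can ask how much time is left without restarting the clock.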

@berndtj berndtj modified the milestones: July Preview Release, August Milestone Jul 2, 2018
@tenczar
Contributor

tenczar commented Aug 13, 2018

This work has been completed in all of the base images and within Dispatch itself. Timeouts can now be set at function creation.

@tenczar tenczar closed this as completed Aug 13, 2018

4 participants