Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable job timeout #906

Merged
merged 4 commits into from
Mar 2, 2021
Merged
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
45 changes: 38 additions & 7 deletions builtin/bins/dkron-executor-shell/shell.go
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@ package main

import (
"encoding/base64"
"errors"
"fmt"
"log"
"os"
"os/exec"
Expand Down Expand Up @@ -74,12 +76,6 @@ func (s *Shell) ExecuteImpl(args *dktypes.ExecuteRequest, cb dkplugin.StatusHelp
cmd.Stderr = reportingWriter{buffer: output, cb: cb, isError: true}
cmd.Stdout = reportingWriter{buffer: output, cb: cb}

// Start a timer to warn about slow handlers
slowTimer := time.AfterFunc(2*time.Hour, func() {
log.Printf("shell: Script '%s' slow, execution exceeding %v", command, 2*time.Hour)
})
defer slowTimer.Stop()

stdin, err := cmd.StdinPipe()
if err != nil {
return nil, err
Expand All @@ -96,18 +92,53 @@ func (s *Shell) ExecuteImpl(args *dktypes.ExecuteRequest, cb dkplugin.StatusHelp
stdin.Close()

log.Printf("shell: going to run %s", command)

jobTimeout := args.Config["timeout"]
var jt time.Duration

if jobTimeout != "" {
jt, err = time.ParseDuration(jobTimeout)
if err != nil {
return nil, errors.New("shell: Error parsing job timeout")
}
}

err = cmd.Start()
if err != nil {
return nil, err
}

// Warn if buffer is overritten
var jobTimeoutMessage string
var jobWasKilled bool
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The jobWasKilled value actually signifies something different. I suggest renaming it to jobTimedOut, so it is in line with jobTimeoutMessage.


slowTimer := time.AfterFunc(jt, func() {
Copy link
Member

@vcastellm vcastellm Mar 3, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that this will be 0s in case there is no timeout set. There should not be timeout when not specifying a timeout.

Could you please open a new PR with the change @andreygolev?

And yes! sorry for my fast typing, thanks for the work here @andreygolev of course!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, sure! :)

err = cmd.Process.Kill()
if err != nil {
jobTimeoutMessage = fmt.Sprintf("shell: Job '%s' execution time exceeding defined timeout %v. SIGKILL returned error. Job probably was not killed", command, jt)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Job probably was not killed" - the probability of this is debatable.
Errors returned can include e.g. Process Done.
In order to prevent digging deep to find out whether it was actually killed or not (on several platforms), maybe just say "Job may not have been killed".

} else {
jobTimeoutMessage = fmt.Sprintf("shell: Job '%s' execution time exceeding defined timeout %v. Job was killed", command, jt)
}

jobWasKilled = true
return
})

defer slowTimer.Stop()

// Warn if buffer is ovewritten
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You made a typo correcting the typo ;D

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol :)

if output.TotalWritten() > output.Size() {
log.Printf("shell: Script '%s' generated %d bytes of output, truncated to %d", command, output.TotalWritten(), output.Size())
}

err = cmd.Wait()

if jobWasKilled {
_, err := output.Write([]byte(jobTimeoutMessage))
if err != nil {
log.Printf("Error writing output on timeout event: %v", err)
}
}

// Always log output
log.Printf("shell: Command output %s", output)

Expand Down
7 changes: 7 additions & 0 deletions dkron/job.go
Original file line number Diff line number Diff line change
Expand Up @@ -357,6 +357,13 @@ func (j *Job) Validate() error {
return err
}

if j.Executor == "shell" && j.ExecutorConfig["timeout"] != "" {
_, err := time.ParseDuration(j.ExecutorConfig["timeout"])
if err != nil {
return fmt.Errorf("Error parsing job timeout value")
}
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Would you mind creating a test in api_test.go for this validation, analoguous to the other validation tests?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

return nil
}

Expand Down
4 changes: 3 additions & 1 deletion website/content/usage/executors/shell.md
Original file line number Diff line number Diff line change
Expand Up @@ -14,6 +14,7 @@ shell: Run this command using a shell environment
command: The command to run
env: Env vars separated by comma
cwd: Chdir before command run
timeout: Force kill job after specified time. Format: https://golang.org/pkg/time/#ParseDuration.
```

Example
Expand All @@ -25,7 +26,8 @@ Example
"shell": "true",
"command": "my_command",
"env": "ENV_VAR=va1,ANOTHER_ENV_VAR=var2",
"cwd": "/app"
"cwd": "/app",
"timeout": "24h"
}
}
```