
feat: Job deadlines #88

Merged: 8 commits into master from job-deadlines, Sep 18, 2019
Conversation

@dalehamel (Member)

Fixes #80 and paves the way for #11

I think this is pretty elegant, and I have to give credit for both ideas to @jerr.

This ensures that bpftrace is signalled with SIGINT so it can dump its maps before exiting.

I've tested this, and it works more reliably than interactive traces via TTY attach.

A nice side benefit is that you can now collect data for a pre-set interval before exiting 😂

@dalehamel dalehamel requested review from leodido and fntlnz September 16, 2019 23:58
@dalehamel dalehamel changed the title Job deadlines Feat - Job deadlines Sep 16, 2019
@ghost left a comment

Found some fixes!

P.S. share your ideas, feedbacks or issues with us at https://github.com/fixmie/feedback (this message will be removed after the beta stage).

Co-Authored-By: fixmie[bot] <44270338+fixmie[bot]@users.noreply.github.com>
This allows for bpftrace to take some extra time to process and print
bpf map data.

For very large maps, or in some edge cases, the user may want to override this value.
This is configurable now :)
@dalehamel (Member Author)

Thanks @leodido!

BTW, one side effect of this is that anything that trips the job deadline will appear as a failed job. That's not ideal, but if we check the exit status of the container (i.e., whether it actually succeeded) before the pod is GC'd, we should be able to rescue the true exit status.

For now I don't think it much matters if a job that passed its deadline reports as failed, even when this was expected. One way around this would be to have the trace job runner time out itself, so that the job exits cleanly and shows as completed when this happens.

I think that can be addressed in a separate PR though unless anyone feels strongly.

@leodido leodido self-requested a review September 17, 2019 00:49
leodido previously approved these changes Sep 17, 2019

@leodido (Member) left a comment

OK, the pipeline job is not green, for the reason explained by @dalehamel, but this PR looks amazing!

Nevertheless, this can be merged, since that adjustment/fix will be addressed in another PR soon (ideally before receiving other PRs from non-maintainers).

@dalehamel (Member Author)

Before merging I want to try to let the trace runner die gracefully; I don't feel right making failed jobs the norm.

@@ -184,6 +187,11 @@ func (t *TraceJobClient) DeleteJobs(nf TraceJobFilter) error {
func (t *TraceJobClient) CreateJob(nj TraceJob) (*batchv1.Job, error) {

bpfTraceCmd := []string{
"/bin/timeout",
@dalehamel (Member Author)

I'm pretty happy with this solution. I tested it out, and it allows trace jobs to complete within their deadline and print their maps. It's also a more reliable way to ensure that maps actually do get printed, and to access their log data.

@dalehamel (Member Author)

@leodido can you take another look please?

I basically just added the timeout command and upped the k8s deadline to include the grace period. This should give the process plenty of time to shut down cleanly, so it doesn't need to rely on the pre-stop hook.

So, if a job times out, we will have a failed job that is past its activity deadline, analogous to exit status 124 from the timeout command, indicating that the job actually passed its deadline and didn't exit as it was supposed to. This should ideally be a rare case.

In most cases, we should see that the job is able to complete, and get the output from the logs, even if it is a map or histogram.

@leodido leodido self-requested a review September 18, 2019 23:13
@leodido (Member) left a comment

💣 I like it!

@dalehamel dalehamel changed the title Feat - Job deadlines feat: Job deadlines Sep 18, 2019
@dalehamel dalehamel merged commit 7da686a into master Sep 18, 2019
@dalehamel dalehamel deleted the job-deadlines branch September 18, 2019 23:16
Successfully merging this pull request may close these issues:

Feat: A mechanism to set activeDeadlineSeconds