return an error instead of panicking when failing to get edge #2382

maxlaverse · 2021-09-28T20:18:22Z

Currently, buildkitd crashes with a panic when it can't get an edge in solver.NewInputRequest(). This is unfortunate as it affects all the other builds running on the daemon at the same time.

As suggested by the TODO comment in the code, returning an errored pipe is much more elegant: only the problematic build fails (with a "return leaving incoming open" error) instead of the whole daemon crashing.

See #2303

tonistiigi

If you can reproduce these errors (or the "return leaving incoming open" error), please also try to figure out under what conditions they appear. Reaching these errors should not be possible without another bug.

tonistiigi · 2021-09-29T00:37:04Z

solver/scheduler.go

@@ -351,7 +352,7 @@ type pipeFactory struct {
 func (pf *pipeFactory) NewInputRequest(ee Edge, req *edgeRequest) pipe.Receiver {
 	target := pf.s.ef.getEdge(ee)
 	if target == nil {
-		panic("failed to get edge") // TODO: return errored pipe
+		return pipe.NewErroredPipe(req, fmt.Errorf("failed to get edge"))


Would it be easier to call pf.NewFuncRequest(func() {return errors.Errorf()}) here?

If I understand correctly, this would only fail the sender side of the pipe and not the receiver, and a failed receiver is what NewInputRequest() needs to return.

Not sure I understand.

Put differently, pf.NewFuncRequest(func() {return errors.Errorf()}) doesn't return an errored pipe.Receiver: if you call .Status().Err on it, you get nil.

> If you call .Status().Err on it, you get nil

Do you still get nil after calling Receive()?

Yes. Maybe I'm not getting something right :)

```go
func TestPanic(t *testing.T) {
	s := NewSolver(SolverOpt{
		ResolveOpFunc: testOpResolver,
	})
	defer s.Close()

	e := edge{}
	pf := &pipeFactory{s: s.s, e: &e}
	receivedPipe := pf.NewFuncRequest(func(_ context.Context) (interface{}, error) {
		return nil, errors.Errorf("failed to get edge")
	})
	receivedPipe.Receive()
	require.Error(t, receivedPipe.Status().Err)
}
```

We only need an errored pipe in this one place, so I have nothing against dropping the special-purpose errored pipe structure I introduced. On the other hand, it's quite simple to understand.

The event is already captured by the solver internally.

```go
func TestPanic(t *testing.T) {
	f := func(_ context.Context) (interface{}, error) {
		return nil, errors.Errorf("failed to get edge")
	}
	p, start := pipe.NewWithFunction(f)
	go start()
	for {
		if !p.Receiver.Receive() {
			time.Sleep(time.Millisecond) // in real code this is achieved with signal
			continue
		}
		require.Error(t, p.Receiver.Status().Err)
		break
	}
}
```

In your example, you can replace receivedPipe.Receive() with time.Sleep(time.Second). The function runs asynchronously (and Receive() does not block). You can't get the actual completion time of the function or wait for Receive(), because these are already captured internally in the scheduler's loop thread.

Thanks for the explanation!

jamesalucas · 2021-09-29T17:01:01Z

Would either of you be able to review my comments in #2303 to help me pinpoint the root cause of this error when it occurs in our CI env? Many thanks!

tonistiigi

The DCO check is failing; please also squash your commits.

solver/scheduler.go

Signed-off-by: Maxime Lagresle <[email protected]>

jamesalucas · 2021-10-01T08:11:01Z

Would this change work if patched into a slightly older version of buildkit? I have patched it into the version I'm running in our CI and now get the following panic:

panic: interface conversion: interface {} is nil, not *solver.edgeState

goroutine 14 [running]:
github.com/moby/buildkit/solver.(*edge).processUpdate(0xc000392b40, 0x180c4d0, 0xc00220ab40, 0x451025)
        /src/solver/edge.go:461 +0x21ac
github.com/moby/buildkit/solver.(*edge).unpark(0xc000392b40, 0xc006413d10, 0x1, 0x1, 0xc006413d50, 0x1, 0x1, 0xc006413d20, 0x1, 0x1, ...)
        /src/solver/edge.go:326 +0x7d
github.com/moby/buildkit/solver.(*scheduler).dispatch(0xc00033d810, 0xc000392b40)
        /src/solver/scheduler.go:136 +0x425
github.com/moby/buildkit/solver.(*scheduler).loop(0xc00033d810)
        /src/solver/scheduler.go:104 +0x168
created by github.com/moby/buildkit/solver.newScheduler
        /src/solver/scheduler.go:35 +0x1b3
Error: buildkit process has exited

maxlaverse · 2021-10-01T09:22:20Z

@jamesalucas can you try this instead? Maybe the handling around the pipe was different before, or you're hitting another bug. ae8af5c

I'll keep an eye out for crashes and look into the stack trace you posted.

tonistiigi · 2021-10-01T14:49:26Z

@maxlaverse looks like this needs a follow-up fix. In https://github.com/moby/buildkit/blob/master/solver/edge.go#L461 the value needs to be an *edgeState. Returning &req.edgeState might do it, or the processing could be changed to avoid the interface conversion after an error.

Not quite sure why the error says "interface {} is nil", though. You might want to test this by adding code that makes getEdge() fail x% of the time.

maxlaverse force-pushed the panic_failed_to_get_edge branch from ae8af5c to 44b6e29 on September 28, 2021 20:18

tonistiigi reviewed Sep 29, 2021


solver/scheduler.go (outdated)

maxlaverse force-pushed the panic_failed_to_get_edge branch from cbf6639 to 0e7de55 on September 30, 2021 07:25

return an error instead of panicking when failing to get edge

b6d092d

Signed-off-by: Maxime Lagresle <[email protected]>

maxlaverse force-pushed the panic_failed_to_get_edge branch from 0e7de55 to b6d092d on September 30, 2021 12:17

tonistiigi approved these changes Sep 30, 2021


tonistiigi merged commit b8e4ed1 into moby:master Sep 30, 2021

maxlaverse deleted the panic_failed_to_get_edge branch October 1, 2021 08:25

maxlaverse mentioned this pull request Oct 1, 2021

don't cast Value when pipe is errored #2385

Merged

tonistiigi added the needs-cherry-pick/v0.9 label Oct 2, 2021

tonistiigi mentioned this pull request Oct 4, 2021

[v0.9] cherry-picks #2391

Merged

vladaionescu mentioned this pull request Oct 4, 2021

Error: buildkit process has exited earthly/earthly#1220

Open

crazy-max added this to the v0.9.1 milestone Feb 4, 2022
