return an error instead of panicking when failing to get edge #2382
If you can reproduce these errors (or incoming open), then please also try to figure out under what conditions they appear. Reaching these errors should not be possible without another bug.
solver/scheduler.go
```diff
@@ -351,7 +352,7 @@ type pipeFactory struct {
 func (pf *pipeFactory) NewInputRequest(ee Edge, req *edgeRequest) pipe.Receiver {
 	target := pf.s.ef.getEdge(ee)
 	if target == nil {
-		panic("failed to get edge") // TODO: return errored pipe
+		return pipe.NewErroredPipe(req, fmt.Errorf("failed to get edge"))
```
Would it be easier to call `pf.NewFuncRequest(func() { return errors.Errorf() })` here?
If I understand correctly, this would only fail the sender pipe and not the receiver, which is what we need to fail in `NewInputRequest()`.
Not sure I understand.
Said differently, `pf.NewFuncRequest(func() { return errors.Errorf() })` doesn't return an errored `pipe.Receiver`. If you call `.Status().Err` on it, you get nil.
> If you call `.Status().Err` on it, you get nil

After you call `Receive()`, do you still get nil?
Yes. Maybe I'm not getting something right :)
```go
func TestPanic(t *testing.T) {
	s := NewSolver(SolverOpt{
		ResolveOpFunc: testOpResolver,
	})
	defer s.Close()
	e := edge{}
	pf := &pipeFactory{s: s.s, e: &e}
	receivedPipe := pf.NewFuncRequest(func(_ context.Context) (interface{}, error) { return nil, errors.Errorf("failed to get edge") })
	receivedPipe.Receive()
	require.Error(t, receivedPipe.Status().Err)
}
```
We only need an errored pipe in this one place, so I have nothing against dropping the special-purpose errored pipe structure I introduced. On the other hand, it's quite simple to understand.
The event is already captured by the solver internally.
```go
func TestPanic(t *testing.T) {
	f := func(_ context.Context) (interface{}, error) {
		return nil, errors.Errorf("failed to get edge")
	}
	p, start := pipe.NewWithFunction(f)
	go start()
	for {
		if !p.Receiver.Receive() {
			time.Sleep(time.Millisecond) // in real code this is achieved with a signal
			continue
		}
		require.Error(t, p.Receiver.Status().Err)
		break
	}
}
```
In your example, you could replace `receivedPipe.Receive()` with `time.Sleep(time.Second)`. The function runs asynchronously, and `Receive()` does not block. You can't get the actual function completion time or wait on `Receive()`, as these events are already captured internally in the scheduler's loop thread.
Thanks for the explanation!
Would either of you be able to review my comments in #2303 to help me pinpoint the root cause of this error when it occurs in our CI environment? Many thanks!
The DCO check is failing. Also, please squash the commits.
Signed-off-by: Maxime Lagresle <[email protected]>
Would this change work if patched onto a slightly older version of buildkit? I have patched it into the version I'm running in our CI and now get the following panic:
@jamesalucas, can you try this instead (ae8af5c)? Maybe the handling around the pipe was different before, or you're hitting another bug. I'll keep an eye on crashes and on the stack trace you posted.
@maxlaverse, it looks like this needs a follow-up fix. In https://github.com/moby/buildkit/blob/master/solver/edge.go#L461 the value needs to be … Not quite sure why the error says "interface {} is nil", though. You might want to test this by adding code that manually produces …
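In Go, one common source of an `interface {} is nil` message is an unchecked single-value type assertion on a nil interface value. Whether that is what happens at the linked line is an assumption; the sketch below only shows the general pattern, and the two-value ("comma ok") form that reports the problem as an error instead of panicking. `edgeRequest` here is a local stand-in type that only borrows the name, and `toEdgeRequest` is a hypothetical helper.

```go
package main

import "fmt"

// edgeRequest is a hypothetical stand-in type used only in this sketch.
type edgeRequest struct{ id int }

// toEdgeRequest uses the two-value type assertion, which sets ok to false
// instead of panicking when v is nil or holds a different type.
func toEdgeRequest(v interface{}) (*edgeRequest, error) {
	req, ok := v.(*edgeRequest)
	if !ok {
		return nil, fmt.Errorf("unexpected request type %T", v)
	}
	return req, nil
}

func main() {
	var v interface{} // a nil interface value

	if _, err := toEdgeRequest(v); err != nil {
		fmt.Println("error:", err)
	}

	// The single-value form panics on a nil interface; recover prints the
	// runtime's interface-conversion message.
	func() {
		defer func() {
			if r := recover(); r != nil {
				fmt.Println("panic:", r)
			}
		}()
		_ = v.(*edgeRequest)
	}()
}
```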
Currently, buildkitd crashes with a panic when it can't get an edge in `solver.NewInputRequest()`. This is unfortunate, as it affects all the other builds running on the daemon at the same time.

As suggested by a comment left in the code, returning an errored pipe is much more elegant: only the problematic build fails, with a `return leaving incoming open` error, instead of the whole daemon crashing.

See #2303
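The motivation can be illustrated with a small sketch: each build resolves its edge in its own goroutine, and a missing edge is returned as an error, so only that build fails while the others complete. In Go, an unrecovered panic in any goroutine terminates the whole process, which is why a `panic` in the scheduler takes down every build on the daemon. The `edges` map and `resolve` function are hypothetical and do not model buildkit's scheduler.

```go
package main

import (
	"fmt"
	"sync"
)

// edges is a hypothetical lookup table standing in for the scheduler's edge
// index; a missing entry models the "failed to get edge" condition.
var edges = map[string]string{"build-a": "edge-a", "build-c": "edge-c"}

// resolve returns an error for a missing edge instead of panicking, so the
// failure stays local to the build that triggered it.
func resolve(build string) (string, error) {
	e, ok := edges[build]
	if !ok {
		return "", fmt.Errorf("%s: failed to get edge", build)
	}
	return e, nil
}

func main() {
	builds := []string{"build-a", "build-b", "build-c"}
	errs := make([]error, len(builds))

	var wg sync.WaitGroup
	for i, b := range builds {
		wg.Add(1)
		go func(i int, b string) { // one goroutine per build
			defer wg.Done()
			_, errs[i] = resolve(b) // each goroutine writes only its own slot
		}(i, b)
	}
	wg.Wait()

	for i, b := range builds {
		fmt.Println(b, errs[i])
	}
}
```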