panic happened when add podgroup watch #1679

Closed
Crazybean-lwb opened this issue Nov 1, 2022 · 0 comments · Fixed by #1680

Crazybean-lwb commented Nov 1, 2022

kubeadm version: v1.20.11
k8s version: 1.20.11
I built a training-operator image that includes this merged PR.
PyTorchJob can now use the gang scheduler. However, a nil pointer dereference occurs when the training controller builds or watches PyTorch jobs.
I traced the panic to the statement that logs the event object at https://github.com/kubeflow/training-operator/blob/master/pkg/common/util/reconciler.go#L113
Some exception may be occurring there.


The errors are as follows:

time="2022-10-31T11:55:15Z" level=info msg="PyTorchJob=ddp-vol, ReplicaType=Worker expected=1, running=1, succeeded=0, failed=0, Replicas=1"
E1031 11:56:00.164678       1 runtime.go:79] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
goroutine 796 [running]:
k8s.io/apimachinery/pkg/util/runtime.logPanic({0x185a700?, 0x2a0c1a0})
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:75 +0x99
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000064150?})
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:49 +0x75
panic({0x185a700, 0x2a0c1a0})
	/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/sirupsen/logrus.(*Entry).Logf(0xc0003dca00?, 0x1213940?, {0x1ade9f4?, 0xc000f92050?}, {0xc000d29b48?, 0xc00003c000?, 0x47?})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:337 +0x22
github.com/sirupsen/logrus.(*Entry).Debugf(...)
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:347
github.com/kubeflow/training-operator/pkg/common/util.OnDependentUpdateFunc.func1({{0x1d5b0d0?, 0xc00045e000?}, {0x1d5b0d0?, 0xc000624d80?}})
	/workspace/pkg/common/util/reconciler.go:113 +0x4fd
sigs.k8s.io/controller-runtime/pkg/predicate.Funcs.Update(...)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/predicate/predicate.go:88
sigs.k8s.io/controller-runtime/pkg/source/internal.EventHandler.OnUpdate({{0x1d445a0, 0xc0002c5c20}, {0x1d4d7c8, 0xc00099c1a0}, {0xc0008a0d70, 0x1, 0x1}}, {0x1a6fb40?, 0xc00045e000}, {0x1a6fb40, ...})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/internal/eventsource.go:88 +0x43d
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:816 +0xf7
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00046ea20?)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007bf738?, {0x1d2b040, 0xc000722c00}, 0x1, 0xc00091a540)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00046ed50?, 0x3b9aca00, 0x0, 0x90?, 0xc0005a0420?)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90
k8s.io/client-go/tools/cache.(*processorListener).run(0xc00036c280?)
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:812 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:73 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71 +0x85
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x14eb0c2]

goroutine 796 [running]:
k8s.io/apimachinery/pkg/util/runtime.HandleCrash({0x0, 0x0, 0xc000064150?})
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/runtime/runtime.go:56 +0xd7
panic({0x185a700, 0x2a0c1a0})
	/usr/local/go/src/runtime/panic.go:884 +0x212
github.com/sirupsen/logrus.(*Entry).Logf(0xc0003dca00?, 0x1213940?, {0x1ade9f4?, 0xc000f92050?}, {0xc000d29b48?, 0xc00003c000?, 0x47?})
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:337 +0x22
github.com/sirupsen/logrus.(*Entry).Debugf(...)
	/go/pkg/mod/github.com/sirupsen/[email protected]/entry.go:347
github.com/kubeflow/training-operator/pkg/common/util.OnDependentUpdateFunc.func1({{0x1d5b0d0?, 0xc00045e000?}, {0x1d5b0d0?, 0xc000624d80?}})
	/workspace/pkg/common/util/reconciler.go:113 +0x4fd
sigs.k8s.io/controller-runtime/pkg/predicate.Funcs.Update(...)
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/predicate/predicate.go:88
sigs.k8s.io/controller-runtime/pkg/source/internal.EventHandler.OnUpdate({{0x1d445a0, 0xc0002c5c20}, {0x1d4d7c8, 0xc00099c1a0}, {0xc0008a0d70, 0x1, 0x1}}, {0x1a6fb40?, 0xc00045e000}, {0x1a6fb40, ...})
	/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/source/internal/eventsource.go:88 +0x43d
k8s.io/client-go/tools/cache.(*processorListener).run.func1()
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:816 +0xf7
k8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1(0xc00046ea20?)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:155 +0x3e
k8s.io/apimachinery/pkg/util/wait.BackoffUntil(0xc0007bf738?, {0x1d2b040, 0xc000722c00}, 0x1, 0xc00091a540)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:156 +0xb6
k8s.io/apimachinery/pkg/util/wait.JitterUntil(0xc00046ed50?, 0x3b9aca00, 0x0, 0x90?, 0xc0005a0420?)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:133 +0x89
k8s.io/apimachinery/pkg/util/wait.Until(...)
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:90
k8s.io/client-go/tools/cache.(*processorListener).run(0xc00036c280?)
	/go/pkg/mod/k8s.io/[email protected]/tools/cache/shared_informer.go:812 +0x6b
k8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1()
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:73 +0x5a
created by k8s.io/apimachinery/pkg/util/wait.(*Group).Start
	/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:71 +0x85
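
The trace above ends in `(*Entry).Logf`, called from the update predicate at reconciler.go:113. One failure mode that would be consistent with this (an assumption, not confirmed by the trace itself) is a logger pointer that is only assigned for some object kinds and stays nil for others, such as a PodGroup. The sketch below reproduces that pattern with the standard library only. The `entry` type, the `pod`, `service`, and `podGroup` kinds, and the `loggerFor` and `safeLoggerFor` helpers are all hypothetical stand-ins, not the training-operator or logrus code.

```go
package main

import "fmt"

// entry is a hypothetical stand-in for a logger entry. Its method reads a
// field through the receiver, so calling it on a nil pointer panics.
type entry struct {
	prefix string
}

// Debugf formats a message and returns it (instead of writing it) so the
// behavior is easy to check.
func (e *entry) Debugf(format string, args ...interface{}) string {
	return e.prefix + fmt.Sprintf(format, args...)
}

// Hypothetical object kinds delivered to an update predicate.
type pod struct{ name string }
type service struct{ name string }
type podGroup struct{ name string }

// loggerFor mimics a type switch that only handles some kinds. For any
// other kind (for example *podGroup) it returns nil.
func loggerFor(obj interface{}) *entry {
	switch o := obj.(type) {
	case *pod:
		return &entry{prefix: "pod=" + o.name + " "}
	case *service:
		return &entry{prefix: "service=" + o.name + " "}
	}
	return nil
}

// safeLoggerFor falls back to a generic entry so callers never hold nil.
func safeLoggerFor(obj interface{}) *entry {
	if l := loggerFor(obj); l != nil {
		return l
	}
	return &entry{prefix: "generic "}
}

// panics reports whether calling f triggers a panic.
func panics(f func()) (p bool) {
	defer func() {
		if recover() != nil {
			p = true
		}
	}()
	f()
	return false
}

func main() {
	pg := &podGroup{name: "ddp-vol"}

	// Unguarded: the nil entry is dereferenced inside Debugf.
	fmt.Println(panics(func() { loggerFor(pg).Debugf("update %s", pg.name) }))

	// Guarded: the fallback entry is used for the unhandled kind.
	fmt.Println(safeLoggerFor(pg).Debugf("update %s", pg.name))
}
```

If this is indeed the cause, giving the logger a non-nil default before the type switch (or adding a case for the new kind) would avoid the panic when a watch on a new resource type is added.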

Originally posted by @liuweibin6566396837 in #1666 (comment)
