Check whether State.Waiting object exists #619
Merged
Issue: When running AWX-Operator 0.16.1 with AWX 19.5.1, the AWX-EE container panicked whenever a job was run, causing the job to fail:
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x11baab8]
goroutine 185 [running]:
github.com/ansible/receptor/pkg/workceptor.podRunningAndReady.func1({{0x146846c, 0xc0005536b8}, {0x162e980, 0xc00030b800}})
/source/pkg/workceptor/kubernetes.go:97 +0x258
k8s.io/client-go/tools/watch.UntilWithoutRetry({0x16414c8, 0xc000380300}, {0x1630028, 0xc00049e360}, {0xc000553958, 0x1, 0x8})
/root/go/pkg/mod/k8s.io/[email protected]/tools/watch/until.go:82 +0x397
k8s.io/client-go/tools/watch.UntilWithSync({0x16414c8, 0xc000380300}, {0x16304d8, 0xc0004c8198}, {0x162e980, 0xc00050e800}, 0x0, {0xc00012d958, 0x1, 0x1})
/root/go/pkg/mod/k8s.io/[email protected]/tools/watch/until.go:153 +0x245
github.com/ansible/receptor/pkg/workceptor.(*kubeUnit).createPod(0xc000344c60, 0x0)
/source/pkg/workceptor/kubernetes.go:231 +0xabb
github.com/ansible/receptor/pkg/workceptor.(*kubeUnit).runWorkUsingLogger(0xc000344c60)
/source/pkg/workceptor/kubernetes.go:272 +0x85
created by github.com/ansible/receptor/pkg/workceptor.(*kubeUnit).startOrRestart
/source/pkg/workceptor/kubernetes.go:823 +0xdb
I confirmed that the awx-ee image was based on the latest one available from quay.io. According to the action history in the ansible/awx-ee repository, that image was built using the latest devel image of Receptor.
Within the podRunningAndReady function, a loop checks the status of each non-ready container in a non-ready pod, but the code assumes that every such container is in the Waiting state. Per the k8s API documentation at https://pkg.go.dev/k8s.io/api/core/v1#ContainerState, a container has only one state set at a time (Waiting, Running, or Terminated), and the fields for the other states are nil. In my case, the container was failing its readiness check because of networking issues, even though Receptor communication over a unix socket worked correctly. The container was therefore not ready, but its state was Running, not Waiting, so ContainerStatus.State.Waiting was nil and the access to ContainerStatus.State.Waiting.Reason panicked with a nil pointer dereference.
In this PR, I have wrapped the faulting code in a check that ContainerStatus.State.Waiting is not nil, so that the code is skipped whenever the container state is not Waiting.
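For reviewers who want to see the shape of the guard in isolation, here is a minimal sketch. It is not the actual diff: the types below are simplified stand-ins for the ones in k8s.io/api/core/v1 so that the example runs without dependencies, and the `waitingReasons` helper and the sample statuses are hypothetical.

```go
package main

import "fmt"

// Simplified stand-ins (assumption) for the k8s.io/api/core/v1 types.
// As in the real API, at most one of the three state pointers is non-nil.
type ContainerStateWaiting struct{ Reason string }
type ContainerStateRunning struct{}
type ContainerStateTerminated struct{ Reason string }

type ContainerState struct {
	Waiting    *ContainerStateWaiting
	Running    *ContainerStateRunning
	Terminated *ContainerStateTerminated
}

type ContainerStatus struct {
	Name  string
	Ready bool
	State ContainerState
}

// waitingReasons is a hypothetical helper: it collects the Waiting reason
// of every non-ready container, skipping containers whose state is not
// Waiting instead of dereferencing a nil pointer.
func waitingReasons(statuses []ContainerStatus) []string {
	var reasons []string
	for _, cstat := range statuses {
		if cstat.Ready {
			continue
		}
		// The guard: a non-ready container may be Running (for example,
		// while failing its readiness probe), in which case Waiting is nil.
		if cstat.State.Waiting == nil {
			continue
		}
		reasons = append(reasons, cstat.State.Waiting.Reason)
	}
	return reasons
}

func main() {
	statuses := []ContainerStatus{
		// Not ready but Running: the case that caused the panic.
		{Name: "worker", State: ContainerState{Running: &ContainerStateRunning{}}},
		// Not ready and Waiting: the case the original code expected.
		{Name: "sidecar", State: ContainerState{Waiting: &ContainerStateWaiting{Reason: "ImagePullBackOff"}}},
	}
	fmt.Println(waitingReasons(statuses))
}
```

Without the nil check, the first entry would trigger the same nil pointer dereference shown in the stack trace above.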