Fail kubernetes work units if image pull error #521

Closed
kdelee opened this issue Jan 18, 2022 · 0 comments · Fixed by #522

kdelee commented Jan 18, 2022

In AWX I'm experimenting with longer pending timeouts, because I want to be able to use resource requests and let jobs get scheduled as resources become available.

But that means that I want Pending jobs to really be Pending.

We have a test case where we try to start a job with an image we know is not pullable, because it requires a pull secret that we deliberately don't configure correctly.

The expectation is that this job fails. Currently, it only fails once the pod pending timeout is exceeded, but it should fail sooner.

Example:
This work unit has the status Pending:

'wyclp8oe': {'Detail': 'Pod created',
              'ExtraData': {'Command': '',
                            'Image': '',
                            'KubeConfig': '',
                            'KubeNamespace': 'dmesseahl2',
                            'KubePod': '',
                            'Params': '',
                            'PodName': 'automation-job-1542-6f6sp'},
              'State': 0,
              'StateName': 'Pending',
              'StdoutSize': 0,
              'WorkType': 'kubernetes-runtime-auth'},

But the pod status in Kubernetes is:

NAME                        READY   STATUS             RESTARTS   AGE
automation-job-1542-6f6sp   0/1     ImagePullBackOff   0          25m

This type of error generally does not resolve itself, so we should move the work unit status to Error with the detail "Image pull backoff".
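To make the proposal concrete, here is a rough sketch of the check in Python. It is not receptor's code: the function name `image_pull_failure` and the set of reasons treated as fatal are assumptions for illustration. It only relies on the pod JSON shape that the Kubernetes API reports, where a stuck container shows up under `status.containerStatuses[].state.waiting.reason`.

```python
from typing import Optional

# Waiting reasons treated as fatal in this sketch. This set is an assumption,
# not a list taken from receptor.
FATAL_WAITING_REASONS = {"ImagePullBackOff", "ErrImagePull"}


def image_pull_failure(pod: dict) -> Optional[str]:
    """Return the waiting reason if a container is stuck on an image pull, else None."""
    status = pod.get("status") or {}
    # Init containers can hit the same pull errors, so inspect both lists.
    statuses = (status.get("initContainerStatuses") or []) + (
        status.get("containerStatuses") or []
    )
    for container_status in statuses:
        waiting = (container_status.get("state") or {}).get("waiting") or {}
        reason = waiting.get("reason")
        if reason in FATAL_WAITING_REASONS:
            return reason
    return None


# Hypothetical pod object shaped like the Kubernetes API response for the pod above.
pod = {
    "metadata": {"name": "automation-job-1542-6f6sp"},
    "status": {
        "phase": "Pending",
        "containerStatuses": [
            {"name": "worker", "state": {"waiting": {"reason": "ImagePullBackOff"}}}
        ],
    },
}

reason = image_pull_failure(pod)
if reason is not None:
    print(f"fail work unit: {reason}")
```

A pod that is Pending only because it is waiting to be scheduled has no matching waiting reason, so it would stay Pending. Whether `ErrImagePull` should fail the work unit immediately, or only `ImagePullBackOff`, is a judgment call, since the former can appear on a first attempt before Kubernetes retries.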
