Relaunch jobs do not get queued when instances disabled #14365
Hello, may we ask why you are disabling the instances for a given job template? We would like to gain a better understanding of this particular use case. Thank you for your time!
Hi, thanks for your quick response!

When we disable instances: We disable all instances (not just for a particular job template, but in general) when updating to a more recent AWX version, in order not to interrupt our customers' jobs in the process.

Benefits for us: Jobs that are triggered during our update process are enqueued ("Pending" state) and executed after re-enabling the instances (in our case: a complete AWX redeployment). Only relaunched jobs run into the error described above while instances are disabled.

More on why we disable instances: Updating for us means updating the AWX Operator and redeploying AWX using that newer Operator, which we trigger explicitly because of staging. Of course we would much rather update in a more Kubernetes-native way and use a rolling update strategy (replacing old pods one by one) instead of disabling and redeploying, but as far as we know that is not yet possible: awx-operator/issues/1275 and awx-operator/issues/1362. But maybe you have some helpful input on that for us, too? :)
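For reference, a minimal sketch of how instances might be disabled ahead of such an upgrade via the AWX REST API. The /api/v2/instances/ endpoint and its enabled field are part of the API; the host, token, and lack of pagination handling below are illustrative assumptions, not part of this report.

```python
# Illustrative sketch: disable every AWX instance via the REST API before an
# upgrade so that newly launched jobs wait in "Pending" until instances are
# re-enabled again. AWX_URL and TOKEN are placeholders for your environment.
import requests

AWX_URL = "https://awx.example.com"   # placeholder hostname
TOKEN = "REPLACE_ME"                  # placeholder OAuth2 token
HEADERS = {"Authorization": f"Bearer {TOKEN}", "Content-Type": "application/json"}

# List instances (pagination is ignored here for brevity).
resp = requests.get(f"{AWX_URL}/api/v2/instances/", headers=HEADERS)
resp.raise_for_status()

for instance in resp.json()["results"]:
    # PATCH the "enabled" field so the instance stops picking up new work.
    r = requests.patch(
        f"{AWX_URL}/api/v2/instances/{instance['id']}/",
        headers=HEADERS,
        json={"enabled": False},
    )
    r.raise_for_status()
    print(f"disabled {instance['hostname']}")
```

The same PATCH with `{"enabled": true}` re-enables the instances after the redeployment.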
@2and3makes23 Thank you so much for providing this additional information! This is extremely helpful. Could you please also provide us with the traceback logs that are generated when this occurs? This will be very helpful to us. Thank you again for taking the time to provide all of this information!
I had a look and I was not able to reproduce this issue. Jobs can be relaunched even when all instances are disabled, and the relaunched job goes into "pending" as expected.
Sorry for the delay. @AlanCoding, thanks for checking on your side. @djyasin, please find the log output below, produced for one event of a user clicking job relaunch while all (two) instances are disabled.
Thanks, that points to some relatively recent code, so this is good information: awx/awx/main/dispatch/__init__.py, line 37 (at commit 56230ba).
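For readers without the permalink, here is a hypothetical illustration of the failure pattern the traceback points at. The Instance fields and the pick_task_queue helper are invented for illustration and are not the actual AWX code at that line.

```python
# Hypothetical sketch (not the actual AWX source): when a web pod needs to
# submit a relaunch task, it picks a queue named after an enabled
# control-plane instance. With every instance disabled there is no candidate,
# a ValueError is raised, and the user sees an internal server error.
from dataclasses import dataclass
import random


@dataclass
class Instance:
    hostname: str
    node_type: str
    enabled: bool


def pick_task_queue(instances):
    """Return the hostname of an enabled control-plane instance to route a task to."""
    candidates = [i for i in instances if i.enabled and i.node_type in ("control", "hybrid")]
    if not candidates:
        # The failure mode from the traceback: nothing to choose from, so the
        # relaunch errors out instead of leaving the job in "Pending".
        raise ValueError("no enabled instances available to receive the task")
    return random.choice(candidates).hostname


# With every instance disabled, the selection fails and the error propagates.
nodes = [Instance("awx-1", "control", False), Instance("awx-2", "hybrid", False)]
try:
    pick_task_queue(nodes)
except ValueError as exc:
    print(f"relaunch would fail with: {exc}")
```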
I didn't give enough information in my last comment: the ValueError is hit because of the code referenced above. I did not hit this bug in my replication attempt because I was using a hybrid node, which submits tasks locally. Only web pods use this code. This is obviously valid and should get worked on.
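A hypothetical sketch of the distinction described here, building on the Instance and pick_task_queue definitions from the sketch above. The submit_relaunch function and its attributes are illustrative assumptions, not the actual AWX dispatch API.

```python
# Hypothetical sketch: a hybrid node runs a dispatcher locally and can submit
# the relaunch task to itself, while a web-only pod must pick some other
# enabled control instance -- which is where the ValueError is triggered when
# all instances are disabled. Reuses pick_task_queue from the sketch above.
def submit_relaunch(this_node, instances):
    if this_node.node_type == "hybrid":
        # Hybrid nodes host their own dispatcher, so the task is queued
        # locally and the job simply waits in "Pending".
        return this_node.hostname
    # Web pods have no local dispatcher and must route to an enabled control
    # node; with everything disabled this raises, as in the reported traceback.
    return pick_task_queue(instances)
```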
Thanks for looking into this, we really appreciate it ❤️
Bug Summary
When all AWX instances are disabled and a previously run job gets relaunched, the following things happen:
AWX version
22.5.0
Select the relevant components
Installation method
openshift
Modifications
no
Ansible version
2.12.10
Operating system
CentOS, RHEL
Web browser
Firefox
Steps to reproduce
Expected results
The relaunched job appears under Jobs with status "Pending" and starts as soon as an AWX instance is re-enabled and picks it up.
Actual results
An internal server error is returned.
The job appears under Jobs with status "New" and is stuck there indefinitely.
Re-enabling instances does not change the state of the relaunched job.
Additional information
After re-enabling AWX instances everything works fine again, including job relaunch.
Only the "New" job stays stuck.