"DWS/Rabbit interactions failed" exception causes job to get stuck indefinitely in dws-epilog #231

Open
grondo opened this issue Oct 23, 2024 · 0 comments

grondo commented Oct 23, 2024

On elcap we had a job stuck in the dws-epilog after a "DWS/Rabbit interactions failed" exception.

[Oct23 07:34] finish status=0
[  +0.007790] epilog-start description="job-manager.epilog"
[  +0.007843] epilog-start description="dws-epilog"
[  +0.144160] release ranks="all" final=true
[  +2.011065] epilog-finish description="job-manager.epilog" status=0
[Oct23 07:39] exception type="exception" severity=0 note="DWS/Rabbit interactions failed" userid=765
[Oct23 08:24] exception type="cancel" severity=0 note="" userid=62120
[Oct23 08:50] exception type="cancel" severity=0 note="" userid=62120

The job required manual cleanup by posting the epilog-finish event for dws-epilog. I'm not sure if this was the right approach, but perhaps the dws-epilog should somehow be canceled by the "DWS/Rabbit interactions failed" exception.
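
To illustrate the stuck state, here is a minimal Python sketch that scans eventlog text in the human-readable form shown above and reports every epilog-start that has no matching epilog-finish. The function name `outstanding_epilogs` and both regular expressions are hypothetical and not part of any Flux API; the sketch assumes the bracketed-timestamp layout of the excerpt above and nothing more.

```python
import re

# Assumed line layout: "[timestamp] event-name key=value ..."
EVENT_RE = re.compile(r'^\[[^\]]*\]\s+(?P<name>[\w-]+)(?P<context>.*)$')
DESC_RE = re.compile(r'description="([^"]*)"')


def outstanding_epilogs(eventlog_text):
    """Return descriptions of epilogs that started but never finished."""
    pending = []
    for line in eventlog_text.splitlines():
        event = EVENT_RE.match(line.strip())
        if not event:
            continue
        desc = DESC_RE.search(event.group("context"))
        if not desc:
            continue
        name = event.group("name")
        if name == "epilog-start":
            pending.append(desc.group(1))
        elif name == "epilog-finish" and desc.group(1) in pending:
            pending.remove(desc.group(1))
    return pending


# Eventlog excerpt copied from this issue.
EVENTLOG = '''
[Oct23 07:34] finish status=0
[  +0.007790] epilog-start description="job-manager.epilog"
[  +0.007843] epilog-start description="dws-epilog"
[  +0.144160] release ranks="all" final=true
[  +2.011065] epilog-finish description="job-manager.epilog" status=0
[Oct23 07:39] exception type="exception" severity=0 note="DWS/Rabbit interactions failed" userid=765
[Oct23 08:24] exception type="cancel" severity=0 note="" userid=62120
[Oct23 08:50] exception type="cancel" severity=0 note="" userid=62120
'''

if __name__ == "__main__":
    print(outstanding_epilogs(EVENTLOG))
```

For the excerpt above, the only outstanding epilog is dws-epilog: neither the "DWS/Rabbit interactions failed" exception nor the two later cancel exceptions is followed by an epilog-finish event for it.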
