Handle multiple set of job logs for restarted jobs #552

Open
vitodb opened this issue Mar 13, 2024 · 0 comments
Labels
enhancement New feature or request

Comments

@vitodb
Contributor

vitodb commented Mar 13, 2024

Some jobs configured with the AutoRelease feature try to copy back their logs on both runs, but the second copy fails because ifdh cannot overwrite an existing file.

An example job is [email protected]
The job was part of POMS4_SUBMISSION_ID:1712364.
Fifebatch Events details show the job got held and released.
IFDH logs for the job show the log copy back failed the second time:

gfal-copy error: 17 (File exists) - Destination https://[redacted]/fermigrid/jobsub/jobs/2024_03_12/6f20c05e-8023-4248-966f-0233d5a3c089/fife_wrap2024_03_12_1822276f20c05e-8023-4248-966f-0233d5a3c089cluster.67623825.0.err exists and overwrite is not set

Kibana reports the job with exit code 0, but the stdout log shows:

executable was killed: exiting 1
Wed Mar 13 07:20:24 UTC 2024 fife_wrap COMPLETED with exit status 1

which is confusing.
This happens because the log is from the first time the job ran, while the exit state shown in Kibana is possibly from the second time the job ran.

As discussed at the Jobsub weekly meeting, we could use the NumJobStarts ClassAd attribute, or something similar, as a suffix for the log filename. That would separate the logs for each time the job is restarted, so that all of them can be copied back and possibly made available to users.
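
A minimal sketch of the suffix idea, assuming the wrapper can read the job's NumJobStarts value before it copies logs back. The helper name `log_name_for_run` and the `.startN` pattern are hypothetical and only illustrate one possible naming scheme; they are not existing jobsub or ifdh behavior.

```python
from pathlib import PurePosixPath


def log_name_for_run(base_name: str, num_job_starts: int) -> str:
    """Return base_name with a per-start tag inserted before the extension.

    num_job_starts is assumed to hold the job's NumJobStarts ClassAd value.
    The ".startN" tag is a hypothetical convention, not an existing one.
    """
    if num_job_starts < 0:
        raise ValueError("num_job_starts must be non-negative")
    path = PurePosixPath(base_name)
    # Only the last extension is treated as the suffix, so
    # "cluster.67623825.0.err" becomes "cluster.67623825.0.start2.err".
    tagged = f"{path.stem}.start{num_job_starts}{path.suffix}"
    return str(path.with_name(tagged))


# Each run gets its own destination name, so a second copy would not
# collide with the file left by the first run.
first = log_name_for_run("cluster.67623825.0.err", 1)
second = log_name_for_run("cluster.67623825.0.err", 2)
print(first)
print(second)
```

Placing the tag before the extension keeps `.err` and `.out` recognizable to existing tools; appending it at the very end of the name would work equally well for avoiding the "File exists" failure.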

@shreyb shreyb added the enhancement New feature or request label Mar 15, 2024
