Bug: Deadline Config File Issue on Bundle Submission #386
Hey, thanks for the bug report! For a little more context, what code path is your submission going through? There are a few spots where this can pop up, and some allow you to bypass the setting.
My hunch here is that if it's an interactive submission with defaults, then we should set the value to make it easier for users to inspect their job submissions. Otherwise, if we're doing batch/background operations, we should skip updating it.
Yes, it's a submission using the CLI with a generated job bundle directory, e.g. /tmp/xyz123abc, using the defaults in the config file, because there is no parameter for a storage profile ID, which is now required. Otherwise I would just pass config overrides. Is there an example of batch/background job submission?
Fixes: aws-deadline#386 Problem: The customer reports that the config file can get clobbered when running many bundle submit commands in parallel. When clobbered, the config file will only contain the job-id for the submitted job; all of the farm, queue, etc. information will be gone. Solution: A standard pattern for concurrent file modification is to write changes to a temp file, and then move that temp file over the config file via a filesystem rename operation. The rename is atomic, which prevents the file content from being clobbered. Signed-off-by: Daniel Neilson <[email protected]>
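The write-temp-then-rename pattern described in the commit message can be sketched as follows. This is an illustrative standalone helper, not the actual library code: `atomic_write_config` is a hypothetical name, and JSON stands in for the INI-style format the real config file uses.

```python
import json
import os
import tempfile


def atomic_write_config(path: str, settings: dict) -> None:
    """Write the new settings to a temp file in the same directory as the
    config file, then rename it over the config file. os.replace() is atomic
    on both POSIX and Windows, so a reader never observes a partially
    written file and concurrent writers cannot interleave their bytes."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem as the target,
    # otherwise the rename degrades to a non-atomic copy.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name, prefix=".config-")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(settings, f)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes hit disk first
        os.replace(tmp_path, path)  # the atomic rename
    except BaseException:
        os.unlink(tmp_path)  # don't leave stray temp files behind
        raise
```

Note that the rename only prevents torn or partial writes; the last writer still wins, so read-modify-write races on individual keys would additionally need a lock if they mattered.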
Thanks for the bug report. The fix has been merged and will go out with the next release. We're also adding a related change; that should go out at the same time.
Expected Behaviour
Execute Deadline Cloud job bundle submissions in parallel to speed up the job submission process, without generating any CLI errors.
Current Behaviour
When trying to submit twenty job bundles in parallel, in batches of five jobs, the Deadline CLI starts throwing errors. It appears that after each job submission the Deadline CLI writes the job ID to the .deadline/config file. When submitting jobs in parallel, there is likely contention when updating that file, resulting in a state where all the values for farm_id, queue_id, and storage_profile_id are missing, so the next job fails. A workaround is to use the submit command parameters, e.g. "--farm-id", etc., but storage_profile_id is not available as a parameter, so any job needing to upload a file can't be automated, as it triggers a prompt.
Reproduction Steps
Ensure the .deadline/config file is correctly set up with a profile, farm_id, queue_id, and storage_profile_id. Then use openjd.model to generate a job bundle and a ProcessPoolExecutor to submit those jobs in parallel, in batches of 5, calling the Deadline CLI from a Python subprocess, e.g. "deadline bundle submit --yes -p InFile=/tmp/test_script.py /tmp/tmpy2f5jdu8". Here the CLI uses the config file to know which farm/queue to submit the bundle to.
Sample .deadline/config file:
Eventually you'll get some version of a CalledProcessError when the Deadline CLI fails to submit the job, e.g.
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'deadline bundle submit --yes -p InFile=/tmp/test_script.py /tmp/tmpy2f5jdu8' returned non-zero exit status 1.
When you look at the config file, it now reads as follows, with all the other configuration values missing and only the job ID of the last successfully submitted job remaining. No further job submission works unless you use the CLI options or fix the config file.
Code Snippet
from concurrent.futures import ALL_COMPLETED, ProcessPoolExecutor, wait

with ProcessPoolExecutor(max_workers=5) as executor:
    futures = set()
    for x in range(20):
        futures.add(executor.submit(submit_job))
        if (x + 1) % 5 == 0:
            # Wait for the current batch of 5 to finish before starting the next
            done, futures = wait(futures, return_when=ALL_COMPLETED)
            logger.info("next batch....")
            futures.clear()
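For completeness, a minimal sketch of what the submit_job callable in the snippet above might look like. The helper name and the cmd parameter are assumptions for illustration; only the CLI command itself comes from the reproduction steps.

```python
import subprocess

# The command from the reproduction steps; in practice the bundle
# directory (/tmp/tmpy2f5jdu8) would be generated per job.
DEFAULT_CMD = [
    "deadline", "bundle", "submit", "--yes",
    "-p", "InFile=/tmp/test_script.py",
    "/tmp/tmpy2f5jdu8",
]


def submit_job(cmd=None) -> str:
    """Run one Deadline CLI submission in a subprocess.

    check=True makes subprocess.run raise CalledProcessError on a
    non-zero exit status, which is exactly the failure reported above
    once the config file has been clobbered."""
    result = subprocess.run(
        cmd or DEFAULT_CMD,
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout
```

Because each submission runs in its own process and the CLI rewrites .deadline/config on success, five of these running concurrently race on the same file, which is the contention described in Current Behaviour.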