Add load lock file to prevent accidental re-loading of data to BQ #7138
@@ -338,6 +350,9 @@ task LoadTable {
echo "no ${FILES} files to process in $DIR"
fi

# remove load lock
echo "Removing load lock"
gsutil rm "${DIR}${LOCKFILE}" || exit 1
just wondering how we deal with this file still being there if the workflow crashes? Even if it's just a "here's why I'm not worried"
I think it's a manual process - you have to go look and figure out what you want to do, based on what had happened when it crashed (e.g. you'd do something different if the bq load went through versus if it didn't). if you do want to re-run, you'd manually delete the lockfile. but definitely open to other ideas for processes? we talked a little bit in standup about how we could try to get fancy and infer the state of things, but landed that for now it could be a manual process of just preventing the bad thing (double loading) and requiring some manual investigation & remedying.
maybe just paraphrase some of that and leave it as a comment for our future selves
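For our future selves, the manual recovery described in this thread might look something like the sketch below. The helper names and the `LOCK_CAT` / `LOCK_RM` variables are hypothetical and not part of this PR; they default to `gsutil cat` and `gsutil rm` but can be pointed at local commands for a dry run. Nothing here tries to infer whether the bq load went through; that investigation stays manual.

```shell
#!/usr/bin/env bash
# Hypothetical operator helpers for a lock left behind by a crashed run.
# Not part of the PR: names and variables are illustrative only.
LOCK_CAT="${LOCK_CAT:-gsutil cat}"   # command used to read the lock file
LOCK_RM="${LOCK_RM:-gsutil rm}"      # command used to delete the lock file

# Print the run UUID recorded in the lock so the crashed run can be looked up.
show_load_lock() {
  local lock_path="$1"
  # Word splitting on LOCK_CAT is intentional ("gsutil cat" is two words).
  ${LOCK_CAT} "${lock_path}"
}

# Delete the lock, but only when the operator repeats the UUID they
# investigated, so a lock from some other run is not removed by mistake.
clear_load_lock() {
  local lock_path="$1" investigated_uuid="$2" holder
  holder="$(show_load_lock "${lock_path}")" || return 1
  if [ "${holder}" != "${investigated_uuid}" ]; then
    echo "lock is held by ${holder}; not removing" >&2
    return 1
  fi
  ${LOCK_RM} "${lock_path}"
}
```

An operator would first run `show_load_lock` on the lock path, check what state the crashed run left things in, and only then call `clear_load_lock` with the same UUID before re-running.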
.dockstore.yml
Outdated
@@ -57,6 +57,7 @@ workflows:
branches:
- master
- ah_var_store
- mmt_lock_load
do you want this committed?
not once i merge, but i'll remove it right before merging, so that it stays in dockstore in case i have to keep testing
add runtime specs use dbus-uuidgen try with tee
7de0f32
to
17e7605
@@ -209,6 +326,16 @@ task CreateImportTsvs {
command <<<
set -e

# check for existence of the correct lockfile
LOCKFILE="~{output_directory}/loadlock"
i might change the filename loadlock to LOCKFILE just for clarity. as attached as i am to the lock&load idea, i have already confused myself because loadlock is kind of a weird name
just one suggestion
…tion for removing lockfile
spec ops issue #248

process implemented here:
- SetLoadLock is called at the beginning of ImportGenomes - it generates a UUID for the submission, writes that run_uuid to a lock file, and uploads that lock file to the output_directory (where the tsvs will be generated).
- ReleaseLoadLock is called, which removes the lock file from the bucket (again only if the uuid in the lockfile matches this run)

tested and confirmed that:
- loadlock file is created and removed: https://app.terra.bio/#workspaces/broad-dsp-spec-ops-fc/1000G-high-coverage-2019_specops_mmt_test_memory/job_history/b0b9c7a1-70fd-4d44-a76e-b5604a5068f0
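The set/release logic described above can be sketched roughly as follows. This is not the WDL from the PR: a local directory stands in for the output_directory bucket so the sketch runs without gsutil, the function names are hypothetical, and refusing to start when a lock already exists is an assumption based on the PR title. In the actual tasks the reads and writes would go through gsutil.

```shell
#!/usr/bin/env bash
# Sketch only: a local directory stands in for the GCS output_directory.
# Function names are illustrative, not the task names from the PR.
LOCKNAME="loadlock"

# Create the lock for this run; refuse if any lock is already present
# (assumed behaviour, to prevent double loading).
set_load_lock() {
  local dir="$1" run_uuid="$2"
  if [ -e "${dir}/${LOCKNAME}" ]; then
    echo "lock already present in ${dir}; refusing to load" >&2
    return 1
  fi
  printf '%s\n' "${run_uuid}" > "${dir}/${LOCKNAME}"
}

# Remove the lock only if the UUID inside it matches this run.
release_load_lock() {
  local dir="$1" run_uuid="$2" existing
  existing="$(cat "${dir}/${LOCKNAME}" 2>/dev/null)" || {
    echo "no lock file found in ${dir}" >&2
    return 1
  }
  if [ "${existing}" != "${run_uuid}" ]; then
    echo "lock belongs to run ${existing}, not ${run_uuid}; leaving it" >&2
    return 1
  fi
  rm "${dir}/${LOCKNAME}"
}
```

Note that the existence check followed by the write is not atomic in this sketch, so two submissions started at the same moment could both pass the check; it only guards against the accidental re-run case.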