updates to ImportGenomes and LoadBigQueryData #7112
closing - refactoring to one WDL
I've got a bunch of questions, but I think they mostly stemmed from the previous structure of Load and Create tables... We should clean that up, but we can also split that out of this PR and make it a different ticket (or part of the productionization ticket).
String numbered
String partitioned
String uuid
Array[String] tsv_creation_done
what does this do?
Requiring this as an input ensures that CreateImportTSVs runs before CreateTables can start. (Removed here but added into LoadTables.)
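For context, a minimal sketch of that dummy-dependency pattern (workflow and task bodies here are hypothetical stand-ins, not the real ImportGenomes code): Cromwell orders calls by data flow, so threading an otherwise-unused output of the scattered TSV step into the loading step forces it to wait for every shard.

```wdl
version 1.0

workflow OrderingSketch {
  input {
    Array[String] table_names
  }

  scatter (name in table_names) {
    call CreateImportTSVs { input: table_name = name }
  }

  call LoadTables {
    input:
      # Unused inside the task; gathering every shard's `done` output means
      # LoadTables cannot start until all CreateImportTSVs calls finish.
      tsv_creation_done = CreateImportTSVs.done
  }
}

task CreateImportTSVs {
  input {
    String table_name
  }
  command <<<
    echo "writing ~{table_name}.tsv"
  >>>
  output {
    String done = "true"
  }
}

task LoadTables {
  input {
    Array[String] tsv_creation_done
  }
  command <<<
    echo "loading tables"
  >>>
}
```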
run CreateTables concurrently, clean up old code, LoadTable not preemptible, rename numbered to superpartitioned
schema = metadata_schema,
superpartitioned = "false",
partitioned = "false",
uuid = "",
I haven't used the UUID piece before, I think it was from earlier testing but now I would just create a new dataset instead of tables with a prefix. Remove it? (@ahaessly wdyt?)
This was definitely used for automated integration testing. I think Megan added it. If we wanted to add a uuid to the dataset, I think we would need to create that dataset outside of this wdl. But we should be able to do that in the test itself. Assuming we are not running that integration test, I would say let's go ahead and remove it.
ok let's keep it for now until we decide what we're doing with integration testing.
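For reference, a rough bash sketch of the two approaches being weighed here (all dataset and table names are hypothetical):

```bash
# Approach 1 (the current uuid input): share one dataset and namespace
# each test run's tables with a uuid prefix.
uuid="$(uuidgen | tr 'A-Z-' 'a-z_')"   # BigQuery names disallow hyphens
bq mk --table "shared_dataset.${uuid}_sample_info" schema.json

# Approach 2 (suggested above): have the integration test create a
# throwaway dataset outside the WDL, so no prefix is needed on the tables.
bq mk --dataset "my_project:test_${uuid}"
bq mk --table "test_${uuid}.sample_info" schema.json
```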
input {
String project_id
String dataset_name
String storage_location
I don't think this is used anymore?
nice catch!
👍
* revert input_vcfs to array[file], add this to sample inputs json
* add this branch to dockstore
* remove this branch from dockstore
* add LoadBigQueryData to dockstore, modify check for existing tables, load from github
* exit with error if bq load fails
* use relative path to import LoadBigQueryData.wdl
* refactor ImportGenomes to contain BQ table creation and loading
* remove for_testing_only
* docker -> docker_final
* last wdl fix please
* remove #done
* add back done - end of for loop
* remove LoadBigQueryData wdl
* ensure tsv creation before making bq tables
* run CreateTables concurrently, clean up old code, LoadTable not preemptible, rename numbered to superpartitioned
* pad table id to 3 digits
* fix padded table id
* fix padded logic again
* fix range for table_id
* remove unused import
* remove feature branch from dockstore.yml
changes in this PR:
- exit with error if the `bq load` step fails (the workflow was silently succeeding when this step failed)
- check for existing tables with `bq show` rather than the csv file - this should still be safe against a race condition because of @ericsong's refactoring to prevent the `CreateTables` step from being scattered
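A minimal bash sketch of what the task's command block now does, assuming hypothetical dataset, table, and file names (the real script lives in the WDL):

```bash
#!/usr/bin/env bash

dataset="my_project:my_dataset"        # hypothetical
table_id=$(printf "%03d" 7)            # table ids are padded to 3 digits
table="${dataset}.pet_${table_id}"     # hypothetical naming scheme

# Check for an existing table with `bq show`, which exits non-zero when
# the table does not exist, instead of consulting a csv of created tables.
if bq show "${table}" > /dev/null 2>&1; then
  echo "table ${table} already exists, skipping creation"
else
  bq mk --table "${table}" schema.json
fi

# Propagate a bq load failure so the workflow fails instead of silently
# succeeding as it did before.
if ! bq load --source_format=CSV --field_delimiter=tab "${table}" "data_${table_id}.tsv"; then
  echo "bq load of ${table} failed" >&2
  exit 1
fi
```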
testing: