-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
First cut at a python notebook to validate inputs. #7845
Conversation
Codecov Report
@@ Coverage Diff @@
## ah_var_store #7845 +/- ##
================================================
Coverage ? 86.305%
Complexity ? 35190
================================================
Files ? 2170
Lines ? 164837
Branches ? 17775
================================================
Hits ? 142263
Misses ? 16251
Partials ? 6323 |
49cce77
to
8b496ac
Compare
"sample_set = fapi.get_entity(ws_project, ws_name, \"sample_set\", sample_set_id).json()\n", | ||
"if (\"attributes\" not in sample_set):\n", | ||
" errors_seen = True\n", | ||
" print(\"ERROR: Looking up \" + sample_set_id + \": ''\" + sample_set[\"message\"] + \"''\")\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure what's up with these double single quotes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have cleaned those all up.
" errors_seen = True\n", | ||
" print(\"ERROR: Looking up \" + sample_set_id + \": ''\" + sample_set[\"message\"] + \"''\")\n", | ||
" \n", | ||
"attributes = sample_set[\"attributes\"]\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't this going to blow up if the attribute wasn't present on line 136?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! Thanks - have addressed this.
"num_pages = int(math.ceil(float(entity_count) / page_size))\n", | ||
"\n", | ||
"# get entities by page where each page has page_size # of rows using API call\n", | ||
"#print('Getting all {num_pages} pages of entity data.')\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
debug cruft
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone
"for page in tqdm(range(1, num_pages + 1)):\n", | ||
" page_of_entitites = fapi.get_entities_query(ws_project, ws_name, etype, page=page,\n", | ||
" page_size=page_size).json()#, sort_direction='asc',\n", | ||
"# filter_terms=attribute_names).json()\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gone too.
"\n", | ||
"# Inspect samples table - determine possibe names for reblocked_gvcfs.\n", | ||
"etype = 'sample'\n", | ||
"entity_types = fapi.list_entity_types(ws_project, ws_name).json()\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should we check that the entity types we care about are actually in here (sample_set, sample, etc)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, good suggestion - incorporated in latest commit.
" reblocked_gvcf_index_name = reblocked_gvcf_index.split('/')[-1]\n", | ||
" if (reblocked_gvcf_index_name != expected_reblocked_gvcf_index_name):\n", | ||
" errors_seen = True\n", | ||
" print(\"ERROR: Did not find expected index file (named: ''\" + expected_reblocked_gvcf_index_name + \n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
paren and double single quote issues
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have cleaned those all up.
" if (error_seen):\n", | ||
" errors_seen = True\n", | ||
"\n", | ||
" # Inspect sample table - determine possible names for reblocked_gvcf indices.\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fun fact: three counts of indices
, five counts of indexes
😄
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I go back and forth on that.
" field_name = list(field_names_found)[0]\n", | ||
" else:\n", | ||
" error_seen = True\n", | ||
" print(f\"ERROR: There are multiple columns in the 'samples' datatable {str(field_names_found)} that potentially contain reblocked gvcfs\")\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the sample
datatable here and on line 68?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good except for some small nits (and one more for good measure, "sample set" or "sample_set" not both).
"This python notebook is intended to allow you to quickly validate the inputs for a Joint Call Set.\n", | ||
"To run it:\n", | ||
"\n", | ||
"Define the variable `sample_set_id` (below) to the name of the sample_set (in the current workspace) containing the list of samples to process\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
"Define the variable `sample_set_id` (below) to the name of the sample_set (in the current workspace) containing the list of samples to process\n", | |
"Define the variable `sample_set_id` (below) to the name of the sample_set (in the current workspace) containing the list of samples to process.\n", |
"- the sample set that you have listed is found\n", | ||
"- there are no duplicate samples in the sample set\n", | ||
"- there are no empty sample names in the sample set\n", | ||
"- each sample has a corresponding reblocked_gvcf index\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe
"- each sample has a corresponding reblocked_gvcf index\n", | |
"- each sample has a reblocked gVCF and a corresponding reblocked gVCF index\n", |
Working for sample_set now. Added Introduction Fixed error in looking up reblocked gvcf.
96b1120
to
3dc3916
Compare
Working for sample_set now.