-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Filter Failing QC Sites from Extract #7201
Conversation
|
||
String query_project = data_project | ||
|
||
String fq_sample_table = "~{data_project}.~{default_dataset}.sample_info" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what happens here if default_dataset is not defined?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I struggled with this a little bit to have a principle of least surprise...
If you don't supply a default_dataset AND you don't override these parameters... things won't work (ie the dataset reference won't be valid). Not much we can do there since there isn't great parameter validation in WDL with xor or groups of settings, etc.
We could make "default_dataset" required. That might make more sense since the vast majority of times... you mean to set that and having the validation would be nice. And in the rare case where someone wants to override all the table parameters... they'll still have to specify the default even though technically it's not needed. But that's the exceptional case... wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After the slack convo -- I think default_dataset
should be non-optional.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have a slight preference for making default_dataset
required. Like you said, if it's going to be used the vast majority of the time, having it in the "required" section of inputs is going to lead to better outcomes than if it's a random optional input that really should most of the time be defined.
In the rarer cases where someone provides the fq table inputs, default_dataset will just get ignored, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 👍
fullYngMap.putIfAbsent(location, new HashMap<>()); | ||
fullYngMap.get(location).putIfAbsent(ref, new HashMap<>()); | ||
fullYngMap.get(location).get(ref).put(alt, yng); | ||
try (StorageAPIAvroReader reader = new StorageAPIAvroReader(filterSetInfoTableRef, rowRestrictionWithFilterSetName, projectID)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the try
here with no except
? does it take care of closing the reader when you're done?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes! It's like a with
in python
@@ -505,8 +557,7 @@ private VariantContext filterVariants(VariantContext mergedVC, HashMap<Allele, H | |||
builder.filter(GATKVCFConstants.VQSR_FAILURE_INDEL); | |||
} | |||
} else { | |||
// If VQSR dropped this site (there's no YNG or VQSLOD) then we'll filter it as a NAY. | |||
builder.filter(GATKVCFConstants.NAY_FROM_YNG); | |||
// per-conversation with Laura, if there is no information we let the site pass (ie no data does not imply failure) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice
@@ -48,4 +49,7 @@ public ExtractCohortFilterRecord(GenericRecord genericRecord) { | |||
|
|||
public String getAltAllele() { return this.altAllele; } | |||
|
|||
public String toString() { | |||
return ReflectionToStringBuilder.toString(this); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha..I was sick of the ExtractCohortFilterRecord@12318f
coming out in the logger messages :D
@@ -52,7 +52,7 @@ public StorageAPIAvroReader(final TableReference tableRef, String parentProjectI | |||
public StorageAPIAvroReader(final TableReference tableRef, final String rowRestriction, String parentProjectId) { | |||
|
|||
try { | |||
logger.info("Using Storage API from " + tableRef + " with '" + rowRestriction + "'"); | |||
logger.info("Using Storage API from " + tableRef.getFQTableName() + " with '" + rowRestriction + "'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thank you for fixing this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧹
String data_project | ||
String? default_dataset | ||
|
||
String query_project = data_project |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this make query_project
optional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the sense that "you don't have to put it in the WDL" -- yes.
I have no idea what the difference is between
String? query_project = "foo"
and
String query_project = "foo"
In either case... if you supply the workflow input it overrides the "foo" and if you don't supply the workflow value, it doesn't complain (ie the "foo" is used).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice, and per Slack thread, we want it this way (and not String?
) so that the user can't override query_project to null. 👍
Support for filtering sites based on
filter_set_sites
informationCleaned up some naming as well, wired into WDL (along with tranches parameter)