-
Notifications
You must be signed in to change notification settings - Fork 590
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add filtering to ExtractCohort #6971
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great! A few minor comments
@@ -46,6 +55,17 @@ protected void onStartup() { | |||
|
|||
VCFHeader header = CommonCode.generateVcfHeader(sampleNames, reference.getSequenceDictionary()); | |||
|
|||
if (minLocation == null && maxLocation == null && hasUserSuppliedIntervals()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is awesome!
fullName = "filter-set-name", | ||
doc = "Name in filter_set_name column of filtering table to use. Which training set should be applied in extract." | ||
) | ||
private String nameOfFilterSet = null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps filterSetName
instead
@@ -85,7 +86,8 @@ public ExtractCohortEngine(final String projectID, | |||
final double vqsLodSNPThreshold, | |||
final double vqsLodINDELThreshold, | |||
final ProgressMeter progressMeter, | |||
final ExtractCohort.QueryMode queryMode) { | |||
final ExtractCohort.QueryMode queryMode, | |||
final String nameOfFilterSet) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
perhaps filterSetName
instead
final String rowRestrictionWithFilterSetName = rowRestriction + " AND " + SchemaUtils.FILTER_SET_NAME + " = '" + nameOfFilterSet + "'"; | ||
|
||
final StorageAPIAvroReader filteringTableAvroReader = new StorageAPIAvroReader(filteringTableRef, rowRestrictionWithFilterSetName, projectID); | ||
final HashMap<Locatable, HashMap<Allele, HashMap<Allele, Double>>> fullVqsLodMap = new HashMap<>(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might be good to have a comment about what each of these alleles are? is the first allele reference and the second is the alternate?
final String yng = queryRow.get("yng_status").toString(); | ||
final Allele ref = Allele.create(queryRow.get("ref").toString(), true); | ||
final Allele alt = Allele.create(queryRow.get("alt").toString(), false); | ||
final Locatable locatable = new SimpleInterval(contig, currentPosition, currentPosition); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
instead of locatable, could we just leave it as location since we'll have that when we try to lookup in this table? (ie coming out of the extract BQ?)
} | ||
final VariantContext filteredVC = builder.make(); | ||
return filteredVC; | ||
} | ||
|
||
private <T> LinkedHashMap<Allele, T> remapAllelesInMap(VariantContext vc, HashMap<Allele, HashMap<Allele, T>> datamap, T emptyVal) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A comment for what this does would be great, since even reading it... I'm not sure!
* adding filtering to extractCohort * addressing comments
* adding filtering to extractCohort * addressing comments
* adding filtering to extractCohort * addressing comments
* adding filtering to extractCohort * addressing comments
* adding filtering to extractCohort * addressing comments
I think this is ready for initial review while I start testing it on larger scale data. I've left adding in the full VQSR tranche filtering as TODOs for now.