-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add splitting for allele-specific annotations and ADs to VariantsToTable #5697
Conversation
Codecov Report
@@ Coverage Diff @@
## master #5697 +/- ##
===============================================
+ Coverage 86.999% 87.077% +0.078%
+ Complexity 32110 31929 -181
===============================================
Files 1974 1942 -32
Lines 147249 146962 -287
Branches 16218 16249 +31
===============================================
- Hits 128105 127970 -135
+ Misses 13236 13070 -166
- Partials 5908 5922 +14
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Back to you @ldgauthier
@@ -102,6 +116,7 @@ | |||
) | |||
@DocumentedFeature | |||
public final class VariantsToTable extends VariantWalker { | |||
private static final VariantsToTable instance = new VariantsToTable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Come back to this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I killed the noop unit tests I could get rid of this.
@Argument(shortName="ASF", doc="The name of an allele-specific INFO field to be split if present", optional=true) | ||
private List<String> asFieldsToTake = new ArrayList<>(); | ||
|
||
@Argument(shortName="ASGF", doc="The name of an allele-specific INFO field to be split if present", optional=true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
INFO -> FORMAT
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
throw new UserException("There are no samples and no fields - no output will be produced"); | ||
} | ||
} | ||
} | ||
|
||
if (asGenotypeFieldsToTake.isEmpty() && asFieldsToTake.isEmpty() && (!splitMultiAllelic)) { | ||
logger.warn("Allele-specific fields will only be split if splitting multi-allelic variants is specified (`--split-multi-allelics` or `-SMA`"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual argument is "--split-multi-allelic" singular. Better to extract a variable to avoid something like this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
throw new UserException("There are no samples and no fields - no output will be produced"); | ||
} | ||
} | ||
} | ||
|
||
if (asGenotypeFieldsToTake.isEmpty() && asFieldsToTake.isEmpty() && (!splitMultiAllelic)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for parentheses around (!splitMultiAllelic)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
final String genotypeHeader = createGenotypeHeader(); | ||
final String separator = (!baseHeader.isEmpty() && !genotypeHeader.isEmpty()) ? "\t" : ""; | ||
outputStream.println(baseHeader + separator + genotypeHeader); | ||
String line = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this series of if blocks can be shortened (and made clearer) by combining the lists first, as follows:
final List<String> fields = new ArrayList<>();
fields.addAll(fieldsToTake);
fields.addAll(asFieldsToTake);
final String header = new StringBuilder(Utils.join("\t", fields))
.append("\t")
.append(createGenotypeHeader())
.toString();
outputStream.println(header);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done -- thanks for the snippet
|
||
final int numRecordsToProduce = splitMultiAllelic ? vc.getAlternateAlleles().size() : 1; | ||
final List<List<String>> records = new ArrayList<>(numRecordsToProduce); | ||
|
||
final int numFields; | ||
final boolean addGenotypeFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty(); | ||
final boolean hasGFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
genotypeFiledsToTake
defaults to an empty array, so you don't need to check that it's not null. I would do away with hasGFields
and hasASGFields
and just say final boolean addGenotypeFields = !genotypeFieldsToTake.isEmpty() || !asGenotypeFieldsToTake.isEmpty();
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
final boolean addGenotypeFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty(); | ||
final boolean hasGFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty(); | ||
final boolean hasASGFields = asGenotypeFieldsToTake != null && !asGenotypeFieldsToTake.isEmpty(); | ||
final boolean addGenotypeFields = hasGFields || hasASGFields; | ||
if ( addGenotypeFields ) { | ||
numFields = fieldsToTake.size() + genotypeFieldsToTake.size() * samples.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically speaking you want to add the number of allele specific fields to numFields
, but it's not an error because numFields is just an initial list size. In my opinion getting the right initial list size here isn't worth four lines of code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent point.
* @param result the cummulative output | ||
* @param alleleCount scalar, R-type or A-type values | ||
*/ | ||
private static void addAlleleSpecificFieldValue(final Object val, final List<List<String>> result, final VCFHeaderLineCount alleleCount) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you have bandwidth/will power, it would be great if you could revamp this legacy code that uses instanceof
and write separate methods that take parameters of different types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not. (You'll note that the date on the unit test file was June of last year.) Ordinarily I loathe instanceof
, but after vacillating for a while, I decided to let it ride because performance isn't really an issue here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha sounds good
|
||
@Test | ||
public void testExtractFields() throws Exception { | ||
final VariantsToTable VTTtest = VariantsToTable.getInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to say you can replace this with new VariantsToTable()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But really the root of the problem is that extractFields
is an instance method when it should be a static method that takes as parameters lists that are currently instance variables such as genotypeFieldsToTake
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that I never actually tested anything, I just took out the whole class.
.chr("1").start(15L).stop(15L) | ||
.attribute(GATKVCFConstants.AS_RAW_RMS_MAPPING_QUALITY_KEY, "400.00|285.00|385.00") | ||
.genotypes(genotype) | ||
.make(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Neither test checks for correctness (i.e. uses e.g. Assert.assertTrue) but I think they should.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had big plans for more functionality, but decided I don't want users mucking around in the GVCFs. I removed these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments addressed. Back to you @takutosato
* @param result the cummulative output | ||
* @param alleleCount scalar, R-type or A-type values | ||
*/ | ||
private static void addAlleleSpecificFieldValue(final Object val, final List<List<String>> result, final VCFHeaderLineCount alleleCount) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not. (You'll note that the date on the unit test file was June of last year.) Ordinarily I loathe instanceof
, but after vacillating for a while, I decided to let it ride because performance isn't really an issue here.
|
||
@Test | ||
public void testExtractFields() throws Exception { | ||
final VariantsToTable VTTtest = VariantsToTable.getInstance(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given that I never actually tested anything, I just took out the whole class.
final boolean addGenotypeFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty(); | ||
final boolean hasGFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty(); | ||
final boolean hasASGFields = asGenotypeFieldsToTake != null && !asGenotypeFieldsToTake.isEmpty(); | ||
final boolean addGenotypeFields = hasGFields || hasASGFields; | ||
if ( addGenotypeFields ) { | ||
numFields = fieldsToTake.size() + genotypeFieldsToTake.size() * samples.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent point.
throw new UserException("There are no samples and no fields - no output will be produced"); | ||
} | ||
} | ||
} | ||
|
||
if (asGenotypeFieldsToTake.isEmpty() && asFieldsToTake.isEmpty() && (!splitMultiAllelic)) { | ||
logger.warn("Allele-specific fields will only be split if splitting multi-allelic variants is specified (`--split-multi-allelics` or `-SMA`"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
final String genotypeHeader = createGenotypeHeader(); | ||
final String separator = (!baseHeader.isEmpty() && !genotypeHeader.isEmpty()) ? "\t" : ""; | ||
outputStream.println(baseHeader + separator + genotypeHeader); | ||
String line = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done -- thanks for the snippet
|
||
final int numRecordsToProduce = splitMultiAllelic ? vc.getAlternateAlleles().size() : 1; | ||
final List<List<String>> records = new ArrayList<>(numRecordsToProduce); | ||
|
||
final int numFields; | ||
final boolean addGenotypeFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty(); | ||
final boolean hasGFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
throw new UserException("There are no samples and no fields - no output will be produced"); | ||
} | ||
} | ||
} | ||
|
||
if (asGenotypeFieldsToTake.isEmpty() && asFieldsToTake.isEmpty() && (!splitMultiAllelic)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
@Argument(shortName="ASF", doc="The name of an allele-specific INFO field to be split if present", optional=true) | ||
private List<String> asFieldsToTake = new ArrayList<>(); | ||
|
||
@Argument(shortName="ASGF", doc="The name of an allele-specific INFO field to be split if present", optional=true) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch!
@@ -102,6 +116,7 @@ | |||
) | |||
@DocumentedFeature | |||
public final class VariantsToTable extends VariantWalker { | |||
private static final VariantsToTable instance = new VariantsToTable(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since I killed the noop unit tests I could get rid of this.
.chr("1").start(15L).stop(15L) | ||
.attribute(GATKVCFConstants.AS_RAW_RMS_MAPPING_QUALITY_KEY, "400.00|285.00|385.00") | ||
.genotypes(genotype) | ||
.make(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had big plans for more functionality, but decided I don't want users mucking around in the GVCFs. I removed these tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me!
b1ad96b
to
ea0cbe9
Compare
@nh13 in light of your per-allele list subsetting bug report, this may be of interest |
With allele-specific annotations and filtering in the new exome pipeline, this should be helpful. (I was using it for debugging merging Mutect2 perAlleleAnnotations for MT data.)