Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add splitting for allele-specific annotations and ADs to VariantsToTable #5697

Merged
merged 1 commit into from
Mar 25, 2019

Conversation

ldgauthier
Copy link
Contributor

With allele-specific annotations and filtering in the new exome pipeline, this should be helpful. (I was using it for debugging merging Mutect2 perAlleleAnnotations for MT data.)

@codecov-io
Copy link

codecov-io commented Feb 21, 2019

Codecov Report

Merging #5697 into master will increase coverage by 0.078%.
The diff coverage is 92.537%.

@@               Coverage Diff               @@
##              master     #5697       +/-   ##
===============================================
+ Coverage     86.999%   87.077%   +0.078%     
+ Complexity     32110     31929      -181     
===============================================
  Files           1974      1942       -32     
  Lines         147249    146962      -287     
  Branches       16218     16249       +31     
===============================================
- Hits          128105    127970      -135     
+ Misses         13236     13070      -166     
- Partials        5908      5922       +14
Impacted Files Coverage Δ Complexity Δ
...s/variantutils/VariantsToTableIntegrationTest.java 100% <100%> (ø) 22 <1> (+1) ⬆️
...er/tools/walkers/variantutils/VariantsToTable.java 93.519% <90.741%> (+0.337%) 88 <11> (+13) ⬆️
...ute/hellbender/utils/reference/ReferenceUtils.java 64.706% <0%> (-11.765%) 4% <0%> (ø)
...tute/hellbender/utils/fragments/FragmentUtils.java 83.784% <0%> (-10.094%) 6% <0%> (-6%)
...plotypecaller/HaplotypeCallerGenotypingEngine.java 79.747% <0%> (-10.084%) 38% <0%> (-16%)
...kers/haplotypecaller/AssemblyBasedCallerUtils.java 80.303% <0%> (-8.658%) 41% <0%> (-74%)
.../basicshortmutpileup/BetaBinomialDistribution.java 65.385% <0%> (-7.343%) 5% <0%> (ø)
...lkers/contamination/MinorAlleleFractionRecord.java 84.783% <0%> (-6.522%) 5% <0%> (-3%)
...park/pipelines/PrintReadsSparkIntegrationTest.java 88.532% <0%> (-4.325%) 31% <0%> (+29%)
.../tools/walkers/mutect/SomaticGenotypingEngine.java 90.05% <0%> (-3.499%) 76% <0%> (+6%)
... and 168 more

@takutosato takutosato self-assigned this Feb 22, 2019
Copy link
Contributor

@takutosato takutosato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Back to you @ldgauthier

@@ -102,6 +116,7 @@
)
@DocumentedFeature
public final class VariantsToTable extends VariantWalker {
private static final VariantsToTable instance = new VariantsToTable();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Come back to this

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I killed the noop unit tests I could get rid of this.

@Argument(shortName="ASF", doc="The name of an allele-specific INFO field to be split if present", optional=true)
private List<String> asFieldsToTake = new ArrayList<>();

@Argument(shortName="ASGF", doc="The name of an allele-specific INFO field to be split if present", optional=true)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

INFO -> FORMAT

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

throw new UserException("There are no samples and no fields - no output will be produced");
}
}
}

if (asGenotypeFieldsToTake.isEmpty() && asFieldsToTake.isEmpty() && (!splitMultiAllelic)) {
logger.warn("Allele-specific fields will only be split if splitting multi-allelic variants is specified (`--split-multi-allelics` or `-SMA`");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The actual argument is "--split-multi-allelic" singular. Better to extract a variable to avoid something like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

throw new UserException("There are no samples and no fields - no output will be produced");
}
}
}

if (asGenotypeFieldsToTake.isEmpty() && asFieldsToTake.isEmpty() && (!splitMultiAllelic)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for parentheses around (!splitMultiAllelic)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

final String genotypeHeader = createGenotypeHeader();
final String separator = (!baseHeader.isEmpty() && !genotypeHeader.isEmpty()) ? "\t" : "";
outputStream.println(baseHeader + separator + genotypeHeader);
String line = "";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this series of if blocks can be shortened (and made clearer) by combining the lists first, as follows:

final List<String> fields = new ArrayList<>();
fields.addAll(fieldsToTake);
fields.addAll(asFieldsToTake);
final String header = new StringBuilder(Utils.join("\t", fields))
    .append("\t")
    .append(createGenotypeHeader())
    .toString();
outputStream.println(header);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done -- thanks for the snippet


final int numRecordsToProduce = splitMultiAllelic ? vc.getAlternateAlleles().size() : 1;
final List<List<String>> records = new ArrayList<>(numRecordsToProduce);

final int numFields;
final boolean addGenotypeFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty();
final boolean hasGFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genotypeFiledsToTake defaults to an empty array, so you don't need to check that it's not null. I would do away with hasGFields and hasASGFields and just say final boolean addGenotypeFields = !genotypeFieldsToTake.isEmpty() || !asGenotypeFieldsToTake.isEmpty();

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

final boolean addGenotypeFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty();
final boolean hasGFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty();
final boolean hasASGFields = asGenotypeFieldsToTake != null && !asGenotypeFieldsToTake.isEmpty();
final boolean addGenotypeFields = hasGFields || hasASGFields;
if ( addGenotypeFields ) {
numFields = fieldsToTake.size() + genotypeFieldsToTake.size() * samples.size();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically speaking you want to add the number of allele specific fields to numFields, but it's not an error because numFields is just an initial list size. In my opinion getting the right initial list size here isn't worth four lines of code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point.

* @param result the cummulative output
* @param alleleCount scalar, R-type or A-type values
*/
private static void addAlleleSpecificFieldValue(final Object val, final List<List<String>> result, final VCFHeaderLineCount alleleCount) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you have bandwidth/will power, it would be great if you could revamp this legacy code that uses instanceof and write separate methods that take parameters of different types.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not. (You'll note that the date on the unit test file was June of last year.) Ordinarily I loathe instanceof, but after vacillating for a while, I decided to let it ride because performance isn't really an issue here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha sounds good


@Test
public void testExtractFields() throws Exception {
final VariantsToTable VTTtest = VariantsToTable.getInstance();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I want to say you can replace this with new VariantsToTable().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But really the root of the problem is that extractFields is an instance method when it should be a static method that takes as parameters lists that are currently instance variables such as genotypeFieldsToTake

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that I never actually tested anything, I just took out the whole class.

.chr("1").start(15L).stop(15L)
.attribute(GATKVCFConstants.AS_RAW_RMS_MAPPING_QUALITY_KEY, "400.00|285.00|385.00")
.genotypes(genotype)
.make();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Neither test checks for correctness (i.e. uses e.g. Assert.assertTrue) but I think they should.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had big plans for more functionality, but decided I don't want users mucking around in the GVCFs. I removed these tests.

Copy link
Contributor Author

@ldgauthier ldgauthier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comments addressed. Back to you @takutosato

* @param result the cummulative output
* @param alleleCount scalar, R-type or A-type values
*/
private static void addAlleleSpecificFieldValue(final Object val, final List<List<String>> result, final VCFHeaderLineCount alleleCount) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not. (You'll note that the date on the unit test file was June of last year.) Ordinarily I loathe instanceof, but after vacillating for a while, I decided to let it ride because performance isn't really an issue here.


@Test
public void testExtractFields() throws Exception {
final VariantsToTable VTTtest = VariantsToTable.getInstance();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that I never actually tested anything, I just took out the whole class.

final boolean addGenotypeFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty();
final boolean hasGFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty();
final boolean hasASGFields = asGenotypeFieldsToTake != null && !asGenotypeFieldsToTake.isEmpty();
final boolean addGenotypeFields = hasGFields || hasASGFields;
if ( addGenotypeFields ) {
numFields = fieldsToTake.size() + genotypeFieldsToTake.size() * samples.size();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent point.

throw new UserException("There are no samples and no fields - no output will be produced");
}
}
}

if (asGenotypeFieldsToTake.isEmpty() && asFieldsToTake.isEmpty() && (!splitMultiAllelic)) {
logger.warn("Allele-specific fields will only be split if splitting multi-allelic variants is specified (`--split-multi-allelics` or `-SMA`");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

final String genotypeHeader = createGenotypeHeader();
final String separator = (!baseHeader.isEmpty() && !genotypeHeader.isEmpty()) ? "\t" : "";
outputStream.println(baseHeader + separator + genotypeHeader);
String line = "";
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done -- thanks for the snippet


final int numRecordsToProduce = splitMultiAllelic ? vc.getAlternateAlleles().size() : 1;
final List<List<String>> records = new ArrayList<>(numRecordsToProduce);

final int numFields;
final boolean addGenotypeFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty();
final boolean hasGFields = genotypeFieldsToTake != null && !genotypeFieldsToTake.isEmpty();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

throw new UserException("There are no samples and no fields - no output will be produced");
}
}
}

if (asGenotypeFieldsToTake.isEmpty() && asFieldsToTake.isEmpty() && (!splitMultiAllelic)) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@Argument(shortName="ASF", doc="The name of an allele-specific INFO field to be split if present", optional=true)
private List<String> asFieldsToTake = new ArrayList<>();

@Argument(shortName="ASGF", doc="The name of an allele-specific INFO field to be split if present", optional=true)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

@@ -102,6 +116,7 @@
)
@DocumentedFeature
public final class VariantsToTable extends VariantWalker {
private static final VariantsToTable instance = new VariantsToTable();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since I killed the noop unit tests I could get rid of this.

.chr("1").start(15L).stop(15L)
.attribute(GATKVCFConstants.AS_RAW_RMS_MAPPING_QUALITY_KEY, "400.00|285.00|385.00")
.genotypes(genotype)
.make();
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had big plans for more functionality, but decided I don't want users mucking around in the GVCFs. I removed these tests.

Copy link
Contributor

@takutosato takutosato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@ldgauthier ldgauthier force-pushed the ldg_allelesForVariantsToTable branch from b1ad96b to ea0cbe9 Compare March 15, 2019 18:18
@ldgauthier ldgauthier merged commit 0e36825 into master Mar 25, 2019
@ldgauthier
Copy link
Contributor Author

ldgauthier commented Mar 25, 2019

@nh13 in light of your per-allele list subsetting bug report, this may be of interest

@ldgauthier ldgauthier deleted the ldg_allelesForVariantsToTable branch March 25, 2019 14:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants