-
Notifications
You must be signed in to change notification settings - Fork 594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add bamout read annotations for alignement and callable regions, and supported genotypes #5215
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small comments
@@ -35,6 +36,9 @@ | |||
public final class AssemblyBasedCallerUtils { | |||
|
|||
static final int REFERENCE_PADDING_FOR_ASSEMBLY = 500; | |||
static final String SUPPORTED_GENOTYPES_TAG="SG"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
public
* @param likelihoodsHaplotype ReadLikelihoods containing reads to be annotated along with haplotypes to which these reads have been aligned | ||
* @param callableRegion ref region over which caller is using these ReadLikelihoods to call variants | ||
*/ | ||
public static void annotateReadLikelihoodsWithRegions(final ReadLikelihoods<Haplotype> likelihoodsHaplotype, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
likelihoodsHaplotype
should just be likelihoods
, IMO
} | ||
|
||
/** | ||
* Annotates reads in ReadLiklihoods with genotype supported by each read a particular variant. If a read already has a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"a particular variant" seems out of place
public static void annotateReadLikelihoodsWithSupportedGenotypes(final VariantContext vc, | ||
final ReadLikelihoods<Allele> likelihoodsAllele) { | ||
//assign supported Genotypes to each read | ||
final Set<Allele> alleles = new LinkedHashSet<>(vc.getAlleles()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inline this into next line?
//assign supported Genotypes to each read | ||
final Set<Allele> alleles = new LinkedHashSet<>(vc.getAlleles()); | ||
final Map<Allele, List<Allele>> alleleSubset = alleles.stream().collect(Collectors.toMap(a -> a, Arrays::asList)); | ||
final ReadLikelihoods<Allele> subsettedLikelihoodsAllele = likelihoodsAllele.marginalize(alleleSubset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just call this subsettedLikelihoods
@@ -130,6 +132,11 @@ public CalledHaplotypes assignGenotypeLikelihoods(final List<Haplotype> haplotyp | |||
final int ploidy = configuration.genotypeArgs.samplePloidy; | |||
final List<Allele> noCallAlleles = GATKVariantContextUtils.noCallAlleles(ploidy); | |||
|
|||
if (withBamOut) { | |||
////add annotations to reads for alignment regions and calling regions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two slashes
@@ -181,6 +188,11 @@ public CalledHaplotypes assignGenotypeLikelihoods(final List<Haplotype> haplotyp | |||
final VariantContext annotatedCall = makeAnnotatedCall(ref, refLoc, tracker, header, mergedVC, readAlleleLikelihoods, call); | |||
returnCalls.add( annotatedCall ); | |||
|
|||
if (withBamOut) { | |||
//add annotations to reads for which genotypes are supported by each read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not necessary because the method name is self-documenting
@@ -103,6 +105,11 @@ public CalledHaplotypes callMutations( | |||
final Set<Haplotype> calledHaplotypes = new HashSet<>(); | |||
final List<VariantContext> returnCalls = new ArrayList<>(); | |||
|
|||
if(withBamOut){ | |||
////add annotations to reads for alignment regions and calling regions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two slashes
@@ -172,6 +179,10 @@ public CalledHaplotypes callMutations( | |||
final ReadLikelihoods<Allele> trimmedLikelihoods = log10Likelihoods.marginalize(trimmedToUntrimmedAlleleMap); | |||
|
|||
final VariantContext annotatedCall = annotationEngine.annotateContext(trimmedCall, featureContext, referenceContext, trimmedLikelihoods, a -> true); | |||
if(withBamOut) { | |||
//add annotations to reads for which genotypes are supported by each read |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is not necessary
final int start = 13763; | ||
final SimpleInterval loc = new SimpleInterval(contig, start, start + hap1String.length()); | ||
final SimpleInterval loc2 = new SimpleInterval(contig, loc.getStart() + 4, loc.getEnd() - 4); | ||
List<Integer> snpIndecies = Arrays.asList(null, 12, 23); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indices
Codecov Report
@@ Coverage Diff @@
## master #5215 +/- ##
============================================
+ Coverage 86.76% 86.77% +0.01%
- Complexity 29944 30000 +56
============================================
Files 1838 1838
Lines 138820 139051 +231
Branches 15285 15329 +44
============================================
+ Hits 120448 120667 +219
- Misses 12804 12810 +6
- Partials 5568 5574 +6
Continue to review full report at Codecov.
|
for (ReadLikelihoods<Allele>.BestAllele bestAllele : bestAlleles) { | ||
String attribute = ""; | ||
GATKRead read = bestAllele.read; | ||
Allele allele = bestAllele.allele; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious to me why these are supported genotypes. What is the value of the tag supposed to be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you're right. The value of the tag is really the alleles of called variants that each read supports. For example, if a read overlaps two called variants in the active region at 1:10000 and 1:10010, and supports the reference allele at the first and the first alt allele at the second, the annotation will be 1:10000=0, 1:10010=1. Essentially this is the same information encoded by allele depth in a vcf. So I think it is more correct to call these supported alleles as opposed to supported genotypes.
} | ||
|
||
/** | ||
* Annotates reads in ReadLiklihoods with genotype supported by each read a particular variant. If a read already has a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Annotates reads in ReadLikelihoods with genotype supported by each read at a particular variant." ?
back to you when you get a chance to take another look, @davidbenjamin and @ldgauthier |
93c7a9e
to
7d53fb9
Compare
…upported alleles (broadinstitute#5215) Add bamout read annotations for alignment and callable regions, and supported alleles (broadinstitute#5215)
Adds three annotations to reads in bamout produced by HaplotypeCaller and Mutect2: