SNPs should be proportional to graph height, not just linear scale #2838
Thanks for reporting this, I think you're right about this. There was a little fancy programming here that didn't have the right result, but I think coding it more straightforwardly, as a linear drawing where each SNP segment is a percentage of the score at that position, will work.
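The proportional drawing described above can be sketched roughly as follows. This is a minimal illustration, not the actual SNPCoverageRenderer code; `linearScale`, `logScale`, and `snpSegments` are hypothetical names, and using `Math.log1p` so that a depth of 0 maps to height 0 is an assumption. The key step is that each base's share of the reads is applied to the bar's already-scaled height, rather than each allele count being scaled on its own.

```typescript
// Maps a read depth to a bar height in pixels.
type Scale = (depth: number) => number

// Hypothetical scale helpers for illustration only.
function linearScale(maxDepth: number, heightPx: number): Scale {
  return d => (d / maxDepth) * heightPx
}

function logScale(maxDepth: number, heightPx: number): Scale {
  // log1p keeps a depth of 0 at height 0 (an assumption of this sketch)
  return d => (Math.log1p(d) / Math.log1p(maxDepth)) * heightPx
}

interface Segment {
  base: string
  heightPx: number
}

// Each base gets (count / total) of the bar's *drawn* height, so a 46% allele
// fills 46% of the bar whether the bar itself is linear- or log-scaled.
function snpSegments(counts: Record<string, number>, scale: Scale): Segment[] {
  const total = Object.values(counts).reduce((a, b) => a + b, 0)
  if (total === 0) return []
  const barHeight = scale(total)
  return Object.entries(counts).map(([base, n]) => ({
    base,
    heightPx: (n / total) * barHeight,
  }))
}

// Made-up example: 46 'A' and 54 'G' reads, max depth 1000, 100px track
const counts = { A: 46, G: 54 }
const onLinear = snpSegments(counts, linearScale(1000, 100))
const onLog = snpSegments(counts, logScale(1000, 100))
```

The two bars have very different total heights, but in both the 'A' segment covers the same 46% of its bar, which is the property the original report asks for.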
This is on the main branch now if you want to test it out!
Hmm... I tried out the sample data you posted in #2833, but it seemed to be the same at that position (chrM:9347) in both v1.6.6 and the main branch. It would definitely be concerning if there were a difference in the frequencies. Could it be a different file?
Yeah, the first image is from a different file, which was a whole-genome mapping. I subset the reads to chrM using samtools sort and samtools fastq, then remapped them to just chrM (so I could reduce the reference genome size), assuming that the mapping would be the same... but it wasn't. I've now updated the BAM file to be a filtered BAM file with a modified header, without remapping, and it recovers the apparent heteroplasmy.
Okay, I think I see what's happening here: JBrowse2 is including secondary alignments when calculating total coverage, but ignoring them when calculating variant frequency (I recall mention of something like this when looking at the code). This happens because other genomic regions contain a portion of the mitochondrial genome, and it's essentially random whether the mitochondrial genome or that other location is chosen as the primary alignment. Here's what the reads within this region look like in Tablet; you can see that about half the reads mapped to the region show '???' (because there's no associated sequence in the BAM file).
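To make the suspected effect concrete, here is a small sketch of how allele frequency gets diluted when alignments with no known base count toward depth. The `ReadAtPos` type, `alleleFrequency` function, and the read counts are all hypothetical; this models the hypothesis above, not JBrowse's code.

```typescript
// Only the fields this sketch needs (hypothetical type, not a JBrowse or library type).
interface ReadAtPos {
  flag: number
  base: string | null // null when SEQ is '*', as for minimap2 secondary alignments
}

const SECONDARY = 0x100

// 'all' divides by every alignment covering the position (what the total coverage bar counts);
// 'withBase' divides only by alignments whose base is actually known.
function alleleFrequency(reads: ReadAtPos[], alt: string, denominator: 'all' | 'withBase'): number {
  const altCount = reads.filter(r => r.base === alt).length
  const depth = denominator === 'all' ? reads.length : reads.filter(r => r.base !== null).length
  return depth === 0 ? 0 : altCount / depth
}

// Made-up example: 5 'A' and 5 'G' primary reads plus 10 secondary alignments without SEQ
const reads: ReadAtPos[] = [
  ...Array.from({ length: 5 }, () => ({ flag: 0, base: 'A' })),
  ...Array.from({ length: 5 }, () => ({ flag: 0, base: 'G' })),
  ...Array.from({ length: 10 }, () => ({ flag: SECONDARY, base: null })),
]
const diluted = alleleFrequency(reads, 'A', 'all') // 5 / 20
const knownOnly = alleleFrequency(reads, 'A', 'withBase') // 5 / 10
```

With roughly half the alignments lacking a base, a true 50/50 heteroplasmy is drawn as about 25%, which matches the kind of discrepancy seen after remapping to chrM only.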
The default set of flags filtered for both Pileup and SNPCoverage is 1540, which should retain secondary alignments (see https://broadinstitute.github.io/picard/explain-flags.html). If you're interested, feel free to make an issue for this. I also got inspired by the Tablet screenshot and made a per-base viewer here! #2847
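As a quick check of why 1540 keeps secondary alignments: 1540 = 0x400 + 0x200 + 0x4 (duplicate, QC fail, unmapped), so the secondary bit (0x100) is not part of the mask. The sketch below decodes the SAM FLAG bits; `decodeFlags` and `passesFilter` are hypothetical helper names, and "hide if any masked bit is set" is assumed to be how an exclude mask behaves.

```typescript
// SAM FLAG bit meanings (from the SAM specification).
const FLAG_NAMES: Record<number, string> = {
  0x1: 'paired',
  0x2: 'proper pair',
  0x4: 'unmapped',
  0x8: 'mate unmapped',
  0x10: 'reverse strand',
  0x20: 'mate reverse strand',
  0x40: 'first in pair',
  0x80: 'second in pair',
  0x100: 'secondary',
  0x200: 'QC fail',
  0x400: 'duplicate',
  0x800: 'supplementary',
}

// List the names of all bits set in a FLAG value (integer keys iterate in ascending order).
function decodeFlags(flags: number): string[] {
  return Object.entries(FLAG_NAMES)
    .filter(([bit]) => (flags & Number(bit)) !== 0)
    .map(([, name]) => name)
}

// An alignment is hidden if any bit in the exclude mask is set on it.
function passesFilter(flag: number, excludeMask = 1540): boolean {
  return (flag & excludeMask) === 0
}

const excluded = decodeFlags(1540)
const secondaryKept = passesFilter(0x100)
```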
Yes, but the problem is that the bases aren't known for these secondary reads. The default output for minimap2 is to set the sequence and quality fields to '*' when the sequence for a secondary read is present elsewhere in the BAM file, and the CIGAR string (with M operations) only shows that the read aligns there without a change in length, not which bases it carries. In this case I had filtered to include only chrM mappings, so those sequences aren't present anywhere in the BAM file: there's no way to get the base values for that read at the target location. Here's an example showing the text representation, with a few secondary reads highlighted.
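The point about CIGAR can be illustrated by walking a CIGAR string to find the read base at a reference position. This is a hypothetical helper (`baseAtRefPos`), not a JBrowse or htslib function, and it uses 0-based coordinates. An `M` operation consumes both read and reference without saying whether the bases match, so once SEQ is `*` the base cannot be recovered even though the CIGAR still covers the position.

```typescript
// Return the read base aligned to refPos, or null if it cannot be determined.
function baseAtRefPos(alignStart: number, cigar: string, seq: string, refPos: number): string | null {
  if (seq === '*') return null // SEQ omitted: the CIGAR spans refPos, but the base is unknowable
  let ref = alignStart // current reference coordinate (0-based)
  let read = 0 // current offset into SEQ
  const re = /(\d+)([MIDNSHP=X])/g
  let m: RegExpExecArray | null
  while ((m = re.exec(cigar)) !== null) {
    const len = Number(m[1])
    const op = m[2]
    if (op === 'M' || op === '=' || op === 'X') {
      // consumes read and reference; 'M' does not say whether the bases match
      if (refPos >= ref && refPos < ref + len) return seq[read + (refPos - ref)]
      ref += len
      read += len
    } else if (op === 'I' || op === 'S') {
      read += len // consumes read only
    } else if (op === 'D' || op === 'N') {
      if (refPos >= ref && refPos < ref + len) return null // deleted or skipped in this read
      ref += len // consumes reference only
    }
    // 'H' and 'P' consume neither read nor reference
  }
  return null // refPos lies outside the alignment
}
```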
If the sequence is not available (on the secondary reads) and there is no MD tag, there is probably no way to call SNPs on the secondary reads. JBrowse either uses MD tags or compares the reference sequence with the read sequence to get SNPs! There might be some option in either calmd or the aligner to get these back, though. Thanks for all the detailed feedback, too!
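For reference, the MD tag records where mismatches and deletions fall and the *reference* base at each mismatch; the read's own base still has to come from SEQ. Below is a minimal MD parser sketch (the `parseMdMismatches` name is hypothetical and this is not JBrowse's implementation). Offsets count reference bases from the start of the alignment.

```typescript
interface MdMismatch {
  refOffset: number // offset from the alignment start, in reference bases
  refBase: string // reference base at the mismatch (not the read base)
}

// MD grammar (SAM spec): [0-9]+(([A-Z]|\^[A-Z]+)[0-9]+)*
function parseMdMismatches(md: string): MdMismatch[] {
  const mismatches: MdMismatch[] = []
  let offset = 0
  const re = /(\d+)|\^([A-Z]+)|([A-Z])/g
  let m: RegExpExecArray | null
  while ((m = re.exec(md)) !== null) {
    if (m[1] !== undefined) {
      offset += Number(m[1]) // run of matching bases
    } else if (m[2] !== undefined) {
      offset += m[2].length // bases deleted from the reference
    } else {
      mismatches.push({ refOffset: offset, refBase: m[3] }) // single mismatch
      offset += 1
    }
  }
  return mismatches
}
```

So even with an MD tag, a secondary alignment without SEQ would reveal that a mismatch exists at a position, but not which alternative base the read carries.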
(I have seen cases where the sequence is preserved on secondary alignments in other BAM files, but I wouldn't know the options to get that.)
This is a decision made by the mapper (in this case, minimap2); preserving sequence for secondary alignments is basically a #WONTFIX for minimap2, despite the obvious benefit for alignment visualisation tools.
@gringer I made a little program to try to add SEQ and QUAL fields to minimap2 secondary alignments in BAM/CRAM, as this problem always bugged me. Might be of interest :)
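One plausible way such a tool could work is to copy SEQ and QUAL from the primary alignment with the same read name, reverse-complementing when the strands differ. The sketch below is an assumption about the general technique, not the linked program's implementation; `SamRecord`, `fillSecondarySeq`, and `revcomp` are hypothetical names. It also assumes no hard clipping (bailing out if either CIGAR contains `H`) and operates on in-memory records rather than real BAM/CRAM.

```typescript
// Minimal record holding only the SAM fields this sketch needs (hypothetical type).
interface SamRecord {
  qname: string
  flag: number
  cigar: string
  seq: string
  qual: string
}

const REVERSE = 0x10
const SECONDARY = 0x100
const SUPPLEMENTARY = 0x800
const MATE_BITS = 0x40 | 0x80 // first/second in pair, so read 1 and read 2 stay separate

const COMPLEMENT: Record<string, string> = { A: 'T', C: 'G', G: 'C', T: 'A', N: 'N' }

function revcomp(s: string): string {
  return s
    .split('')
    .reverse()
    .map(c => COMPLEMENT[c.toUpperCase()] ?? 'N')
    .join('')
}

const readKey = (r: SamRecord) => `${r.qname}:${r.flag & MATE_BITS}`

function fillSecondarySeq(records: SamRecord[]): SamRecord[] {
  // Index primary alignments (neither secondary nor supplementary) by read
  const primaries = new Map<string, SamRecord>()
  for (const r of records) {
    if ((r.flag & (SECONDARY | SUPPLEMENTARY)) === 0) primaries.set(readKey(r), r)
  }
  return records.map(r => {
    if ((r.flag & SECONDARY) === 0 || r.seq !== '*') return r
    const p = primaries.get(readKey(r))
    // Give up if the primary is absent (e.g. filtered out, as in the chrM-only BAM
    // above) or if hard clipping means SEQ would not cover the whole read.
    if (!p || p.seq === '*' || p.cigar.includes('H') || r.cigar.includes('H')) return r
    const sameStrand = (p.flag & REVERSE) === (r.flag & REVERSE)
    return {
      ...r,
      seq: sameStrand ? p.seq : revcomp(p.seq),
      qual: p.qual === '*' || sameStrand ? p.qual : p.qual.split('').reverse().join(''),
    }
  })
}
```

Note that this only helps when the primary alignment is still in the file; in the chrM-filtered case described earlier the primaries were removed, so the secondary reads would stay without sequence.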
I like that SNPs are represented on a linear scale even on log plots (for example, it's really confusing to see a SNP at 50% frequency that is almost at the top of a graph), but the total graph heights should match. This does not seem to be the case with the current view:
jbrowse-components/plugins/alignments/src/SNPCoverageRenderer/SNPCoverageRenderer.ts, lines 40 to 41 (at commit fc60115)
In the screenshot demonstrating the issue (top: linear scale; bottom: log scale), the 'A' variant should occupy 46% of the vertical height, but in the log-scale plot it only occupies about 33%.