INFO field 'END' is before variant's 'POS' for variant created via Smoove SV population calling #56
Comments
This issue is based on files created with Smoove version 0.1.9. |
yes, this should be fixed in 0.2.3 |
Ok thank you. |
Hi Brent. I upgraded to Smoove 0.2.3 and ran another set of samples. I still get multiple records with END before POS. See also the error below and the Smoove version from the VCF header.
Number of samples and variants via bcftools stats
|
thanks for reporting, I'll have a look. |
Thank you. |
I opened a PR to fix this in svtools. Should be able to get that in the next release. |
Hi Brent. Thank you for opening the PR at svtools. I am running vanilla Smoove 0.2.3, following exactly the Smoove commands under the population section of the documentation on this GitHub page. I am not using anything to set the REF allele. The REF allele is just N in the example above and also elsewhere in the VCF that Smoove made on our data. Or am I missing something, or not fully understanding it? |
got it. I was thinking that snpEff wouldn't work without having the REF allele set properly but apparently that's not (any longer?) the case. |
snpEff worked for us in the past to annotate SV VCF files made with Lumpy and Manta.
http://snpeff.sourceforge.net/SnpEff_manual.html I post-process the snpEff-annotated SV VCF file with a small cyvcf2 script that collapses very large ANN fields into just the set or the number of genes hit (otherwise the ANN field becomes way too big to use or make sense of for large SVs). |
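A minimal sketch of that kind of ANN collapsing, for illustration only and not the script referred to above. It uses plain Python string handling rather than cyvcf2, and it assumes the usual snpEff ANN layout, in which annotations are comma-separated and the gene name is the fourth `|`-separated subfield. The function name `collapse_ann`, the output key `ANN_GENES`, and the example INFO string are all hypothetical.

```python
def collapse_ann(info: str, key: str = "ANN_GENES") -> str:
    """Replace the ANN entry of a VCF INFO string with a sorted gene list.

    Assumption: ANN follows the snpEff layout, so subfield index 3 of each
    '|'-separated annotation is the gene name. `key` is a made-up INFO key.
    """
    kept = []
    genes = set()
    for entry in info.split(";"):
        if entry.startswith("ANN="):
            # One comma-separated item per annotation.
            for ann in entry[len("ANN="):].split(","):
                fields = ann.split("|")
                if len(fields) > 3 and fields[3]:
                    genes.add(fields[3])
        else:
            kept.append(entry)  # every other INFO entry passes through
    if genes:
        kept.append(f"{key}={','.join(sorted(genes))}")
    return ";".join(kept)


# Made-up INFO string with two annotations hitting two genes.
info = (
    "SVTYPE=DEL;END=5000;"
    "ANN=N|transcript_ablation|HIGH|GENE_B|id2|transcript|T2|protein_coding,"
    "N|upstream_gene_variant|MODIFIER|GENE_A|id1|transcript|T1|protein_coding"
)
collapsed = collapse_ann(info)
print(collapsed)
```

Replacing the gene list with `len(genes)` would give the "number of genes hit" variant instead of the set.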
Thanks for pinging at svtools. As a temporary workaround we implemented the following small cyvcf2 script to post process the Smoove results. Maybe this is useful for someone else until the svtools update is made.
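The script itself is not included in this copy of the thread. As an illustration only, and not the script from this comment, a post-processing pass might set aside records whose INFO `END` is smaller than `POS`, so that the remaining records can be annotated. The sketch below uses plain Python string handling instead of cyvcf2. The function names `info_end` and `split_bad_end_records` and the example records are hypothetical, and setting records aside (rather than repairing them) is an assumption.

```python
def info_end(info: str):
    """Return the integer END value from a VCF INFO string, or None."""
    for entry in info.split(";"):
        if entry.startswith("END="):
            return int(entry[len("END="):])
    return None


def split_bad_end_records(lines):
    """Split VCF lines into (kept, rejected).

    Header lines and records with END >= POS (or no END) are kept;
    records with END < POS are rejected. Assumption: this is only one
    possible policy, not necessarily what the original workaround did.
    """
    kept, rejected = [], []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)  # header lines pass through untouched
            continue
        cols = line.rstrip("\n").split("\t")
        pos, end = int(cols[1]), info_end(cols[7])
        if end is not None and end < pos:
            rejected.append(line)
        else:
            kept.append(line)
    return kept, rejected


# Made-up records: sv2 has END before POS.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t1000\tsv1\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=1048",
    "1\t2000\tsv2\tN\t<DEL>\t.\t.\tSVTYPE=DEL;END=1990",
]
kept, rejected = split_bad_end_records(example)
print(len(kept), "kept,", len(rejected), "rejected")
```

Keeping the rejected lines in a separate list makes it easy to inspect them afterwards instead of silently losing them.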
|
Hi,
I was able to use the Smoove population SV calling pipeline to call SVs on 200+ samples.
SVs that we are interested in were also included in the results.
So the pipeline worked fast and the results are good based on our first experience with Smoove.
I ran into one issue while trying to annotate the SVs via SnpEff.
https://github.com/pcingola/SnpEff/blob/87976d2c63fc2590408cc40f41b96818079324b9/src/main/java/org/snpeff/vcf/VcfEntry.java#1159
I could trace this back to this record in the final VCF file produced via Smoove paste. In this record END is indeed before POS.
For Sample_79 I could find this record in the result from smoove call. That variant has the same start position, but length 48 instead of 54, and END is after POS.
In the merged.sites.vcf.gz VCF file produced with smoove, the END before POS issue can first be found:
I could not find in the VCF spec whether END is always required to be after POS.
https://samtools.github.io/hts-specs/VCFv4.2.pdf
SnpEff seems to be strict about this. As far as I know, at least (small) variants, including indels, are always reported on the positive strand with END after POS. Do you know if this is, or should be, also the case for SVs?
And maybe check if this issue can be fixed in the smoove merge code.
Thank you.