Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

matching chromosome order in vcf and bam files #42

Open
dalhoomist opened this issue Feb 21, 2021 · 5 comments
Open

matching chromosome order in vcf and bam files #42

dalhoomist opened this issue Feb 21, 2021 · 5 comments

Comments

@dalhoomist
Copy link

Hello,

I am using popscle dsc-pileup function as a prelude for running demuxlet on cellranger output. Here is my code:
packages/popscle/bin/popscle dsc-pileup --sam bam.bam --vcf vcf1.vcf --out home/workscape/TB3966/pileup --group-list barcodes.tsv

I downloaded a vcf from the 1000genomes to use here. The problem is that pipeline is throwing the following error:
[E:int32_t cmdCramDscPileup(int32_t, char**)] Your VCF/BCF files and SAM/BAM/CRAM files have different ordering of chromosomes. SAM/BAM/CRAM file has 19 before 2, but VCF/BCF file has 19 after 2

The order of chromosomes in the vcf normally started from 1, 2, 3 ... etc. In the bam file it is 1 10 11 12 ....

Is there a way to fix that?

Thanks for all the help.

@raivivek
Copy link

raivivek commented Jul 15, 2021

Hi @dalhoomist You can reorder your VCF file corresponding to the BAM using picard SortVcf command. For example, picard SortVcf INPUT=example.vcf OUTPUT=example.sorted.vcf SEQUENCE=sort_by.bam. Once you've the sorted VCF, you can use it with demuxlet and the sorting order will match for all BAMs processed similarly (i.e., aligned to the same reference).

@fatimasnc
Copy link

Hi,

I'm having the same problem here. I tried @raivivek's suggestion but after running picard SortVcf on my VCF file I still get the same error:

FATAL ERROR - 
[E:int32_t main(int32_t, char**)] Your VCF/BCF files and SAM/BAM/CRAM files have different ordering of chromosomes. SAM/BAM/CRAM file has 19 before 2, but VCF/BCF file has 19 after 2

With help from a colleague we figured out the metadata at the top of the VCF file is not sorted, and apparently other software can handle this (having different order in the metadata and each variant entry) but it doesn't seem the case for demuxlet. Is there any way to fix this that wouldn't involve manually moving around the metadata entries for each chromosome?

Thanks!

@raivivek
Copy link

raivivek commented Apr 1, 2022

Hi @fatimasnc -- you're right on. Perhaps I should have mentioned this earlier as well, but certainly don't have to do it manually. Picard offers several other commands (such as CreateSequenceDictionary, UpdateVcfSequenceDictionary, and SortVcf) that allow you to extract the chromosome order from the reference and reorder VCF without having to do it manually — the approach I took finally. You might however need to create a script or wrapper that executes these few steps for a given set of VCF/Reference FASTA/BAM files.

@fatimasnc
Copy link

Thanks a lot for your reply @raivivek! In the end we managed to solve it with this helper script from the aerts lab :)

@hsymoon
Copy link

hsymoon commented Aug 6, 2024

Thanks a lot for your reply @raivivek! In the end we managed to solve it with this helper script from the aerts lab :)

Hi, I have read your conversation and thanks for you providing the helper script. But I do not know hou to run this sort_vcf_same_as_bam.sh .Could you please provide an example? Thanks so much for helping.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants