Commit
Showing 3 changed files with 0 additions and 0 deletions. All three files were renamed without changes.
c30b845
Hi @hxj5,
I assume these reference files, which contain the list of SNPs from 1kgenome, were created using this liftOver wrapper: cellSNP.
It doesn't look like the headers are rewritten after lifting over, so the contig reference annotation in the *.hg38.vcf files still refers to GRCh37 (which triggers errors in some GATK tools).
Have you got any suggestions on how to fix this?
Thanks,
Ruqian
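One way to confirm the mismatch described above is to print only the reference-related header lines of a lifted-over file. The following is a minimal sketch; the function name `show_vcf_reference` is hypothetical, and it assumes the VCF is gzip- or bgzip-compressed.

```shell
# Hypothetical helper: print the ##contig and ##reference header lines of a
# compressed VCF, stopping at the first record so large files are not read fully.
#   $1: gzip- or bgzip-compressed VCF
show_vcf_reference() {
    gzip -cd "$1" | awk '/^##(contig|reference)=/ { print } !/^#/ { exit }'
}
```

If the printed contig lengths or assembly tags still match GRCh37, the header was presumably carried over unchanged by the liftover.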
Hi Ruqian,
Thanks for your feedback. A new VCF header file (header.txt, attached), which contains the hg38 contigs, was extracted from cellranger-GRCh38-1.2.0/fasta/genome.fa.fai. The new header can be used to replace the original header.
Xianjie
header.txt
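For readers who want to see how contig lines can be derived from a FASTA index, here is a minimal sketch. It is not the procedure used to build the attached header.txt, which is not shown in this thread; the function name `fai_to_contigs` is hypothetical. It relies only on the fact that the first two tab-separated columns of a `.fai` file are the sequence name and its length.

```shell
# Hypothetical helper: print one ##contig header line per sequence in a FASTA index.
#   $1: path to a .fai file (column 1 = sequence name, column 2 = length)
fai_to_contigs() {
    awk -F'\t' '{ printf "##contig=<ID=%s,length=%s>\n", $1, $2 }' "$1"
}
```

The output could be pasted into a header file in place of the old contig lines. A header file prepared this way can then be applied with `bcftools reheader -h header.txt -o out.vcf.gz in.vcf.gz`, where `in.vcf.gz` and `out.vcf.gz` are placeholder names; whether this matches the exact command originally posted is an assumption.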
Thanks, Xianjie.
The reheader command ran fine. However, after reheadering, I tried to sort the vcf.gz and encountered parsing errors.
Best,
Ruqian
Hi Ruqian,
An updated header.hg38.txt and a new script, reheader.sh.txt, are attached (it seems GitHub does not support uploading .sh files now, so please rename reheader.sh.txt to reheader.sh after downloading it).
I was testing hg38.AF5e2.vcf.gz, which does not include INFO 'OLD', unlike hg38.AF5e4.vcf.gz. An annotation for INFO 'OLD' has now been added to the updated header file, which should work for both cases.
As for INFO 'A', it could be a bug in bcftools reheader. For now, please use the attached reheader.sh.
A small piece of advice: the -Oz option could be added when sorting the VCF; otherwise the output VCF is in plain text instead of bgzip format.
Best,
Xianjie
header.hg38.txt
reheader.sh.txt
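The contents of the attached reheader.sh are not reproduced in this thread. As an illustration only, a header swap that avoids `bcftools reheader` can be sketched with standard tools: write the new header, then append the records of the old file. The function name `swap_vcf_header` is hypothetical, and the sketch assumes the new header file ends with a `#CHROM` line whose columns match the input.

```shell
# Hypothetical sketch (not the attached reheader.sh): replace a VCF header
# using plain tools.
#   $1: new header file; every line starts with '#', ending with the #CHROM line
#   $2: gzip- or bgzip-compressed input VCF
# Writes an uncompressed VCF to stdout.
swap_vcf_header() {
    cat "$1"
    # Drop the old header lines and keep only the variant records.
    gzip -cd "$2" | grep -v '^#'
}
```

Following the advice above, the result could be piped straight into a compressed sort, for example `swap_vcf_header header.hg38.txt in.vcf.gz | bcftools sort -Oz -o out.sorted.vcf.gz`, where the input and output file names are placeholders.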
Thanks a lot, Xianjie.
I finally got GATK HaplotypeCaller working for the list of SNPs in genome1K.phase3.SNP_AF5e4.chr1toX.hg38.fixed.header.sorted.vcf.gz.
Best,
Ruqian