Give control over creation of vectors with mixed known and missing values

When input files have different alternate alleles, vector fields pertaining to unobserved alleles are set to missing by default. This creates vectors with mixed known and unknown values, and some programs refuse to work with such vectors.

This commit changes the default behavior of --gvcf merging:
- whenever the unknown allele is present (<*> or <NON_REF>), its values are used instead of the missing value
- when it is not present, the default rule `-M PL:max,AD:0` is used

A new `-M, --missing-rules` option is added, which allows overriding the default `-M PL:max,AD:0` rule implied by `--gvcf`. Note that the use of an explicitly given unknown allele (<*> or <NON_REF>) cannot be overridden, which I believe is the correct behavior.

Resolves #1888.
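As an illustration only, the effect of the `PL:max,AD:0` rule on a single sample's vector can be sketched in Python. This is not the bcftools implementation: the helper names `parse_rules` and `fill_missing` are hypothetical, `None` stands in for the VCF missing value `.`, and treating `.` as a "leave missing" rule is an assumption of this sketch. Only the `max` and `0` rules are named in the commit message.

```python
from typing import Dict, List, Optional


def parse_rules(spec: str) -> Dict[str, str]:
    """Split a rule string such as "PL:max,AD:0" into {"PL": "max", "AD": "0"}."""
    rules: Dict[str, str] = {}
    for item in spec.split(","):
        tag, _, rule = item.partition(":")
        rules[tag] = rule
    return rules


def fill_missing(values: List[Optional[int]], rule: str) -> List[Optional[int]]:
    """Replace missing entries (None) in one sample's vector according to `rule`.

    "max"  -> use the largest known value in the vector
    "."    -> leave missing entries untouched (assumed rule, see lead-in)
    other  -> interpret the rule as an integer constant, e.g. "0"
    """
    if rule == ".":
        return list(values)
    if rule == "max":
        known = [v for v in values if v is not None]
        if not known:
            # Nothing known to take the maximum of; leave the vector as is.
            return list(values)
        fill = max(known)
    else:
        fill = int(rule)
    return [fill if v is None else v for v in values]


rules = parse_rules("PL:max,AD:0")

# Vectors with mixed known and missing values, in the style of the test data.
pl = fill_missing([405, 27, 0, None, None, None], rules["PL"])
ad = fill_missing([0, 9, None], rules["AD"])
print(pl)  # [405, 27, 0, 405, 405, 405]
print(ad)  # [0, 9, 0]
```

With `max`, an unobserved genotype receives the least favorable likelihood already seen in that sample, and with `0` an unobserved allele receives zero depth, so the resulting vectors contain no missing entries.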
Showing 9 changed files with 261 additions and 17 deletions.
@@ -0,0 +1,19 @@
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##reference=file://hs38DH.fa
##contig=<ID=chr1>
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum per-sample depth in this gVCF block">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Minimum per-sample depth in this gVCF block">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Minimum per-sample depth in this gVCF block">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SampleA SampleB
chr1 1769963 . A <NON_REF> . . END=1769967 GT:PL 0/0:0,3,45 ./.:.
chr1 1769968 . T <NON_REF> . . . GT:PL 0/0:0,3,45 0/0:0,18,270
chr1 1769969 . CAAAACAAAAACA CAAAACA,<NON_REF>,C . . . GT:AD:PL 1/1:0,9,0,0:405,27,0,405,27,405,405,405,405,405 3/3:0,0,0,4:181,181,181,181,181,181,12,181,12,0
chr1 1769976 . A <NON_REF> . . . GT:PL 0/0:0,0,0 ./.:.
chr1 1769982 . A <NON_REF> . . . GT:PL ./.:. 0/0:0,0,0
chr1 1769983 . C T,A . . . GT:AD:PL 1/1:0,9,0:405,27,0,405,405,405 2/2:0,0,4:181,181,181,12,181,0
chr1 1769990 . CAAAACAAAAACA CAAAACA,<NON_REF>,C . . . GT:AD:PL 1:0,9,0,0:405,27,0,0 3:0,0,0,4:181,0,0,12
chr1 1769991 . C T,A . . . GT:AD:PL 1:0,9,0:405,0,405 2:0,0,4:181,181,0
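The PL vectors in the records above differ in length because PL is declared `Number=G`, one value per possible genotype. As a small aid for reading the test data, the sketch below computes that count from the number of alleles (REF plus all ALT) and the ploidy; the function name `n_genotypes` is hypothetical and not part of bcftools.

```python
from math import comb


def n_genotypes(n_alleles: int, ploidy: int) -> int:
    """Number of unordered genotypes: multisets of size `ploidy` drawn from `n_alleles`."""
    return comb(n_alleles + ploidy - 1, ploidy)


# chr1:1769969 has REF + 3 ALT alleles and diploid genotypes.
print(n_genotypes(4, 2))  # 10
# chr1:1769990 has the same alleles but haploid genotypes.
print(n_genotypes(4, 1))  # 4
# chr1:1769983 has REF + 2 ALT alleles, diploid.
print(n_genotypes(3, 2))  # 6
# chr1:1769991 has REF + 2 ALT alleles, haploid.
print(n_genotypes(3, 1))  # 3
```

AD is declared `Number=R`, so it simply carries one value per allele, including REF.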
@@ -0,0 +1,19 @@
##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##reference=file://hs38DH.fa
##contig=<ID=chr1>
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum per-sample depth in this gVCF block">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Minimum per-sample depth in this gVCF block">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Minimum per-sample depth in this gVCF block">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SampleA SampleB
chr1 1769963 . A <NON_REF> . . END=1769967 GT:PL 0/0:0,3,45 ./.:.
chr1 1769968 . T <NON_REF> . . . GT:PL 0/0:0,3,45 0/0:0,18,270
chr1 1769969 . CAAAACAAAAACA CAAAACA,<NON_REF>,C . . . GT:AD:PL 1/1:0,9,0,0:405,27,0,405,27,405,405,405,405,405 3/3:0,0,0,4:181,181,181,181,181,181,12,181,12,0
chr1 1769976 . A <NON_REF> . . . GT:PL 0/0:0,0,0 ./.:.
chr1 1769982 . A <NON_REF> . . . GT:PL ./.:. 0/0:0,0,0
chr1 1769983 . C T,A . . . GT:AD:PL 1/1:0,9,.:405,27,0,.,.,. 2/2:0,.,4:181,.,.,12,.,0
chr1 1769990 . CAAAACAAAAACA CAAAACA,<NON_REF>,C . . . GT:AD:PL 1:0,9,0,0:405,27,0,0 3:0,0,0,4:181,0,0,12
chr1 1769991 . C T,A . . . GT:AD:PL 1:0,9,.:405,0,. 2:0,.,4:181,.,0
@@ -0,0 +1,16 @@
##fileformat=VCFv4.2
##reference=file://hs38DH.fa
##contig=<ID=chr1>
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum per-sample depth in this gVCF block">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Minimum per-sample depth in this gVCF block">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Minimum per-sample depth in this gVCF block">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SampleA
chr1 1769963 . A <NON_REF> . . END=1769968 GT:PL 0/0:0,3,45
chr1 1769969 . CAAAACA C,<NON_REF> . . . GT:AD:PL 1/1:0,9,0:405,27,0,405,27,405
chr1 1769976 . A <NON_REF> . . END=1769976 GT:PL 0/0:0,0,0
chr1 1769983 . C T . . . GT:AD:PL 1/1:0,9:405,27,0
chr1 1769990 . CAAAACA C,<NON_REF> . . . GT:AD:PL 1:0,9,0:405,27,0
chr1 1769991 . C T . . . GT:AD:PL 1:0,9:405,0
@@ -0,0 +1,16 @@
##fileformat=VCFv4.2
##reference=file://hs38DH.fa
##contig=<ID=chr1>
##INFO=<ID=END,Number=1,Type=Integer,Description="End position of the variant described in this record">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">
##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum per-sample depth in this gVCF block">
##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Minimum per-sample depth in this gVCF block">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Minimum per-sample depth in this gVCF block">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SampleB
chr1 1769968 . T <NON_REF> . . END=1769968 GT:PL 0/0:0,18,270
chr1 1769969 . CAAAACAAAAACA C,<NON_REF> . . . GT:AD:PL 1/1:0,4,0:181,12,0,181,12,181
chr1 1769982 . A <NON_REF> . . END=1769982 GT:PL 0/0:0,0,0
chr1 1769983 . C A . . . GT:AD:PL 1/1:0,4:181,12,0
chr1 1769990 . CAAAACAAAAACA C,<NON_REF> . . . GT:AD:PL 1:0,4,0:181,12,0
chr1 1769991 . C A . . . GT:AD:PL 1:0,4:181,0