[vds/combiner] Stop dropping GT in reference data during gvcf import #14560

Merged

chrisvittal
Collaborator

@chrisvittal chrisvittal commented May 22, 2024

CHANGELOG: The gvcf import stage of the VDS combiner now preserves the GT of reference blocks. Some datasets have haploid calls on sex chromosomes, and the fact that the reference was haploid should be preserved.

@chrisvittal
Collaborator Author

  1. I need to add some tests.
  2. I flipped a coin and you came up @daniel-goldstein (as opposed to Ed), feel free to reassign if you feel uncomfortable.

Reference GT/PGT may have ploidy information, so we need to stop
dropping the GT/PGT.
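Why keeping the call matters can be sketched in plain Python. This is a minimal illustration, not the combiner's code: `parse_gt`, `ref_block_entry`, `ploidy`, and the sample records are hypothetical, and the choice of `DP`/`GQ`/`MIN_DP` as the retained fields and `LGT` as the stored call name is an assumption. A gVCF `GT` of `0` is a haploid reference call and `0/0` is diploid, so once the call is dropped, nothing in the block records which ploidy it had.

```python
import re


def parse_gt(gt):
    """Parse a VCF GT string ('0', '0/0', '0|1', './.') into a tuple of allele indices.

    Missing alleles ('.') become None; phasing is ignored in this sketch.
    """
    return tuple(None if a == "." else int(a) for a in re.split(r"[/|]", gt))


def ref_block_entry(sample, keep_gt=True):
    """Hypothetical: build a reference-block entry from one gVCF sample record."""
    entry = {k: sample.get(k) for k in ("DP", "GQ", "MIN_DP")}
    if keep_gt:
        # Keeping the call is what preserves ploidy: '0' is haploid, '0/0' diploid.
        entry["LGT"] = parse_gt(sample["GT"])
    return entry


def ploidy(entry):
    """Ploidy implied by the stored call, or None if the call was dropped."""
    gt = entry.get("LGT")
    return None if gt is None else len(gt)


chr_y_block = {"GT": "0", "DP": 12, "GQ": 30, "MIN_DP": 10}
autosome_block = {"GT": "0/0", "DP": 25, "GQ": 50, "MIN_DP": 20}

print(ploidy(ref_block_entry(chr_y_block)))                 # 1
print(ploidy(ref_block_entry(autosome_block)))              # 2
print(ploidy(ref_block_entry(chr_y_block, keep_gt=False)))  # None
```

With `keep_gt=False` the haploid chrY block and a diploid block become indistinguishable, which is the information loss the PR removes.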
@chrisvittal chrisvittal force-pushed the vds/combiner/ref-can-be-haploid branch from 070686a to 56baa59 May 22, 2024 20:46
@daniel-goldstein
Contributor

Gotta say I don't feel very well equipped to review this. Happy to hop on a zoom to walk through it or have @ehigham do the review

@chrisvittal chrisvittal linked an issue May 24, 2024 that may be closed by this pull request
- pure set logic for shared_fields/ref_fields in to_dense_mt/coalesce_join
- annotate the call_field from ref_call_field rather than transmute it,
  since both 'ref_call_field' and 'call_field' might still be in the
  variant data
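The two mechanics in this commit message (splitting fields with set logic, and annotating `call_field` from `ref_call_field` instead of transmuting it) can be illustrated with a plain-Python analogy that uses dicts as stand-ins for entries. The function names and field sets below are hypothetical and this is not Hail's code. It models one known difference: in Hail, `transmute` drops the fields its expression reads, while `annotate` leaves them in place.

```python
def split_ref_fields(ref_fields, var_fields):
    """Return (shared, ref_only): reference entry fields that also exist in the
    variant data, and those that exist only in the reference data."""
    ref_fields, var_fields = set(ref_fields), set(var_fields)
    return ref_fields & var_fields, ref_fields - var_fields


def fill_call_transmute(ref_entry, call_field, ref_call_field):
    # Transmute-like: copy the call into call_field and drop the source field.
    out = {k: v for k, v in ref_entry.items() if k != ref_call_field}
    out[call_field] = ref_entry[ref_call_field]
    return out


def fill_call_annotate(ref_entry, call_field, ref_call_field):
    # Annotate-like: copy the call into call_field and keep the source field.
    out = dict(ref_entry)
    out[call_field] = ref_entry[ref_call_field]
    return out


# Hypothetical schemas: the variant data carries both LGT and GT.
ref_fields = {"LGT", "DP", "GQ", "END"}
var_fields = {"LGT", "GT", "LA", "DP", "GQ"}
shared, ref_only = split_ref_fields(ref_fields, var_fields)

ref_entry = {"LGT": (0,), "DP": 12, "GQ": 30}
transmuted = fill_call_transmute(ref_entry, "GT", "LGT")
annotated = fill_call_annotate(ref_entry, "GT", "LGT")
```

Here `transmuted` no longer has `LGT`, even though the variant entries still carry it, while `annotated` keeps `LGT` alongside the new `GT`. This mirrors the reason the commit gives for preferring annotate.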
@chrisvittal chrisvittal force-pushed the vds/combiner/ref-can-be-haploid branch from b8e2dc3 to c623645 May 24, 2024 17:08
I _think_ this is fine, and users won't need reference PGT by default.
(Famous last words, I know, but we can burn that bridge when we get to
it)
@chrisvittal chrisvittal changed the title [vds] Stop dropping GT/PGT in reference data during import [vds] Stop dropping GT in reference data during import May 28, 2024
@chrisvittal chrisvittal force-pushed the vds/combiner/ref-can-be-haploid branch from 5b1b4a9 to 1aaaf91 May 30, 2024 16:01
@chrisvittal chrisvittal changed the title [vds] Stop dropping GT in reference data during import [vds/combiner] Stop dropping GT in reference data during gvcf import May 30, 2024
@chrisvittal chrisvittal force-pushed the vds/combiner/ref-can-be-haploid branch 2 times, most recently from 6f3689d to 9d3debc May 31, 2024 15:47
@ehigham ehigham requested review from ehigham and removed request for daniel-goldstein May 31, 2024 15:51
Member

@ehigham ehigham left a comment

Chris kindly walked me through the genomic details that back this PR. With that, I'll approve.

@hail-ci-robot hail-ci-robot merged commit d9d85d5 into hail-is:main May 31, 2024
2 checks passed
chrisvittal added a commit to chrisvittal/hail that referenced this pull request Jul 10, 2024
…ail-is#14560)

CHANGELOG: The gvcf import stage of the VDS combiner now preserves the
GT of reference blocks. Some datasets have haploid calls on sex
chromosomes, and the fact that the reference was haploid should be
preserved.
hail-ci-robot pushed a commit that referenced this pull request Jul 10, 2024
#14560 updated `to_dense_mt` to take into account the existence of
reference GT fields. However, that change was untested. I take our
old `test_to_dense_mt` test, and add a haploid `LGT` field to the
reference, and check to make sure that the haploid reference is present
in the result.
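The property the test checks (a haploid reference call survives densification) can be modeled with a toy, single-sample densifier. This is a hypothetical plain-Python sketch, not Hail's `to_dense_mt`. It assumes reference blocks are `(start, end, entry)` tuples with an inclusive end, like a gVCF `END`, and that a variant entry at a position takes precedence over any reference block covering it.

```python
def densify(positions, ref_blocks, var_entries):
    """Toy densification for a single sample.

    ref_blocks:  list of (start, end, entry); end is inclusive, like gVCF END.
    var_entries: dict mapping position -> variant entry.
    Returns position -> entry, or None where nothing covers the position.
    """
    dense = {}
    for pos in positions:
        if pos in var_entries:
            # A variant call at this site takes precedence over reference blocks.
            dense[pos] = var_entries[pos]
        else:
            # Otherwise copy the entry of the reference block spanning pos, if any,
            # including its call, so a haploid reference stays haploid.
            dense[pos] = next(
                (entry for start, end, entry in ref_blocks if start <= pos <= end),
                None,
            )
    return dense


ref_blocks = [(100, 200, {"LGT": (0,), "DP": 10})]  # haploid reference block
var_entries = {150: {"LGT": (1,), "LA": (0, 1), "DP": 9}}
dense = densify([120, 150, 250], ref_blocks, var_entries)
```

Position 120 picks up the haploid reference call `(0,)`, position 150 keeps the variant call, and position 250, which no block or variant covers, is `None`. The linear scan over blocks keeps the sketch short. A real implementation would instead walk sorted blocks alongside the sorted positions.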
chrisvittal added a commit that referenced this pull request Jul 10, 2024
#14560 updated `to_dense_mt` to take into account the existence of
reference GT fields. However, that change was untested. I take our
old `test_to_dense_mt` test, and add a haploid `LGT` field to the
reference, and check to make sure that the haploid reference is present
in the result.
chrisvittal added a commit that referenced this pull request Jul 11, 2024
…14560)

CHANGELOG: The gvcf import stage of the VDS combiner now preserves the
GT of reference blocks. Some datasets have haploid calls on sex
chromosomes, and the fact that the reference was haploid should be
preserved.
chrisvittal added a commit that referenced this pull request Jul 11, 2024
#14560 updated `to_dense_mt` to take into account the existence of
reference GT fields. However, that change was untested. I take our
old `test_to_dense_mt` test, and add a haploid `LGT` field to the
reference, and check to make sure that the haploid reference is present
in the result.
chrisvittal added a commit that referenced this pull request Jul 11, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls, as well as a fix for one critical correctness
bug. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
   - [query] Don't error on VCF export when haploid call is unphased (#14375)
   - [compiler] apply scalafix to all scala sources (#14156)
chrisvittal added a commit that referenced this pull request Jul 15, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls, as well as a fix for one critical correctness
bug. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
   - [query] Don't error on VCF export when haploid call is unphased (#14375)
   - [compiler] apply scalafix to all scala sources (#14156)
chrisvittal added a commit that referenced this pull request Jul 15, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls, as well as a fix for one critical correctness
bug. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
   - [query] Don't error on VCF export when haploid call is unphased (#14375)
   - [compiler] apply scalafix to all scala sources (#14156)
   - [annotationdb][datasets] regional buckets (#14286)
chrisvittal added a commit that referenced this pull request Jul 16, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls, as well as a fix for one critical correctness
bug. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
   - [query] Don't error on VCF export when haploid call is unphased (#14375)
   - [compiler] apply scalafix to all scala sources (#14156)
   - [annotationdb][datasets] regional buckets (#14286)
chrisvittal added a commit that referenced this pull request Jul 30, 2024
…14560)

CHANGELOG: The gvcf import stage of the VDS combiner now preserves the
GT of reference blocks. Some datasets have haploid calls on sex
chromosomes, and the fact that the reference was haploid should be
preserved.
chrisvittal added a commit that referenced this pull request Jul 30, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls, as well as a fix for one critical correctness
bug. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
chrisvittal added a commit that referenced this pull request Jul 30, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls, as well as a fix for one critical correctness
bug. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
   - [vds/combiner] Fix truncation of PL in GVCF import with haploid calls (#14577)
chrisvittal added a commit that referenced this pull request Jul 30, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
   - [vds/combiner] Fix truncation of PL in GVCF import with haploid calls (#14577)
chrisvittal added a commit that referenced this pull request Aug 7, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
   - [vds/combiner] Fix truncation of PL in GVCF import with haploid calls (#14577)
chrisvittal added a commit that referenced this pull request Aug 8, 2024
This patch version implements necessary changes for working with Variant
Datasets with haploid calls. The substantial backports are:

   - [vds/combiner] Stop dropping GT in reference data during gvcf import (#14560)
   - [vds/combiner] Fix truncation of PL in GVCF import with haploid calls (#14577)
chrisvittal added a commit to chrisvittal/hail that referenced this pull request Sep 18, 2024
After split_multi, LGT is dropped from the variant data of a VDS. After
PR hail-is#14560, datasets created via the combiner keep LGT in their
reference data. After hail-is#14675 the same is true for
`from_merged_representation`. We should keep the GT/LGT field consistent
across ref and var data. This change does so for split_multi.

Resolves hail-is#14694
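The consistency problem above comes down to allele-index bookkeeping, which this plain-Python sketch illustrates. It assumes the local-allele convention, in which `LA` lists the global allele indices that a sample's local alleles refer to and `LA[0]` is the reference allele 0. Under that convention, a variant call converts via `GT[i] = LA[LGT[i]]`. A reference-block call contains only allele 0 (or missing), so its local and global indices coincide, and it can simply be renamed without changing its ploidy. The function names are hypothetical. Hail also exposes `hl.vds.lgt_to_gt` for the variant side, which this sketch does not use.

```python
def lgt_to_gt(lgt, la):
    """Map local allele indices to global ones: GT[i] = LA[LGT[i]]; None stays None."""
    return tuple(None if i is None else la[i] for i in lgt)


def ref_lgt_to_gt(ref_entry):
    """Rename LGT -> GT on a reference-block entry.

    Reference calls only contain allele 0 (or missing), and local allele 0 is
    assumed to be the global reference allele, so no LA lookup is needed and
    the call's ploidy is unchanged.
    """
    out = {k: v for k, v in ref_entry.items() if k != "LGT"}
    out["GT"] = ref_entry["LGT"]
    return out


print(lgt_to_gt((0, 1), (0, 3)))             # (0, 3): diploid variant call
print(lgt_to_gt((1,), (0, 2)))               # (2,): haploid variant call
print(ref_lgt_to_gt({"LGT": (0,), "DP": 8}))  # {'DP': 8, 'GT': (0,)}
```

After both conversions the reference and variant data expose the same call field name, and the haploid reference call `(0,)` keeps its length of 1.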
chrisvittal added a commit to chrisvittal/hail that referenced this pull request Sep 19, 2024
After split_multi, LGT is dropped from the variant data of a VDS. After
PR hail-is#14560, datasets created via the combiner keep LGT in their
reference data. After hail-is#14675 the same is true for
`from_merged_representation`. We should keep the GT/LGT field consistent
across ref and var data. This change does so for split_multi.

Resolves hail-is#14694
hail-ci-robot pushed a commit that referenced this pull request Sep 19, 2024
After split_multi, LGT is dropped from the variant data of a VDS. After
PR #14560, datasets created via the combiner keep LGT in their
reference data. After #14675 the same is true for
`from_merged_representation`. We should keep the GT/LGT field consistent
across ref and var data. This change does so for split_multi.

Resolves #14694
Development

Successfully merging this pull request may close these issues.

VDS reference data needs to have ploidy information
4 participants