feat: add mode of inheritance from HPO #493

Merged (3 commits) on Oct 9, 2024

Conversation

holtgrewe

@holtgrewe holtgrewe commented Oct 9, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Introduced a new enumeration for modes of inheritance, enhancing genetic data representation.
    • Added functionality for accessing and processing Human Phenotype Ontology (HPO) information.
    • Enhanced the Annotator with a mapping from HGNC gene IDs to modes of inheritance.
    • Updated genetic variant annotation methods to include modes of inheritance in outputs.
  • Bug Fixes

    • Corrected documentation comments for clarity.
  • Chores

    • Simplified file writing syntax in the CLI component.
    • Added a substantial data file for phenotype-to-gene mapping.


codecov bot commented Oct 9, 2024

Codecov Report

Attention: Patch coverage is 87.02290% with 17 lines in your changes missing coverage. Please review.

Project coverage is 74%. Comparing base (bfe718d) to head (9789f6f).
Report is 1 commit behind head on main.

Files with missing lines          Patch %   Lines
src/seqvars/query/annonars.rs     0%        9 Missing ⚠️
src/seqvars/query/hpo.rs          96%       4 Missing ⚠️
src/seqvars/query/mod.rs          0%        4 Missing ⚠️
Additional details and impacted files
@@         Coverage Diff          @@
##           main   #493    +/-   ##
====================================
  Coverage    74%    74%            
====================================
  Files        42     43     +1     
  Lines      7366   7494   +128     
====================================
+ Hits       5470   5584   +114     
- Misses     1896   1910    +14     
Files with missing lines          Coverage Δ
src/strucvars/aggregate/cli.rs    92% <100%> (ø)
src/seqvars/query/hpo.rs          96% <96%> (ø)
src/seqvars/query/mod.rs          15% <0%> (-1%) ⬇️
src/seqvars/query/annonars.rs     0% <0%> (ø)

... and 1 file with indirect coverage changes

@holtgrewe holtgrewe force-pushed the feat-add-hpo-mode-of-inheritance branch from 89fd043 to adc54f3 Compare October 9, 2024 12:33
@holtgrewe holtgrewe marked this pull request as ready for review October 9, 2024 12:33

coderabbitai bot commented Oct 9, 2024

Walkthrough

The changes involve updates to several files within the varfish.v1.seqvars package. A new enumeration, ModeOfInheritance, is introduced to represent various inheritance modes in the output.proto file. The GeneRelatedPhenotypes message is modified to include a repeated field for these inheritance modes. Additionally, the Annotator struct in annonars.rs is enhanced with a new field for mapping HGNC gene IDs to inheritance modes. A new file, hpo.rs, is added to process Human Phenotype Ontology data, and a new module is created in mod.rs to incorporate inheritance modes into genetic variant queries.

Changes

File Path: Change Summary
protos/varfish/v1/seqvars/output.proto: Added enumeration ModeOfInheritance and updated GeneRelatedPhenotypes message to include repeated ModeOfInheritance mode_of_inheritances.
src/seqvars/query/annonars.rs: Added field pub hgnc_to_moi: HgncToMoiMap in Annotator struct and updated with_path method to handle loading this mapping with error handling. Minor comment correction in AnnonarsDbs struct.
src/seqvars/query/hpo.rs: Introduced ModeOfInheritance enum, method from_hpo_id, type alias HgncToMoiMap, and function load_hgnc_to_inheritance_map. Added submodules for processing TSV files and included unit tests for new functionalities.
src/seqvars/query/mod.rs: Added new module hpo and updated with_seqvar_and_annotator and phenotypes methods to include an additional parameter for modes of inheritance.
src/strucvars/aggregate/cli.rs: Simplified the syntax for writing a newline character in split_input_by_chrom_and_sv_type function.
tests/strucvars/query/db/hpo/phenotype_to_genes.txt: New file added to repository, approximately 49.6 MB, associated with mapping of phenotypes to genes.
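
To make the hpo.rs entry above more concrete, here is a minimal sketch of what an enum with a from_hpo_id lookup could look like. The variant names and the exact set of variants are assumptions for illustration, not the PR's actual definitions; only the HPO term IDs are standard HPO identifiers for the named inheritance modes.

```rust
/// Hypothetical stand-in for the `ModeOfInheritance` enum described above;
/// the variant set in the actual PR may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Hash)]
pub enum ModeOfInheritance {
    AutosomalDominant,
    AutosomalRecessive,
    XLinkedDominant,
    XLinkedRecessive,
    YLinked,
    Mitochondrial,
}

impl ModeOfInheritance {
    /// Map an HPO term ID to a mode of inheritance, if it is one we know.
    /// Any other term (e.g. an ordinary phenotype) yields `None`.
    pub fn from_hpo_id(hpo_id: &str) -> Option<Self> {
        match hpo_id {
            "HP:0000006" => Some(Self::AutosomalDominant),
            "HP:0000007" => Some(Self::AutosomalRecessive),
            "HP:0001423" => Some(Self::XLinkedDominant),
            "HP:0001419" => Some(Self::XLinkedRecessive),
            "HP:0001450" => Some(Self::YLinked),
            "HP:0001427" => Some(Self::Mitochondrial),
            _ => None,
        }
    }
}
```

Returning Option rather than a catch-all variant lets a caller skip rows of phenotype_to_genes.txt whose HPO term is not an inheritance term.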

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant Annotator
    participant GeneRelatedAnnotation
    participant HPO

    User->>Annotator: Request for gene annotation
    Annotator->>HPO: Load HGNC to Mode of Inheritance mapping
    HPO->>Annotator: Return mapping
    Annotator->>GeneRelatedAnnotation: Provide gene data and modes of inheritance
    GeneRelatedAnnotation->>User: Return annotated gene information

🐇 "In the garden where genes align,
New modes of inheritance brightly shine.
With HGNC maps and HPO's grace,
We hop through data, a joyful race!
Each phenotype now finds its way,
In this world of genes, we play!" 🌼


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between adc54f3 and 9789f6f.

📒 Files selected for processing (3)
  • protos/varfish/v1/seqvars/output.proto (1 hunks)
  • src/seqvars/query/hpo.rs (1 hunks)
  • src/seqvars/query/mod.rs (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • protos/varfish/v1/seqvars/output.proto
  • src/seqvars/query/hpo.rs
  • src/seqvars/query/mod.rs

@holtgrewe holtgrewe enabled auto-merge (squash) October 9, 2024 12:40

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (7)
src/seqvars/query/annonars.rs (2)

130-131: LGTM: New field correctly added to Annotator struct.

The new hgnc_to_moi field is correctly added and documented. It's consistent with the structure's style, being public like other fields.

Consider adding a brief explanation of what MOI stands for in the comment, to improve clarity:

- /// Mapping from HGNC gene ID to modes of inheritance; from `hpo` directory.
+ /// Mapping from HGNC gene ID to modes of inheritance (MOI); from `hpo` directory.

151-162: LGTM: with_path method correctly updated.

The method is properly updated to initialize the new hgnc_to_moi field. The error handling is comprehensive and consistent with the existing style.

For consistency with the error handling for annonars_dbs, consider wrapping the load_hgnc_to_inheritance_map call in a separate Result:

let hgnc_to_moi = load_hgnc_to_inheritance_map(&path.as_ref().join("hpo"))
    .map_err(|e| {
        anyhow::anyhow!(
            "problem loading HGNC to mode of inheritance map at {}: {}",
            path.as_ref().join("hpo").display(),
            e
        )
    })?;

This would make the error handling structure more uniform throughout the method.

src/seqvars/query/hpo.rs (5)

117-117: Use /// for field documentation comments.

In the Entry struct, the field hpo_name is documented using // instead of ///. For consistency and to generate proper documentation, please change // to ///.

Apply this diff to correct the comment:

- // HPO Name.
+ /// HPO Name.

185-187: Consistent parameter passing for path.

In the function load_ncbi_to_hgnc, the path parameter is taken by value (P), whereas other functions like load_entries take &P. For consistency and to avoid unnecessary cloning, consider changing the function signature to take path: &P.

Apply this diff to adjust the function signature:

- pub fn load_ncbi_to_hgnc<P: AsRef<std::path::Path>>(
-     path: P,
+ pub fn load_ncbi_to_hgnc<P: AsRef<std::path::Path>>(
+     path: &P,

And update the function calls accordingly:

- for entry in load_entries(&path)? {
+ for entry in load_entries(path)? {
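
As a small illustration of the by-reference signature suggested above, the following sketch uses a hypothetical helper named hpo_file (not part of the PR). It only shows that a generic `&P` parameter leaves the caller's path usable for further calls.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: build the path of a file inside an HPO directory.
/// The directory is borrowed, so the caller keeps ownership of it.
pub fn hpo_file<P: AsRef<Path>>(dir: &P, name: &str) -> PathBuf {
    dir.as_ref().join(name)
}
```

With this shape, one PathBuf can be passed as `&base` to several loaders in a row. Note that a by-value `P: AsRef<Path>` parameter also accepts references, since `&T` implements AsRef<Path> whenever T does, so the choice is largely about consistency between functions.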

76-81: Include full file paths in error messages.

When loading files, it's helpful to include the full file path in error messages to aid in debugging. Consider using path.as_ref().join(...).display() to show the complete path.

Apply this diff to enhance the error messages:

- .map_err(|e| anyhow::anyhow!("error loading phenotype_to_genes.txt: {}", e))?;
+ .map_err(|e| anyhow::anyhow!(
+     "error loading {}: {}",
+     path.as_ref().join("phenotype_to_genes.txt").display(),
+     e
+ ))?;

Similarly for the next error handling:

- .map_err(|e| anyhow::anyhow!("error loading hgnc_xlink.tsv: {}", e))?;
+ .map_err(|e| anyhow::anyhow!(
+     "error loading {}: {}",
+     path.as_ref().join("hgnc_xlink.tsv").display(),
+     e
+ ))?;
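
The idea behind this suggestion can be sketched with the standard library alone. The read_lines helper below is hypothetical and uses a plain String error instead of anyhow so that it stays self-contained; it is not the PR's loader.

```rust
use std::fs;
use std::path::Path;

/// Hypothetical helper: read all lines of `<dir>/<file_name>`.
/// On failure the error names the full path, not just the file name.
pub fn read_lines<P: AsRef<Path>>(dir: &P, file_name: &str) -> Result<Vec<String>, String> {
    let full_path = dir.as_ref().join(file_name);
    let text = fs::read_to_string(&full_path)
        .map_err(|e| format!("error loading {}: {}", full_path.display(), e))?;
    Ok(text.lines().map(String::from).collect())
}
```

Computing full_path once and reusing it in both the read and the error message keeps the two from drifting apart.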

203-203: Remove unnecessary pub visibility in test functions.

The test function test_mode_of_inheritance_from_hpo_id is declared as pub fn, but test functions do not need to be public. Consider changing it to fn to keep the default visibility.

Apply this diff to adjust the function definition:

- pub fn test_mode_of_inheritance_from_hpo_id() {
+ fn test_mode_of_inheritance_from_hpo_id() {

236-236: Remove unnecessary pub visibility in test functions.

The test function test_mode_of_inheritance_into_pbs is declared as pub fn, but test functions do not need to be public. Consider changing it to fn to keep the default visibility.

Apply this diff to adjust the function definition:

- pub fn test_mode_of_inheritance_into_pbs() {
+ fn test_mode_of_inheritance_into_pbs() {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between bfe718d and adc54f3.

⛔ Files ignored due to path filters (4)
  • src/seqvars/query/snapshots/varfish_server_worker__seqvars__query__hpo__test__hgnc_to_xlink_load_entries.snap is excluded by !**/*.snap
  • src/seqvars/query/snapshots/varfish_server_worker__seqvars__query__hpo__test__load_hgnc_to_inheritance_map.snap is excluded by !**/*.snap
  • src/seqvars/query/snapshots/varfish_server_worker__seqvars__query__hpo__test__phenotype_to_genes_load_entries.snap is excluded by !**/*.snap
  • tests/strucvars/query/db/hpo/hgnc_xlink.tsv is excluded by !**/*.tsv
📒 Files selected for processing (6)
  • protos/varfish/v1/seqvars/output.proto (1 hunks)
  • src/seqvars/query/annonars.rs (3 hunks)
  • src/seqvars/query/hpo.rs (1 hunks)
  • src/seqvars/query/mod.rs (3 hunks)
  • src/strucvars/aggregate/cli.rs (1 hunks)
  • tests/strucvars/query/db/hpo/phenotype_to_genes.txt (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • src/strucvars/aggregate/cli.rs
  • tests/strucvars/query/db/hpo/phenotype_to_genes.txt
🧰 Additional context used
🔇 Additional comments (10)
src/seqvars/query/annonars.rs (2)

9-12: LGTM: New imports are correctly added.

The new imports from the hpo module are necessary for the added functionality and follow the existing import style.


Line range hint 1-314: Summary: Changes enhance Annotator with mode of inheritance functionality.

The modifications to src/seqvars/query/annonars.rs successfully integrate mode of inheritance data into the Annotator struct. The changes are well-implemented, maintaining consistency with the existing code style and error handling patterns. Minor suggestions for documentation and code structure improvements have been provided, but overall, the implementation is sound and ready for integration.

protos/varfish/v1/seqvars/output.proto (3)

170-185: LGTM: Well-structured enum for modes of inheritance

The new ModeOfInheritance enum is well-defined and covers the main modes of inheritance relevant to genetic studies. The inclusion of HPO IDs in comments adds valuable cross-referencing information. The enum follows Proto3 best practices by including an UNSPECIFIED option with value 0.


193-194: LGTM: Appropriate addition of modes_of_inheritance field

The addition of the modes_of_inheritance field to the GeneRelatedPhenotypes message is well-implemented. Using a repeated field of type ModeOfInheritance allows for the association of multiple inheritance modes with a gene-related phenotype, enhancing the data model with more detailed genetic information.


170-194: Summary: Excellent addition of mode of inheritance information

The changes in this file successfully implement the addition of mode of inheritance information from HPO. The new ModeOfInheritance enum and the modes_of_inheritance field in the GeneRelatedPhenotypes message work together to enhance the data model with valuable genetic inheritance information. These modifications align well with the PR objectives and follow Proto3 best practices.

The implementation allows for:

  1. Clear representation of various modes of inheritance with corresponding HPO IDs.
  2. Association of multiple inheritance modes with gene-related phenotypes.

These changes will improve the ability to analyze and interpret genetic data within the VarFish system.
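
To illustrate why the zero-valued UNSPECIFIED entry matters on the consuming side, here is a Rust sketch of an enum that mirrors a Proto3-style enum. The type name, the reduced variant list, and the numeric tags are assumptions for illustration; they are not taken from output.proto or from generated code.

```rust
/// Hypothetical mirror of a Proto3 enum; tag values are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
#[repr(i32)]
pub enum PbModeOfInheritance {
    /// Proto3 uses tag 0 as the default when a field is unset.
    #[default]
    Unspecified = 0,
    AutosomalDominant = 1,
    AutosomalRecessive = 2,
}

impl PbModeOfInheritance {
    /// Decode a wire value, falling back to `Unspecified` for unknown tags.
    pub fn from_i32(value: i32) -> Self {
        match value {
            1 => Self::AutosomalDominant,
            2 => Self::AutosomalRecessive,
            _ => Self::Unspecified,
        }
    }
}
```

Because the field is repeated, a message would carry a list of such values, one per inheritance mode associated with the gene.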

src/seqvars/query/mod.rs (5)

4-4: Importing the 'hpo' module

The hpo module is correctly imported to handle Human Phenotype Ontology functionalities.


716-716: Retrieving modes of inheritance for genes

The code appropriately retrieves the modes of inheritance (mois) for the gene using the annotator.


724-724: Passing mois to the phenotypes function

The mois parameter is correctly passed to the phenotypes function, enabling modes of inheritance to be included in the phenotype annotations.


761-761: Updating phenotypes function signature

The phenotypes function signature is updated to accept mois, allowing it to process modes of inheritance.


768-773: Including modes of inheritance in phenotype annotations

The code correctly processes mois and includes the modes of inheritance in the GeneRelatedPhenotypes output.

let modes_of_inheritance = result
    .entry(hgnc_id.clone())
    .or_insert_with(indexmap::IndexSet::new);
modes_of_inheritance.insert_sorted(mode_of_inheritance);
@coderabbitai coderabbitai bot Oct 9, 2024


⚠️ Potential issue

IndexSet does not have an insert_sorted method.

The method insert_sorted is being called on an IndexSet, but IndexSet does not provide an insert_sorted method. This will result in a compilation error. Consider using the insert method instead to add elements to the set.

Apply this diff to fix the method call:

- modes_of_inheritance.insert_sorted(mode_of_inheritance);
+ modes_of_inheritance.insert(mode_of_inheritance);

Contributor Author


You are wrong; IndexSet has such a function.


@holtgrewe, thank you for the correction. You're right; IndexSet does have an insert_sorted method. I apologize for the confusion.

🐰🧐


✏️ Learnings added
Learnt from: holtgrewe
PR: varfish-org/varfish-server-worker#493
File: src/seqvars/query/hpo.rs:0-0
Timestamp: 2024-10-09T12:42:38.460Z
Learning: In Rust, the `IndexSet` type from the `indexmap` crate provides an `insert_sorted` method.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
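
For readers unfamiliar with the sorted-insert idea discussed in this thread, the following is a standard-library-only sketch. It does not call the indexmap crate; it emulates, on a sorted Vec, what inserting into an already-sorted collection at its ordered position means. The function names insert_sorted and group_modes are hypothetical, and the (index, inserted) return shape is an assumption modeled on my understanding of the indexmap method.

```rust
use std::collections::BTreeMap;

/// Insert `value` into an already-sorted, duplicate-free `Vec`, keeping it sorted.
/// Returns the index of the value and whether it was newly inserted.
pub fn insert_sorted<T: Ord>(items: &mut Vec<T>, value: T) -> (usize, bool) {
    match items.binary_search(&value) {
        // Already present: report its position, change nothing.
        Ok(idx) => (idx, false),
        // Absent: `idx` is the position that keeps the order intact.
        Err(idx) => {
            items.insert(idx, value);
            (idx, true)
        }
    }
}

/// Collect (gene, mode) pairs into a per-gene sorted list of modes.
pub fn group_modes(pairs: &[(&str, &str)]) -> BTreeMap<String, Vec<String>> {
    let mut result: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for (hgnc_id, mode) in pairs {
        let modes = result.entry(hgnc_id.to_string()).or_default();
        insert_sorted(modes, mode.to_string());
    }
    result
}
```

Keeping each gene's modes sorted and deduplicated at insertion time means the output order does not depend on the row order of the input file, which presumably is what makes snapshot tests stable.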

@holtgrewe holtgrewe disabled auto-merge October 9, 2024 12:42
@holtgrewe holtgrewe enabled auto-merge (squash) October 9, 2024 12:42
@holtgrewe holtgrewe merged commit 2da2b86 into main Oct 9, 2024
11 checks passed
@holtgrewe holtgrewe deleted the feat-add-hpo-mode-of-inheritance branch October 9, 2024 12:46