fix: percent encoding clinvar annotation (#553) #566

Merged
merged 1 commit into from
Oct 8, 2024

Conversation


@holtgrewe holtgrewe commented Oct 8, 2024

Summary by CodeRabbit

  • New Features

    • Introduced a new module for percent encoding strings in VCF annotations.
    • Enhanced ClinVar annotation handling to ensure proper encoding of special characters.
  • Chores

    • Updated dependencies in the project to improve functionality, including libraries for profiling and JSON handling.

@holtgrewe holtgrewe linked an issue Oct 8, 2024 that may be closed by this pull request

coderabbitai bot commented Oct 8, 2024

Walkthrough

The changes in this pull request modify the Cargo.toml file and add a new module to the Rust package "mehari." In Cargo.toml, several dependencies were added and subsequently removed, indicating a restructuring of the dependency management. The new module vcf_encoding introduces a percent_encode function for percent-encoding strings in VCF annotations, and the ClinvarAnnotator struct now uses it to encode special characters in classification descriptions.

Changes

File | Change Summary
Cargo.toml | Added dependencies: coz = "0.1.3", dhat = "0.3.3", once_cell = "1.20.1", pbjson = "0.7", pbjson-types = "0.7"; removed same dependencies.
src/annotate/seqvars/mod.rs | Added new module vcf_encoding with function percent_encode; modified ClinvarAnnotator to use percent_encode for encoding values in VCF records.
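
The percent_encode step summarized above can be sketched with the standard library alone. This is a minimal illustration and not the code from this PR: the merged change relies on the percent_encoding crate, while the function name percent_encode_vcf and the constant VCF_RESERVED below are hypothetical. The character set is assumed from the reserved characters listed in the VCF 4.3 specification's section on character encoding.

```rust
/// Characters assumed to carry special meaning in VCF fields
/// (per the VCF 4.3 character-encoding section); an assumption of this sketch.
const VCF_RESERVED: [char; 8] = [':', ';', '=', '%', ',', '\r', '\n', '\t'];

/// Hypothetical std-only stand-in for `vcf_encoding::percent_encode`.
/// Replaces each reserved character with `%` followed by its two-digit hex code.
fn percent_encode_vcf(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        if VCF_RESERVED.contains(&c) {
            // Every reserved character is ASCII, so two hex digits are enough.
            out.push_str(&format!("%{:02X}", c as u32));
        } else {
            out.push(c);
        }
    }
    out
}
```

With this sketch, percent_encode_vcf("a;b=c") yields "a%3Bb%3Dc", while a string without reserved characters is returned unchanged.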

Sequence Diagram(s)

sequenceDiagram
    participant ClinvarAnnotator
    participant VCFRecord
    participant VCFEncoding

    ClinvarAnnotator->>VCFEncoding: Call percent_encode(s)
    VCFEncoding-->>ClinvarAnnotator: Return encoded string
    ClinvarAnnotator->>VCFRecord: Insert encoded value into info field

Poem

In the meadow where changes bloom,
New dependencies make room.
With encoding, special characters play,
In VCF records, they find their way.
Hops of joy, a leap in code,
For every rabbit, a new road! 🐇✨



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
Cargo.toml (1)

Line range hint 41-63: Summary of dependency changes

The changes in this file primarily involve experimenting with various dependencies:

  1. Temporary additions (later removed): coz, dhat, once_cell, pbjson, and pbjson-types.
  2. Permanent addition: percent-encoding.

These changes suggest that some profiling or experimentation was done during development, but only the percent-encoding addition was kept for the final version. This aligns with the PR's objective of fixing percent encoding for ClinVar annotations.

To ensure code quality and prevent potential issues:

  1. Double-check that the removal of temporary dependencies doesn't leave any unused imports or code in the project.
  2. Verify that the percent-encoding library is used correctly in the relevant parts of the code dealing with ClinVar annotations.
  3. Consider updating the project documentation to reflect any changes in dependencies or build processes.
src/annotate/seqvars/mod.rs (2)

1621-1638: LGTM! Consider adding a brief module-level documentation.

The vcf_encoding module implementation looks correct and adheres to the VCF specification for character encoding. Good job on using the percent_encoding crate for this task.

Consider adding a brief module-level documentation comment (//!) to provide an overview of the module's purpose and usage.


1699-1704: LGTM! Consider adding a brief comment explaining the encoding.

The implementation of percent encoding for ClinVar classification descriptions is correct and addresses the issue mentioned in the comment. This change should resolve problems with special characters in VCF annotations.

Consider adding a brief inline comment explaining why the percent encoding is necessary here, for the benefit of future maintainers. For example:

// Percent encode the classification description to ensure
// special characters are properly handled in VCF annotations
Some(vcf_encoding::percent_encode(&value).to_string())
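
To illustrate why the encoding matters, the sketch below shows how an unescaped comma changes what a VCF reader sees. It is an assumption-laden example rather than mehari's code: encode_info_value is a hypothetical std-only stand-in for vcf_encoding::percent_encode, info_item_count is a hypothetical helper that mimics a parser splitting an INFO value into list items, and the classification string is only an illustrative value.

```rust
/// Hypothetical stand-in for the PR's `vcf_encoding::percent_encode`;
/// escapes characters assumed to be reserved in VCF INFO values.
fn encode_info_value(value: &str) -> String {
    value
        .chars()
        .map(|c| match c {
            ':' | ';' | '=' | '%' | ',' | '\t' | '\n' | '\r' => format!("%{:02X}", c as u32),
            _ => c.to_string(),
        })
        .collect()
}

/// Hypothetical helper: number of list items a parser would see
/// when it splits an INFO value on commas.
fn info_item_count(value: &str) -> usize {
    value.split(',').count()
}
```

An unencoded value such as "Pathogenic, low penetrance" is split into two items, whereas the encoded form "Pathogenic%2C low penetrance" stays a single item.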
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between caed9b9 and 9c386d7.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • Cargo.toml (2 hunks)
  • src/annotate/seqvars/mod.rs (2 hunks)
🧰 Additional context used
🔇 Additional comments (6)
Cargo.toml (5)

59-59: Confirm the removal of once_cell dependency.

The once_cell = "1.20.1" dependency was added and then removed. Please confirm:

  1. Was this removal intentional?
  2. If yes, have all usages of once_cell been replaced or removed from the codebase?
#!/bin/bash
# Check if once_cell is still used anywhere in the project
rg --type rust '\bonce_cell\b'

63-63: Approve addition of percent-encoding dependency.

The addition of percent-encoding = "2.3" aligns with the PR objective to fix percent encoding for ClinVar annotations. This change looks good.

To ensure proper usage, you may want to run:

#!/bin/bash
# Check where percent-encoding is used in the project
rg --type rust '\bpercent_encoding\b'

61-62: Clarify the usage of pbjson and pbjson-types.

The dependencies pbjson = "0.7" and pbjson-types = "0.7" were added and then removed. However, pbjson-build = "0.7.0" remains in the build dependencies. Please clarify:

  1. Are pbjson and pbjson-types still needed at runtime? If not, confirm their removal.
  2. Is pbjson-build sufficient for your Protocol Buffers needs?
#!/bin/bash
# Check if pbjson or pbjson-types are used anywhere in the project
rg --type rust '\b(pbjson|pbjson_types)\b'

41-41: Verify the intention behind adding and removing coz.

The dependency coz = "0.1.3" was added and then removed in this PR. Was this intentional? If profiling is no longer needed, consider removing this line completely.

✅ Verification successful

Dependency coz = "0.1.3" has been successfully removed and is not used in the project.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if coz is used anywhere in the project
rg --type rust '\bcoz\b'

Length of output: 356


46-46: Clarify the usage of dhat and its feature flag.

The dependency dhat = "0.3.3" was added and then removed. However, there's still a dhat-heap feature flag. Please clarify:

  1. Is dhat still needed? If not, consider removing this line.
  2. If dhat is removed, should the dhat-heap feature also be removed?
✅ Verification successful

Clarify the intent behind the dhat-heap feature and the dhat dependency.

The dependency dhat = "0.3.3" is still used in src/main.rs under the dhat-heap feature flag. Please confirm:

  1. Is the dhat-heap feature still required?
    If not, consider removing both the dependency and the feature flag.

  2. If the feature is still needed:
    Ensure that the dependency is correctly retained and documented.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if dhat is used anywhere in the project
rg --type rust '\bdhat\b'

Length of output: 356


Script:

#!/bin/bash
# Corrected: Check if dhat is used anywhere in the Rust project
rg --type rust '\bdhat\b'

Length of output: 228

src/annotate/seqvars/mod.rs (1)

Line range hint 1-1740: Overall, these changes effectively address the character encoding issue for ClinVar annotations.

The addition of the vcf_encoding module and its usage in the ClinvarAnnotator struct successfully implements percent encoding for ClinVar classification descriptions. This should resolve the issue with special characters in VCF annotations (as referenced in varfish-org/varfish-server-worker#485).

The changes are focused, well-implemented, and integrate smoothly with the existing code. They improve the robustness of the VCF annotation process, particularly for ClinVar data.


codecov bot commented Oct 8, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 74.01%. Comparing base (bf54266) to head (9c386d7).
Report is 7 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #566      +/-   ##
==========================================
+ Coverage   73.99%   74.01%   +0.02%     
==========================================
  Files          26       26              
  Lines        9849     9857       +8     
==========================================
+ Hits         7288     7296       +8     
  Misses       2561     2561              


@holtgrewe holtgrewe merged commit 6540520 into main Oct 8, 2024
11 checks passed
@holtgrewe holtgrewe deleted the 533-clinvar-annotation-must-escape-string branch October 8, 2024 10:40
Development

Successfully merging this pull request may close these issues.

ClinVar annotation must escape string
1 participant