-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: percent encoding clinvar annotation (#553) #566
Conversation
WalkthroughThe changes in this pull request involve modifications to the Changes
Sequence Diagram(s)sequenceDiagram
participant ClinvarAnnotator
participant VCFRecord
participant VCFEncoding
ClinvarAnnotator->>VCFEncoding: Call percent_encode(s)
VCFEncoding-->>ClinvarAnnotator: Return encoded string
ClinvarAnnotator->>VCFRecord: Insert encoded value into info field
Poem
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (3)
Cargo.toml (1)
Line range hint
41-63
: Summary of dependency changesThe changes in this file primarily involve experimenting with various dependencies:
- Temporary additions (later removed):
coz
,dhat
,once_cell
,pbjson
, andpbjson-types
.- Permanent addition:
percent-encoding
.These changes suggest that some profiling or experimentation was done during development, but only the
percent-encoding
addition was kept for the final version. This aligns with the PR's objective of fixing percent encoding for ClinVar annotations.To ensure code quality and prevent potential issues:
- Double-check that the removal of temporary dependencies doesn't leave any unused imports or code in the project.
- Verify that the
percent-encoding
library is used correctly in the relevant parts of the code dealing with ClinVar annotations.- Consider updating the project documentation to reflect any changes in dependencies or build processes.
src/annotate/seqvars/mod.rs (2)
1621-1638
: LGTM! Consider adding a brief module-level documentation.The
vcf_encoding
module implementation looks correct and adheres to the VCF specification for character encoding. Good job on using thepercent_encoding
crate for this task.Consider adding a brief module-level documentation comment (//!) to provide an overview of the module's purpose and usage.
1699-1704
: LGTM! Consider adding a brief comment explaining the encoding.The implementation of percent encoding for ClinVar classification descriptions is correct and addresses the issue mentioned in the comment. This change should resolve problems with special characters in VCF annotations.
Consider adding a brief inline comment explaining why the percent encoding is necessary here, for the benefit of future maintainers. For example:
// Percent encode the classification description to ensure // special characters are properly handled in VCF annotations Some(vcf_encoding::percent_encode(&value).to_string())
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
Cargo.lock
is excluded by!**/*.lock
📒 Files selected for processing (2)
- Cargo.toml (2 hunks)
- src/annotate/seqvars/mod.rs (2 hunks)
🧰 Additional context used
🔇 Additional comments (6)
Cargo.toml (5)
59-59
: Confirm the removal ofonce_cell
dependency.The
once_cell = "1.20.1"
dependency was added and then removed. Please confirm:
- Was this removal intentional?
- If yes, have all usages of
once_cell
been replaced or removed from the codebase?#!/bin/bash # Check if once_cell is still used anywhere in the project rg --type rust '\bonce_cell\b'
63-63
: Approve addition ofpercent-encoding
dependency.The addition of
percent-encoding = "2.3"
aligns with the PR objective to fix percent encoding for ClinVar annotations. This change looks good.To ensure proper usage, you may want to run:
#!/bin/bash # Check where percent-encoding is used in the project rg --type rust '\bpercent_encoding\b'
61-62
: Clarify the usage ofpbjson
andpbjson-types
.The dependencies
pbjson = "0.7"
andpbjson-types = "0.7"
were added and then removed. However,pbjson-build = "0.7.0"
remains in the build dependencies. Please clarify:
- Are
pbjson
andpbjson-types
still needed at runtime? If not, confirm their removal.- Is
pbjson-build
sufficient for your Protocol Buffers needs?#!/bin/bash # Check if pbjson or pbjson-types are used anywhere in the project rg --type rust '\b(pbjson|pbjson_types)\b'
41-41
: Verify the intention behind adding and removingcoz
.The dependency
coz = "0.1.3"
was added and then removed in this PR. Was this intentional? If profiling is no longer needed, consider removing this line completely.✅ Verification successful
Dependency
coz = "0.1.3"
has been successfully removed and is not used in the project.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if coz is used anywhere in the project rg --type rust '\bcoz\b'Length of output: 356
46-46
: Clarify the usage ofdhat
and its feature flag.The dependency
dhat = "0.3.3"
was added and then removed. However, there's still adhat-heap
feature flag. Please clarify:
- Is
dhat
still needed? If not, consider removing this line.- If
dhat
is removed, should thedhat-heap
feature also be removed?✅ Verification successful
Clarify the intent behind the
dhat-heap
feature and thedhat
dependency.The dependency
dhat = "0.3.3"
is still used insrc/main.rs
under thedhat-heap
feature flag. Please confirm:
Is the
dhat-heap
feature still required?
If not, consider removing both the dependency and the feature flag.If the feature is still needed:
Ensure that the dependency is correctly retained and documented.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check if dhat is used anywhere in the project rg --type rust '\bdhat\b'Length of output: 356
Script:
#!/bin/bash # Corrected: Check if dhat is used anywhere in the Rust project rg --type rust '\bdhat\b'Length of output: 228
src/annotate/seqvars/mod.rs (1)
Line range hint
1-1740
: Overall, these changes effectively address the character encoding issue for ClinVar annotations.The addition of the
vcf_encoding
module and its usage in theClinvarAnnotator
struct successfully implements percent encoding for ClinVar classification descriptions. This should resolve the issue with special characters in VCF annotations (as referenced in varfish-org/varfish-server-worker#485).The changes are focused, well-implemented, and integrate smoothly with the existing code. They improve the robustness of the VCF annotation process, particularly for ClinVar data.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #566 +/- ##
==========================================
+ Coverage 73.99% 74.01% +0.02%
==========================================
Files 26 26
Lines 9849 9857 +8
==========================================
+ Hits 7288 7296 +8
Misses 2561 2561 ☔ View full report in Codecov by Sentry. |
Summary by CodeRabbit
New Features
Chores