feat: add /api/v1/seqvars/annos/query with OpenAPI (#577) #592
Conversation
Walkthrough

This pull request introduces significant modifications to the OpenAPI schema for a genome annotation service, including the addition of new endpoints and restructuring of existing ones.
Actionable comments posted: 3
🧹 Outside diff range and nitpick comments (9)
src/server/run/annos_range.rs (1)
180-187: Consider refactoring version-specific Gnomad handling.

The version-specific handling for GnomadExomes and GnomadGenomes contains similar patterns that could be refactored for better maintainability:
- Extract version handling into a shared function
- Use a version-to-protobuf-type mapping
- Reduce code duplication in error messages
Consider this approach:
```rust
fn get_gnomad_record_type(db_version: &str) -> Result<Box<dyn GnomadRecord>, CustomError> {
    match db_version.chars().next() {
        Some('2') => Ok(Box::new(crate::pbs::gnomad::gnomad2::Record)),
        Some('3') => Ok(Box::new(crate::pbs::gnomad::gnomad3::Record)),
        Some('4') => Ok(Box::new(crate::pbs::gnomad::gnomad4::Record)),
        _ => Err(CustomError::new(anyhow::anyhow!(
            "unsupported gnomAD version {}",
            db_version
        ))),
    }
}
```

Also applies to: 214-228
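To make the dispatch idea concrete, here is a minimal, self-contained sketch of mapping a gnomAD version string to a variant via its leading digit. The enum and function names are illustrative only and not taken from the crate:

```rust
// Illustrative only: a simplified version-to-variant mapping, not the
// crate's actual GnomadRecord types.
#[derive(Debug, PartialEq)]
enum GnomadVersion {
    V2,
    V3,
    V4,
}

fn parse_gnomad_version(db_version: &str) -> Result<GnomadVersion, String> {
    // Dispatch on the leading digit, as in the suggestion above.
    match db_version.chars().next() {
        Some('2') => Ok(GnomadVersion::V2),
        Some('3') => Ok(GnomadVersion::V3),
        Some('4') => Ok(GnomadVersion::V4),
        _ => Err(format!("unsupported gnomAD version {}", db_version)),
    }
}

fn main() {
    assert_eq!(parse_gnomad_version("2.1.1").unwrap(), GnomadVersion::V2);
    assert!(parse_gnomad_version("5.0").is_err());
    println!("ok");
}
```

A single such function keeps the error message in one place, so all call sites report unsupported versions consistently.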
src/server/run/fetch.rs (4)
7-7: Fix typo in documentation comment

Change "JSOn" to "JSON" in the documentation comment.

```diff
-/// Function to fetch prost Message from a variant database into JSOn.
+/// Function to fetch prost Message from a variant database into JSON.
```
41-68: Consider reducing code duplication with `fetch_var_protobuf_json`

Both functions share similar database access and error handling logic. Consider extracting the common functionality into a private helper function to improve maintainability.
```diff
+fn fetch_var_protobuf_raw<T>(
+    db: &rocksdb::DBWithThreadMode<rocksdb::MultiThreaded>,
+    cf_name: &str,
+    key: keys::Var,
+) -> Result<Option<T>, CustomError>
+where
+    T: prost::Message + Default,
+{
+    let cf_data = db
+        .cf_handle(cf_name)
+        .unwrap_or_else(|| panic!("unknown column family: {}", cf_name));
+    let key: Vec<u8> = key.into();
+
+    let raw_data = db
+        .get_cf(&cf_data, key)
+        .map_err(|e| CustomError::new(anyhow::anyhow!("problem querying database: {}", e)))?;
+    raw_data
+        .map(|raw_data| {
+            prost::Message::decode(&raw_data[..]).map_err(|e| {
+                CustomError::new(anyhow::anyhow!(
+                    "problem decoding protobuf from database (cf_name={}): {}",
+                    cf_name,
+                    e
+                ))
+            })
+        })
+        .transpose()
+}
 pub fn fetch_var_protobuf<T>(
     db: &rocksdb::DBWithThreadMode<rocksdb::MultiThreaded>,
     cf_name: &str,
     key: keys::Var,
 ) -> Result<Option<T>, CustomError>
 where
     T: prost::Message + serde::Serialize + Default,
 {
-    let cf_data = db
-        .cf_handle(cf_name)
-        .unwrap_or_else(|| panic!("unknown column family: {}", cf_name));
-    let key: Vec<u8> = key.into();
-
-    let raw_data = db
-        .get_cf(&cf_data, key)
-        .map_err(|e| CustomError::new(anyhow::anyhow!("problem querying database: {}", e)))?;
-    raw_data
-        .map(|raw_data| {
-            prost::Message::decode(&raw_data[..]).map_err(|e| {
-                CustomError::new(anyhow::anyhow!(
-                    "problem decoding protobuf from database (cf_name={}): {}",
-                    cf_name,
-                    e
-                ))
-            })
-        })
-        .transpose()
+    fetch_var_protobuf_raw(db, cf_name, key)
 }
 pub fn fetch_var_protobuf_json<T>(
     db: &rocksdb::DBWithThreadMode<rocksdb::MultiThreaded>,
     cf_name: &str,
     key: keys::Var,
 ) -> Result<Option<serde_json::Value>, CustomError>
 where
     T: prost::Message + serde::Serialize + Default,
 {
-    let cf_data = db
-        .cf_handle(cf_name)
-        .unwrap_or_else(|| panic!("unknown column family: {}", cf_name));
-    let key: Vec<u8> = key.into();
-
-    let raw_data = db
-        .get_cf(&cf_data, key)
-        .map_err(|e| CustomError::new(anyhow::anyhow!("problem querying database: {}", e)))?;
-    raw_data
-        .map(|raw_data| {
-            let msg: T = prost::Message::decode(&raw_data[..]).map_err(|e| {
-                CustomError::new(anyhow::anyhow!(
-                    "problem decoding protobuf from database (cf_name={}): {}",
-                    cf_name,
-                    e
-                ))
-            })?;
+    fetch_var_protobuf_raw::<T>(db, cf_name, key).and_then(|opt_msg| {
+        opt_msg
+            .map(|msg| {
                 serde_json::to_value(msg).map_err(|e| {
                     CustomError::new(anyhow::anyhow!("problem decoding JSON from database: {e}"))
                 })
             })
             .transpose()
+    })
 }
```
119-161: Consider reducing code duplication with `fetch_pos_protobuf_json`

Similar to the previous suggestion, consider extracting common iterator logic into a helper function. The return type change to `Vec<T>` is a good improvement as it's more idiomatic for collections in Rust. The helper below is generic over an output type `U` so that the same iteration code can yield either decoded messages or JSON values:

```diff
+fn fetch_pos_protobuf_raw<T, U, F>(
+    db: &rocksdb::DBWithThreadMode<rocksdb::MultiThreaded>,
+    cf_name: &str,
+    start: keys::Pos,
+    stop: keys::Pos,
+    mut process_value: F,
+) -> Result<Vec<U>, CustomError>
+where
+    T: prost::Message + Default,
+    F: FnMut(T) -> Result<U, CustomError>,
+{
+    let stop = crate::common::keys::Pos {
+        chrom: stop.chrom.clone(),
+        pos: stop.pos,
+    };
+
+    let cf_data = db.cf_handle(cf_name).unwrap();
+    let mut iter = db.raw_iterator_cf(&cf_data);
+    let start: Vec<u8> = start.into();
+    iter.seek(&start);
+
+    let mut result = Vec::new();
+    while iter.valid() {
+        if let Some(raw_value) = iter.value() {
+            let iter_key = iter.key().unwrap();
+            let iter_pos: crate::common::keys::Pos = iter_key.into();
+
+            if iter_pos.chrom != stop.chrom || iter_pos.pos > stop.pos {
+                break;
+            }
+
+            let msg = prost::Message::decode(raw_value).map_err(|e| {
+                CustomError::new(anyhow::anyhow!(
+                    "problem decoding protobuf from database (cf_name={}): {}",
+                    cf_name,
+                    e
+                ))
+            })?;
+            result.push(process_value(msg)?);
+
+            iter.next();
+        }
+    }
+
+    Ok(result)
+}
 pub fn fetch_pos_protobuf<T>(
     db: &rocksdb::DBWithThreadMode<rocksdb::MultiThreaded>,
     cf_name: &str,
     start: keys::Pos,
     stop: keys::Pos,
 ) -> Result<Vec<T>, CustomError>
 where
     T: prost::Message + serde::Serialize + Default,
 {
-    let stop = crate::common::keys::Pos {
-        chrom: stop.chrom.clone(),
-        pos: stop.pos,
-    };
-
-    let cf_data = db.cf_handle(cf_name).unwrap();
-    let mut iter = db.raw_iterator_cf(&cf_data);
-    let start: Vec<u8> = start.into();
-    iter.seek(&start);
-
-    let mut result = Vec::new();
-    while iter.valid() {
-        if let Some(raw_value) = iter.value() {
-            let iter_key = iter.key().unwrap();
-            let iter_pos: crate::common::keys::Pos = iter_key.into();
-
-            if iter_pos.chrom != stop.chrom || iter_pos.pos > stop.pos {
-                break;
-            }
-
-            result.push(prost::Message::decode(raw_value).map_err(|e| {
-                CustomError::new(anyhow::anyhow!(
-                    "problem decoding protobuf from database (cf_name={}): {}",
-                    cf_name,
-                    e
-                ))
-            })?);
-
-            iter.next();
-        }
-    }
-
-    Ok(result)
+    fetch_pos_protobuf_raw(db, cf_name, start, stop, Ok)
 }
 pub fn fetch_pos_protobuf_json<T>(
     db: &rocksdb::DBWithThreadMode<rocksdb::MultiThreaded>,
     cf_name: &str,
     start: keys::Pos,
     stop: keys::Pos,
 ) -> Result<Option<serde_json::Value>, CustomError>
 where
     T: prost::Message + serde::Serialize + Default,
 {
-    // ... existing implementation ...
+    let values = fetch_pos_protobuf_raw::<T, _, _>(db, cf_name, start, stop, |msg| {
+        serde_json::to_value(msg).map_err(|e| {
+            CustomError::new(anyhow::anyhow!("problem decoding JSON from database: {e}"))
+        })
+    })?;
+    Ok(Some(serde_json::Value::Array(values)))
 }
```
40-41: Add more comprehensive documentation

Consider adding more detailed documentation for both new functions:
- Document the expected format and schema of the protobuf messages
- Explain the relationship between position-based and variant-based queries
- Add examples of typical usage

This will help future maintainers understand the code's role in the genomic data system.
Also applies to: 118-119
src/server/run/mod.rs (1)

Line range hint 296-313: Consider grouping related service handlers.

While the handler registration is correct, consider organizing related handlers together for better maintainability. For example:
- Group annotation-related handlers (`annos_*`)
- Group gene-related handlers (`genes_*`)
- Group ClinVar-related handlers (`clinvar_*`)
- Add comments to separate different endpoint groups
```diff
 let app = App::new()
     .app_data(dbs.clone())
+    // Annotation endpoints
     .service(annos_variant::handle)
     .service(annos_variant::handle_with_openapi)
     .service(annos_range::handle)
     .service(annos_db_info::handle)
+    // ClinVar endpoints
     .service(clinvar_sv::handle)
     .service(clinvar_sv::handle_with_openapi)
+    // Gene endpoints
     .service(genes_clinvar::handle)
     .service(genes_info::handle)
     .service(genes_info::handle_with_openapi)
     .service(genes_search::handle)
     .service(genes_search::handle_with_openapi)
     .service(genes_lookup::handle)
     .service(genes_lookup::handle_with_openapi)
+    // Version endpoint
     .service(versions::handle)
```

openapi.schema.yaml (3)
45-87: Consider adding parameter validations for genomic coordinates

The endpoint changes appropriately focus on single variant queries. However, consider adding:
- Validation for chromosome names (e.g., 1-22, X, Y, M)
- Range validation for position (e.g., within chromosome length)
- Length validation for reference/alternative alleles
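As a rough illustration of the chromosome-name check suggested above, a server-side validator might look like this std-only sketch (the function name and accepted spellings are assumptions, not the server's actual code):

```rust
// Hypothetical validator sketch; not the server's actual code.
fn is_valid_chrom(chrom: &str) -> bool {
    // Accept both "chr1" and "1" styles; allow 1-22, X, Y, and mitochondrial names.
    let c = chrom.strip_prefix("chr").unwrap_or(chrom);
    match c {
        "X" | "Y" | "M" | "MT" => true,
        _ => c.parse::<u8>().map(|n| (1..=22).contains(&n)).unwrap_or(false),
    }
}

fn main() {
    assert!(is_valid_chrom("chr7"));
    assert!(is_valid_chrom("X"));
    assert!(!is_valid_chrom("23"));
    assert!(!is_valid_chrom("chrZ"));
    println!("ok");
}
```

Rejecting malformed chromosome names at the handler boundary turns a silent empty query result into an explicit 400-style error.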
7396-7522: Consider adding value constraints for numeric fields

The GnomadMtdnaRecord and related components are well-structured, but consider adding:
- Maximum values for position fields
- Value ranges for frequency/probability fields (0.0-1.0)
- Clearer documentation for when optional fields are expected to be null
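The kind of constraint being suggested could look like the following OpenAPI fragment. The property name `an_freq` is purely illustrative and not taken from the actual schema:

```yaml
# Hypothetical sketch of a bounded frequency field (property name assumed):
an_freq:
  type: number
  format: double
  minimum: 0.0
  maximum: 1.0
  description: >-
    Allele frequency in [0.0, 1.0]; omitted (null) when no carriers
    were observed in the dataset.
```

With `minimum`/`maximum` in place, generated clients and validators can reject out-of-range values before they reach the server.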
Line range hint 1-7873: LGTM! Well-structured OpenAPI schema

The schema is well-organized with:
- Clear separation of concerns
- Consistent naming conventions
- Comprehensive documentation
- Appropriate use of required/optional fields
Consider enhancing error responses with more specific error types beyond CustomError for better error handling.
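One way to move beyond a single `CustomError` is a small error enum with a variant per failure class. This is a hedged sketch only; the variant names are illustrative, not the crate's API:

```rust
// Hedged sketch only: variant names are illustrative, not the crate's API.
use std::fmt;

#[derive(Debug)]
#[allow(dead_code)]
enum AnnoError {
    UnknownColumnFamily(String),
    Decode(String),
    NotFound,
}

impl fmt::Display for AnnoError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            AnnoError::UnknownColumnFamily(cf) => write!(f, "unknown column family: {}", cf),
            AnnoError::Decode(msg) => write!(f, "decode error: {}", msg),
            AnnoError::NotFound => write!(f, "record not found"),
        }
    }
}

fn main() {
    let e = AnnoError::UnknownColumnFamily("clinvar".into());
    assert_eq!(e.to_string(), "unknown column family: clinvar");
    println!("ok");
}
```

Distinct variants let handlers map each failure class to an appropriate HTTP status instead of a blanket 500.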
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (6)
- openapi.schema.yaml (4 hunks)
- src/server/run/annos_range.rs (8 hunks)
- src/server/run/clinvar_data.rs (1 hunks)
- src/server/run/clinvar_sv.rs (2 hunks)
- src/server/run/fetch.rs (4 hunks)
- src/server/run/mod.rs (4 hunks)
🔇 Additional comments (9)
src/server/run/annos_range.rs (5)
100-102: LGTM! Appropriate protobuf type for Clinvar data.

The use of `ExtractedVcvRecordList` is correct for Clinvar records, and error handling is properly maintained.
130-130: LGTM! Consistent implementation across databases.

The changes maintain a consistent pattern across Dbsnp, Helixmtdb, and GnomadMtdna databases, each using their appropriate protobuf Record types with proper error handling.
Also applies to: 144-144, 158-158
248-248: LGTM! Appropriate protobuf type for conservation data.

The use of `RecordList` is correct for conservation records, and error handling is consistent with other sections.
Line range hint 1-265: Ensure test coverage for JSON format changes.
While the changes are consistent throughout the file, please ensure:
- Test coverage for JSON serialization/deserialization
- Integration tests for each database type
- Performance tests comparing the old and new implementations
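A low-cost starting point for such tests is the range-scan stop condition used by the position-based fetch: iteration must stop once the key leaves the chromosome or passes the stop position. A simplified, std-only sketch (the `Pos` struct is abbreviated from `keys::Pos`; names are assumptions):

```rust
// Simplified sketch of the range-scan stop condition; not the crate's code.
#[derive(Clone, PartialEq, Debug)]
struct Pos {
    chrom: String,
    pos: u32,
}

// A key is still in range while it is on the stop chromosome and
// has not passed the stop position.
fn in_range(key: &Pos, stop: &Pos) -> bool {
    key.chrom == stop.chrom && key.pos <= stop.pos
}

fn main() {
    let stop = Pos { chrom: "1".into(), pos: 200 };
    assert!(in_range(&Pos { chrom: "1".into(), pos: 150 }, &stop));
    assert!(!in_range(&Pos { chrom: "1".into(), pos: 201 }, &stop));
    assert!(!in_range(&Pos { chrom: "2".into(), pos: 100 }, &stop));
    println!("ok");
}
```

Pinning this predicate down in unit tests guards the boundary cases (exactly at `stop.pos`, chromosome transitions) that are easy to break during refactoring.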
13-13: Verify the impact of switching to JSON-based fetching.

The change from `fetch_pos_protobuf` to `fetch_pos_protobuf_json` suggests a shift in data format handling. This could affect performance and memory usage.
✅ Verification successful
The change to `fetch_pos_protobuf_json` is a safe refactoring

Looking at the implementation in `fetch.rs`, the change is a well-structured refactoring where:
- Both functions use the same underlying database access and protobuf decoding logic
- The new `_json` version simply adds JSON serialization as the final step
- The change is consistently applied across all annotation types (Clinvar, Dbsnp, Helixmtdb, etc.)
- Error handling is properly maintained with appropriate context
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Search for any remaining uses of the old function name to ensure complete migration
rg "fetch_pos_protobuf[^_]" --type rust
# Look for performance-related comments or benchmarks
rg -i "performance|benchmark" -g "*.rs"
```
Length of output: 333
Script:

```bash
#!/bin/bash
# Search for any remaining uses of the old function name
rg "fetch_pos_protobuf" --type rust -A 2
# Search for the new function implementation to understand the changes
ast-grep --pattern 'fn fetch_pos_protobuf_json'
# Look for test files related to this functionality
fd -e rs -g "*fetch*" -g "*test*"
# Check if there are any performance-related configurations or constants
rg "FETCH_" --type rust
```
Length of output: 4359
Script:

```bash
#!/bin/bash
# Check the implementation differences between old and new functions
ast-grep --pattern 'fn fetch_pos_protobuf$$$'
# Look for the function definition in fetch.rs
cat src/server/run/fetch.rs
```
Length of output: 10570
src/server/run/clinvar_sv.rs (1)
229-242: LGTM: Clean and idiomatic `From` trait implementation

The implementation follows Rust best practices for type conversion, with safe and efficient field mapping. The use of an iterator chain for the `variation_types` conversion is particularly elegant.
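For readers unfamiliar with the pattern being praised, here is a minimal `From` impl with an iterator chain over a repeated field. The types here are made up for the sketch and are not the PR's actual structs:

```rust
// Illustrative types only; the PR's real structs differ.
struct PbRecord {
    variation_types: Vec<i32>,
}

struct ApiRecord {
    variation_types: Vec<String>,
}

impl From<PbRecord> for ApiRecord {
    fn from(r: PbRecord) -> Self {
        ApiRecord {
            // Iterator chain converts each repeated-field element in one pass.
            variation_types: r
                .variation_types
                .into_iter()
                .map(|v| v.to_string())
                .collect(),
        }
    }
}

fn main() {
    let api: ApiRecord = PbRecord { variation_types: vec![1, 2] }.into();
    assert_eq!(api.variation_types, vec!["1", "2"]);
    println!("ok");
}
```

Implementing `From` (rather than an ad-hoc method) gives callers `.into()` for free and composes with generic code bounded on `Into`.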
src/server/run/mod.rs (1)
41-41: Verify completeness of the new endpoint implementation.

For the new `/api/v1/seqvars/annos/query` endpoint, let's verify:
- All necessary components are properly implemented
- Error handling is consistent with other endpoints
- Documentation is complete
Also applies to: 63-63, 275-277, 296-296
✅ Verification successful
Based on the shell script results, I can now generate the final response:
The new endpoint implementation appears complete and well-structured
The verification shows:
- OpenAPI documentation is properly defined in `openapi.schema.yaml` with both `SeqvarsAnnosQuery` and `SeqvarsAnnosResponse` schemas
- Comprehensive error handling is implemented using `CustomError` and `anyhow`
- All necessary components are in place:
  - Query struct and implementations
  - Response types and serialization
  - Handler with proper error handling
  - OpenAPI integration
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:

```bash
#!/bin/bash
# Description: Verify the completeness of the new endpoint implementation

# Check for the handler implementation
echo "Checking handler implementation..."
rg -p "handle_with_openapi.*SeqvarsAnnosQuery"

# Check for proper error handling
echo "Checking error handling..."
rg -p "Result|Error|anyhow" src/server/run/annos_variant.rs

# Check for OpenAPI documentation
echo "Checking OpenAPI documentation..."
rg -p "SeqvarsAnnos(Query|Response)"
```
Length of output: 4034
openapi.schema.yaml (1)
4401-4428: LGTM! Well-structured DbSNP record definition

The DbsnpRecord component is well-defined with appropriate required fields and types.
src/server/run/clinvar_data.rs (1)
7087-7090: LGTM: Clean and idiomatic `From` implementation

The implementation correctly provides the reverse conversion from `ClinvarExtractedVariationType` to protobuf `VariationType`, matching the existing `TryFrom` implementation. The code follows Rust's idiomatic practices for type conversions.
```rust
let cf_data = db.cf_handle(cf_name).unwrap();
let mut iter = db.raw_iterator_cf(&cf_data);
```
Replace `unwrap()` calls with proper error handling

The code uses `unwrap()` in several places, which could lead to panics in production:
- When getting the column family handle
- When accessing iterator keys

Consider using `ok_or_else()` with appropriate error messages.
```diff
-let cf_data = db.cf_handle(cf_name).unwrap();
+let cf_data = db.cf_handle(cf_name).ok_or_else(|| {
+    CustomError::new(anyhow::anyhow!("unknown column family: {}", cf_name))
+})?;
 // ... later in the code ...
-let iter_key = iter.key().unwrap();
+let iter_key = iter.key().ok_or_else(|| {
+    CustomError::new(anyhow::anyhow!("invalid iterator key"))
+})?;
```
Also applies to: 141-142
```rust
handle_impl(data, path, Into::<Request>::into(query.into_inner()))
    .await
    .map_err(|e| CustomError::new(anyhow::anyhow!("Implementaion error: {:?}", e)))?
    .try_into()
    .map_err(|e| CustomError::new(anyhow::anyhow!("Response conversion error: {:?}", e)))?,
```
🛠️ Refactor suggestion
Improve error handling and messages
The current error handling implementation has several areas for improvement:
- There's a typo in "Implementaion error"
- Using {:?} debug formatting in error messages might expose internal details
- The error chain could be more concise
Consider applying this improvement:
```diff
-handle_impl(data, path, Into::<Request>::into(query.into_inner()))
-    .await
-    .map_err(|e| CustomError::new(anyhow::anyhow!("Implementaion error: {:?}", e)))?
-    .try_into()
-    .map_err(|e| CustomError::new(anyhow::anyhow!("Response conversion error: {:?}", e)))?,
+let response = handle_impl(data, path, query.into_inner().into())
+    .await
+    .map_err(|e| CustomError::new(anyhow::anyhow!("Implementation error: {}", e)))?;
+response
+    .try_into()
+    .map_err(|e| CustomError::new(anyhow::anyhow!("Response conversion error: {}", e)))?,
```
This refactor:
- Fixes the typo in "Implementation"
- Uses `{}` instead of `{:?}` for cleaner error messages
- Simplifies the conversion chain
- Improves readability with an intermediate variable
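The `{}` vs `{:?}` distinction called out above can be demonstrated with a std-only example: `Display` gives the user-facing message, while `Debug` exposes internal structure. The error type here is invented for the sketch:

```rust
// Std-only illustration of Display ({}) vs Debug ({:?}) formatting.
use std::fmt;

#[derive(Debug)] // derived Debug exposes the struct's internals
struct QueryError {
    detail: String,
}

impl fmt::Display for QueryError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Display: a clean, user-facing message.
        write!(f, "query failed: {}", self.detail)
    }
}

fn main() {
    let e = QueryError { detail: "missing column family".into() };
    assert_eq!(format!("{}", e), "query failed: missing column family");
    assert_eq!(
        format!("{:?}", e),
        "QueryError { detail: \"missing column family\" }"
    );
    println!("ok");
}
```

Because `{:?}` prints field names and raw values, preferring `{}` in API-visible error messages avoids leaking implementation details to clients.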
```rust
SeqvarsAnnosQuery,
SeqvarsAnnosResponse,
// TODO: more here!
```
Complete the OpenAPI schema documentation.
The TODO comment suggests that the OpenAPI schema documentation is incomplete. This could lead to:
- Missing schema validations
- Incomplete API documentation
- Potential runtime issues if the schema doesn't match the implementation
Please complete the schema documentation by:
- Adding all required components
- Removing the TODO comment once done