-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: suppress unknown consequence errors #491
Conversation
WalkthroughThe changes in this pull request involve modifications across three primary files: Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Ingest
participant Query
participant Strucvars
User->>Ingest: seqvars ingest command
Ingest->>Ingest: Load transaction database
Ingest-->>User: Processed VCF files
User->>Query: Request variant consequences
Query->>Query: Filter consequences
Query-->>User: Return valid consequences
User->>Strucvars: strucvars query command
Strucvars->>Strucvars: Load databases
Strucvars->>Strucvars: Process structural variants
Strucvars-->>User: Return structural variant results
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #491 +/- ##
===================================
- Coverage 74% 74% -1%
===================================
Files 42 42
Lines 7365 7366 +1
===================================
Hits 5470 5470
- Misses 1895 1896 +1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (1)
src/seqvars/ingest/mod.rs (1)
326-326
: LGTM! Consider using a const for the file name.The change to remove the
&
operator is correct, as it now passes an ownedString
toload_tx_db
. This aligns with an updated function signature and improves efficiency by avoiding a temporary reference.For improved clarity and maintainability, consider extracting the "txs.bin.zst" file name into a constant:
+const TRANSCRIPT_DB_FILENAME: &str = "txs.bin.zst"; let tx_db = mehari::annotate::seqvars::load_tx_db(format!( - "{}/{}/txs.bin.zst", + "{}/{}/{}", &args.path_mehari_db, - path_component(args.genomebuild) + path_component(args.genomebuild), + TRANSCRIPT_DB_FILENAME ))?;This change would make it easier to update the file name in the future if needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
src/strucvars/query/mod.rs (5)
Line range hint
12-12
: Inconsistent Use of Logging FrameworksThe code imports and uses both
log
andtracing
crates for logging:
- Line 12:
use log::warn;
- Line 571:
tracing::info!("args_common = {:?}", &args_common);
- Line 623:
tracing::info!("Loading mehari tx database...");
Mixing different logging frameworks can lead to inconsistencies in log formatting and output destinations. It's recommended to standardize on a single logging framework throughout the codebase.
Suggested Action: Refactor the code to consistently use the
tracing
crate for all logging purposes, replacing any usage oflog
macros with theirtracing
equivalents.Also applies to: 623-623, 571-571
Line range hint
12-12
: Inconsistent Error Handling and Lack of Warnings
- Line 12: Imports
log::warn
but does not use it consistently throughout the code.- Lines 158-185 (
resolve_hgvs_id
function): When anhgvs_id
is not found, the function returns aGene
with mostlyNone
fields without logging a warning or error.- Lines 371-378 (
compute_tx_effects_for_breakpoint
function): If the chromosome is not found inchrom_to_acc
, the function silently returns a default value without logging.Suggested Action:
For
resolve_hgvs_id
: Add a warning log when anhgvs_id
cannot be resolved to inform users of potential data issues.if let Some(record_idxs) = record_idxs { // Existing code... } else { warn!("HGVS ID '{}' could not be resolved", hgvs_id); // Existing code... }For
compute_tx_effects_for_breakpoint
: Log a warning when the chromosome is not found.let chrom = chrom_to_acc.get(&annonars::common::cli::canonicalize(&sv.chrom)); if chrom.is_none() { warn!("Chromosome '{}' not found in chrom_to_acc", sv.chrom); return Default::default(); }Ensure that the
log::warn
is replaced withtracing::warn
if standardizing on thetracing
crate.Also applies to: 158-185, 371-378
Line range hint
465-481
: Possible Panic Due to Unchecked Chromosome IndexIn the
overlapping_hgnc_ids
function:let tree = &tx_idx.trees[chrom_idx];Accessing
tx_idx.trees
without checking ifchrom_idx
is within bounds may lead to a panic if an invalidchrom_idx
is provided.Suggested Action: Add validation to ensure
chrom_idx
is within the bounds oftx_idx.trees
.if chrom_idx >= tx_idx.trees.len() { return Vec::new(); // or handle the error appropriately } let tree = &tx_idx.trees[chrom_idx];Alternatively, match on
tx_idx.trees.get(chrom_idx)
to safely access the tree.
Line range hint
548-562
: Error Context Missing inload_databases
FunctionThe
load_databases
function propagates errors from multiple database loading functions using the?
operator:Ok(InMemoryDbs { bg_dbs: load_bg_dbs(path_worker_db, genome_release)?, patho_dbs: load_patho_dbs(path_worker_db, genome_release)?, tad_sets: load_tads(path_worker_db, genome_release, max_tad_distance)?, masked: load_masked_dbs(path_worker_db, genome_release)?, genes: load_gene_db(path_worker_db, genome_release)?, clinvar_sv: load_clinvar_sv(path_worker_db, genome_release)?, })If an error occurs, it might be unclear which database failed to load.
Suggested Action: Add context to each error using the
with_context
method from theanyhow
crate to indicate which database failed.Example:
Ok(InMemoryDbs { bg_dbs: load_bg_dbs(path_worker_db, genome_release) .with_context(|| "Failed to load background databases")?, patho_dbs: load_patho_dbs(path_worker_db, genome_release) .with_context(|| "Failed to load pathogenic databases")?, // ... similarly for other databases })This will provide more informative error messages, aiding in debugging.
Line range hint
725-753
: Missing Assertions in the Test ModuleIn the
smoke_test
function:insta::assert_snapshot!(std::fs::read_to_string(args.path_output.as_str())?);There's an assertion to compare the output file to a snapshot. However, there are no additional assertions to verify the test results or to clean up the temporary directory after the test runs.
Suggested Action: Consider adding assertions to check for expected outcomes and ensure that the temporary directory is properly managed.
// Additional assertions can be added here assert!(std::path::Path::new(&args.path_output).exists(), "Output file was not created"); // Clean up the temporary directory at the end of the test drop(tmpdir);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- src/seqvars/query/mod.rs (5 hunks)
- src/strucvars/query/mod.rs (1 hunks)
🧰 Additional context used
🔇 Additional comments (5)
src/seqvars/query/mod.rs (4)
299-299
: Improved tracing for better debuggingThe addition of this trace log is a good improvement. It provides more detailed information during the processing of each record, which can be valuable for debugging and monitoring the application's behavior.
427-427
: Enhanced logging for file operationsThis addition of a debug log is beneficial. It clearly indicates when the program starts writing to the noheader file, which can be useful for tracking the progress of the operation and for debugging purposes.
481-481
: Improved logging for output file writingThis addition of a debug log is a good improvement. It clearly indicates when the program starts writing to the output file, which enhances the ability to track the progress of the operation and aids in debugging.
534-545
: 🛠️ Refactor suggestionConsider refactoring consequence filtering logic
While the formatting has been improved, the underlying logic remains the same. As suggested in a previous review, consider refactoring this consequence filtering logic into a helper function. This would enhance maintainability and reduce code duplication, especially since similar logic appears elsewhere in the file (around lines 743-751).
Here's a reminder of the suggested refactoring:
- Create a helper function:
fn convert_consequence<T>(csq: T) -> Option<i32> where T: TryInto<pbs_query::Consequence>, { csq.try_into().ok().map(|csq| csq as i32) }
- Update this code segment:
- .filter_map(|(csq, count)| -> Option<pbs_output::ConsequenceCount> { - // We ignore consequences that don't have a mapping into the protobuf. - if let Ok(csq) = TryInto::<pbs_query::Consequence>::try_into(*csq) { - Some(pbs_output::ConsequenceCount { - consequence: csq as i32, - count: *count as u32, - }) - } else { - None - } - }) + .filter_map(|(csq, count)| { + convert_consequence(*csq).map(|csq| pbs_output::ConsequenceCount { + consequence: csq, + count: *count as u32, + }) + })This refactoring would also apply to the similar code around lines 743-751.
src/strucvars/query/mod.rs (1)
255-255
: Potential Performance Impact Due to Trace-Level LoggingLine 255 introduces a
trace
-level logging statement:tracing::trace!("processing record {:?}", record_sv);Using
trace
-level logging can generate a large volume of log data, which may impact performance and clutter log files, especially when processing many records.Suggested Action: Verify if this granular level of logging is necessary in a production environment. If detailed logging is required for debugging purposes, consider:
- Making the logging level configurable via command-line arguments or configuration files.
- Using the
debug
level instead if it provides sufficient detail.
Summary by CodeRabbit
New Features
seqvars ingest
subcommand for improved processing of VCF files and variant data.Bug Fixes
Documentation