Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: refactor db subset CLI, add subset by VCF and subset by TxId options #641

Merged
merged 1 commit into from
Dec 3, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
42 changes: 42 additions & 0 deletions src/annotate/seqvars/provider.rs
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,48 @@ impl TxIntervalTrees {

(contig_to_idx, trees)
}

pub fn get_tx_for_region(
&self,
tx_seq_db: &TxSeqDatabase,
alt_ac: &str,
_alt_aln_method: &str,
start_i: i32,
end_i: i32,
) -> Result<Vec<TxForRegionRecord>, Error> {
let contig_idx = *self
.contig_to_idx
.get(alt_ac)
.ok_or(Error::NoTranscriptFound(alt_ac.to_string()))?;
let query = start_i..end_i;
let tx_idxs = self.trees[contig_idx].find(query);

Ok(tx_idxs
.iter()
.map(|entry| {
let tx = &tx_seq_db.tx_db.as_ref().expect("no tx_db?").transcripts
[*entry.data() as usize];
assert_eq!(
tx.genome_alignments.len(),
1,
"Can only have one alignment in Mehari"
);
let alt_strand = tx.genome_alignments.first().unwrap().strand;
TxForRegionRecord {
tx_ac: tx.id.clone(),
alt_ac: alt_ac.to_string(),
alt_strand: match Strand::try_from(alt_strand).expect("invalid strand") {
Strand::Plus => 1,
Strand::Minus => -1,
_ => unreachable!("invalid strand {}", alt_strand),
},
alt_aln_method: ALT_ALN_METHOD.to_string(),
start_i,
end_i,
}
})
Comment on lines +126 to +139
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Avoid panics with assert and provide meaningful errors

Using assert_eq! and unwrap() can cause the program to panic if conditions are not met. Instead, handle these cases gracefully by returning errors.

Replace assert_eq! and unwrap() with proper error handling:

-let tx = &tx_seq_db.tx_db.as_ref().expect("no tx_db?").transcripts
-    [*entry.data() as usize];
-assert_eq!(
-    tx.genome_alignments.len(),
-    1,
-    "Can only have one alignment in Mehari"
-);
-let alt_strand = tx.genome_alignments.first().unwrap().strand;
+let tx_db = tx_seq_db.tx_db.as_ref().ok_or_else(|| {
+    Error::InvalidInput("Transcript database is missing".to_string())
+})?;
+let tx = tx_db.transcripts.get(*entry.data() as usize).ok_or_else(|| {
+    Error::InvalidInput("Transcript index out of bounds".to_string())
+})?;
+if tx.genome_alignments.len() != 1 {
+    return Err(Error::InvalidInput(format!(
+        "Expected one genome alignment, found {}",
+        tx.genome_alignments.len()
+    )));
+}
+let alt_strand = tx.genome_alignments.first().ok_or_else(|| {
+    Error::InvalidInput("Genome alignment is missing".to_string())
+})?.strand;

This change prevents unexpected panics and provides useful error messages to the caller.

Committable suggestion skipped: line range outside the PR's diff.

.collect())
}
Comment on lines +101 to +141
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Duplicate implementation of get_tx_for_region

The get_tx_for_region method added to TxIntervalTrees appears to duplicate logic already present in the Provider implementation. This duplication can lead to maintenance difficulties and inconsistencies.

Consider refactoring to reuse the get_tx_for_region logic from TxIntervalTrees within Provider. For example, modify the Provider's method to delegate to tx_trees.get_tx_for_region:

fn get_tx_for_region(
    &self,
    alt_ac: &str,
    _alt_aln_method: &str,
    start_i: i32,
    end_i: i32,
) -> Result<Vec<TxForRegionRecord>, Error> {
    self.tx_trees.get_tx_for_region(
        &self.tx_seq_db,
        alt_ac,
        _alt_aln_method,
        start_i,
        end_i,
    )
}

This approach reduces code duplication and centralizes the logic.

}

/// Configuration for constructing the `Provider`.
Expand Down
Loading
Loading