feat!: include source information in transcript database #615

tedil · 2024-11-11T10:19:30Z

Release-as: 0.30.0
#601

Summary by CodeRabbit

Release Notes

New Features
- Introduced new enumerations for transcript database assembly and source.
- Added a SourceVersion message type for encapsulating version information.
- Enhanced command line arguments for specifying genome assembly details and transcript sources.
Bug Fixes
- Improved assembly consistency checks during database merging.
Refactor
- Streamlined constructors by removing unnecessary parameters.
- Centralized assembly management within the TxSeqDatabase.
Documentation
- Updated comments and documentation to reflect changes in argument structures and functionality.

…source-information

coderabbitai · 2024-11-11T10:19:37Z

Warning

Rate limit exceeded

@tedil has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 17 minutes and 24 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 8ff02c9 and dbf7431.

Walkthrough

The pull request introduces significant changes to the handling of assembly and source information in various components of the codebase. New enumerations and message structures are added to the protocol buffer definitions, enhancing the representation of transcript databases. Additionally, several constructors and method signatures are modified to remove the explicit assembly parameter, streamlining the initialization process across multiple modules. The changes also include updates to command-line argument structures and improvements in version handling, ensuring a more cohesive and maintainable codebase.

Changes

File Path	Change Summary
protos/mehari/txs.proto	Added enums `Assembly` and `Source`, new message `SourceVersion`, and updated `TxSeqDatabase` structure.
src/annotate/seqvars/csq.rs	Removed `Assembly` parameter from `ConsequencePredictor::new`, adjusted internal logic and tests.
src/annotate/seqvars/mod.rs	Updated command line arguments and removed `assembly` parameter from `fill_coords` and `setup_annotator`.
src/annotate/seqvars/provider.rs	Removed `assembly` parameter from `TxIntervalTrees::new` and `build_indices`, adjusted `Provider` methods.
src/db/create/mod.rs	Updated `Args` struct to replace `genome_release` with `assembly` and added new fields; modified `TranscriptLoader`.
src/db/merge.rs	Enhanced `merge_transcript_databases` to check assembly consistency using `itertools`.
src/db/mod.rs	Introduced `TranscriptDatabase` trait with `assembly` method for `TxSeqDatabase`.
src/db/subset/mod.rs	Renamed `genome_release` field to `source_version` in `TxSeqDatabase`.
src/server/run/actix_server/versions.rs	Added `version_cdot` field to `DataVersionEntry`, updated version handling logic.
src/server/run/mod.rs	Removed `assembly` parameter from `MehariProvider` and `SeqvarConsequencePredictor` constructors.
src/verify/seqvars.rs	Removed `assembly` parameter from `MehariProvider` and `ConsequencePredictor` constructors.

Possibly related PRs

feat!: provide sub command "server run" (#600) #602: Restructuring of command-line commands, specifically moving the "run-server" command to "server run," related to handling of assembly and source information.
feat: /api/v1/strucvars/csq endpoint with OpenAPI (#607) #612: Introduction of a new API endpoint for querying structural variant consequences, which involves handling assembly information relevant to the changes in this PR.

🐇 In the meadow where data flows,
New enums and messages, how it grows!
Assembly and sources, now clearer to see,
A hop and a skip, as smooth as can be!
With each little change, we refine and we play,
In the world of transcripts, we leap every day! 🌼

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

holtgrewe

LGTM, see comments

protos/mehari/txs.proto

src/db/create/mod.rs

src/db/merge.rs

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (13)

src/db/mod.rs (2)
13-15: Add documentation for the new trait and method.

Consider adding documentation comments to:

Explain the purpose of the TranscriptDatabase trait

Document the assembly() method's behavior and return value

Include usage examples if applicable

Example:
/// Represents a database containing transcript information with assembly metadata.
pub trait TranscriptDatabase {
    /// Returns the genome assembly version used in this transcript database.
    fn assembly(&self) -> Assembly;
}
17-31: Consider centralizing assembly type mapping.

The current implementation directly maps between pbs::txs::Assembly and biocommons_bioutils::assemblies::Assembly. Consider:

Creating a central location for assembly type conversions

Documenting supported assemblies and their relationships

Adding tests for assembly conversions

This would make it easier to maintain and extend assembly support in the future.
src/db/merge.rs (2)
31-39: Consider using Result instead of assertions for better error handling.

While the assembly verification logic is correct, using assert! for business logic validation can be improved. Consider returning Result with descriptive error messages instead, which would provide better error handling for end users.
-        assert!(first.source_version.iter().map(|v| v.assembly).all_equal());
+        if !first.source_version.iter().map(|v| v.assembly).all_equal() {
+            return Err(anyhow::anyhow!("Inconsistent assembly versions in first database"));
+        }
         let assembly = first
             .source_version
             .first()
             .expect("At least one source_version entry expected")
             .assembly;
-        assert!(others
+        if !others
             .iter()
             .all(|db| db.source_version.iter().all(|v| v.assembly == assembly)));
+        {
+            return Err(anyhow::anyhow!("All databases must use the same assembly version"));
+        }
Line range hint 74-77: Consider using with_capacity for HashMap initialization.

Since we know the size of the other database's gene-to-transcript mappings, we can optimize memory allocation by pre-allocating the HashMap.
-            let mut other_gene_to_tx: HashMap<_, _> = other_tx_db
+            let mut other_gene_to_tx: HashMap<_, _> = HashMap::with_capacity(other_tx_db.gene_to_tx.len());
+            other_gene_to_tx.extend(other_tx_db
                 .gene_to_tx
                 .into_iter()
-                .map(|g2t| (g2t.gene_id.clone(), g2t))
-                .collect();
+                .map(|g2t| (g2t.gene_id.clone(), g2t)));
src/server/run/mod.rs (1)

138-143: LGTM: Clean architectural improvement.

The removal of explicit assembly configuration in favor of provider-based handling is a good architectural improvement that:

Reduces parameter coupling

Centralizes assembly information handling

Maintains backward compatibility through the existing genome_release mapping

The changes align well with the PR's objective of improving source information handling in the transcript database.

Consider documenting this architectural change in the project's architecture documentation to help future maintainers understand the design decisions.
src/server/run/actix_server/versions.rs (1)
106-113: Consider optimizing the filter_map operation.

The current implementation uses filter_map with bool::then, which can be more efficiently written using separate filter and map operations.
-            .filter_map(|v| (v.source_name == source_name).then(|| v.source_version.clone()))
+            .filter(|&v| v.source_name == source_name)
+            .map(|v| v.source_version.clone())
🧰 Tools

🪛 GitHub Check: clippy

[warning] 109-109: usage of bool::then in filter_map
warning: usage of bool::then in filter_map
--> src/server/run/actix_server/versions.rs:109:18
|
109 | .filter_map(|v| (v.source_name == source_name).then(|| v.source_version.clone()))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use filter then map instead: filter(|&v| (v.source_name == source_name)).map(|v| v.source_version.clone())
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#filter_map_bool_then
= note: #[warn(clippy::filter_map_bool_then)] on by default
protos/mehari/txs.proto (2)
30-38: Consider future transcript sources.

The enum covers the main transcript sources (RefSeq and Ensembl). Consider if we should document a plan for handling additional sources that might be needed in the future (e.g., GENCODE, UCSC).

40-54: Rename source_name field for consistency.

The field name source_name (line 49) is inconsistent with the enum type Source. Consider renaming it to source for better alignment with the type name and other field naming patterns.
    // Source, either RefSeq or Ensembl (or Unknown).
-    Source source_name = 4;
+    Source source = 4;
src/db/subset/mod.rs (1)
Line range hint 28-120: Consider enhancing error handling and modularity.

The subset_tx_db function handles complex data structure manipulation. Consider these improvements:

Extract the transcript filtering logic into a separate function for better maintainability

Add more descriptive error messages when tx_db or seq_db are missing

Consider adding tests for edge cases (empty gene symbols list, non-existent genes)

Example improvement for error handling:
-    let container_tx_db = container.tx_db.as_ref().expect("no tx_db");
+    let container_tx_db = container.tx_db.as_ref()
+        .ok_or_else(|| anyhow::anyhow!("Missing transcript database in container"))?;
src/annotate/seqvars/csq.rs (1)
1240-1241: Consider consolidating test setup code.

Multiple test functions have similar provider and predictor initialization code. Consider extracting this common setup into a helper function to reduce code duplication.

Example helper function:
fn create_test_predictor(config: Option<Config>) -> Result<ConsequencePredictor, anyhow::Error> {
    let tx_path = "tests/data/annotate/db/grch37/txs.bin.zst";
    let tx_db = load_tx_db(tx_path)?;
    let provider = Arc::new(MehariProvider::new(tx_db, Default::default()));
    Ok(ConsequencePredictor::new(provider, config.unwrap_or_default()))
}
Also applies to: 1298-1299, 1552-1553
src/annotate/seqvars/mod.rs (1)
Line range hint 1944-1971: Consider enhancing error handling for annotator setup.

The function correctly handles the assembly changes and properly initializes annotators based on provided sources. However, consider adding more specific error context for each annotator initialization failure.

Consider wrapping errors with additional context:
 if let Some(rocksdb_path) = &args.sources.frequencies {
-    annotators.push(FrequencyAnnotator::from_path(rocksdb_path).map(AnnotatorEnum::Frequency)?)
+    annotators.push(
+        FrequencyAnnotator::from_path(rocksdb_path)
+            .map(AnnotatorEnum::Frequency)
+            .with_context(|| format!("Failed to initialize frequency annotator from {}", rocksdb_path))?
+    )
 }
src/annotate/seqvars/provider.rs (2)
338-345: Improve formatting of data_version for clarity

The current data_version concatenates tx_seq_db.version, formatted source_version, and config without clear separators and uses debug formatting, which might reduce readability. Consider adding separators and using consistent formatting to enhance clarity.

Apply this diff to improve the data_version formatting:
 let data_version = format!(
-    "{}{}{:?}",
+    "{}_{}_{}",
     tx_seq_db.version.as_ref().unwrap_or(&"".to_string()),
-    tx_seq_db
-        .source_version
-        .iter()
-        .map(|v| format!("{:#?}", v))
-        .join(","),
+    tx_seq_db.source_version.join(","),
+    format!("{:?}", config)
 );
This change adds underscores as separators and formats source_version without debug syntax for better readability.

604-606: Simplify assembly formatting in error messages

Using "{:?}" in format!("{:?}", self.assembly()) includes debug information, which might not be necessary for user-facing error messages. Consider using format!("{}", self.assembly()) to present a cleaner message.

Apply this diff to adjust the formatting:
 Err(Error::NoAlignmentFound(
     tx_ac.to_string(),
-    format!("{:?}", self.assembly()),
+    format!("{}", self.assembly()),
 ))
This change uses the Display trait for Assembly, producing a more user-friendly error message.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between cce8a6d and 8ff02c9.

⛔ Files ignored due to path filters (6)

src/db/create/snapshots/mehari__db__create__test__run_smoke_brca1_opa1.snap is excluded by !**/*.snap
src/db/create/snapshots/mehari__db__create__test__run_smoke_mitochondrial.snap is excluded by !**/*.snap
src/db/create/snapshots/mehari__db__create__test__run_smoke_selenoproteins.snap is excluded by !**/*.snap
src/db/subset/snapshots/mehari__db__subset__tests__subset_tx_db.snap is excluded by !**/*.snap
tests/data/annotate/db/grch37/txs.bin.zst is excluded by !**/*.zst
tests/data/annotate/db/grch38/txs.bin.zst is excluded by !**/*.zst

📒 Files selected for processing (11)

protos/mehari/txs.proto (2 hunks)
src/annotate/seqvars/csq.rs (5 hunks)
src/annotate/seqvars/mod.rs (7 hunks)
src/annotate/seqvars/provider.rs (6 hunks)
src/db/create/mod.rs (13 hunks)
src/db/merge.rs (2 hunks)
src/db/mod.rs (1 hunks)
src/db/subset/mod.rs (1 hunks)
src/server/run/actix_server/versions.rs (2 hunks)
src/server/run/mod.rs (1 hunks)
src/verify/seqvars.rs (0 hunks)

💤 Files with no reviewable changes (1)

src/verify/seqvars.rs

🧰 Additional context used

🪛 GitHub Check: clippy

src/server/run/actix_server/versions.rs

[warning] 109-109: usage of bool::then in filter_map
warning: usage of bool::then in filter_map
--> src/server/run/actix_server/versions.rs:109:18
|
109 | .filter_map(|v| (v.source_name == source_name).then(|| v.source_version.clone()))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use filter then map instead: filter(|&v| (v.source_name == source_name)).map(|v| v.source_version.clone())
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#filter_map_bool_then
= note: #[warn(clippy::filter_map_bool_then)] on by default

🔇 Additional comments (22)

src/db/mod.rs (2)

3-5: LGTM: Imports are well-organized and necessary.

The new imports are properly scoped and support the new transcript database functionality.

17-31: ⚠️ Potential issue

Replace unwrap() and panic! with proper error handling.

The current implementation has several reliability concerns:

Multiple unwrap() calls could cause runtime panics
Explicit panic! for unsupported assemblies is not user-friendly
Assumes source_version is non-empty
Only uses the first source version when there could be multiple

Consider this safer implementation:

impl TranscriptDatabase for TxSeqDatabase {
    fn assembly(&self) -> Assembly {
        let assembly = self
            .source_version
            .first()
            .ok_or_else(|| anyhow::anyhow!("No source version found"))?
            .assembly
            .try_into()
            .map_err(|_| anyhow::anyhow!("Invalid assembly enum value"))?;

        match assembly {
            pbs::txs::Assembly::Grch37 => Ok(Assembly::Grch37p10),
            pbs::txs::Assembly::Grch38 => Ok(Assembly::Grch38),
            other => Err(anyhow::anyhow!("Unsupported assembly: {:?}", other)),
        }
    }
}

Let's check if there are other places in the codebase that might be affected by this change:

src/db/merge.rs (1)

6-6: LGTM: Good choice using itertools!

The itertools crate is a good choice for the all_equal() functionality needed in assembly verification.

src/server/run/mod.rs (2)

143-143: Document the configuration changes in SeqvarConsequencePredictor.

The removal of explicit assembly configuration in favor of Default::default() represents a significant architectural change. While the code change is sound, it would benefit from documentation updates.

Consider adding a code comment explaining:

How assembly information is now handled
What the default configuration includes
Any implications for existing integrations

+    // Assembly information is now handled through the provider's internal configuration
+    // Default configuration includes: ...
     SeqvarConsequencePredictor::new(provider.clone(), Default::default()),

#!/bin/bash
# Description: Verify the impact of removing assembly configuration

# Check for any tests that might need updates
rg -A 5 "SeqvarConsequencePredictor::new\(" --type rust

# Look for any documentation that might need updates
fd . -e md -e rst --exec rg -l "SeqvarConsequencePredictor"

138-138: Verify the default configuration for MehariProvider.

The removal of the assembly parameter and use of Default::default() needs verification to ensure it provides appropriate defaults for all use cases.

✅ Verification successful

Default configuration for MehariProvider is safe to use

Based on the codebase analysis, the change to use Default::default() is safe and maintains backward compatibility:

The code shows consistent usage of Default::default() in test cases and production code
MehariProviderConfigBuilder::default() is used throughout the codebase for configuration
When specific configuration is needed, the builder pattern is used to customize settings (e.g., transcript picking)
The default configuration is used successfully in test scenarios without any assembly parameter

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any custom configurations that might have been using the assembly parameter

# Look for any previous custom configurations of MehariProvider
rg -A 5 "MehariProvider::new\(" --type rust

# Check for any potential configuration structs that might need updates
ast-grep --pattern 'struct $name {
  $$$
  assembly: $_,
  $$$
}'

Length of output: 689

Script:

#!/bin/bash
# Let's check for MehariProvider usage and configuration patterns

# Look for MehariProvider configurations and usages
rg -A 5 "MehariProvider" --type rust

# Check for any configuration structs related to Mehari
ast-grep --pattern 'struct MehariConfig {
  $$$
}'

# Check for Default implementation
ast-grep --pattern 'impl Default for MehariConfig {
  $$$
}'

Length of output: 10936

src/server/run/actix_server/versions.rs (5)

1-9: LGTM! Well-organized imports.

The addition of itertools::Itertools is appropriate for the string joining operations used in version handling.

97-98: LGTM! Well-documented new field.

The addition of version_cdot aligns with the PR objective of including source information in the transcript database.

105-116: LGTM! Clean version handling implementation.

The version extraction and joining logic is well-structured and handles all cases appropriately.

🧰 Tools

🪛 GitHub Check: clippy

[warning] 109-109: usage of bool::then in filter_map
warning: usage of bool::then in filter_map
--> src/server/run/actix_server/versions.rs:109:18
|
109 | .filter_map(|v| (v.source_name == source_name).then(|| v.source_version.clone()))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use filter then map instead: filter(|&v| (v.source_name == source_name)).map(|v| v.source_version.clone())
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#filter_map_bool_then
= note: #[warn(clippy::filter_map_bool_then)] on by default

120-122: LGTM! Proper struct initialization.

The initialization correctly includes all fields in the same order as the struct definition.

105-116: Verify consistent version handling across the codebase.

Let's ensure that the version information is consistently handled across different components.

✅ Verification successful

Let me gather more specific information about the version handling implementation.

Based on the gathered information, I can now provide the final response:

Version handling implementation is consistent across the codebase

The version handling implementation in versions.rs aligns well with the rest of the codebase:

Version information is properly structured through dedicated types (SoftwareVersions, DataVersionEntry, VersionsInfoResponse)
The cdot version handling is consistently implemented across database operations and server responses
Version information is appropriately exposed through the /api/v1/version endpoint

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for consistent version handling patterns across the codebase

# Look for other version-related structures and their handling
echo "Checking for version-related structures:"
rg -l "version|Version" --type rust

# Check for potential places where cdot version might need to be handled
echo -e "\nChecking for cdot version handling:"
rg "cdot.*version|version.*cdot" --type rust

# Look for similar version joining patterns that might need updating
echo -e "\nChecking for version joining patterns:"
ast-grep --pattern 'join(",")'

Length of output: 2083

Script:

#!/bin/bash
# Check the implementation details of version handling

# Look for version-related structs and types
echo "Checking version-related structures:"
rg -A 5 "struct.*Version|type.*Version" --type rust

# Check for version handling in server responses
echo -e "\nChecking version handling in responses:"
rg -A 5 "version.*Response|Response.*version" --type rust

# Look for version-related database operations
echo -e "\nChecking version handling in database operations:"
ast-grep --pattern 'fn $_ ($$$) -> $_ {
  $$$
  version
  $$$
}'

Length of output: 2600

🧰 Tools

🪛 GitHub Check: clippy

[warning] 109-109: usage of bool::then in filter_map
warning: usage of bool::then in filter_map
--> src/server/run/actix_server/versions.rs:109:18
|
109 | .filter_map(|v| (v.source_name == source_name).then(|| v.source_version.clone()))
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ help: use filter then map instead: filter(|&v| (v.source_name == source_name)).map(|v| v.source_version.clone())
|
= help: for further information visit https://rust-lang.github.io/rust-clippy/master/index.html#filter_map_bool_then
= note: #[warn(clippy::filter_map_bool_then)] on by default

protos/mehari/txs.proto (2)

20-28: LGTM: Assembly enum is well-defined and documented.

The enum follows proto3 best practices with UNKNOWN = 0 and includes comprehensive values for current needs.

194-195: LGTM: Well-documented field addition.

The repeated source_version field is appropriately documented, explaining its purpose in tracking information during database merges.

src/db/subset/mod.rs (1)

119-119: ⚠️ Potential issue

Breaking change: Field renamed from genome_release to source_version.

This change aligns with the PR objective to enhance source information handling in the transcript database. However, this is a breaking change that will require updates in dependent code.

Let's verify the impact and test coverage:

src/annotate/seqvars/csq.rs (3)

104-106: LGTM: Constructor refactoring improves code organization.

The removal of the explicit assembly parameter and using provider.assembly() instead centralizes the assembly management and reduces parameter coupling.

1049-1051: LGTM: Test setup correctly updated.

The test initialization has been properly updated to reflect the constructor changes.

Line range hint 1-1800: Overall code quality is excellent.

The code demonstrates:

Comprehensive documentation
Robust error handling
Extensive test coverage
Consistent implementation of changes

src/annotate/seqvars/mod.rs (3)

Line range hint 1365-1369: LGTM: Consistent with assembly handling changes.

The method correctly uses the updated fill_coords signature and maintains proper control flow with early return for non-canonical chromosomes.

Line range hint 1731-1744: LGTM: Well-structured configuration setup.

The method effectively uses the builder pattern to configure the annotator, properly handling all configuration options from the arguments.

Line range hint 647-657: LGTM: Assembly handling refactored correctly.

The assembly is now properly retrieved from the struct's state with a clear error message. The release string and chromosome naming logic correctly handles both GRCh37 and GRCh38 assemblies.

Let's verify the assembly handling across the codebase:

✅ Verification successful

Let me gather more context about the assembly usage to ensure the refactoring is consistent.

Assembly handling is correctly implemented and consistent across the codebase

The verification shows that:

Assembly is properly handled as an optional field in structs
The expect error handling in the reviewed code matches the pattern used elsewhere
Assembly state is accessed consistently through struct fields
The assembly parameter is only used in constructor/setup functions (new, set_assembly) where it's appropriate

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify assembly handling consistency
# Test: Check for any direct assembly parameter usage that should use struct state
rg -A 3 "fn.*assembly: Assembly"

Length of output: 1996

Script:

#!/bin/bash
# Check for assembly field declarations and their usage
ast-grep --pattern 'assembly: Option<Assembly>'

# Check for assembly.expect usage to ensure consistent error handling
rg -A 2 'assembly\.expect'

# Check for any direct assembly usage without expect
rg -A 2 'self\.assembly\s+[^=.]'

Length of output: 646

src/annotate/seqvars/provider.rs (3)

44-45: Simplified TxIntervalTrees::new by removing unnecessary assembly parameter

The assembly parameter has been removed from TxIntervalTrees::new, and the assembly is now obtained directly from db. This change simplifies the function signature and reduces redundancy.

52-53: Streamlined assembly retrieval in build_indices

In build_indices, the assembly is now retrieved using db.assembly(). This aligns with the changes in TxIntervalTrees::new and ensures that the assembly information is consistently sourced from the TxSeqDatabase.

164-165: Simplified Provider::new by removing assembly parameter

The assembly parameter has been removed from Provider::new. The assembly is now accessed through tx_seq_db, which simplifies the constructor and ensures that assembly information is consistently obtained from the database.

src/db/create/mod.rs

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

…varfish-org/mehari into 601-include-database-source-information

tedil added 8 commits November 8, 2024 12:20

add versioning information to txdatabase

9cc4eff

add mehari version info

855d0d3

expand upon version information comment

cc283d0

move sourceversion bit to the correct message

4789127

Merge remote-tracking branch 'origin/main' into 601-include-database-…

01ff4dd

…source-information

require additional info in mehari db create

46687d6

update db create test args

9165cc6

update test snapshots for db create

e48cd0f

tedil added 2 commits November 11, 2024 11:28

do not re-use (genome_release=)4, use 5 instead

33e7bb1

update txdbs

d3f0f33

holtgrewe previously approved these changes Nov 11, 2024

View reviewed changes

tedil added 2 commits November 11, 2024 12:02

address code review

652ae44

snapshot updates

772e371

tedil dismissed holtgrewe’s stale review via 772e371 November 11, 2024 11:03

tedil added 4 commits November 11, 2024 12:04

add missing comment

33437a3

determine assembly from tx_db where appropriate

9db6231

tidy

ab365f1

update versions endpoint with actual versions and cdot version

8ff02c9

tedil requested a review from holtgrewe November 11, 2024 11:47

tedil marked this pull request as ready for review November 11, 2024 11:47

clippy lint

f125f52

coderabbitai bot reviewed Nov 11, 2024

View reviewed changes

src/db/create/mod.rs Outdated Show resolved Hide resolved

tedil and others added 7 commits November 11, 2024 12:56

Update src/db/create/mod.rs

21088b2

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

some suggested small refactoring changes

88e6078

Merge branch '601-include-database-source-information' of github.com:…

b2e1a3d

…varfish-org/mehari into 601-include-database-source-information

update openapi.schema.yaml

d4f7eef

Release-As: 0.30.0

64d5220

merge origin/main

0d20ee3

cargo fmt

dbf7431

holtgrewe enabled auto-merge (squash) November 11, 2024 12:40

holtgrewe merged commit 111ffcc into main Nov 11, 2024
9 checks passed

holtgrewe deleted the 601-include-database-source-information branch November 11, 2024 12:44

varfish-bot mentioned this pull request Nov 11, 2024

chore(main): release 0.30.0 #603

Merged

coderabbitai bot mentioned this pull request Nov 20, 2024

chore: Update documentation to match current CLI/usage #595

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat!: include source information in transcript database #615

feat!: include source information in transcript database #615

tedil commented Nov 11, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 11, 2024 •

edited

Loading

Rate limit exceeded

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

holtgrewe left a comment

coderabbitai bot left a comment

feat!: include source information in transcript database #615

feat!: include source information in transcript database #615

Conversation

tedil commented Nov 11, 2024 • edited by coderabbitai bot Loading

Summary by CodeRabbit

Release Notes

coderabbitai bot commented Nov 11, 2024 • edited Loading

Rate limit exceeded

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

holtgrewe left a comment

Choose a reason for hiding this comment

coderabbitai bot left a comment

Choose a reason for hiding this comment

tedil commented Nov 11, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 11, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)