-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dbghaplo #51035
Add dbghaplo #51035
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces a new package Changes
Possibly related PRs
Suggested labels
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (5)
recipes/dbghaplo/build.sh (2)
3-5
: LGTM: Good workaround for Cargo issues.The environment variable setup addresses potential issues with SSH-based Git connections and sets a custom CARGO_HOME. This is a good practice for conda-build environments.
Consider splitting the variable assignments for better readability:
-export CARGO_NET_GIT_FETCH_WITH_CLI=true CARGO_HOME="$(pwd)/.cargo" +export CARGO_NET_GIT_FETCH_WITH_CLI=true +export CARGO_HOME="$(pwd)/.cargo"This change addresses a minor Shellcheck warning and improves readability without affecting functionality.
🧰 Tools
🪛 Shellcheck
[warning] 5-5: Declare and assign separately to avoid masking return values.
(SC2155)
9-9
: LGTM: Copying additional scripts.Copying scripts from the 'scripts' directory to $PREFIX/bin ensures that any auxiliary scripts are properly installed with the main tool.
Consider adding a check to ensure the 'scripts' directory exists and contains files:
-cp scripts/* $PREFIX/bin +if [ -d "scripts" ] && [ "$(ls -A scripts)" ]; then + cp scripts/* $PREFIX/bin +else + echo "Warning: 'scripts' directory is empty or does not exist. No additional scripts were copied." +fiThis change will prevent potential errors if the 'scripts' directory is missing or empty, and provide a helpful warning message.
recipes/dbghaplo/meta.yaml (3)
7-10
: Suggestion: Improve URL format using Jinja2 variables.While the current URL is functional, it's recommended to use the
{{ name }}
and{{ version }}
variables in the URL for better maintainability. This allows for easier updates and reduces the chance of errors when the package is updated.Consider updating the URL as follows:
source: url: https://github.com/bluenote-1577/{{ name }}/archive/v{{ version }}.tar.gz sha256: e7e2741afb0c7f12718ec969815d3c8f18ce7ba66517e21c6fcfb3fe4262780b
31-39
: LGTM with suggestion: Add Python version check to test commands.The test commands are comprehensive, covering both the main tool (dbghaplo) and its dependencies. This is excellent for ensuring the package is correctly installed and functional.
Consider adding a Python version check to ensure the correct version is available:
test: commands: - python --version - dbghaplo -h - samtools --help - minimap2 --help - lofreq --help - tabix --help - run_dbghaplo_pipeline -h - haplotag_bam -h
41-44
: LGTM with suggestion: Add license file.The about section provides clear and concise information about the package, including the home URL, license, and a descriptive summary. This is excellent for users looking to understand the purpose and terms of use for the package.
Consider adding a
license_file
entry to specify the location of the license file in the source code. This is typically required by Bioconda. For example:about: home: https://github.com/bluenote-1577/dbghaplo license: MIT license_file: LICENSE summary: Haplotyping small sequences from heterogeneous long-read sequencing samples with a SNP-encoded positional de Bruijn Graph.Please ensure that the LICENSE file exists in the source repository and adjust the path if necessary.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- recipes/dbghaplo/build.sh (1 hunks)
- recipes/dbghaplo/meta.yaml (1 hunks)
🧰 Additional context used
🪛 Shellcheck
recipes/dbghaplo/build.sh
[warning] 5-5: Declare and assign separately to avoid masking return values.
(SC2155)
🪛 yamllint
recipes/dbghaplo/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (5)
recipes/dbghaplo/build.sh (2)
1-2
: LGTM: Appropriate shebang and shell options.The shebang and shell options are well-chosen for a robust build script. The
-euo
options help catch errors early and prevent silent failures.
7-8
: LGTM: Appropriate Cargo installation command.The Cargo installation command is well-constructed:
- Enables full backtrace for debugging
- Uses verbose output for better build logs
- Builds from the current directory
- Installs to the correct location in the conda environment
This setup should work well for building the dbghaplo tool.
recipes/dbghaplo/meta.yaml (3)
1-5
: LGTM: Package and version section looks good.The use of a Jinja2 variable for the version is a good practice for maintainability. The package name correctly matches the GitHub repository name.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
46-48
: LGTM: Extra section is correct.The extra section correctly specifies the recipe maintainer, which matches the GitHub username of the PR submitter. This is good practice for Bioconda recipes and helps with future maintenance and communication.
1-1
: Note: Ignore yamllint error for Jinja2 syntax.The yamllint tool reports a syntax error for the '%' character at the beginning of the file. This is a false positive as the '%' is part of a valid Jinja2 template syntax commonly used in Conda recipes. You can safely ignore this warning.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
build: | ||
number: 0 | ||
run_exports: | ||
- {{ pin_subpackage('dbghaplo', max_pin="x.x") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correction needed: Adjust pin_subpackage syntax.
The use of run_exports
with pin_subpackage
is good for ABI compatibility. However, the syntax for pin_subpackage
is incorrect.
Please update the run_exports
section as follows:
build:
number: 0
run_exports:
- {{ pin_subpackage('dbghaplo', max_pin='x.x') }}
Note the single quotes around 'x.x'
and the use of an equals sign instead of a colon.
recipes/dbghaplo/meta.yaml
Outdated
requirements: | ||
build: | ||
- {{ compiler("cxx") }} | ||
- rust >= 1.70 | ||
- make | ||
- cmake >= 3.12 | ||
run: | ||
- python | ||
- samtools | ||
- minimap2 | ||
- lofreq >= 2.1.5 | ||
- tabix | ||
- pysam >= 0.16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Suggestion: Specify Python version and verify dependencies.
The build and run requirements look generally good, but there are a few points to consider:
- It's recommended to specify the Python version to ensure compatibility. Consider adding a version constraint, e.g.,
python >=3.6
. - Verify if specific versions are required for samtools, minimap2, and tabix.
- Consider adding
{{ compiler('c') }}
to the build requirements if there are any C dependencies.
Here's a suggested update to the requirements section:
requirements:
build:
- {{ compiler('cxx') }}
- {{ compiler('c') }}
- rust >=1.70
- make
- cmake >=3.12
host:
- python >=3.6
run:
- python >=3.6
- samtools
- minimap2
- lofreq >=2.1.5
- tabix
- pysam >=0.16
Please verify the minimum Python version required for your package and adjust accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- recipes/dbghaplo/meta.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/dbghaplo/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (6)
recipes/dbghaplo/meta.yaml (6)
1-5
: LGTM: Package and version section looks good.The package name and version are correctly defined. The use of Jinja2 templating for the version is a good practice for maintainability.
Note: The yamllint error about the '%' character is a false positive. This is valid Jinja2 syntax in YAML and is commonly used in Bioconda recipes.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
7-10
: LGTM: Source section is well-defined.The source URL correctly points to the GitHub release and uses the version variable. The inclusion of a SHA256 checksum is good practice for ensuring package integrity.
12-15
: LGTM: Build section is correct and follows best practices.The build number is appropriately set to 0 for a new package. The use of
run_exports
withpin_subpackage
is good for ABI compatibility. The syntax forpin_subpackage
has been corrected as per the previous review comment, which is great to see.
31-39
: LGTM: Comprehensive test commands.The test section includes commands to verify the functionality of both the main package (dbghaplo) and its dependencies. This is excellent practice to ensure the package is installed correctly and all required tools are accessible.
41-44
: LGTM: About section is well-defined.The about section provides all necessary information: the home URL, license (MIT), and a concise summary of the package's purpose. This is excellent for users who want to quickly understand what the package does and where to find more information.
46-48
: LGTM: Recipe maintainer is specified.The extra section correctly specifies the recipe maintainer. This is important for future communications and updates to the package.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- recipes/dbghaplo/meta.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/dbghaplo/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
🔇 Additional comments (6)
recipes/dbghaplo/meta.yaml (6)
1-5
: LGTM: Package and version are correctly defined.The use of Jinja2 templating for the version is a good practice, allowing for easier updates in the future.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
7-10
: LGTM: Source URL and checksum are correctly defined.The source URL is properly formatted using the version variable, and the SHA256 checksum is provided for integrity verification.
12-15
: LGTM: Build section is correctly configured.The build number is set to 0, which is appropriate for a new package. The use of
run_exports
withpin_subpackage
is good for ABI compatibility, and the syntax has been corrected as per the previous review comment.
31-39
: LGTM: Comprehensive test commands.The test section includes commands to verify the functionality of dbghaplo and its dependencies. This ensures that the package and all required tools are correctly installed and accessible.
41-44
: LGTM: About section is complete and informative.The about section provides all necessary information, including the home URL, license, and a clear summary of the package's purpose.
46-48
: LGTM: Maintainer information is provided.The extra section correctly specifies the recipe maintainer.
recipes/dbghaplo/meta.yaml
Outdated
requirements: | ||
build: | ||
- {{ compiler("cxx") }} | ||
- rust >=1.70 | ||
- make | ||
- cmake >=3.12 | ||
run: | ||
- python | ||
- samtools | ||
- minimap2 | ||
- lofreq >=2.1.5 | ||
- tabix | ||
- pysam >=0.16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider specifying Python version and adding version constraints for dependencies.
The requirements look good overall, but there are a few improvements to consider:
- It's recommended to specify the Python version to ensure compatibility. Consider adding a version constraint, e.g.,
python >=3.6
. - Add version constraints for samtools and minimap2 if specific versions are required.
Here's a suggested update to the run requirements:
run:
- python >=3.6
- samtools >=1.9
- minimap2 >=2.17
- lofreq >=2.1.5
- tabix
- pysam >=0.16
Please adjust the Python version and other tool versions according to your package's specific requirements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hey @bluenote-1577 thanks for the PR (and the great tool - looking forward to the preprint)
Please have a look at https://github.com/bioconda/bioconda-recipes/tree/master/recipes/seq-hasher which is another recent Rust package for the consistent style I am trying to encourage on bioconda (as it is what is used on conda-forge).
I'll leave some suggestions to point you in the right direction
Co-authored-by: Michael Hall <[email protected]>
Co-authored-by: Michael Hall <[email protected]>
Co-authored-by: Michael Hall <[email protected]>
Co-authored-by: Michael Hall <[email protected]>
Co-authored-by: Michael Hall <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Outside diff range and nitpick comments (2)
recipes/dbghaplo/build.sh (1)
10-10
: Consider adding error handling for script copyingWhile copying scripts from the 'scripts' directory to $PREFIX/bin is a good practice, it's worth adding some error handling to prevent issues if the directory is empty or doesn't exist.
Consider updating the line as follows:
-cp scripts/* $PREFIX/bin +if [ -d scripts ] && [ "$(ls -A scripts)" ]; then + cp scripts/* $PREFIX/bin +else + echo "Warning: 'scripts' directory is empty or does not exist. No additional scripts were copied." +fiThis change will prevent errors if the 'scripts' directory is empty or doesn't exist, and provide a helpful warning message.
recipes/dbghaplo/meta.yaml (1)
37-43
: LGTM: About section is comprehensive.The about section provides all necessary information including the home URL, license, license files, and a concise summary of the package's purpose.
Minor suggestion:
Remove the trailing space at the end of line 42 to adhere to YAML best practices.🧰 Tools
🪛 yamllint
[error] 42-42: trailing spaces
(trailing-spaces)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- recipes/dbghaplo/build.sh (1 hunks)
- recipes/dbghaplo/meta.yaml (1 hunks)
🧰 Additional context used
🪛 Shellcheck
recipes/dbghaplo/build.sh
[warning] 5-5: Declare and assign separately to avoid masking return values.
(SC2155)
🪛 yamllint
recipes/dbghaplo/meta.yaml
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
[error] 24-24: trailing spaces
(trailing-spaces)
[error] 42-42: trailing spaces
(trailing-spaces)
🔇 Additional comments (8)
recipes/dbghaplo/build.sh (3)
1-2
: LGTM: Proper shebang and script setupThe shebang is correctly set to use bash with error handling options (
-euo
), which is a good practice for robust shell scripting.
3-5
: LGTM: Proper environment setup for Rust/cargoThe environment variables are correctly set to address known issues with SSH-based Git connections in Rust/cargo. The custom CARGO_HOME is necessary due to conda-build's behavior.
Regarding the static analysis hint (SC2155), in this case, we can safely ignore it as we're not capturing any return values from these assignments.
🧰 Tools
🪛 Shellcheck
[warning] 5-5: Declare and assign separately to avoid masking return values.
(SC2155)
7-9
: LGTM: Proper build and installation processThe build process correctly generates a YAML file of third-party licenses and installs the binary using cargo. The current implementation is an improvement over the past suggestion:
- It uses
--verbose
for more detailed output, which can be helpful for debugging.- It includes
--no-track
, which prevents modifying the Cargo.toml file.The use of
--locked
ensures reproducible builds, which is crucial for package management.recipes/dbghaplo/meta.yaml (5)
1-9
: LGTM: Package and source information looks good.The package version, source URL, and SHA256 checksum are correctly specified. The use of Jinja2 templating for the version is a good practice for maintainability.
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
11-14
: LGTM: Build section is correctly configured.The build number is appropriately set to 0 for a new package. The
run_exports
section withpin_subpackage
is correctly implemented, which is good for ABI compatibility.
31-35
: LGTM: Test commands are appropriate.The test commands cover the main functionalities of the package (dbghaplo, run_dbghaplo_pipeline, and haplotag_bam). This ensures that the core components of the package are working as expected after installation.
45-47
: LGTM: Extra section is correctly specified.The recipe maintainer is properly identified in the extra section.
1-47
: Overall, the recipe looks good and is ready for submission.The meta.yaml file for dbghaplo is well-structured and follows Bioconda guidelines. It includes all necessary sections with appropriate information. The suggested improvements (specifying Python version, adding version constraints for some dependencies, and removing a trailing space) are minor and can be addressed in a future update if desired.
Great job on creating this Bioconda recipe for dbghaplo!
🧰 Tools
🪛 yamllint
[error] 1-1: syntax error: found character '%' that cannot start any token
(syntax)
[error] 24-24: trailing spaces
(trailing-spaces)
[error] 42-42: trailing spaces
(trailing-spaces)
requirements: | ||
build: | ||
- {{ compiler("cxx") }} | ||
- {{ compiler('rust') }} | ||
- cargo-bundle-licenses | ||
- make | ||
- cmake >=3.12 | ||
run: | ||
- python | ||
- samtools | ||
- minimap2 | ||
- lofreq >=2.1.5 | ||
- tabix | ||
- pysam >=0.16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Consider specifying Python version and adding version constraints for dependencies.
The requirements section looks good overall, but there are a few improvements to consider:
- It's recommended to specify the Python version to ensure compatibility. Consider adding a version constraint, e.g.,
python >=3.6
. - Add version constraints for samtools and minimap2 if specific versions are required.
Here's a suggested update to the run requirements:
run:
- python >=3.6
- samtools >=1.9
- minimap2 >=2.17
- lofreq >=2.1.5
- tabix
- pysam >=0.16
Please adjust the Python version and other tool versions according to your package's specific requirements.
🧰 Tools
🪛 yamllint
[error] 24-24: trailing spaces
(trailing-spaces)
Adding dbghaplo, a new tool for long-read haplotyping of highly heterogeneous small sequences.
Please read the guidelines for Bioconda recipes before opening a pull request (PR).
General instructions
@BiocondaBot please add label
command.@bioconda/core
in a comment.Instructions for avoiding API, ABI, and CLI breakage issues
Conda is able to record and lock (a.k.a. pin) dependency versions used at build time of other recipes.
This way, one can avoid that expectations of a downstream recipe with regards to API, ABI, or CLI are violated by later changes in the recipe.
If not already present in the meta.yaml, make sure to specify
run_exports
(see here for the rationale and comprehensive explanation).Add a
run_exports
section like this:with
...
being one of:{{ pin_subpackage("myrecipe", max_pin="x") }}
{{ pin_subpackage("myrecipe", max_pin="x.x") }}
{{ pin_subpackage("myrecipe", max_pin="x.x") }}
(in such a case, please add a note that shortly mentions your evidence for that){{ pin_subpackage("myrecipe", max_pin="x.x.x") }}
(in such a case, please add a note that shortly mentions your evidence for that){{ pin_subpackage("myrecipe", max_pin=None) }}
while replacing
"myrecipe"
with eithername
if aname|lower
variable is defined in your recipe or with the lowercase name of the package in quotes.Bot commands for PR management
Please use the following BiocondaBot commands:
Everyone has access to the following BiocondaBot commands, which can be given in a comment:
@BiocondaBot please update
@BiocondaBot please add label
please review & merge
label.@BiocondaBot please fetch artifacts
You can use this to test packages locally.
Note that the
@BiocondaBot please merge
command is now depreciated. Please just squash and merge instead.Also, the bot watches for comments from non-members that include
@bioconda/<team>
and will automatically re-post them to notify the addressed<team>
.Summary by CodeRabbit
New Features
dbghaplo
package (version 0.0.2) for haplotyping small sequences from long-read sequencing samples.Documentation
meta.yaml
file.Requirements
Tests