Update SeqKit to v2.9.0 #51860

Merged
merged 1 commit into from
Nov 3, 2024
Conversation

shenwei356
Contributor

Describe your pull request here

Changes

  • SeqKit v2.9.0 - 2024-11-01 GitHub Releases (by Release)
    • seqkit:
      • Fix sequence ID parsing with the default regular expression (in this case, we actually use bytes.Index instead) for a rare case: "xxx\tyyy zzz" was wrongly parsed as "xxx\tyyy". #486
    • seqkit locate:
      • Fix -G/--non-greedy for tandem repeats, e.g., ATTCGATTCGATTCG (ATTCGx3).
    • seqkit grep/subseq:
      • Fix negative regions longer than sequence length. #479.
    • seqkit stats:
      • Add an extra column sum_n to count the number of ambiguous characters. #490
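
The sequence ID fix (#486) can be illustrated with a small sketch. It assumes SeqKit's default ID pattern is `^(\S+)\s?`, so that the ID is the first run of non-whitespace characters. The helper `parse_id_space_only` is hypothetical and only mimics a shortcut that cuts at the first space; it is not SeqKit's actual code. The "xxx" result is what the regular expression yields here, not an output quoted from the release notes.

```python
import re

# Assumed default ID pattern: the first run of non-whitespace characters.
DEFAULT_ID_REGEXP = re.compile(r"^(\S+)\s?")


def parse_id_regexp(header: str) -> str:
    """Return the ID captured by the assumed default regular expression."""
    match = DEFAULT_ID_REGEXP.match(header)
    return match.group(1) if match else header


def parse_id_space_only(header: str) -> str:
    """Hypothetical shortcut that cuts the header at the first space only.

    Tabs are ignored, which reproduces the wrong result described in #486.
    """
    index = header.find(" ")
    return header if index < 0 else header[:index]


header = "xxx\tyyy zzz"
print(repr(parse_id_regexp(header)))      # 'xxx'
print(repr(parse_id_space_only(header)))  # 'xxx\tyyy'
```

A header that mixes a tab and a space is exactly the rare case where the two approaches disagree.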

Please read the guidelines for Bioconda recipes before opening a pull request (PR).

General instructions

  • If this PR adds or updates a recipe, use "Add" or "Update" appropriately as the first word in its title.
  • New recipes not directly relevant to the biological sciences need to be submitted to the conda-forge channel instead of Bioconda.
  • PRs require reviews prior to being merged. Once your PR is passing tests and ready to be merged, please issue the @BiocondaBot please add label command.
  • Please post questions on Gitter or ping @bioconda/core in a comment.

Instructions for avoiding API, ABI, and CLI breakage issues

Conda is able to record and lock (a.k.a. pin) dependency versions used at build time of other recipes.
This prevents later changes to a recipe from violating the API, ABI, or CLI expectations of downstream recipes.
If not already present in the meta.yaml, make sure to specify run_exports (see here for the rationale and comprehensive explanation).
Add a run_exports section like this:

build:
  run_exports:
    - ...

with ... being one of:

Case | run_exports statement
--- | ---
semantic versioning | {{ pin_subpackage("myrecipe", max_pin="x") }}
semantic versioning (0.x.x) | {{ pin_subpackage("myrecipe", max_pin="x.x") }}
known breakage in minor versions | {{ pin_subpackage("myrecipe", max_pin="x.x") }} (in such a case, please add a note that briefly mentions your evidence for that)
known breakage in patch versions | {{ pin_subpackage("myrecipe", max_pin="x.x.x") }} (in such a case, please add a note that briefly mentions your evidence for that)
calendar versioning | {{ pin_subpackage("myrecipe", max_pin=None) }}

Replace "myrecipe" with name if a name|lower variable is defined in your recipe; otherwise use the lowercase name of the package in quotes.
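
To make the effect of max_pin concrete, here is a minimal sketch of how a pin template could translate into a version constraint. The function `pin_bounds` is hypothetical, assumes purely numeric dotted versions, and only approximates the constraint strings conda-build is understood to generate; the exact format is an assumption, not something stated in this PR.

```python
from typing import Optional


def pin_bounds(version: str, max_pin: Optional[str]) -> str:
    """Approximate the constraint implied by pin_subpackage(..., max_pin=...).

    Hypothetical helper: assumes numeric dotted versions such as "2.9.0".
    """
    lower = f">={version}"
    if max_pin is None:
        # Calendar versioning case: only a lower bound is exported.
        return lower
    parts = version.split(".")
    # Keep as many leading components as there are "x" fields in max_pin.
    keep = min(len(max_pin.split(".")), len(parts))
    pinned = parts[:keep]
    # The upper bound is the next value of the last pinned component.
    pinned[-1] = str(int(pinned[-1]) + 1)
    return f"{lower},<{'.'.join(pinned)}.0a0"


for template in ("x", "x.x", "x.x.x", None):
    print(template, "->", pin_bounds("2.9.0", template))
```

With max_pin="x", a downstream package built against 2.9.0 would accept any later 2.x release but not 3.0, which matches the semantic versioning row of the table.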

Bot commands for PR management

Please use the following BiocondaBot commands:

Everyone has access to the following BiocondaBot commands, which can be given in a comment:

Command | Description
--- | ---
@BiocondaBot please update | Merge the master branch into a PR.
@BiocondaBot please add label | Add the please review & merge label.
@BiocondaBot please fetch artifacts | Post links to CI-built packages/containers. You can use this to test packages locally.

Note that the @BiocondaBot please merge command is now deprecated. Please just squash and merge instead.

Also, the bot watches for comments from non-members that include @bioconda/<team> and will automatically re-post them to notify the addressed <team>.

Contributor

coderabbitai bot commented Nov 1, 2024

📝 Walkthrough

The pull request modifies the meta.yaml file for the seqkit package. The version number has been updated from "2.8.2" to "2.9.0". Along with the version change, the MD5 checksums for the per-platform source URLs have been revised to reflect the new version:

  • macOS amd64: from 03b13956b7e3ef1678db591a79eb68ed to 50fff73601f3094d664031e5c00674f5
  • macOS arm64: from 0b2716f39c3974a5c3e310bcbd4076eb to cf1b7c4b74664f04d3cc361e9c5370a1
  • Linux amd64: from 67220b508f3f81c2c8697e6534eed440 to 83e359c186dee6b49cd4df362d0b6d5a
  • Linux arm64: from b83b021850c9447cfc8e0d5751b7be99 to 7a81adcd79553e10552456060c2b17ba

Additionally, the build number has been changed from 1 to 0. The overall structure of the meta.yaml file, including the package name, source URLs, and other metadata, remains unchanged.

Possibly related PRs

  • Update meta.yaml for spec2vec #49857: Updates the meta.yaml file for the spec2vec package with a version bump and checksum changes.
  • Update qtlseq to 2.2.5 #51172: Updates the meta.yaml file for the qtlseq package, reflecting a version increment and checksum modifications.
  • Update qtlseq to 2.2.8 #51471: This PR updates the qtlseq package's version and SHA256 checksum, consistent with the changes made in the seqkit package's meta.yaml.
  • Update iseq to 1.2.0 #51583: The iseq package's meta.yaml reflects a version update and checksum change, similar to the modifications in the seqkit package.
  • Update iSeq #51584: This PR also updates the iseq package's version and checksum, aligning with the changes made in the seqkit package's meta.yaml.

Suggested labels

new version, please review & merge


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 7740410 and f2ea067.

📒 Files selected for processing (1)
  • recipes/seqkit/meta.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/seqkit/meta.yaml

[error] 1-1: syntax error: found character '%' that cannot start any token

(syntax)
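
The yamllint error is most likely a false positive: the recipe is a Jinja template (the source URLs contain `{{ version }}`), so it is not plain YAML until it has been rendered. The sketch below shows the idea with a tiny stand-in renderer. The `{% set version = "2.9.0" %}` first line is an assumption about the recipe (it is not shown on this page), and `render_simple` is a hypothetical helper, not how conda-build renders recipes.

```python
import re

# Matches lines such as: {% set version = "2.9.0" %}
SET_LINE = re.compile(r'^\{%\s*set\s+(\w+)\s*=\s*"([^"]*)"\s*%\}\s*$')
# Matches simple placeholders such as: {{ version }}
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_simple(recipe_text: str):
    """Collect '{% set %}' variables and substitute simple placeholders.

    Returns the rendered text (without the set lines) and the variables.
    """
    variables = {}
    body = []
    for line in recipe_text.splitlines():
        match = SET_LINE.match(line)
        if match:
            variables[match.group(1)] = match.group(2)
        else:
            body.append(line)
    rendered = PLACEHOLDER.sub(
        lambda m: variables.get(m.group(1), m.group(0)), "\n".join(body)
    )
    return rendered, variables


# Assumed recipe excerpt; only the URL pattern is taken from the PR.
RECIPE = """{% set version = "2.9.0" %}
source:
  url: https://github.com/shenwei356/seqkit/releases/download/v{{ version }}/seqkit_linux_amd64.tar.gz
"""

rendered, variables = render_simple(RECIPE)
print(rendered)
```

After rendering, the text no longer starts with `%` and would parse as ordinary YAML.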

🔇 Additional comments (3)
recipes/seqkit/meta.yaml (3)

1-1: Version update aligns with latest release.

The version update to 2.9.0 matches the latest official release from November 1, 2024, which includes important fixes for sequence ID parsing and improvements to various SeqKit commands.


21-21: Build number reset is appropriate.

Resetting the build number to 0 is correct when updating to a new version.
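
The build number rule can be written down as a short sketch. The function name is hypothetical, and the "increment on a rebuild of the same version" branch reflects the usual conda packaging convention rather than anything stated in this PR.

```python
def next_build_number(old_version: str, new_version: str, old_build: int) -> int:
    """Return the build number to use for the updated recipe."""
    if new_version != old_version:
        # A new upstream version starts again at build 0.
        return 0
    # Same version rebuilt (for example after a recipe fix): bump the build.
    return old_build + 1


print(next_build_number("2.8.2", "2.9.0", 1))
```

For this PR the version changes from 2.8.2 to 2.9.0, so the build number goes from 1 back to 0.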


9-9: ⚠️ Potential issue

MD5 checksums do not match the official release artifacts.

Based on the previous verification results, the MD5 checksums in the recipe do not match the official release artifacts:

  • darwin_arm64: expected c6ddaa62d73280b645215143087dd1db, got cf1b7c4b74664f04d3cc361e9c5370a1
  • linux_amd64: expected 5b5c5ba2796d45f42b2e61fde4d79821, got 83e359c186dee6b49cd4df362d0b6d5a
  • linux_arm64: expected 5cbca93de57c11d1f70d1e713a44be5a, got 7a81adcd79553e10552456060c2b17ba

Let's verify the current checksums against the official release:

#!/bin/bash
# Description: Verify MD5 checksums match the official release artifacts
# Expected: MD5 checksums should match the values in meta.yaml

VERSION="2.9.0"
TEMP_DIR=$(mktemp -d)
cd "$TEMP_DIR" || exit 1

echo "Verifying checksums for SeqKit v${VERSION}..."

# Download and verify each platform artifact
for platform in darwin_arm64 linux_amd64 linux_arm64; do
  echo -e "\nPlatform: ${platform}"
  curl -sL "https://github.com/shenwei356/seqkit/releases/download/v${VERSION}/seqkit_${platform}.tar.gz" -o "${platform}.tar.gz"
  md5sum "${platform}.tar.gz"
done

cd - || exit 1
rm -rf "$TEMP_DIR"

Also applies to: 12-12, 15-15, 18-18



Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
recipes/seqkit/meta.yaml (1)

1-24: Consider testing backward compatibility.

SeqKit v2.9.0 includes several important fixes that affect sequence parsing and handling:

  • Modified sequence ID parsing behavior
  • Changes to -G/--non-greedy option in locate command
  • Updates to negative region handling in grep/subseq
  • New sum_n column in stats output

These changes might affect existing pipelines that depend on the specific behavior of these features.

Consider:

  1. Testing the package with existing bioinformatics pipelines
  2. Documenting any behavior changes in the package metadata
  3. Adding example commands in the test section for the modified features

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 4e7548c and 7740410.

📒 Files selected for processing (1)
  • recipes/seqkit/meta.yaml (1 hunks)
🧰 Additional context used
🪛 yamllint
recipes/seqkit/meta.yaml

[error] 1-1: syntax error: found character '%' that cannot start any token

(syntax)

🔇 Additional comments (2)
recipes/seqkit/meta.yaml (2)

1-1: LGTM: Version update matches the release.

The version bump to 2.9.0 correctly reflects the latest SeqKit release from November 1, 2024.


21-21: LGTM: Build number reset is correct.

Resetting the build number to 0 is appropriate when updating to a new version.

Comment on lines 9 to 18
+ md5: c83d82188fbd83da765b5ae9939f5079 # [osx]

  url: https://github.com/shenwei356/seqkit/releases/download/v{{ version }}/seqkit_darwin_arm64.tar.gz # [arm64]
- md5: 0b2716f39c3974a5c3e310bcbd4076eb # [arm64]
+ md5: c6ddaa62d73280b645215143087dd1db # [arm64]

  url: https://github.com/shenwei356/seqkit/releases/download/v{{ version }}/seqkit_linux_amd64.tar.gz # [linux]
- md5: 67220b508f3f81c2c8697e6534eed440 # [linux]
+ md5: 5b5c5ba2796d45f42b2e61fde4d79821 # [linux]

  url: https://github.com/shenwei356/seqkit/releases/download/v{{ version }}/seqkit_linux_arm64.tar.gz # [aarch64]
- md5: b83b021850c9447cfc8e0d5751b7be99 # [aarch64]
+ md5: 5cbca93de57c11d1f70d1e713a44be5a # [aarch64]

💡 Codebase verification

MD5 checksums do not match the official release artifacts

The verification shows mismatches between the expected and actual MD5 checksums for all platforms:

  • darwin_arm64: expected c6ddaa62d73280b645215143087dd1db, got cf1b7c4b74664f04d3cc361e9c5370a1
  • linux_amd64: expected 5b5c5ba2796d45f42b2e61fde4d79821, got 83e359c186dee6b49cd4df362d0b6d5a
  • linux_arm64: expected 5cbca93de57c11d1f70d1e713a44be5a, got 7a81adcd79553e10552456060c2b17ba

Please update the MD5 checksums in meta.yaml to match the current release artifacts.
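
For checking a downloaded tarball locally before editing meta.yaml, a minimal sketch using only the Python standard library could look like this. The function names and the file path in the usage note are hypothetical; the comparison mirrors what `md5sum` reports.

```python
import hashlib


def md5_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_md5(path: str, declared: str):
    """Compare a file's digest with the value declared in the recipe.

    Returns a (matches, computed_digest) tuple.
    """
    computed = md5_of_file(path)
    return computed == declared.strip().lower(), computed
```

A call such as `check_md5("seqkit_linux_amd64.tar.gz", "<md5 from meta.yaml>")` would return `False` together with the computed digest whenever the recipe value is stale, and that digest is the value to copy into the recipe.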

🔗 Analysis chain

Verify checksums against official release artifacts.

Let's verify the MD5 checksums match the official SeqKit v2.9.0 release artifacts.


Let me try a different approach using curl instead of wget, since curl appears to be available in the sandbox environment.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify MD5 checksums match the official release artifacts
# Expected: MD5 checksums should match the values in meta.yaml

# Create a temporary directory for downloads
TEMP_DIR=$(mktemp -d)
cd "$TEMP_DIR" || exit 1

# Download and verify each platform artifact
VERSION="2.9.0"
PLATFORMS=("darwin_amd64" "darwin_arm64" "linux_amd64" "linux_arm64")

for platform in "${PLATFORMS[@]}"; do
  echo "Verifying $platform..."
  wget -q "https://github.com/shenwei356/seqkit/releases/download/v${VERSION}/seqkit_${platform}.tar.gz"
  md5sum "seqkit_${platform}.tar.gz"
done

cd - || exit 1
rm -rf "$TEMP_DIR"

Length of output: 1669


Script:

#!/bin/bash
# Description: Verify MD5 checksums match the official release artifacts using curl
# Expected: MD5 checksums should match the values in meta.yaml

# Create a temporary directory for downloads
TEMP_DIR=$(mktemp -d)
cd "$TEMP_DIR" || exit 1

# Download and verify each platform artifact
VERSION="2.9.0"

echo "Verifying darwin_arm64..."
curl -sL "https://github.com/shenwei356/seqkit/releases/download/v${VERSION}/seqkit_darwin_arm64.tar.gz" -o darwin_arm64.tar.gz
echo "c6ddaa62d73280b645215143087dd1db  darwin_arm64.tar.gz (expected)"
md5sum darwin_arm64.tar.gz

echo -e "\nVerifying linux_amd64..."
curl -sL "https://github.com/shenwei356/seqkit/releases/download/v${VERSION}/seqkit_linux_amd64.tar.gz" -o linux_amd64.tar.gz
echo "5b5c5ba2796d45f42b2e61fde4d79821  linux_amd64.tar.gz (expected)"
md5sum linux_amd64.tar.gz

echo -e "\nVerifying linux_arm64..."
curl -sL "https://github.com/shenwei356/seqkit/releases/download/v${VERSION}/seqkit_linux_arm64.tar.gz" -o linux_arm64.tar.gz
echo "5cbca93de57c11d1f70d1e713a44be5a  linux_arm64.tar.gz (expected)"
md5sum linux_arm64.tar.gz

cd - || exit 1
rm -rf "$TEMP_DIR"

Length of output: 1353

@shenwei356
Contributor Author

No useful information in the build and test (ARM) log. 😢

@bgruening
Member

18:43:51 BIOCONDA INFO (OUT) RuntimeError: MD5 mismatch: '83e359c186dee6b49cd4df362d0b6d5a' != '5b5c5ba2796d45f42b2e61fde4d79821'
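
A log line like the one above can be picked apart with a short sketch. The group names are an interpretation: the first value is read as the digest computed from the downloaded file and the second as the value declared in the recipe, which is consistent with 83e359c1... being the Linux amd64 checksum the recipe ended up with. `parse_mismatch` is a hypothetical helper.

```python
import re
from typing import Optional

MISMATCH = re.compile(
    r"(?P<algorithm>[A-Za-z0-9]+) mismatch: "
    r"'(?P<computed>[0-9a-f]+)' != '(?P<declared>[0-9a-f]+)'"
)


def parse_mismatch(log_line: str) -> Optional[dict]:
    """Extract the algorithm and both digests from a mismatch message."""
    match = MISMATCH.search(log_line)
    return match.groupdict() if match else None


line = (
    "18:43:51 BIOCONDA INFO (OUT) RuntimeError: MD5 mismatch: "
    "'83e359c186dee6b49cd4df362d0b6d5a' != '5b5c5ba2796d45f42b2e61fde4d79821'"
)
print(parse_mismatch(line))
```

Searching the build log for "mismatch" is a quick way to find this kind of failure when the log looks uninformative at first glance.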

@shenwei356
Contributor Author

my bad, thank you @bgruening :)

@martin-g martin-g merged commit bc60553 into bioconda:master Nov 3, 2024
6 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Nov 18, 2024
3 participants