Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit post release version/name splitting not splitting correctly #7155

Closed
notenti opened this issue Sep 7, 2024 · 22 comments · Fixed by #7208
Closed

Implicit post release version/name splitting not splitting correctly #7155

notenti opened this issue Sep 7, 2024 · 22 comments · Fixed by #7208
Assignees
Labels
bug Something isn't working compatibility Compatibility with a specification or another tool help wanted Contribution especially encouraged

Comments

@notenti
Copy link

notenti commented Sep 7, 2024

Summary

uv doesn't appear to support implicit post releases (e.g 1.2.3-1 does not work, whereas 1.2.3.post1 does). I poked around the source a little and found that when the version preceding .dist_info has an implicit post release specifier, the rstrip_once call in this block improperly splits the string, leading to the following error message:

❯ uv pip install aardvark-py==5.40
  × No solution found when resolving dependencies:
  ╰─▶ Because aardvark-py==5.40 has an invalid package format and you require aardvark-py==5.40, we can conclude that your requirements are unsatisfiable.

      hint: The structure of aardvark-py==5.40 was invalid:
        The .dist-info directory aardvark_py-5.40-1 does not start with the normalized package name: aardvark-py

Recreate

❯ uv venv --python-preference system --seed
Using Python 3.8.13 interpreter at: /Users/notenti/.pyenv/versions/3.8.13/Library/Frameworks/Python.framework/Versions/3.8/bin/python3
Creating virtualenv with seed packages at: .venv
 + pip==24.2
 + setuptools==74.1.2
 + wheel==0.44.0
Activate with: source .venv/bin/activate
❯ source .venv/bin/activate
❯ uv pip install aardvark-py==5.40
  × No solution found when resolving dependencies:
  ╰─▶ Because aardvark-py==5.40 has an invalid package format and you require aardvark-py==5.40, we can conclude that your requirements are unsatisfiable.

      hint: The structure of aardvark-py==5.40 was invalid:
        The .dist-info directory aardvark_py-5.40-1 does not start with the normalized package name: aardvark-py
❯ pip install aardvark-py==5.40
Looking in indexes: https://artifactory/simple
Collecting aardvark-py==5.40
  Using cached https://artifactory/aardvark_py-5.40-1-py2.py3-none-macosx_10_9_x86_64.whl (54 kB)
Installing collected packages: aardvark-py
Successfully installed aardvark-py-5.40

Environment

❯ uv version
uv 0.4.5 (42b6bfbad 2024-09-04)
❯ arch
i386
❯ uname
Darwin

I was hoping I could tackle this one myself, but after I saw that version.rs may need changes, I figure'd I'd leave it to the experts 😆

EDIT: Turns out that there aren't changes needed to version.rs --only metadata.rs.

@charliermarsh charliermarsh added the compatibility Compatibility with a specification or another tool label Sep 7, 2024
@charliermarsh
Copy link
Member

I believe we do support implicit post-releases, e.g.:

assert_eq!(p("5-2"), Version::new([5]).with_post(Some(2)));

So the issue in just in how we do the splitting there.

@charliermarsh charliermarsh added the bug Something isn't working label Sep 7, 2024
@charliermarsh
Copy link
Member

It should be not-too-hard to fix. We should instead try to strip the canonical name, and error if it doesn't match; then warn if the remaining segment doesn't match the version. (As opposed to splitting on -, then validating the two segments.) A bit closer to what pip does here:

https://github.com/pypa/pip/blob/111eed14b6e9fba7c78a5ec2b7594812d17b5d2b/src/pip/_internal/utils/wheel.py#L59

@charliermarsh charliermarsh added the help wanted Contribution especially encouraged label Sep 7, 2024
@charliermarsh
Copy link
Member

PR welcome if you're up for it :)

@notenti
Copy link
Author

notenti commented Sep 7, 2024

I believe we do support implicit post-releases, e.g.:

assert_eq!(p("5-2"), Version::new([5]).with_post(Some(2)));

So the issue in just in how we do the splitting there.

You're right, my bad! I'll update the title to better reflect the problem.

@notenti
Copy link
Author

notenti commented Sep 7, 2024

PR welcome if you're up for it :)

I'll give it the ole college try. Never written a lick of Rust before, so we'll see how this goes 👍

@notenti notenti changed the title Implicit post releases not supported Implicit post release version/name splitting not splitting correctly Sep 7, 2024
@charliermarsh
Copy link
Member

Great! Just let me know if you're not able or don't plan to get to it, so we can prioritize accordingly.

@notenti
Copy link
Author

notenti commented Sep 8, 2024

Great! Just let me know if you're not able or don't plan to get to it, so we can prioritize accordingly.

Sorry, should have followed up earlier.

I've hit a few snags with the implementation --I'll try to sum them up. The problem (as least as I've come to understand it) lies in uv's inability (right now) to determine what the "package name" is vs. what the "version" is, mostly due to how unconstrained the .dist-info name schema is. Because uv has elected to warn the user when the version doesn't match some form, we can't make too many assumptions about how it's structured. Additionally, because package names can take on a few different forms, it's challenging to know which - is the delimiter between "package" and "version", and which one is just part of the package name, or the could-be-malformed version name.

I attempted to mirror what the pip implementation does, but that impl relies on a generic canonicalize_name function that subs whichever chars it can and ignores the others. uv, on the other hand, has structs when more elegantly handle normalizing/canonicalizing names on a per-object-type basis, so there's not a great parallel to what pip's doing (that I could find --there certainly could be one somewhere 😃).

I got to:

    // Like `pip`, validate that the `.dist-info` directory is prefixed with the canonical
    // package name, but only warn if the version is not the normalized version.
    let canonical_dist_info_prefix = PackageName::from_str(&dist_info_prefix)?;
    let Some(version) = canonical_dist_info_prefix.as_str().strip_prefix(&filename.name.as_dist_info_name().to_string()) else {
        return Err(Error::MissingDistInfoPackageName(
            dist_info_prefix.to_string(),
            filename.name.to_string(),
        ));
    };


    // ...
    // ... use `version` and warn user if necessary

but I had issues with edge-case .dist-info names --like "version" with a + in it.

It seems that this could be solved with either:

  1. uv having a generic, best attempt normalize/canonicalize function that's public, so that the entire .dist-info directory name can be normalized, or
  2. Some smart way to know which - char is the delimiter between "package name" and "version", so that the dist_info_prefix value can be normalized with PackageName::from_str and compared to filename.name, or
  3. Some other smart way that the Astral team will think of 😆

Thanks for all that you do! Great products 🫡

@notenti
Copy link
Author

notenti commented Sep 8, 2024

FWIW, I didn't want to slap in a private normalize function within metadata.rs to accomplish my short-term goal as I figured that would be quite the blemish on the uv architecture 😆

@charliermarsh
Copy link
Member

No worries at all, thanks for following up! Lemme take a look...

@charliermarsh
Copy link
Member

Does this work?

diff --git a/crates/install-wheel-rs/src/metadata.rs b/crates/install-wheel-rs/src/metadata.rs
index 5129ad097..4308163e8 100644
--- a/crates/install-wheel-rs/src/metadata.rs
+++ b/crates/install-wheel-rs/src/metadata.rs
@@ -50,7 +50,8 @@ pub fn find_archive_dist_info<'a, T: Copy>(

     // Like `pip`, validate that the `.dist-info` directory is prefixed with the canonical
     // package name, but only warn if the version is not the normalized version.
-    let Some((name, version)) = dist_info_prefix.rsplit_once('-') else {
+    let normalized_prefix = PackageName::from_str(&dist_info_prefix)?.as_dist_info_name();
+    let Some((name, version)) = normalized_prefix.rsplit_once('-') else {
         return Err(Error::MissingDistInfoSegments(dist_info_prefix.to_string()));
     };
     if PackageName::from_str(name)? != filename.name {

@notenti
Copy link
Author

notenti commented Sep 8, 2024

I think the issues lies in the PackageName::from_str conversion. It seems like the package name schema is more restrictive than the .dist_info name schema, so converting all dist_info_prefix values to a PackageName may not be possible.

@notenti
Copy link
Author

notenti commented Sep 8, 2024

I mean, could totally toss in a

/// Canonicalize name according to PEP 503
fn canonicalize_name(name: &str) -> String {
    let re = Regex::new(r"[-_.]+").unwrap();
    re.replace_all(name, "-").to_lowercase()
}

and compare canonicalize_name(dist_info_prefix) and canonicalize_name(filename.name)...but that felt...not great haha

@charliermarsh
Copy link
Member

It's possible that we should just skip these checks when there are multiple -, since that in itself is a spec violation. Or we could try to split at each -, and as long as one package name matches, we're ok.

@charliermarsh
Copy link
Member

I think it's probably safe to pass the dist info name to PackageName though... In both cases they can't start or end with punctuation. So it might be fine to do the PackageName thing, then strip the normalized package name, then skip the next char (presumedly _), then validate the version?

@notenti
Copy link
Author

notenti commented Sep 8, 2024

I think it's probably safe to pass the dist info name to PackageName though... In both cases they can't start or end with punctuation.

If that's the case, great! I was running into issues with the normalization process because some versions can contain a +, e.g, this req

(I don't believe what I linked above is the actual test that was failing --can't find the exact one. Just meant as an example)

@charliermarsh
Copy link
Member

Ahh that's true, you're right. + is not allowed.

@charliermarsh
Copy link
Member

We may want to create a DistInfoName that's like PackageName, but with slightly different rules.

@notenti
Copy link
Author

notenti commented Sep 8, 2024

We may want to create a DistInfoName that's like PackageName, but with slightly different rules.

That's kinda where I arrived, but then I reminded myself that I'm not a rust dev and that biting that off would be a lot 😆

@charliermarsh
Copy link
Member

Haha no worries. I'll give it a try and can mark you as a co-author on it.

@charliermarsh
Copy link
Member

@notenti -- Are you able to test his branch against your package? #7208

@notenti
Copy link
Author

notenti commented Sep 9, 2024

As one would expect, right as I try to test it out I'm running into toolchain issues 😬. aardvark-py is what gave me issues, so executing:

uv pip install aardvark-py==5.40

should flex the behavior. I'll keep trying to fix up my build env

Thanks for such a quick turnaround!

@notenti
Copy link
Author

notenti commented Sep 9, 2024

Looks like it works!

❯ target/debug/uv venv test_venv
Using Python 3.12.1
Creating virtualenv at: test_venv
Activate with: source test_venv/bin/activate
❯ source test_venv/bin/activate
❯ target/debug/uv pip install aardvark-py==5.40
Resolved 1 package in 1.45s
Prepared 1 package in 48ms
Installed 1 package in 4ms
 + aardvark-py==5.40

:shipit:

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working compatibility Compatibility with a specification or another tool help wanted Contribution especially encouraged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants