Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: check duplicate dependencies #717

Merged
merged 13 commits into from
Feb 21, 2024

Conversation

orhun
Copy link
Contributor

@orhun orhun commented Jan 29, 2024

fixes #715

e.g. if you add:

[dependencies]
python = "3.11.*"
flask = "2.*"
Flask = "2.*"

pixi run fails as:

  × failed to parse pixi.toml from /home/orhun/gh/pixi/examples/flask-hello-world
  ╰─▶ duplicate dependency: Flask

@orhun orhun force-pushed the feat/check_duplicate_dependencies branch 2 times, most recently from 924d7b0 to de5fe9d Compare January 29, 2024 13:54
@orhun orhun force-pushed the feat/check_duplicate_dependencies branch from de5fe9d to 3361b76 Compare January 29, 2024 13:57
src/project/manifest/feature.rs Outdated Show resolved Hide resolved
src/project/manifest/feature.rs Outdated Show resolved Hide resolved
src/project/manifest/mod.rs Outdated Show resolved Hide resolved
Copy link
Contributor

@baszalmstra baszalmstra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im wondering if it would make more sense to add a custom deserializer implementation for this (using serde_with) instead. Because that would allow you to error out while deserializing the fields which provides in context error messages.

@orhun
Copy link
Contributor Author

orhun commented Jan 29, 2024

I'm not sure if it is possible to collect different fields (run, build, host dependencies) and check for duplicates in them 🤔

@ruben-arts
Copy link
Contributor

Could just throw an error if capitalization is used right?

@baszalmstra
Copy link
Contributor

baszalmstra commented Jan 29, 2024

I'm not sure if it is possible to collect different fields (run, build, host dependencies) and check for duplicates in them 🤔

In that case I think its fine, because these can overwrite each other. This is OK (although weird):

[dependencies]
foobar = 1

[build-dependencies]
foobar = 2

This is not OK:

[dependencies]
foobar = 1
Foobar = 2

@baszalmstra
Copy link
Contributor

Could just throw an error if capitalization is used right?

Thats an alternative approach, to only allow normalized names in the pixi.toml.

@orhun
Copy link
Contributor Author

orhun commented Jan 29, 2024

I added a function for deserializing dependencies in a1df903

I can add tests if you think it looks good.

@ruben-arts
Copy link
Contributor

I think it is best to removed to chaining of the different dependency fields. As I just understood bas' comment. We broke the functionality with this changed.

My plan was to error on:

[dependencies]
foobar = "*"

[build-dependencies]
FooBar = "*"

But we now also error, but shouldn't on:

[dependencies]
foobar = "*"

[build-dependencies]
foobar = "*"

The plan to avoid any capitalization is still a thought but might be breaking. Lets throw it in the community.

@orhun
Copy link
Contributor Author

orhun commented Jan 30, 2024

I see, so just checking the uppercase for each dependency section is enough?

@baszalmstra
Copy link
Contributor

No its not. Especially Pypi dependencies have more rules than just lowercasing (dashes, dots and underscores).

Would this work?

#[serde(default)]
#[serde_as(as = "IndexMap<DisplayFromStr, PickFirst<(DisplayFromStr, _)>>")]
dependencies: IndexMap<NormalizedPackageName, NamelessMatchSpec>,

@orhun
Copy link
Contributor Author

orhun commented Jan 30, 2024

Not really.

NormalizedPackageName is deserialized from the FromStr implementation (via DisplayFromStr):

impl FromStr for NormalizedPackageName {
    type Err = ParsePackageNameError;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        Ok(PackageName::from_str(s)?.into())
    }
}

And it is basically the same thing as PackageName.

@baszalmstra
Copy link
Contributor

A maybe thats something we can change! We could implement a custom DeserializeAs (StrictNormalizedName) or whatever, that requires the normalized name to be the same as the non-normalized name?

@orhun
Copy link
Contributor Author

orhun commented Jan 31, 2024

Thanks for the tip @baszalmstra!

I went with a waaaay simpler approach in c70586c

src/project/manifest/mod.rs Outdated Show resolved Hide resolved
@ruben-arts ruben-arts added the 💣 breaking Breaks something in the api or config label Jan 31, 2024
@orhun orhun requested a review from baszalmstra February 1, 2024 15:55
@baszalmstra
Copy link
Contributor

I hate to say this but I'm still not entirely sure about this approach.

The downside of this approach is that existing pixi.tomls that contain non-normalized names will break. We know that there are examples of this (#264).

I see two routes that we can take:

  1. Instead of erroring, raise a warning about this problem and that in the future this will be an error.
  2. Write a custom Deserializer that while it builds the mapping also checks each key of the map for duplicates. Im confident that with some thinkering we can build this in such a way that we can both allow non-normalized names but check for duplicates and have contextual errors. Probably using some custom DeserializeSeed implementations.

Id prefer we take the second route. If you don't feel as confident as I in the second route I'd be happy to take a stab at it.

@orhun
Copy link
Contributor Author

orhun commented Feb 2, 2024

That's totally reasonable. I'm happy to take the second route and ping you in case I need any guidance 🐻

@orhun orhun force-pushed the feat/check_duplicate_dependencies branch from 6de9743 to 904e832 Compare February 12, 2024 18:27
@orhun
Copy link
Contributor Author

orhun commented Feb 13, 2024

@baszalmstra I took a look at this again and there are a couple of points that I want to clarify:

  • Write a custom Deserializer that while it builds the mapping also checks each key of the map for duplicates.

I came up with this:

fn deserialize_package_map<'de, D>(
    deserializer: D,
) -> Result<IndexMap<PackageName, NamelessMatchSpec>, D::Error>
where
    D: Deserializer<'de>,
{
    struct PackageMapVisitor(PhantomData<()>);

    impl<'de> Visitor<'de> for PackageMapVisitor {
        type Value = IndexMap<PackageName, NamelessMatchSpec>;

        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {
            write!(formatter, "a map")
        }

        fn visit_map<A>(self, mut map: A) -> Result<Self::Value, A::Error>
        where
            A: MapAccess<'de>,
        {
            let mut result = IndexMap::new();
            while let Some((k, v)) = map.next_entry::<String, String>()? {
                let package_name = PackageName::from_str(&k).map_err(serde::de::Error::custom)?;
                let match_spec =
                    NamelessMatchSpec::from_str(&v).map_err(serde::de::Error::custom)?;
                if result.insert(package_name, match_spec).is_some() {
                    return  Err(serde::de::Error::custom(
                        "invalid dependency: please avoid using capitalized names for the dependencies",
                    ));
                }
            }

            Ok(result)
        }
    }
    let visitor = PackageMapVisitor(PhantomData);
    deserializer.deserialize_seq(visitor)
}

However, this is used with serde(with) attribute which means we can no longer use serde_as:

- #[serde_as(as = "IndexMap<_, PickFirst<(DisplayFromStr, _)>>")]
+ #[serde(default, deserialize_with = "deserialize_package_map")]
dependencies: IndexMap<PackageName, NamelessMatchSpec>,

Also, in the case of Option<IndexMap<...>, this won't work.

  • Both allow non-normalized names but check for duplicates and have contextual errors

Right now we get:

  × failed to parse pixi.toml from /home/orhun/gh/pixi/etc/issue717
  ╰─▶ failed to parse project manifest
    ╭─[pixi.toml:11:1]
 11 │
 12 │ ╭─▶ [dependencies]
 13 │ │   python = "3.11.*"
 14 │ │   flask = "2.*"
 15 │ ├─▶ Flask = "2.*"
    · ╰──── invalid dependency: please avoid using capitalized names for the dependencies
    ╰────

What do you mean by allowing non-normalized names? Isn't this a contextual error?

I took a look at the DeserializeSeed example and couldn't wrap my head around how that can be applied to this case. If I understood correctly, it is used when there is an already allocated type that we want to pass into a Deserializer. In our case we can probably allocate the map inside the Deserializer as well.

Maybe I got it wrong though. Do we have any usages of DeserializeSeed already that I can use as an example?

@baszalmstra
Copy link
Contributor

@orhun that's indeed what I had in mind. We indeed lose serde_as which is fine I think.

Instead of deserializing the key value pairs as Strings you could directly parse them as PackageName and NamelessMatchSpec (or some wrapper type to allow custom deserializers). The NamelessMatchSpec can be deserialized using serde_untagged instead of using serde_with (its either a string or the mapped object itself).

The only remaining issue is that now any error is emitted at the level of the entire dependencies block. I think there you could use a DeserializeSeed implementation for the package name which you pass a mutable reference (or some *Cell type) to the map in construction. The deserializer impl then checks if an entry with the same key already exists and errors there. That will then raise the error for the key itself instead of for the containing block. Hopefully, that yields error Valhalla bliss!

@orhun
Copy link
Contributor Author

orhun commented Feb 16, 2024

Made the requested changes in ea724bf

2 things:

  • I added deserialize_package_map and deserialize_opt_package_map for deserializing Option<T> and T - not sure if there is a cleaner way of doing this.
  • How should we test this? Is adding unit tests for deserialization enough?

@baszalmstra
Copy link
Contributor

@ruben-arts Please test to see if you liky!

@ruben-arts
Copy link
Contributor

First thing I tried was this:
image
😅

I do see that as an issue, should we make an issue for it to merge this or do you think it is easy to fix with this PR. I'm fine either way as it already is an improvement over the current state.

@baszalmstra
Copy link
Contributor

What do you mean? The error indicates there is a duplicate flask entry which is indeed the problem?

@ruben-arts
Copy link
Contributor

ruben-arts commented Feb 19, 2024

What do you mean? The error indicates there is a duplicate flask entry which is indeed the problem?

The add still lets you add a duplicate, leaving the toml in a broken state. I think we should never do that.

@baszalmstra
Copy link
Contributor

Aaah! Yeah now I get it! That makes sense.

@orhun
Copy link
Contributor Author

orhun commented Feb 20, 2024

Should we also check duplicates for the pixi add command? Maybe that can be handled in a separate PR.

@ruben-arts
Copy link
Contributor

I would ask you to start on that immediately but we can merge this for now.

@ruben-arts ruben-arts merged commit 89e71fb into prefix-dev:main Feb 21, 2024
13 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💣 breaking Breaks something in the api or config
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Error on duplicate dependencies.
3 participants