pip metadata refactoring #680

Merged
merged 6 commits into from
Nov 13, 2024

Member

@slimreaper35 slimreaper35 commented Oct 9, 2024

My local approximate results:

(venv) ~/cachi2 (main) $ time tox -e py312

real    0m24.600s
user    0m23.449s
sys     0m1.419s

(venv) ~/cachi2 (pip-refactoring) $ time tox -e py312

real    0m13.625s
user    0m12.713s
sys     0m0.997s
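
For reference, the improvement implied by the two `real` timings above can be worked out with a few lines of Python. This is only arithmetic on the quoted numbers (one local run per branch, so the figures are approximate), and the variable names are illustrative.

```python
# Wall-clock ("real") times quoted above, in seconds.
main_seconds = 24.600      # tox -e py312 on main
branch_seconds = 13.625    # tox -e py312 on pip-refactoring

saved_seconds = main_seconds - branch_seconds
speedup = main_seconds / branch_seconds
reduction_percent = 100 * saved_seconds / main_seconds

print(
    f"saved {saved_seconds:.3f}s, "
    f"{speedup:.2f}x faster, "
    f"{reduction_percent:.1f}% less wall-clock time"
)
```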

Maintainers will complete the following section

  • Commit messages are descriptive enough
  • Code coverage from testing does not decrease and new code is covered
  • Docs updated (if applicable)
  • Docs links in the code are still valid (if docs were updated)

Note: if the contribution is external (not from an organization member), the CI
pipeline will not run automatically. After verifying that the CI is safe to run:

Collaborator

@a-ovchinnikov a-ovchinnikov left a comment

LGTM with some minor nitpicks.

@eskultety
Member

There's too much going on in this single commit, so it's difficult to follow all the changes in the diff, please introduce them gradually.

Member

@eskultety eskultety left a comment

Very much in favour of this work, needs some polishing though.

a-ovchinnikov previously approved these changes Oct 10, 2024
Member

@eskultety eskultety left a comment

After carefully going through the unit tests, which I didn't do in my first round of review, I think we're actually opening ourselves up to potential issues with metadata mixed across pyproject.toml, setup.py, etc.
I think that while we may cosmetically change the code and break the logic into smaller helper functions, we'll still have to test the metadata querying in the compound way we do now.

@slimreaper35
Member Author

After carefully going through the unit tests, which I didn't do in my first round of review, I think we're actually opening ourselves up to potential issues with metadata mixed across pyproject.toml, setup.py, etc.

That's a good point. The fact that we were mixing metadata from multiple project configuration files was the reason why we ended up with extremely complicated and long unit tests. Splitting name and version across multiple configuration files makes no sense on its own. In the end, we only need the name, one string, for an SBOM component.

@a-ovchinnikov a-ovchinnikov dismissed their stale review October 16, 2024 18:34

More changes have accumulated, must take another look

@eskultety
Member

eskultety commented Oct 17, 2024

After carefully going through the unit tests, which I didn't do in my first round of review, I think we're actually opening ourselves up to potential issues with metadata mixed across pyproject.toml, setup.py, etc.

That's a good point. The fact that we were mixing metadata from multiple project configuration files was the reason why we ended up with extremely complicated and long unit tests. Splitting name and version across multiple configuration files makes no sense on its own. In the end, we only need the name, one string, for an SBOM component.

Well, what this PR just did is a breaking change from the behaviour POV, without any warning. There probably was a reason we did it this way in the past. It's true that mixing metadata is wrong; however, we allowed it, and it wasn't against ecosystem practices, was it? (Although it is very unexpected, without a doubt.) So this can't be compared to our recent dropping of the Go vendoring flags, because those actually allowed projects to use incorrect repo setups which would not be buildable with standard toolkits the way users intended in the first place. I'm not sure that's the case here.

If we end up wanting this, then you'll have to accompany this change with a docs update (we'll also need to mention it in the release notes). That said, although I'm definitely not a fan of breaking backwards compatibility, strictly speaking SemVer [1] says:

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

and so I won't stop this work based on this argument, but we'll probably need more voices in favour.

[1] https://semver.org/#semantic-versioning-specification-semver

Member

@eskultety eskultety left a comment

You still stuffed everything into commit 1. The changes can be introduced gradually by adding one unit test at a time and turning off that particular test area in the more complex unit test you're trying to kill. That way, you'd keep most things as they are until you're ready to switch, and then remove everything you don't need in a single commit. It can be done, and the diff will be much more readable IMO. I'm not fond of justifying squashed changes with a complex unit test that isn't easy to replace (I mentioned one way to do it above); things can be made cleaner for the reader/reviewer.

@slimreaper35
Member Author

You still stuffed everything into commit 1. The changes can be introduced gradually by adding one unit test at a time and turning off that particular test area in the more complex unit test you're trying to kill. That way, you'd keep most things as they are until you're ready to switch, and then remove everything you don't need in a single commit. It can be done, and the diff will be much more readable IMO. I'm not fond of justifying squashed changes with a complex unit test that isn't easy to replace (I mentioned one way to do it above); things can be made cleaner for the reader/reviewer.

I'll try my best

@slimreaper35
Member Author

Also, it might be worth discussing setup.py as:

New projects are advised to avoid setup.py configurations (beyond the minimal stub) when custom scripting during the build is not necessary. Examples are kept in this document to help people interested in maintaining or contributing to existing packages that use setup.py. Note that you can still keep most of configuration declarative in setup.cfg or pyproject.toml and use setup.py only for the parts not supported in those files (e.g. C extensions). See note.

We can at least add a warning when extracting metadata from setup.py

@eskultety
Member

eskultety commented Oct 18, 2024

We can at least add a warning when extracting metadata from setup.py

People should be aware of setup.py soft deprecation by now, do we want to hold everyone's hand? I mean displaying a warning for users who genuinely need setup.py (because pyproject.toml simply doesn't cut it for them as they may have C deps) doesn't feel right. I wouldn't strictly argue against having a warning if you proposed it somewhere in the code, I'm just questioning the usefulness given the circumstances.

@eskultety
Member

After carefully going through the unit tests, which I didn't do in my first round of review, I think we're actually opening ourselves up to potential issues with metadata mixed across pyproject.toml, setup.py, etc.

That's a good point. The fact that we were mixing metadata from multiple project configuration files was the reason why we ended up with extremely complicated and long unit tests. Splitting name and version across multiple configuration files makes no sense on its own. In the end, we only need the name, one string, for an SBOM component.

Well, what this PR just did is a breaking change from the behaviour POV, without any warning. There probably was a reason we did it this way in the past. It's true that mixing metadata is wrong; however, we allowed it, and it wasn't against ecosystem practices, was it? (Although it is very unexpected, without a doubt.) So this can't be compared to our recent dropping of the Go vendoring flags, because those actually allowed projects to use incorrect repo setups which would not be buildable with standard toolkits the way users intended in the first place. I'm not sure that's the case here.

If we end up wanting this, then you'll have to accompany this change with a docs update (we'll also need to mention it in the release notes). That said, although I'm definitely not a fan of breaking backwards compatibility, strictly speaking SemVer [1] says:

Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

and so I won't stop this work based on this argument, but we'll probably need more voices in favour.

[1] https://semver.org/#semantic-versioning-specification-semver

@brunoapimentel @a-ovchinnikov @taylormadore @ben-alkov any opinions on simpler yet backwards incompatible behaviour?

@a-ovchinnikov
Collaborator

any opinions on simpler yet backwards incompatible behaviour?

The original behaviour looks somewhat strange to me: I would think that a package which has its name defined in one config and its version in another is malformed and will cause other issues as well. While possible, I don't believe it is probable to find such a package. I am generally in favor of making a live test.

The change breaks the original behavior, but I am not sure it was correct to begin with. With the code as we had it before we stopped at the first found pair of name and version, but technically every location could have defined its own name and version, so a sequence of

name    version
----------------
foo     None
bar     1.0.0
baz     2.3.4

would have resulted in (foo, 1.0.0) and there is no good way of telling if it is the correct (name, version) rather than (baz, 2.3.4). Personally I would have rejected a package that has mismatching names or versions in its definition, or at least emitted a big warning.

This change makes the code a little cleaner so I am in favor of it.
I would suggest capturing the essence of this discussion and adding a big comment to the new extractor to explain that we don't want to deal with heterogeneous (name, version) pairs. If we find that this was the wrong decision, we'll update both the code and the comment and won't need to figure that out ever again.
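
The sequence in the table above can be reproduced with a minimal sketch. This is one reading of the mixing behaviour described in this comment, not cachi2's actual code: `Declared`, `resolve_mixed` and `is_heterogeneous` are hypothetical names introduced for illustration, and each entry stands for whatever one config file declares.

```python
from typing import NamedTuple, Optional


class Declared(NamedTuple):
    """What a single config file declares (hypothetical container)."""
    name: Optional[str]
    version: Optional[str]


def resolve_mixed(declarations: list[Declared]) -> tuple[Optional[str], Optional[str]]:
    """Take the first name and the first version found, even across files.

    Assumption: this models the old mixing behaviour as described in the
    comment above; it is not the original implementation.
    """
    name: Optional[str] = None
    version: Optional[str] = None
    for declared in declarations:
        if name is None:
            name = declared.name
        if version is None:
            version = declared.version
        if name is not None and version is not None:
            break  # stop at the first complete (name, version) pair
    return name, version


def is_heterogeneous(declarations: list[Declared]) -> bool:
    """Report whether the files disagree on the name or on the version."""
    names = {d.name for d in declarations if d.name is not None}
    versions = {d.version for d in declarations if d.version is not None}
    return len(names) > 1 or len(versions) > 1


# The three rows from the table above, in lookup order.
sequence = [
    Declared("foo", None),
    Declared("bar", "1.0.0"),
    Declared("baz", "2.3.4"),
]

print(resolve_mixed(sequence))     # ('foo', '1.0.0')
print(is_heterogeneous(sequence))  # True
```

A check like `is_heterogeneous` is one way the "reject or warn loudly" option mentioned above could be expressed; whether it is worth having is exactly what this thread is debating.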

@slimreaper35 slimreaper35 force-pushed the pip-refactoring branch 4 times, most recently from d64087f to 84a0060 Compare October 25, 2024 11:32
Member

@eskultety eskultety left a comment

Needs commit msg adjustment wrt/ this being a breaking change, otherwise ACK.

There is no context within the log warning.
We don't warn users about other things when parsing package metadata
(for example deprecation of setup.py).
The version is an optional attribute in the SBOM.
Even cachi2 uses "dynamic version".

Signed-off-by: Michal Šoltis <[email protected]>
This commit follows the previous one, which dropped
a warning when processing metadata from pyproject.toml.
This piece of code is no longer needed.

Signed-off-by: Michal Šoltis <[email protected]>
Do not mix name and version from multiple config files
(pyproject.toml, setup.cfg, setup.py), or with the name from
the git origin remote.

The new behavior is to parse one config file at a time
and try to get the name + version from it. If the name
is there, both name and version are returned regardless of
whether the version is present. This metadata will be used in the SBOM
for the component representing the processed package.

Even though the probability of affecting users is low,
this is considered a breaking change since the component PURL
might be different now. Therefore, it should be mentioned
in the release notes.

The commit also drastically simplifies the unit tests to reduce
their overall run time while preserving the same coverage.

Signed-off-by: Michal Šoltis <[email protected]>
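
The behaviour described in the last commit message above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the merged implementation: `Declared` and `resolve_single_source` are hypothetical names, and what happens when no config file declares a name is left out of the sketch.

```python
from typing import NamedTuple, Optional


class Declared(NamedTuple):
    """What a single config file declares (hypothetical container)."""
    name: Optional[str]
    version: Optional[str]


def resolve_single_source(
    declarations: list[Declared],
) -> Optional[tuple[str, Optional[str]]]:
    """Return name and version from the first file that declares a name.

    The version is taken from that same file only, so it may be None;
    values from different files are never combined.
    """
    for declared in declarations:
        if declared.name is not None:
            return declared.name, declared.version
    # Assumption: fallback handling is outside the scope of this sketch.
    return None


# Hypothetical inputs, one entry per config file in lookup order.
files = [
    Declared("foo", None),
    Declared("bar", "1.0.0"),
    Declared("baz", "2.3.4"),
]

print(resolve_single_source(files))  # ('foo', None)
```

With the same three inputs, a mixing strategy would pair `foo` with a version declared in a different file, which is the difference the commit message flags as a potential change to the component PURL.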
@slimreaper35 slimreaper35 added this pull request to the merge queue Nov 13, 2024
Merged via the queue into containerbuildsystem:main with commit 5a65f16 Nov 13, 2024
16 checks passed
@slimreaper35 slimreaper35 deleted the pip-refactoring branch November 13, 2024 15:11
3 participants