Track relationships between packages derived from the same upstream #308

Open
pombredanne opened this issue Feb 26, 2024 · 4 comments

@pombredanne

This issue is split out from #186 (comment), where @armijnhemel wrote:

This is actually something that recently dawned upon me as well and I have been thinking about this for quite some time. I already warned @pombredanne that I would be leaving a very long description of my thoughts, so here it is.

When you look at the purl spec ( https://github.com/package-url/purl-spec ) you can see that a purl has seven components, several of them optional (or effectively six, as the first one is always pkg). The second component, the type, gives a hint about the format of the package, such as rpm, deb, and so on.

While I think a purl is the right way to describe a specific instance of a package, it is not how people think about packages. Let's look at an example from the purl spec:

pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25

This describes the binary RPM package from a version of Fedora for a particular architecture. This package was built in a certain way, with a certain configuration, in a certain environment, and possibly with some patches applied to the source code tree before it was built. There could also be a similar package for a version of Debian. This would NOT describe the exact same package (as it was built in a different environment, with a different configuration and possibly with different patches) but it is a related package. What relates the two packages is that they derive from the same basis, namely the curl source code archive, which can also be described using a purl.
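To make the components concrete, here is a rough sketch that splits the purl spec's Fedora curl example into its seven parts. It is a simplified illustration using only the Python standard library, not a spec-complete parser; the function name `parse_purl` and the shape of the returned dict are my own assumptions.

```python
from urllib.parse import parse_qsl, unquote


def parse_purl(purl: str) -> dict:
    """Split a purl into its seven components (simplified, not spec-complete)."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError("a purl must start with 'pkg:'")
    rest = rest.lstrip("/")
    # Peel off the optional parts from right to left: subpath, qualifiers, version.
    rest, _, subpath = rest.partition("#")
    rest, _, query = rest.partition("?")
    if "@" in rest:
        path, _, version = rest.rpartition("@")
    else:
        path, version = rest, ""
    segments = [unquote(s) for s in path.split("/") if s]
    if len(segments) < 2:
        raise ValueError("a purl needs at least a type and a name")
    return {
        "scheme": scheme,
        "type": segments[0].lower(),
        "namespace": "/".join(segments[1:-1]) or None,
        "name": segments[-1],
        "version": unquote(version) or None,
        "qualifiers": dict(parse_qsl(query)),
        "subpath": subpath or None,
    }


# The Fedora curl example string as given in the purl spec.
parts = parse_purl("pkg:rpm/fedora/curl@7.50.3-1.fc25?arch=i386&distro=fedora-25")
```

Only `name` says "curl" here; everything else (type, namespace, version, qualifiers) pins down one specific build, which is exactly why the purl alone does not express the relationship to other curl packages.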

So all these purls (the Fedora package, Debian package and original source code archive) are related to each other, but they are not identical. But this is not how people conceptually think about "a package". They will refer to the Fedora RPM as "curl", to the Debian deb as "curl" and to the original source code archive as "curl". This is not necessarily wrong, but also not necessarily right (as explained above).

If instead there were a meta package for "curl", then all of the purls (Fedora RPM, Debian deb, source code archive) could be seen as instances of the meta package "curl". These instances could have associated facts (for lack of a better word) describing certain aspects of the package, which might or might not be correct (for example, "facts" that could be extracted from the RPM metadata: location of the VCS, location of the webpage, package name, and so on).

The above example is simple and straightforward, so let's throw in a few more complex examples, starting with renamed packages. There are distributions that rename packages. The most straightforward example is Debian, which by convention uses lower case names for all of its packages (along with some other things, like replacing hyphens with other characters). A renamed package would still be an instance of a "meta package".

A slightly more radical example: in Debian the httpd package was renamed to apache2, while Fedora uses httpd. Both are packages derived from the Apache httpd source code and thus are related and should not be seen as completely different packages. Instead, there could be an "Apache httpd" meta package that has both the Fedora and Debian packages as instances.

Another, more difficult example would be GCC: many different packages are created from the GCC code base, as the GCC 13 page on Launchpad shows: https://launchpad.net/ubuntu/+source/gcc-13
These are very obviously not the same packages, but they were generated from the same source code, or subsets of the same source code, so they are related. Add to that all the different versions of GCC, and the different configurations they were built in (cross compilers, etc.) and you can see that it can get quite complex. Yet they are all still related.
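One way to picture the "still related" part is a lookup from a package's identity to its shared upstream. The sketch below is purely hypothetical: the `UPSTREAM_INDEX` table, the `upstream_of` helper, and the version string in the usage lines are made up for illustration, and the index is keyed on (type, namespace, name) only, so version and qualifiers are deliberately ignored.

```python
from typing import Dict, Optional, Tuple

# Hypothetical, hand-written index: (purl type, namespace, name) -> upstream label.
UPSTREAM_INDEX: Dict[Tuple[str, str, str], str] = {
    ("deb", "ubuntu", "gcc-13"): "GCC",
    ("deb", "ubuntu", "cpp-13"): "GCC",
    ("deb", "ubuntu", "gfortran-13"): "GCC",
    ("deb", "debian", "apache2"): "Apache httpd",
    ("rpm", "fedora", "httpd"): "Apache httpd",
}


def upstream_of(purl: str) -> Optional[str]:
    """Return the upstream label for a purl, or None if it is not indexed."""
    if not purl.startswith("pkg:"):
        return None
    # Drop scheme, subpath, qualifiers and version; keep type/namespace/name.
    body = purl[len("pkg:"):].split("#")[0].split("?")[0].split("@")[0]
    parts = body.strip("/").split("/")
    if len(parts) != 3:
        return None  # this sketch only handles single-segment namespaces
    ptype, namespace, name = (p.lower() for p in parts)
    return UPSTREAM_INDEX.get((ptype, namespace, name))


# The version and arch below are invented for the example.
gcc_label = upstream_of("pkg:deb/ubuntu/cpp-13@13.2.0-1?arch=amd64")
same_upstream = upstream_of("pkg:rpm/fedora/httpd") == upstream_of("pkg:deb/debian/apache2")
```

A real index would of course have to come from somewhere (distribution source package metadata, for instance), which is the hard part this sketch skips.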

Wrapping up: I think that the idea of a "meta package" is great, as this is how people are used to talking about code. A meta package could have several instances, described by purls that point to specific binary packages or source code archives, which in turn have facts (metadata) associated with them. The meta package could try to consolidate these facts (along with other facts from, for example, Wikidata) and/or present them to the user in a certain way.
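As a minimal sketch of that structure, assuming a meta package is just a named list of instances and each instance is a purl plus a dict of facts: the class names, the `consolidate` method, and the fact keys below are all hypothetical and not taken from any existing tool.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Instance:
    """One concrete package, identified by a purl, with facts extracted from it."""
    purl: str
    facts: Dict[str, str] = field(default_factory=dict)


@dataclass
class MetaPackage:
    """A conceptual package ("curl", "Apache httpd") grouping related instances."""
    name: str
    instances: List[Instance] = field(default_factory=list)

    def consolidate(self) -> Dict[str, Dict[str, List[str]]]:
        """Group fact values by key, recording which purls assert each value."""
        merged = defaultdict(lambda: defaultdict(list))
        for inst in self.instances:
            for key, value in inst.facts.items():
                merged[key][value].append(inst.purl)
        return {key: dict(values) for key, values in merged.items()}


# Illustrative data only: versionless purls and hand-written facts.
httpd = MetaPackage(
    name="Apache httpd",
    instances=[
        Instance("pkg:deb/debian/apache2",
                 {"package_name": "apache2", "homepage": "https://httpd.apache.org/"}),
        Instance("pkg:rpm/fedora/httpd",
                 {"package_name": "httpd", "homepage": "https://httpd.apache.org/"}),
    ],
)
consolidated = httpd.consolidate()
```

Keeping the asserting purls next to each value means disagreeing facts (here, the package name) are not silently merged, so a consumer can decide which source to trust.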

@pombredanne

This is quite related to Repology's metapackage ... for instance https://repology.org/project/firefox/versions

@mjherzog

Ubuntu has an implementation of MetaPackages - https://help.ubuntu.com/community/MetaPackages

@armijnhemel

I guess that #373 is related.

@pombredanne

pombredanne commented May 8, 2024

@armijnhemel re:

I guess that #373 is related.

It was related... but that part has been moved to:

And the package set feature is closely related:
