Definition of "isa" has an OS-specific vocabulary and various other corner cases #78

Open
smcv opened this issue Sep 27, 2024 · 5 comments
Labels
help wanted This issue would benefit from community assistance. need discussion Resolution of this issue should be discussed within the wider community before resolving.

Comments

@smcv

smcv commented Sep 27, 2024

The isa field is currently defined to be a possible output of uname -m, which isn't necessarily a great fit for build systems for several reasons:

  • The existence of uname -m is a Unixism: as far as I'm aware, Windows doesn't have it at all. Is there a meaningful definition of what the isa should be on Windows, to distinguish between i386, x86_64 and others?

  • Different OSs represent the same ISA in uname -m differently. For example, Darwin's arm64 is the same as Linux's aarch64 according to GNU config.guess, the conventional Windows name for what Linux calls x86_64 is x64, and PowerPC is variously powerpc{,64} or ppc{,64}.

  • Sometimes the same ISA has multiple representations even on the same OS. For example, on Linux, i386 up to i686 are all the same ISA really, and semi-arbitrary strings like armv5tel are the same ISA as arm. The current CPS spec seems to consider i586 and i686 to be distinct ISAs, and similarly arm and armv5tel: it seems bad if a CPS-based build system is encouraged to crash out with an error like "you are compiling for i686, but the version of libfoo we found was for i586".

  • Some CPUs like PowerPC and ARM can be run in two modes, little-endian (LSB first) or big-endian (MSB first); some vocabularies of CPU families represent this as part of the architecture name, and some do not. For example, Linux uname -m on 64-bit PowerPC can output either ppc64 or ppc64le, but Meson considers both of those to be members of the ppc64 CPU family. At the moment CPS seems to consider ppc64 and ppc64le to be distinct, but it isn't clear whether this is really intentional.

(See GNU's /usr/share/misc/config.guess and /usr/share/misc/config.sub on a Linux system for many more examples of the output of uname -m needing normalization or postprocessing.)

If the ISA is important information to appear in these files, I'd suggest having a normative vocabulary of architecture names, like Meson does: https://mesonbuild.com/Reference-tables.html#cpu-families (the table ends with "Any cpu family not listed in the above list is not guaranteed to remain stable in future releases").
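A minimal sketch of what normalizing against such a vocabulary might look like, in Python. The function name `normalize_isa` and the alias table are hypothetical and cover only the examples mentioned in this issue; the canonical names loosely follow Meson's CPU family table and are not taken from the CPS specification.

```python
import re

# Hypothetical alias table: raw machine strings (lowercased) -> canonical family.
# Only the spellings discussed in this issue are listed.
_ALIASES = {
    "x86_64": "x86_64",
    "amd64": "x86_64",
    "x64": "x86_64",       # conventional Windows name
    "aarch64": "aarch64",  # Linux spelling
    "arm64": "aarch64",    # Darwin spelling
    "arm": "arm",
    "ppc": "ppc",
    "powerpc": "ppc",
    "ppc64": "ppc64",
    "ppc64le": "ppc64",    # endianness folded into the family, as Meson does
    "powerpc64": "ppc64",
    "powerpc64le": "ppc64",
}


def normalize_isa(machine: str) -> str:
    """Map a uname -m style string to a canonical CPU family name."""
    name = machine.strip().lower()
    # i386 through i686 are treated as one 32-bit x86 family.
    if re.fullmatch(r"i[3-6]86", name):
        return "x86"
    if name in _ALIASES:
        return _ALIASES[name]
    # Strings such as armv5tel or armv7l collapse to plain "arm".
    if re.fullmatch(r"armv[0-9].*", name):
        return "arm"
    raise ValueError(f"unknown machine name: {machine!r}")
```

With a table like this, a build system would compare `normalize_isa("i586")` against `normalize_isa("i686")` and see the same family, rather than rejecting the package on a raw string mismatch. Whether endianness belongs in the family name or on a separate axis is left open here, as it is in the issue.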

Defining the OS as being uname -s has many of the same issues.

@mwoehlke mwoehlke added help wanted This issue would benefit from community assistance. need discussion Resolution of this issue should be discussed within the wider community before resolving. labels Oct 21, 2024
@mwoehlke
Member

> If the ISA is important information to appear in these files

...I think so? If I'm building for ia64 (why? 😉) and I find a package built for ppc64, I'm not going to be able to link that, am I?

That said, platform compatibility is an area that's known to need a complete overhaul, so please don't hold your breath expecting rapid progress. However, I think the idea of having an explicit registry has merit.

@dcbaker
Collaborator

dcbaker commented Oct 22, 2024

I really, really, really want to have this. The number of issues we've fielded in Meson that turned out to be "my pkg-config picked up a .pc file for my build machine on a host machine target" is enough to make me pull my hair out.

I'm obviously biased, but the tables approach has worked fairly well for Meson so far.

@bruxisma
Contributor

I've been working on a solution for these issues as part of the EcoIS, which is twofold. The first part is bringing back P1864 so that CPS could at least have an idea of "common" names to use for ISAs.

The second part is having a superset of CPS configurations folded into a well-known directory layout, much akin to Apple's .xcframework.

Unfortunately, sudden health issues and work requirements have left almost no time for working on these :(

I would argue just listing an ISA is not enough. A full target tuple is necessary to know what a package does (e.g., "does this C package use the Windows calling convention") so that a build system can select the correct option. This also opens CPS up to platforms that would be considered old and dead (e.g., the SNES), which would be a boon for retrocomputing, both as a hobby via homebrew and as a field of study for older compiler toolchains.

@mwoehlke
Member

> I would argue just listing an ISA is not enough

Heartily seconded. You don't want to find a Windows package when building for Linux... and "windows"/"linux" are probably not adequate, either, for their axis. "Platform" is a many-dimensional concept for which most of the axes matter.

The trick is figuring out a) what the axes are, and b) what the set of possible values is for each. Personally, I'm not convinced a tuple is the right data structure. CPS, as it stands, is using the equivalent of a dictionary.
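To illustrate the dictionary view, here is a small Python sketch of a compatibility check over named axes. The axis names (`isa`, `os`, `endianness`) and the function `is_compatible` are assumptions made for illustration; they are not the axes or the matching rule defined by CPS.

```python
from typing import Mapping


def is_compatible(package: Mapping[str, str], target: Mapping[str, str]) -> bool:
    """Return True if every axis the package declares matches the target.

    Axes the package does not mention are treated as "don't care".
    """
    return all(target.get(axis) == value for axis, value in package.items())


# Hypothetical target description; the axis names are illustrative only.
target = {"isa": "x86_64", "os": "linux", "endianness": "little"}

linux_pkg = {"isa": "x86_64", "os": "linux"}
windows_pkg = {"isa": "x86_64", "os": "windows"}

print(is_compatible(linux_pkg, target))
print(is_compatible(windows_pkg, target))
```

One consequence of this design is that adding a new axis later does not invalidate existing package descriptions, since unmentioned axes simply do not constrain the match. A positional tuple would need a placeholder in every existing description.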

@bruxisma
Contributor

☝🤓 well akshually (forgive me for that, but also don't)

a tuple is just a dictionary whose keys are the indices, and if those indices are tied to names, it's just a named tuple, and a named tuple is just a dictionary.
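The equivalence can be shown with Python's `typing.NamedTuple`, as a quick sketch. The `Target` type and its field names are hypothetical, chosen to resemble a conventional target tuple; nothing here is defined by CPS.

```python
from typing import NamedTuple


class Target(NamedTuple):
    # Hypothetical axes, in the order a conventional target tuple lists them.
    isa: str
    vendor: str
    os: str
    abi: str


t = Target("x86_64", "pc", "windows", "msvc")

# Positional access and named access reach the same element.
assert t[0] == t.isa

# The named tuple converts to a dictionary and back without losing anything.
as_dict = t._asdict()
assert Target(**as_dict) == t
```

The practical difference is that the tuple fixes the set and order of axes up front, while a plain dictionary lets a description omit axes or add new ones.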

4 participants