Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bad.pyd has a .voltbl section at the beginning #1

Open
carlkl opened this issue Jan 22, 2022 · 25 comments
Open

bad.pyd has a .voltbl section at the beginning #1

carlkl opened this issue Jan 22, 2022 · 25 comments

Comments

@carlkl
Copy link

carlkl commented Jan 22, 2022

bad

This also applies for the broken scipy pyd files which are linked with npymath.lib from VS v142. This question on the mingw-w64 ML is related: https://sourceforge.net/p/mingw-w64/mailman/message/37595762/

See also: https://developercommunity.visualstudio.com/t/vc-generatesships-obj-files-with-illegal-associati/441151

@carlkl
Copy link
Author

carlkl commented Jan 22, 2022

FYI @mstorsjo

@carlkl
Copy link
Author

carlkl commented Jan 22, 2022

FYI @rgommers

@matthew-brett
Copy link
Owner

More discussion here : https://bugs.chromium.org/p/chromium/issues/detail?id=925943

MS update to PE format spec associated : https://github.com/MicrosoftDocs/win32/compare/7ddf2ceff076edb3effb6f9326307c3bbf862cfb..d86a62d2a66a033cae105f063f30b9e62ab760c1

Is it possible Binutils is not up to date with the updates to the spec?

@matthew-brett
Copy link
Owner

.voltbl also appears inbad_npymath.lib - see https://github.com/matthew-brett/dll_investigation/blob/main/objdump_x_bad_npymath.txt#L429 for one of four references in objdump, many references in llvm-readobj.

@carlkl
Copy link
Author

carlkl commented Jan 22, 2022

Obviously using the static npymath.lib as well as npyrandom.lib by GNU ld seems to be a working solution all the time. But given the statement by @mstorsjo there is no guarantee for compatibility at all and it worked only by accident. This statement is not new BTW and quite old.

Now we we the situation that relying on VS build static libraries becomes a maintenance problem.

A solution could be the following:

building CPython extensions could rely on static libraries of the same vein but build with GCC instead of VS. This could be done during a numpy build, or with the help of a dedicated package relying on numpy.

Using an old VS toolchain as proposed in Please revert to VS2017-compatible build for Numpy #145 is only an intermediate solution.

@rgommers
Copy link

Using an old VS toolchain as proposed in Please revert to VS2017-compatible build for Numpy #145 is only an intermediate solution.

True, but we can keep that unchanged for quite a while for official binaries at least. So we have some time to come up with a solution.

building CPython extensions could rely on static libraries of the same vein but build with GCC instead of VS. This could be done during a numpy build, or with the help of a dedicated package relying on numpy.

That sounds painful. Maybe a better long term solution is to create shared libnpymath and libnpyrandom libraries?

But given the statement by @mstorsjo there is no guarantee for compatibility at all and it worked only by accident. This statement is not new BTW and quite old.

I missed that - is this true for Mingw only, or also for Clang?

@carlkl
Copy link
Author

carlkl commented Jan 22, 2022

That sounds painful. Maybe a better long term solution is to create shared libnpymath and libnpyrandom libraries?

Maybe the best and easiest solution if accepted ASAP by the numpy developers. This would allow to use VS 2019 to build numpy. Maybe the next meson based scipy should simply request a numpy version of this kind as a minimum.

I missed that - is this true for Mingw only, or also for Clang?

There is clang available for the UCRT64 environment. This clang implementations is using GNU ld.exe as well; or MSVRT link.exe when clang-cl is used. And there is a dedicated CLANG64 environment which is using lld. If lld is able to handle VS build static libraries - I don't know. Unfortunately CLANG64 has no Fortran compiler is available right now. It is said that one has to wait until LLMV 15 at least until FLANG is functional. Maybe even longer on the windows platform.

This brings me to the idea to use the UCRT64 version of clang-cl instead of GCC to compile the remaining binaries using npymath

@matthew-brett
Copy link
Owner

Good idea to think about what to do as a better long-term solution - but it would be good to get to the bottom of the problem here so we can a) give more information to the Mingw-w64 team, and b) we get a better understanding of how likely it is we can make this work (perhaps, again, unsupported) with VS2019 v142.

@carlkl
Copy link
Author

carlkl commented Jan 22, 2022

These problems have been discussed before, see conda-forge/scipy-feedstock#165 (comment)

@mstorsjo
Copy link

mstorsjo commented Jan 22, 2022

Is it possible Binutils is not up to date with the updates to the spec?

Yes, I'm quite sure binutils hasn't been updated to take these things into account. And as long as it's only MSVC that produces them, I don't think there's going ot be any rush to implement it anytime soon either. (If someone sends a patch to binutils to implement it, sure I guess it could be accepted though.)

FWIW see https://reviews.llvm.org/D116474#3220925 for some other discussion on the matter - LLD doesn't handle these sections yet either.

But ocaml/ocaml@0ac7358 mentions a MSVC option that turns off generating these new unsupported sections.

But given the statement by @mstorsjo there is no guarantee for compatibility at all and it worked only by accident. This statement is not new BTW and quite old.

I missed that - is this true for Mingw only, or also for Clang?

Comparing "Mingw" to "Clang" like this is misrepresenting the whole situation. Let's take it from the start.

Originally, there was the compiler GCC together with binutils, which needs an SDK that provides headers and import libraries. Mingw is the set of (GCC compatible) headers and import libraries. Then there also was MSVC (the compiler) and its associated headers (MSVC + WinSDK).

Then Clang entered the stage. Clang/LLVM can impersonate both GCC and MSVC; in mingw mode, Clang can use the mingw headers and libraries, and generally behaves as drop-in replacements for the corresponding GCC/binutils tools. In MSVC mode, Clang impersonates MSVC, in the form of clang-cl (enabling a bunch of language frontend quirks to make it able to parse the MSVC headers), and works as a drop-in replacement for the MSVC tools.

So Clang isn't one toolchain, but one set of tools that can behave like the rest of the tools in either of these separate universes.

The two separate universes, mingw mode and MSVC mode, generally only can interoperate over DLL boundaries, with C function interfaces between each other.

As for what the differences are - there are bigger or smaller differences throughout the whole toolchain stack. Starting from the bottom of the stack, on the object file level, while both GCC/binutils and MSVC use the same COFF object file format, they do use features of the COFF object file format slightly differently. Plain simple functions are mostly the same, but when it comes to more tricky features (comdats, used for functions that can be deduplicated across object files, like in C++ inline functions and lots of other cases), they behave differently. I guess GNU binutils ld.bfd couldn't handle the MSVC style of comdats, and likewise MSVC link.exe couldn't handle the GNU style of comdats. In the middle ground, LLVM's lld can handle both variants though.

Then slightly higher level, things like implicitly called functions for stack probing, for stack canaries/protection, etc, are different between MSVC and GCC produced code, and each of them expect to be linked to the support libraries of each others' ecosystems. (It's plausible that the mingw runtime libs maybe could contain the entry points used by MSVC code in some cases too, though.)

Then for the biggest practical problem - interfacing with the CRT. If you build mingw code for the traditional msvcrt.dll environments, then code built targeting a modern MSVC CRT (UCRT) can't be linked together. If you don't call much tricky CRT functions you might be fine, but for more complicated CRT functions, there will be incompatibilties. If linking against UCRT in the mingw environment, the chances of success are a bit higher. But even then, there's differences in divsion of things between inline functions in headers and statically linked minor helper functions in the import libraries, so code built with one set of headers pretty much needs to be linked against the corresponding sets of libraries from the same vendor.

Additionally, mingw and msvc environments disagree on the long double type on x86. In mingw environments, long double is an 80 bit float, while in msvc it's a plain 64 bit float. So code can't be using that data type across the mixing boundary either.

Then for the real killer, C++. In C++, mingw and MSVC environments use two entirely incompatible ABIs.

So, overall, there are differences between mingw and msvc environments across pretty much every single layer of the stack. By linking with lld instead of binutils ld, you can get a little bit of more compatibility for the issues at the lowest level of the stack, but that doesn't help for all issues above that.

If your code doesn't use anything else than plain C functions without complicated linkage, and doesn't call many tricky CRT functions, there's a chance it might work.

So overall, if mixing the two environments happens to work, good for you, but for developers of toolchains, it's pretty much a non-goal as it's not something that can be made to work in the general sense anyway.

@mstorsjo
Copy link

More discussion here : https://bugs.chromium.org/p/chromium/issues/detail?id=925943

MS update to PE format spec associated : https://github.com/MicrosoftDocs/win32/compare/7ddf2ceff076edb3effb6f9326307c3bbf862cfb..d86a62d2a66a033cae105f063f30b9e62ab760c1

Is it possible Binutils is not up to date with the updates to the spec?

To add detail on this matter; the linked issue is about associative comdats and the finer details about implementing them. GCC/binutils don't produce associative comdats at all, and afaik binutils ld hasn't implemented anything regarding linking of them (even the basic cases), so if the input object files have such, then all bets are off. But For the case of .voltbl, I think it's all metadata so it shouldn't affect things much in itself even if it was wrong, except that they seem to leave incorrectly handled/mapped sections in the output executable.

@matthew-brett
Copy link
Owner

@mstorsjo - thanks very much for the detailed explanations.

Please don't take this as a plea to support .lib linkinng of arbitrary MSVC .lib files - I totally accept that's not a goal. My own desire in this thread / repo was just to get a better understanding of why it causes the error it does. That understanding may well lead to the conclusion we have to stop trying to do this - that's OK too.

For the .voltbl - do you think then this is not the explanation for the 193 error? If it is not - is there anything else we should be looking at to work out what is going on?

@mstorsjo
Copy link

@mstorsjo - thanks very much for the detailed explanations.

Please don't take this as a plea to support .lib linkinng of arbitrary MSVC .lib files - I totally accept that's not a goal. My own desire in this thread / repo was just to get a better understanding of why it causes the error it does. That understanding may well lead to the conclusion we have to stop trying to do this - that's OK too.

For the .voltbl - do you think then this is not the explanation for the 193 error? If it is not - is there anything else we should be looking at to work out what is going on?

I think it's plausible that those sections cause the runtime issues - from what I remember from https://reviews.llvm.org/D116474, those sections end up needing some tweaks in the linker for the executable to be even valid (so ideally a linker would at least need to just omit those sections if nothing else). I think the option from ocaml/ocaml@0ac7358 might be a good first step of trying to make MSVC stop emitting them, which might fix the issue. Other than that, it might be possible to run a pass with GNU objcopy or llvm-objcopy on the object files in the static library, to strip out those problematic sections before hitting the unknowing linker. But that's a bit of a wild guess too.

I haven't really ran into the .voltabl section issue myself yet so I haven't studied it closer. (I guess in all cases where I use MSVC produced object files, they're linked with MSVC's link.exe, which has got whatever extra support is needed for it.)

@carlkl
Copy link
Author

carlkl commented Jan 23, 2022

@mstorsjo, thank you for this detailed inside view. Maybe using the flag -d2VolatileMetadata- for cl during numpy build may help for now, but in the long run we have to move away from linking possibly incompatible static libraries together. Your explanations are very helpful for further discussions.

@h-vetinari
Copy link

@carlkl: It is said that one has to wait until LLMV 15 at least until FLANG is functional. Maybe even longer on the windows platform.

The latest status is that flang will most likely only be experimental for LLVM 15, see e.g. discussion here. If code generation becomes available, I'll try to build a couple conda-forge packages with it and raise bugs upstream.

But I think anything before LLVM 16 is completely unrealistic, and even then analyses like this one show that this might still come with substantial performance regressions (obviously too early to tell, but there'll still be quite substantial work to do on flang even after the upstreaming from fir-dev is complete).

On the other hand, it sounds conceivable to bridge another 1-2 years with vc141...?

Did someone ever try if vc142 + the flag from ocaml/ocaml@0ac7358 is mingw-compatible for npymath.lib?

@matthew-brett
Copy link
Owner

Cross link to discussion of future of npymath.

@matthew-brett
Copy link
Owner

Hmm - it looks like -d2VolatileMetadata- does work in making it possible for Scipy to link the VC142 npymath library, as per the ocaml commit. See: numpy/numpy@main...matthew-brett:no-msvc-voltab (needs cleaning up to check whether flag is valid for compiler version used).

@rgommers - or anyone - what do y'all think about including this flag in the Numpy build? I can't find much on it, except https://devblogs.microsoft.com/cppblog/msvc-backend-updates-in-visual-studio-2019-version-16-10-preview-2 - which appears to confirm the OCaml comment that the flag disables some optimization for ARM64 emulation when running x64 binaries.

@h-vetinari
Copy link

Hmm - it looks like -d2VolatileMetadata- does work in making it possible for Scipy to link the VC142 npymath library

Thanks a lot for testing that, that's great news!

@rgommers - or anyone - what do y'all think about including this flag in the Numpy build?

I don't claim to speak for the numpy folks, but considering the context of:

Mixing static libraries or object files between mingw and msvc is not
supported, and not expected to work, in general. You might have been lucky
and your library might have been simple enough not to hit any problematic
case though.

I think it would be a strict improvement if things also end up working with vc142 - by removing the thing (volatile assembly sections) that broke the previous interaction, we'd just be extending that happy accident a bit further (e.g. until flang comes around).

@carlkl
Copy link
Author

carlkl commented Apr 16, 2022

This all was discussed before in january of this year, Remember there was a decision to switch back to VS2017 numpy builds for this reason (.voltbl sections are created by VS2019): Please revert to VS2017-compatible build for Numpy. It was not decided to using the -d2VolatileMetadata- flag instead.
The overall consenous as far as I remember was to avoid mixing of MSVC and mingw-w64 static objectfiles in the long term.

@matthew-brett
Copy link
Owner

I think the only point here was that, if, for other reasons, we want to update to the VS142 (2019) toolchain, before we have a better long-term solution for npymath, then we can, by adding this flag to the Numpy build.

I suppose that there may be some other projects building against npymath, who are using MSVC, in which case, they may prefer to keep using the older toolchain, for compatibility. But we've just hit a VS141 incompatibility in Numpy, for example: numpy/numpy#21346

@rgommers
Copy link

The decision whether to use 14.1 for NumPy wheels is not under discussion I'd say. We can just include the d2VolatileMetadata flag in the build, so things work better with 14.2 for whoever does want to use that toolchain.

matthew-brett added a commit to matthew-brett/numpy that referenced this issue Apr 19, 2022
Add flag to disable the voltab section in the generated DLL, which
appears to break mingw-w64 linking.

See:
ocaml/ocaml@0ac7358

and

matthew-brett/dll_investigation#1 (comment)
matthew-brett added a commit to matthew-brett/numpy that referenced this issue Apr 19, 2022
Add flag to disable the voltab section in the generated library, which
(if present) appears to break mingw-w64 linking.

See:
ocaml/ocaml@0ac7358

and

matthew-brett/dll_investigation#1 (comment)
@h-vetinari
Copy link

The decision whether to use 14.1 for NumPy wheels is not under discussion I'd say.

Respectfully, how can removing an EOL compiler not be up for discussion?

As you know (but just for setting context), the situation on windows is quite different from linux - there, it's understandable to want to be compatible with still-supported distros (with ABIs bound to an old compiler versions, resp. the libstdcxx that comes with that), but on windows, the OS is not coupled so tightly with the compiler version.

In fact, MSFT has been prioritizing backwards-compatible from VS2022 back to VS2015 almost to the point of parody (e.g. if you look at the [[msvc::no_unique_address]] background), so there should be no situation where VS2019 isn't a 1:1 drop-in for VS2017**, and where it isn't, MSFT would consider it a bug to be fixed.

** This includes the reuse of all other binaries being consumed by a build; the only restriction is the other way around; projects consuming artefacts produced by VS2019 (more precisely, the VC142 toolchain) need to be compiled with that as well. However, as explained above, that upgrade really should be a non-issue in every sense; I would be very interested to hear of any scenario where that's not the case.

@matthew-brett
Copy link
Owner

I believe this was an issue for conda - that some projects were compiling with VS2017, and running into problems linking against VS2019 binaries. @isruf -can I ask - what's the current situation?

@rgommers
Copy link

Respectfully, how can removing an EOL compiler not be up for discussion?

I meant in this discussion. It's simply a bad idea to have the same discussion in multiple places. This issue apparently goes away with a single compiler flag, so we should add it. I do not want to rehash the recent conda-forge and numpy-wheels discussions, they are orthogonal.

@h-vetinari
Copy link

h-vetinari commented Apr 19, 2022

@matthew-brett: I believe this was an issue for conda - that some projects were compiling with VS2017, and running into problems linking against VS2019 binaries. @isruf -can I ask - what's the current situation?

Have a look at this comment I just wrote for some context. I can provide more links for previous discussions if desired.

@rgommers: I meant in this discussion. It's simply a bad idea to have the same discussion in multiple places.

Sure, that's fair enough (but wasn't clear to me from your formulation). It seems numpy/numpy#21346 would be the best place for this now (since numpy/numpy#20880 - while related - deals with a very different problem at its core)?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants