Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

macOS: dead_strip_dylibs ldflag causes missing symbols. #107

Closed
mathstuf opened this issue Dec 17, 2020 · 38 comments
Closed

macOS: dead_strip_dylibs ldflag causes missing symbols. #107

mathstuf opened this issue Dec 17, 2020 · 38 comments

Comments

@mathstuf
Copy link

Reflining from https://gitlab.kitware.com/paraview/paraview/-/issues/19645

Also seen here: AnacondaRecipes/intel_repack-feedstock#8

When building on macOS, the -dead_strip_dylibs option is passed in. This causes the final executables (pvpython, paraview, etc.) to drop the libpython3.X.dylib reference because the executable itself does not call Python APIs directly. The actual calls are in a library (vtkPythonInterpreter) which does not directly link libpython because doing so breaks when the symbols are provided by the executable itself (as is the case with Anaconda's python binary). There's no way to turn the flag's behavior off once it is on the command line, so not passing it is the solution for ParaView (and VTK).

@mingwandroid
Copy link

How do you link vtkPythonInterpreter?

@mathstuf
Copy link
Author

We explicitly do not link to libpython because it conflicts when it is transitively loaded through import vtk by Anaconda (since that executable brings its own statically-linked libpython symbols). That link is satisfied with the -undefined dynamic_lookup flag.

@mingwandroid
Copy link

I asked how you link to vtkPythonInterpreter. Is it a dylib?

@mathstuf
Copy link
Author

Oh, the lack of "to" made me think you were asking about linking the library itself.

Yes, this is in a shared build.

@mingwandroid
Copy link

OK, presumably:

.. your python extension module claims not to need any .dylibs beyond vtkPythonInterpreter.dylib and that is true (btw, it esp. does not need a 2nd copy of the python interpreter code that knows nothing about the one linked to the exe, that's a certainty!).

vtkPythonInterpreter.dylib does need python symbols, but that's fine because the executable will provide those.

This does not seem to me different in a technical sense from any other .dylib loading that we do!

Can someone explain what actually goes wrong here?

@mathstuf
Copy link
Author

We tell the executable to link to libpython (and it's on the link line) because it ends up calling vtkPythonInterpreter and the symbols need to be provided. However, -dead_strip_dylibs is passed and since none of the executable's object files use symbols from libpython, it is removed from the list of libraries to link. This results in missing symbols at runtime.

@mingwandroid
Copy link

We tell the executable to link to libpython (and it's on the link line) because it ends up calling vtkPythonInterpreter and the symbols need to be provided

I am afraid that is un-true and that is the point of https://github.com/python/cpython/search?q=%22-bundle+-undefined+dynamic_lookup%22&type=code

Passing -lpython ever when also using a statically linked python is just wrong. CC @jschueller

@mingwandroid
Copy link

And I have yet to see a real example of this auto-stripping feature getting things wrong. Not a single one.

@mathstuf
Copy link
Author

That's fine for Anaconda, but pvpython is another top-level interpreter and it needs to link to libpython in order to work (as is any other executable that embeds Python rather than being used via import).

@mingwandroid
Copy link

If it is the case that vtkPythonInterpreter.dylib links to python.dylib then the act of import vtk should cause python.dylib to get automatically loaded by the OS loaded and bound to the entry points within vtkPythonInterpreter.dylib. However the entry points within vtk.dylib should have already been resolved to those from the executable due to you having passed -bundle -undefined dynamic_lookup to it.

@mingwandroid
Copy link

I am in no way talking about "anaconda".

@mathstuf
Copy link
Author

vtkPythonInterpreter does not link to libpython at all because doing so conflicts with the "interpreter provides the symbols" pattern that is apparently wanted on macOS.

But, we have other entry points into starting the interpreter than just bin/python importing a module. pvpython, pvbatch, paraview, and probably the other ones can all start an interpreter. To do so, these executables need to provide the libpython symbols just like bin/python does. How is one to do that without linking to the python library?

@mingwandroid
Copy link

OK, in that case, you should dynamically load and bind vtkPythonInterpreter.dylib to python.dylib.

Saying that the top level module needs a DLL when it doesn't just so you don't have to load it yourself is not the best course of action.

@mingwandroid
Copy link

As esp. telling a dylib that it should load a dylib from which symbols will conflict with the executable that's just loaded it seems very dangerous.

@mathstuf
Copy link
Author

How does one do that? And why is that needed at all when just removing this flag works?

The library is not loading libpython on its own. The executable is to provide it. We're trying to do that and using -dead_strip_dylibs thwarts it.

@mingwandroid
Copy link

Please look at dlfcn

@mathstuf
Copy link
Author

AFAIK, the Symbol not found happens before main. How would that even work?

@mathstuf
Copy link
Author

How does bin/python end up providing the libpython symbols? That's what we need to do at the core.

@mathstuf
Copy link
Author

Just a note: I've spent weeks dealing with these kinds of linker problems and hours figuring out how to get it all correct. I've even written a linker patch so that the dynamic_lookup wouldn't be global, but instead just for libpython symbols (or any other library). I understand the problems here (enough at least), I think we're just having some kind of communication breakdown with mismatched assumptions of knowledge and how things are working/supposed to work in this situation.

@mingwandroid
Copy link

mingwandroid commented Dec 17, 2020

I'm sorry, I am just pointing out how things are meant to work. Correct linker flags are being made a scapegoat for the actual problem which seems to be in the design of VTK to me.

Our libs shouldn't lie about what they need and over- or under- declare things and that is what these mechanisms achieve. These tricks VTK is doing are not things I would recommend. Software components should be correct and they should stand-alone (within the execution environment and ABI definitions provided by their creators such as that of a Python extension module). The Python upstream developers are also un-ambiguous in their advice. You should not link dynamically to the python interpreter unless you are embedding it (and even then, why not go static?). Here you are embedding it but hoping to use some hacks to make things easier, but they're dangerous and they are wrong.

How does bin/python end up providing the libpython symbols? That's what we need to do at the core

It statically links them into the executable from libpython.a

I'm glad to hear you have dived into the code! But honestly, I believe the initial assumption that there's anything wrong in the flags here that is wrong and that VTK is doing decidedly funky things.

@mathstuf
Copy link
Author

You should not link dynamically to the python interpreter unless you are embedding it (and even then, why not go static?).

It statically links them into the executable from libpython.a

Great, let's do that. We'll need find_package(Python) to return the static library on macOS when creating an executable with an embedded interpreter. (Hopefully -dead_strip_dylibs doesn't strip that too, if it does, then I don't know that the flag is viable.) We're trying to provide the symbols here, but other flags are conflicting and making us not do so.

Here you are embedding it but hoping to use some hacks to make things easier, but they're dangerous and they are wrong.

What "hacks" are you referring to?

  • we do not link to libpython in anything import can reach
  • we want to provide symbols from executables, so we link to libpython

@mingwandroid
Copy link

mingwandroid commented Dec 17, 2020

Please stop talking about 'linking' to anything. Linking exists in 3 contexts in this world: statically, dynamically (auto-loaded) and dynamically (run-time). I guess you are never using run-time, but still, I don't know what you mean.

@mathstuf
Copy link
Author

By "link" here, I mean "adding an LC_LOAD_DYLIB command to the binary". At least in a dynamic world. Static is different and (AFAIK), not really relevant here (we only really care about it on Linux, but is "best effort" on other platforms).

@mingwandroid
Copy link

mingwandroid commented Dec 17, 2020

What "hacks" are you referring to?

we do not link to libpython in anything import can reach

but doesn't import vtk somehow load vtkPythonInterpreter?

we want to provide symbols from executables, so we link to libpython

That is not how you provide symbols from executables on macOS. The operating system already provides those symbols to the executable in the global namespace and the loader of any subsequent dylibs knows them already. This is indeed, what a bundle is in Apple-world, although technically it's just a dylib (AFAIK). In other worlds we call them plugins

@mathstuf
Copy link
Author

but doesn't import vtk somehow load vtkPythonInterpreter?

Yes. And the tool that imported the vtk module provides the libpython symbols.

That is not how you provide symbols from executables on macOS. The operating system already provides those symbols to the executable in the global namespace and the loader of any subsequent dylibs knows them already.

How does the Mach-O interpreter know what "those symbols" are? Where are they provided from?

@mingwandroid
Copy link

mingwandroid commented Dec 17, 2020

From the execution environment. Symbols make it all the way into executables and dylibs and the OS loaded loads and parses them. This happens on all OSes.

@mathstuf
Copy link
Author

OK. So let's figure out how to write a link line to make pvpython do the right thing. Is all we need to replace libpython3.9.dylib with libpython3.9.a? Or is there some other magic we need to do?

@mingwandroid
Copy link

mingwandroid commented Dec 17, 2020

Yes. And the tool that imported the vtk module provides the libpython symbols.

If your module(s) want symbols from the python dylib then have it autoload the dylib (unless you code runtime binding). Don't rely on the executable to provided it. That way it will work in both cases, say from another executable that didn't autoload the python dylib. If the OS loaded encounters the same dylib twice it will do the right thing and not load it twice.

@mingwandroid
Copy link

Go for it! I don't have time beyond what I've already spent!

Cheers.

@mathstuf
Copy link
Author

But that conflicts with bin/python with static symbols provided with it. Something about Py_GetThreadState not finding the current thread.

@mingwandroid
Copy link

mingwandroid commented Dec 17, 2020

No, that's due to the changes in Python initialization around Python 3.8 (AFAIR). You just need to update. But yes, actually, this could get back to the craziness of having two interpreters with the same symbols in the same process address space.

@mingwandroid
Copy link

The point is, a very different Py_SetThreadState got called and it wrote to different memory than the memory that Py_GetThreadState (from the dylib) reads from.

@mathstuf
Copy link
Author

We did the updates for 3.8…or at least we haven't had any problems with 3.8 with our current approach. Where are the docs on what we'd need to change for 3.8?

@mingwandroid
Copy link

These VTK things have been around for as long as I can remember, so I really hope you get it cleaned up!

@mingwandroid
Copy link

Sorry, I've got a lot of work to do.

@mathstuf
Copy link
Author

Well, Anaconda support isn't really the top of my list either. @jschueller I'm afraid further investigation at the moment might be up to you…things work for us as-is and there's the easy solution of dropping the flag available…

@mingwandroid
Copy link

mingwandroid commented Dec 17, 2020

This is conda-forge, not Anaconda. If I'm ever asked to look at this by my company then I will for sure get onto it but vtk didn't scratch any itch for my personal open source needs so i won't be able to spend time helping it on conda-forge, unless it's for quick tips. Rarely are tips quick 🙂

@mathstuf
Copy link
Author

Ah, I'm obviously not well-versed in Anaconda's corner of things other than bug reports we get about it :) . Since conda-forge is a (the?) main way of getting ParaView through conda, it's easy to get it conflated on that side.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants