-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor k-rule-apply
tool
#1110
Merged
Merged
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Baltoli
force-pushed
the
refactor-rule-apply
branch
from
July 18, 2024 21:20
a10941c
to
b38d730
Compare
gtrepta
approved these changes
Jul 18, 2024
Robertorosmaninho
approved these changes
Jul 18, 2024
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Thanks for the clean up! The code look much better now!
rv-jenkins
pushed a commit
that referenced
this pull request
Jul 19, 2024
This PR fixes two subtle, related issues that are blocking updates from going through downstream in the Kasmer project. At a high level, the issues are: - Flat namespace linking on macOS produces incorrect symbol lookups in dynamic libraries. - #1097 misses a subtle edge case related to tail-call optimisation. The actual code changes required are small, but warrant some detailed explanation. ## Flat Namespaces For a long time, macOS has implemented a system known as _two-level_ namespaces, whereby undefined symbol names in a dynamic library are prefixed with the name of the library in which the loader expects to be able to find them at run-time. This is a conservative behaviour; even if a symbol with the same name exists in a different library, it won't be selected. For example, the dynamic libraries built by `llvm-kompile` in `c` mode link against `libgmp`. Two-level namespaces produce dynamic symbol tables that look like: ```console $ dyld_info test/c/Output/flat-namespace.kore.tmp.dir/libtest.so -symbolic_fixups | grep gmpz_clear +0x2B28 bind pointer libgmp.10.dylib/___gmpz_clear ``` This behaviour is different to Linux, which does not have a notion of two-level namespaces. For legacy compatibility purposes, Apple supply a linker flag `-flat_namespace` that behaves more similarly to Linux behaviour. Its use is discouraged in new code, but we had enabled it to work around an issue in the Python bindings (python/cpython#97524) that should be fixed in a future CPython / macOS combination.[^1] When enabled, the symbol table looks something like this for the same example: ```console $ dyld_info test/c/Output/flat-namespace.kore.tmp.dir/libtest.so -symbolic_fixups | grep gmpz_clear +0x2EE8 bind pointer flat-namespace/___gmpz_clear ``` As a consequence of this, if the symbol `___gmpz_clear` exists in multiple dynamic libraries loaded by the same process, then the order in which they will be selected by the dynamic loader is not clearly well-defined,[^2] and when it's referenced we could end up loading either the correct or the incorrect symbol. This caused the initial bug observed as follows:[^3] - The Haskell backend statically links the `kore-rpc-booster` executable against `libgmp`, meaning that some GMP symbols appear in that binary. - The backend compiles shared libraries that dynamically link against `libgmp`. - `kore-rpc-booster` dynamically loads one of these libraries, and when resolving symbols to load, the flat namespace environment selects the static version for some and the dynamic version for others. - A call to `__gmpz_clear` from a backend hook ends up referencing the statically linked symbol, rather than the dynamically linked version. Generally, I think this situation is harmless - GMP is very stable and it's plausible that doing this for most symbols is not observable. - However, the dynamically-linked GMP library has been set up to use the KORE memory management functions. When the static version is called, it tries to `free()` a pointer allocated by the backend's GC, and crashes. The fix for this issue is to drop our usage of `-flat_namespace` for C shared libraries compiled by the backend. This breaks a few places we were relying on the old (incorrect) behaviour in the presence of C++ RTTI; having multiple instances of identically-named typeinfo symbols in a process is known to be broken there: - `libunwind` is actually implicitly linked via the macOS system library; if we explicitly link it as well, then code that handles exceptions will break. - The `k-rule-apply` tool linked two copies of the KORE AST library, causing `dynamic_cast` to break. #1110 addresses this. ## Tail-Call Optimisation In #1097, we made some changes that explicitly mark K functions as `musttail` when we know they're tail recursive. In doing so, we removed the need to use the `-tailcallopt` flag in most cases. However, the change in that PR missed that as well as IR-level transformations, `-tailcallopt` sets a lower-level flag in the backend[^4] code generator that guarantees tail-call code generation. For large programs, this meant I could observe stack overflows when traversing large terms. The fix is just to enforce that this internal option gets set properly; doing so is just a restoration of the behaviour we got from `-tailcallopt` before. [^1]: But isn't yet fixed, unfortunately - the underlying bug is still present on my system. Should be revisited in the future, ideally! [^2]: It might be defined somewhere, but the initial manifestation of this bug appeared in an apparently unrelated commit, so I think we were just getting lucky previously. The fix in this PR is morally correct whether or not things worked accidentally beforehand. [^3]: I intend to write this up fully later in a separate issue. [^4]: As in the X86 or arm backend of LLVM itself.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This PR is a preparatory refactoring that will enable me to fix a problematic bug in the Kasmer project; I've factored out the changes here because they include other related cleanups not directly relevant to the bug I'm fixing in the subsequent PR.
Currently, the
k-rule-apply
tool is explicitly linked against the KORE AST library in order for it to be able to parse a KORE definition and search for an axiom with a particular label. This works fine, but on changing the way that the LLVM backend is linked on macOS, this AST linkage caused a subtle bug to manifest because of the way thatk-rule-apply
usesdlopen
to inspect a shared library compiled by the backend.1The fix here is to remove the explicit dependency of this tool on the AST + Parser libraries, and instead push all the required behaviour of the tool into the C bindings so that it can be used fully with
dlopen
.The changes in this PR are as follows:
k-rule-apply
tool to be more hygienic in its usage of header files etc.dlopen
ing.Footnotes
I have a full writeup of this on the way; in the context of this PR the specifics aren't important. ↩