Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build with Clang + MSVC #1607

Open
OwenTrokeBillard opened this issue Jan 9, 2024 · 33 comments · May be fixed by llvm/llvm-project#83052
Open

Build with Clang + MSVC #1607

OwenTrokeBillard opened this issue Jan 9, 2024 · 33 comments · May be fixed by llvm/llvm-project#83052

Comments

@OwenTrokeBillard
Copy link

It would be beneficial for Enzyme to build with Clang + MSVC on Windows. One use case is Rust integration. See EnzymeAD/rust#54.

Enzyme is presently cross-compiled for Windows with Clang + MinGW. With any luck, the only change required is updating CMakeLists.txt for Clang + MSVC compatibility.

@OwenTrokeBillard
Copy link
Author

OwenTrokeBillard commented Jan 9, 2024

I am trying to make Enzyme compatible with Clang + MSVC. There are some stubborn linker errors. It fails to link the LLVM Support library. Does this look familiar to anyone?

Linker error

FAILED: Enzyme/LLVMEnzyme-18.dll
cmd.exe /C "cd . && C:\source\llvm-project\build\bin\clang++.exe -fuse-ld=lld-link -nostartfiles -nostdlib -Wall -fno-rtti  -Werror=unused-variable -Werror=dangling-else -O2 -D_DLL -D_MT -Xclang --dependent-lib=msvcrt  -Wl,--gc-sections -shared -o Enzyme\LLVMEnzyme-18.dll  -Xlinker /MANIFEST:EMBED -Xlinker /implib:Enzyme\LLVMEnzyme-18.lib -Xlinker /pdb:Enzyme\LLVMEnzyme-18.pdb -Xlinker /version:0.0 Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysis.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysisPrinter.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/CApi.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/CacheUtility.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/CallDerivatives.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/DiffeGradientUtils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/DifferentialUseAnalysis.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/Enzyme.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/EnzymeLogic.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/FunctionUtils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/GradientUtils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/InstructionBatcher.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/MustExitScalarEvolution.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/PreserveNVVM.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TraceGenerator.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TraceInterface.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TraceUtils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/Utils.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/SCEV/ScalarEvolutionExpander.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TypeAnalysis/TypeTree.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TypeAnalysis/TypeAnalysis.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TypeAnalysis/TypeAnalysisPrinter.cpp.obj Enzyme/CMakeFiles/LLVMEnzyme-18.dir/TypeAnalysis/RustDebugInfo.cpp.obj  C:/source/llvm-project/build/lib/LLVMDemangle.lib  -lkernel32 -luser32 -lgdi32 -lwinspool -lshell32 -lole32 -loleaut32 -luuid -lcomdlg32 -ladvapi32 -loldnames  && cd ."
lld-link: warning: ignoring unknown argument '--gc-sections'
lld-link: error: undefined symbol: void __cdecl llvm::deallocate_buffer(void *, unsigned int, unsigned int)
>>> referenced by Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysis.cpp.obj:(void __cdecl `dynamic atexit destructor for 'InactiveGlobals''(void))
>>> referenced by Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysis.cpp.obj:(void __cdecl `dynamic atexit destructor for 'MPIInactiveCommAllocators''(void))
>>> referenced by Enzyme/CMakeFiles/LLVMEnzyme-18.dir/ActivityAnalysis.cpp.obj:(void __cdecl `dynamic atexit destructor for 'KnownInactiveFunctionInsts''(void))
>>> referenced 485 more times

(many more)

Steps to reproduce

  1. Build LLVM as outlined in the Enzyme instructions.
  2. Clone Enzyme.
  3. In Enzyme/enzyme/CMakeLists.txt:
    • Delete the -fPIC compiler flag. This is not needed or supported on Windows. (make fpic work on windows #1614)
    • Add cmake_policy(SET CMP0091 NEW). This fixes the C runtime library link mismatch.
    • Add add_compile_definitions(_ALLOW_KEYWORD_MACROS). This fixes the macro redefinition error.
  4. Open Developer PowerShell for VS 2022. It comes with Visual Studio.
  5. Set the compiler to Clang. Your paths will be different.
    $env:CC = "C:\path\to\llvm\bin\clang"
    $env:CXX = "C:\path\to\llvm\bin\clang++"
    $env:RC = "C:\path\to\llvm\bin\llvm-rc"
    
  6. Build Enzyme.
    cd enzyme
    mkdir build
    cd build
    cmake -G Ninja .. -DLLVM_DIR=C:/source/llvm-project/build/lib/cmake/llvm -DCMAKE_BUILD_TYPE:STRING=Release
    ninja
    

@ZuseZ4
Copy link
Member

ZuseZ4 commented Jan 15, 2024

Looks like unfortunately no one is too familiar with this issue. Do you want to join the next Enzyme meeting, so we can all try to debug this together? It would be Wednesday, 16:00 ET on Zoom: https://mit.zoom.us/j/96000853439

@OwenTrokeBillard
Copy link
Author

@ZuseZ4 Sure, it would be my pleasure. Thank you for the invite.

@tthsqe12
Copy link
Contributor

Building a clang that can accept enzyme on windows with MSVC is tricky because not everything that enzyme needs from clang/opt is exported in the export table.

@jvstech
Copy link

jvstech commented Jan 18, 2024

I won't be much help until maybe tomorrow, but after a very cursory glance, you'll have to link against LLVMSupport.lib (which should be handled automatically by LLVM's CMake modules, and you may be doing this already). You may also get better results using clang-cl instead of clang++, but that's tertiary to the issue at hand where a symbol isn't being found due to the support library being excluded from the linking process for some reason.

My apologies for the generic answer. I will be back in less than 24 hours with a (hopefully) better explanation and (again, hopefully), a solution – if no one else has solved it by then.


I'm updating my debug build of Clang+LLVM to 17.0.6. Once that build finishes, I'll clone the repo and take a much closer look. In the meantime, I'll look through the CMakeLists.txt to see if there is anything super obvious that I didn't catch yesterday.

@jvstech
Copy link

jvstech commented Jan 19, 2024

Okay, I think I see what's happening. I'm still building LLVM at the moment, but after taking a closer look at enzyme/Enzyme/CMakeLists.txt:

# on windows `PLUGIN_TOOL` doesn't link against LLVM.dll
if ((WIN32 OR CYGWIN) AND LLVM_LINK_LLVM_DYLIB)
add_llvm_library( LLVMEnzyme-${LLVM_VERSION_MAJOR}
${ENZYME_SRC}
MODULE
DEPENDS
intrinsics_gen
LINK_COMPONENTS
LLVM
)
if (${Clang_FOUND})
add_llvm_library( ClangEnzyme-${LLVM_VERSION_MAJOR}
${ENZYME_SRC} Clang/EnzymeClang.cpp
Clang/EnzymePassLoader.cpp
MODULE
DEPENDS
intrinsics_gen
LINK_COMPONENTS
LLVM
)
target_compile_definitions(ClangEnzyme-${LLVM_VERSION_MAJOR} PUBLIC ENZYME_RUNPASS)
endif()
add_llvm_library( LLDEnzyme-${LLVM_VERSION_MAJOR}
${ENZYME_SRC} Clang/EnzymePassLoader.cpp
MODULE
DEPENDS
intrinsics_gen
LINK_COMPONENTS
LLVM
)

If you're linking against LLVM.dll (which is what the comment indicates), there is a very good chance you're running into a fundamental limitation of PE files: the exports table size is limited to 16 bits, and there are far more than 65536 symbols in LLVM. The one you're missing is likely in that list. Unfortunately, this means you'll have to link directly against LLVM's support library, and maybe even statically.

It should be as simple as adding Support under LINK_COMPONENTS, but I'm not sure how LLVM_LINK_LLVM_DYLIB changes linking logic. LLVMSupport.dll may not exist if LLVM was configured to build LLVM.dll. If it fails, the static library should still be there, and you'll have to link against that.

llvm_map_components_to_libnames(llvmLibs Support)
target_link_libraries(LLVMEnzyme-${LLVM_VERSION_MAJOR} PRIVATE ${llvmLibs})

If that doesn't work, try manually linking without having LLVM "help" you:

target_link_libraries(LLVMEnzyme-${LLVM_VERSION_MAJOR}
  PRIVATE LLVMSupport)
# Or:
target_link_libraries(LLVMEnzyme-${LLVM_VERSION_MAJOR}
  PRIVATE LLVMSupport.lib)

I haven't looked at the Enzyme source code, but if it exposes anything from the support library, you'll have to do PUBLIC linkage instead.


I gave a talk on building LLVM plugins for Windows under Visual Studio. You may not be building directly with MSVC, but even so, you may run into other problems getting add_llvm_library to work with the toolchain -- and I briefly covered that. In my demo repo, I'm using add_library directly -- though the "normal" way would be to use add_llvm_pass_plugin. See lines 1 through 19.

If you do run into problems, you may have to change the function you use for defining your library targets -- specifically for Windows + MSVC.

@OwenTrokeBillard
Copy link
Author

@jvstech Thank you for looking at this issue. The linker errors persist with your suggestions, but your insights are still informative. I found your talk especially helpful and well presented. Your demo repo will also be a great resource.

We investigated LLVM.dll and found that it does not exist. I believe this is because LLVM_LINK_LLVM_DYLIB is not supported on Windows. Linking the libraries statically sounds reasonable, and I believe (or rather hope) this is what the build is attempting to do.

In your presentation you discuss many tricky environmental settings that must be configured correctly. It is plausible that one such issue is causing the linker error. If your environment is working already, it would be a great help if you could attempt to build Enzyme with Clang + MSVC. Though, you have done a great deal already and have given us plenty to work with.

@jvstech
Copy link

jvstech commented Jan 20, 2024

@OwenTrokeBillard Not a problem. Happy to help in any way I can. I wasn't sure if LLVM.dll was a thing, myself. I knew LLVM-C.dll could be built, but obviously, that serves a different purpose.

It took a while (I had to re-configure LLVM with LLVM_ENABLE_PLUGINS), but I got to the error you described. I was able to get past it by

  1. changing MODULE to SHARED in the add_llvm_library definitions, and
  2. adding target_link_libraries(LLVMEnzyme-${LLVM_VERSION_MAJOR} PUBLIC LLVMSupport)

That said, once I got past that missing symbol, there were several other libraries that that had to be added. Additionally, there were symbols the linker insisted were missing (from LLVMSupport.lib, LLVMAnalysis.lib, and LLVMCore.lib, although I believe there were others), yet they were clearly defined in their associated .lib files, which could be verified with dumpbin /symbols. For example, one of the "missing" functions was llvm::Function::BuildLazyArguments. And yet:

─▷ dumpbin /symbols C:\LLVM\src\llvm\build\llvm-release\install\lib\LLVMCore.lib | select-string 'BuildLazyArguments'

<snip>
15F 00000000 SECT4B notype ()    External     | ?BuildLazyArguments@Function@llvm@@AEBAXXZ (private: void __cdecl llvm::Function::BuildLazyArguments(void)const )
<snip>

It's very clearly listed with its defined section and with external linkage. The only thing I can think of is add_llvm_library is doing something else to the build commands (indirectly) to cause those symbols to not be visible by the linker. And that's a very, very rough guess.

In the meantime, I'm going to try adding this function to the CMakeLists.txt file and change all the add_llvm_library uses to it:

function(add_enzyme_library targetName)
    cmake_parse_arguments(ELIB
        ""
        ""
        "LINK_COMPONENTS"
        ${ARGN})
    if ((WIN32 OR MSVC) AND NOT CYGWIN AND NOT MINGW)
        add_library(${targetName} SHARED ${ELIB_UNPARSED_ARGUMENTS})
        if (ELIB_LINK_COMPONENTS)
            llvm_map_components_to_libnames(ENZYME_WINDOWS_LLVM_LIBS ${ELIB_LINK_COMPONENTS})
        endif()
        if (LLVM_LINK_COMPONENTS)
            list(APPEND ELIB_LINK_COMPONENTS ${LLVM_LINK_COMPONENTS})
        endif()
        if (ELIB_LINK_COMPONENTS)
            target_link_libraries(${targetName} PUBLIC ${ENZYME_WINDOWS_LLVM_LIBS})
        endif()
    else()
        add_llvm_library(${targetName} MODULE ${ELIB_UNPARSED_ARGUMENTS})
    endif()
endfunction()

I'll let you know if I get any further.

@jvstech
Copy link

jvstech commented Jan 20, 2024

By the way, you may be interested in this draft pull request with a prototype for building LLVM as DLLs on Windows. It's not done yet, but there is effort being made to get there.

@jvstech
Copy link

jvstech commented Jan 21, 2024

AH HA! I figured it out!

#if LLVM_VERSION_MAJOR >= 16
#define private public
#include "llvm/Analysis/ScalarEvolution.h"
#include "llvm/Transforms/Utils/ScalarEvolutionExpander.h"
#undef private
#else
#include "SCEV/ScalarEvolution.h"
#include "SCEV/ScalarEvolutionExpander.h"
#endif

And, specifically:

#define private public

Do not do this. It's forbidden by the C++ standard because those access specifiers are part of the ABI. I know that's what the _ALLOW_KEYWORD_MACROS definition is for, but unfortunately, that causes this particular error.

The giveaway was running lld-link --verbose during linking. Note the error:

lld-link: error: undefined symbol: public: __cdecl llvm::BasicBlock::BasicBlock(class llvm::LLVMContext &, class llvm::Twine const &, class llvm::Function *, class llvm::BasicBlock *)

However, with verbose linker output, we can also see this:

lld-link: Loaded LLVMCore.lib(BasicBlock.cpp.obj) for private: __cdecl llvm::BasicBlock::BasicBlock(class llvm::LLVMContext &, class llvm::Twine const &, class llvm::Function *, class llvm::BasicBlock *)

The only difference is that the found symbol is private: while the missing symbol is public:.

I've removed the #define private public and #undef private code for further testing. I'm assuming there's a reason for doing it (and I assume that's to have access to private class state), but there must be a better work-around.

@ZuseZ4
Copy link
Member

ZuseZ4 commented Jan 21, 2024

Thank you a lot for helping with this, having MSCV support will save us from workarounds on the Rust side.

@wsmoses Do you remember why this line was added?

@wsmoses
Copy link
Member

wsmoses commented Jan 21, 2024 via email

@wsmoses
Copy link
Member

wsmoses commented Jan 21, 2024 via email

@wsmoses
Copy link
Member

wsmoses commented Jan 21, 2024 via email

@OwenTrokeBillard
Copy link
Author

@jvstech Thank you for thoroughly investigating this. I never would have suspected redefining access specifiers was the culprit. You have saved me weeks of toiling.

@ZuseZ4
Copy link
Member

ZuseZ4 commented Jan 21, 2024

Just building a patched LLVM+Enzyme to make sure there are no other issues.

@jvstech
Copy link

jvstech commented Jan 21, 2024

Looks like the problem is that the exit limit type in scalar evolution is marked private in LLVM but used by us. This should hopefully be a simple patch to llvm upstream.

There are more issues than just that. There are several direct uses of private member variables and functions.

C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(123,18): error: 'DT' is a private member of 'llvm::ScalarEvolution'
  123 |   if (!Latch || !DT.dominates(ExitingBlock, Latch))
      |                  ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(176,18): error: 'howFarToZero' is a private member of 'llvm::ScalarEvolution'
  176 |   ExitLimit EL = howFarToZero(getMinusSCEV(LHS, RHS), L, ControlsOnlyExit);
      |                  ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(361,10): error: 'computeExitCountExhaustively' is a private member of 'llvm::ScalarEvolution'
  361 |   return computeExitCountExhaustively(L, ExitCond, ExitIfTrue);
      |          ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(500,20): error: 'howManyGreaterThans' is a private member of 'llvm::ScalarEvolution'
  500 |     ExitLimit EL = howManyGreaterThans(LHS, RHS, L, IsSigned, ControlsExit,
      |                    ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(515,10): error: 'computeShiftCompareExitLimit' is a private member of 'llvm::ScalarEvolution'
  515 |   return computeShiftCompareExitLimit(ExitCond->getOperand(0),
      |          ^
C:\Users\jsmith\source\repos\Enzyme\enzyme\Enzyme\MustExitScalarEvolution.cpp(959,9): error: 'canIVOverflowOnLT' is a private member of 'llvm::ScalarEvolution'
  959 |     if (canIVOverflowOnLT(RHS, Stride, IsSigned) && !isUBOnWrap())
      |         ^

and so on.

This is one of those situations where I would make a copy of the required source and mark those members protected: until a patch can be made to expose equivalent logic in LLVM proper. You won't be able to use the same names directly lest you end up with symbol collisions, but that can be worked around by either renaming the needed classes/structs/free functions, or wrapping everything in a custom namespace { ... }. It'll require changes to MustExitScalarEvolution regardless.

It's not perfect, but it's a sterile bandage for the time being.

@jvstech
Copy link

jvstech commented Jan 21, 2024

@jvstech Thank you for thoroughly investigating this. I never would have suspected redefining access specifiers was the culprit. You have saved me weeks of toiling.

Not a problem. It's been a long weekend and I've had some spare cycles, so I'm happy to help.

@wsmoses
Copy link
Member

wsmoses commented Jan 21, 2024

Oh darn, yeah I think we'd just want to replace all private linkages with protected there, which also hopefully wouldn't be that bad.

However the long term solution is that MustExitScalarEvolution should just be upstreamed.

@skewballfox
Copy link

I'm think I fixed this but could someone with a windows system test this out to confirm?
I made the (very minor) required change to LLVM in this branch, and the changes to the cmake txt listed in comment here

I'm testing on fedora with

cmake .. -GNinja -DLLVM_DIR=$HOME/Workspace/llvm-project/build/lib/cmake/llvm -DLLVM_EXTERNAL_LIT=(which lit) -DCMAKE_CPP_COMPILER=clang-cl

clang-cl is supposed to be compatible with MSVC, so I expect it should build. let me know if it doesn't, or if you need more information on my end

@tgymnich
Copy link
Member

@skewballfox I personally don’t have access to a windows machine. However GitHub actions offer windows runners.

@ZuseZ4
Copy link
Member

ZuseZ4 commented Feb 26, 2024

@jedbrown iirc you had a windows machine, if you have some time, could you verify this builds please? Let's try without Rust for now, that should be easy to add later.

@wsmoses
Copy link
Member

wsmoses commented Feb 26, 2024

cc @matinraayai who was starting to look at some of the linkage upstreaming

@wsmoses
Copy link
Member

wsmoses commented Feb 26, 2024

@skewballfox once the linkages are fixed in upstream LLVM, the correct fix to Enzyme is to remove the define private public things.

@skewballfox
Copy link

once the linkages are fixed in upstream

is there an issue I can look at to get a bit more context? sorry new to contributing to llvm/enzyme

@ZuseZ4
Copy link
Member

ZuseZ4 commented Feb 26, 2024

once the linkages are fixed in upstream

is there an issue I can look at to get a bit more context? sorry new to contributing to llvm/enzyme

https://github.com/EnzymeAD/Enzyme/pull/1619/files

@wsmoses
Copy link
Member

wsmoses commented Feb 26, 2024

No there isn't an issue (besides this one), but essentially your LLVM commit needs to be merged upstream into LLVM. Supposing your fix exists in LLVM, the correct thing to do in Enzyme is to remove these defines (

#define private public
). Or rather, the correct thing to do is to version gate them so the defines only happen on a version of LLVM without the fix

@skewballfox
Copy link

skewballfox commented Feb 26, 2024

No there isn't an issue (besides this one), but essentially your LLVM commit needs to be merged upstream into LLVM.

I submitted a PR with the one line change to LLVM. I'll update you if the status changes

@skewballfox
Copy link

Hey, anyone know of anything projects downstream of enzyme making use of MustExitScalarEvolution? I upstreamed the code from the function overrides into LLVM, and it passes LLVM's existing test, but MESE was never tested enzyme side

@skewballfox
Copy link

So I started looking throught the commit history and associated issues for MustExitScalarEvolution.cpp to try to find code where this is being called. Can Anyone explain how @turbo is leading to mustexit being called in #1691 ?

Alternatively, anyone have ideas, suggestions, or links that could be useful for writing test for MESE's code?

@ZuseZ4
Copy link
Member

ZuseZ4 commented Mar 7, 2024

@turbo comes from the Julia side, I wouldn't try running tests from Julia.
@wsmoses Can you help him here or on the llvm pr to get this code tested? ripgrep didn't reveal any meaningful tests tailored towards this code

@wsmoses
Copy link
Member

wsmoses commented Mar 7, 2024

I'll try to take a look in a few days, but as you saw we didn't have any explicit unit tests for the new functionality so you're going to need to write them by hand

@wsmoses
Copy link
Member

wsmoses commented Mar 7, 2024

Probably that involves forking the existing SE tests that I referenced in the LLVM PR, adding a flag to parse the must exit functionality, and then testing the new functionality behaves as expected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants