Windows Installation Issue #124

Open
vbalmer opened this issue Apr 30, 2024 · 31 comments · May be fixed by #137
Labels: bug (Something isn't working), hackathon, help wanted (Extra attention is needed)

Comments

@vbalmer

vbalmer commented Apr 30, 2024

I am rather new to the Fortran environment on Windows and have run into trouble during installation. Even though I followed the Windows instructions on the website (https://cambridge-iccs.github.io/FTorch/page/troubleshooting.html), the following error consistently shows up when running the "cmake --build ." command:

  1. “fatal error LNK1181: cannot open input file ‘mkl_blacs_ilp64.lib’” (see attached screenshot). Apparently I am not the only person with this issue (https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-oneAPI-Base-Toolkit-v-2024-1-Error-mkl-blacs-ilp64-lib/m-p/1587668#M3636), but changing the environment variables did not solve it for me either. For reference, I have attached the full command lines that I used for installation as a txt file. I worked around this error with a very messy fix, copying and renaming that specific file in the Intel compiler directory, in order to test the rest of the workflow, where I then received further errors (see 2.).
  2. When I try to run the SimpleNet example, the following error shows up: unresolved external symbol FTORCH_mp_TORCH_TENSOR_FROM_ARRAY_REAL32_1D referenced in function MAIN__
    (and additional functions, see the second attached image). Is this error related to the first? Where is this symbol defined?

The following changes to the installation have also been tried out but were not successful:

  1. running
    cmake .. -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release-DCMAKE_PREFIX_PATH="C:\Users\vbalmer\Anaconda3\envs\Py-F-02\Lib\site-packages\torch" -DCMAKE_INSTALL_PREFIX="C:\Users\vbalmer\AppData\Local\FTorch"
    --> doesn't successfully run cmake command
  2. including the old Fortran compiler ifort instead of ifx --> leads to same error as outlined above

Thank you very much for your help in advance!

Error_Screenshot (005)
BuildInstructions_Clean.txt
Error_SimpleNet_Screenshot

@jatkinson1000
Member

Hi @vbalmer

Thanks for bringing this to our attention and trying those simple suggestions.

I have now spun up a Windows VM and successfully replicated the problem you are having (and also found an opportunity to improve some of our documentation in #123) which is the first step.

I'll work on finding a solution and get back to you once we have more news.
If you make any progress in the meantime please do let us know!

jatkinson1000 self-assigned this May 1, 2024
jatkinson1000 added the bug and help wanted labels May 1, 2024

@jatkinson1000
Member

Adding notes:

There seems to be an issue locating MKL (the Math Kernel Library used for mathematical operations) correctly.

This source suggests that it could be because calling setvars.bat sets a number of environment variables, including the CMAKE_PREFIX_PATH required for linking, and we are perhaps then overwriting this when we set -DCMAKE_PREFIX_PATH in the cmake command.

To check this I have tried running CMake with -DTorch_DIR=<path> instead of -DCMAKE_PREFIX_PATH=<path>, but still get the same error.

I tried setting the MKL_ROOT environment variable but, as @vbalmer found above, had no luck:

set MKL_ROOT="C:/Program Files (x86)/Intel/oneAPI/mkl/latest"

I also tried setting it to C:/Program Files (x86)/Intel/oneAPI/mkl, which is the actual location on my machine. Still no luck.
Interestingly, however, CMake does output a warning saying that it is ignoring this variable for compatibility because policy CMP0074 is not set:

CMake Warning (dev) at C:/Users/Test/Downloads/libtorch-win-shared-with-deps-2.3.0+cpu/libtorch/share/cmake/Caffe2/public/mkl.cmake:1 (find_package):
  Policy CMP0074 is not set: find_package uses <PackageName>_ROOT variables.
  Run "cmake --help-policy CMP0074" for policy details.  Use the cmake_policy
  command to set the policy and suppress this warning.

  Environment variable MKL_ROOT is set to:

    "C:/Program Files (x86)/Intel/oneAPI/mkl"

  For compatibility, CMake is ignoring the variable.

It seems that, despite our setting it, CMake is ignoring the location to link from, so a next step might be to try hard-coding the path to see whether we can force a successful link at all.

I am running cmake with the --trace flag for more output, but this is a bit too much.
Running cmake build with verbose is helpful, however:

cmake --build . --verbose

@jwallwork23 these may be useful as a starting point if you get chance to look.

@jatkinson1000
Member

jatkinson1000 commented May 1, 2024

Related issues and PRs on the PyTorch repo:

Some suggest that it is an issue with the FindMKL file supplied with PyTorch at pytorch/cmake/Modules/FindMKL.cmake

Tried with the nightly build of libtorch to see if there has been a recent fix, but still the same issue.

Tried setting -DCMAKE_FIND_PACKAGE_PREFER_CONFIG variable to try and force CMake to use the native FindMKL.cmake file rather than the one provided by PyTorch but the issue persists.

We may want to consider opening an issue on the PyTorch repository, as this seems to be an issue with libtorch.

@vbalmer
Author

vbalmer commented May 6, 2024

Hi Jack,
thank you very much for looking into this problem in more detail. Until now I have not managed to fix the bug even with trying out different ways of setting the environment variables. I also have to admit that this is a bit out of my familiar programming zone...
Have you opened an issue on the PyTorch repository yet or found anything else? I would be eager to also follow the progress on the pytorch issue if you open one.
Thank you for your help!

@jatkinson1000
Member

Hi @vbalmer I'm afraid not - I was away presenting at a conference for the last 3 days, and am now on leave.
I will look to do it once I am back however.

I want to get this sorted, but in the meantime, are you bound to building your code on Windows, or is there an option for you to build on a Linux system, or using the Windows Subsystem for Linux (WSL)?

@jatkinson1000
Member

OK @vbalmer, I've opened something here: pytorch/pytorch#125871 as you can see linked above.
Let's see what they say.

In the meantime I would still encourage WSL or a Linux build if possible.
Are you able to tell us what the code you are trying to add FTorch to is, or point us to some source?

@vbalmer
Author

vbalmer commented May 11, 2024

Hi Jack,
similar for me, I was also at a conference the last three days and will be on vacation next week. Still I would like to answer your questions:

  1. I'm afraid the Windows system is a hard boundary condition, so if it does not work I may have to find a workaround without FTorch, which would be very sad though...
  2. For the code that I am trying to use FTorch with, I will send you an e-mail with more detailed information, as there is no publicly available repository yet.

Thank you very much for setting up the issue on PyTorch and for continuing to help me with this!

@jatkinson1000
Member

@vbalmer PyTorch have provided a solution to the first part of your problem on the issue here: pytorch/pytorch#125871 (comment). It requires changing line 9 of the following file in your libtorch installation <path-to-libtorch>\libtorch\share\cmake\Caffe2\public\mkl.cmake to:

target_link_libraries(caffe2::mkl INTERFACE MKL::MKL)

Please let me know if this works in allowing you to build and install FTorch and I will let them know (I have checked and it fixes the problem for me).
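
In case the line number differs between libtorch downloads, the statement can be located by content rather than by position. The following is a minimal Python sketch, not part of FTorch or PyTorch: the function names are hypothetical, and it assumes the line to change is the existing `target_link_libraries(caffe2::mkl INTERFACE ...)` call in mkl.cmake. It keeps a `.bak` copy of the file before writing.

```python
import re
from pathlib import Path

# Matches a whole `target_link_libraries(caffe2::mkl INTERFACE ...)` call on one
# line, capturing its leading indentation in group 1.
# Assumption: mkl.cmake contains this call on a single line.
_PATTERN = re.compile(
    r"^([ \t]*)target_link_libraries\(\s*caffe2::mkl\s+INTERFACE\b[^)\n]*\)",
    re.MULTILINE,
)
_REPLACEMENT = r"\1target_link_libraries(caffe2::mkl INTERFACE MKL::MKL)"


def patch_mkl_cmake_text(text: str) -> tuple[str, int]:
    """Return (patched_text, number_of_matching_calls)."""
    return _PATTERN.subn(_REPLACEMENT, text)


def patch_mkl_cmake_file(path: str) -> int:
    """Patch the file in place, keeping a .bak copy; return matches found."""
    target = Path(path)
    original = target.read_text(encoding="utf-8")
    patched, count = patch_mkl_cmake_text(original)
    # Only touch the file when the text actually changes.
    if count and patched != original:
        target.with_name(target.name + ".bak").write_text(original, encoding="utf-8")
        target.write_text(patched, encoding="utf-8")
    return count
```

It would be called with the path given above, for example `patch_mkl_cmake_file(r"<path-to-libtorch>\libtorch\share\cmake\Caffe2\public\mkl.cmake")`; a return value of 0 means no matching line was found and the file was left untouched.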

There is a follow-on issue when I try to build the first example: it seems unable to match the type of the tensor_layout argument.
Running the example 'as-is' results in:

C:\Users\Test\Documents\FTorch\examples\1_SimpleNet\simplenet_infer_fortran.f90(40): error #6284: There is no matching specific function for this generic function reference.   [TORCH_TENSOR_FROM_ARRAY]
   in_tensors(1) = torch_tensor_from_array(in_data, tensor_layout, torch_kCPU, -1, .false.)
-------------------^
C:\Users\Test\Documents\FTorch\examples\1_SimpleNet\simplenet_infer_fortran.f90(41): error #6284: There is no matching specific function for this generic function reference.   [TORCH_TENSOR_FROM_ARRAY]
   out_tensor = torch_tensor_from_array(out_data, tensor_layout, torch_kCPU, -1, .false.)
----------------^
compilation aborted for C:\Users\Test\Documents\FTorch\examples\1_SimpleNet\simplenet_infer_fortran.f90 (code 1)
NMAKE : fatal error U1077: 'C:\PROGRA~2\Intel\oneAPI\compiler\latest\bin\ifort.exe @C:\Users\Test\AppData\Local\Temp\nm3C0A.tmp' : return code '0x1'
Stop.

However, if I change the function call in simplenet_infer_fortran.f90 to explicitly target torch_tensor_from_array_real32_1d() rather than the interface, I get:

C:\Users\Test\Documents\FTorch\examples\1_SimpleNet\simplenet_infer_fortran.f90(40): error #6633: The type of the actual argument differs from the type of the dummy argument.   [TENSOR_LAYOUT]
   in_tensors(1) = torch_tensor_from_array_real32_1d(in_data, tensor_layout, torch_kCPU, -1, .false.)
--------------------------------------------------------------^
C:\Users\Test\Documents\FTorch\examples\1_SimpleNet\simplenet_infer_fortran.f90(41): error #6633: The type of the actual argument differs from the type of the dummy argument.   [TENSOR_LAYOUT]
   out_tensor = torch_tensor_from_array_real32_1d(out_data, tensor_layout, torch_kCPU, -1, .false.)
------------------------------------------------------------^
compilation aborted for C:\Users\Test\Documents\FTorch\examples\1_SimpleNet\simplenet_infer_fortran.f90 (code 1)
NMAKE : fatal error U1077: 'C:\PROGRA~2\Intel\oneAPI\compiler\latest\bin\ifort.exe @C:\Users\Test\AppData\Local\Temp\nm45A.tmp' : return code '0x1'
Stop.

This implies a type mismatch between the integer-array type used in the example and what it is looking to bind to.

It is odd that this only occurs on Windows (I have just checked and it runs fine on unix-based OSX).
Please could you take a look @TomMelt as it initially shows up as a complaint in the interface which you will know best.

I did note that there are a number of warnings when compiling FTorch against libtorch regarding conversion from 'size_t' to 'int' which are likely related. I have asked for advice on the issue we have open on PyTorch as this only seems to occur on Windows systems.

C:\Users\Test\Documents\FTorch\src\build_latest>cmake --build .
[ 33%] Building Fortran object CMakeFiles/ftorch.dir/ftorch.f90.obj
[ 66%] Building CXX object CMakeFiles/ftorch.dir/ctorch.cpp.obj
ctorch.cpp
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\optional(82): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\optional(82): note: the template instantiation context (the oldest one first) is
C:\Users\Test\Downloads\libtorch-win-shared-with-deps-latest\libtorch\include\ATen/core/function_schema.h(437): note: see reference to function template instantiation 'std::optional<int32_t>::optional<const I,0>(_Ty2 &&) noexcept' being compiled
        with
        [
            I=unsigned __int64,
            _Ty2=unsigned __int64
        ]
C:\Users\Test\Downloads\libtorch-win-shared-with-deps-latest\libtorch\include\ATen/core/function_schema.h(437): note: see the first reference to 'std::optional<int32_t>::optional' in 'c10::FunctionSchema::argumentIndexWithName'
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\optional(248): note: see reference to function template instantiation 'std::_Optional_construct_base<_Ty>::_Optional_construct_base<const unsigned __int64>(std::in_place_t,const unsigned __int64 &&)' being compiled
        with
        [
            _Ty=int32_t
        ]
C:\Users\Test\Documents\FTorch\src\ctorch.cpp(262): note: see reference to function template instantiation 'std::_Optional_destruct_base<_Ty,true>::_Optional_destruct_base<const unsigned __int64>(std::in_place_t,const unsigned __int64 &&) noexcept' being compiled
        with
        [
            _Ty=int32_t
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xutility(255): warning C4267: 'initializing': conversion from 'size_t' to '_Ty', possible loss of data
        with
        [
            _Ty=int
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xutility(255): note: the template instantiation context (the oldest one first) is
C:\Users\Test\Downloads\libtorch-win-shared-with-deps-latest\libtorch\include\torch/csrc/dynamo/compiled_autograd.h(439): note: see reference to function template instantiation '_Ty &std::vector<_Ty,std::allocator<_Ty>>::emplace_back<size_t&>(size_t &)' being compiled
        with
        [
            _Ty=int32_t
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\vector(862): note: see reference to function template instantiation '_Ty &std::vector<_Ty,std::allocator<_Ty>>::_Emplace_one_at_back<size_t&>(size_t &)' being compiled
        with
        [
            _Ty=int32_t
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\vector(780): note: see reference to function template instantiation '_Ty &std::vector<_Ty,std::allocator<_Ty>>::_Emplace_back_with_unused_capacity<size_t&>(size_t &)' being compiled
        with
        [
            _Ty=int32_t
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\vector(795): note: see reference to function template instantiation 'void std::_Construct_in_place<int,size_t&>(_Ty &,size_t &) noexcept' being compiled
        with
        [
            _Ty=int
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xmemory(727): warning C4267: 'initializing': conversion from 'size_t' to '_Objty', possible loss of data
        with
        [
            _Objty=int
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xmemory(727): note: the template instantiation context (the oldest one first) is
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\vector(783): note: see reference to function template instantiation 'int *std::vector<int32_t,std::allocator<int>>::_Emplace_reallocate<size_t&>(int *const ,size_t &)' being compiled
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\vector(834): note: see reference to function template instantiation 'void std::_Default_allocator_traits<_Alloc>::construct<_Ty,size_t&>(_Alloc &,_Objty *const ,size_t &)' being compiled
        with
        [
            _Alloc=std::allocator<int>,
            _Ty=int,
            _Objty=int
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\utility(250): warning C4267: 'initializing': conversion from 'size_t' to '_Ty1', possible loss of data
        with
        [
            _Ty1=int
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\utility(250): note: the template instantiation context (the oldest one first) is
C:\Users\Test\Downloads\libtorch-win-shared-with-deps-latest\libtorch\include\torch/csrc/dynamo/compiled_autograd.h(433): note: see reference to function template instantiation '_Ty &std::vector<_Ty,std::allocator<_Ty>>::emplace_back<size_t&,int&>(size_t &,int &)' being compiled
        with
        [
            _Ty=std::pair<int,int>
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\vector(862): note: see reference to function template instantiation '_Ty &std::vector<_Ty,std::allocator<_Ty>>::_Emplace_one_at_back<size_t&,int&>(size_t &,int &)' being compiled
        with
        [
            _Ty=std::pair<int,int>
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\vector(780): note: see reference to function template instantiation '_Ty &std::vector<_Ty,std::allocator<_Ty>>::_Emplace_back_with_unused_capacity<size_t&,int&>(size_t &,int &)' being compiled
        with
        [
            _Ty=std::pair<int,int>
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\vector(795): note: see reference to function template instantiation 'void std::_Construct_in_place<std::pair<int,int>,size_t&,int&>(_Ty &,size_t &,int &) noexcept' being compiled
        with
        [
            _Ty=std::pair<int,int>
        ]
C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\include\xutility(255): note: see reference to function template instantiation 'std::pair<int,int>::pair<size_t&,int&,0>(_Other1,_Other2) noexcept' being compiled
        with
        [
            _Other1=size_t &,
            _Other2=int &
        ]
[100%] Linking CXX shared library ftorch.dll
[100%] Built target ftorch

@vbalmer
Author

vbalmer commented May 22, 2024

Thanks Jack! This installation is now also working for me 😊 A few elaborations on it below:

  • I was also able to build the library by including your suggested change to the mkl.cmake file in libtorch
C:\Users\vbalmer\Documents\GitHub\FTorch\src\build>cmake --build .
[ 33%] Building Fortran object CMakeFiles/ftorch.dir/ftorch.f90.obj
[ 66%] Building CXX object CMakeFiles/ftorch.dir/ctorch.cpp.obj
[100%] Linking CXX shared library ftorch.dll
[100%] Built target ftorch
  • When I try to run the same example that you also tried (SimpleNet), I receive the following error (which does not seem to be exactly the same one as you get…). Do you have any idea on how to resolve this issue or know whether it might be related to the one you showed above?
C:\Users\vbalmer\OneDrive\Dokumente\GitHub\research_vera\02_Computations\05_PipelineTesting\02_ts2F\Trial_SimpleNet\build>cmake --build .
[ 50%] Building Fortran object CMakeFiles/simplenet_infer_fortran.dir/simplenet_infer_fortran.f90.obj
[100%] Linking Fortran executable simplenet_infer_fortran.exe
LINK: command "C:\PROGRA~2\Intel\oneAPI\compiler\latest\bin\ifx.exe /nologo @CMakeFiles\simplenet_infer_fortran.dir\objects1.rsp /Qoption,link,/machine:x64 /Qoption,link,/INCREMENTAL:NO /Qoption,link,/subsystem:console C:\Users\vbalmer\AppData\Local\FTorch\lib\ftorch.lib user32.lib /link /out:simplenet_infer_fortran.exe /implib:simplenet_infer_fortran.lib /pdb:C:\Users\vbalmer\OneDrive\Dokumente\GitHub\research_vera\02_Computations\05_PipelineTesting\02_ts2F\Trial_SimpleNet\build\simplenet_infer_fortran.pdb /version:0.0 /MANIFEST:EMBED,ID=1" failed (exit code 1120) with the following output:
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_TENSOR_FROM_ARRAY_REAL32_1D referenced in function MAIN__
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_MODULE_LOAD referenced in function MAIN__
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_MODULE_FORWARD referenced in function MAIN__
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_MODULE_DELETE referenced in function MAIN__
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_TENSOR_DELETE referenced in function MAIN__
simplenet_infer_fortran.exe : fatal error LNK1120: 5 unresolved externals
NMAKE : fatal error U1077: '"C:\Program Files (x86)\CMake\bin\cmake.exe" -E vs_link_exe --intdir=CMakeFiles\simplenet_infer_fortran.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests -- C:\PROGRA~2\Intel\oneAPI\compiler\latest\bin\ifx.exe /nologo @CMakeFiles\simplenet_infer_fortran.dir\objects1.rsp @C:\Users\vbalmer\AppData\Local\Temp\nm5F13.tmp' : return code '0xffffffff'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\nmake.exe" -s -f CMakeFiles\simplenet_infer_fortran.dir\build.make /nologo -SL                 CMakeFiles\simplenet_infer_fortran.dir\build' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\nmake.exe" -s -f CMakeFiles\Makefile2 /nologo -LS                 all' : return code '0x2'
Stop.
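
To see at a glance which procedures the linker could not find, the LNK2019 lines can be decoded. Intel Fortran on Windows names a module procedure `MODULE_mp_PROCEDURE` in upper case, so `FTORCH_mp_TORCH_MODULE_LOAD` is the procedure `torch_module_load` from the `ftorch` module. A minimal Python sketch of this decoding (the helper name is hypothetical):

```python
import re

# One MSVC linker line per unresolved symbol, of the form
# "<obj> : error LNK2019: unresolved external symbol NAME referenced in function CALLER"
_LNK2019 = re.compile(
    r"error LNK2019: unresolved external symbol (\S+) referenced in function (\S+)"
)


def unresolved_symbols(linker_output: str):
    """Return a list of (module, procedure, raw_symbol), one per LNK2019 line.

    module and procedure are None when the symbol lacks the "_mp_" separator
    that Intel Fortran uses for module procedures on Windows.
    """
    results = []
    for match in _LNK2019.finditer(linker_output):
        symbol = match.group(1)
        module, sep, procedure = symbol.partition("_mp_")
        if sep:
            results.append((module, procedure, symbol))
        else:
            results.append((None, None, symbol))
    return results
```

Applied to the output above, every entry comes back with module `FTORCH`, which suggests the example was compiled against the ftorch module but the link step did not pick up the compiled procedures.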

I had previously built the required files with the commands below, maybe there’s something missing there?
It would be great if the README for the SimpleNet example could be amended with a line that states how to build the files on a Windows system (i.e. including the -G "NMake Makefiles" option and any additional paths to the compiler). I can also push this change to the repo if you’d like me to.

mkdir build
cd build
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
cmake .. -G "NMake Makefiles" -DCMAKE_Fortran_COMPILER="C:\Program Files (x86)\Intel\oneAPI\compiler\latest\bin\ifx.exe" -DCMAKE_PREFIX_PATH="C:\Users\vbalmer\AppData\Local\FTorch" -DCMAKE_BUILD_TYPE=Release	

  • As a small sidenote: The line I needed to change in the libtorch folder (in the mkl.cmake file) concerned line 8 for me, not line 9

  • Another idea: As mantaionut pointed out in the pytorch issue (Error linking to libtorch (specifically mkl) on Windows with OneAPI and Visual Studio pytorch/pytorch#125871), the problem is likely linked to the new version of mkl (2024.1), which is based on the newest distribution of oneAPI. As an intermediary fix, would it maybe be an idea to advise people not to download the newest but rather a 2023 version of the oneAPI?

Thank you so much for your help with this! I hope that we'll soon be able to fix this such that FTorch works for a Windows machine as well.

@vbalmer
Author

vbalmer commented May 30, 2024

Hi @jatkinson1000
since I have not heard back from you in a while, I was wondering whether you have already figured out a solution to this problem? I would still be highly interested in figuring out a way to make this work. Thanks for getting back to me!

@jatkinson1000
Member

I was on leave last week but am back now (though will be at PASC in Zurich next week).

One suggestion based on your comments would be to try the 2023 release of OneAPI to see if the process is smoother.
If so, then we can definitely update the Windows installation guidance to direct people to use this version.

The other thing, which may or may not help would be to try using the ifort compiler rather than ifx.
We have tested against gfortran and ifort, but not the ifx generation (though we would hope to have it working with that as well).

As for improving guidance for Windows this is certainly welcome and please do feel free to open a pull request with changes.
I had already started something in #123 based on this discussion, so either there or as a separate contribution is more than welcome!

@vbalmer
Author

vbalmer commented May 30, 2024

Hi @jatkinson1000

thanks for your fast reply!

It would be great to know whether you plan to have another look at this problem once you are back from vacation / conferences, if there is still no fix by then. Thanks again in advance for helping!

@jatkinson1000
Member

@vbalmer Yes, I would very much like to see this fixed and working on Windows, and used in more projects.

I and my colleagues have been looking for a solution, but it takes some time as Windows is not our main development environment.
Your help in this issue pointing us in the right direction is very much appreciated, and as I say contributions are always welcome to improve the experience for future users.

I would also note that the maintenance of FTorch is done in addition to/above our full-time work so sometimes these things can take some time, especially when they are unfamiliar. I very much hope we can work something out. My colleagues are also looking at this when they can and I will do my best to push this forward on my return.

@vbalmer
Author

vbalmer commented May 31, 2024

@jatkinson1000 Okay, perfect that is good to hear! Let me know if you want to meet up at ETH on Monday when you're at PASC. And in the mean time I will for sure give an update if I find something noteworthy!

@vbalmer
Author

vbalmer commented Jun 3, 2024

Hi @jatkinson1000
as a quick update, sadly I receive the same error as already shown above when I try with ifort:

cd C:\Users\vbalmer\OneDrive\Dokumente\GitHub\research_vera\02_Computations\05_PipelineTesting\02_ts2F\Trial_SimpleNet
mkdir build
cd build
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"
cmake .. -G "NMake Makefiles" -DCMAKE_Fortran_COMPILER="C:\Program Files (x86)\Intel\oneAPI\compiler\latest\bin\ifort.exe" -DCMAKE_PREFIX_PATH="C:\Users\vbalmer\AppData\Local\FTorch" -DCMAKE_BUILD_TYPE=Release

leads to

[ 50%] Building Fortran object CMakeFiles/simplenet_infer_fortran.dir/simplenet_infer_fortran.f90.obj
ifort: remark #10448: Intel(R) Fortran Compiler Classic (ifort) is now deprecated and will be discontinued late 2024. Intel recommends that customers transition now to using the LLVM-based Intel(R) Fortran Compiler (ifx) for continued Windows* and Linux* support, new language support, new language features, and optimizations. Use '/Qdiag-disable:10448' to disable this message.
[100%] Linking Fortran executable simplenet_infer_fortran.exe
LINK: command "C:\PROGRA~2\Intel\oneAPI\compiler\latest\bin\xilink.exe /nologo @CMakeFiles\simplenet_infer_fortran.dir\objects1.rsp /out:simplenet_infer_fortran.exe /implib:simplenet_infer_fortran.lib /pdb:C:\Users\vbalmer\OneDrive\Dokumente\GitHub\research_vera\02_Computations\05_PipelineTesting\02_ts2F\Trial_SimpleNet\build\simplenet_infer_fortran.pdb /version:0.0 /machine:x64 /INCREMENTAL:NO /subsystem:console C:\Users\vbalmer\AppData\Local\FTorch\lib\ftorch.lib user32.lib /MANIFEST:EMBED,ID=1" failed (exit code 1120) with the following output:
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_TENSOR_FROM_ARRAY_REAL32_1D referenced in function MAIN__
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_MODULE_LOAD referenced in function MAIN__
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_MODULE_FORWARD referenced in function MAIN__
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_MODULE_DELETE referenced in function MAIN__
simplenet_infer_fortran.f90.obj : error LNK2019: unresolved external symbol FTORCH_mp_TORCH_TENSOR_DELETE referenced in function MAIN__
simplenet_infer_fortran.exe : fatal error LNK1120: 5 unresolved externals
NMAKE : fatal error U1077: '"C:\Program Files (x86)\CMake\bin\cmake.exe" -E vs_link_exe --intdir=CMakeFiles\simplenet_infer_fortran.dir --rc=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\rc.exe --mt=C:\PROGRA~2\WI3CF2~1\10\bin\100226~1.0\x64\mt.exe --manifests -- C:\PROGRA~2\Intel\oneAPI\compiler\latest\bin\xilink.exe /nologo @CMakeFiles\simplenet_infer_fortran.dir\objects1.rsp @C:\Users\vbalmer\AppData\Local\Temp\nmF642.tmp' : return code '0xffffffff'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\nmake.exe" -s -f CMakeFiles\simplenet_infer_fortran.dir\build.make /nologo -SL                 CMakeFiles\simplenet_infer_fortran.dir\build' : return code '0x2'
Stop.
NMAKE : fatal error U1077: '"C:\Program Files\Microsoft Visual Studio\2022\Community\VC\Tools\MSVC\14.39.33519\bin\HostX64\x64\nmake.exe" -s -f CMakeFiles\Makefile2 /nologo -LS                 all' : return code '0x2'
Stop.

I have started to google for possible solutions to this and came across this potential solution: Would one of these point us in the right direction?

@jatkinson1000
Member

jatkinson1000 commented Jun 4, 2024

Ah, that is interesting.
I am currently in PASC sessions so limited resources, but I wonder if it is failing to link at link/runtime.

Sometimes on Linux you need to add the location of the shared library to the LD_LIBRARY_PATH environment variable to allow it to be found at link/runtime.

As a first diagnosis could you try updating the CMake lists in the example to explicitly require FTorch and then try re-running cmake to check it is definitely finding FTorch?
That will require changing this line:

find_package(FTorch)

to:

find_package(FTorch REQUIRED)

and re-running
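
The result of the package search can also be read back from the build directory. CMake stores the outcome of a config-mode `find_package(FTorch)` in CMakeCache.txt as the `FTorch_DIR` entry, and sets it to `FTorch_DIR-NOTFOUND` when nothing was found. A minimal Python sketch of that check (the helper names are hypothetical):

```python
def read_cache_entry(cache_text: str, key: str):
    """Return the value of `key` from CMakeCache.txt contents, or None if absent."""
    for line in cache_text.splitlines():
        line = line.strip()
        # Skip blank lines and comments ("//" help text, "#" section headers).
        if not line or line.startswith(("//", "#")):
            continue
        # Entries have the form NAME:TYPE=VALUE; split on the first "=" so that
        # Windows paths such as C:/... in the value stay intact.
        name_and_type, sep, value = line.partition("=")
        if not sep:
            continue
        name = name_and_type.split(":", 1)[0]
        if name == key:
            return value
    return None


def ftorch_was_found(cache_text: str) -> bool:
    """True when the cache records a usable FTorch_DIR."""
    value = read_cache_entry(cache_text, "FTorch_DIR")
    return value is not None and not value.endswith("-NOTFOUND")
```

For example, `ftorch_was_found(open("build/CMakeCache.txt").read())` run from the example directory would report whether the last configure step located FTorch; the `build` directory name is the one used in the commands earlier in this thread.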

I'm not experienced with the equivalent to LD_LIBRARY_PATH on Windows, but this answer here suggests it is the PATH variable.
In theory building with CMake and setting CMAKE_PREFIX_PATH should handle this.
An alternative way to investigate what is happening might be adapting the Makefile here: https://github.com/Cambridge-ICCS/FTorch/blob/main/examples/1_SimpleNet/Makefile to build in a more traditional sense and see what is going wrong.
Part of this would be appending the path to the ftorch .dll library to PATH before trying to link.
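
Whether a given DLL is reachable through PATH can be checked directly. Windows includes the PATH directories when it searches for DLLs at load time, so this tells you what a built executable would be able to find when it runs; it does not change what the linker sees. A minimal Python sketch (the helper name is hypothetical, and ftorch.dll is the library name from the build output earlier in this thread):

```python
import os


def dirs_containing(filename: str, path_value: str, sep: str = os.pathsep):
    """Return the PATH directories that contain `filename`, in search order."""
    hits = []
    for entry in path_value.split(sep):
        # PATH entries are sometimes quoted or padded; normalise before testing.
        entry = entry.strip().strip('"')
        if entry and os.path.isfile(os.path.join(entry, filename)):
            hits.append(entry)
    return hits


if __name__ == "__main__":
    found = dirs_containing("ftorch.dll", os.environ.get("PATH", ""))
    print("ftorch.dll ->", found if found else "not on PATH")
```

Running it from the same cmd prompt used for the build shows whether the PATH changes actually took effect in that session.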

@vbalmer
Author

vbalmer commented Jun 5, 2024

Hi @jatkinson1000,

here is a summary of what I tried out:

  • adjusting line 14 (as you suggested in your post) --> results in same error as above
  • adjusting the Makefile (https://github.com/Cambridge-ICCS/FTorch/blob/main/examples/1_SimpleNet/Makefile) by setting the compiler to FC = ifort (line 3, as it had previously been set to gfortran) --> results in same error as above
  • additionally adjusting the Makefile by setting LDFLAGS = -L</path/to/installation>/bin/ -lftorch (line 9, instead of LDFLAGS = -L</path/to/installation>/lib/ -lftorch), because the dll file lies in the bin folder, not the lib folder --> results in same error as above
  • adding the full path to the dll file in the Makefile: LDFLAGS = -L"C:\Users\vbalmer\AppData\Local\FTorch\bin" -lftorch (line 9) --> results in same error as above
  • in addition to the previous point, also adjusting line 6 by placing the full path there: FCFLAGS = -O3 -I"C:\Users\vbalmer\AppData\Local\include\ftorch" --> results in same error as above
  • keeping all the above changes and adding the path C:\Users\vbalmer\AppData\Local\FTorch or C:\Users\vbalmer\AppData\Local\FTorch\bin to the Path variable in my windows system variables (according to this description: https://www.java.com/en/download/help/path.html) --> results in same error as above
  • Restarting laptop and trying again with all above things implemented --> results in same error as above

Some thoughts / next ideas that I have:

Thanks again for your help and sorry for not having any better news... :/

CMakeCache.txt
Makefile.txt

jatkinson1000 added this to the Initial Release milestone Jun 12, 2024

@TomMelt
Member

TomMelt commented Jun 17, 2024

Hi @vbalmer, I have been able to reproduce the errors with ifort and ifx. I am looking into it, but I also don't use Windows.

I think there might be an issue with using NMake as the backend. We might have to go down the visual studio route, which is even worse for me 😳 ... I will update if I find a solution.

@vbalmer
Author

vbalmer commented Jun 17, 2024

Hi @TomMelt
thank you very much for the update!
Another thing that crossed my mind recently, but that I have not tested yet, is using gfortran instead of ifort or ifx. FTorch's website suggests downloading the Intel oneAPI Base Toolkit (https://cambridge-iccs.github.io/FTorch/page/troubleshooting.html), but I am not quite sure why. Have you already tried with a gfortran compiler?
Thanks for keeping me updated, I am looking forward to hearing from you again!

@TomMelt
Member

TomMelt commented Jun 19, 2024

hi @vbalmer . I think I have finally fixed it 😅

Can you try these instructions (adjusting for paths etc.)

checkout my test branch

In git bash (or otherwise)
go to the FTorch dir and switch to my test branch

git pull
git switch melt-windows-fix

add patch for libtorch

Make change suggested here to libtorch's cmake file

build ftorch lib

** NOTE ** if you have pre-existing builds please delete them with something like rd /s /q build

** NOTE ** the following instructions are for cmd prompt running in administrator mode

E:\Intel\oneAPI\setvars.bat
cd /d E:\FTorch\src
cmake -Bbuild -G "NMake Makefiles" -DCMAKE_PREFIX_PATH="C:\Users\melt\Downloads\libtorch-win-shared-with-deps-2.3.0+cpu\libtorch" -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER=ifx -DCMAKE_C_COMPILER=icx -DCMAKE_CXX_COMPILER=icx
cmake --build build
cmake --install build

build example 1

cd /d E:\FTorch\examples\1_SimpleNet
cmake -Bbuild -G "NMake Makefiles" -DCMAKE_PREFIX_PATH="C:/Program Files (x86)/FTorch/lib" -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER=ifx
cmake --build build
cmake --install build

set PATH=C:\Users\melt\Downloads\libtorch-win-shared-with-deps-2.3.0+cpu\libtorch\lib;%PATH%
set PATH=C:\Program Files (x86)\FTorch\bin;%PATH%
set PATH=E:\FTorch\examples\1_SimpleNet\venv\Library\bin;%PATH%

run example 1

** NOTE ** you will need to have already generated saved_simplenet_model_cpu.pt as per instructions in README.md

cd build
simplenet_infer_fortran.exe ..\saved_simplenet_model_cpu.pt

You should get output:

E:\FTorch\examples\1_SimpleNet\build>simplenet_infer_fortran.exe ..\saved_simplenet_model_cpu.pt
  0.0000000E+00   2.000000       4.000000       6.000000       8.000000

#itworksinmyVM

@TomMelt TomMelt linked a pull request Jun 19, 2024 that will close this issue
2 tasks
@vbalmer
Author

vbalmer commented Jun 19, 2024

Hi @TomMelt, this is fantastic news, I will try it out asap and let you know whether it works.

@vbalmer
Author

vbalmer commented Jun 21, 2024

Hi @TomMelt

almost everything is working now, thank you so much!! 😍

TL;DR

Everything works except for the final running of the example:
simplenet_infer_fortran.exe ..\saved_simplenet_model_cpu.pt
It throws an error about not finding three DLL files: the two libtorch DLLs c10.dll and torch_cpu.dll, plus libifcoremd.dll. I can find all of them on disk, in C:\Users\vbalmer\libtorch-win-shared-with-deps-2.2.2+cpu\libtorch\lib (the first two DLLs) and C:\Users\vbalmer\Anaconda3\pkgs\icc_rt-2019.0.0-h0cc432a_1\Library\bin (the last DLL).

Details

[Working] Steps on Windows for Installation

Step 1: 
goto cmd in admin mode

Step 2: 
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Step 3: 
cd C:\Users\vbalmer\Documents\GitHub\FTorch\src
[Note: did not create build directory]


Step 4: 
cmake -Bbuild -G "NMake Makefiles" -DCMAKE_PREFIX_PATH="C:\Users\vbalmer\libtorch-win-shared-with-deps-2.2.2+cpu\libtorch" -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER="C:\Program Files (x86)\Intel\oneAPI\compiler\latest\bin\ifx.exe" -DCMAKE_C_COMPILER="C:\Program Files (x86)\Intel\oneAPI\compiler\latest\bin\icx.exe" -DCMAKE_CXX_COMPILER="C:\Program Files (x86)\Intel\oneAPI\compiler\latest\bin\icx.exe"

Output: 
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- Configuring done (1.3s)
-- Generating done (0.3s)
-- Build files have been written to: C:/Users/vbalmer/Documents/GitHub/FTorch/src/build


Step 5: 
cmake --build build

Output: 
[ 33%] Building Fortran object CMakeFiles/ftorch.dir/ftorch.f90.obj
[ 66%] Building CXX object CMakeFiles/ftorch.dir/ctorch.cpp.obj
[100%] Linking CXX shared library ftorch.dll
[100%] Built target ftorch

Step 6: 
cmake --install build

Output: 
-- Install configuration: "Release"
-- Installing: C:/Users/vbalmer/AppData/Local/FTorch/lib/ftorch.lib
-- Installing: C:/Users/vbalmer/AppData/Local/FTorch/bin/ftorch.dll
-- Installing: C:/Users/vbalmer/AppData/Local/FTorch/include/ctorch.h
-- Installing: C:/Users/vbalmer/AppData/Local/FTorch/lib/cmake/FTorch/FTorchConfig.cmake
-- Installing: C:/Users/vbalmer/AppData/Local/FTorch/lib/cmake/FTorch/FTorchConfig-release.cmake
-- Installing: C:/Users/vbalmer/AppData/Local/FTorch/include/ftorch/ftorch.mod

Changes made: Paths, especially the full paths to the compilers

[Working] Steps on Windows for Build

Step 1: 
goto cmd in admin mode

Step 2: 
"C:\Program Files (x86)\Intel\oneAPI\setvars.bat"

Step 3: 
cd C:\Users\vbalmer\Documents\GitHub\FTorch\examples\1_SimpleNet

Step 4: 
cmake -Bbuild -G "NMake Makefiles" -DCMAKE_PREFIX_PATH="C:\Users\vbalmer\AppData\Local\FTorch\lib" -DCMAKE_BUILD_TYPE=Release -DCMAKE_Fortran_COMPILER="C:\Program Files (x86)\Intel\oneAPI\compiler\latest\bin\ifx.exe"


Output: 
CMake Deprecation Warning at CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The Fortran compiler identification is IntelLLVM 2024.1.0 with MSVC-like command-line
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Check for working Fortran compiler: C:/Program Files (x86)/Intel/oneAPI/compiler/latest/bin/ifx.exe - skipped
-- Building with Fortran PyTorch coupling
-- Configuring done (2.8s)
-- Generating done (0.0s)
-- Build files have been written to: C:/Users/vbalmer/Documents/GitHub/FTorch/examples/1_SimpleNet/build



Step 5: 
cmake --build build

Output: 
[ 50%] Building Fortran object CMakeFiles/simplenet_infer_fortran.dir/simplenet_infer_fortran.f90.obj
[100%] Linking Fortran executable simplenet_infer_fortran.exe
[100%] Built target simplenet_infer_fortran

Step 6: 
cmake --install build

Output: 
-- Install configuration: "Release"


Step 7: 
set PATH="C:\Users\vbalmer\libtorch-win-shared-with-deps-2.2.2+cpu\libtorch";%PATH%
set PATH="C:\Users\vbalmer\AppData\Local\FTorch\bin";%PATH%

[no Output]

Step 8: 
run the python parts: 
cd C:\Users\vbalmer\Documents\GitHub\FTorch\examples\1_SimpleNet\venv
python3 -m venv venv
venv\Scripts\activate
pip install -r requirements.txt
python3 simplenet.py
python3 pt2ts.py
deactivate

Output: 
[Installation of python packages, tensor as described on the README]

Step 9: 
set PATH="C:\Users\vbalmer\Documents\GitHub\FTorch\examples\1_SimpleNet\venv";%PATH% 

[no Output]

Changes made:

  • activation of virtual environment for python is different than in Linux
  • adjusted all paths
  • FTorch was installed here: C:\Users\vbalmer\AppData\Local\FTorch, not here: C:\Program Files (x86)\FTorch

[Not working] Running the code

simplenet_infer_fortran.exe ..\saved_simplenet_model_cpu.pt

Throws errors, see screenshots:

The code execution cannot proceed because c10.dll was not found. Reinstalling the program may fix this problem.
The code execution cannot proceed because torch_cpu.dll was not found. Reinstalling the program may fix this problem.
The code execution cannot proceed because libifcoremd.dll was not found. Reinstalling the program may fix this problem

I have tried so far:

  • checked that these files exist on my laptop: the first two are under C:\Users\vbalmer\libtorch-win-shared-with-deps-2.2.2+cpu\libtorch\lib and the third is under C:\Users\vbalmer\Anaconda3\pkgs\icc_rt-2019.0.0-h0cc432a_1\Library\bin
  • used echo %PATH% to check that all set paths are indeed there
  • added the three paths to the environment variables like here (https://www.java.com/en/download/help/path.html)
  • restarted the laptop

None of this worked.

I will try after lunch:

  • to reinstall libtorch and then reinstall FTorch (including all required couplings, as shown above)
  • to link the library C:\Users\vbalmer\libtorch-win-shared-with-deps-2.2.2+cpu\libtorch\lib instead of C:\Users\vbalmer\libtorch-win-shared-with-deps-2.2.2+cpu\libtorch
  • to link the path C:\Users\vbalmer\Anaconda3\pkgs\icc_rt-2019.0.0-h0cc432a_1\Library\bin
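
A note on Step 7 above, offered as a rough diagnostic sketch rather than a confirmed fix. In cmd, `set PATH="C:\dir";%PATH%` stores the double quotes as part of the value, and the entry added in Step 7 points at `libtorch` rather than `libtorch\lib`, which is where the DLLs were found. The Python snippet below lists which PATH directories actually contain each missing DLL and flags entries that contain literal quotes. The helper names `find_on_path` and `quoted_entries` are made up for illustration, and whether the quotes are the cause here is an assumption.

```python
import os
from pathlib import Path


def find_on_path(dll_names, path_value, sep=os.pathsep):
    """Return {dll_name: [PATH entries whose directory contains that file]}."""
    hits = {name: [] for name in dll_names}
    for raw in path_value.split(sep):
        if not raw:
            continue  # skip empty entries produced by a doubled separator
        directory = Path(raw)
        for name in dll_names:
            if (directory / name).is_file():
                hits[name].append(raw)
    return hits


def quoted_entries(path_value, sep=os.pathsep):
    """Return PATH entries that contain a literal double-quote character."""
    return [entry for entry in path_value.split(sep) if '"' in entry]


if __name__ == "__main__":
    # DLL names taken from the error messages reported in this thread.
    wanted = ["c10.dll", "torch_cpu.dll", "libifcoremd.dll"]
    current = os.environ.get("PATH", "")
    for name, dirs in find_on_path(wanted, current).items():
        print(name, "->", dirs if dirs else "not found on PATH")
    for entry in quoted_entries(current):
        print("entry contains literal quotes:", entry)
```

Running this in the same cmd session that is used to start the executable shows the PATH as that session sees it, which can differ from the system-wide environment variables.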

[Screenshot: c10dll_error]

@jatkinson1000
Member

This is great news.

The following may be useful:

Both suggest you need the VS redistributable installer: https://learn.microsoft.com/en-us/cpp/windows/latest-supported-vc-redist?view=msvc-170
Though I confess I don't know in detail what this is beyond reading the above pages.

The alternative solution seems to be to place those .dll files in your local directory, but it would be preferable if there was a general solution using the above.

Let me know if this helps and we can update the guidance over in #123

@vbalmer
Author

vbalmer commented Jun 21, 2024

Hi @TomMelt and @jatkinson1000

I have tried many things which all did not work:

  • installing VS redistributable
  • reinstalling ftorch (which then got installed in C:\Program Files (x86)\FTorch, don't know why)
  • reinstalling libtorch (now newest version 2.3.1, again adjusting for this)
  • adjusting environment variables on Windows to the new libtorch and FTorch

What finally fixed the above error was manually moving the DLL files to the build folder from which I'm running the code. This, however, gives a new error:

INTEL MKL ERROR: The specified module could not be found. mkl_avx512.1.dll
Intel MKL FATAL ERROR: Cannot load mkl_avx512.1.dll or mkl_def.1.dll

I am looking into it.
The .dlls are located in C:\Users\vbalmer\Documents\GitHub\FTorch\examples\1_SimpleNet\venv\Library\bin
After moving these .dlls to the build folder as well, I get the desired result 😍

  0.0000000E+00   2.000000       4.000000       6.000000       8.000000

This is of course not the most elegant solution, so I will try to find a way in which these paths can be added more efficiently such that the dlls can be found more easily.

In any case, thank you very much for your help throughout the entire process!! I will keep you updated if I manage to find a better solution for this.

@TomMelt
Member

TomMelt commented Jun 24, 2024

Hi @vbalmer, I had a similar issue. I must admit I am not a Windows user so I don't fully understand, but it seems Windows has an issue finding DLLs depending on the order of the entries in the %PATH% variable.

Essentially:

  • the mkl dlls should be in the venv path
  • c10.dll and torch_cpu.dll should be in the torchlib path
  • and libifcoremd.dll I think comes from intel compiler path

My only suggestion (other than manually copying the dlls) would be to try changing the order of the %PATH% entries.

Sorry I can't offer more support because I don't understand how Windows locates dlls 😓

But, I am really glad it now works (with workarounds) 🎉

If switching the order of %PATH% works let me know. But don't worry if not.
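
One way to test the ordering idea without editing the system environment is to launch the example from a small Python wrapper that puts the DLL directories at the front of PATH for that one process. This is only a sketch: `prepend_dirs` and `run_with_dll_dirs` are hypothetical helper names, and the choice of directories follows the list above rather than anything verified.

```python
import os
import subprocess


def prepend_dirs(dirs, path_value, sep=os.pathsep):
    """Put dirs at the front of path_value, unquoted, dropping later duplicates."""
    cleaned = [d.strip().strip('"') for d in dirs]
    rest = [p for p in path_value.split(sep) if p and p.strip('"') not in cleaned]
    return sep.join(cleaned + rest)


def run_with_dll_dirs(exe, args, dll_dirs):
    """Run exe with dll_dirs searched before the rest of PATH; return its exit code."""
    env = dict(os.environ)  # copy, so the parent environment is left untouched
    env["PATH"] = prepend_dirs(dll_dirs, env.get("PATH", ""))
    return subprocess.run([exe, *args], env=env, check=False).returncode
```

For the SimpleNet example, `dll_dirs` would hold the libtorch `lib` directory, the FTorch `bin` directory and the venv `Library\bin` directory, and `exe` would be `simplenet_infer_fortran.exe` with `..\saved_simplenet_model_cpu.pt` as its single argument. Reordering the list and rerunning is then a quick way to see whether the order matters.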

I will tidy up the bugfix PR #137 and merge it into main. When this is merged we will close this issue as solved.

@TomMelt
Member

TomMelt commented Jun 24, 2024

@vbalmer , I also just wanted to say thanks for all your help spotting the issue in the first place and helping us to debug it. Our software is now in a better place because of it. Thanks 👍

@vbalmer
Author

vbalmer commented Jun 24, 2024

Hi @TomMelt

thank you very much as well for your help. I really hope I can also get this to work in the simulation environment I want to use it in :) I have also updated the README for the SimpleNet example on your branch melt-windows-fix to include some instructions for Windows users (following the steps above, see here: #141; I am not entirely sure whether I did the pull request correctly...).

As for the dlls, the order of the path was also going to be my next suspicion, so I will have a look at this when I have time and let you know here.

Otherwise I also think that the issue can be closed. Thank you again.

PS. What was it actually in your new git fork that then made everything work? Using the -Bbuild -G "NMake Makefiles" instead of just NMake seems the most obvious change but was there more to it?

@TomMelt
Member

TomMelt commented Jun 24, 2024

hi @vbalmer ,

The main change that "fixed" it was the following change to src/CMakeLists.txt.

if (WIN32)
  set (CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS TRUE)
  set (BUILD_SHARED_LIBS TRUE)
endif ()

Now CMake correctly builds FTorch so that it can be linked against by the example program.

The changes to the instructions were mostly just my preference, to reduce typing. E.g.,

cmake -Bbuild

instead of

mkdir build
cd build
cmake ..

Then, once the build issue was fixed, the %PATH% settings take care of the missing dll errors.

@vbalmer
Author

vbalmer commented Jun 25, 2024

Thanks for the explanation! That makes sense :)

@jatkinson1000
Member

@vbalmer Following up on the issue of the .dll files, it seems that the official advice is in fact to copy them over to your code.

See the information here: https://pytorch.org/cppdocs/installing.html

They provide the functionality as part of a CMakeLists file assuming you build your code using CMake, but if you are not doing this you will need to copy them across manually. It looks like any .dll file in ${TORCH_INSTALL_PREFIX}/lib/ is needed.
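
If the example is not built with the CMake snippet from the PyTorch page, the manual copy can be scripted. Below is a minimal sketch, assuming the libtorch `lib` directory and the build directory are both known. `copy_dlls` is a hypothetical helper and is not part of FTorch or PyTorch.

```python
import shutil
from pathlib import Path


def copy_dlls(source_dir, target_dir):
    """Copy every *.dll in source_dir into target_dir; return the copied names."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for dll in sorted(source.glob("*.dll")):
        shutil.copy2(dll, target / dll.name)  # overwrites an older copy if present
        copied.append(dll.name)
    return copied
```

Calling it with the libtorch `lib` directory as `source_dir` and the example's `build` directory as `target_dir` mirrors the manual workaround described earlier in this thread. The MKL DLLs from the venv `Library\bin` directory would need a second call.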

We have a summer school this week and next, but will aim to improve the documentation around this after that.

@vbalmer
Author

vbalmer commented Jul 5, 2024

Hi @jatkinson1000 thank you for the update!
This makes sense. It is also interesting to see that there isn't a better solution for this, not even by pytorch...!

Thanks for updating the documentation. You're really putting a lot of effort into it - it's highly appreciated.
