fix python cmake error #976
Codecov Report
@@ Coverage Diff @@
## devel #976 +/- ##
===========================================
- Coverage 75.37% 64.28% -11.09%
===========================================
Files 85 5 -80
Lines 6801 14 -6787
===========================================
- Hits 5126 9 -5117
+ Misses 1675 5 -1670
I think you are right in theory, but sadly, I cannot reproduce the error...
I can reproduce this error both on Aliyun and on my local workstation ( •́ω•̩̥̀ )
I add
`&& /usr/local/bin/g++750 -fPIC -std=c++11 -Wno-ignored-attributes -fopenmp -O3 -DNDEBUG -shared -Wl,-soname,libop_abi.so -o op/libop_abi.so op/CMakeFiles/op_abi.dir/custom_op.cc.o op/CMakeFiles/op_abi.dir/descrpt.cc.o op/CMakeFiles/op_abi.dir/descrpt_se_a_ef.cc.o op/CMakeFiles/op_abi.dir/descrpt_se_a_ef_para.cc.o op/CMakeFiles/op_abi.dir/descrpt_se_a_ef_vert.cc.o op/CMakeFiles/op_abi.dir/ewald_recp.cc.o op/CMakeFiles/op_abi.dir/gelu_multi_device.cc.o op/CMakeFiles/op_abi.dir/map_aparam.cc.o op/CMakeFiles/op_abi.dir/neighbor_stat.cc.o op/CMakeFiles/op_abi.dir/pair_tab.cc.o op/CMakeFiles/op_abi.dir/prod_env_mat_multi_device.cc.o op/CMakeFiles/op_abi.dir/prod_force.cc.o op/CMakeFiles/op_abi.dir/prod_force_multi_device.cc.o op/CMakeFiles/op_abi.dir/prod_virial.cc.o op/CMakeFiles/op_abi.dir/prod_virial_multi_device.cc.o op/CMakeFiles/op_abi.dir/soft_min.cc.o op/CMakeFiles/op_abi.dir/soft_min_force.cc.o op/CMakeFiles/op_abi.dir/soft_min_virial.cc.o op/CMakeFiles/op_abi.dir/tabulate_multi_device.cc.o op/CMakeFiles/op_abi.dir/unaggregated_grad.cc.o op/CMakeFiles/op_abi.dir/__/lib/src/SimulationRegion.cpp.o op/CMakeFiles/op_abi.dir/__/lib/src/neighbor_list.cc.o -Wl,-rpath,/home/jz748/codes/deepmd-kit/_skbuild/linux-x86_64-3.8/cmake-build/lib:/home/jz748/anaconda3/envs/dpdev/lib/python3.8/site-packages/tensorflow:/home/jz748/codes/deepmd-kit/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/cuda: lib/libdeepmd.so /home/jz748/anaconda3/envs/dpdev/lib/python3.8/site-packages/tensorflow/libtensorflow_framework.so.2 -Wl,-rpath-link,/home/jz748/codes/deepmd-kit/_skbuild/linux-x86_64-3.8/cmake-build/lib/src/cuda && :`
I think the key point is
(https://ftp.gnu.org/old-gnu/Manuals/ld-2.9.1/html_chapter/ld_2.html)
So, what's your command?
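One way to compare link commands like the one above is to pull out the directories handed to the linker through `-Wl,-rpath,…` and `-Wl,-rpath-link,…`. The following is a minimal Python sketch, assuming only the comma-separated `-Wl,FLAG,DIRS` spelling used in that command; the function name `linker_search_dirs` and the shortened paths in the usage line are hypothetical.

```python
import shlex


def linker_search_dirs(command):
    """Collect -rpath and -rpath-link directories passed via -Wl, in a link command.

    Only the "-Wl,-rpath,DIRS" / "-Wl,-rpath-link,DIRS" spellings are handled;
    other forms (for example "-Wl,-rpath=DIR") are ignored in this sketch.
    """
    found = {"-rpath": [], "-rpath-link": []}
    for token in shlex.split(command):
        if not token.startswith("-Wl,"):
            continue
        parts = token.split(",")[1:]  # drop the leading "-Wl"
        # "-Wl,a,b" passes "a b" to the linker, so inspect consecutive pairs.
        for flag, value in zip(parts, parts[1:]):
            if flag in found:
                # The value is a colon-separated directory list; skip empty entries.
                found[flag].extend(d for d in value.split(":") if d)
    return found


# Hypothetical, shortened version of the op_abi link command.
dirs = linker_search_dirs(
    "g++ -shared -o op/libop_abi.so custom_op.cc.o "
    "-Wl,-rpath,/build/lib:/tf: lib/libdeepmd.so "
    "-Wl,-rpath-link,/build/lib/src/cuda"
)
print(dirs)  # {'-rpath': ['/build/lib', '/tf'], '-rpath-link': ['/build/lib/src/cuda']}
```

Running it on the commands produced with and without a change makes it easy to see whether the directory containing the CUDA library reaches the linker at all.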
Before this PR, the command would be:
After this PR, the command would be:
The difference is whether libdeepmd_op_cuda.so was used.
Maybe it's a behavior change in CMake...
Actually, it's about the ninja-build package. When Ninja is not installed on the system, CMake's "Ninja" generator fails, and the "Unix Makefiles" generator is used by default instead. The two generators behave inconsistently for the PRIVATE keyword of the target_link_libraries command.
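The fallback described above can be modeled in a few lines of Python, which makes it clear why the same project can be built with different generators on different machines. This is a simplified sketch, not the actual selection logic of CMake or of the Python build front end: it assumes a generator counts as usable whenever its build tool is on `PATH`. The name `pick_generator` and its parameters are hypothetical.

```python
import shutil


def pick_generator(candidates=("Ninja", "Unix Makefiles"), which=shutil.which):
    """Return the first generator whose build tool can be found.

    Simplified model: a generator is treated as usable when its tool
    (ninja or make) is on PATH. Real front ends may instead run a test
    configure with each candidate generator.
    """
    tools = {"Ninja": "ninja", "Unix Makefiles": "make"}
    for generator in candidates:
        if which(tools[generator]) is not None:
            return generator
    raise RuntimeError("no usable CMake generator found")


# Simulate a machine without ninja-build installed.
no_ninja = lambda tool: None if tool == "ninja" else "/usr/bin/" + tool
print(pick_generator(which=no_ninja))  # Unix Makefiles
```

Injecting a fake `which` lets both situations (ninja present or absent) be reproduced without changing the system.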
In deepmodeling#976, the propagation keyword was changed from PRIVATE to PUBLIC. However, `libdeepmd` actually includes the CUDA libraries in the `gelu.cc` file.
When compiling the deepmd-kit Python interface locally with `export DP_VARIANT=cuda`, the following error occurs:
This problem is caused by the incorrect linking of LIB_DEEPMD_OP_DEVICE in `$deepmd_source_dir/source/op/CMakeLists.txt`.
Change explanation:
The usage requirements of a target can transitively propagate to dependents. The target_link_libraries() command has PRIVATE, INTERFACE and PUBLIC keywords to control the propagation.
Generally, a dependency should be specified in a use of target_link_libraries() with the PRIVATE keyword if it is used by only the implementation of a library, and not in the header files. If a dependency is additionally used in the header files of a library (e.g. for class inheritance), then it should be specified as a PUBLIC dependency. A dependency which is not used by the implementation of a library, but only by its headers should be specified as an INTERFACE dependency.
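The rule quoted above can be written down as a small decision function. This Python sketch only encodes that rule together with the standard meaning of each keyword (whether the dependency is linked into the target itself, and whether it is propagated to dependents); the names `KEYWORD_SEMANTICS` and `choose_keyword` are hypothetical.

```python
# keyword -> (linked into the target itself, propagated to dependents)
KEYWORD_SEMANTICS = {
    "PRIVATE": (True, False),
    "PUBLIC": (True, True),
    "INTERFACE": (False, True),
}


def choose_keyword(used_in_implementation, used_in_headers):
    """Pick a target_link_libraries() keyword following the quoted rule."""
    if used_in_headers:
        # Headers need it: dependents must see it too.
        return "PUBLIC" if used_in_implementation else "INTERFACE"
    if used_in_implementation:
        # Only the library's own sources need it.
        return "PRIVATE"
    raise ValueError("dependency is not used at all; it should not be linked")


print(choose_keyword(used_in_implementation=False, used_in_headers=True))  # INTERFACE
```

The table makes the trade-off explicit: only PUBLIC both links the dependency into the target and passes it on.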
When compiling in a CUDA/ROCm environment, the shared library LIB_DEEPMD links LIB_DEEPMD_OP_DEVICE. However, LIB_DEEPMD uses LIB_DEEPMD_OP_DEVICE in its header files, not in its source files, so the INTERFACE keyword is more appropriate than PRIVATE. This change solves the problem above.
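The effect of the change described above can be sketched by computing which libraries end up on each target's link line under each keyword. This is a simplified Python model, not CMake's implementation: target names are shortened from the library files mentioned in this thread (`op_abi`, `deepmd`, `deepmd_op_cuda`), the PRIVATE link from `op_abi` to `deepmd` is an assumption, and linker search-path flags such as `-rpath-link` are ignored entirely. `link_line` and `chain` are hypothetical helpers.

```python
def link_line(target, deps):
    """Libraries named on `target`'s link line in a simplified model.

    deps maps a target to (dependency, keyword) pairs as written in
    target_link_libraries(). PRIVATE and PUBLIC deps are linked into the
    target itself; PUBLIC and INTERFACE deps are passed on to dependents.
    """
    result = []

    def add(lib):
        if lib in result:  # already added (also guards against cycles)
            return
        result.append(lib)
        # Follow whatever this library propagates to its dependents.
        for dep, keyword in deps.get(lib, []):
            if keyword in ("PUBLIC", "INTERFACE"):
                add(dep)

    for dep, keyword in deps.get(target, []):
        if keyword in ("PRIVATE", "PUBLIC"):
            add(dep)
    return result


def chain(keyword):
    """Hypothetical dependency graph: op_abi -> deepmd -> deepmd_op_cuda."""
    return {
        "op_abi": [("deepmd", "PRIVATE")],  # assumed keyword
        "deepmd": [("deepmd_op_cuda", keyword)],
    }


for kw in ("PRIVATE", "PUBLIC", "INTERFACE"):
    deps = chain(kw)
    print(kw, link_line("op_abi", deps), link_line("deepmd", deps))
# PRIVATE ['deepmd'] ['deepmd_op_cuda']
# PUBLIC ['deepmd', 'deepmd_op_cuda'] ['deepmd_op_cuda']
# INTERFACE ['deepmd', 'deepmd_op_cuda'] []
```

Under this model, INTERFACE puts the CUDA library on the dependent's link line but drops it from `deepmd`'s own link line, while PUBLIC does both; that difference appears to be the point of the `gelu.cc` remark about `libdeepmd`.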