Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

David/fix cmake shared library ubuntu #4466

Merged

Conversation

davidlin409
Copy link
Contributor

This branch contains fix of CMake building bug, that causes Kaldi shared library to not functioning properly. Hope that it helps with CMake progress.

This diff is exactly what I mention in Issue "Build System".

@danpovey
Copy link
Contributor

danpovey commented Mar 5, 2021

Great, thanks!!
@davidlin409 could you please let us know what the specific changes you made were for-- e.g. the new way of building OpenFST, using python script instead of a CMake file (which seems no longer to be used).
@kkm000 and @jtrmal it would be great if you could comment on this.

@kkm000
Copy link
Contributor

kkm000 commented Mar 7, 2021 via email

@davidlin409
Copy link
Contributor Author

davidlin409 commented Mar 8, 2021

For the changes I made:

README.md

That is only for my own comment, I shall revert it before merging. Will patch it.

install_openfst.sh & OpenFST.py & .gitignore

Those are used to install OpenFST. Major motivation of this is due to the fact that in CMake installed binary, I will put those installed files into a prefix location (such as /usr/local/lib or ~/APP/). In that way, I also need to install OpenFST into corresponding location. As I remember, when I am looking at original installation script of OpenFST, it does not contain method to set installation prefix. That is why I add new script for OpenFST installation.

The other option is that original install script have a method to set installation prefix. In this way, those installation script will not be needed.

For the Python version, the reason I wrote that one is to consider the fact that Kaldi might also support Windows. Python can also be native to Windows, while shell script needs special container (such as WSL or virtual machine). However, that script seems to violate Codefactor rule. Therefore, still need to see how Python script can go, and if Python script is suitable or not.

CMakeLists.txt & gen_cmake_skeleton.py & bashrc_config.py

For CMakeLists.txt, those changes contains three different thing.

  • First part: It is fixing CMake building issue with shared library (together with gen_cmake_skeleton.py).
  • Second part is to set incluad and library path, such that Kaldi is building with OpenFST library/header which is install at CMAKE_INSTALL_PREFIX. From investigation of CMake's external_project API, it is really hard to set to right use, so I comment it and use custom installation script instead.
  • Third part is to install WSJ's script to installation directory (CMAKE_INSTALL_PREFIX). Since cmake is using another library for installation, those scripts needs to be copied to let external project use those scripts.
  • Fourth part is to register a variable called KALDI_DIR to bash init script (together with bashrc_config.py. However, this method to set environment variable is Linux only, which will not work in Windows.

Revert README.md change.
@kkm000
Copy link
Contributor

kkm000 commented Mar 9, 2021

  1. Fixing the shared build is a great thing, thanks. My own opinion is we need eventually move to CMake. The only advantage of make is that you can read the Makefiles, but as we ecramming more hacks, this advantage quickly disappears. CMake is freaking easy to install (curl http://... | sudo tar x) on any linux, even in your own ~/.local if you are not a sudoer, so I stopped adding hacks to CMakeLists depending on CMake version. I require 3.19 (it's natively supports CUDA as a lanuage, along with CXX, from 3.18, but 3.19 smoothed out most kinks). It's also very well-supported by VSCode, after some amount of wrestling.
  2. We have a Windows port of OpenFST (kkm000/OpenFST), and @jtrmal wrote CMake build for it. I am maintaining 2 branches there, "original" and "winport". I want to transplant his build system to the original build, like a 3rd branch "cmake" or something. If cmake build would find OpenFST built by tools/Makefile, as the make build currently does, that would be enough. Please do not over-engineer this. The main focus, IMO, should be on making the CMake build of Kaldi proper robust. Another platform we actively support is Mac.
  3. I would not worry about Windows. You can build Kaldi on Windows, that's easy. But you won't get anywhere w/o a functional bash anyway. If you want to train a model, rent a Linux machine in the cloud, connect as many GPUs as you want to it (P100 is 25% faster than 1080Ti, and T4 is only 20% slower, and cost peanuts. CUBLAS uses its RT cores, BTW--I did not expect that, but it does). Main use on Windows is only decoding, and for that build is customized anyway, if you build a production system. I have been doing that for 6 years, and if you ask me just one advice about Kaldi, it will be “do not train on Windows.”
  4. I see nothing wrong with separating the steps/ and utils/ into their own directory. In fact, I'm "installing" Kaldi on a cluster shared drive (seen as /opt from all nodes), and these 2 subdirs are copied by my (external) script to /opt/kaldi/lib/scripts. The script is quite unnice: it interrogates make's targets to locate files, and there is lot of ugly perl juggling after that. But as is, Kaldi is not designed to be installed. The only "installation" part is gathering all .so libs in a single lib/ dir in the case of a shared build. Let's not expect more from a CMake build for now. It must robustly replicate the current make build process (in src/ only, not even /tools). The largest advantage of the build is it's simple (speech scientists are not build engineers). This is the main invariant.
  5. I would not mess with users' .bashrc. Ever. I would be red angry if the setup program did that to my .bashrc silently. Also, we have an (ugly) mechanism to set the right path from each of the egs run.sh script. Let's not bit too large a bite to chew now and leave it as is.
  6. Ignore CodeFactor for now.

I wish @jtrmal could look at this. This CMake thing is going back and forth. I would be very happy if it finally materialized. Unfortunately, we are both very busy now. If you could take on this project, it would be very helpful. The main goal, in my view, is transition to CMake smoothly so we do not break the current users.

My opinion is there is no perfect build system, or, come think of it, even a non-brain-damaged one. They all require RTFM if your build is anything but trivial. I rewrote tools/Makefile, and it is still hard to use in even minimally non-typical scenarios. CMake+ninja is likely better than make, in the end, if you do it correctly.

@davidlin409
Copy link
Contributor Author

davidlin409 commented Mar 9, 2021

Thank you for the information about what should be noticed for patching Kaldi. I am new to Kaldi (3 month maybe), so I do need information to patch kaldi without destroying environment.

For CMake patching, previously it is because our work is related to Kaldi. Currently I have very limited time at both day and night to do the patching. Therefore, after reading your post, I came to an idea. Let's see if the plan work or not:


For tthose changes, my main focus is to ensure that our project's code can hook with Kaldi. Our code is an exntension to kaldi, so that it is a separate repository. Unlike Github project "phonetisaurus g2p", I don't have time to rewrite all training script to Python, while currently we select Python to run entire training process. Therefore, in my original intention is to concentrate all binary and library to fixed location (bin/ and lib/), which is like old-style Liniux does.

But considering the fact that I can't destroy Kaldi environment, so maybe there is an alternative in between that can both fix current CMake issue, while Kaldi's overall structure is majorly maintained.

  • For CMake's bug fix, it is remained as is, and that is the core of this patch.
  • For egs script, maybe it can be in orginal location. (For our project, we will find a way our-self)
  • For OpenFST, the install script inside tools/extra/install_xxxx.sh can still be used. For CMake part, OpenFST can be installed at location specified by tools/extra folder, and CMAKE_INSTALL_PREFIX is forced to KALDI_ROOT/src. This will not allow user to set CMAKE_INSTALL_REPFIX, but the change to existing file is less. If user wants to change CMAKE_INSTALL_PREFIX, the user needs to add PATH to his environment manually, which can be mentioned in README?
  • .bashrc related changes will be reverted, and no system related file will be touched.
  • Since OpenFST is installed using tools/extra/install_xxx.sh, is it needed to add path to OpenFST library to LD_LIBRARY_PATH in tools/common/config/common_path.sh?

Under those changes, eventually only CMakeLists.txt and gen_cmake_skeleton.py will be modified, and others are all not touched. After the change, I still need to verify if those changes can be compatible to our project (wihlle our project also needs changes to source shell script inside Python). If those changes are successfully integrated with our project, then those code can be tested under our training process, to ensure that there is "relatively" no bug inside.

@davidlin409
Copy link
Contributor Author

Hi @danpovey and @kkm000:

Based on the strategy stated above, I revert changes that is not necessary, and use Kaldi's src/ folder as install prefix to install by CMake. After the changes, I pull out THCHS-30 script, without touching any PATH setting, and modify necessary part to make script running. After testing it with bug fix, now THCHS-30 mono-phone decoding is successful. So I believe that till Tri4, training shall be OK.

For files changed in this pull-request, except for two files stated above, one additional file is changed, tools/config/common_path.sh. This is because OpenFST's library path is not added into LD_LIBRARY_PATH environment variable. When I am running process such as make_mfcc, it will complains that libfst can not be found.

For DNN part, there are actually bug in CUDA compilation for CUDA capability 35 during CMake build (Makefile build is normal). So DNN testing will need other patch (which I will put it in another pull request?). Therefore in this pull request, I only focus on non-DNN training part verification.

Do you think that the testing is enough? Or I need to ensure that training is successful till Tri4?

@danpovey
Copy link
Contributor

Thanks!! Let me see what @kkm000 thinks..

@kkm000
Copy link
Contributor

kkm000 commented Mar 11, 2021

Thank you David, you took on a long unsolved problem. I'm

This is because OpenFST's library path is not added into LD_LIBRARY_PATH

LD_LIBRARY_PATH should not normally be used. You build Kaldi after OpenFST has been installed, so you do not need any tricks to make linker hardcode the path to them. There is a way to compile a binary against libraries in one place, but have them at runtime in another (path to libraries to link against is specified with -L switch, but the part to .so at runtime with -rpath). But since they should be already there, setting LD_LIBRARY_PATH is unexpected. Something may be wrong.

OpenFST is installed into tools/openfst, which is a link to the directory with a version number, like tools/openfst-1.7.1. Yes, it is the same directory as the source, but it creates directories that do not exist at the root level (bin/, lib/ and include/; do not get confused: the original includes are in src/include, and src/lib also exists--it's the library source). In the end, whatever is linked, should be linked against e.g. ../../tools/openfst/lib/libfst.so.12 or similar. Use ldd <any-executable-or-.so-file> to see what libraries it wants to load. The tool will also list the missing libraries.

Are you building static or shared Kaldi?

I'll look carefully at your changes today. This one point looked strange to me.

CUDA capability 35

I do not think we support it any more, certainly not in the cudadecoder, so if you enable it (in configure, it is disabled by default), it's expected to fail. It's really very old. I do not remember which CUDA devices limited to it, but the ancient K20 is newer, AFAICR. Generally, it's better to specify the architecture(s) that you are testing on when building. Every additional sm_XX significantly adds to compilation time, but there is no point embedding a CUDA fatbin into the executable if the hardware cannot run it.

@davidlin409
Copy link
Contributor Author

Hi @kkm000:

Thank you for your tip. Indeed when I am linking OpenFST, I does not set -rpath, and maybe that is the reason why I got libfst not found error. I will try to find a way to allow cmake do the configuration by it's language, and fix OpenFST issue.

For this patch, it is focusing on the issue that CMake build on shared library fail, so I am using shared library for Kaldi build. I did not check static build yet, which maybe tested and, if problem exist, patch in another pull-request?

For CUDA build, does Kaldi already have method to specify CUDA capabiliity in CMake project file? If not, I can have yet another pull request to fix this issue (I have already build CUDA by CMake successfully, but the solution is ugly).

@davidlin409
Copy link
Contributor Author

Hi @kkm000:

I have added two more commits, to first remove change to common_config.sh. Second is to fix CMake script, to add openfst/lib into rpath. Now after testing THCHS-30 script (mono-phone till alignment), training is successful without tweaking LD_LIBRARY_PATH.

CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Outdated Show resolved Hide resolved
CMakeLists.txt Show resolved Hide resolved
CMakeLists.txt Outdated
# get_third_party(openfst)
# set(OPENFST_ROOT_DIR ${CMAKE_BINARY_DIR}/openfst)
# include(third_party/openfst_lib_target)
find_library(OpenFST
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The variable is cached, so it must conform to CMake conventions:

Suggested change
find_library(OpenFST
find_library(OpenFST_LIBRARY

CMakeLists.txt Outdated
NAMES fst
PATHS ${CMAKE_SOURCE_DIR}/tools/openfst/lib
REQUIRED)
include_directories(${CMAKE_SOURCE_DIR}/tools/openfst/include)
Copy link
Contributor

@kkm000 kkm000 Mar 15, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add, before this line,

find_path(OpenFST_INCLUDE_DIRS "fst.h" 
          PATHS "${CMAKE_SOURCE_DIR}/tools/openfst/include"
          REQUIRED)
get_filename_component(OpenFST_LIBRARY_DIR ${OpenFST_LIBRARY} DIRECTORY CACHE)
mark_as_advanced(FORCE OpenFST_INCLUDE_DIRS OpenFST_LIBRARY OpenFST_LIBRARY_DIR)

Since find_library caches the library path, it makes sense to do the same to the include path. find_path() will cache the path variable. As long as the path is cached, the search will not be performed again.

CMakeLists.txt Outdated
PATHS ${CMAKE_SOURCE_DIR}/tools/openfst/lib
REQUIRED)
include_directories(${CMAKE_SOURCE_DIR}/tools/openfst/include)
link_directories(${CMAKE_SOURCE_DIR}/tools/openfst/lib)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
link_directories(${CMAKE_SOURCE_DIR}/tools/openfst/lib)
link_directories(${OpenFST_LIBRARY_DIR})

This is a cached path to the library. Once found, the find_library() command won't perform the search again. Also, use OpenFST_INCLUDE_DIRS for the include_directories(). It's tempting to omit link_directories() here altogether and use link_libraries(${OpenFST_LIBRARY}), which contains a full path to the library, but it will be incorrect if someone toggles the BUILD_SHARED_LIBS value: the cached value retains the .a or .so suffix which was selected according to the value of that variable.

@kkm000
Copy link
Contributor

kkm000 commented Mar 15, 2021

@davidlin409, I ran the build a couple times, and everything is just fine, from the very first attempt. I'll merge this, there's only a couple things to fix. Thanks for your work, really appreciated!

I have a question for you: how willing are you to improve CMake build for Kaldi? What I'm seeing is kinda raw, compared to our configure. It would be nice to switch to CMake, but there is a lot of work needed. I'm bogged down for the month or so, but I could at the least draw a roadmap for it, what needs to be done. However much you want to bite off it, is great; it's a large project. Read "Modern CMake" and google for "more modern CMake." I found them very helpful.

  1. We need to fix the name of MATHLIB; all Kaldi cached variables start with KALDI_, except this one. This looks weird. It you run ccmake or cmake-gui, you do not know where it came from.

  2. One thing that struck me as very odd is the way MKL is configured: you need to pass -DMATHLIB=MKL to CMake as a cache variable, but specify MKLROOT=/opt/intel/mkl in the command line as an environment variable. This makes no sense. If you want to replicate the logic of searching for MKL from the configure script, that would be great. MKL should be the default.

  3. On the same note, I'd rename a couple of options:

KALDI_BUILD_EXE  => KALDI_BUILD_BINARIES
KALDI_BUILD_TEST => BUILD_TESTING   "Build the testing tree."

We call them binaries, not executables in the documentation, and the standard CMake variable (although undocumented and is a horrible name) is BUILD_TESTING. If you look into the CMakeCache.txt, you'll see what happens: KALDI_BUILD_TEST causes include(CTest) which saves BUILD_TESTING into the cache set to ON. Then (possibly on the next cmake run) it calls enable_testing() if BUILD_TESTING is set to ON. Then our CMakeLists.txt unconditionally calls enable_testing(). Both variables are in the cache. Now try to think what behavior you'd get if you invoke cmake next time with -DKALDI_BUILD_TEST=OFF, or with -DBUILD_TESTING=OFF. I could not solve this riddle, I can play only 2-D chess. Better not to run into it.

  1. Yet another thing that can be improved is the out-of-tree build. CMake is very good at hiding all the build dirt in a subdirectory, so your source stays clean, but this is not what the scripts expect: they want executables to be in the same directory as the sources, what our scripts do. We can approach this in different ways, but the simplest and cleanest is to tell CMake to build the executables where GNU make builds them. If you take on this, please make it conditional on an option
option(KALDI_CLASSIC_LAYOUT "Place built binaries in their source directory" ON)

This does not apply to libraries or tests; only the main binaries must reside in the expected locations.

Just for a reference, this is how I ran the build (Build is the name of the out-of-tree :

kaldi/src$ MKLROOT=/opt/intel/mkl cmake -G Ninja -DCMAKE_EXPORT_COMPILE_COMMANDS=ON -S .. -B Build -DMATHLIB=MKL -DKALDI_BUILD_TEST=ON -DBUILD_SHARED_LIBS=ON
-- Running gen_cmake_skeleton.py
-- Finding MKL from "/opt/intel/mkl"
-- A library with LAPACK API found.
CUDA_TOOLKIT_ROOT_DIR not found or specified
-- Could NOT find CUDA (missing: CUDA_TOOLKIT_ROOT_DIR CUDA_NVCC_EXECUTABLE CUDA_INCLUDE_DIRS CUDA_CUDART_LIBRARY)
-- Configuring done
-- Generating done
-- Build files have been written to: /home/kkm/work/kaldi/src/Build

kaldi/src$ time cmake --build Build
[1516/1516] Linking CXX executable src/kwsbin/kws-search

real    6m7.267s
user    67m44.524s
sys     3m27.165s

kaldi/src$ cmake --build Build -- test
[0/1] Running tests...
Test project /home/kkm/work/kaldi/src/Build
        Start   1: io-funcs-test
  1/128 Test   #1: io-funcs-test .....................   Passed    0.01 sec
 . . . . . . .
        Start 128: online-feat-test
128/128 Test #128: online-feat-test ..................   Passed    0.04 sec

100% tests passed, 0 tests failed out of 128

Total Test time (real) = 102.08 sec

For reference, these are the actual .so files that the binary will load:

kaldi/src$ ldd Build/src/matrix/matrix-lib-test
        linux-vdso.so.1 (0x00007ffd2a352000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f66a1e21000)
        libmkl_intel_lp64.so => /opt/intel/mkl/lib/intel64_lin/libmkl_intel_lp64.so (0x00007f66a12b1000)
        libmkl_sequential.so => /opt/intel/mkl/lib/intel64_lin/libmkl_sequential.so (0x00007f669fc8a000)
        libmkl_core.so => /opt/intel/mkl/lib/intel64_lin/libmkl_core.so (0x00007f669b90a000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f669b8e9000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f669b764000)
        libfst.so.10 => /home/kkm/work/kaldi/tools/openfst/lib/libfst.so.10 (0x00007f669b621000)
        libkaldi-matrix.so => /home/kkm/work/kaldi/src/Build/src/matrix/libkaldi-matrix.so (0x00007f669b4fd000)
        libkaldi-base.so => /home/kkm/work/kaldi/src/Build/src/base/libkaldi-base.so (0x00007f669b4d2000)
        libstdc++.so.6 => /lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f669b34e000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f669b334000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f669b171000)
        /lib64/ld-linux-x86-64.so.2 (0x00007f66a1eb8000)

All generated stuff is in a separate directory now:

kaldi/src$ ll Build/
total 6.2M
-rw-r--r--  1 kkm kkm 2.6M 2021-03-14 20:58:39 .ninja_deps
-rw-r--r--  1 kkm kkm 275K 2021-03-14 21:12:43 .ninja_log
-rw-r--r--  1 kkm kkm  42K 2021-03-14 20:52:03 CMakeCache.txt
drwxr-xr-x  5 kkm kkm 4.0K 2021-03-14 20:52:04 CMakeFiles/
-rw-r--r--  1 kkm kkm  360 2021-03-14 20:52:03 CMakeLists.txt
-rw-r--r--  1 kkm kkm 1.2K 2021-03-14 19:43:28 CTestTestfile.cmake
-rw-r--r--  1 kkm kkm 2.6K 2021-03-14 19:43:21 DartConfiguration.tcl
drwxr-xr-x  3 kkm kkm 4.0K 2021-03-14 19:43:21 Testing/
-rw-r--r--  1 kkm kkm 2.9M 2021-03-14 20:52:04 build.ninja
drwxr-xr-x  2 kkm kkm 4.0K 2021-03-14 20:52:03 cmake/
-rw-r--r--  1 kkm kkm 7.2K 2021-03-14 19:43:28 cmake_install.cmake
-rw-r--r--  1 kkm kkm 431K 2021-03-14 20:52:04 compile_commands.json
drwxr-xr-x 42 kkm kkm 4.0K 2021-03-14 19:43:28 src/

the src only contains subdirectories

kaldi/src$ ls -AF Build/src/
base/        decoder/  gmm/      ivectorbin/  lm/      nnet2bin/  online2/     sgmm2/
bin/         feat/     gmmbin/   kws/         lmbin/   nnet3/     online2bin/  sgmm2bin/
chain/       featbin/  hmm/      kwsbin/      matrix/  nnet3bin/  onlinebin/   transform/
chainbin/    fstbin/   itf/      lat/         nnet/    nnetbin/   rnnlm/       tree/
cudamatrix/  fstext/   ivector/  latbin/      nnet2/   online/    rnnlmbin/    util/

and here are example content of a library and a binary directory under it: only build aritifacts and a couple extra files

kaldi/src$ ll Build/src/lm
total 4.1M
drwxr-xr-x 5 kkm kkm 4.0K 2021-03-14 19:43:28 CMakeFiles/
-rw-r--r-- 1 kkm kkm 1.1K 2021-03-14 19:43:28 CTestTestfile.cmake
-rwxr-xr-x 1 kkm kkm 154K 2021-03-14 20:53:23 arpa-file-parser-test*
-rwxr-xr-x 1 kkm kkm 2.5M 2021-03-14 20:53:24 arpa-lm-compiler-test*
-rw-r--r-- 1 kkm kkm 6.0K 2021-03-14 20:52:03 cmake_install.cmake
-rwxr-xr-x 1 kkm kkm 1.4M 2021-03-14 20:53:21 libkaldi-lm.so*


kaldi/src$ ll Build/src/lmbin/
total 888K
drwxr-xr-x 4 kkm kkm 4.0K 2021-03-14 19:43:28 CMakeFiles/
-rw-r--r-- 1 kkm kkm  283 2021-03-14 19:43:28 CTestTestfile.cmake
-rwxr-xr-x 1 kkm kkm 147K 2021-03-14 20:56:22 arpa-to-const-arpa*
-rwxr-xr-x 1 kkm kkm 726K 2021-03-14 20:56:14 arpa2fst*
-rw-r--r-- 1 kkm kkm 4.0K 2021-03-14 20:52:04 cmake_install.cmake

Object files are hidden out of sight:

kaldi/src$ ll -R Build/src/lm/CMakeFiles/
Build/src/lm/CMakeFiles/:
total 12K
drwxr-xr-x 2 kkm kkm 4.0K 2021-03-14 20:53:16 arpa-file-parser-test.dir/
drwxr-xr-x 2 kkm kkm 4.0K 2021-03-14 20:53:23 arpa-lm-compiler-test.dir/
drwxr-xr-x 2 kkm kkm 4.0K 2021-03-14 20:53:20 kaldi-lm.dir/

Build/src/lm/CMakeFiles/arpa-file-parser-test.dir:
total 248K
-rw-r--r-- 1 kkm kkm 246K 2021-03-14 20:53:16 arpa-file-parser-test.cc.o

Build/src/lm/CMakeFiles/arpa-lm-compiler-test.dir:
total 5.5M
-rw-r--r-- 1 kkm kkm 5.5M 2021-03-14 20:53:23 arpa-lm-compiler-test.cc.o

Build/src/lm/CMakeFiles/kaldi-lm.dir:
total 2.5M
-rw-r--r-- 1 kkm kkm  110K 2021-03-14 20:53:13 arpa-file-parser.cc.o
-rw-r--r-- 1 kkm kkm 1006K 2021-03-14 20:53:14 arpa-lm-compiler.cc.o
-rw-r--r-- 1 kkm kkm  662K 2021-03-14 20:53:20 const-arpa-lm.cc.o
-rw-r--r-- 1 kkm kkm  372K 2021-03-14 20:53:20 kaldi-rnnlm.cc.o
-rw-r--r-- 1 kkm kkm  357K 2021-03-14 20:53:15 mikolov-rnnlm-lib.cc.o

It's all nice and good, except the scripts expect the binaries to be somewhere else. :)

@kkm000
Copy link
Contributor

kkm000 commented Mar 15, 2021

Forgot to mention a sensible and well-known hack. You can add set_property(GLOBAL PROPERTY CTEST_TARGETS_ADDED 1) before calling enable_testing() to suppress a buttload of useless test and CI targets which are specific to CDart/Dashboard which we are very unlikely to ever use. :)

It's not a biggie unless you open the CMake project in Visual Studio or VSCode, which will show all these targets, or run cmake --build Build -- help -- then you'd be puzzled!

@davidlin409
Copy link
Contributor Author

Hi @kkm000 :

For kaldi CMake improvement, may I ask if there is any place that can have Kanban view? Those items you mentioned above seems like really large project, while those projects are not directly linked to my work; I might need to organize them by several topics and handle them separately. The other option is that I create a open page on Notion.so, and we can add CMake related topic on top of it. But since this is directly related to Kaldi, I think that putting Kanban/Wiki on Kaldi Github will be most safe option. What do you think?

From the four topics you mentioned above, I do have a question about how those object files are placed in question 4. I know that traditional Kaldi place object files in the same directory as source code. But that is actually what old style Makefile does. For example when I am studying in university as graduated student, I already separate those object files into separate directory (also in the directory called build folder) in pure Makefile project, which needs tuning of Makefile since Kaldi Makefile system is large, but doable. So shall we make CMake place object file the same as Makefile, or shall we follow CMake and tune Makefile project?

  • Making those objects into build directory have an advantage of deleting those objects in a folder, which does not tweak a lot in .gitignore file. Also in current system, if I want to clean those binaries, I needs to purge git repository. For binary/library installation, it can still be inside src/ folder, which does not affect current binary library topology, and binary can found library properly with rpath, and those scripts will still functioning without risk of being break.

As for taking on those topics. For now my first priority is to patch those bug that is related to my project at work (a patch to egs script that causes warning in DNN training of nnet1, and CUDA compilation bug of CUDA capability 35). After that, I might start to take a look at those topics. But since those topics are not related to my work, I can only do it at night with my own computer, which actually install Windows instead of native Linux. So the speed will be much slower (work computer is with 32 CPU threads...), and I will have limited time. The progress will be slow, but by one-by-one looking into those topics, it can be gradually handled. For some topics (especially the object to be placed at source directory).

If I would start from those 4 topics, the first one that I might try is question 3, since I am also bugged by CTest. First and second can be later investigated (where there might be discussion to ensure what I think is the same as you). For the 4th question, changing Makefile is easy, but changing where CMake place object file is still unknown.

@danpovey
Copy link
Contributor

danpovey commented Mar 15, 2021 via email

davidlin409 and others added 2 commits March 16, 2021 00:54
Co-authored-by: Cy 'kkm' Katsnelson <[email protected]>
 -  ENDIF changed to endif, to follow kaldi convention.
 -  Add BUILD_SHARED_LIBS option, or user
   will not know that this option exists
   (based on kkm000 suggestion)
 -  For OpenFST include/library path, use
   `find_library`, `find_path`, and
   `get_filename_component` to locate them,
   and use corresponding CACHED variable
   in `include_directories` and `link_directories`.
 - Mark OpenFST related variable as advanced,
   since it is found by CMAKE procedure,
   instead of setted by user.
Copy link
Contributor Author

@davidlin409 davidlin409 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For comment above

Indeed issue is also a way to track those tasks. Thank you @danpovey for the tip :-D.


For the final change

For those changes, I have compiled it again using "shared_library" option, which is successful. I also check OpenFST related CMake variable, which is pointed to right locations. Since final modification is about compilation detail, I did not train again this time.

If training test is needed, please let me know such that I schedule it again.

PS: (The diff show below is old diff, so please use "File Changed" tab above to see last modification)

CMakeLists.txt Show resolved Hide resolved
CMakeLists.txt Show resolved Hide resolved
@davidlin409 davidlin409 reopened this Mar 15, 2021
@kkm000
Copy link
Contributor

kkm000 commented Mar 16, 2021

I'm with @danpovey on this. Let's bite little pieces, easy to chew. And we cannot touch the Makefile layouts. There are tens of thousands users out there who expect it to work as they used to. Most of them know little about build, and are in a rush to write the paper by deadline. So this part is very sensitive. This is also the realistic CMake build should, at the least by default, reproduce the make build exactly. It's not because it's perfect or even good. Look at all the files in tools/config/, which is exactly one, sourced by every run.sh script. Why is it in tools/, why does it require KALDI_ROOT be set when it perfectly knows it is invariably the full path to '../../src'--which could have been just .. if it were placed under src/ when it was,--and why is it maintained instead of just grabbing all *bin/ into the path, I do not know. But I am not touching this abhorrence, because it will likely break someone's setup at exactly the wrong moment.

Unfortunately, CMake is not very consistent and being consistent with the rest of CMake does not mean consistency in the normal sense. The convention is _INCLUDE_DIRS plural and _LIBRARY_DIR singular. I can concoct a rationale: when linking to a library X, you may need includes from multiple dirs, but linking to a single libX.a or libX.so, not multiple libraries. But most likely, I'm telling a "just so story". Check FindXXXX.cmake own modules (in /usr/share somewhere), and go along with them.

The naming of target properties is very unobvious. To place a .so or an executable binary to a directory you want, set its property RUNTIME_OUTPUT_DIRECTORY. Sometimes I'm tempted to think the properties are named the way they are because someone who came up with these names had a bad day and longed for vengeance on the whole world, if not for the Hanlon's razor. Here's an example. which builds a binary in its CMakeLists.txt location, ${CMAKE_CURRENT_SOURCE_DIR}:

$ ll
total 8.0K
-rw-r--r-- 1 kkm kkm 220 2021-03-15 21:20:51 CMakeLists.txt
-rw-r--r-- 1 kkm kkm  68 2021-03-15 21:21:47 main.cc
$ head -100 *
==> CMakeLists.txt <==
cmake_minimum_required(VERSION 3.15)
project(hello)
include(GNUInstallDirs)
add_executable(main main.cc)
set_target_properties(main PROPERTIES
                      RUNTIME_OUTPUT_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR})

==> main.cc <==
#include <iostream>
int main() { std::cout << "Hell, O World!\n"; }
$ cmake -G Ninja -B Build
-- The C compiler identification is GNU 8.3.0
-- The CXX compiler identification is GNU 8.3.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Configuring done
-- Generating done
-- Build files have been written to: /home/kkm/.tmp/exeloc/Build
$ cmake --build Build
[2/2] Linking CXX executable ../main
$ ll
total 32K
drwxr-xr-x 3 kkm kkm 4.0K 2021-03-15 21:25:44 Build/
-rw-r--r-- 1 kkm kkm  220 2021-03-15 21:20:51 CMakeLists.txt
-rwxr-xr-x 1 kkm kkm  17K 2021-03-15 21:25:44 main*
-rw-r--r-- 1 kkm kkm   68 2021-03-15 21:21:47 main.cc
$ ll Build/
total 60K
-rw-r--r-- 1 kkm kkm 8.8K 2021-03-15 21:25:44 .ninja_deps
-rw-r--r-- 1 kkm kkm  123 2021-03-15 21:25:44 .ninja_log
-rw-r--r-- 1 kkm kkm  17K 2021-03-15 21:25:27 CMakeCache.txt
drwxr-xr-x 5 kkm kkm 4.0K 2021-03-15 21:25:27 CMakeFiles/
-rw-r--r-- 1 kkm kkm  16K 2021-03-15 21:25:27 build.ninja
-rw-r--r-- 1 kkm kkm 1.6K 2021-03-15 21:25:27 cmake_install.cmake

@kkm000
Copy link
Contributor

kkm000 commented Mar 16, 2021

As for taking on those topics. For now my first priority is to patch those bug that is related to my project at work (a patch to egs script that causes warning in DNN training of nnet1, and CUDA compilation bug of CUDA capability 35).

I am really surprised anyone uses nnet1. Just yesterday I though of removing it. If you do not mind, can you explain why this choice?

What is your GPU? Do you really need sm_35? sm_35 is not a capability, it's an architecture. The capability is e.g. compute_35. Nothing prevents you from compiling capability 3.5 code for e.g. GTX 1080 (architecture 6.1) with compute_35,sm_61.

And if you really stumble, and are using nnet1, check out Kaldi around 5.2 or 5.3. I think we dropped support for cap 3.5 about a year ago.

@kkm000
Copy link
Contributor

kkm000 commented Mar 16, 2021

Re kanban, I used to use Kerika for my TODO lists. I thied may different things, but to me, nothing works better than Microsoft OneNote. Drag items up or down or between boxes on a page, highlgih in color--what else you need? If you create a notebook on OneDrive, you can have it synced on all machines that you use.

There are 2 OneNote apps: one comes with Windows 10, another with MS Office. Office is not free, but its OneNote is available somewhere as a free standalone download somewhere. I think if you get a trial of Office, you'll be able to install OneNote from it and use after the trial expires, too. I just used to it. Maybe the "new" Windows 10 OneNote gives you the same functionality. I never bothered to check, really :)

Office also includes Planner, which is more board-like. I do not know if it's available for free though.

@davidlin409
Copy link
Contributor Author

davidlin409 commented Mar 16, 2021

Hi @kkm000

  1. For first comment, if you want to define those variable to, for example, CMAKE_INCLUDE_DIRS and CMAKE_LIBRARY_DIR, it is all OK for me ~~~. I just follow traditional CMake routine (maybe I am old style), and you can change to whatever name style you feel good. As I remember that you have suggested change before, you can use the same method to change to whatever naming you want, and I accept those changes. The reason that I push one more commit is that the suggested CMake segment contains some syntax error and I don't know how to edit in review page @,@, and everything else is based on your suggestion (except the "S" letter I guess @@?). So after the renaming, is there still blocking factor for this merge? If not, maybe we can call for a end for this pull-request?
  2. For nnet1, the reason I use it is because I need to run DNN ASAP. I don't have time to assemble nnet3 binary due to our "extremely" rush schedule, and nnet1 contain exactly the same DNN structure that I need, so I choose it. For later, if I found a way to use nnet3 for DNN training, I can switch to nnet3. But now I am working on PyTorch Wake word~~~~
  3. For CUDA capability 3.5, I am using NVidia GPU "GeForce GT 710", which is ancient product..... That is why I use it. The point is, for people having different different CUDA capability, we need to have a method to adjust capability in CMake, such that people can build CUDA code successfully without the need to tweak Kaldi source code. For CUDA discussion, I suggest that a new issue is opened. (Since I encounter CUDA bug in new Kaldi commit~~~)
  4. For tool, the main purpose I mention it is because, here it is pull request for CMake's shared library feature. However we have discussed a lot of topics which does not belong to this pull-request. I am OK with all kind of tool (Github Issue, Onenote, Notion, etc). The key is, we can discuss other material elsewhere.

So, final question is, is there any further blocking factor for my changes made that needs to be fixed?

PS: For further CMake discussion, I have opened a free-issue
davidlin409#1

feel free to discuss all kind of stuff there.

@kkm000
Copy link
Contributor

kkm000 commented Mar 16, 2021

Your PR is very good, actually, thanks! I liked the defensive style in the Python part, asserting the boolean type. CMake produces inconsistent cache, however, this is what we need to solve. Either skip find_library() for OpenFST altogether (but it's good, it will stop build right there if it does not exist), or force-remove its variable from cache, or cache the include and lib directory too. Two cached variables for testing is also bad practice. Since it's work in progress, I can merge almost anything. The whole CMake code set requires more cleanup than this. But it's better be correct than not when adding a feature. Review comments are not a blame, please!

Definitely get an older version of Kaldi and older CUDA (9.0 would do), if your GPU is this old. You're using the parts of Kaldi not touched in years. Check out 5.4 and see if it just builds with CUDA 9, with make. CMake is not there yet when you just need a working Kaldi.

@davidlin409
Copy link
Contributor Author

Ok, so do I need further action to finish this PR, or it is already good to go? I am new to Github PR, so don't know how it is working. I still see a sign "1 review requesting changes", don't know if that will block anything @@?

Copy link
Contributor

@kkm000 kkm000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I missed your changes, sorry. Merging.

@kkm000 kkm000 merged commit d3c2be4 into kaldi-asr:master Mar 16, 2021
@davidlin409 davidlin409 deleted the david/fix_cmake_shared_library_ubuntu branch March 16, 2021 06:31
davidlin409 added a commit to davidlin409/kaldi that referenced this pull request Mar 18, 2021
@kkm000
Copy link
Contributor

kkm000 commented Mar 28, 2021

Thank you David, you took on a long unsolved problem. I'm

This is because OpenFST's library path is not added into LD_LIBRARY_PATH

LD_LIBRARY_PATH should not normally be used. You build Kaldi after OpenFST has been installed, so you do not need any tricks to make linker hardcode the path to them. There is a way to compile a binary against libraries in one place, but have them at runtime in another (path to libraries to link against is specified with -L switch, but the part to .so at runtime with -rpath). But since they should be already there, setting LD_LIBRARY_PATH is unexpected. Something may be wrong.

OpenFST is installed into tools/openfst, which is a link to the directory with a version number, like tools/openfst-1.7.1. Yes, it is the same directory as the source, but it creates directories that do not exist at the root level (bin/, lib/ and include/; do not get confused: the original includes are in src/include, and src/lib also exists--it's the library source). In the end, whatever is linked, should be linked against e.g. ../../tools/openfst/lib/libfst.so.12 or similar. Use ldd <any-executable-or-.so-file> to see what libraries it wants to load. The tool will also list the missing libraries.

Are you building static or shared Kaldi?

I'll look carefully at your changes today. This one point looked strange to me.

CUDA capability 35

I do not think we support it any more, certainly not in the cudadecoder, so if you enable it (in configure, it is disabled by default), it's expected to fail. It's really very old. I do not remember which CUDA devices limited to it, but the ancient K20 is newer, AFAICR. Generally, it's better to specify the architecture(s) that you are testing on when building. Every additional sm_XX significantly adds to compilation time, but there is no point embedding a CUDA fatbin into the executable if the hardware cannot run it.

@kkm000
Copy link
Contributor

kkm000 commented Mar 28, 2021

@davidlin409, I cannot login to your Notion I got an invitation for, access denied. But There is not so much little stuff that we won't be able to track it in a single ticket (possibly forking off X-ref'd as needed.)

@davidlin409
Copy link
Contributor Author

Hi @kkm000

For Notion, I also don't think it is adequate for issue tracking like stuff.

At my personal fork, I start a issue for general discussion, as below:
davidlin409#1

Also, you can create issue there. In my personal fork, I also open a project for CMake. Does that fit our need?
https://github.com/davidlin409/kaldi/projects/1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants