Build from source: Cannot use custom compiled python #356

Closed
anands-repo opened this issue Sep 23, 2020 · 4 comments

Comments

anands-repo commented Sep 23, 2020

I am trying to build DeepVariant from source, and trying to use a custom python installation rather than the standard one. However, bazel test fails because it tries to use the standard library python. The requisite python is accessible as "python" because it is in the PATH variable, but bazel seems to ignore that and looks for python in the standard location. I am not an expert in bazel by any means, so any help in how to get around this issue is greatly appreciated.

Here is the command used for the build. All necessary libraries have been compiled; I didn't use run-prereq.sh and build-prereq.sh, but installed the dependencies manually instead.

Command used (this was edited into build_and_test.sh, which was then run):

bazel test --host_javabase=@local_jdk//:jdk -c opt --local_test_jobs=1 ${DV_COPT_FLAGS} "$@" \
    deepvariant/...

settings.sh was changed as follows:

export DV_USE_PREINSTALLED_TF="1"
export TF_NEED_GCP=0
export CUDNN_INSTALL_PATH="/usr"
export DV_GPU_BUILD="1"
export DV_INSTALL_GPU_DRIVERS="0"
export PYTHON_BIN_PATH='/opt/at11.0/bin/python'
export PYTHON_LIB_PATH='/opt/at11.0/lib64/python3.6/site-packages'
export USE_DEFAULT_PYTHON_LIB_PATH=0
export DV_COPT_FLAGS="--copt=-mcpu=native --copt=-Wno-sign-compare --copt=-Wno-write-strings --copt=-DNO_WARN_X86_INTRINSICS"
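Before digging into Bazel itself, it can help to confirm that the interpreter named in settings.sh exists and to see how it relates to the bare `python` on PATH. The helper below is a minimal sketch; `check_python_bin` is a hypothetical name and is not part of DeepVariant or Bazel.

```shell
# Hypothetical helper (not part of DeepVariant): report whether the interpreter
# given as $1 (default: $PYTHON_BIN_PATH) is executable, and whether a bare
# `python` on PATH resolves to the same path.
check_python_bin() {
  bin=${1:-${PYTHON_BIN_PATH:-}}
  if [ -z "$bin" ] || [ ! -x "$bin" ]; then
    echo "missing: ${bin:-<unset>}"
    return 1
  fi
  on_path=$(command -v python || true)
  if [ "$on_path" = "$bin" ]; then
    echo "ok: $bin matches python on PATH"
  else
    echo "ok: $bin (python on PATH is ${on_path:-not found})"
  fi
}

# Example: check_python_bin /opt/at11.0/bin/python
```

A passing check only shows that the shell environment is consistent. It does not show which interpreter Bazel writes into its generated scripts.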

Error trace:

(15:44:57) ERROR: /root/.cache/bazel/_bazel_root/8422bf851bfac3671a35809acde131a7/external/org_tensorflow/tensorflow/core/BUILD:2762:1: Executing genrule @org_tensorflow//tensorflow/core:version_info_gen failed (Exit 1): bash failed: error executing command 
  (cd /root/.cache/bazel/_bazel_root/8422bf851bfac3671a35809acde131a7/execroot/com_google_deepvariant && \
  exec env - \
    CUDA_TOOLKIT_PATH=/usr/local/cuda-10.0 \
    GCC_HOST_COMPILER_PATH=/opt/at11.0/bin/gcc \
    LD_LIBRARY_PATH=/usr/local/nvidia/lib:/usr/local/nvidia/lib64 \
    OMP_NUM_THREADS=1 \
    PATH=/root/bin:/opt/at11.0/bin:/opt/at11.0/sbin:/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin \
    PYTHON_BIN_PATH=/opt/at11.0/bin/python \
    PYTHON_LIB_PATH=/opt/at11.0/lib64/python3.6/site-packages \
    TF_CONFIGURE_IOS=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=3.7,6.0,7.0 \
    TF_CUDA_VERSION=10.0 \
    TF_CUDNN_VERSION=7 \
    TF_NEED_CUDA=1 \
  /bin/bash -c 'source external/bazel_tools/tools/genrule/genrule-setup.sh; bazel-out/host/bin/external/org_tensorflow/tensorflow/tools/git/gen_git_source --generate external/local_config_git/gen/spec.json external/local_config_git/gen/head external/local_config_git/gen/branch_ref "bazel-out/ppc-opt/bin/external/org_tensorflow/tensorflow/core/util/version_info.cc" --git_tag_override=${GIT_TAG_OVERRIDE:-}')
Execution platform: @bazel_tools//platforms:host_platform
Traceback (most recent call last):
  File "bazel-out/host/bin/external/org_tensorflow/tensorflow/tools/git/gen_git_source", line 252, in <module>
    Main()
  File "bazel-out/host/bin/external/org_tensorflow/tensorflow/tools/git/gen_git_source", line 242, in Main
    os.execv(args[0], args)
FileNotFoundError: [Errno 2] No such file or directory: '/usr/bin/python3.6'
(15:44:57) INFO: Elapsed time: 34.327s, Critical Path: 15.32s
(15:44:57) INFO: 910 processes: 910 local.
(15:44:57) FAILED: Build did NOT complete successfully

I am running on a CentOS 7 docker container. I am trying to build DeepVariant 1.0 (the current github release).

@anands-repo
Author

Upon further examination I find that the file bazel-out/host/bin/external/org_tensorflow/tensorflow/tools/git/gen_git_source has the following hard-coded into it:

PYTHON_BINARY = '/usr/bin/python3.6'

I am not sure where this is generated from.
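To check which interpreter a Bazel-generated Python launcher will exec, the `PYTHON_BINARY` assignment can be read straight out of the file. This is a small sketch that assumes the stub contains a single-quoted `PYTHON_BINARY = '...'` line like the one quoted above; `stub_interpreter` is a hypothetical helper name.

```shell
# Hypothetical helper: print the interpreter path hard-coded into a
# Bazel-generated Python stub. Assumes a line of the form
#   PYTHON_BINARY = '/some/path'
stub_interpreter() {
  sed -n "s/^PYTHON_BINARY *= *'\([^']*\)'.*/\1/p" "$1" | head -n 1
}

# Example, using the path from the traceback:
# stub_interpreter bazel-out/host/bin/external/org_tensorflow/tensorflow/tools/git/gen_git_source
```

Running this before and after changing the Python configuration shows whether the change actually reached the generated stub.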

@anands-repo
Author

Solved using: bazelbuild/bazel#4815 (comment)

However, I will keep this issue open for a little while for any further comments.

@gunjanbaid
Contributor

gunjanbaid commented Sep 24, 2020

Hi @anands-repo, glad you were able to get it working! I don't have any other comments on the fix and will defer to the relevant bazel issue.

In general, I would recommend running DeepVariant using Docker for the simplest setup. If you are building from source because you want to experiment with changes to the codebase, I'd still recommend Docker. You can clone the DeepVariant repo, modify the source code, and build a Docker image with your changes using the provided Dockerfile.

@anands-repo
Author

anands-repo commented Sep 25, 2020

Hi @gunjanbaid

Unfortunately I am not compiling for x86 but for IBM POWER, so most of the installation scripts need to be discarded, and packages need to be compiled manually from source using the gcc compilers from IBM's Advance Toolchain. I have finally gotten all bazel tests and the build to complete.

I was wondering whether you could explain one piece of the build files though - this is just out of curiosity. In build_release_binaries, there is a function that starts as follows - which seems to be performing a hack to fix something:

# Bazel's --build_python_zip replaces our carefully engineered symbolic links
# with copies.  This function puts the symbolic links back.
function fix_zip_file {
  orig_zip_file=$1

  # Step 1:  Copy the zip file to a temporary place.
  TMPDIR=$(mktemp -d -t tmp.XXXXXXXXXXX)
  # The .zip version of the binary doesn't have the header that makes it
  # self-executable.  We use that version because otherwise unzip would
  # complain and raise an error code.
  cp "${orig_zip_file}.zip" "${TMPDIR}"

Would you be able to give a quick explanation of what the problem is? I understand what it does, but I do not understand why it is needed, or whether it is just for convenience.

Thanks!
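As a generic illustration of the behavior the quoted comment describes (symbolic links replaced by copies), and not of DeepVariant's actual packaging, the sketch below copies a small directory twice: once following links and once preserving them. It uses `cp` rather than zip so that it runs without extra tools; the function and file names are made up for the example.

```shell
# Illustration only: compare a dereferencing copy with a link-preserving copy.
count_links() {
  # Number of symbolic links under directory $1.
  find "$1" -type l | wc -l | tr -d ' '
}

symlink_copy_demo() {
  demo=$(mktemp -d)
  mkdir "$demo/src"
  printf 'payload\n' > "$demo/src/real.txt"
  ln -s real.txt "$demo/src/link.txt"

  cp -RL "$demo/src" "$demo/copied"     # follow links: link.txt becomes a regular file
  cp -RP "$demo/src" "$demo/preserved"  # keep links: link.txt stays a symlink

  echo "copied=$(count_links "$demo/copied") preserved=$(count_links "$demo/preserved")"
  rm -rf "$demo"
}

symlink_copy_demo
```

When links are turned into copies, each linked file is stored once per link, so an archive built that way can grow considerably. Whether that is the reason for `fix_zip_file`, or whether something in DeepVariant depends on the links themselves, is not stated in this thread.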
