Relocation truncation issues #17045
Comments
To solve this, I think we can instruct the compiler to always use 64-bit relocations instead of 32-bit relocations (which may overflow). [1]: Still happens with |
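A minimal sketch of why 32-bit relocations can overflow, assuming the R_X86_64_PC32 relocation from the x86-64 psABI: the linker computes S + A - P (symbol address plus addend minus the address being patched) and reports "relocation truncated to fit" when the result does not fit in a signed 32-bit field, i.e. when the target is more than about 2 GiB away. The helper name pc32_fits is hypothetical, and this only models the range check, not the linker. Presumably the 64-bit alternative proposed here corresponds to something like GCC's -mcmodel=large on x86-64, but that mapping is an assumption.

```python
# Hypothetical helper modelling the range check for R_X86_64_PC32.
# The x86-64 psABI computes S + A - P (symbol + addend - place); the result
# must fit in a signed 32-bit field, otherwise the link fails.
INT32_MIN = -(2**31)
INT32_MAX = 2**31 - 1


def pc32_fits(symbol: int, addend: int, place: int) -> bool:
    """Return True if S + A - P is representable as a signed 32-bit value."""
    value = symbol + addend - place
    return INT32_MIN <= value <= INT32_MAX


if __name__ == "__main__":
    code = 0x400000                    # address of the 4-byte displacement field
    near = code + 64 * 2**20           # target 64 MiB away
    far = code + 3 * 2**30             # target 3 GiB away
    # -4 is the usual addend for a RIP-relative displacement at the end of
    # an instruction.
    print(pc32_fits(near, -4, code))   # True
    print(pc32_fits(far, -4, code))    # False: needs a 64-bit relocation
```

With a 64-bit relocation (for example R_X86_64_64, which stores S + A in a full 64-bit field) the distance between code and data no longer matters, at the cost of larger and slower code.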
My personal experience is that using 64-bit relocations is fine on x86-64, so I am in favor of such a change :-) |
Linking |
Looking at the "By default" it fails like
Enabling
And when setting
|
Any updates? I ran into similar issues recently. |
I ran into similar issue with the latest master. |
I ran into the same issue with the latest master. |
Same issue |
@ptrendx is working on a fix (cf #18280 (comment)) |
Same issue on the latest master branch. |
Met the same issue. |
Set -DMXNET_CUDA_ARCH=7.0, or whichever architecture you are targeting, as a workaround. |
Thanks, leezu. |
We recently hit the same issue in PyTorch with CUDA 11: pytorch/pytorch#39968 |
Happened again for the cu101 build: https://jenkins.mxnet-ci.amazon-ml.com/job/restricted-mxnet-cd/job/mxnet-cd-release-job/1525/execution/node/177/log/ |
@eric-haibin-lin that pipeline isn't the one that produces the nightly builds. |
The problem still exists when building on a Jetson NX.
Here's my cmake config
|
Isn't it turned on by default? I used the code pulled from master, and the problem still exists. I can compile it on a normal PC but not on the Jetson. |
Please paste the full cmake configure log. Also note that your Jetson uses AARCH64, not X86. The code memory model differs from X86, and compiler support is generally much worse than on X86 (for example, if position-independent code is required, gcc / clang may not implement anything but the default model, thus limiting the size of the binary and causing the relocation issue above). We do test compiling MXNet on the Jetson AARCH64 architecture (https://github.com/apache/incubator-mxnet/blob/master/ci/docker/Dockerfile.build.jetson), so in principle things should work, and we just need to figure out how your environment differs from the tested one. |
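For the AArch64 case, a similar sketch, assuming the relocation formulas from the ELF ABI for the Arm 64-bit architecture: in the default (small) code model, direct branches (B/BL, R_AARCH64_CALL26) encode a signed 26-bit count of 4-byte instructions, so they reach about ±128 MiB, and ADRP (R_AARCH64_ADR_PREL_PG_HI21) encodes a signed 21-bit count of 4 KiB pages, so it reaches about ±4 GiB. The helper names are hypothetical, and which relocation actually failed in the Jetson build can only be read from the real linker error.

```python
# Hypothetical helpers: range checks for two relocations used by the
# AArch64 small code model.


def fits_signed(value: int, bits: int) -> bool:
    """True if value fits in a two's-complement field of the given width."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))


def call26_fits(target: int, place: int) -> bool:
    """B/BL (R_AARCH64_CALL26): signed 26-bit count of 4-byte instructions."""
    offset = target - place
    return offset % 4 == 0 and fits_signed(offset >> 2, 26)


def adrp_fits(target: int, place: int) -> bool:
    """ADRP (R_AARCH64_ADR_PREL_PG_HI21): signed 21-bit count of 4 KiB pages."""
    page_delta = (target >> 12) - (place >> 12)
    return fits_signed(page_delta, 21)


if __name__ == "__main__":
    pc = 0x400000
    print(call26_fits(pc + 100 * 2**20, pc))  # True: 100 MiB < 128 MiB
    print(call26_fits(pc + 200 * 2**20, pc))  # False
    print(adrp_fits(pc + 3 * 2**30, pc))      # True: 3 GiB < 4 GiB
    print(adrp_fits(pc + 5 * 2**30, pc))      # False
```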
Here's the cmake output:
|
Could you try matching the following build configuration (modulo DCMAKE_TOOLCHAIN_FILE and the CUDA version)? I.e., our test suite builds for Jetson without opencv and without the lapack feature. You may also want to try to ensure that you specify the |
Still the same:
Here's my cmake log:
|
Please ensure your system toolchain is up to date (it should include the fix for https://bugzilla.redhat.com/show_bug.cgi?id=1243559). You may also simply use the cross-compilation option by installing the cross-toolchain on your host system analogous to |
I think my system toolchain is up to date; I am using JetPack 4.3. If not, how do I update the system toolchain? |
binutils is not part of JetPack; it is part of the operating system. You can check which package version is provided by the operating system used by your device. With respect to JetPack, we recommend updating to 4.4, as this is the version tested by our CI. |
After some testing, I finally managed to build it. I updated ccache and openblas similar to
Then, I restarted the jetson and built it with these commands
I also added 8 GB of swap so that I can build with all 6 cores. Of the changes above, I don't know which one actually solved the issue. Thanks @leezu for your help. |
@wms2537 Thanks for sharing your tip. Would you mind sharing your "CMakeLists.txt" (if you modified it) or the modified command at the … I tried to build on an Nvidia Jetson (AGX Orin) and am also getting the same error of
|
Description
libmxnet.so gets too large (depending on compile options), so linking fails. This was observed before on CI with the test coverage functionality enabled (#15971), but it can also happen in builds without test coverage, such as a -DUSE_INT64_TENSOR_SIZE=ON build. I first observed this in #17031 (http://jenkins.mxnet-ci.amazon-ml.com/blue/rest/organizations/jenkins/pipelines/mxnet-validation/pipelines/unix-gpu/branches/PR-17031/runs/6/nodes/52/steps/84/log/?start=0), but can easily reproduce it on the master branch when building with GCC 7.4.
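To see how close a linked libmxnet.so is to the 2 GiB reach of 32-bit PC-relative relocations (for example, a build that still links with fewer CUDA architectures or without -DUSE_INT64_TENSOR_SIZE=ON), one option is to measure the address span of its allocated sections and list the largest ones. The following is a standard-library-only sketch that parses little-endian ELF64 section headers; the function names are hypothetical, extended section numbering is not handled, and the 2 GiB threshold is a rough heuristic for the x86-64 small code model rather than an exact criterion. readelf -S from binutils reports the same section table.

```python
import struct
import sys

# Little-endian ELF64 file header and section header layouts (see elf(5)).
ELF_HEADER = struct.Struct("<16sHHIQQQIHHHHHH")
SECTION_HEADER = struct.Struct("<IIQQQQIIQQ")
SHF_ALLOC = 0x2      # section occupies memory at run time
PC32_REACH = 2**31   # reach of a signed 32-bit PC-relative displacement


def read_sections(path):
    """Return [(name, flags, addr, size), ...] for a little-endian ELF64 file."""
    with open(path, "rb") as f:
        hdr = ELF_HEADER.unpack(f.read(ELF_HEADER.size))
        ident = hdr[0]
        if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
            raise ValueError(f"{path}: not a little-endian ELF64 file")
        shoff, shentsize, shnum, shstrndx = hdr[6], hdr[11], hdr[12], hdr[13]
        if shnum == 0 or shstrndx in (0, 0xFFFF):
            # Extended section numbering is not handled in this sketch.
            raise ValueError(f"{path}: unsupported section header layout")
        raw = []
        for i in range(shnum):
            f.seek(shoff + i * shentsize)
            raw.append(SECTION_HEADER.unpack(f.read(SECTION_HEADER.size)))
        # Section names live in the string table at index e_shstrndx.
        str_offset, str_size = raw[shstrndx][4], raw[shstrndx][5]
        f.seek(str_offset)
        names = f.read(str_size)
    sections = []
    for name_off, _type, flags, addr, _offset, size, *_rest in raw:
        name = names[name_off:names.index(b"\0", name_off)].decode()
        sections.append((name, flags, addr, size))
    return sections


def allocated_span(path):
    """Bytes from the lowest to the highest address of SHF_ALLOC sections."""
    alloc = [(addr, size) for _, flags, addr, size in read_sections(path)
             if flags & SHF_ALLOC]
    if not alloc:
        return 0
    return max(a + s for a, s in alloc) - min(a for a, _ in alloc)


if __name__ == "__main__" and len(sys.argv) > 1:
    path = sys.argv[1]  # e.g. a libmxnet.so that still links
    span = allocated_span(path)
    print(f"{path}: allocated span {span / 2**20:.1f} MiB")
    largest = sorted(read_sections(path), key=lambda s: s[3], reverse=True)[:5]
    for name, _flags, _addr, size in largest:
        print(f"  {name or '<null>'}: {size / 2**20:.1f} MiB")
    if span >= PC32_REACH:
        print("  span >= 2 GiB: some 32-bit PC-relative references may not reach")
```

Run it with the path of the shared library as the only argument. In a CUDA build, listing the largest sections should show whether device code (for example a fat binary section) dominates the size, which would be consistent with the -DMXNET_CUDA_ARCH workaround helping.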
Error Message
From the CI
Compiling the master version with GCC on Ubuntu 18.04 (Deep Learning AMI) gives an equivalent error message (with slightly different wording, due to GCC vs. Clang).
To Reproduce
cmake -DUSE_SIGNAL_HANDLER=ON -DUSE_CUDA=ON -DUSE_CUDNN=ON -DPython3_EXECUTABLE=/usr/bin/python3 -DUSE_MKL_IF_AVAILABLE=OFF -DUSE_MKLDNN=OFF -DUSE_DIST_KVSTORE=ON -DCMAKE_BUILD_TYPE=Release -DCUDA_ARCH_NAME=Manual -DCUDA_ARCH_BIN=52,70 -DUSE_INT64_TENSOR_SIZE=ON ..
Run this on Ubuntu 18.04 (gcc 7.4, ld 2.3); the CMake options are taken from the build_ubuntu_gpu_large_tensor CI run.
Environment
Environment used for reproducing the error with the master version of MXNet.