I have seen the failure in uct_sm_ep_get_bcopy and uct_sm_ep_put_short. In each case, the function calls memcpy, which results in:
[thunderx2p-1:83303:0] Caught signal 7 (Bus error: nonexistent physical address)
==== backtrace ====
===================
[thunderx2p-1:83303:0] Process frozen.
Running gtest sometimes triggers the failure; other times, all tests succeed. I have seen this with both 1.2.0 and the master branch.
I can reliably reproduce this bug using OpenSHMEM calls in OpenMPI 2.1.0 compiled over UCX+XPMEM. Below is an error log and backtrace for a shmem implementation of integer sorting.
This failure occurs randomly on ThunderX processors when UCX is built on top of the XPMEM kernel module (https://github.com/hjelmn/xpmem).
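For anyone unfamiliar with this signal class: "nonexistent physical address" is the POSIX description of the SIGBUS code BUS_ADRERR, raised when a mapped virtual address has no backing page at the time of access. The sketch below is not the UCX/XPMEM code path and is not claimed to be the cause of this bug. It only provokes the same signal from a plain memcpy, assuming a Linux system, by reading from a file-backed mapping that extends past the end of the file. The names probe_sigbus, on_sigbus, and sink are hypothetical and exist only for this illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf recover_point;
static volatile sig_atomic_t caught_code;
char sink[8]; /* externally visible so the copy is not optimized away */

/* Record why SIGBUS was raised, then jump back out of the faulting copy. */
static void on_sigbus(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)ctx;
    caught_code = info->si_code;
    siglongjmp(recover_point, 1);
}

/* Returns the si_code of the caught SIGBUS, 0 if none, -1 on setup error. */
int probe_sigbus(void)
{
    char path[] = "/tmp/sigbus_probe_XXXXXX";
    long page = sysconf(_SC_PAGESIZE);
    struct sigaction sa, old;
    char *map;
    int fd = mkstemp(path);

    if (fd < 0)
        return -1;
    unlink(path);
    /* The file is one page long, but two pages are mapped below. */
    if (ftruncate(fd, page) != 0) {
        close(fd);
        return -1;
    }
    map = mmap(NULL, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_sigbus;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    caught_code = 0;
    if (sigsetjmp(recover_point, 1) == 0) {
        /* The second page has no backing storage, so this copy faults. */
        memcpy(sink, map + page, sizeof(sink));
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(map, 2 * page);
    return caught_code;
}
```

Calling probe_sigbus() and comparing the result with BUS_ADRERR shows whether the kernel reported the same reason as in the log above. In the XPMEM case the mapping is an attached remote segment, not a truncated file, so this only shows what the signal means, not why the page is missing.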