Adding the script to build and run the rccl-tests for PTS #26

Open
wants to merge 33 commits into
base: master

PedramAlizadeh
Contributor

No description provided.

AddyLaddy and others added 23 commits October 25, 2021 16:30
Build with CUDARTLIB=cudart_static to remove dynamic linkage

Also removed unused curand and nvToolsExt dependencies

BUG 95
Add option to statically link cudart
* Added "verifiable", a suite of kernels for generating and verifying reduction
  input and output arrays in a bit-precise way.
* Data corruption errors now reported in number of wrong elements instead of max
  deviation.
* Use ncclGetLastError.
* Don't run hypercube on non-powers of 2 ranks.
* Fix to hypercube data verification.
* Use "thread local" as the default CUDA capture mode.
* Replaced pthread_yield -> sched_yield()
* Bugfix to the cpu-side barrier/allreduce implementations.
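
To illustrate the change in error reporting noted in the list above (number of wrong elements rather than maximum deviation), here is a minimal Python sketch. It is not the rccl-tests implementation, which uses GPU kernels; the function names `count_wrong_elements` and `max_deviation` are hypothetical and exist only for this example.

```python
def count_wrong_elements(expected, actual):
    """Return how many positions differ between two equal-length buffers."""
    if len(expected) != len(actual):
        raise ValueError("buffers must have the same length")
    return sum(1 for e, a in zip(expected, actual) if e != a)


def max_deviation(expected, actual):
    """Older-style metric: largest absolute difference between the buffers."""
    return max((abs(e - a) for e, a in zip(expected, actual)), default=0)


expected = [1, 2, 3, 4]
actual = [1, 2, 7, 5]

# Two positions are wrong, regardless of how far off each value is.
print(count_wrong_elements(expected, actual))  # 2
# The largest single error is |3 - 7| = 4.
print(max_deviation(expected, actual))         # 4
```

Counting mismatches only makes sense when the expected values can be reproduced exactly, which is what the bit-precise "verifiable" kernels are described as providing.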
as relative to top-level directory. This is done by abspath'ing it before
passing it to subdirectory Makefiles.

The old behavior had two cases: with and without BUILDDIR being set by
the user. With BUILDDIR not set, the build dir would be named "build"
in the top-level directory. If BUILDDIR was set, then the build dir
would be placed at "src/${BUILDDIR}".
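
The old rule described above can be sketched next to the simplified rule this commit adopts (the build directory is always `${BUILDDIR}` in the top level, defaulting to `build`). This is a Python model of the path logic only, not the Makefile code itself; the function names and the `/repo` path are hypothetical.

```python
import posixpath


def old_build_dir(top, builddir=None):
    """Old rule: 'build' at the top level, or 'src/<BUILDDIR>' when set."""
    if builddir is None:
        return posixpath.join(top, "build")
    return posixpath.normpath(posixpath.join(top, "src", builddir))


def new_build_dir(top, builddir=None):
    """New rule: always '<BUILDDIR>' at the top level, defaulting to 'build'."""
    if builddir is None:
        builddir = "build"
    # Joining with an absolute BUILDDIR yields that absolute path unchanged.
    return posixpath.normpath(posixpath.join(top, builddir))


print(old_build_dir("/repo"))         # /repo/build
print(old_build_dir("/repo", "out"))  # /repo/src/out
print(new_build_dir("/repo"))         # /repo/build
print(new_build_dir("/repo", "out"))  # /repo/out
```

The two rules agree when `BUILDDIR` is unset and differ only when the user sets it.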

The new behavior is simpler: if BUILDDIR is not set, it defaults
to "build", and the directory holding the final build is always just
"${BUILDDIR}" in the top level.
AlltoAll does not support in-place buffers
ncclGetLastError() was added in NCCL 2.13.0
all files compile now.
mpi tests also pass
error introduced with the web merge-resolution tool :-(
add the rccl/lib directory to the link path
the subdir entry is not actually required for the compilation.
make cmake stage also pass in CI
edgargabriel and others added 4 commits November 30, 2022 23:01
avoid a division by zero which seems to only occur for op=prod and
datatype=half, since the maximum exponent is small (15) and can exceed
the number of ranks.
fix algorithm assigning values in testsuite
PedramAlizadeh and others added 6 commits February 24, 2023 21:39
* Adding -pthread flag for linking issues into src/Makefile

* Adding -pthread flag for linking issues into CMakeLists.txt
we honor user requested MPI installations using MPI_PATH first,
and check afterwards for MPICH and Open MPI in the default
Ubuntu and RHEL installation directories.
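
The lookup order described in this commit message (user-supplied `MPI_PATH` first, then default install locations) can be sketched in Python as follows. This is an illustration of the ordering only, not the build-system code; `find_mpi` is a hypothetical name, the directories in the usage example are placeholders rather than the real Ubuntu or RHEL paths, and returning `MPI_PATH` without checking that it exists is an assumption.

```python
def find_mpi(env, default_dirs, exists):
    """Pick an MPI installation directory.

    env          -- mapping of environment variables
    default_dirs -- fallback directories to probe, in priority order
    exists       -- callable that reports whether a directory is present
    """
    user_path = env.get("MPI_PATH")
    if user_path:
        # Assumption: a user-requested path is honored without validation.
        return user_path
    for candidate in default_dirs:
        if exists(candidate):
            return candidate
    return None


# Placeholder directories, purely for illustration.
defaults = ["/example/mpich", "/example/openmpi"]
present = {"/example/openmpi"}

print(find_mpi({"MPI_PATH": "/custom/mpi"}, defaults, present.__contains__))  # /custom/mpi
print(find_mpi({}, defaults, present.__contains__))                           # /example/openmpi
```

Passing `exists` as a parameter keeps the sketch independent of the local filesystem; in real use it could be `os.path.isdir`.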
@wenkaidu wenkaidu added the noCI Disable Jenkins for this PR. label May 3, 2024
8 participants