From d414678df77c8a6fb4bc112fb0aa3bd81d677d3f Mon Sep 17 00:00:00 2001
From: john li
Date: Wed, 7 Jun 2023 19:59:35 -0400
Subject: [PATCH] Small tweak on cuda version mismatch documentation (#3706)

* Small tweak on cuda version mismatch documentation

* clarify minor versions should also match

---------

Co-authored-by: Logan Adams <114770087+loadams@users.noreply.github.com>
---
 docs/_tutorials/advanced-install.md | 13 +++++++++----
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/docs/_tutorials/advanced-install.md b/docs/_tutorials/advanced-install.md
index c1153742f964..31987e6a87db 100755
--- a/docs/_tutorials/advanced-install.md
+++ b/docs/_tutorials/advanced-install.md
@@ -155,15 +155,20 @@ If you're getting the following error:
 Exception: >- DeepSpeed Op Builder: Installed CUDA version {VERSION} does not match the version torch was compiled with {VERSION}, unable to compile cuda/cpp extensions without a matching cuda version.
 ```
 You have a misaligned version of CUDA installed compared to the version of CUDA
-used to compile torch. We only require that major version match (e.g., 11.1 and
-11.8 are OK). However a mismatch in the major version may result in unexpected
-behavior and errors.
+used to compile torch. A mismatch in the major version is likely to result in
+errors or unexpected behavior.
 
 The easiest fix for this error is changing the CUDA version installed (check
 with `nvcc --version`) or updating the torch version to match the installed CUDA
 version (check with `python3 -c "import torch; print(torch.__version__)"`).
 
-If you want to skip this check and proceed with the mismatched CUDA versions, use the following environment variable:
+We only require that the major version matches (e.g., 11.1 and 11.8). However,
+note that even a mismatch in the minor version _may still_ result in unexpected
+behavior and errors, so it's recommended to match both major and minor versions.
+When there's a minor version mismatch, DeepSpeed will log a warning.
+
+If you want to skip this check and proceed with the mismatched CUDA versions,
+use the following environment variable, but beware of unexpected behavior:
 
 ```bash
 DS_SKIP_CUDA_CHECK=1