When the user's home directory is on NFS, deepspeed prints a warning such as this one:
Warning: The default cache directory for DeepSpeed Triton autotune, /home/ubuntu/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.
However, deepspeed prints this warning even when TRITON_CACHE_DIR is set. Although the warning is technically correct, since it says "The default cache directory", it is also very misleading, because the default directory is irrelevant when TRITON_CACHE_DIR is set to a non-NFS directory. Furthermore, the warning is not printed when the home directory is not on NFS but TRITON_CACHE_DIR is explicitly set to an NFS directory.
I suggest refactoring the logic that checks for the NFS directory and prints the warning to do so after the actual cache dir lookup is performed.
I.e. the existing check, which operates on the default path:

    if is_nfs_path(tmp_path):
        print(
            f"Warning: The default cache directory for DeepSpeed Triton autotune, {tmp_path}, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path."
        )

should be moved to operate on self.cache_dir, and the message can just be changed to reflect that it refers to the active cache directory rather than the default. Something like:

    if is_nfs_path(self.cache_dir):
        print(
            f"Warning: The cache directory for DeepSpeed Triton autotune, {self.cache_dir}, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path."
        )

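To make the proposed ordering concrete, here is a minimal, self-contained sketch: resolve the active cache directory first, then run the NFS check on that result. It is not the DeepSpeed implementation. `resolve_cache_dir` and `warn_if_nfs` are hypothetical helper names, and the NFS predicate is passed in as a parameter so the sketch does not depend on how DeepSpeed's own `is_nfs_path` works.

```python
import os


def resolve_cache_dir(default_dir, env=None):
    """Return TRITON_CACHE_DIR if it is set and non-empty, else the default.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return env.get("TRITON_CACHE_DIR") or default_dir


def warn_if_nfs(cache_dir, is_nfs_path, emit=print):
    """Warn about the directory actually in use. Returns True if a warning was emitted.

    `is_nfs_path` is any callable taking a path and returning a bool
    (an assumption here; DeepSpeed ships its own function of that name).
    """
    if is_nfs_path(cache_dir):
        emit(
            f"Warning: The cache directory for DeepSpeed Triton autotune, {cache_dir}, "
            "appears to be on an NFS system. While this is generally acceptable, if you "
            "experience slowdowns or hanging when DeepSpeed exits, it is recommended to "
            "set the TRITON_CACHE_DIR environment variable to a non-NFS path."
        )
        return True
    return False
```

With a stand-in predicate such as `lambda p: p.startswith("/nfs/")`, calling `warn_if_nfs(resolve_cache_dir("/nfs/home/.triton/autotune", env={"TRITON_CACHE_DIR": "/local/cache"}), ...)` stays silent, because the check only ever sees the directory that will be used.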
…ult (#6487)
move the logic that prints a warning when triton cache dir is on NFS to
act on the actual calculated cache_dir rather than on the default.
this means that:
- when the default directory (in the user's home directory) is on NFS
but `TRITON_CACHE_DIR` is set to a non-NFS directory, no warning will be
printed whereas prior to this change a spurious and confusing warning
was printed
- when the user's home directory is not on NFS but `TRITON_CACHE_DIR` is
set to an NFS directory, a warning will be printed whereas prior to this
change no warning would be printed
fixes #6486
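The behavior change described in the commit message can be summarized as a small boolean model. This is only an illustration of the two cases listed above, not code from the repository. `warns_before` and `warns_after` are hypothetical names, and `override_on_nfs` is `None` when `TRITON_CACHE_DIR` is unset.

```python
def warns_before(default_on_nfs, override_on_nfs):
    """Old behavior: only the default directory is checked."""
    return default_on_nfs


def warns_after(default_on_nfs, override_on_nfs):
    """New behavior: the directory actually in use is checked."""
    if override_on_nfs is None:
        # TRITON_CACHE_DIR unset, so the default directory is the active one.
        return default_on_nfs
    return override_on_nfs


# Print every combination so the two changed cases are easy to spot.
for default_on_nfs in (True, False):
    for override_on_nfs in (None, True, False):
        print(
            default_on_nfs,
            override_on_nfs,
            warns_before(default_on_nfs, override_on_nfs),
            warns_after(default_on_nfs, override_on_nfs),
        )
```

The two functions disagree only when `TRITON_CACHE_DIR` is set and its NFS status differs from that of the default directory, which are exactly the two cases listed in the commit message.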
For reference, the NFS check discussed in the issue is in DeepSpeed/deepspeed/ops/transformer/inference/triton/matmul_ext.py (lines 41 to 44 in 359c12e); the issue proposes moving it to operate on self.cache_dir (at line 78 of the same file in 359c12e).