[BUG] deepspeed prints a warning when the user's home directory is on NFS even when TRITON_CACHE_DIR is set to a non-NFS directory #6486

Closed
jrandall opened this issue Sep 4, 2024 · 1 comment · Fixed by #6487
Labels: bug (Something isn't working), inference

jrandall commented Sep 4, 2024

When the user's home directory is on NFS, deepspeed prints a warning such as this one:

Warning: The default cache directory for DeepSpeed Triton autotune, /home/ubuntu/.triton/autotune, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path.

However, deepspeed prints this warning even when TRITON_CACHE_DIR is set. Technically the warning is correct, since it says "The default cache directory", but it is misleading because the default is irrelevant when TRITON_CACHE_DIR is set to a non-NFS directory. Furthermore, the warning is not printed when the home directory is not on NFS but TRITON_CACHE_DIR is explicitly set to an NFS directory.

I suggest refactoring the logic that checks for the NFS directory and prints the warning to do so after the actual cache dir lookup is performed.

I.e. the code in

    if is_nfs_path(tmp_path):
        print(
            f"Warning: The default cache directory for DeepSpeed Triton autotune, {tmp_path}, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path."
        )

should be moved to operate on self.cache_dir, and the message can be changed to reflect that it refers to the active cache directory rather than the default. Something like:

    if is_nfs_path(self.cache_dir):
        print(
            f"Warning: The cache directory for DeepSpeed Triton autotune, {self.cache_dir}, appears to be on an NFS system. While this is generally acceptable, if you experience slowdowns or hanging when DeepSpeed exits, it is recommended to set the TRITON_CACHE_DIR environment variable to a non-NFS path."
        )
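
To make the intended ordering concrete, here is a minimal standalone sketch, not the actual DeepSpeed code. The helpers `resolve_cache_dir` and `nfs_warning` and the stand-in predicate `fake_is_nfs` are hypothetical names introduced for illustration, and the sketch assumes that a non-empty TRITON_CACHE_DIR overrides the default directory. The point is only that the NFS check runs on the resolved directory:

```python
from typing import Callable, Mapping, Optional


def resolve_cache_dir(env: Mapping[str, str], default_dir: str) -> str:
    """Return the directory that would actually be used for the cache.

    Assumption: a non-empty TRITON_CACHE_DIR takes precedence over the default.
    """
    return env.get("TRITON_CACHE_DIR", "").strip() or default_dir


def nfs_warning(env: Mapping[str, str], default_dir: str,
                is_nfs: Callable[[str], bool]) -> Optional[str]:
    """Return a warning string if the *active* cache dir is on NFS, else None."""
    cache_dir = resolve_cache_dir(env, default_dir)  # resolve first
    if is_nfs(cache_dir):                            # then check the result
        return (f"Warning: The cache directory for DeepSpeed Triton autotune, "
                f"{cache_dir}, appears to be on an NFS system.")
    return None


def fake_is_nfs(path: str) -> bool:
    """Stand-in for a real NFS check: pretend everything under /home is NFS."""
    return path.startswith("/home/")


DEFAULT_DIR = "/home/ubuntu/.triton/autotune"

# Home on NFS, TRITON_CACHE_DIR on local disk: no warning expected.
local_override = nfs_warning({"TRITON_CACHE_DIR": "/tmp/triton"}, DEFAULT_DIR, fake_is_nfs)

# TRITON_CACHE_DIR explicitly on NFS: warning expected.
nfs_override = nfs_warning({"TRITON_CACHE_DIR": "/home/shared/triton"}, DEFAULT_DIR, fake_is_nfs)

# No override: the default is the active directory, so it is the one checked.
no_override = nfs_warning({}, DEFAULT_DIR, fake_is_nfs)
```

Passing the environment and the NFS predicate as arguments keeps the sketch testable without touching a real filesystem; the real code would read `os.environ` and call `is_nfs_path` directly.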

loadams commented Sep 4, 2024

Thanks for the PR @jrandall

github-merge-queue bot pushed a commit that referenced this issue Sep 4, 2024
…ult (#6487)

move the logic that prints a warning when triton cache dir is on NFS to
act on the actual calculated cache_dir rather than on the default.

this means that:
- when the default directory (in the user's home directory) is on NFS
but `TRITON_CACHE_DIR` is set to a non-NFS directory, no warning will be
printed whereas prior to this change a spurious and confusing warning
was printed
- when the user's home directory is not on NFS but `TRITON_CACHE_DIR` is
set to an NFS directory, a warning will be printed whereas prior to this
change no warning would be printed
 
fixes #6486