Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: PermissionError: [WinError 5] while uploading dataset #1348

Open
Octoslav opened this issue Nov 19, 2024 · 6 comments · May be fixed by #1349
Open

Bug: PermissionError: [WinError 5] while uploading dataset #1348

Octoslav opened this issue Nov 19, 2024 · 6 comments · May be fixed by #1349
Labels
bug Something isn't working

Comments

@Octoslav
Copy link

Describe the Bug

When uploading a dataset using clearml-data sync on Windows, the following error intermittently occurs: PermissionError: [WinError 5] Access is denied.

Investigation

The error is triggered on the following line in the code:
https://github.com/allegroai/clearml/blob/master/clearml/binding/artifacts.py#L1141C18-L1141C19.

This issue is unstable and difficult to reproduce consistently. However, it becomes a significant problem when working with large datasets. For instance, my team faced this error while attempting to upload a dataset containing 700 000 items. The process took approximately 5 hours to compute all file hashes, only for the WinError 5 to unexpectedly appear.

Steps to Reproduce

The following script simulates the aforementioned line of code and demonstrates the issue:

import os
from tempfile import mkstemp
from tqdm import trange, tqdm
from shutil import move

for i in trange(100000):
    fd, local_filename = mkstemp()
    os.close(fd)
    fd, temp_filename = mkstemp()
    os.close(fd)
    
    try:
        os.replace(local_filename, temp_filename)
    except PermissionError:
        tqdm.write("PermissionError")
    
    os.remove(temp_filename)

When running this script on multiple Windows machines, I observed about 6 PermissionError exceptions over 100 000 iterations.
Running the script as administrator does not resolve the issue.

Possible Solution

I suggest replacing the use of os.replace(local_filename, temp_filename) with shutil.move(local_filename, temp_filename). Both methods achieve the same functionality, but shutil.move does not raise the PermissionError on Windows.

@Octoslav Octoslav added the bug Something isn't working label Nov 19, 2024
Octoslav pushed a commit to Octoslav/clearml that referenced this issue Nov 19, 2024
Octoslav added a commit to Octoslav/clearml that referenced this issue Nov 19, 2024
Octoslav added a commit to Octoslav/clearml that referenced this issue Nov 19, 2024
@Octoslav Octoslav linked a pull request Nov 19, 2024 that will close this issue
@jkhenning
Copy link
Member

@Octoslav are you saying that os.replace() raised by error by mistake? Or was there a concrete reason for it?

@Octoslav
Copy link
Author

Octoslav commented Nov 19, 2024

@Octoslav are you saying that os.replace() raised by error by mistake? Or was there a concrete reason for it?

As far as I know the race condition can occur: os.close() is executed, but Windows still keeps the file open for a short time in some cases. So os.replace(local_filename, temp_filename) throws an error.

shutil.move is a high level function and can handle such a case.

@Octoslav
Copy link
Author

@jkhenning Hi!

Do you have any thoughts? The error is quite annoying for windows users in my team.

@jkhenning
Copy link
Member

The solution seems OK to me, I'm just not comfortable with hiding errors in general 😄
Does this only happen in Windows? If so, I would prefer to have this behavior specific to windows, while keeping the previous behavior for other platforms

@Octoslav
Copy link
Author

@jkhenning

Does this only happen in Windows?

Didn't managed to reproduced it on linux machines

I'm just not comfortable with hiding errors in general

I didn't hide the error, I just used an another instrument that works better on Windows

@jkhenning
Copy link
Member

Any idea on the implications of this change in linux or other platforms?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants