Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

dvc push may fail for remote hosted on network attached drives #306

Open
abahnasy opened this issue Sep 11, 2024 · 3 comments
Open

dvc push may fail for remote hosted on network attached drives #306

abahnasy opened this issue Sep 11, 2024 · 3 comments
Assignees

Comments

@abahnasy
Copy link

OS: Linux
Python 3.8
dvc 3.55.2
dvc-azure 3.1.0
dvc-data 3.16.5
dvc-http 2.32.0
dvc-objects 5.1.0
dvc-render 1.0.2
dvc-studio-client 0.21.0
dvc-task 0.4.0

Hi,

I come across a special case where the dvc push fails due to hosting the remote on a network shared drive (Samba Drive).

During pushing the files from the local cache to the remote storage, DVC will reach copyfile here and attempt to do CoW (Copy on Write) here but due to the fact that both (cache and remote) are on different file systems, it will fail and try to unlink here the linking happened due to calling fcntl.ioctl() here and then copyfile will try to copy the file using shutil.copyfile(src, dst) here. At this point, the call fails due to FileNotFoundError exception as the dst is seen as file (os.path.exists(dst) returns True) but it is not actually found on the filesystem.

it seems that due to network latency or caching from SMB drive, the unlinking here is not reflected immediately and I've to wait few milliseconds until the dst path is removed and could be seen as not found on the filesystem (os.path.exists(dst) returns False). Then, normal copying using shutil.copyfile(src, dst) could work fine.

I've written a simple script by extracting the code snippets from lib to reproduce the error.

import os
import shutil
from secrets import token_urlsafe
import fcntl
import time

FICLONE = 0x40049409
def reflink(src, dst):
    src_fd = os.open(src, os.O_RDONLY)
    try:
        dst_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o666)
    except OSError:
        os.close(src_fd)
        raise

    try:
        
        fcntl.ioctl(dst_fd, FICLONE, src_fd)
    except OSError as e:
        print(">>>", e)
        os.close(src_fd)
        os.close(dst_fd)
        try:
            os.unlink(dst)

            ### Temp fix, comment below code to reproduce the error ###
            while os.path.exists(dst): # Check if file still exists
                time.sleep(0.1)
                print(f"{dst} still exists after 0.1 second")
            ### Temp fix, comment above code to reproduce the error ###
            
            print(dst, "exists: ", os.path.exists(dst)) # after unlink, the `dst` shouldn't be found
        except OSError:
            print(">>> failed to unlink")
            pass
        raise
    else:
        os.close(src_fd)
        os.close(dst_fd)

def tmp_fname(prefix: str = "") -> str:
    """Temporary name for a partial download"""
    return f"{prefix}.{token_urlsafe(16)}.tmp"



rpath = "/mnt/interior_sensing/test-dvc/randombinfile.bin" #dest Samba drive
parent = "/mnt/interior_sensing/test-dvc"
lpath = "./randombitsfile.bin" # local filesystem
tmp_file = os.path.join(parent, tmp_fname())
# create random file
fileSizeInBytes = 1024*1024
with open(lpath, 'wb') as fout:
    fout.write(os.urandom(fileSizeInBytes)) 
try:
    reflink(lpath, tmp_file)
    # NOTE: reflink may or may not clone src permissions
    os.chmod(dest, 0o666 & ~umask)
    os.replace(tmp_file, rpath)
    print("Reflink Successfully !!!")
    exit()
except OSError:
    pass
# if reflink is succeeded, code will not reach here. Otherwise, do normal copy
shutil.copyfile(lpath, tmp_file)
os.replace(tmp_file, rpath)

Error output:

>>> [Errno 18] Invalid cross-device link
/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp exists:  True
Traceback (most recent call last):
  File "/home/user/env/test_dvc_script.py", line 67, in <module>
    shutil.copyfile(lpath, tmp_file)
  File "/home/user/env/lib/python3.9/shutil.py", line 266, in copyfile
    with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp'

Output after applying the temp fix

>>> [Errno 18] Invalid cross-device link #due to the fact that src, dst are not different filesystems
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp exists:  False
(env) user@machine:~/workspace$ ls /mnt/interior_sensing/test-dvc/
randombinfile.bin # file copied successfully

I'm not sure if this a case that should be handled or it is a case specific to my setup, but I thought it is worth mentioning.

@shcheklein
Copy link
Member

@skshetry can it be related to the recent changes?

@shcheklein
Copy link
Member

@skshetry what is your take here?

@skshetry skshetry self-assigned this Oct 9, 2024
@skshetry
Copy link
Member

skshetry commented Oct 9, 2024

This does not look related to recent changes at all.

with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp

This should never fail as we are opening in mode="wb", unless /mnt/interior_sensing/test-dvc does not exist.

@abahnasy, does that directory exist? Can you try to add os.makedirs(...) in the script above and see if it works? (I guess it does since unlink did not return any error).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants