You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I come across a special case where the dvc push fails due to hosting the remote on a network shared drive (Samba Drive).
During pushing the files from the local cache to the remote storage, DVC will reach copyfilehere and attempt to do CoW (Copy on Write) here but due to the fact that both (cache and remote) are on different file systems, it will fail and try to unlink here the linking happened due to calling fcntl.ioctl()here and then copyfile will try to copy the file using shutil.copyfile(src, dst)here. At this point, the call fails due to FileNotFoundError exception as the dst is seen as file (os.path.exists(dst) returns True) but it is not actually found on the filesystem.
it seems that due to network latency or caching from SMB drive, the unlinking here is not reflected immediately and I've to wait few milliseconds until the dst path is removed and could be seen as not found on the filesystem (os.path.exists(dst) returns False). Then, normal copying using shutil.copyfile(src, dst) could work fine.
I've written a simple script by extracting the code snippets from lib to reproduce the error.
importosimportshutilfromsecretsimporttoken_urlsafeimportfcntlimporttimeFICLONE=0x40049409defreflink(src, dst):
src_fd=os.open(src, os.O_RDONLY)
try:
dst_fd=os.open(dst, os.O_WRONLY|os.O_CREAT|os.O_TRUNC, 0o666)
exceptOSError:
os.close(src_fd)
raisetry:
fcntl.ioctl(dst_fd, FICLONE, src_fd)
exceptOSErrorase:
print(">>>", e)
os.close(src_fd)
os.close(dst_fd)
try:
os.unlink(dst)
### Temp fix, comment below code to reproduce the error ###whileos.path.exists(dst): # Check if file still existstime.sleep(0.1)
print(f"{dst} still exists after 0.1 second")
### Temp fix, comment above code to reproduce the error ###print(dst, "exists: ", os.path.exists(dst)) # after unlink, the `dst` shouldn't be foundexceptOSError:
print(">>> failed to unlink")
passraiseelse:
os.close(src_fd)
os.close(dst_fd)
deftmp_fname(prefix: str="") ->str:
"""Temporary name for a partial download"""returnf"{prefix}.{token_urlsafe(16)}.tmp"rpath="/mnt/interior_sensing/test-dvc/randombinfile.bin"#dest Samba driveparent="/mnt/interior_sensing/test-dvc"lpath="./randombitsfile.bin"# local filesystemtmp_file=os.path.join(parent, tmp_fname())
# create random filefileSizeInBytes=1024*1024withopen(lpath, 'wb') asfout:
fout.write(os.urandom(fileSizeInBytes))
try:
reflink(lpath, tmp_file)
# NOTE: reflink may or may not clone src permissionsos.chmod(dest, 0o666&~umask)
os.replace(tmp_file, rpath)
print("Reflink Successfully !!!")
exit()
exceptOSError:
pass# if reflink is succeeded, code will not reach here. Otherwise, do normal copyshutil.copyfile(lpath, tmp_file)
os.replace(tmp_file, rpath)
Error output:
>>> [Errno 18] Invalid cross-device link
/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp exists: True
Traceback (most recent call last):
File "/home/user/env/test_dvc_script.py", line 67, in<module>
shutil.copyfile(lpath, tmp_file)
File "/home/user/env/lib/python3.9/shutil.py", line 266, in copyfile
with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp'
Output after applying the temp fix
>>> [Errno 18] Invalid cross-device link #due to the fact that src, dst are not different filesystems
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp still exists after 0.1 second
/mnt/interior_sensing/test-dvc/.k28cCuZ4Tx5SFpVu-dn9RQ.tmp exists: False
(env) user@machine:~/workspace$ ls /mnt/interior_sensing/test-dvc/
randombinfile.bin # file copied successfully
I'm not sure if this a case that should be handled or it is a case specific to my setup, but I thought it is worth mentioning.
The text was updated successfully, but these errors were encountered:
This does not look related to recent changes at all.
with open(dst, 'wb') as fdst:
FileNotFoundError: [Errno 2] No such file or directory: '/mnt/interior_sensing/test-dvc/.a4qKiU5jH5cZ124XMHY-dg.tmp
This should never fail as we are opening in mode="wb", unless /mnt/interior_sensing/test-dvc does not exist.
@abahnasy, does that directory exist? Can you try to add os.makedirs(...) in the script above and see if it works? (I guess it does since unlink did not return any error).
OS: Linux
Python 3.8
dvc 3.55.2
dvc-azure 3.1.0
dvc-data 3.16.5
dvc-http 2.32.0
dvc-objects 5.1.0
dvc-render 1.0.2
dvc-studio-client 0.21.0
dvc-task 0.4.0
Hi,
I come across a special case where the
dvc push
fails due to hosting the remote on a network shared drive (Samba Drive).During pushing the files from the local cache to the remote storage, DVC will reach
copyfile
here and attempt to do CoW (Copy on Write) here but due to the fact that both (cache and remote) are on different file systems, it will fail and try to unlink here the linking happened due to callingfcntl.ioctl()
here and thencopyfile
will try to copy the file usingshutil.copyfile(src, dst)
here. At this point, the call fails due toFileNotFoundError
exception as thedst
is seen as file (os.path.exists(dst)
returnsTrue
) but it is not actually found on the filesystem.it seems that due to network latency or caching from SMB drive, the unlinking here is not reflected immediately and I've to wait few milliseconds until the
dst
path is removed and could be seen as not found on the filesystem (os.path.exists(dst)
returnsFalse
). Then, normal copying usingshutil.copyfile(src, dst)
could work fine.I've written a simple script by extracting the code snippets from lib to reproduce the error.
Error output:
Output after applying the temp fix
I'm not sure if this a case that should be handled or it is a case specific to my setup, but I thought it is worth mentioning.
The text was updated successfully, but these errors were encountered: