dvc: add sanity checks for hard/symlinks #3088

Merged: 1 commit, Jan 23, 2020

Conversation

efiop
Contributor

@efiop efiop commented Jan 8, 2020

Fixes #3080

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • 📖 Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@efiop efiop force-pushed the 3080 branch 6 times, most recently from 7839ccb to 6fce869 Compare January 11, 2020 04:50
@efiop efiop changed the title [WIP] system: use FindFirstFile to count hardlinks [WIP] dvc: add sanity checks for hard/symlinks Jan 11, 2020
dvc/system.py Outdated
@@ -28,6 +35,8 @@ def hardlink(source, link_name):
        except OSError as exc:
            raise DvcException("failed to link") from exc

        _verify_link(link_name, System.is_hardlink, "hardlink")
Contributor Author

A bit worried about a potential performance hit, so I might just add these checks to dvc version. Or maybe to the cache type caching mechanism that we have, so that we would only run them once. Will take a closer look soon.
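
A minimal sketch of the "run it only once" idea, assuming a simple memo of probe results keyed by link type and directory. `LinkSupportCache` and the `probe` callable are hypothetical names used for illustration; this is not DVC's actual caching mechanism.

```python
import os


class LinkSupportCache:
    """Remember the result of an expensive link probe (hypothetical helper)."""

    def __init__(self, probe):
        # probe(link_type, directory) -> bool; expected to touch the file system.
        self._probe = probe
        self._results = {}

    def is_supported(self, link_type, directory):
        key = (link_type, os.path.abspath(directory))
        if key not in self._results:
            # Only the first request for a given key pays for the probe.
            self._results[key] = self._probe(link_type, directory)
        return self._results[key]
```

With a cache like this, the verification cost would be paid once per link type and cache directory rather than once per linked file.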

dvc/system.py Outdated
@@ -36,6 +45,8 @@ def symlink(source, link_name):
        except OSError as exc:
            raise DvcException("failed to symlink") from exc

        _verify_link(link_name, System.is_symlink, "symlink")
Contributor Author

We haven't seen any weird symlinks yet, but tbh I wouldn't be surprised if we do...

@rxxg
Contributor

rxxg commented Jan 21, 2020

Hi @efiop, I just tested this PR on my setup and I think it has discovered a latent bug.

(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg> mkdir dvc_repo


    Directory: \\network.drive\rxxg


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       21/01/2020     13:47                dvc_repo


(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg> mkdir dvc_cache


    Directory: \\network.drive\rxxg


Mode                LastWriteTime         Length Name
----                -------------         ------ ----
d-----       21/01/2020     13:47                dvc_cache


(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg> cd dvc_repo
(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> git init -q
(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> dvc init -q
(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> echo Hello > file
(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> dvc config cache.type reflink,symlink,hardlink,copy
WARNING: You have changed the 'cache.type' option. This doesn't update any existing workspace file links, but it can be done with:
             dvc checkout --relink
Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> dvc config cache.dir \\network.drive\rxxg\dvc_cache
(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> cat .\.dvc\config
[cache]
type = "reflink,symlink,hardlink,copy"
dir = \\network.drive\rxxg\dvc_cache

So repo and cache are both on the network drive (SMB). The user cannot create symlinks. The network protocol doesn't support creating reflinks. The network drive, although capable of creating hardlinks, cannot tell the difference between an ordinary file and a hardlink (as seen with #3080). So we expect that only the copy type will be used.
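
To make the hardlink point concrete, here is a minimal sketch of a link-count check, assuming the link count reported by `os.stat` is the signal used to tell a hardlink from an ordinary file. The helper names are illustrative and this is not necessarily how DVC's `System.is_hardlink` is implemented.

```python
import os


def is_hardlink(path):
    """Return True when the file system reports more than one link to `path`."""
    return os.stat(path).st_nlink > 1


def hardlink_roundtrip_ok(src, dst):
    """Create `dst` as a hardlink to `src` and report whether it is detectable."""
    os.link(src, dst)
    # On a well-behaved file system the link count is now at least 2.
    return is_hardlink(dst)
```

On a share that creates the link but keeps reporting a link count of 1, `hardlink_roundtrip_ok` would return False even though `os.link` succeeded, which is the situation described above.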

(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> dvc add file -v
DEBUG: Trying to spawn '['c:\\users\\rxxg\\dvc-3080\\scripts\\python.exe', 'C:\\Users\\rxxg\\dvc-3080\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\rxxg\\dvc-3080\\scripts\\python.exe', 'C:\\Users\\rxxg\\dvc-3080\\Scripts\\dvc', 'daemon', '-q', 'updater']'
ERROR: unexpected error - unable to open database file
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\main.py", line 47, in main
    ret = cmd.run()
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\command\add.py", line 23, in run
    fname=self.args.file,
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\repo\__init__.py", line 31, in wrapper
    with repo.lock, repo.state:
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\state.py", line 133, in __enter__
    self.load()
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\state.py", line 225, in load
    self.database = _connect_sqlite(self.state_file, {"nolock": 1})
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\state.py", line 489, in _connect_sqlite
    return sqlite3.connect(uri, uri=True)
sqlite3.OperationalError: unable to open database file
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

First problem: DVC (actually sqlite3) doesn't like having its local state database on a network path (see self.state_file in the traceback). Let's map the share to a drive letter to make the network location less obvious, and try again.
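
For context, the traceback shows the database being opened through a URI with a `nolock` option. A minimal sketch of that kind of call, assuming the URI is built from the file path with `pathlib`; `connect_sqlite` is an illustrative stand-in, not DVC's `_connect_sqlite`.

```python
import sqlite3
from pathlib import Path
from urllib.parse import urlencode


def connect_sqlite(db_path, options=None):
    """Open `db_path` through a file: URI, passing options as query parameters."""
    uri = Path(db_path).resolve().as_uri()
    if options:
        # e.g. {"nolock": 1} becomes "?nolock=1"
        uri += "?" + urlencode(options)
    return sqlite3.connect(uri, uri=True)
```

One possible explanation for the failure, which is an assumption and not confirmed in this thread: SQLite URI filenames only accept an empty or `localhost` authority, so a URI that carries the server name of a UNC path in the authority position may be rejected, while a mapped drive letter avoids that.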

(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> net use x: \\network.drive\rxxg
The command completed successfully.

(dvc-3080) PS Microsoft.PowerShell.Core\FileSystem::\\network.drive\rxxg\dvc_repo> x:
(dvc-3080) PS X:\dvc_repo> dvc add file -v
DEBUG: Trying to spawn '['c:\\users\\rxxg\\dvc-3080\\scripts\\python.exe', 'C:\\Users\\rxxg\\dvc-3080\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\rxxg\\dvc-3080\\scripts\\python.exe', 'C:\\Users\\rxxg\\dvc-3080\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Adding 'file' to '.gitignore'.
DEBUG: Path file inode 849120961163171990
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: Path file inode 849120961163171990
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
DEBUG: {'file': 'modified'}
DEBUG: Path file inode 849120961163171990
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1579610897368964864', '16', 'de7511b07cd72b25dc78aa472ee21cbc', '1579611241895180288')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: Computed stage 'file.dvc' md5: 'a404616fc083f4e9e5ebfc7dd725faf0'
DEBUG: Saving 'file' to '\\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc'.
DEBUG: cache '\\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc' expected 'de7511b07cd72b25dc78aa472ee21cbc' actual 'None'
DEBUG: Cache type 'reflink' is not supported: reflink is not supported
DEBUG: Cache type 'symlink' is not supported: failed to symlink
DEBUG: Cache type 'hardlink' is not supported: hardlink validation failed
DEBUG: Cache type 'copy' is not supported: Link 'file' already exists!
100% Add|████████████████████████████████████████|1.00/1.00 [00:00<00:00,  1.06file/s]
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
ERROR: no possible cache types left to try out.
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\command\add.py", line 23, in run
    fname=self.args.file,
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\repo\__init__.py", line 32, in wrapper
    ret = f(repo, *args, **kwargs)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\repo\scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\repo\add.py", line 53, in add
    stage.commit()
  File "c:\users\rxxg\dvc-3080\lib\site-packages\funcy\decorators.py", line 39, in wrapper
    return deco(call, *dargs, **dkwargs)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\stage.py", line 161, in rwlocked
    return call()
  File "c:\users\rxxg\dvc-3080\lib\site-packages\funcy\decorators.py", line 60, in __call__
    return self._func(*self._args, **self._kwargs)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\stage.py", line 824, in commit
    out.commit()
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\output\base.py", line 246, in commit
    self.cache.save(self.path_info, self.info)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\remote\base.py", line 507, in save
    self._save(path_info, checksum)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\remote\base.py", line 515, in _save
    self._save_file(path_info, checksum)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\remote\base.py", line 417, in _save_file
    self.link(cache_info, path_info)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\remote\base.py", line 369, in link
    self._link(from_info, to_info, self.cache_types)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\remote\base.py", line 376, in _link
    self._try_links(from_info, to_info, link_types)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\remote\slow_link_detection.py", line 39, in wrapper
    return f(remote, *args, **kwargs)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\remote\base.py", line 392, in _try_links
    raise DvcException("no possible cache types left to try out.")
dvc.exceptions.DvcException: no possible cache types left to try out.
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

Note two things:

  1. We don't get any complaints about the cache being on a network drive, just the repo.
  2. The cache file is created, but the operation still doesn't succeed.
(dvc-3080) PS X:\dvc_repo> dvc version -v
DVC version: 0.80.0+6fce86
Python version: 3.7.1
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: None
Cache: reflink - False, hardlink - False, symlink - False

Note that I can work around the issue by setting the cache type to copy directly. So something still isn't detecting that links aren't supported.

(dvc-3080) PS X:\dvc_repo> rmdir ..\dvc_cache\

Confirm
The item at X:\dvc_cache\ has children and the Recurse parameter was not specified. If you continue, all children will be removed with the item. Are you sure you want to
continue?
[Y] Yes  [A] Yes to All  [N] No  [L] No to All  [S] Suspend  [?] Help (default is "Y"):
(dvc-3080) PS X:\dvc_repo> dvc config cache.type copy
WARNING: You have changed the 'cache.type' option. This doesn't update any existing workspace file links, but it can be done with:
             dvc checkout --relink
(dvc-3080) PS X:\dvc_repo> dvc add file -v
DEBUG: Trying to spawn '['c:\\users\\rxxg\\dvc-3080\\scripts\\python.exe', 'C:\\Users\\rxxg\\dvc-3080\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\rxxg\\dvc-3080\\scripts\\python.exe', 'C:\\Users\\rxxg\\dvc-3080\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: PRAGMA user_version;
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Adding 'file' to '.gitignore'.
DEBUG: Path file inode 4326578069353857046
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1579613151295783168', '16', 'de7511b07cd72b25dc78aa472ee21cbc', '1579613416460647424')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: {'file': 'modified'}
DEBUG: Path file inode 4326578069353857046
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1579613151295783168', '16', 'de7511b07cd72b25dc78aa472ee21cbc', '1579613447218944512')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: Computed stage 'file.dvc' md5: 'a404616fc083f4e9e5ebfc7dd725faf0'
DEBUG: Saving 'file' to '\\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc'.
DEBUG: cache '\\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc' expected 'de7511b07cd72b25dc78aa472ee21cbc' actual 'None'
DEBUG: Created 'copy': \\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc -> file
DEBUG: Path file inode 3695209259957233900
DEBUG: REPLACE INTO link_state(path, inode, mtime) VALUES (?, ?, ?)
DEBUG: Path file inode 3695209259957233900
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
DEBUG: Path \\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc inode 4326578069353857046
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1579613151295783168', '16', 'de7511b07cd72b25dc78aa472ee21cbc', '1579613447238496768')]
DEBUG: UPDATE state SET mtime = ?, size = ?, md5 = ?, timestamp = ? WHERE inode = ?
DEBUG: Saving information to 'file.dvc'.
100% Add|████████████████████████████████████████|1.00/1.00 [00:02<00:00,  2.03s/file]

To track the changes with git, run:

        git add .gitignore file.dvc
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(1,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?

@rxxg
Contributor

rxxg commented Jan 21, 2020

Or maybe the problem is just that on link validation failure the link needs to be deleted (remember that this FS can create hardlinks but not see them).

@efiop
Contributor Author

efiop commented Jan 21, 2020

@rxxg Thanks for trying it out! Correct, the current patch doesn't delete the link; it was just created as a POC. I'm working on an updated version right now that will actually delete the hardlink/symlink if it doesn't pass verification. Thanks for the heads up! 🙂

@efiop efiop self-assigned this Jan 22, 2020
@efiop efiop added the enhancement Enhances DVC label Jan 22, 2020
@efiop efiop force-pushed the 3080 branch 3 times, most recently from 6ecb01e to dc344f0 Compare January 22, 2020 14:57
@rxxg
Contributor

rxxg commented Jan 22, 2020

dvc add is working fine with the latest version, but there is a crash right afterwards, on the first call of dvc version. The second call works fine.

(dvc-3080) PS Z:\dvc_repo> dvc version -v
DVC version: 0.82.1+dc344f
Python version: 3.7.1
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: None
Cache: reflink - not supported, hardlink - broken, symlink - not supported
(dvc-3080) PS Z:\dvc_repo> dvc add file
100% Add|████████████████████████████████████████|1.00/1.00 [00:02<00:00,  2.41s/file]

To track the changes with git, run:

        git add file.dvc
(dvc-3080) PS Z:\dvc_repo> dvc version -v
ERROR: unexpected error - [WinError 5] Access is denied: '\\\\network.drive\\rxxg\\dvc_cache\\.f2612e58-173f-4e92-8a04-0a4cc473a916'
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\main.py", line 48, in main
    ret = cmd.run()
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\command\version.py", line 46, in run
    "Cache: {}".format(self.get_linktype_support_info(repo))
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\command\version.py", line 111, in get_linktype_support_info
    os.remove(src)
PermissionError: [WinError 5] Access is denied: '\\\\network.drive\\rxxg\\dvc_cache\\.f2612e58-173f-4e92-8a04-0a4cc473a916'
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
(dvc-3080) PS Z:\dvc_repo> dvc version -v
DVC version: 0.82.1+dc344f
Python version: 3.7.1
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: None
Cache: reflink - not supported, hardlink - broken, symlink - not supported
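
The states in the `Cache:` line above can be modelled roughly as follows. This is a sketch under assumptions: `link_status` is a hypothetical helper, and the "supported" label is a guess, since only "not supported" and "broken" appear in the output here.

```python
import os
import tempfile


def link_status(create, verify, directory):
    """Probe one link type in `directory` and classify the outcome."""
    fd, src = tempfile.mkstemp(dir=directory)
    os.close(fd)
    dst = src + ".link"
    try:
        create(src, dst)
        # Created but not recognisable as a link counts as "broken".
        return "supported" if verify(dst) else "broken"
    except (OSError, NotImplementedError):
        return "not supported"
    finally:
        # Clean up the probe files whatever the outcome.
        for path in (dst, src):
            if os.path.lexists(path):
                os.unlink(path)
```

For example, `link_status(os.link, lambda p: os.stat(p).st_nlink > 1, cache_dir)` would classify hardlinks for a given `cache_dir`.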

@efiop efiop force-pushed the 3080 branch 3 times, most recently from 0e29a83 to 6b1f185 Compare January 22, 2020 15:59
@efiop
Contributor Author

efiop commented Jan 22, 2020

@rxxg Oh, thanks for trying it out! Pushed a new version that should handle that more gracefully. Please give it a try 🙂

@rxxg
Contributor

rxxg commented Jan 22, 2020

(dvc-3080) PS \\network.drive\rxxg> dvc add file -v
DEBUG: Trying to spawn '['c:\\users\\rxxg\\dvc-3080\\scripts\\python.exe', 'C:\\Users\\rxxg\\dvc-3080\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: Spawned '['c:\\users\\rxxg\\dvc-3080\\scripts\\python.exe', 'C:\\Users\\rxxg\\dvc-3080\\Scripts\\dvc', 'daemon', '-q', 'updater']'
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Adding 'file' to '.gitignore'.
DEBUG: Path file inode 5667741391016463476
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: Path file inode 5667741391016463476
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
DEBUG: {'file': 'modified'}
DEBUG: Path file inode 5667741391016463476
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1579709801785156096', '16', 'de7511b07cd72b25dc78aa472ee21cbc', '1579709813082017792')]
DEBUG: UPDATE state SET timestamp = ? WHERE inode = ?
DEBUG: Computed stage 'file.dvc' md5: 'a404616fc083f4e9e5ebfc7dd725faf0'
DEBUG: Saving 'file' to '\\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc'.
DEBUG: cache '\\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc' expected 'de7511b07cd72b25dc78aa472ee21cbc' actual 'None'
DEBUG: Cache type 'reflink' is not supported: reflink is not supported
DEBUG: Cache type 'symlink' is not supported: failed to symlink
DEBUG: Created 'hardlink': \\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc -> file
DEBUG: Removing '\\network.drive\rxxg\file'
DEBUG: Cache type 'hardlink' is not supported: failed to verify hardlink
DEBUG: Created 'copy': \\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc -> file
DEBUG: Path file inode 5666308902420454206
DEBUG: REPLACE INTO link_state(path, inode, mtime) VALUES (?, ?, ?)
DEBUG: Path file inode 5666308902420454206
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: []
DEBUG: INSERT INTO state(inode, mtime, size, md5, timestamp) VALUES (?, ?, ?, ?, ?)
DEBUG: Path \\network.drive\rxxg\dvc_cache\de\7511b07cd72b25dc78aa472ee21cbc inode 5667741391016463476
DEBUG: SELECT mtime, size, md5, timestamp from state WHERE inode=?
DEBUG: fetched: [('1579709801785156096', '16', 'de7511b07cd72b25dc78aa472ee21cbc', '1579709813109092096')]
DEBUG: UPDATE state SET mtime = ?, size = ?, md5 = ?, timestamp = ? WHERE inode = ?
DEBUG: Saving information to 'file.dvc'.
100% Add|████████████████████████████████████████|1.00/1.00 [00:01<00:00,  1.48s/file]

To track the changes with git, run:

        git add .gitignore file.dvc
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
(dvc-3080) PS \\network.drive\rxxg> dvc version -v
DEBUG: Removing '\\network.drive\rxxg\.dccd8f66-73d1-4822-9c18-91cd46123f09'
DEBUG: Removing '\\network.drive\rxxg\dvc_cache\.dccd8f66-73d1-4822-9c18-91cd46123f09'
ERROR: unexpected error - [WinError 5] Access is denied: '\\\\network.drive\\rxxg\\dvc_cache\\.dccd8f66-73d1-4822-9c18-91cd46123f09'
------------------------------------------------------------
Traceback (most recent call last):
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\main.py", line 48, in main
    ret = cmd.run()
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\command\version.py", line 48, in run
    "Cache: {}".format(self.get_linktype_support_info(repo))
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\command\version.py", line 114, in get_linktype_support_info
    remove(src)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\utils\fs.py", line 140, in remove
    _chmod(os.unlink, path, None)
  File "c:\users\rxxg\dvc-3080\lib\site-packages\dvc\utils\fs.py", line 119, in _chmod
    perm = os.lstat(p).st_mode
PermissionError: [WinError 5] Access is denied: '\\\\network.drive\\rxxg\\dvc_cache\\.dccd8f66-73d1-4822-9c18-91cd46123f09'
------------------------------------------------------------


Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!
(dvc-3080) PS \\network.drive\rxxg> dvc version -v
DEBUG: Removing '\\network.drive\rxxg\.04afda28-153f-4255-bfdd-27edacea2de5'
DEBUG: Removing '\\network.drive\rxxg\dvc_cache\.04afda28-153f-4255-bfdd-27edacea2de5'
DVC version: 0.82.1+d3c711
Python version: 3.7.1
Platform: Windows-10-10.0.16299-SP0
Binary: False
Package: None
Cache: reflink - not supported, hardlink - broken, symlink - not supported
(dvc-3080) PS \\network.drive\rxxg>

@efiop
Contributor Author

efiop commented Jan 22, 2020

@rxxg Very interesting. It allows us to create the file but doesn't allow us to stat it. Very odd. Something is very broken in your setup, and I'm not sure I understand the reasons...

@efiop
Contributor Author

efiop commented Jan 22, 2020

@rxxg Maybe there is something special about the mounting parameters that you use?

@rxxg
Contributor

rxxg commented Jan 22, 2020

> @rxxg Maybe there is something special about the mounting parameters that you use?

Sorry, no idea. The strange thing is that the file can be easily deleted by explorer or rm command. And of course, the second call succeeds.
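
If the denial is transient (an assumption; nothing in this thread confirms it), a best-effort removal with a short retry is one way a probe cleanup could avoid crashing. `remove_with_retry` is a hypothetical helper, not something this PR adds.

```python
import os
import time


def remove_with_retry(path, attempts=3, delay=0.1):
    """Try to delete `path`; return True if it is gone, False if still denied."""
    for attempt in range(attempts):
        try:
            os.unlink(path)
            return True
        except FileNotFoundError:
            return True  # already gone, nothing to do
        except PermissionError:
            if attempt == attempts - 1:
                return False
            time.sleep(delay)  # give the share a moment before retrying
    return False
```

A caller could log a warning when this returns False instead of aborting the whole command.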

@efiop
Contributor Author

efiop commented Jan 22, 2020

@rxxg Got it. But dvc add and other stuff works fine now, right? It should be able to fall back smoothly to symlink if you have hardlink, symlink configured.

@rxxg
Contributor

rxxg commented Jan 22, 2020

> @rxxg Got it. But dvc add and other stuff works fine now, right?

Fine in my limited testing.

@efiop
Contributor Author

efiop commented Jan 22, 2020

@rxxg Got it. I think we could leave it as is, until new problems arise or someone else runs into the same issues, at which point we will need to do an in-depth investigation.

@efiop efiop marked this pull request as ready for review January 22, 2020 16:46
@efiop efiop changed the title [WIP] dvc: add sanity checks for hard/symlinks dvc: add sanity checks for hard/symlinks Jan 22, 2020
dvc/command/version.py Outdated
@efiop efiop merged commit a041917 into iterative:master Jan 23, 2020
@efiop efiop deleted the 3080 branch January 23, 2020 09:53
Labels
enhancement Enhances DVC
Development

Successfully merging this pull request may close these issues.

Hardlinks not correctly detected on SMB network share (NTFS?)
3 participants