Unexpected error when cache and artifacts are not on the same partition on SSH #2704

Closed
mslapek opened this issue Oct 31, 2019 · 15 comments · Fixed by #2709
Labels
bug (Did we break something?), p0-critical (Critical issue. Needs to be fixed ASAP.), research

Comments

@mslapek

mslapek commented Oct 31, 2019

Setup

pip installed, version 0.66.1.
Platform: Ubuntu 18.04

Steps to reproduce

  1. Create an SSH remote for the cache and configure it as the SSH cache.
  2. Create a directory on the same machine but on a different partition, and add a test.txt file to it.
  3. Add this directory as an SSH remote called workspace.
  4. Run dvc add remote://workspace/test.txt -f test.dvc.

Bash script reproducing the error:

# arrange
workspace=$(mktemp -d)
cache=/tmp/ram

mkdir -p /tmp/ram
sudo mount -t tmpfs -o size=512M tmpfs /tmp/ram  # creating distinct partition for cache

cd $(mktemp -d)

dvc init --no-scm

dvc remote add workspace ssh://localhost/${workspace}
dvc remote add cache ssh://localhost/${cache}
dvc config cache.ssh cache

echo "foo" > ${workspace}/foo

# act
dvc add remote://workspace/foo

# cleanup
sudo umount /tmp/ram
rmdir /tmp/ram
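
Before running the script, it can help to confirm that the two directories really are on different partitions. A minimal sketch, assuming it is run on the machine that hosts both directories (for an SSH remote, that means the remote host): it compares the device IDs reported by os.stat. The helper name same_partition is hypothetical and is not part of DVC.

```python
import os
import tempfile


def same_partition(path_a: str, path_b: str) -> bool:
    """Return True when both paths report the same device ID (st_dev)."""
    return os.stat(path_a).st_dev == os.stat(path_b).st_dev


if __name__ == "__main__":
    # Two directories created side by side share a device.
    base = tempfile.mkdtemp()
    first = os.path.join(base, "workspace")
    second = os.path.join(base, "cache")
    os.mkdir(first)
    os.mkdir(second)
    print(same_partition(first, second))
```

With the setup from the script above, same_partition(workspace, "/tmp/ram") would be expected to return False once the tmpfs is mounted, because a tmpfs mount has its own device ID.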

Observed behavior

100%|██████████|Add                                                                                          1/1 [00:00<00:00,  1.02file/s]
ERROR: unexpected error - Failure

Expected behavior

I believe that, by design, DVC should raise an error in this case, to avoid implicitly moving large files between partitions.

However, the error message should suggest a solution:

ERROR: File move to cache failed.
Ensure that file remote://workspace/test.txt is on the same partition as SSH cache.

Some people might put a file on a different partition by mistake (it happened to me! - such a message would have saved me a bit of time 😉).
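
One way to get a message like the one proposed above is to catch the low-level error at the point of the move and re-raise it with a hint. This is only a sketch of the idea, not DVC code: CacheMoveError and move_to_cache are hypothetical names, and the rename callable stands in for whatever performs the actual move.

```python
class CacheMoveError(Exception):
    """Hypothetical error type that carries a user-facing hint."""


def move_to_cache(src, dst, rename):
    """Move src to dst with `rename`, turning OSError into a helpful message."""
    try:
        rename(src, dst)
    except OSError as exc:
        # Keep the original exception as the cause so --verbose can still show it.
        raise CacheMoveError(
            "File move to cache failed.\n"
            f"Ensure that file {src} is on the same partition as SSH cache."
        ) from exc
```

Chaining with "from exc" keeps the original OSError available in the traceback, so the short message does not hide the underlying failure.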


Because I have already debugged DVC with --verbose to understand the source of the problem, I could try to implement such a message in a pull request.

@shcheklein
Member

@mslapek it looks like a bug to me. Could you please share the output of dvc add -v ...?

DVC indeed copies files (or uses reflinks if they are available) by default, but it's possible to set up the cache (at least the local cache) to use a different strategy, for example symlinks when your files are on different partitions or disks. That can be a valid configuration in its own right. See more info here https://dvc.org/doc/command-reference/config#cache (cache.type) and here https://dvc.org/doc/user-guide/large-dataset-optimization.

I'll check if cache.type can be used independently for SSH external cache. Not sure yet.
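
To illustrate what an ordered list of link strategies means in practice, here is a minimal local-filesystem sketch: each strategy is tried in turn and the first one that succeeds wins. It is not DVC's implementation; link_from_cache and the _LINKERS table are hypothetical, and reflinks are left out because the Python standard library has no portable call for them.

```python
import os
import shutil

# Hypothetical mapping from a strategy name to a function(src, dst).
_LINKERS = {
    "hardlink": os.link,      # fails across partitions
    "symlink": os.symlink,    # works across partitions
    "copy": shutil.copyfile,  # always duplicates the bytes
}


def link_from_cache(cache_path, workspace_path, types=("symlink", "copy")):
    """Try each link type in order; return the name of the one that worked."""
    last_error = None
    for name in types:
        try:
            _LINKERS[name](cache_path, workspace_path)
            return name
        except OSError as exc:
            last_error = exc  # remember why this strategy failed, try the next
    raise OSError(f"no link type in {types!r} worked") from last_error
```

The order matters: putting "copy" last makes it the fallback that is used only when the cheaper strategies are unavailable.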

@ghost

ghost commented Oct 31, 2019

@mslapek, as @shcheklein mentioned, you can use symlinks to work across disks. Here's an example that you can try:

workspace=$(mktemp -d)
cache=$(mktemp -d)

sudo systemctl start sshd

dvc init --no-scm

dvc remote add workspace ssh://localhost/${workspace}
dvc remote add cache ssh://localhost/${cache}
dvc remote modify cache type symlink

dvc config cache.ssh cache

echo "foo" > ${workspace}/foo
dvc add remote://workspace/foo

By the way, thanks a lot for the interest in contributing! 😃

@shcheklein
Member

@MrOutis thanks!

It looks like things such as dvc remote modify cache type symlink are not documented anywhere. This should probably be part of the dvc remote modify command reference. Created a ticket for this - iterative/dvc.org#764

Failure to add a file (even with the default copy semantics) still looks like a bug to me.

@jorgeorpinel
Contributor

jorgeorpinel commented Nov 1, 2019

Changing the cache type seems like a correct solution/workaround for this scenario, but could the dvc add output message, ERROR: unexpected error - Failure, be improved as well?

@mslapek
Author

mslapek commented Nov 1, 2019

I've added a reproduction script to the issue, inspired by @MrOutis's script.

@shcheklein here is output for --verbose:

/tmp/tmp.wpChWkmYs2$ dvc add remote://workspace/foo --verbose
DEBUG: PRAGMA user_version;
DEBUG: fetched: [(3,)]
DEBUG: CREATE TABLE IF NOT EXISTS state (inode INTEGER PRIMARY KEY, mtime TEXT NOT NULL, size TEXT NOT NULL, md5 TEXT NOT NULL, timestamp TEXT NOT NULL)
DEBUG: CREATE TABLE IF NOT EXISTS state_info (count INTEGER)
DEBUG: CREATE TABLE IF NOT EXISTS link_state (path TEXT PRIMARY KEY, inode INTEGER NOT NULL, mtime TEXT NOT NULL)
DEBUG: INSERT OR IGNORE INTO state_info (count) SELECT 0 WHERE NOT EXISTS (SELECT * FROM state_info)
DEBUG: PRAGMA user_version = 3;
DEBUG: Establishing ssh connection with 'localhost' through port '22' as user 'michal'
DEBUG: {'remote://workspace/foo': 'modified'}                                   
DEBUG: Computed stage 'foo.dvc' md5: '413f16ae596eab35c72cadab1806f36e'         
DEBUG: cache 'ssh://michal@localhost//tmp/ram/d3/b07384d113edec49eaa6238ad5ff00' expected 'd3b07384d113edec49eaa6238ad5ff00' actual 'None'
DEBUG: Saving 'ssh://michal@localhost//tmp/tmp.PxAtsLpwFR/foo' to 'ssh://michal@localhost//tmp/ram/d3/b07384d113edec49eaa6238ad5ff00'.
DEBUG: cache 'ssh://michal@localhost//tmp/ram/d3/b07384d113edec49eaa6238ad5ff00' expected 'd3b07384d113edec49eaa6238ad5ff00' actual 'None'
100%|██████████|Add                               1/1 [00:00<00:00,  1.56file/s]
DEBUG: SELECT count from state_info WHERE rowid=?
DEBUG: fetched: [(0,)]
DEBUG: UPDATE state_info SET count = ? WHERE rowid = ?
ERROR: unexpected error - Failure
------------------------------------------------------------
Traceback (most recent call last):
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/main.py", line 43, in main
    ret = cmd.run()
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/command/add.py", line 24, in run
    fname=self.args.file,
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/repo/__init__.py", line 33, in wrapper
    ret = f(repo, *args, **kwargs)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/repo/scm_context.py", line 4, in run
    result = method(repo, *args, **kw)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/repo/add.py", line 54, in add
    stage.commit()
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/stage.py", line 745, in commit
    out.commit()
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/output/base.py", line 254, in commit
    self.cache.save(self.path_info, self.info)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/remote/base.py", line 486, in save
    self._save(path_info, checksum)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/remote/base.py", line 494, in _save
    self._save_file(path_info, checksum)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/remote/base.py", line 423, in _save_file
    self.move(path_info, cache_info)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/remote/ssh/__init__.py", line 230, in move
    ssh.move(from_info.path, to_info.path)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/dvc/remote/ssh/connection.py", line 189, in move
    self.sftp.rename(src, dst)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/paramiko/sftp_client.py", line 423, in rename
    self._request(CMD_RENAME, oldpath, newpath)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/paramiko/sftp_client.py", line 813, in _request
    return self._read_response(num)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/paramiko/sftp_client.py", line 865, in _read_response
    self._convert_status(msg)
  File "/home/michal/anaconda3/envs/ketchup/lib/python3.7/site-packages/paramiko/sftp_client.py", line 898, in _convert_status
    raise IOError(text)
OSError: Failure
------------------------------------------------------------

Having any troubles? Hit us up at https://dvc.org/support, we are always happy to help!

@mslapek
Author

mslapek commented Nov 1, 2019

@MrOutis I've tried the script from the issue with the proposed dvc remote modify cache type symlink (the line was added after dvc config cache.ssh cache).

Unfortunately, it still raises the unexpected error.

(The difference from your script is that mine creates a separate partition for the cache.)

@ghost

ghost commented Nov 1, 2019

Got it, @mslapek. I see the problem now: the move method uses SFTP RENAME underneath, and that doesn't work across partitions. Let me see if paramiko supports this or whether we need to write our own implementation.
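
A common way to handle a rename that cannot cross partitions is to fall back to copying the data and deleting the source. The sketch below shows that pattern on the local filesystem only; it is not the code from the patch and does not use paramiko. The injectable rename parameter is a hypothetical hook, added here so the fallback path can be exercised without a second partition.

```python
import os
import shutil


def move(src, dst, rename=os.rename):
    """Rename src to dst; if that fails, copy the bytes and delete the source."""
    try:
        rename(src, dst)
    except OSError:
        # A plain rename cannot cross partitions, so duplicate the file at the
        # destination first and only then remove the original.
        shutil.copyfile(src, dst)
        os.remove(src)
```

Locally, a cross-partition os.rename fails with errno.EXDEV, so a stricter version could fall back only for that error. The SFTP error in the traceback above carries just the text "Failure", which is why this sketch catches OSError broadly.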

@shcheklein added the bug and p0-critical labels Nov 1, 2019
@ghost

ghost commented Nov 1, 2019

@mslapek, you can try the following patch: #2709

It works for me. Same scenario as before, but this time across two partitions:

sda             8:0    0 465.8G  0 disk
└─sda2          8:2    0 465.3G  0 part
  └─vg        254:0    0 465.3G  0 crypt
    ├─vg-swap 254:1    0     8G  0 lvm   [SWAP]
    ├─vg-root 254:2    0   450G  0 lvm   /
    └─vg-misc 254:3    0     7G  0 lvm   /mnt/misc

workspace=$(mktemp -d)
cache=$(mktemp -d -p /mnt/misc)

sudo systemctl start sshd

dvc init --no-scm

dvc remote add workspace ssh://localhost/${workspace}
dvc remote add cache ssh://localhost/${cache}

dvc config cache.ssh cache

echo "foo" > ${workspace}/foo
dvc add remote://workspace/foo

@shcheklein assigned ghost Nov 1, 2019
@shcheklein
Member

@MrOutis while you are at it, could you check if OSError: Failure is the only information paramiko's sftp gives us? That error message is terrible indeed.
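
The traceback earlier in this thread ends in raise IOError(text), that is, an exception built from a single message argument. The snippet below only shows what such an exception carries in Python 3; it does not call paramiko, and it says nothing about what other SFTP status codes produce.

```python
# Same single-argument form as the `raise IOError(text)` line in the traceback.
err = IOError("Failure")

print(IOError is OSError)  # True: IOError is an alias of OSError in Python 3
print(err.errno)           # None: no error code is set when only text is passed
print(err.args)            # ('Failure',)
print(str(err))            # Failure
```

So the caller gets the text "Failure" and no errno to branch on, which matches the message DVC printed.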

@shcheklein
Member

@MrOutis also, please confirm that symlinks do indeed work in this case so that we can update our docs.

@ghost

ghost commented Nov 1, 2019

> @MrOutis while you are at it, could you check if OSError: Failure is the only information paramiko's sftp gives us? That error message is terrible indeed.

@shcheklein, agree 😅

> @MrOutis also, please confirm that symlinks do indeed work in this case so that we can update our docs.

Will do.

@ghost

ghost commented Nov 1, 2019

@shcheklein, symlinks work:

$ ls $workspace
foo -> //mnt/misc/tmp.5ClPQUCWr7/d3/b07384d113edec49eaa6238ad5ff00

@shcheklein
Member

@shcheklein after you enable them with dvc remote modify myremote type symlinks,copy, right?

@ghost

ghost commented Nov 1, 2019

@shcheklein, yes :)

@mslapek
Author

mslapek commented Nov 2, 2019

Glad to see that my first issue on GitHub was labelled as a p0-critical bug 🎉.

@MrOutis I tried the scenario with your patch - it works 😊 (tried with the default settings and with cache type symlink).

I think the issue can be closed after the merge.
