Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure when using S3 remote cache #3523

Closed
helger opened this issue Mar 23, 2020 · 10 comments · Fixed by #3524
Closed

Failure when using S3 remote cache #3523

helger opened this issue Mar 23, 2020 · 10 comments · Fixed by #3524
Labels
bug Did we break something? p1-important Important, aka current backlog of things to do

Comments

@helger
Copy link

helger commented Mar 23, 2020

Trying to run the command

dvc run -d s3://mybucket/data.txt -o s3://mybucket/data_copy.txt aws s3 cp s3://mybucket/data.txt s3://mybucket/data_copy.txt

Using AWS S3 remote cache.

It fails with the error message ERROR: unexpected error - Parameter validation failed: Invalid length for parameter Key, value: 0, valid range: 1-inf

Attached is the log with -v turned on.

log.txt

and out of pip freeze

pip_freeze.txt

@efiop efiop transferred this issue from iterative/dvc.org Mar 23, 2020
@triage-new-issues triage-new-issues bot added the triage Needs to be triaged label Mar 23, 2020
@efiop efiop added the bug Did we break something? label Mar 23, 2020
@triage-new-issues triage-new-issues bot removed the triage Needs to be triaged label Mar 23, 2020
@efiop
Copy link
Contributor

efiop commented Mar 23, 2020

@helger Could you also show us your $ dvc version output, please?

@helger
Copy link
Author

helger commented Mar 23, 2020

@helger Could you also show us your $ dvc version output, please?

 dvc version
DVC version: 0.90.2
Python version: 3.7.4
Platform: Linux-4.4.0-1104-aws-x86_64-with-debian-stretch-sid
Binary: False
Package: pip
Supported remotes: azure, gs, hdfs, http, https, s3, ssh, oss
Filesystem type (workspace): ('ext4', '/dev/xvda1')

@helger
Copy link
Author

helger commented Mar 23, 2020

@helger Could you also show us your $ dvc version output, please?

dvc version
DVC version: 0.90.2
Python version: 3.7.4
Platform: Linux-4.4.0-1104-aws-x86_64-with-debian-stretch-sid
Binary: False
Package: pip
Supported remotes: azure, gs, hdfs, http, https, s3, ssh, oss
Filesystem type (workspace): ('ext4', '/dev/xvda1')

@helger
Copy link
Author

helger commented Mar 23, 2020

Output of pip freeze

pip_freeze.txt

@efiop
Copy link
Contributor

efiop commented Mar 23, 2020

@skshetry
Copy link
Member

skshetry commented Mar 23, 2020

This happens because we try to create parent when trying to link from cache to workspace (and, here, parent of s3://<bucket>/<name> is "", i.e. empty).

dvc/dvc/remote/base.py

Lines 380 to 385 in ee26afe

def _link(self, from_info, to_info, link_types):
assert self.isfile(from_info)
self.makedirs(to_info.parent)
self._try_links(from_info, to_info, link_types)

If the file is in s3://<bucket>/<namespaced>/<name>, this won't fail.

I don't think, we need to create parent directories for object-based storages except when checking out directory.

Google Cloud Storage should equally be affected here as it's the same logic.

@skshetry skshetry added the p0-critical Critical issue. Needs to be fixed ASAP. label Mar 23, 2020
@helger
Copy link
Author

helger commented Mar 23, 2020

@skshetry indeed adding another level fixes the issue

@efiop
Copy link
Contributor

efiop commented Mar 23, 2020

@skshetry Great catch! 🙏 In general we do need to create those, because s3 and s3 tools sometimes create dirs like that, it was discussed in #2683 . We don't need-need to do that, but it is a compromise to handle both approaches. But we definitely need to handle this corner case better.

@skshetry
Copy link
Member

skshetry commented Mar 24, 2020

@efiop, it depends on how you see it. We can handle this corner case, but, there's no need for us to create a directory in most of the cases (except of #2678 cases?).

i.e. dvc add s3://dvc-bucket/data/file should not even try to create s3://dvc-bucket/data/ as a directory.

@skshetry skshetry added p1-important Important, aka current backlog of things to do and removed p0-critical Critical issue. Needs to be fixed ASAP. labels Mar 24, 2020
@efiop
Copy link
Contributor

efiop commented Mar 24, 2020

i.e. dvc add s3://dvc-bucket/data/file should not even try to create s3://dvc-bucket/data/ as a directory.

@skshetry It does that as a part of linking, so it is fine. What is actually wrong here is that corner case handling, indeed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Did we break something? p1-important Important, aka current backlog of things to do
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants