Unable to dvc add more than 1 file at a time in s3 bucket #2678
Also here are the contents of my .dvc/config file if it helps:
|
Hi @andronovhopf! EDIT: sorry, I missed that you've already provided the version, and it is indeed the latest one. Looks like we have a bug, looking into it now... |
@andronovhopf Could you please post output of |
No need for logs. I am able to reproduce by creating an empty dir first. Working on a fix right now.
|
For the record, full log:
|
Test
|
@efiop @MrOutis sorry forgot to put the context - https://discordapp.com/channels/485586884165107732/485596304961962003/637836708184064010 |
@andronovhopf Sorry for the delay, we are attending OS Summit in Lyon right now 🙂 A proper patch is coming, but the workaround for you would be to do:
Note the
Deleting it doesn't delete the data inside of it. Btw, how did you upload the data there? From an ec2 instance through some s3fs? Or through s3api (e.g. Python's boto3 or the awscli CLI tool)? I'm asking because tools usually don't create an "empty dir" on s3: s3 doesn't actually have the concept of dirs, and prefixes are not required to be pre-created. Thank you for your patience 🙂 |
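To illustrate the "empty dir" point: S3 is a flat key space, so a "directory" created through the web console is typically just a zero-byte object whose key ends in `/`. The sketch below is not DVC code. It assumes a listing shaped like the `Contents` entries of boto3's `list_objects_v2` response (dicts with `Key` and `Size`); the helper name `find_dir_markers` and the sample keys are hypothetical.

```python
def find_dir_markers(contents):
    """Return keys that look like console-created "empty directory" placeholders.

    Assumption: a placeholder is a zero-byte object whose key ends in "/".
    """
    return [
        obj["Key"]
        for obj in contents
        if obj["Key"].endswith("/") and obj["Size"] == 0
    ]


# Hypothetical listing, for illustration only.
listing = [
    {"Key": "data/", "Size": 0},
    {"Key": "data/img_001.png", "Size": 20480},
    {"Key": "data/img_002.png", "Size": 18211},
]

markers = find_dir_markers(listing)
print(markers)
```

Removing such a placeholder key leaves the other objects under the same prefix untouched, since they are independent keys rather than children of it.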
Hi @efiop thanks for your help! I believe I created the bucket and the data "directory" through the AWS dashboard GUI and then uploaded a bunch of files from my local machine using
In the future, what commands would you recommend using to create separate cache and data locations when preparing an s3 bucket for DVC? EDIT: Clarified: I've just learned it is not necessary to make a cache "empty directory" before the first |
Sorry for jumping late to the discussion (AFK during the weekend 🙈)! @efiop, I was already testing for empty directories but forgot to do it with dvc/tests/unit/remote/test_s3.py Line 30 in 5d6918a
I would change your patch a little bit to the following:

diff --git a/tests/unit/remote/test_s3.py b/tests/unit/remote/test_s3.py
index 7861fb5a..fb8393a0 100644
--- a/tests/unit/remote/test_s3.py
+++ b/tests/unit/remote/test_s3.py
@@ -80,6 +80,8 @@ def test_walk_files(remote):
         remote.path_info / "data/subdir/1",
         remote.path_info / "data/subdir/2",
         remote.path_info / "data/subdir/3",
+        remote.path_info / "empty_file",
+        remote.path_info / "foo",
     ]
 
-    assert list(remote.walk_files(remote.path_info / "data")) == files
+    assert list(remote.walk_files(remote.path_info)) == files
|
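The behaviour that the adjusted test checks can be sketched without S3 at all. This is not DVC's implementation: it is a minimal stand-in that assumes keys arrive as a flat list of strings, and `walk_files` here is a hypothetical free function rather than the remote's method. The sample keys are also made up. Walking the root should yield top-level files such as `empty_file` and `foo` while skipping placeholder keys that end in `/`.

```python
def walk_files(keys, prefix=""):
    """Yield keys under `prefix`, skipping "directory" placeholder keys."""
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # so that "data" does not also match "database/x"
    for key in keys:
        if not key.startswith(prefix):
            continue
        if key.endswith("/"):
            continue  # "empty dir" marker, not a real file
        yield key


# Hypothetical flat key listing, for illustration only.
keys = [
    "data/",
    "data/alice",
    "data/subdir/1",
    "empty_dir/",
    "empty_file",
    "foo",
]

root_files = list(walk_files(keys))
data_files = list(walk_files(keys, "data"))
print(root_files)
print(data_files)
```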
Note: DVC version 0.66.1, pip install, Ubuntu 16.04
I am having difficulty configuring DVC to track files on an s3 bucket with the structure
Specifically, I want to use DVC to version control *.png files stored in the data folder, and use the cache folder as DVC's cache.
Based on the docs provided here, I believe I've replicated exactly the provided steps. But I hit an error when I run dvc add:
The output looks initially encouraging,
But then I get this error message:
I'm positive that there are files in the data folder and can view them by aws s3 ls s3://ellesdatabucket/data. And, if I try to run dvc add with only a single file at a time instead of a whole directory, the command completes successfully. Although a bash script could dvc add each file in a loop, I want to make sure there's not a better way. Issue 2647 seems to be discussing a similar problem but I can't figure out how to apply that code to my own example here.
Thank you for any help!
More context:
https://discordapp.com/channels/485586884165107732/485596304961962003/637836708184064010