Support adding directories in google cloud storage remote #2853
Am I missing something? We agreed to not add this for now. The ticket #1654 is closed.
FYI: added [WIP] prefix to make it more obvious ("draft" is fine, but prefix seems to be a bit easier to grasp, no biggie)
Ok, good start.
We need to unify this and some existing code both in remotes and tests, see below.
Yep, I saw that, and this also suffers from the same issue.
I didn't act on it because of #2683 (comment), as I am not really sure what you mean by solving it properly. I guess I was just waiting to see how it will be solved for cc @efiop EDIT: talked with efiop on chat
For the record, had an additional discussion on discord: https://discordapp.com/channels/485586884165107732/565699007037571084/649634041993101333
* master:
  address @efiop's suggestions
  s3: add support for top level empty directories
  remote: http: calculate length basing on response and not head call
  test: remote: http: clear test file name
  HTTPError: reformat error message
  Update dvc/remote/http.py
  remote: http: rename http error, refactor test
  remote: http: raise exception when download response with error status code
  Support GDrive as remote
  setup: don't forget contextlib2
  remote: refactor ssh ask password code
  remote: protect all remote client/session creation code with locks
  test: refactor & remove redundant test fixtures
  NoRemoteInExternalRepoError: dont pass cause of exception
  perf: optimize cache listing for local, ssh and hdfs
  remote: small .save_info()/.get_checksum() cleanup
We should list all, not only the first one, indeed. Good catch.
Sat, 30 Nov 2019, 13:43 saugat pachhai <[email protected]>:
***@***.**** commented on this pull request.
In dvc/remote/gs.py <#2853 (comment)>:
> + dir_path = path_info / ""
> + fname = next(self._list_paths(path_info, max_items=1), "")
> + return path_info.path == fname or fname.startswith(dir_path.path)
Yes, sorry, I did not explain it correctly. The next() could give you subdir-file.txt, which won't match, so it returns False, resulting in the "'data/subdir' does not exist" message.
There's no short-circuit for checking whether a given file/folder exists. You have to first check if the folder exists, and then loop through it until you find an exact match.
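To make the failure mode above concrete, here is a minimal sketch that replaces the bucket with an in-memory list of keys. `list_paths`, `exists_first_only` and `exists_scan_all` are hypothetical names for illustration, not DVC code, and the sketch assumes the listing comes back in lexicographic key order, so `data/subdir-file.txt` sorts before `data/subdir/1`.

```python
from itertools import islice


def list_paths(keys, prefix, max_items=None):
    """Hypothetical stand-in for a bucket listing: keys with the given
    prefix, in lexicographic order (an assumption of this sketch)."""
    matches = (key for key in sorted(keys) if key.startswith(prefix))
    return islice(matches, max_items)


def exists_first_only(keys, path):
    """Mirrors the quoted snippet: only the first listed key is inspected."""
    dir_prefix = path.rstrip("/") + "/"
    fname = next(list_paths(keys, path, max_items=1), "")
    return path == fname or fname.startswith(dir_prefix)


def exists_scan_all(keys, path):
    """Scans the whole listing until an exact or directory match is found."""
    dir_prefix = path.rstrip("/") + "/"
    return any(
        key == path or key.startswith(dir_prefix)
        for key in list_paths(keys, path)
    )


KEYS = ["data/subdir-file.txt", "data/subdir/1", "data/subdir/2"]

# "-" sorts before "/", so the first key under the "data/subdir" prefix is
# the sibling file, and the first-only check misses the real directory.
print(exists_first_only(KEYS, "data/subdir"))
print(exists_scan_all(KEYS, "data/subdir"))
```

The scan still stops at the first real match, so the extra cost is only paid when sibling keys share the prefix.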
For the record: windows tests are failing due to unrelated problems with chocolatey packages. The rest of the tests are fine. ;) EDIT: sorry, accidentally closed
Wrote about listing for exists here - #2873 (comment). TLDR, the reason why we do listing instead of some BTW, we will still need to list to check if dir exists, which is not good for cache batch exists, quoting from there:
* master:
  travis: windows: use python 3.7.5
  test: skip non supported remotes fast in api tests
@@ -0,0 +1,337 @@
import os
If you postpone, then create an issue, add a link here and resolve this thread :)
@pytest.mark.parametrize("remote", [S3Mocked], indirect=True)
def test_copy_preserve_etag_across_buckets(remote):
We don't need to create all those blobs for this test and parametrize looks a bit stretched. I suggest using @mock_s3 and creating both remotes simply with RemoteS3(...) and not using S3Mocked here.
Current approach is also fine, I guess.
Keeping it this way for now, then. :)
tests/unit/remote/test_remote_dir.py
Outdated
@pytest.mark.parametrize("remote", [GCP], indirect=True)
def test_isfile(remote):
    test_cases = [
        (False, "empty_dir/"),
        (True, "empty_file"),
        (True, "foo"),
        (True, "data/alice"),
        (True, "data/alpha"),
        (True, "data/subdir/1"),
        (True, "data/subdir/2"),
        (True, "data/subdir/3"),
        (False, "data/subdir/empty_dir/"),
        (True, "data/subdir/empty_file"),
        (False, "something-that-does-not-exist"),
        (False, "data/subdir/empty-file/"),
        (False, "empty_dir"),
    ]

    for expected, path in test_cases:
        assert remote.isfile(remote.path_info / path) == expected
This test will need adjustment if #2873 gets merged.
Yeah, let's merge #2873 first, once we get one more approval. And then we'll adjust and merge this one.
It looks way cleaner than average dvc code now) And thanks for starting a remote fixtures refactor.
dvc/remote/gs.py
Outdated
        eg: if `data/file.txt` exists, check for `data` should return True
        """
        return self.isfile(path_info) or self.isdir(path_info / "")
-        return self.isfile(path_info) or self.isdir(path_info / "")
+        return self.isfile(path_info) or self.isdir(path_info)
No need to make it twice.
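The suggestion reads most naturally if `isdir` appends the trailing slash itself, so callers never need `path_info / ""`. A minimal sketch of that arrangement over a plain set of keys follows; the function names and the `KEYS` fixture are assumptions made for illustration, modelled on the `test_isfile` cases in this review, and are not the code in dvc/remote/gs.py.

```python
def isfile(keys, path):
    """A key ending in "/" is treated as a directory placeholder, not a file."""
    return not path.endswith("/") and path in keys


def isdir(keys, path):
    """Normalise the trailing slash here, so "data" and "data/" are equivalent."""
    prefix = path.rstrip("/") + "/"
    return any(key.startswith(prefix) for key in keys)


def exists(keys, path):
    """Callers pass the path as-is; isdir handles the slash exactly once."""
    return isfile(keys, path) or isdir(keys, path)


# Hypothetical bucket contents; "empty_dir/" style keys stand for empty directories.
KEYS = {
    "empty_dir/",
    "empty_file",
    "foo",
    "data/alice",
    "data/alpha",
    "data/subdir/1",
    "data/subdir/2",
    "data/subdir/3",
    "data/subdir/empty_dir/",
    "data/subdir/empty_file",
}

print(exists(KEYS, "data"), exists(KEYS, "data/"), exists(KEYS, "empty_dir"))
```

Because the normalisation is idempotent, passing a path that already ends in a slash gives the same answer as passing it without one.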
BTW, need to create |
@Suor, I have created an issue: #2877. Regarding the suggestion #2853 (comment), I'll make an issue after this gets merged.
Great stuff, thank you so much @skshetry! For the record: windows tests are failing due to temporary problems with chocolatey. Ignoring that for now.
- Have you followed the guidelines in the Contributing to DVC list?
- Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
- Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.
Thank you for the contribution - we'll try to review it as soon as possible.
Description
I have reused test code from `unit/test_s3.py` by moving it to `functional` tests, as, I argue, moto server can be assumed as functional as real `s3` (which is a selling point of `moto` itself). The same tests are run with different remotes (i.e. `s3` and `gs`).

And, I have stolen most of the code from #2619.

The logic behind `s3` and `gs` is almost similar, as they both are object storages. So, there is an opportunity for abstraction for `ObjectStorageRemote`, but, I think that it'll be over-engineering for now (what about `azure`?).

Todo
- `empty_dir`. Will that work?
- `DVC_TEST_GCP` set and make it pass.

How was this tested?
- `dvc add gs://dvc-test/skshetry-test/data` (check `gs://dvc-test/skshetry-test/data1.txt` should not get uploaded).
- `dvc add gs://dvc-test/skshetry-test/data/subdir`.
- `dvc add gs://dvc-test/skshetry-test/data/subdir2`.
- `dvc add gs://dvc-test/skshetry-test/data/subdir3`.
- `dvc add gs://dvc-test/skshetry-test` and `dvc add gs://dvc-test/skshetry-test/`.
- `dvc run -d remote://gs/data 'echo hello world'`.

Fixes #2814
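For reference, the `ObjectStorageRemote` abstraction that the description mentions (and defers) could look roughly like the sketch below. Everything here is hypothetical: the base class, the `_list_keys` hook and the two fake backends are illustrations of the idea, not DVC classes, and no real S3 or GCS client is involved. It also shows how the same checks can be run against several remotes.

```python
from abc import ABC, abstractmethod


class ObjectStorageRemote(ABC):
    """Hypothetical base class: backends only supply a prefix listing."""

    @abstractmethod
    def _list_keys(self, prefix):
        """Yield every object key that starts with `prefix`."""

    def isfile(self, path):
        # Keys ending in "/" are directory placeholders, never files.
        return not path.endswith("/") and path in self._list_keys(path)

    def isdir(self, path):
        prefix = path.rstrip("/") + "/"
        return any(True for _ in self._list_keys(prefix))

    def exists(self, path):
        return self.isfile(path) or self.isdir(path)


class FakeS3(ObjectStorageRemote):
    """Illustrative backend that keeps keys in a sorted list."""

    def __init__(self, keys):
        self._keys = sorted(keys)

    def _list_keys(self, prefix):
        return (key for key in self._keys if key.startswith(prefix))


class FakeGS(ObjectStorageRemote):
    """Illustrative backend that keeps blobs in a dict instead."""

    def __init__(self, keys):
        self._blobs = {key: b"" for key in keys}

    def _list_keys(self, prefix):
        return (key for key in sorted(self._blobs) if key.startswith(prefix))


KEYS = ["empty_dir/", "empty_file", "data/alice", "data/subdir/1"]


def check_remote(remote):
    """The same assertions run against every backend."""
    assert remote.isfile("data/alice")
    assert not remote.isfile("empty_dir")
    assert remote.isdir("empty_dir")
    assert remote.exists("data")
    assert not remote.exists("something-that-does-not-exist")


for remote_cls in (FakeS3, FakeGS):
    check_remote(remote_cls(KEYS))
```

Whether this is worth doing before a third object store such as `azure` arrives is the open question the description raises; the sketch only shows that the shared logic is small.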