external repo: checkout revision before initializing dvc repo #2852
If this change gets accepted, we should consider replacing
tests/func/test_get.py (outdated)
rconfig = RemoteConfig(dvc_repo.config)
rconfig.add("upstream", dvc_repo.cache.local.cache_dir, default=True)
dvc_repo.scm.add([dvc_repo.config.config_file])
Can be removed after #2780
We only use it once. Why not just inline it into the test?
Force-pushed from 9d2d745 to 8d313c6
tests/func/test_get.py (outdated)
def test_get_from_non_dvc_master(empty_dir, erepo_no_dvc_master):
    Repo.get(
        erepo_no_dvc_master._root_dir,
        "foo",
        out="foo",
        rev=erepo_no_dvc_master.dvc_branch,
    )
We don't check that it works.
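A minimal sketch of the missing check, mirroring the assertions already used in test_get_repo_file. fake_get is a hypothetical stand-in for Repo.get (a plain file copy, not DVC's behavior) so that the sketch runs without a repository:

```python
import filecmp
import os
import shutil
import tempfile


def fake_get(src_path, out):
    """Hypothetical stand-in for Repo.get(): just copies one file."""
    shutil.copyfile(src_path, out)


def check_get_result(src_path, out):
    """Assert that `out` was actually produced and matches `src_path`."""
    assert os.path.isfile(out), "nothing was fetched to " + out
    assert filecmp.cmp(src_path, out, shallow=False), "content differs"


# Usage: fetch, then verify the result instead of only checking
# that no exception was raised.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "foo_src")
    out = os.path.join(tmp, "foo")
    with open(src, "w") as fobj:
        fobj.write("foo")
    fake_get(src, out)
    check_get_result(src, out)
```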
Force-pushed from eaf5b1b to ec8d46f
dvc/external_repo.py (outdated)
# Checkout first in case of non dvc master
checkout_revision(new_path, rev)

repo = Repo(new_path)
Please add a comment why we are doing it.
@Suor I guess "# Adjust new clone/copy to fit rev and cache_dir" is for this, though it looks weird detached like that now.
done
@pared I suppose you haven't pushed yet?
Logically it should be:
if rev is not None:
    _git_checkout(new_path, rev)
to stay in line with the rest :)
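To make the ordering under discussion concrete: check out rev (only when one is given) before the Repo object is created, since master may have no .dvc directory. The sketch injects the three steps as callables; git_checkout, open_repo and set_cache_dir are hypothetical placeholders, not DVC's actual helpers:

```python
from typing import Callable, List, Optional


def prepare_external_repo(
    new_path: str,
    rev: Optional[str],
    cache_dir: Optional[str],
    git_checkout: Callable[[str, str], None],
    open_repo: Callable[[str], object],
    set_cache_dir: Callable[[object, str], None],
) -> object:
    """Hypothetical outline of the external-repo setup order."""
    # 1. Check out the requested revision first: the default branch may
    #    not contain .dvc at all, so opening the repo earlier could fail.
    if rev is not None:
        git_checkout(new_path, rev)

    # 2. Only now treat the working tree as a DVC repo.
    repo = open_repo(new_path)

    # 3. Point it at a shared cache, mirroring the
    #    `if cache_dir is not None` branch in the diff.
    if cache_dir is not None:
        set_cache_dir(repo, cache_dir)
    return repo


# Usage: record the call order with trivial fakes.
calls: List[str] = []
repo = prepare_external_repo(
    "/tmp/clone",
    "dvc_test",
    None,
    git_checkout=lambda path, rev: calls.append("checkout " + rev),
    open_repo=lambda path: calls.append("open " + path) or path,
    set_cache_dir=lambda r, d: calls.append("cache " + d),
)
```

Injecting the steps keeps the sketch independent of git and DVC; in real code they would be direct calls.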
done
if cache_dir is not None:
    cache_config = CacheConfig(repo.config)
    cache_config.set_dir(cache_dir, level=Config.LEVEL_LOCAL)
Maybe we can skip creating Repo and use Config directly here?
I agree that it can be done with just config, though it would require a change in test_get_from_non_dvc_repo. I'll create an issue for that.
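A rough sketch of the "just use Config" idea. It assumes the local config is an INI-style file at .dvc/config.local with a [cache] section and a dir option; that layout is an assumption about DVC's on-disk format, and the stdlib configparser stands in for DVC's own Config class. set_local_cache_dir is a hypothetical helper:

```python
import configparser
import os


def set_local_cache_dir(dvc_dir: str, cache_dir: str) -> str:
    """Hypothetical helper: set cache.dir in <dvc_dir>/config.local
    without constructing a Repo. Returns the config file path."""
    path = os.path.join(dvc_dir, "config.local")
    # interpolation=None so a "%" in a path is written verbatim.
    config = configparser.ConfigParser(interpolation=None)
    config.read(path)  # a missing file is silently skipped
    if not config.has_section("cache"):
        config.add_section("cache")
    config.set("cache", "dir", cache_dir)
    with open(path, "w") as fobj:
        config.write(fobj)
    return path
```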
It should not require changing any tests. This should be a self-contained change.
tests/basic_env.py (outdated)
@@ -21,7 +21,55 @@
logger = logging.getLogger("dvc")


class TestDirFixture(object):
class EmptyDirFixture(object):
@pared, not sure if it is the same, but what about:
@pytest.fixture
def chdir_tmp(tmp_path):
    old_directory = os.getcwd()
    new_directory = fspath_py35(tmp_path)
    os.chdir(new_directory)
    yield
    os.chdir(old_directory)
Also, why refactor something that we will eventually deprecate?
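Outside pytest, the same save/chdir/restore pattern can be sketched with the standard library alone. Here chdir_tmp is a context manager rather than a fixture, and it creates its own temporary directory instead of receiving tmp_path:

```python
import contextlib
import os
import tempfile


@contextlib.contextmanager
def chdir_tmp():
    """Enter a fresh temporary directory and always restore the previous
    working directory afterwards."""
    old_directory = os.getcwd()  # os.getcwd(); there is no os.cwd()
    with tempfile.TemporaryDirectory() as new_directory:
        os.chdir(new_directory)
        try:
            yield new_directory
        finally:
            # Restore before TemporaryDirectory cleans up: on Windows a
            # directory that is still the current directory can't be removed.
            os.chdir(old_directory)
```

In a pytest fixture the try/finally is optional, because pytest resumes the generator for teardown even when the test fails.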
tests/conftest.py (outdated)
@pytest.fixture
def empty_dir():
see this comment: https://github.com/iterative/dvc/pull/2852/files#r352164905
This won't be needed when the new dir_helper fixtures land, so maybe save the effort on this refactoring. If you simply need an empty dir, you can use the standard pytest tmp_path fixture and monkeypatch to chdir to it:
@pytest.fixture
def empty_dir(tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)
    return tmp_path
tests/func/test_get.py (outdated)
@@ -87,3 +89,38 @@ def test_get_to_dir(dname, erepo):

    assert os.path.isdir(dname)
    assert filecmp.cmp(erepo.FOO, dst, shallow=False)


def test_get_from_non_dvc_master(empty_dir, git_erepo, caplog):
How about using erepo and just removing the .dvc dir at the start of the test?
I can do that, but the test won't be much more readable, because the default remote set during erepo creation is its own cache, and I need to provide a new one for this particular test.
EDIT: see the next comment.
After giving it some thought, that is not possible; it would defeat the purpose of the test. External repo clones the target, so even if I remove .dvc from erepo, it will still be a dvc repo after the clone.
If you remove .dvc and commit that, then it won't be a dvc repo :) Right now you are doing a lot of manipulations.
Force-pushed from aec6eea to 42a9f34
Force-pushed from 42a9f34 to 65bbe3c
@pared Check the tests, lots of them are failing.
Force-pushed from 65bbe3c to 673ae54
dvc/external_repo.py (outdated)
git = Git(repo_path)
git.checkout(revision)
We should probably close it explicitly, or we might fail to remove it later on Windows.
What do you mean by closing it, @Suor?
Never mind.
@pared You still need to close it if git.checkout fails. Add try/finally.
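A minimal sketch of the try/finally suggestion. FakeGit is a hypothetical stand-in for the Git wrapper in dvc/external_repo.py; the sketch only assumes the wrapper exposes checkout() and close(), as the review comments imply:

```python
class FakeGit:
    """Hypothetical stand-in for the SCM wrapper; records close() calls."""

    instances = []  # every object created, so callers can inspect them

    def __init__(self, repo_path):
        self.repo_path = repo_path
        self.closed = False
        FakeGit.instances.append(self)

    def checkout(self, revision):
        # Simulate git failing on a revision that does not exist.
        if revision == "missing":
            raise ValueError("unknown revision: " + revision)

    def close(self):
        self.closed = True


def checkout_revision(repo_path, revision, git_cls=FakeGit):
    git = git_cls(repo_path)
    try:
        git.checkout(revision)
    finally:
        # Runs whether checkout succeeded or raised, so no open handle is
        # left behind that could block removing the clone on Windows.
        git.close()
```

The exception from a failed checkout still propagates to the caller; try/finally only guarantees the cleanup.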
tests/func/test_get.py (outdated)
dvc_branch = "dvc_test"
git_erepo.git.git.checkout("master", b=dvc_branch)

dvc_repo = Repo.init(git_erepo._root_dir)
So you make a dvc repo in the master branch; it looks like this breaks the purpose of the test.
AFAIK line 98 is equivalent to:
git checkout -b dvc_test
so we are actually on "dvc_test" :)
Force-pushed from fc5d389 to 88351ab
@pared Tests are failing, please take a look.
Force-pushed from 4445a59 to f0069b6
tests/conftest.py (outdated)
repo.dvc.scm.add([repo.dvc.config.config_file])
repo.dvc.scm.commit("add remote")

repo.create("version", "master")
repo.dvc.add("version")
repo.dvc.scm.add([".gitignore", "version.dvc"])
repo.dvc.scm.commit("master")
repo.dvc.push()
Why is this needed? And all upstream_path stuff in general? Sorry, might've missed some discussion somewhere.
It seems to me that erepo had a default remote configured pointing to its own cache, so as not to trigger the infamous NoRemoteError in any test related to get and import. In the new test, test_from_non_dvc_master, where I am removing .dvc, I had to either manually specify new storage for erepo or adjust the erepo fixture. I chose the latter because I believe that setting the cache as the default remote is not a realistic use case, and it also hides some important details (e.g. you don't need to push to "add" something to the remote, because the remote is the local cache and adding populates it automatically). So everywhere we used erepo, we were testing a very specific use case.
@pared But you are not even using "upstream_path" in your tests. And you have to modify a bunch of unrelated code for no good reason. In your test you simply need .dvc to not be tracked on master, which you could do by simply adding .dvc to .gitignore on master and committing (and doing that in your test without affecting other tests).
The point about it not being realistic is a good one, but we will soon do that automatically by default (#2599) anyway, and, well, it is not nice to break other tests because of one test for a specific use case.
OK, sorry, I didn't notice that I am modifying repo; it's a leftover from some previous changes.
Still, I believe this change is desirable. I can achieve the desired master state by properly editing .gitignore, but the actual problem is that erepo does not reflect a real use case. For example, test_get_repo_file should fail with the original erepo setup. We were using a trick to make it pass; without the push that we are commenting on, it should not pass.
#2599 should settle this discussion, but since it's not in the current sprint, I wouldn't just leave it be, especially since this change is not that big.
@pared Sorry, I don't get the point about test_get_repo_file; I don't see any tricks in it.
def test_get_repo_file(erepo):
    src = erepo.FOO
    dst = erepo.FOO + "_imported"
    Repo.get(erepo.root_dir, src, dst)
    assert os.path.exists(dst)
    assert os.path.isfile(dst)
    assert filecmp.cmp(erepo.FOO, dst, shallow=False)
My point is that you don't need to modify erepo and the things around it for your one specific test; you can do it with two lines that add the .dvc dir to .gitignore and commit it on master.
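The "two lines" alternative could look roughly like this. ignore_dvc_dir is a hypothetical test helper, and the commit itself is left to the caller. One caveat from git itself: .gitignore only affects untracked paths, so if .dvc is already committed on master it would also need to be removed from the index (git rm -r --cached .dvc) before master stops being a dvc repo.

```python
import os


def ignore_dvc_dir(repo_root: str, entry: str = "/.dvc") -> bool:
    """Hypothetical helper: append `entry` to <repo_root>/.gitignore unless
    it is already listed. Returns True if the file was changed.

    Committing the change on master is left to the caller.
    """
    path = os.path.join(repo_root, ".gitignore")
    lines = []
    if os.path.exists(path):
        with open(path) as fobj:
            lines = fobj.read().splitlines()
    if entry in lines:
        return False  # already ignored; keep the helper idempotent
    lines.append(entry)
    with open(path, "w") as fobj:
        fobj.write("\n".join(lines) + "\n")
    return True
```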
There is no trick in the test; the trick is inside erepo:
rconfig = RemoteConfig(repo.dvc.config)
rconfig.add("upstream", repo.dvc.cache.local.cache_dir, default=True)
It's not about modifying the whole erepo for one test. I believe erepo is not being set up properly for tests.
@pared Ah, got it. But that trick is something that we will implement by default in #2599, and it doesn't seem to break any other tests; it is just an internal implementation detail of erepo. But I also understand your point about it being hacky. Maybe let's create an issue for it and, for now, stick with .gitignore, so that we don't have to touch all the other tests? It doesn't seem to be in the scope of this PR.
Force-pushed from 1756ee0 to cca3f10
with caplog.at_level(logging.INFO, logger="dvc"):
    Repo.get(erepo._root_dir, erepo.FOO, out=imported_file, rev="branch")

assert caplog.text == ""
Hm, why do we test that there is no INFO output to the logger?
#2858
I think we should use caplog more often to catch the UI issues that have been quite common recently. Do you think I should also check capsys?
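For reference, the check that caplog.at_level(logging.INFO, logger="dvc") enables can be approximated with the standard library. captured_log_text is a hypothetical helper that temporarily attaches a handler to the "dvc" logger; quiet and noisy are hypothetical stand-ins for Repo.get:

```python
import io
import logging


def captured_log_text(func, logger_name="dvc", level=logging.INFO):
    """Run func() and return everything logged to `logger_name` at `level`
    or above, roughly what caplog.text holds inside caplog.at_level()."""
    logger = logging.getLogger(logger_name)
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)  # default format: the message only
    handler.setLevel(level)
    old_level = logger.level
    logger.setLevel(level)
    logger.addHandler(handler)
    try:
        func()
    finally:
        # Leave the logger exactly as we found it.
        logger.removeHandler(handler)
        logger.setLevel(old_level)
    return stream.getvalue()


def quiet():
    logging.getLogger("dvc").debug("debug details are fine")


def noisy():
    logging.getLogger("dvc").info("unexpected UI message")
```

An empty string from captured_log_text(quiet) is the stdlib analogue of assert caplog.text == "".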
You'll probably get progress bars in capsys, which are not easy to test, especially automatically, but it's your call; I don't have a strong opinion here.
pbars go to stderr by default, so that shouldn't be an issue.
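tqdm writes to sys.stderr by default, so asserting on stdout alone stays meaningful even when progress bars are drawn. A stdlib sketch of that split; run_and_capture plays the role of capsys.readouterr(), and fake_download is a hypothetical command that draws its progress on stderr:

```python
import contextlib
import io
import sys


def run_and_capture(func):
    """Run func() and return (stdout_text, stderr_text)."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        func()
    return out.getvalue(), err.getvalue()


def fake_download():
    """Hypothetical command: progress goes to stderr, like a default
    tqdm bar, so stdout stays reserved for real output."""
    for pct in (0, 50, 100):
        # sys.stderr is looked up at call time, so redirection applies.
        print("\rdownloading {}%".format(pct), end="", file=sys.stderr)
    print(file=sys.stderr)
```

Usage: out, err = run_and_capture(fake_download) leaves out empty while err holds the progress text.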
I wonder why Repo.get() has no output, though.
Force-pushed from 8180515 to a78e836
Force-pushed from a78e836 to d9ca23f
Force-pushed from d9ca23f to ca85217
Thanks!
Have you followed the guidelines in the Contributing to DVC list?
Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.
Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.
Thank you for the contribution - we'll try to review it as soon as possible. π
Fixes #2848