Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

external repo: checkout revision before initializing dvc repo #2852

Merged
merged 8 commits into from
Dec 4, 2019

Conversation

pared
Copy link
Contributor

@pared pared commented Nov 26, 2019

  • ❗ Have you followed the guidelines in the Contributing to DVC list?

  • πŸ“– Check this box if this PR does not require documentation updates, or if it does and you have created a separate PR in dvc.org with such updates (or at least opened an issue about it in that repo). Please link below to your PR (or issue) in the dvc.org repo.

  • ❌ Have you checked DeepSource, CodeClimate, and other sanity checks below? We consider their findings recommendatory and don't expect everything to be addressed. Please review them carefully and fix those that actually improve code or fix bugs.

Thank you for the contribution - we'll try to review it as soon as possible. πŸ™

Fixes #2848

@pared
Copy link
Contributor Author

pared commented Nov 26, 2019

If this change gets accepted we should consider replacing repo_dir with empty_dir in some tests. It seems to me that repo_dir has been used as empty dir initializer at least in some cases. Using empty_dir will make potential test debuggin easier.

Comment on lines 103 to 107
rconfig = RemoteConfig(dvc_repo.config)
rconfig.add("upstream", dvc_repo.cache.local.cache_dir, default=True)
dvc_repo.scm.add([dvc_repo.config.config_file])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can be removed after #2780

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use it once. Why not just inline it into the test?

@pared pared changed the title external repo: checkout revision before initializing dvc repo [WIP] external repo: checkout revision before initializing dvc repo Nov 26, 2019
@pared pared force-pushed the 2848_get_no_dvc_master branch 2 times, most recently from 9d2d745 to 8d313c6 Compare November 26, 2019 12:56
@pared pared changed the title [WIP] external repo: checkout revision before initializing dvc repo external repo: checkout revision before initializing dvc repo Nov 26, 2019
@pared pared requested review from Suor, a user and efiop November 26, 2019 13:00
Comment on lines 114 to 107
def test_get_from_non_dvc_master(empty_dir, erepo_no_dvc_master):
Repo.get(
erepo_no_dvc_master._root_dir,
"foo",
out="foo",
rev=erepo_no_dvc_master.dvc_branch,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't check that it works.

Comment on lines 103 to 107
rconfig = RemoteConfig(dvc_repo.config)
rconfig.add("upstream", dvc_repo.cache.local.cache_dir, default=True)
dvc_repo.scm.add([dvc_repo.config.config_file])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We only use it once. Why not just inline it into the test?

@pared pared requested a review from Suor November 27, 2019 09:02
@pared pared force-pushed the 2848_get_no_dvc_master branch 2 times, most recently from eaf5b1b to ec8d46f Compare November 29, 2019 14:19
Comment on lines 65 to 70
# Checkout first in case of non dvc master
checkout_revision(new_path, rev)

repo = Repo(new_path)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment why we are doing it.

Copy link
Contributor

@efiop efiop Nov 29, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Suor I guess "# Adjust new clone/copy to fit rev and cache_dir" is for this, though it looks weird detached like that now. πŸ™‚

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared i suppose you didn't push yet?

Copy link
Contributor

@Suor Suor Nov 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logically it should be:

if rev is not None:
    _git_checkout(new_path, rev)

To stay in line with the rest )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 70 to 74
if cache_dir is not None:
cache_config = CacheConfig(repo.config)
cache_config.set_dir(cache_dir, level=Config.LEVEL_LOCAL)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can skip creating Repo and use Config directly here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it can be done with just config, though it would require a change in test_get_from_non_dvc_repo. I'll create an issue for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should not require changing any tests. This should be a self-contained change.

@@ -21,7 +21,55 @@
logger = logging.getLogger("dvc")


class TestDirFixture(object):
class EmptyDirFixture(object):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared, not sure if it is the same, but what about?

@pytest.fixture
def chdir_tmp(tmp_path):
    old_directory = os.cwd()
    new_directory = fspath_py35(tmp_path)

    os.chdir(new_directory)
    yield
    os.chdir(old_directory)

Also, why refactoring something that we will deprecate eventually) ?



@pytest.fixture
def empty_dir():
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be needed when new dir_helper fixtures land. So maybe save effort on this refactoring. If you simply need an empty dir you may use tmp_path pytest standard fixture and monkeypatch to chdir to it:

@pytest.fixture
def empty_dir(tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)
    return tmp_path



@pytest.fixture
def empty_dir():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't be needed when new dir_helper fixtures land. So maybe save effort on this refactoring. If you simply need an empty dir you may use tmp_path pytest standard fixture and monkeypatch to chdir to it:

@pytest.fixture
def empty_dir(tmp_path, monkeypatch):
    monkeypatch.chdir(tmp_path)
    return tmp_path

@@ -87,3 +89,38 @@ def test_get_to_dir(dname, erepo):

assert os.path.isdir(dname)
assert filecmp.cmp(erepo.FOO, dst, shallow=False)


def test_get_from_non_dvc_master(empty_dir, git_erepo, caplog):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about using erepo just removing .dvc dir at the start of the test?

Copy link
Contributor Author

@pared pared Dec 2, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can do that, but test won't be much more readable due to fact that the default remote set inside erepo creation is its own cache and I need to provide a new one during this particular test.
EDIT: look at next comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After giving it some thought, that is not possible. Such a case would defeat the test purpose. External repo is cloning the target, so it doesn't matter that I remove .dvc from erepo, it will still be dvc repo after clone.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove .dvc and commit that then that won't be a dvc repo :) Now you make a lot of manipulations.

@pared pared force-pushed the 2848_get_no_dvc_master branch from aec6eea to 42a9f34 Compare December 2, 2019 10:40
@pared pared requested review from a user and Suor December 2, 2019 10:45
dvc/external_repo.py Outdated Show resolved Hide resolved
@pared pared force-pushed the 2848_get_no_dvc_master branch from 42a9f34 to 65bbe3c Compare December 2, 2019 16:26
@efiop
Copy link
Contributor

efiop commented Dec 2, 2019

@pared check the tests, lots of them are failing.

@pared pared force-pushed the 2848_get_no_dvc_master branch from 65bbe3c to 673ae54 Compare December 2, 2019 17:31
Comment on lines 86 to 87
git = Git(repo_path)
git.checkout(revision)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably close is explicitly or we might fail removing it later on Windows.

Copy link

@ghost ghost Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared You still need to close it git.checkout fails. Add try&finally

@@ -87,3 +89,38 @@ def test_get_to_dir(dname, erepo):

assert os.path.isdir(dname)
assert filecmp.cmp(erepo.FOO, dst, shallow=False)


def test_get_from_non_dvc_master(empty_dir, git_erepo, caplog):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you remove .dvc and commit that then that won't be a dvc repo :) Now you make a lot of manipulations.

dvc_branch = "dvc_test"
git_erepo.git.git.checkout("master", b=dvc_branch)

dvc_repo = Repo.init(git_erepo._root_dir)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you make a dvc repo in master branch, looks like this breaks the purpose of the test.

Copy link
Contributor Author

@pared pared Dec 3, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAIK line 98 is equivalent to:
git checkout -b dvc_test
so we are actually on "dvc_test" :)

@pared pared force-pushed the 2848_get_no_dvc_master branch from fc5d389 to 88351ab Compare December 3, 2019 10:05
@pared pared requested a review from Suor December 3, 2019 10:18
@efiop
Copy link
Contributor

efiop commented Dec 3, 2019

@pared Tests are failing, please take a look πŸ™‚

@pared pared force-pushed the 2848_get_no_dvc_master branch from 4445a59 to f0069b6 Compare December 3, 2019 18:56
repo.dvc.scm.add([repo.dvc.config.config_file])
repo.dvc.scm.commit("add remote")

repo.create("version", "master")
repo.dvc.add("version")
repo.dvc.scm.add([".gitignore", "version.dvc"])
repo.dvc.scm.commit("master")
repo.dvc.push()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this needed? And all upstream_path stuff in general? Sorry, might've missed some discussion somewhere.

Copy link
Contributor Author

@pared pared Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to me that erepo had default remote configured pointing to its own cache, in order to not trigger infamous NoRemoteError on any test related to get and import. In new test test_from_non_dvc_master, where I am removing .dvc I had to either manually specify new storage for erepo, or adjust erepo fixture. I choose the latter because I believe that setting cache as default remote is not realistic use case, which also omits some important details (eg you don't need to push to "add" something to remote cache, because its local cache and adding is done automatically) so we were testing some very specific use cases everywhere where we used erepo.

Copy link
Contributor

@efiop efiop Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared But you are not even using "upstream_path" in your tests. And you have to modify a bunch of unrelated code for no good reason. In your test you simply need .dvc to not be tracked on master, which you could do by simply adding .dvc to gitignore on master and commiting (and doing that in your test without affecting other tests).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The point about it being not realistic is a good one, but we will soon do that automatically by-default #2599 anyway, and, well, it is not nice to break other tests because of 1 test for specific use case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sorry I didn't noticed that I am modifying repo, its leftover from some previous changes.
Still, I believe this change is desirable. I can achieve the desired master state with properly editing .gitignore but the actual problem is that erepo is not an actual use case. For example, test_get_repo_file should fail in original erepo setup. We were using a trick to make it pass, but without the push that we are commenting on, it should not pass.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2599 should fix our discussion, but since it's not in the current sprint, I wouldn't just leave it be, especially that this change is not that big.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared Sorry, I don't get the point about test_get_repo_file, don't see any tricks in it.

def test_get_repo_file(erepo):                                                  
    src = erepo.FOO                                                             
    dst = erepo.FOO + "_imported"                                               
                                                                                
    Repo.get(erepo.root_dir, src, dst)                                          
                                                                                
    assert os.path.exists(dst)                                                  
    assert os.path.isfile(dst)                                                  
    assert filecmp.cmp(erepo.FOO, dst, shallow=False)  

My point is that you don't need to modify erepo and stuff around for your 1 specific test, you can do it with 2 lines that will add .dvc dir to gitignore and commit it on master.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no trick in the test, the trick is inside erepo:

rconfig = RemoteConfig(repo.dvc.config)
rconfig.add("upstream", repo.dvc.cache.local.cache_dir, default=True)

It's not about modifying whole erepo for one test. I believe erepo is not being set up properly for tests.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared Ah, got it. But that trick is something that we will implement in #2599 by default, and it doesn't seem to break any other tests, it is just an internal implementation detail for erepo. But I also understand your point about it being hacky. Maybe let's create an issue for it and for now stick with gitignore, so that we don't have to touch all other tests? It doesn't seem like it is in the scope of this PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pared pared force-pushed the 2848_get_no_dvc_master branch 2 times, most recently from 1756ee0 to cca3f10 Compare December 4, 2019 11:29
Comment on lines +101 to +104
with caplog.at_level(logging.INFO, logger="dvc"):
Repo.get(erepo._root_dir, erepo.FOO, out=imported_file, rev="branch")

assert caplog.text == ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, why do we test that there is no info output to logger?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#2858
I think that we should use caplog more often in order to prevent spawning of UI issues that are quite common recently. Do you think I should also check capsys?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll probably get progress bars in capsys, not easy to test, especially automatically, but your call, I don't have a strong opinion here πŸ™‚

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pbars go to stderr by default, so shouldn't be an issue.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder why Repo.get() has not output though.

@pared pared force-pushed the 2848_get_no_dvc_master branch 2 times, most recently from 8180515 to a78e836 Compare December 4, 2019 12:45
@pared pared requested a review from efiop December 4, 2019 12:58
tests/func/test_get.py Outdated Show resolved Hide resolved
@pared pared force-pushed the 2848_get_no_dvc_master branch from a78e836 to d9ca23f Compare December 4, 2019 13:33
@pared pared force-pushed the 2848_get_no_dvc_master branch from d9ca23f to ca85217 Compare December 4, 2019 13:34
@pared pared requested a review from efiop December 4, 2019 13:41
Copy link
Contributor

@efiop efiop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! πŸ™

@efiop efiop merged commit 516e3b1 into iterative:master Dec 4, 2019
@pared pared deleted the 2848_get_no_dvc_master branch December 17, 2019 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

dvc get from dvc-branch in not dvc-repo
3 participants