Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial pkg implementation #2012

Merged
merged 2 commits into from
Jun 17, 2019
Merged

Initial pkg implementation #2012

merged 2 commits into from
Jun 17, 2019

Conversation

efiop
Copy link
Contributor

@efiop efiop commented May 16, 2019

  • Have you followed the guidelines in our
    Contributing document?

  • Does your PR affect documented changes or does it add new functionality
    that should be documented? If yes, have you created a PR for
    dvc.org documenting it or at
    least opened an issue for it? If so, please add a link to it.

Docs: iterative/dvc.org#385


Todo:

dvc/repo/pkg/__init__.py Outdated Show resolved Hide resolved
@efiop efiop force-pushed the pkgs branch 6 times, most recently from 76098ab to 9cd10dd Compare May 25, 2019 23:24
@Suor
Copy link
Contributor

Suor commented May 27, 2019

I think it would be better to not add knowledge about everything to config. Config should provide .get(), .set(), .update(), etc. Load and save itself but should know nothing about its keys and structure. Otherwise we are creating a superclass.

Knowledge about config contents will be ideally contained within specific code area and not slip anywhere else. E.g. remote will be able to configure itself from config and update config as needed: add or update a remote, set default one.

@efiop efiop force-pushed the pkgs branch 3 times, most recently from 328a014 to 84210ed Compare May 28, 2019 20:06
@efiop efiop changed the title [WIP] Initial pkg implementation Initial pkg implementation May 28, 2019
@efiop
Copy link
Contributor Author

efiop commented May 28, 2019

@Suor

I think it would be better to not add knowledge about everything to config. Config should provide .get(), .set(), .update(), etc. Load and save itself but should know nothing about its keys and structure. Otherwise we are creating a superclass.

There is a config like that already. It is called ConfigObj 🙂 Our config is dealing with managing particular sections(e.g. remote and pkg) within it and so has the necessary notion to properly parse them(e.g. schema, path resolutions and so on).

Knowledge about config contents will be ideally contained within specific code area and not slip anywhere else. E.g. remote will be able to configure itself from config and update config as needed: add or update a remote, set default one.

Do you mean something like this

r = Remote(repo, {"url": "https://example.com"})
r.save() # saving to config
r.set_default() # setting it in config as default remote

?

@efiop efiop changed the title Initial pkg implementation [WIP] Initial pkg implementation May 28, 2019
@efiop efiop force-pushed the pkgs branch 5 times, most recently from 02733a9 to 15b4b97 Compare May 29, 2019 02:10
@efiop efiop changed the title [WIP] Initial pkg implementation Initial pkg implementation May 30, 2019
@efiop efiop changed the title Initial pkg implementation [WIP] Initial pkg implementation May 30, 2019
@efiop efiop force-pushed the pkgs branch 2 times, most recently from 1a8e7dc to 59e89bd Compare May 30, 2019 01:34
efiop added a commit to efiop/dvc that referenced this pull request Jun 13, 2019
Fixes iterative#2039

Required by iterative#2012 to be able to fetch data through the API without
interuption from updater.

Signed-off-by: Ruslan Kuprieiev <[email protected]>
@efiop efiop force-pushed the pkgs branch 3 times, most recently from b7fd9cd to cf57131 Compare June 14, 2019 10:48
@@ -315,6 +321,9 @@ def used_cache(
stages = self.stages()

for stage in stages:
if stage.is_pkg_import:
continue

if active and not target and stage.locked:
logger.warning(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related not this PR: looks like either this message should be moved down the stack or we should add continue here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We shouldn't have continue here, because we should push/pull/etc cache for the locked stage file itself. We could move this warning to repo.graph(), but it would be too specific to use it there. Hence why we put this one here.

@@ -315,6 +321,9 @@ def used_cache(
stages = self.stages()

for stage in stages:
if stage.is_pkg_import:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as far as I remember we already have a mechanism to avoid caching of an output? can we reuse it here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein That is an amazing idea! I didn't think about it that way. But indeed, we could treat it as --outs-no-cache. Thanks! 🙂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out it is not easy to do, since things like adding to gitignore and checking out are use_cache dependant. I've added get_used_cache method to Output to handle this more gracefully, but using --outs-no-cache is not a very good option for now.

@@ -213,6 +213,13 @@ def is_import(self):
"""Whether the stage file was created with `dvc import`."""
return not self.cmd and len(self.deps) == 1 and len(self.outs) == 1

@property
def is_pkg_import(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if feels like an output can decide if it needs to be cached or not

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Output needs to know if he is a part of import stage, so this is still used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not necessarily, when we create an output we can pass a flag. It can be and probably should be decided on the Cmd/API level. I can imagine we will have a flag to cache output anyway.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will keep that in mind in #2138 (added a comment about this discussion there as well).

self.git.git.ls_remote("origin", self.rev, exit_code=True)
# fetching remote tag/branch so we can reference it locally
self.git.git.fetch("origin", "{rev}:{rev}".format(rev=self.rev))
except git.exc.GitCommandError:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fail if does not exist?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We will fail later down the line, where we will try to use this. I'll take a second look to see if failing here would provide better UX. Thanks for the heads up!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adjusted the error handling down the line to handle it more gracefully.

dvc/pkg.py Show resolved Hide resolved
return

git.Repo.clone_from(
self.url, self.path, depth=1, no_single_branch=True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

qq - does it avoid restoring the workspace as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shcheklein Not sure what you mean. Could you elaborate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean --no-checkout option of the git clone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nope. So we do checkout to have a default version installed. If any dependency is using some particular version, it will get it. But if you've installed some package of specific version by default, then all dependencies that don't have version set explicitly, will use that default installed version.

output, = pkg.repo.find_outs_by_path(src)
pkg.repo.fetch(output.stage.path)
output.path_info = PathInfo(os.path.abspath(out))
with output.repo.state:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, does it make sense to disable lock and sqllite for this case? just to make sure that this easy scenario works everywhere.

Copy link
Contributor Author

@efiop efiop Jun 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean nfs and friends, right? Good point, I'll take a look if we could disable state for that purpose. Also need to disable our own .dvc/lock as well

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not trivial, created #2135 for it. Might be easier to just support nfs after all.

Copy link
Member

@shcheklein shcheklein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looks great! A few minor comments and questions.

@jorgeorpinel please, review the help messages and command description. Let's work together to make then user-friendly.

@efiop efiop force-pushed the pkgs branch 2 times, most recently from 8b0e69e to 1a8594d Compare June 17, 2019 13:07
efiop added 2 commits June 17, 2019 18:12
@efiop efiop merged commit 573c84e into iterative:master Jun 17, 2019
@efiop efiop deleted the pkgs branch June 17, 2019 17:21
This was referenced Jun 17, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants