Initial pkg implementation #2012
I think it would be better not to add knowledge about everything to config. Config should provide […] Knowledge about config contents would ideally be contained within the specific code area that uses it and not leak anywhere else. E.g. a remote will be able to configure itself from config and update config as needed: add or update a remote, or set the default one.
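The separation proposed here could look roughly like the sketch below. It is illustrative only: `Config`, `Remote`, `from_config`, and `save` are hypothetical names rather than DVC's actual classes, and the `remote "<name>"` section layout and the `core.remote` default key are assumptions made for the example.

```python
class Config:
    """Generic section/option store that knows nothing about what sections mean."""

    def __init__(self, data=None):
        self._data = data if data is not None else {}

    def get(self, section, option, default=None):
        return self._data.get(section, {}).get(option, default)

    def set(self, section, option, value):
        self._data.setdefault(section, {})[option] = value


class Remote:
    """Keeps all knowledge about how remotes are stored in the config."""

    def __init__(self, name, url):
        self.name = name
        self.url = url

    @staticmethod
    def _section(name):
        # Assumed section naming, chosen for this sketch only.
        return 'remote "{}"'.format(name)

    @classmethod
    def from_config(cls, config, name=None):
        # Fall back to the default remote when no name is given.
        name = name or config.get("core", "remote")
        if name is None:
            raise KeyError("no remote name given and no default remote set")
        url = config.get(cls._section(name), "url")
        if url is None:
            raise KeyError("remote '{}' is not configured".format(name))
        return cls(name, url)

    def save(self, config, default=False):
        # Add or update this remote, optionally marking it as the default.
        config.set(self._section(self.name), "url", self.url)
        if default:
            config.set("core", "remote", self.name)
```

With this split, adding a new kind of section only touches the class that owns it; `Config` itself stays unchanged.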
There is a config like that already. It is called ConfigObj 🙂 Our config deals with managing particular sections (e.g. […]
Do you mean something like this […]?
Fixes iterative#2039. Required by iterative#2012 to be able to fetch data through the API without interruption from the updater. Signed-off-by: Ruslan Kuprieiev <[email protected]>
dvc/repo/__init__.py
Outdated
@@ -315,6 +321,9 @@ def used_cache(
        stages = self.stages()

        for stage in stages:
            if stage.is_pkg_import:
                continue

            if active and not target and stage.locked:
                logger.warning(
Not related to this PR: it looks like either this message should be moved down the stack or we should add `continue` here.
We shouldn't have `continue` here, because we should push/pull/etc. the cache for the locked stage file itself. We could move this warning to `repo.graph()`, but it would be too specific to use there, which is why we put it here.
dvc/repo/__init__.py
Outdated
@@ -315,6 +321,9 @@ def used_cache(
        stages = self.stages()

        for stage in stages:
            if stage.is_pkg_import:
As far as I remember, we already have a mechanism to avoid caching an output. Can we reuse it here?
@shcheklein That is an amazing idea! I didn't think about it that way. But indeed, we could treat it as `--outs-no-cache`. Thanks! 🙂
Turns out it is not easy to do, since things like adding to gitignore and checking out are `use_cache` dependent. I've added a `get_used_cache` method to Output to handle this more gracefully, but using `--outs-no-cache` is not a very good option for now.
@@ -213,6 +213,13 @@ def is_import(self):
        """Whether the stage file was created with `dvc import`."""
        return not self.cmd and len(self.deps) == 1 and len(self.outs) == 1

    @property
    def is_pkg_import(self):
It feels like an output can decide whether it needs to be cached or not.
Output needs to know whether it is part of an `import` stage, so this is still used.
Not necessarily: when we create an output, we can pass a flag. It can be, and probably should be, decided at the Cmd/API level. I can imagine we will have a flag to cache the output anyway.
Will keep that in mind in #2138 (added a comment about this discussion there as well).
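A minimal sketch of the idea discussed in this thread, where the caller passes a flag when creating the output and the output itself reports what it contributes to the used cache. This `Output` class is a simplified stand-in written for illustration; the `imported` flag, the constructor signature, and the body of `get_used_cache` are assumptions, not the code from this PR.

```python
class Output:
    """Simplified stand-in for a stage output (not DVC's real class)."""

    def __init__(self, path, checksum=None, use_cache=True, imported=False):
        self.path = path
        self.checksum = checksum
        self.use_cache = use_cache
        # Hypothetical flag, decided by the command/API layer at creation time.
        self.imported = imported

    def get_used_cache(self):
        """Return the checksums this output adds to the set of used cache."""
        if not self.use_cache or self.imported or self.checksum is None:
            return []
        return [self.checksum]


def used_cache(outputs):
    """Collect used cache without inspecting stage-level properties."""
    used = []
    for out in outputs:
        used.extend(out.get_used_cache())
    return used
```

The collecting loop no longer needs a special case such as `stage.is_pkg_import`; each output answers for itself.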
    self.git.git.ls_remote("origin", self.rev, exit_code=True)
    # fetching remote tag/branch so we can reference it locally
    self.git.git.fetch("origin", "{rev}:{rev}".format(rev=self.rev))
except git.exc.GitCommandError:
Should we fail if it does not exist?
We will fail later down the line, where we will try to use this. I'll take a second look to see if failing here would provide better UX. Thanks for the heads up!
Adjusted the error handling down the line to handle it more gracefully.
        return

    git.Repo.clone_from(
        self.url, self.path, depth=1, no_single_branch=True
qq - does it avoid restoring the workspace as well?
@shcheklein Not sure what you mean. Could you elaborate?
I mean the `--no-checkout` option of `git clone`.
Ah, nope. We do check out so that a default version is installed. If a dependency uses some particular version, it will get that version. But if you've installed a specific version of a package as the default, then all dependencies that don't have a version set explicitly will use that default installed version.
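The resolution rule described in this reply can be sketched as follows. The function name and the dictionary shape are hypothetical and chosen only to illustrate the rule; they do not mirror code in this PR.

```python
def resolve_versions(dependencies, installed_default):
    """Map each dependency to the revision it should use.

    `dependencies` maps a dependency name to an explicit revision,
    or to None when no revision was requested (hypothetical shape).
    """
    resolved = {}
    for name, requested in dependencies.items():
        # An explicitly requested revision wins; otherwise the dependency
        # uses whatever revision was checked out at install time.
        resolved[name] = requested if requested is not None else installed_default
    return resolved
```

For example, with a package installed at `v1.0`, a dependency pinned to `v2.0` keeps `v2.0`, while an unpinned one resolves to `v1.0`.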
    output, = pkg.repo.find_outs_by_path(src)
    pkg.repo.fetch(output.stage.path)
    output.path_info = PathInfo(os.path.abspath(out))
    with output.repo.state:
Btw, does it make sense to disable the lock and SQLite for this case? Just to make sure that this easy scenario works everywhere.
You mean NFS and friends, right? Good point, I'll check whether we could disable state for that purpose. We also need to disable our own .dvc/lock.
This is not trivial, created #2135 for it. Might be easier to just support nfs after all.
This looks great! A few minor comments and questions.
@jorgeorpinel please review the help messages and command description. Let's work together to make them user-friendly.
Signed-off-by: Ruslan Kuprieiev <[email protected]>
Have you followed the guidelines in our
Contributing document?
Does your PR affect documented changes or does it add new functionality
that should be documented? If yes, have you created a PR for
dvc.org documenting it or at
least opened an issue for it? If so, please add a link to it.
Docs: iterative/dvc.org#385
Todo:
- `uninstall` can remove a corrupted pkg
- add `.dvc/pkg-list.yml` similar to https://github.com/sindresorhus/package-json (import: add a mechanism to lock external dependencies #2139)
- add `.dvc/pkg-lock.yml` similar to https://flaviocopes.com/package-lock-json/ (`install` should add an entry to it) (import: add a mechanism to lock external dependencies #2139)
- support importing the whole package with `dvc import mypkg`. This is going to be similar to specifying multiple data items within the package, e.g. `dvc import mypkg data1 data2`, but we will have to collect them ourselves for the whole package (import: support importing the whole repo #2140)
- `dvc pkg get` for simply downloading data from the URL
- turn the `dst` option into a `-o` flag, so that we could run `dvc import mypkg -o dir`
- check that when running repro on an import, it will ask if it needs to remove modified outputs (will be implemented by run: warn and/or prompt user when deleting an output #2027)