repro: Move pull logic to inside Stage. #9434

Merged: daavoo merged 1 commit into main from repro-pull-followup on May 16, 2023
Conversation

@daavoo (Contributor) commented May 10, 2023

@daavoo added the refactoring label on May 10, 2023
@daavoo requested a review from skshetry on May 10, 2023 12:45
@daavoo self-assigned this on May 10, 2023
@daavoo force-pushed the repro-pull-followup branch 2 times, most recently from 365648d to 9c960d0 on May 10, 2023 13:19
@codecov (bot) commented May 10, 2023

Codecov Report

Patch coverage: 100.00% and no project coverage change.

Comparison is base (e69deaa) 91.60% compared to head (8ecd5ab) 91.60%.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #9434   +/-   ##
=======================================
  Coverage   91.60%   91.60%           
=======================================
  Files         489      489           
  Lines       38166    38167    +1     
  Branches     5469     5469           
=======================================
+ Hits        34962    34963    +1     
  Misses       2639     2639           
  Partials      565      565           
Impacted Files                 Coverage Δ
dvc/repo/reproduce.py          97.63% <ø> (-0.06%) ⬇️
dvc/stage/__init__.py          93.89% <100.00%> (+0.05%) ⬆️
tests/func/test_run_cache.py   100.00% <100.00%> (ø)

☔ View full report in Codecov by Sentry.

@dberenbaum (Collaborator)
Sorry, this is more about QA for #9395. I tried:

$ git clone git@github.com:iterative/example-get-started-experiments.git
$ cd example-get-started-experiments
$ dvc exp run --pull
Pulling run cache.
ERROR: run-cache is not supported for http filesystem: https://remote.dvc.org/get-started-pools

I think it shouldn't make the whole command fail if run cache isn't supported or fails to be pulled.

@daavoo (Contributor, Author) commented May 12, 2023

> Sorry, this is more about QA for #9395. I tried:
>
> $ git clone git@github.com:iterative/example-get-started-experiments.git
> $ cd example-get-started-experiments
> $ dvc exp run --pull
> Pulling run cache.
> ERROR: run-cache is not supported for http filesystem: https://remote.dvc.org/get-started-pools
>
> I think it shouldn't make the whole command fail if run cache isn't supported or fails to be pulled.

I guess so.

I think the problem is that --pull now does two things (pull the run cache and pull missing data). If it fails to pull data, you would expect an error, right?

@daavoo (Contributor, Author) commented May 12, 2023

ping @skshetry: does this make sense for your comment in #9395 (comment)?

@dberenbaum (Collaborator)
> I think the problem is that --pull now does two things (pull the run cache and pull missing data). If it fails to pull data, you would expect an error, right?

IMHO since --pull is doing additional stuff to try to make repro work, I would say we should warn on failures for both and try to repro anyway. In many cases, if it fails to pull data, this will ultimately fail, but I don't see why I wouldn't want to run as much as possible until then.
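
A minimal sketch of the warn-instead-of-fail behavior proposed here, assuming the failed run-cache pull surfaces as a DvcException. The helper name and the repo.stage_cache.pull call are illustrative assumptions, not DVC's actual implementation:

```python
import logging

from dvc.exceptions import DvcException

logger = logging.getLogger(__name__)


def pull_run_cache_best_effort(repo) -> None:
    """Try to pull the run cache; degrade to a warning on failure."""
    try:
        logger.info("Pulling run cache.")
        # Illustrative internal call; the exact API may differ.
        repo.stage_cache.pull(None)
    except DvcException as exc:
        # e.g. "run-cache is not supported for http filesystem: ..."
        logger.warning("Failed to pull run cache: %s", exc)
```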

Comment on lines 584 to 585:

```python
for objs in self.get_used_objs().values():
    self.repo.cloud.pull(objs)
```
(Member)

You probably need to pass odb here.

Suggested change:

```diff
-for objs in self.get_used_objs().values():
-    self.repo.cloud.pull(objs)
+for odb, objs in self.get_used_objs().items():
+    self.repo.cloud.pull(objs, odb=odb)
```
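
For context, a hedged sketch of the API shape this suggestion implies; the mapping described in the comments is inferred from the suggestion above, not confirmed here:

```python
def pull_used_objs(stage):
    # Inferred shape (an assumption): get_used_objs() returns a mapping
    # {odb_or_None: {used objects, ...}, ...}, so iterating .values()
    # alone discards which object database each group belongs to.
    for odb, objs in stage.get_used_objs().items():
        # Forwarding odb lets the transfer target the right database.
        stage.repo.cloud.pull(objs, odb=odb)
```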

(Member)

Please talk to Ruslan and Peter about how to pull from cloud-versioned remotes. Most likely this involves the index, but make sure to discuss. With this change, cloud versioning won't work.

(Contributor, Author)

I get the reasoning for putting it inside here because of the rwlocks, but I'm still missing the point of not using the high-level pull API, as it would solve that.

(Contributor, Author)

@iterative/dvc could you explain what the downside would be of using self.repo.pull inside a stage vs. trying to use some low-level API?

In addition to making this work for cloud versioning here, I also didn't find a good way of pulling dependencies as in #9437 (comment) without using stage.repo.pull.

(Member)

It's a large dependency that we are adding, and it is not meant for internal consumption. Repo.pull() is equivalent to dvc pull, so it's heavy and complex. And this one does that in a loop, for multiple stages.
Repo.pull(stage.addressing) reloads the dvc.yaml/dvc.lock files, finds the proper objects/index, and downloads them.

Also, improving the UI for dvc repro/dvc exp run may be difficult as it's now tied to dvc pull. What we really need here is a way to pull a single stage.
I don't have much issue with this as a temporary solution, but I could see it becoming permanent.
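
A sketch of what using the high-level API per stage amounts to; the loop and the bare stage.addressing argument are illustrative, based on the Repo.pull(stage.addressing) call mentioned above:

```python
def pull_stages_via_public_api(stages):
    # Each iteration is roughly as heavy as a standalone `dvc pull <stage>`:
    # dvc.yaml/dvc.lock are reloaded and the proper objects/index are
    # resolved every time, which is the overhead described above.
    for stage in stages:
        stage.repo.pull(stage.addressing)
```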

(Contributor, Author)

Makes sense

> What we really need here is a way to pull a single stage.
> I don't have much issue with this as a temporary solution, but I could see it becoming permanent.

Ok. I will open an issue to revisit this. Probably worth waiting for the new fetch.

```diff
     self._sync_import(dry, force, kwargs.get("jobs", None), no_download)
 elif not self.frozen and self.cmd:
     self._run_stage(dry, force, **kwargs)
-else:
+elif kwargs.get("pull"):
```
(Member)

Shouldn't this come before you run the stage?

(Contributor, Author)

This is for pulling the outputs of unchanged stages.

The follow-up PR adds a new flag that enables pulling dependencies; see #9437 (comment).
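
A hedged sketch of what this branch might do for an unchanged stage; the body (a high-level per-stage pull followed by a checkout) is inferred from this thread and is not the verbatim merged code:

```python
def pull_unchanged_stage_outputs(stage, jobs=None):
    """Sketch: fetch the cached outputs of an unchanged stage.

    Uses the high-level per-stage pull discussed in this thread;
    the `jobs` forwarding is an assumption.
    """
    stage.repo.pull(stage.addressing, jobs=jobs)
    # Place the pulled objects into the workspace.
    stage.checkout()
```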

@daavoo force-pushed the repro-pull-followup branch from 9c960d0 to 3577947 on May 16, 2023 13:28
@daavoo enabled auto-merge (rebase) on May 16, 2023 13:29
@daavoo force-pushed the repro-pull-followup branch from 3577947 to 8ecd5ab on May 16, 2023 13:29
@daavoo merged commit 1e538ed into main on May 16, 2023
@daavoo deleted the repro-pull-followup branch on May 16, 2023 14:11
@dberenbaum (Collaborator)
> IMHO since --pull is doing additional stuff to try to make repro work, I would say we should warn on failures for both and try to repro anyway. In many cases, if it fails to pull data, this will ultimately fail, but I don't see why I wouldn't want to run as much as possible until then.

@daavoo Is it supposed to be fixed? Do you want me to create a follow-up issue? I still see the same error when trying to pull the run cache for an http remote.

@daavoo (Contributor, Author) commented May 17, 2023

> IMHO since --pull is doing additional stuff to try to make repro work, I would say we should warn on failures for both and try to repro anyway. In many cases, if it fails to pull data, this will ultimately fail, but I don't see why I wouldn't want to run as much as possible until then.
>
> @daavoo Is it supposed to be fixed? Do you want me to create a follow-up issue? I still see the same error when trying to pull the run cache for an http remote.

Sorry, no, it is not fixed. Please open an issue.
