Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor: extract stage's relationship with Dvcfile out from Stage #3619

Merged
merged 5 commits into from
Apr 12, 2020

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Apr 10, 2020

  • ❗ I have followed the Contributing to DVC checklist.

  • 📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here. If the CLI API is changed, I have updated tab completion scripts.

  • ❌ I will check DeepSource, CodeClimate, and other sanity checks below. (We consider them recommendatory and don't expect everything to be addressed. Please fix things that actually improve code or fix bugs.)

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

This is just a refactor, which extracts Stage's relationship with actual Dvcfile (with regards to parsing/dumping, validating schema, etc) to a separate abstraction (Dvcfile).
Now, Stage is just an in-memory representation that can work without in-relation to Dvcfile.

Related to #1871

dvc/stages/__init__.py Outdated Show resolved Hide resolved
dvc/dvcfile.py Outdated Show resolved Hide resolved
dvc/stages/__init__.py Outdated Show resolved Hide resolved
dvc/stages/__init__.py Outdated Show resolved Hide resolved
@skshetry skshetry marked this pull request as ready for review April 10, 2020 15:27
@skshetry skshetry changed the title [WIP] refactor stage and dvcfile refactor: split Stage into Dvcfile Apr 10, 2020
@skshetry skshetry self-assigned this Apr 10, 2020
@skshetry skshetry added enhancement Enhances DVC refactoring Factoring and re-factoring labels Apr 10, 2020
@skshetry skshetry requested review from efiop and pared April 10, 2020 15:28
@skshetry skshetry changed the title refactor: split Stage into Dvcfile refactor: extract stage's relationship with Dvcfile out from Stage Apr 10, 2020
@skshetry skshetry requested review from casperdcl and pmrowla and removed request for casperdcl April 10, 2020 15:32
dvc/repo/update.py Outdated Show resolved Hide resolved
@skshetry skshetry linked an issue Apr 11, 2020 that may be closed by this pull request
@skshetry skshetry requested a review from efiop April 11, 2020 09:11
@skshetry
Copy link
Member Author

Travis is offline for maintenance at the moment. /cc @efiop

dvc/repo/reproduce.py Outdated Show resolved Hide resolved
dvc/repo/run.py Outdated

stage = Stage.create(self, **kwargs)
stage = Dvcfile.create_stage(self, **kwargs)
Copy link
Contributor

@efiop efiop Apr 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, this feels weird (esp because of Dvcfile(self, stage.path).dump(stage) down below). The multistage PR still looks strange https://github.com/iterative/dvc/pull/3584/files#r407061889 too . We could definitely do better.

@skshetry skshetry mentioned this pull request Apr 11, 2020
3 tasks
dvc/dvcfile.py Outdated
raise StageFileFormatError(fname, exc)

@classmethod
def create_stage(cls, repo, accompany_outs=False, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pretty much the whole logic doesn't belong to dvcfile itself, but belongs to the Stage class. Seems like the only thing that Dvcfile should care about here is the filename.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E.g. it should care about if os.path.exist at the bottom. So, for example, it could be Dvcfile.create() method, that would in turn call Stage.create() with all that meat and then would check for already existing dvcfile with a path like that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It needs to be somewhere in between of those two. Dvcfile should care about filename, if it's a multistage dvc, if stage already exists with the given name, etc.
Rest can be taken care by Stage, but it's not really linear at the time. :(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@skshetry Yes, that was my point. Right now create_stage does too much with the Stage. It should pass to Stage.create() and deal with dvcfile file/existing dvcfile/etc.

Copy link
Contributor

@efiop efiop Apr 11, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Took a closer look. Looks like Dvcfile.create should handle things like filename/wdir validation. E.g. things like _stage_name and _check_stage_path don't belong to Stage anymore, they should be handled in Dvcfile.

EDIT: though not quite :(

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did a bit of refactor. Most of the things stays on Stage::create(). Only overwriting stuffs have been pulled to Dvcfile.

@@ -47,7 +47,6 @@ def run(self):
no_exec=self.args.no_exec,
overwrite=overwrite,
ignore_build_cache=self.args.ignore_build_cache,
remove_outs=self.args.remove_outs,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed this deprecated flag.

)

@classmethod
def is_stage_file(cls, path):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This kind of naming will be changing on upcoming PRs. For now, stage_file and dvcfile are essentially the same thing and should work.

@classmethod
def check_dvc_filename(cls, path):
if not cls.is_valid_filename(path):
raise StageFileBadNameError(
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, exceptions will probably change wherever this does not make sense in future PRs.

@efiop
Copy link
Contributor

efiop commented Apr 12, 2020

We might've tried to bite more than we could chew here. Even though the arch is far from ideal yet, this PR does a lot of good refactorings, so let's merge it and move on, we'll introduce better interfaces this week.

@efiop efiop merged commit 65854c4 into iterative:master Apr 12, 2020
@skshetry skshetry deleted the refactor-stage-dvcfile branch April 13, 2020 01:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC refactoring Factoring and re-factoring
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants