Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add versioning scheme in the dvc.lock #5128

Merged
merged 3 commits into from
Jan 6, 2021

Conversation

skshetry
Copy link
Member

@skshetry skshetry commented Dec 18, 2020

dvc.lock

  • New version encloses all of the lock details of all the stages inside stages keyword.
  • Previously it uses to be just a dictionary of lock details of stages.
  • New meta keyword has version identifier. We could add more metadata there in the future.
  • 1.0 will be read-only and will be migrated to 2.0 format when any of the stages in the lock file are reproduced (should there be a command?).

v1.0

copy:
  cmd: cp foo bar
  deps:
  - path: foo
    md5: d3b07384d113edec49eaa6238ad5ff00
    size: 4
  outs:
  - path: bar
    md5: d3b07384d113edec49eaa6238ad5ff00
    size: 4

v2.0

meta:
  version: '2.0'
stages:
  copy:
    cmd: cp foo bar
    deps:
    - path: foo
      md5: d3b07384d113edec49eaa6238ad5ff00
      size: 4
    outs:
    - path: bar
      md5: d3b07384d113edec49eaa6238ad5ff00
      size: 4

Thank you for the contribution - we'll try to review it as soon as possible. 🙏

@skshetry skshetry added the enhancement Enhances DVC label Dec 18, 2020
@skshetry skshetry self-assigned this Dec 18, 2020
@skshetry skshetry marked this pull request as ready for review December 23, 2020 05:38
@skshetry skshetry requested review from efiop, pared and pmrowla December 23, 2020 05:42
@skshetry skshetry changed the title [WIP] Add versioning scheme in the dvc.yaml/dvc.lock Add versioning scheme in the dvc.yaml/dvc.lock Dec 23, 2020
@efiop
Copy link
Contributor

efiop commented Dec 23, 2020

v2.0
meta:
version: '2.0'

Could you please elaborate on using meta for version instead of putting it at the top level?

@efiop
Copy link
Contributor

efiop commented Dec 23, 2020

Change version to schema?
If we tie our version to DVC's version, then version might make sense (as in, having new file format updates each major version of DVC).

To major releases or to minor ones too? What if it stabilizes and dvc v101 will generate the same schema s dvc v2?

Should version be strict or just informative?
docker compose for example, uses versioning, but it's for backward compatibility and is for informative purpose.

So far we've been implicitly treating lock files as strict, and when unknown fields are found - we are deeming the whole lockfile corrupted. So strict version might continue the trend, but informative (+ warnings about unknown fields) is definitely friendlier (people were complaining about nfiles/size fields additions recently).

Should we always default version keyword to 1.0?
Alternately, we could support uppercased, single-word variables or break compatibility completely (% of users that will be affected by this change would be small). This is in addition to the dvc.yaml/dvc.lock having version identifier. This would allow us to always have the "version-less" dvc.yaml to be considered the latest version, but make DVC always add them by default and recommend user to do the same as well (if they want compatibility).

Could you elaborate on variables, please? Not sure I understand.

@skshetry
Copy link
Member Author

skshetry commented Dec 24, 2020

To major releases or to minor ones too?

Better to not break compatibility on minor releases. But, no strong opinion with schema either (feels like it's too much of a technical term).

Could you elaborate on variables, please? Not sure I understand.

The only thing that makes v1 -> v2 incompatible is the parametrization syntax (foreach .. do and vars can be made compatible). As an example, if user were using ${TMPDIR} in the cmd, v2 will start complaining considering it as a parametrization syntax rather than an env var and they'd have to escape them with \ (as, \${TMPDIR}).

But, we could make them compatible if we consider uppercased single-word inside ${} as a special case and avoid interpolating them. This way, we could make the version informative rather than being a schema This way v1 will still support foreach and vars, but DVC can warn users on missing/new attributes. If in the future, if only there will be any incompatible changes in some attributes' schema, DVC will take the version into account (and, refuse to run if it cannot proceed).

Or, we could just break compatibility (how many users will be using ${VAR} vs $VAR syntax in cmd?).

Making the version strict has it's advantage, as we could always abandon the schema completely, but it comes at a cost of user-facing version identifer.

Could you please elaborate on using meta for version instead of putting it at the top level?

No reason, just wanted to house all possible meta fields there.

@skshetry skshetry changed the title Add versioning scheme in the dvc.yaml/dvc.lock Add versioning scheme in the dvc.lock Jan 5, 2021
@skshetry skshetry merged commit 0cc5227 into iterative:master Jan 6, 2021
@skshetry skshetry deleted the version-schema branch January 6, 2021 09:59
@efiop
Copy link
Contributor

efiop commented Jan 6, 2021

For the record: accidental merge, but still need to review post-merge 🙂

@@ -49,7 +52,8 @@ def test_stage_overwrites_if_already_exists(tmp_dir, dvc):
stage = PipelineStage(name="s2", repo=dvc, path="path", cmd="command3")
lockfile.dump(stage)
assert lockfile.load() == {
"s2": {"cmd": "command3"},
"stages": {"s2": {"cmd": "command3"}},
"meta": {"version": "2.0"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe let's just have top-level schema: 2.0? Meta is confusing with meta from dvc.yaml and version is confusing with dvc version.

return data

@property
def meta(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will definitely get confused with stage meta :(

@efiop
Copy link
Contributor

efiop commented Jan 6, 2021

My only concern is with meta and version, otherwise looks good. 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Enhances DVC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants