Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

output: Don't serialize hash_info or meta in cloud versioning. #8849

Merged
merged 1 commit into from
Jan 24, 2023

Conversation

daavoo
Copy link
Contributor

@daavoo daavoo commented Jan 20, 2023

Closes #8357


Here is how the .dvc file looks for 2.42.0 / this P.R at different steps of the workflow (click details to expand):

Importing with --version-aware

$ dvc import-url --version-aware $VERSIONED_BUCKET/data
$ cat data.dvc
2.42.0 #8849
md5: 677db99135cf82ed1b726b6be7abef49
frozen: true
deps:
- md5: a042035f7fb17fc84a42e33787ae4b6a.dir
  size: 13
  nfiles: 3
  path: s3://diglesia-bucket/data
  files:
  - size: 5
    version_id: dQymcW0EvEqy23qfrksrqo0ni2zGtPMG
    etag: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
  - size: 4
    version_id: pd67.5YcvjqDm7jfmiHbpeQsXnprrXtf
    etag: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
  - size: 4
    version_id: 2ruNQaorqjnsFQZ5cinHsZSHEw7cz26J
    etag: 7361528e901ca2f2f1952a68ad242d79
    relpath: goo
outs:
- md5: b5c587798f2c0f599ef935244b290bf1.dir
  size: 13
  nfiles: 3
  path: data
  push: false
md5: f642080c781ca41a33a3e6bf77959af5
frozen: true
deps:
- files:
  - size: 5
    version_id: dQymcW0EvEqy23qfrksrqo0ni2zGtPMG
    etag: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
  - size: 4
    version_id: pd67.5YcvjqDm7jfmiHbpeQsXnprrXtf
    etag: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
  - size: 4
    version_id: 2ruNQaorqjnsFQZ5cinHsZSHEw7cz26J
    etag: 7361528e901ca2f2f1952a68ad242d79
    relpath: goo
  path: s3://diglesia-bucket/data
outs:
- md5: b5c587798f2c0f599ef935244b290bf1.dir
  size: 13
  nfiles: 3
  path: data
  push: false

Tracking local folder with dvc add and running the first dvc push

$ mkdir data
$ echo bar > data/bar
$ echo foo > data/foo
$ dvc add data
$ dvc push -v
$ cat data.dvc
2.42.0 #8849
outs:
- md5: 168fd6761b9c3f2c081085b51fd7bf15.dir
  size: 8
  nfiles: 2
  path: data
  files:
  - size: 4
    version_id: iGoUk9Wlb95nNOw6iPHUrnNWq9E898YO
    etag: c157a79031e1c40f85931829bc5fc552
    md5: c157a79031e1c40f85931829bc5fc552
    relpath: bar
  - size: 4
    version_id: DwITkoYE8ipINKzK2QcGKvlwlqaS4oER
    etag: d3b07384d113edec49eaa6238ad5ff00
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
outs:
- path: data
  files:
  - size: 4
    version_id: DhE9omrAq7IXC5wGIIGqruq9JyP_Nwhm
    etag: c157a79031e1c40f85931829bc5fc552
    md5: c157a79031e1c40f85931829bc5fc552
    relpath: bar
  - size: 4
    version_id: Vnu6l9_ECdLGrp6B5iK6YmQTm87UaDzP
    etag: d3b07384d113edec49eaa6238ad5ff00
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo

Tracking a local update with dvc add on that folder

$ echo bar2 > data/bar
$ dvc add data
$ cat data.dvc
2.42.0 #8849
outs:
- md5: 43f06a6517d71d0267bed23cfb1a53fb.dir
  size: 9
  nfiles: 2
  path: data
  files:
  - size: 5
    md5: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
  - size: 4
    version_id: DwITkoYE8ipINKzK2QcGKvlwlqaS4oER
    etag: d3b07384d113edec49eaa6238ad5ff00
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
outs:
- path: data
  files:
  - size: 5
    md5: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
  - size: 4
    version_id: Vnu6l9_ECdLGrp6B5iK6YmQTm87UaDzP
    etag: d3b07384d113edec49eaa6238ad5ff00
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo

After pushing the local update with dvc push

$ dvc push 
$ cat data.dvc
2.42.0 #8849
outs:
- md5: 43f06a6517d71d0267bed23cfb1a53fb.dir
  size: 9
  nfiles: 2
  path: data
  files:
  - size: 5
    md5: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
    version_id: QLNufQhPEwlSlNSfauemR5D23RRMSbfF
    etag: 042967ff088f7f436015fbebdcad1f28
  - size: 4
    version_id: DwITkoYE8ipINKzK2QcGKvlwlqaS4oER
    etag: d3b07384d113edec49eaa6238ad5ff00
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
outs:
- path: data
  files:
  - size: 5
    md5: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
  - size: 4
    version_id: Vnu6l9_ECdLGrp6B5iK6YmQTm87UaDzP
    etag: d3b07384d113edec49eaa6238ad5ff00
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo

Tracking a modification in the remote with dvc update

$ echo goo > goo
$ aws s3 cp goo $REMOTE/data/goo
$ dvc update data.dvc -v
$ cat data.dvc
2.42.0 #8849
outs:
- md5: b5c587798f2c0f599ef935244b290bf1.dir
  size: 13
  nfiles: 3
  path: data
  files:
  - size: 5
    version_id: QLNufQhPEwlSlNSfauemR5D23RRMSbfF
    etag: 042967ff088f7f436015fbebdcad1f28
    md5: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
  - size: 4
    version_id: DwITkoYE8ipINKzK2QcGKvlwlqaS4oER
    etag: d3b07384d113edec49eaa6238ad5ff00
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
  - size: 4
    version_id: xktZ1CcLphMaZheT_O9QrxFpmd9GrgK1
    etag: 7361528e901ca2f2f1952a68ad242d79
    md5: 7361528e901ca2f2f1952a68ad242d79
    relpath: goo
outs:
- path: data
  files:
  - size: 5
    version_id: dQymcW0EvEqy23qfrksrqo0ni2zGtPMG
    etag: 042967ff088f7f436015fbebdcad1f28
    md5: 042967ff088f7f436015fbebdcad1f28
    relpath: bar
  - size: 4
    version_id: pd67.5YcvjqDm7jfmiHbpeQsXnprrXtf
    etag: d3b07384d113edec49eaa6238ad5ff00
    md5: d3b07384d113edec49eaa6238ad5ff00
    relpath: foo
  - size: 4
    version_id: 2ruNQaorqjnsFQZ5cinHsZSHEw7cz26J
    etag: 7361528e901ca2f2f1952a68ad242d79
    md5: 7361528e901ca2f2f1952a68ad242d79
    relpath: goo

@daavoo daavoo linked an issue Jan 20, 2023 that may be closed by this pull request
Comment on lines +372 to +374
tree = Tree.from_list(self.files, hash_name=self.hash_name)
tree.digest()
self.hash_info = tree.hash_info
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This introduces overhead but need to check how big it is

@daavoo daavoo force-pushed the 8357-cloud-versioning-get-rid-of-dir-files branch from 34045d5 to 291418f Compare January 20, 2023 10:21
@codecov
Copy link

codecov bot commented Jan 20, 2023

Codecov Report

Base: 93.64% // Head: 93.70% // Increases project coverage by +0.05% 🎉

Coverage data is based on head (9019208) compared to base (61afdca).
Patch coverage: 98.21% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #8849      +/-   ##
==========================================
+ Coverage   93.64%   93.70%   +0.05%     
==========================================
  Files         453      453              
  Lines       36101    36146      +45     
  Branches     5237     5239       +2     
==========================================
+ Hits        33808    33871      +63     
+ Misses       1787     1765      -22     
- Partials      506      510       +4     
Impacted Files Coverage Δ
dvc/output.py 90.26% <94.11%> (+2.84%) ⬆️
dvc/stage/serialize.py 94.91% <100.00%> (+2.60%) ⬆️
tests/func/test_add.py 98.72% <100.00%> (+0.01%) ⬆️
tests/unit/output/test_output.py 100.00% <100.00%> (ø)
tests/unit/stage/test_serialize_pipeline_lock.py 100.00% <100.00%> (ø)
dvc/repo/experiments/queue/celery.py 86.29% <0.00%> (-1.86%) ⬇️
dvc/proc/manager.py 75.26% <0.00%> (-1.08%) ⬇️
dvc/stage/__init__.py 94.70% <0.00%> (+0.63%) ⬆️

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@daavoo daavoo force-pushed the 8357-cloud-versioning-get-rid-of-dir-files branch 2 times, most recently from 74f5cc6 to 2138f0c Compare January 20, 2023 14:57
@daavoo daavoo marked this pull request as ready for review January 20, 2023 15:17
@daavoo daavoo requested review from a team and dberenbaum January 20, 2023 15:17
@dberenbaum
Copy link
Collaborator

LGTM 🎉 ! Nice work @daavoo!

@daavoo daavoo self-assigned this Jan 23, 2023
@daavoo daavoo added A: cloud-versioning Related to cloud-versioned remotes enhancement Enhances DVC labels Jan 24, 2023
@daavoo daavoo enabled auto-merge (rebase) January 24, 2023 14:29
@daavoo daavoo disabled auto-merge January 24, 2023 14:29
@daavoo daavoo enabled auto-merge (rebase) January 24, 2023 14:29
@daavoo daavoo force-pushed the 8357-cloud-versioning-get-rid-of-dir-files branch from 2138f0c to 28a80df Compare January 24, 2023 14:29
@daavoo daavoo removed the request for review from a team January 24, 2023 16:56
@daavoo daavoo force-pushed the 8357-cloud-versioning-get-rid-of-dir-files branch from 28a80df to 9019208 Compare January 24, 2023 16:57
@mergify
Copy link
Contributor

mergify bot commented Jan 24, 2023

⚠️ The sha of the head commit of this PR conflicts with #8829. Mergify cannot evaluate rules on this PR. ⚠️

@daavoo daavoo merged commit ea2e5dc into main Jan 24, 2023
@daavoo daavoo deleted the 8357-cloud-versioning-get-rid-of-dir-files branch January 24, 2023 17:40
Comment on lines +758 to +762
if (
(not self.IS_DEPENDENCY or self.stage.is_import)
and self.hash_info.isdir
and (kwargs.get("with_files") or self.files is not None)
):
Copy link
Member

@skshetry skshetry Jan 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to move this upward @daavoo? This might affect the ordering of metadata in .dvc file. We probably want files to appear at the end.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No real reason. Could be moved bellow

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #8879.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A: cloud-versioning Related to cloud-versioned remotes enhancement Enhances DVC
Projects
None yet
Development

Successfully merging this pull request may close these issues.

cloud versioning: get rid of .dir files
3 participants