Incorporate file sizes in dvc file #3256
Comments
I like the idea! On the other hand, it might complicate merges, and diffs will become bigger? Also, are there any other fields that could potentially be useful (names? modes? type?). Just a thought to consider: does it make sense to revisit the discussion around taking md5 file hashes out? Potentially all this meta information can go into the same place.
Yes 🙂
💯
For the record, might be solved in #1871.
From #2982
* dvc: add size for deps/outs (related to #3256)
* dvc: add nfiles for deps/outs
* dvc: put size/nfiles into the hash_info
Currently we are converting dir_info to/from lists all the time. The reason is that dir_info is stored as a list of dicts in *.dir files, which makes it hard to work with. In addition, we will likely be changing the .dir file format in the near future (iterative#829), so we need to abstract dir_info into something whose on-disk storage we don't have to care about. Related iterative#3256 Related iterative#4847
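To illustrate the kind of abstraction that commit message describes, here is a minimal sketch, not the actual DVC code. The class name `DirInfo`, its methods, and the `md5`/`relpath` keys are assumptions made for this example; the point is only that callers work with a mapping and the list-of-dicts form is produced at the serialization boundary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DirInfo:
    """Hypothetical in-memory view of a directory listing: relpath -> md5."""

    entries: Dict[str, str] = field(default_factory=dict)

    @classmethod
    def from_list(cls, items: List[Dict[str, str]]) -> "DirInfo":
        # Assumed on-disk shape: [{"md5": "...", "relpath": "..."}, ...]
        return cls({item["relpath"]: item["md5"] for item in items})

    def to_list(self) -> List[Dict[str, str]]:
        # Sort by relpath so the serialized form is deterministic.
        return [
            {"md5": md5, "relpath": relpath}
            for relpath, md5 in sorted(self.entries.items())
        ]

    @property
    def nfiles(self) -> int:
        return len(self.entries)


if __name__ == "__main__":
    raw = [
        {"md5": "b" * 32, "relpath": "sub/b.txt"},
        {"md5": "a" * 32, "relpath": "a.txt"},
    ]
    info = DirInfo.from_list(raw)
    print(info.nfiles, info.to_list()[0]["relpath"])
```

With a wrapper like this, a later change to the stored format would only touch `from_list` and `to_list`, not the code that consumes the directory listing.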
I've noticed that the size of the whole tracked dir is registered in the .dvc file, but there is no data for the individual files inside the tracked directory. I would like to have this information so I can display a list of files and their sizes inside the directory without downloading any file. Is it possible to add the size to the JSON file generated to track the contents of the directory?
Hi @MetalBlueberry! Great question! Indeed, we are thinking about adding size to the .dir cache file, but adding those right now will result in older dvc versions registering it as a cache corruption and also us not being able to self-validate .dir files without filtering them first (md5 of them shouldn't depend on We will also add support for these to
Most likely will be replaced by #8884, but if .dir-s stay in some shape or form we'll redesign them from scratch. So closing this.
If we add file sizes to DVC-files (at the time we calculate the checksum, so no extra reads), it will help us show this info in dvc diff, dvc list and other commands with no I/O or computational overhead.

Related to #2982
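The "no extra reads" point can be sketched as follows: the byte count is accumulated from the same chunks that feed the hash, so the file is read only once. This is an illustrative sketch, not DVC's implementation; the function name `md5_and_size` and the chunk size are assumptions made for this example.

```python
import hashlib
import os
import tempfile

CHUNK_SIZE = 1024 * 1024  # assumed chunk size: 1 MiB


def md5_and_size(path, chunk_size=CHUNK_SIZE):
    """Return (hex md5, size in bytes) while reading the file only once."""
    digest = hashlib.md5()
    size = 0
    with open(path, "rb") as fobj:
        # iter() with a sentinel stops when read() returns b"" at EOF.
        for chunk in iter(lambda: fobj.read(chunk_size), b""):
            digest.update(chunk)
            size += len(chunk)
    return digest.hexdigest(), size


if __name__ == "__main__":
    # Write a small temporary file and hash it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"hello world")
    try:
        print(md5_and_size(tmp.name))
    finally:
        os.remove(tmp.name)
```

Once the size is stored next to the hash, a listing or diff command could report it from the metadata alone, without opening the data file.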