The chance of a hash collision between two different files is extraordinarily low, unless both files are empty, in which case they both have the MD5 checksum d41d8cd98f00b204e9800998ecf8427e.
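To illustrate the point about empty files, here is a minimal sketch using Python's standard `hashlib`. It only shows that the MD5 digest of zero bytes is always the value quoted above; it does not call DVC, and the helper name `md5_of` is made up for this example.

```python
import hashlib

# MD5 digest of zero bytes of input, as quoted in the issue.
EMPTY_MD5 = "d41d8cd98f00b204e9800998ecf8427e"


def md5_of(data: bytes) -> str:
    """Return the hex MD5 digest of the given bytes (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


# Two unrelated empty files have identical content (nothing),
# so they hash to the same value.
timestamp_file = b""    # e.g. the content of an empty timestamp file
placeholder_file = b""  # e.g. the content of an empty placeholder file

print(md5_of(timestamp_file) == md5_of(placeholder_file) == EMPTY_MD5)
```

So any content-addressed store keyed on MD5 will see every empty file as the same object, whatever its name or mtime.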
Empty files can get created for various reasons, including by workflow tools such as Snakemake. Snakemake creates .snakemake_timestamp files that exist only for their mtime, which is then lost when the file is added to the cache. (#8602)
There is not much to be gained by caching/tracking these empty files either. We could explicitly ignore them via .dvcignore when we know that they might turn up, but perhaps DVC could ignore empty files by default?
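For the explicit `.dvcignore` route, something like the following could list the empty files in a workspace so they can be reviewed and pasted into `.dvcignore`. This is a rough sketch, not a DVC feature: `find_empty_files` and `dvcignore_lines` are hypothetical helpers, the leading `/` assumes gitignore-style anchoring relative to the directory holding `.dvcignore`, and file names containing pattern characters such as `*` or `[` are not escaped.

```python
import os
from pathlib import Path

# Directories that hold tool internals and should not be scanned.
SKIP_DIRS = {".git", ".dvc"}


def find_empty_files(root):
    """Return sorted POSIX-style paths (relative to root) of zero-byte files."""
    root = Path(root)
    empty = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP_DIRS]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_file() and path.stat().st_size == 0:
                empty.append(path.relative_to(root).as_posix())
    return sorted(empty)


def dvcignore_lines(paths):
    """Turn relative paths into anchored ignore patterns (assumed gitignore-style)."""
    return ["/" + p for p in paths]


if __name__ == "__main__":
    # Print candidate entries for review; nothing is written to disk.
    for line in dvcignore_lines(find_empty_files(".")):
        print(line)
```

The output is only a candidate list. Whether a given empty file should really be ignored still depends on the pipeline.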
By "ignore", I think I mean "leave in the workspace, don't add to cache". Not sure if they should be tracked by .dir files.
Not sure if there would be unintended consequences. If so, perhaps "ignore empty files" could be configurable.
This would break backward compatibility. Also, imagine that your pipeline created only one file and it is empty: ignoring it would look as if no output had been created at all, which looks like an error. Implicitly ignoring empty files seems too opinionated; you could instead add the files that don't matter in your particular scenario to .dvcignore.