Adding a directory instead of a single file as the input/output of a stage would greatly facilitate the use of DVC with large datasets. A common use case is an HDFS remote: it is common practice to partition a dataset into several files so they can be read and written in parallel, but as far as I know, with the current implementation of DVC I would have to specify every single file with -d or -o, which could mean hundreds of flags.
If the remotes were made directory-aware, the .dvc file could store a list of checksums, one for each file in the directory.
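For illustration, a directory-aware stage file might record one checksum per partition file, roughly as in the sketch below. This is purely hypothetical: the field names and layout are made up for the example and do not reflect DVC's actual .dvc schema.

```yaml
# Hypothetical sketch only -- not DVC's real .dvc schema.
deps:
- path: hdfs://namenode/datasets/events/     # the whole partitioned directory
  files:                                     # one checksum per partition file
  - {path: part-00000, md5: <checksum-0>}
  - {path: part-00001, md5: <checksum-1>}
  # ... hundreds more entries, maintained by DVC rather than by hand
```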
DVC currently supports local directories as both dependencies (-d dirname) and outputs (-o dirname) for dvc run, and they are also supported by dvc add dirname. Unfortunately, as of right now, directories are not supported for the other remote types (s3, gs, hdfs, ssh, azure) in the external output and external dependency scenarios; only local ones are. We will raise the priority of this feature and try to squeeze it into the next release or the one after that. ETA is the end of this week.
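For reference, the local-directory support described above looks roughly like this; the dataset paths and script name are placeholders, not part of any real project:

```sh
# Local directories already work as dependencies/outputs (paths are placeholders):
dvc add data/raw                                  # track a whole local directory
dvc run -d data/raw -o data/processed \
        python process.py data/raw data/processed

# Not yet supported: using a directory on a non-local remote (s3, gs, hdfs, ssh, azure)
# as an external dependency or external output.
```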
efiop changed the title from "Specify a directory as output/input of a pipeline stage" to "hdfs: Specify a directory as output/input of a pipeline stage" on Nov 24, 2018