Does your pipeline generate the CSVs with different line endings depending on the OS? DVC computes hashes for text files after doing a dos2unix conversion, but IIRC the file sizes written to dvc.lock are read directly from the filesystem. In a case like this, that could give you identical hashes but different sizes: the hash generated on Windows is for the file without carriage returns, while the size is for the file with carriage returns.
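A simplified model of the behaviour described in the reply, written in Python. `dos2unix_md5` is a hypothetical helper, not DVC's API: it assumes the file is treated as text and simply replaces CRLF with LF before hashing, whereas DVC's real implementation reads files in chunks and decides on its own whether a file is text or binary.

```python
import hashlib


def dos2unix_md5(data: bytes) -> str:
    """Hypothetical helper: MD5 of the content after converting CRLF to LF."""
    return hashlib.md5(data.replace(b"\r\n", b"\n")).hexdigest()


# The same CSV content with Unix (LF) and Windows (CRLF) line endings.
lf_bytes = b"id,value\n1,10\n2,20\n"
crlf_bytes = lf_bytes.replace(b"\n", b"\r\n")

print(dos2unix_md5(lf_bytes) == dos2unix_md5(crlf_bytes))  # True: same hash
print(len(lf_bytes), len(crlf_bytes))  # 19 22: one extra byte per line
```

The normalized hashes match while the on-disk sizes differ by exactly one byte per line, which is the pattern the question describes for dvc.lock.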
I have some issues using an S3 bucket to push and pull input data with DVC.
I have a stage whose deps are the outs of a prior stage. When I reproduce dvc.yaml and push the data on macOS (OSX),
then pull it on another machine running Windows and reproduce dvc.yaml again,
dvc.lock shows the same hashes but different file sizes for that stage's deps.
DVC then reruns the stage completely, and afterwards the outs show different file sizes and hashes as well.
In my IDE, git shows a diff in the deps of that stage in dvc.lock, but only for the sizes.
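One way to test the line-ending explanation is to inspect the dep files on both machines and count their line endings. A minimal sketch; the script name `check_eol.py` and the example paths are hypothetical.

```python
import sys
from pathlib import Path


def line_ending_report(path: Path) -> tuple[int, int, int]:
    """Return (size in bytes, CRLF line endings, LF-only line endings)."""
    data = path.read_bytes()
    crlf = data.count(b"\r\n")
    lf_only = data.count(b"\n") - crlf  # every CRLF also contains one LF
    return len(data), crlf, lf_only


if __name__ == "__main__":
    # Usage (hypothetical script name and paths):
    #   python check_eol.py path/to/dep1.csv path/to/dep2.csv
    for name in sys.argv[1:]:
        size, crlf, lf_only = line_ending_report(Path(name))
        print(f"{name}: {size} bytes, {crlf} CRLF, {lf_only} LF-only")
```

If the Windows copy of a file reports CRLF endings where the macOS copy reports only LF, the size difference in dvc.lock should equal the difference in CRLF counts.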
Any idea how the file sizes in the deps could change if the hashes are the same?
What I tried so far:
Deleting the files on macOS, pulling them again from S3 and reproducing dvc.yaml -> no changes detected, dvc.lock stays the same
Deleting the files on Windows, pulling them again from S3 and reproducing dvc.yaml -> changes detected, dvc.lock shows different file sizes for the deps of that stage
On both systems, I definitely use the same git commit.
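If the line endings turn out to be the cause and the prior stage writes its CSVs with Python's `csv` module (an assumption; the thread does not say how the files are produced), pinning the line terminator makes the output byte-identical on every OS. The function name `write_rows_lf` is hypothetical. Two details matter: the `csv` module's default `lineterminator` is `"\r\n"`, and opening the file without `newline=""` lets Python translate `"\n"` into `"\r\n"` on Windows.

```python
import csv
import os
import tempfile


def write_rows_lf(path: str, rows: list[list[str]]) -> None:
    # newline="" disables Python's newline translation (which would turn
    # "\n" into "\r\n" on Windows); lineterminator="\n" overrides the csv
    # module's default of "\r\n".
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerows(rows)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.csv")
    write_rows_lf(path, [["id", "value"], ["1", "10"], ["2", "20"]])
    with open(path, "rb") as f:
        data = f.read()

print(data)  # b'id,value\n1,10\n2,20\n' on every OS
```

pandas' `DataFrame.to_csv` exposes the same setting through its `lineterminator` argument (spelled `line_terminator` before pandas 1.5).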