DVC pulls corrupted file from Minio (S3) without recalculating hash #5502
Comments
Hi @maxim1317. By default we trust S3 remotes when downloading objects from them, but you could make your DVC repo instance not trust the remote by using Could you share more details on how the upload got corrupted? Was it really during the upload, or did the corruption occur afterwards, in minio storage? Also, do you use https://dvc.org/doc/user-guide/managing-external-data ?
Oh, I'm sorry, I didn't know About corruption: I'm not really sure whether it was due to an aborted push or a connection error, but I'm pretty sure that the corruption didn't occur in minio: only this machine has rights for the dvc bucket (for now)
@maxim1317 Could you share more details on the circumstances? E.g. were you on a bad, unstable connection, or did something else go wrong?
How did you detect the corruption error?
@efiop On pulling, actually: the correct file is 20 MB, but the one I pulled was 12 MB.
@maxim1317 And
@efiop I'll try to reproduce it with
@maxim1317 That file is a part of a dataset? (i.e. part of a directory that you've
@efiop Yeah, it is part of a directory that was added as a whole
@maxim1317 If so, you also need to delete the corresponding
@efiop Oh, didn't know that
@maxim1317 Have you been running into this problem again? We got a similar report in #9641, so I've created iterative/dvc-s3#45
@efiop To my knowledge, no. We have enabled
Bug Report
dvc pull: DVC pulls corrupted file from Minio (S3) without recalculating hash
Description
I had a connection issue during dvc push, and the uploaded file was corrupted.
After pulling, I got this corrupted file and couldn't re-upload the correct one, because the MD5 had already been calculated for the correct file.
Reproduce
Example:
dvc init
dvc add 1.txt
dvc push
1.txt
in minio;dvc pull
Expected
I expect DVC to check the file's MD5 on pull, or at least to let me re-upload the correct file.
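The check described above can be sketched outside of DVC as a plain MD5 comparison. This is only an illustration, not DVC's own implementation: `file_md5` and `verify_download` are hypothetical helper names, and the sketch assumes the recorded hash is the MD5 of the file's raw bytes.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.md5()
    with Path(path).open("rb") as f:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_md5):
    """Hypothetical helper: True if the file on disk matches the recorded MD5."""
    return file_md5(path) == expected_md5.lower()
```

With a check like this, a truncated download (such as the 12 MB copy of a 20 MB file mentioned in the comments) would fail verification instead of being accepted silently.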
Environment information
Ubuntu 20.04.2 LTS
Output of dvc version: