
Push failing with "The write operation timed out" #4121

Closed

dreaquil opened this issue Jun 26, 2020 · 5 comments
Labels
bug (Did we break something?), discussion (requires active participation to reach a conclusion)

Comments

@dreaquil

Bug Report

dvc push of 302 files of approximately 8 MB each (images) failed multiple times at around the 2:20 mark, managing to upload anywhere between 10 and 40 files each time. Once dvc push --jobs=1 was used, the remaining 150 files uploaded in one go without issue.

Additional information:
ulimit -n = 1024
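For anyone hitting the same timeouts, the workaround described above is just the standard --jobs flag on dvc push; a minimal sketch (the intermediate value of 4 is illustrative, the right number depends on the connection):

# Serialize uploads completely, as done in this report.
dvc push --jobs 1

# Or keep some parallelism with a small, explicit job count.
dvc push --jobs 4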

Setup information:

Tried on DVC versions 1.0.0a11 and 1.0.2

Python version: 3.6.9
Platform: Linux-5.3.0-51-generic-x86_64-with-Ubuntu-18.04-bionic
Binary: False
Package: pip
Supported remotes: azure, http, https, ssh
Cache: reflink - not supported, hardlink - supported, symlink - supported
Filesystem type (cache directory): ('ext4', '/dev/nvme0n1p3')
Repo: dvc, git
Filesystem type (workspace): ('ext4', '/dev/nvme0n1p3')


@triage-new-issues bot added the triage (Needs to be triaged) label Jun 26, 2020
@pared added the bug (Did we break something?) label Jun 26, 2020
@triage-new-issues bot removed the triage (Needs to be triaged) label Jun 26, 2020
@pared (Contributor) commented Jun 26, 2020

The workaround looks similar to issues we used to have with macOS.
Might be related to #2473.

@efiop (Contributor) commented Jun 26, 2020

@dreaquil Could you please provide a verbose log for the error you are getting?
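For reference, a verbose log can be captured by rerunning the failing command with DVC's --verbose flag; a minimal sketch (the log file name is illustrative):

# Re-run the failing push with verbose logging and keep a copy of the output.
dvc push --verbose 2>&1 | tee dvc-push-verbose.log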

@efiop added the awaiting response (we are waiting for your reply, please respond! :)) label Jun 26, 2020
@dreaquil (Author)

After discussions, this is likely being caused by too many jobs flooding the connection and causing timeouts. I was told by @pared that the current default number of jobs is 4 * num_cores. I believe this is too high as a default and should be reduced to approximately num_cores.

@efiop Sorry, I forgot to add the verbose logging, but reducing to 1 job resolved the issue in both instances.
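As a rough illustration of that suggestion, capping parallelism at the core count from the command line could look like this (nproc is just one way to get the core count on Linux; the flag itself is the same --jobs used in the workaround above):

# Cap transfer parallelism at the number of CPU cores instead of 4x that.
dvc push --jobs "$(nproc)"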

@pared added the discussion (requires active participation to reach a conclusion) label and removed the awaiting response (we are waiting for your reply, please respond! :)) label Jul 13, 2020
@pared (Contributor) commented Jul 13, 2020

I agree that in some cases the current defaults might be too high. Though, as I recall, they were set that way because users were complaining about slow uploads/downloads.

@efiop (Contributor) commented Jul 13, 2020

@dreaquil Thanks! And thanks to @pared for investigating.

@dreaquil So is it azure that you are using? Could you provide the full log, please?

We clearly shouldn't reduce the default number of jobs, but should either determine it dynamically (which has a lot of prerequisites, like #4050) or, what I suppose is the real cause, dynamically determine the upload chunk size, similar to how we do it in the gs driver. But in order to tell, we need the verbose log and info about which remote type you are using 🙂

@efiop closed this as not planned (won't fix, can't repro, duplicate, stale) Dec 8, 2023