-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remote: avoid chunking on webdav. Fixes #4796 #4828
remote: avoid chunking on webdav. Fixes #4796 #4828
Conversation
@skshetry I have tested the "not-chunked" with webdav both in pushing and pulling files.
At this points I just miss how to implement the selection between chunked an not chunked. |
dvc/tree/webdav.py
Outdated
self._client.upload_to(buff=chunks(), remote_path=to_info.path) | ||
else: | ||
# Upload to WebDAV without chunking | ||
self._client.upload_file( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LucaButera, this won't show up the progress bar. Can you check the snippets on the issue with progress bar? Please see if works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my testing the progress bar does work, but instead of having a global progress bar with "sub-bars" for each file you only get a single bar which updates for each file sent (more or less). This is perfectly fine for uploads of folders containing many reasonably sized files. It may actually become a problem when you upload one (or more) huge files. In such cases you have no information about the progress.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check with the snippet here that I mentioned: #4796 (comment)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here's what I found out:
The snippet works flawlessly. It gives per file progress bar which updates correctly even in the case of a single, really big, file.
A few things that may or may not be related that I wish to mention:
- When querying the remote cache the progress bar starts from
0/512
even though the files are more than that. In fact, after it reaches 512, it starts going up like513/?
and so on. - After pushing a 7GB file, cache querying became extremely slow. I am not sure if it is just because the cache itself was bigger. Previously my remote contained 2.9GB divided in ~900 files and it seemed insanely fast in proportion.
- During push some files generate an error 403 or 423. Again this disappears the second time you try to push the files that gave the error.
Let me know what you think about this and how to proceed, thank you!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry Did you have time to read the issues I mentioned? Do you think they are just server related problems or could there be some issues with dvc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LucaButera, I am not sure of 1.
For the 2, can you post the log? This patch should not affect that. Maybe, you were uploading the folders/files too-many times during the tests, that the local cache was getting invalidated. We try to guess the number of files to reduce no. of requests, so, that's my guess. Maybe, same thing is affecting 1.
For 3, it might happen with Owncloud/NextCloud if you are deleting/adding files too quickly, re in your test. So, it's not a DVC's issue. If that happens, retry again in a while.
Anyway, it should not be related to this patch. Please create a issue if that happens later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, @LucaButera, thanks for the contribution. 🎉 I have a few suggestions above.
desc=to_info.url if name is None else name, | ||
disable=no_progress_bar, | ||
) as fd_wrapped: | ||
self._client.upload_to( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LucaButera, can you test with an empty file? If that works well, it should be good to merge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a problem with this. I get this error:
Traceback (most recent call last):
File "/Users/lucabutera/dvc/dvc/cache/local.py", line 32, in wrapper
func(from_info, to_info, *args, **kwargs)
File "/Users/lucabutera/dvc/dvc/tree/base.py", line 356, in upload
self._upload( # noqa, pylint: disable=no-member
File "/Users/lucabutera/dvc/dvc/tree/webdav.py", line 234, in _upload
self._client.upload_to(
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/webdav3/client.py", line 66, in _wrapper
res = fn(self, *args, **kw)
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/webdav3/client.py", line 438, in upload_to
self.execute_request(action='upload', path=urn.quote(), data=buff)
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/webdav3/client.py", line 208, in execute_request
response = self.session.request(
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/requests/sessions.py", line 643, in send
r = adapter.send(request, **kwargs)
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/requests/adapters.py", line 469, in send
for i in request.body:
TypeError: 'CallbackIOWrapper' object is not iterable
Which I think is related to the file being empty.
I created the file with touch filename.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@LucaButera, you might be able to solve this with passing disable=True
when the size
is 0
(see the total
kwarg is getting passed the total size).
If that does not work, we can use a context manager nullcontext
in the case of 0
size, otherwise use Tqdm.wrappattr
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@skshetry
So, I tried to set disable=no_progress_bar or os.path.getsize(from_file) == 0
to no success. It yields the same error as before.
Using nullcontext
, instead, solves the issue. I implemented it in this fashion:
with open(from_file, "rb") as fd:
progress_context = nullcontext(fd) if file_size == 0 else Tqdm.wrapattr(fd, "read", ...)
with progress_context as fd_wrapped:
self._client.upload_to(buff=fd_wrapped, remote_path=to_info.path)
but then another error occurs:
Traceback (most recent call last):
File "/Users/lucabutera/dvc/dvc/cache/local.py", line 32, in wrapper
func(from_info, to_info, *args, **kwargs)
File "/Users/lucabutera/dvc/dvc/tree/base.py", line 356, in upload
self._upload( # noqa, pylint: disable=no-member
File "/Users/lucabutera/dvc/dvc/tree/webdav.py", line 236, in _upload
self._client.upload_to(
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/webdav3/client.py", line 66, in _wrapper
res = fn(self, *args, **kw)
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/webdav3/client.py", line 438, in upload_to
self.execute_request(action='upload', path=urn.quote(), data=buff)
File "/Users/lucabutera/dvc/.env/lib/python3.8/site-packages/webdav3/client.py", line 226, in execute_request
raise ResponseErrorCode(url=self.get_url(path), code=response.status_code, message=response.content)
webdav3.exceptions.ResponseErrorCode: Request to https://<user>@drive.switch.ch/remote.php/dav/files/<user>/datasets/d4/1d8cd98f00b204e9800998ecf8427e failed with code 411 and message: b'<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<html><head>\n<title>411 Length Required</title>\n</head><body>\n<h1>Length Required</h1>\n<p>A request of the requested method PUT requires a valid Content-length.<br />\n</p>\n<hr>\n<address>Apache/2.4.18 (Ubuntu) Server at a01.drive.switch.ch Port 80</address>\n</body></html>\n'
Since I don't see the Content-lenght
explicitly set anywhere I think it gets set somewhere at a lower level when the size is other than 0. Can this be the problem?
Regarding you last suggestion, to use Tqdm.wrapattr
, I am not sure what you mean. I can try that if you elaborate further.
In the end, do we wish that an empty file gets uploaded anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It'd be great, if we could support this. It seems if the file is empty, requests
sends a chunked-encoding request anyway, which is what is causing the error. Let's ignore this for now, as it only affects you.
…limit * 'master' of github.com:iterative/dvc: dag: add --outs option (iterative#4739) Add test server and tests for webdav (iterative#4827) Simpler param updates with python-benedict (iterative#4780) checkpoints: set DVC_ROOT environment variable (iterative#4877) api: add support for simple wildcards (iterative#4864) tests: mark azure test as flaky (iterative#4881) setup.py: limit responses version for moto (iterative#4879) remote: avoid chunking on webdav. Fixes iterative#4796 (iterative#4828) checkpoints: `exp run` and `exp res[ume]` refactor (iterative#4855)
❗ I have followed the Contributing to DVC checklist.
📖 If this PR requires documentation updates, I have created a separate PR (or issue, at least) in dvc.org and linked it here.
Thank you for the contribution - we'll try to review it as soon as possible. 🙏