Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remote: add support for WebDAV #3647

Closed
wants to merge 16 commits into from
Closed

Conversation

shizacat
Copy link

Fixes #1153

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, @shizacat. Thanks for the PR. Took a quick look and looks good in general, though I have question.

dvc/remote/webdav.py Show resolved Hide resolved
dvc/remote/base.py Outdated Show resolved Hide resolved
dvc/remote/webdav.py Outdated Show resolved Hide resolved
@shizacat
Copy link
Author

shizacat commented Apr 21, 2020

@efiop Added tests

tests/unit/test_path_info.py Outdated Show resolved Hide resolved
@efiop
Copy link
Contributor

efiop commented Apr 22, 2020

@shizacat Btw, looks like your git email is not added to your github email list. Please go to https://github.com/settings/emails and make sure that your git email (you can see it in git log, not posting it myself so it is not picked up by bots) is added there. That way all of your contributions will be taken into account by github slightly_smiling_face

dvc/path_info.py Outdated
("#" + self.fragment) if self.fragment else "",
)

def get_collections(self) -> list:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I wonder how is this different from parents?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shizacat ^ ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This me need only one thing - replace scheme

@efiop
Copy link
Contributor

efiop commented Apr 22, 2020

@shizacat Could you also submit docs for this change, please? 🙏 (see https://dvc.org/doc/command-reference/remote for remote add/modify). We are not very familiar with WebDav ourselves, so it would be rather hard for us to come up with the docs without your help. Surely we will help out with it too. I guess one of the interesting questions is does this work with box.com and maybe some other popular services that support Webdav? If so, which ones?

Copy link
Member

@skshetry skshetry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work @shizacat. Works great, but, how are you testing this locally because webdav:// urls are not allowed on configs.

You need to add something similar to following in this place:

"webdav": {**HTTP_COMMON, **REMOTE_COMMON}

P.S. Also for webdavs. :)

@shizacat
Copy link
Author

@skshetry You are right. I forgot to add in commit.

@shizacat
Copy link
Author

shizacat commented Apr 23, 2020

@efiop Checked with yandex.ru (webdav), work, but due to threads, paths can be created in parallel. Added fix for this. About docs, I will look

@cached_property
def url(self):
return "{}://{}{}{}{}{}".format(
self.scheme.replace("webdav", "http"),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about https? webdavs?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If replace in "webdavS" webdav on http will be "httpS"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, 🤦 sorry :)

Ok, a few concerns still:

  1. Do we need to adjust DEFAULT_PORTS in the URLInfo base class?
  2. Redefining url like this means loosing information. E.g. when we write in logs we will be getting http instead. In general feels like not a good idea creating a discrepancy between what we have inside class and what we return here.

Feels like we should be doing it on the Remote class level, or there should be some separate method - access url or something?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Fixed

It seems to me to do a separate method is over head. It will be necessary to rewrite the entire code from HTTP with one fix, and at the same time, there will still be the possibility use in logs the wrong URL.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it was merely a random suggestion. The whole approach with overriding url feels like a hack and can hit us somewhere down the road. There should be a better way of doing this - e.g. RemoteWebDav class should understand how to deal with those URLs (translate them if needed and pass to HTTPRemote), etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, makes me think if we should tackle it the other way around by simply adding some flag that will tell HTTP(s) remote to use webdav to access this remote that would otherwise have a normal HTTP(s) URL. Something like

dvc remote add myremote https://example.com/path

and then

dvc remote modify myremote webdav true

or

dvc remote modify myremote type webdav

or something else. Not sure if this makes sense. It kinda goes against current conventions, but also I'm not sure if webdav is unique enough for a separate remote, maybe it is better to handle it as HTTP with some special tweaks. 🤔

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hack - not sure. The same DEFAULT_PORTS also change the state of the object without notice. The same S3 is HTTP.
Leaving url unchanged - then the purpose of the _BasePath class is not very clear. It seemed to me that he should return the ready-made address format for working with him.

As for - as an extension of http. The more I work with him, the more he is everyday. The same gc. I don’t know how it works, but if only the remote method is needed for it, it’s easy to add and now a big difference from HTTP is already obtained.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same DEFAULT_PORTS also change the state of the object without notice

Default ports logic is part of the protocols ... e.g. we want to detect that example.com:80 is the same as example.com. In your case you create an object that provides an inconsistent interface - some methods (and some logic internally) might still rely on webdav, while url has http. It can be very confusing - you ask and object to print its name - it prints Duck but then behave like Dog.

The same S3 is HTTP

not sure I got this point, could you clarify, please?

As for - as an extension of http. The more I work with him, the more he is everyday. The same gc. I don’t know how it works, but if only the remote method is needed for it, it’s easy to add and now a big difference from HTTP is already obtained.

Again, not sure I understood this. Btw, we can jump on Discord - dvc.org/chat and discuss this with me and Ruslan (ivan and ruslan there). Please, feel free to ping me/us.

@shizacat
Copy link
Author

Docs - iterative/dvc.org#1187

@efiop
Copy link
Contributor

efiop commented Apr 30, 2020

For the record: discussed privately that we should look around to see if there are any good libraries to handle webdav for us.

@efiop
Copy link
Contributor

efiop commented May 11, 2020

Hi @shizacat ! Need any help? 🙂

@shizacat
Copy link
Author

@efiop Sorry, May holidays ...

@shizacat
Copy link
Author

It need implemented through a separate library (for work with webdav). I will try this to do in the new PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WebDav: new remote type
5 participants