Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Alternative URL scheme: http://sock.local #34

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

msabramo
Copy link
Owner

@msabramo msabramo commented Feb 22, 2017

Add a much friendlier URL scheme:

e.g.: http://sock.local/var/run/docker.sock/version

while still supporting the old scheme too (http+unix://%2Fvar%2Frun%2Fdocker.sock/info)

The new scheme is nice, because it:

Example:

In [1]: import requests_unixsocket

In [2]: session = requests_unixsocket.Session()

In [3]: res = session.get('http://sock.local/var/run/docker.sock/version')

In [4]: res.json()
Out[4]:
{'ApiVersion': '1.24',
 'Arch': 'amd64',
 'BuildTime': 'Mon, 19 Dec 2016 09:20:48 +1300',
 'GitCommit': '6b644ec',
 'GoVersion': 'go1.6.2',
 'KernelVersion': '4.4.0-63-generic',
 'Os': 'linux',
 'Version': '1.12.3'}

Abstract namespace sockets (Linux only) are also supported:

In [12]: res = session.get('http://sock.local/%00test_socket/get')

In [13]: res.json()
Out[13]:
{'args': {},
 'headers': {'Accept': '*/*',
  'Accept-Encoding': 'gzip, deflate',
  'Host': 'localhost',
  'User-Agent': 'python-requests/2.13.0'},
 'origin': '73.93.155.107',
 'url': 'http://localhost/get'}

Cc: @jkbrzt, @jessfraz, @RantyDave, @Kobla, @habnabit, @graingert, @3XE, @esben, @avian2, @wrouesnel, @tjj5036, @rockstar

e.g.: http+unix://unix.socket/var/run/docker.sock/version
e.g.: http+unix://unix.socket/var/run/docker.sock/version
so we can override scheme from `http+unix` to something else

e.g.:

```
$ REQUESTS_UNIXSOCKET_URL_SCHEME=http://sock.local examples/simple-http.py http://sock.local/var/run/docker.sock/version | jq .
{
  "Version": "1.12.3",
  "ApiVersion": "1.24",
  "GitCommit": "6b644ec",
  "GoVersion": "go1.6.2",
  "Os": "linux",
  "Arch": "amd64",
  "KernelVersion": "4.4.0-63-generic",
  "BuildTime": "Mon, 19 Dec 2016 09:20:48 +1300"
}
```
@msabramo msabramo changed the title Alternative URL scheme Alternative URL scheme: http://sock.local Feb 22, 2017
@avian2
Copy link
Contributor

avian2 commented Feb 22, 2017

The problem I see with the new URL scheme is that it's ambiguous. Any other system would interpret http://sock.local/... as a normal HTTP resource on the sock mDNS host (https://tools.ietf.org/html/rfc6762) as would probably also any developer just looking at the URL. It's no longer a unique resource identifier.

To avoid problems with having a path in the hostname part, how about something like http+unix://localhost/var/run/docker.sock/version? It's longer, but avoids the confusion.

@RantyDave
Copy link
Contributor

RantyDave commented Feb 22, 2017

Agreed with avian. ".local" is the domain used to advertise with multicast DNS. Why not use the 'triple slash' method as used with file url's (i.e. file:///var/www/something.html) ... so http+unix:///var/run/docker.sock/version ?

Edit: Actually, I can't see anything wrong with http+file:///var/run/docker.sock either.

@graingert
Copy link

http+x is deprecated in requests and will be dropped in the future. You need to register your own domain and use that:

https://github.com/msabramo/requests-unixsocket/pull/25/files

@avian2
Copy link
Contributor

avian2 commented Feb 22, 2017

For reference, I'm guessing @graingert is talking about these problems with http+x schemes in Requests. The issue seems to be still under discussion.

https://github.com/kennethreitz/requests/issues/3734
https://github.com/kennethreitz/requests/issues/3735

@jkbrzt
Copy link

jkbrzt commented Feb 22, 2017

I vote for http+unix:///var/run/docker.sock/version

@msabramo
Copy link
Owner Author

msabramo commented Feb 22, 2017

Thanks folks, I didn't realize that .local was specifically for mDNS. I wonder if we could use .localhost? E.g.: http://sock.localhost/var/run/docker.sock/version ?

The name localhost is reserved by the Internet Engineering Task Force (IETF) in RFC 2606 (June 1999) as a domain name label that may not be installed as a top-level domain...This allows the use of these names for either documentation purposes or in local testing scenarios.

@graingert
Copy link

graingert commented Feb 22, 2017 via email

@msabramo
Copy link
Owner Author

msabramo commented Feb 22, 2017

http+unix:///var/run/docker.sock/version seems to be gathering some momentum. I had been excited about getting rid of the http+unix custom protocol, because it's funky and I thought some software might refuse to recognize these as URLs, but maybe it's not so bad. The main user of this is httpie-unixsocket and that will recognize this; in fact I might not even have to change it, which is kind of nice.

Is anyone using requests-unixsocket for things besides httpie-unixsocket? And if so, will the http+unix protocol impact it in any way? I am guessing not, but thought I'd make sure.

@graingert
Copy link

graingert commented Feb 22, 2017 via email

@msabramo
Copy link
Owner Author

msabramo commented Feb 22, 2017

I wonder if we should run this by an HTTP and requests expert, by which I mean @Lukasa...

It would be nice to pick something that requests is likely to support in the future, and that won't cause them any trouble.

@msabramo
Copy link
Owner Author

msabramo commented Feb 22, 2017

@graingert: This new approach eliminates the need for URL encoding, which was the ugliest and most error-prone part of the old format. Note that URL starts with http+unix:/// (three slashes). This means that the hostname is blank and the path to the socket is in the path part of the URL, so slashes should be fine. See https://github.com/msabramo/requests-unixsocket/pull/34/files#diff-b425cc2e20f49bc2f6ea4538b259ea20R69 for how it works in the code.

Did I miss something?

@graingert
Copy link

@msabramo the three slashes is a bug in how urlparse parsed URLs: https://////google.com is still valid

@Lukasa
Copy link

Lukasa commented Feb 22, 2017

@msabramo Hey, how's it going?

So, Requests has been dealing with this a bit recently, as @graingert noted. The relevant requests issues are: kennethreitz/requests#3734, kennethreitz/requests#3735, kennethreitz/requests#3713, urllib3/urllib3#1080, and urllib3/urllib3#1089. As you can see by the length of that list, which is not even entirely exhaustive, we've spent a lot of time trying to thread the needle of the kind of scheme overloading that is happening here.

The specific version of the problem, at least as Requests sees it, is this: Requests knows how the HTTP/HTTPS URL schemes work, and assumes it does not understand how anything else works. Requests has a very strong view of this, meaning that for a scheme it does not understand it won't do anything to the URL, including add the params parameters to it. This is intended to allow extremely different URLs such as file:// to get through unmolested.

However, there have been a few cases where users have wanted requests to treat the URL like it's HTTP, even though it isn't. In this case, due to a lazily-coded check in Requests, it has been possible to get this behaviour for schemes that begin with http and then go on to do something wacky. This causes us an enormous number of problems because users generally only want us to do a subset of the things we know how to do to URLs: the most recent examples are all related to unix sockets and indicate that those users want us to do query parameters, but not IDNA. Because we broke a bunch of people with our IDNA change, we reverted it in the 2.12 series.

We've committed to continue to support this behaviour in the 2.x release series. However, it has caused us an enormous amount of problems, because there is no consistency over what exactly users want. The only situation where the Requests team is safe to keep evolving our URL handling if is we touch only HTTP schemed URLs, so the only thing we're committing to for the future 3.0 release is to work with exactly http and https schemes only. This means that the 3.0 release of Requests may not work as you expect with a http+unix or similar scheme URL.

However, if we do that we'll be adding the ability for those that provide transport adapters to also provide hooks for URL handling, such that the logic currently done in prepare_url can be overridden in a scheme-appropriate way. This would allow requests-unixsocket to tell requests to treat a scheme in a specific way (including adding the params to it appropriately).

The TL;DR of all this is that any of the approaches here will work well. The approach of using http:// plus a dummy domain will work well in 3.0 as well, but http+whatever will potentially require more work for the 3.0 series.

Does that help answer your question @msabramo?

@RantyDave
Copy link
Contributor

@graingert The three slashes are very much on purpose. See https://www.ietf.org/rfc/rfc1738.txt section 3.10

One potential (but IMHO small) problem is that the 'triple slash' method doesn't delineate between the path to the socket and the path within the served namespace. I.e. is http:///var/socket/thing/a request to /var/socket for thing/a or is it a request to /var/socket/thing for /a? I assume that walking the tree stat'ing the nodes will make it clear which is the socket, however...

@graingert
Copy link

https://tools.ietf.org/html/rfc1738#section-3.1 is less important than how it's implemented in browsers:

The WHATWG spec says it has to be one slash and that a parser must accept an indefinite amount of slashes. “http:/example.com” and “http:////////////////////////////////////example.com” are both equally fine.

https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/

@Lukasa
Copy link

Lukasa commented Feb 23, 2017

@RantyDave @graingert What matters more is absolutely a matter of opinion. However, it is unquestionable that the WHATWG considers two or more slashes after the scheme equivalent to just two, and so if any Python project changes to using the WHATWG view of a URL then the "triple-slash" scheme will fall over in a heap.

Note also that this happens with Requests:

>>> r = requests.get('http+unix:///var/run/docker.sock/version')
requests.exceptions.InvalidURL: Invalid URL u'http+unix:///var/run/docker.sock/version': No host supplied

Specifically, if Requests thinks your scheme is a HTTP scheme then it wants a conforming URL, and there is no conforming URL that a user can provide that does not have a host part. So that's not going to fly.

@RantyDave
Copy link
Contributor

Oh, I wasn't looking to 'start something'. Whatever works the best and/or causes the least pain is fine by me :) It seems that in practice the http://something.mostly.valid/whatever scheme is pretty much a constraint leaving the choice of what 'something.mostly.valid' is. Perhaps the core of the issue is to identify a tld that will "never" used and run with that?

Disappointingly the "java" reverse notation can't be used because of . in filenames. I.e. http://docker.sock.run.var.unix-socket/version isn't a goer, but http://'docker.sock'.run.var.unix-socket/version might be. I have now run out of knowledge...

@graingert
Copy link

graingert commented Feb 23, 2017 via email

@PeteDevoy
Copy link

PeteDevoy commented Feb 23, 2017

I am glad to see this is being thought about as obviously the whole 2F% thing is not ideal for readability or logging. Also, personally, it feels a bit weird mixing application layer and transport layer in the scheme (http+unix).

RFC2606 defines .invalid as a reserved TLD which is intended to never function:

  ".invalid" is intended for use in online construction of domain
  names that are sure to be invalid and which it is obvious at a
  glance are invalid.

Yes, the spec says "intended for use online" but at least we would know the domain would never be valid. So could do something like http://tld.invalid/var/run/docker.sock/version?

@graingert
Copy link

graingert commented Feb 23, 2017 via email

@PeteDevoy
Copy link

PeteDevoy commented Feb 23, 2017

@graingert Could that happen with the adapter/monkey patch in place?

@graingert
Copy link

graingert commented Feb 23, 2017

I mean you could monkey patch requests to accept it regardless. But the one sure fire way to get this to work, is use a domain name that you actually control.

@PeteDevoy
Copy link

PeteDevoy commented Feb 23, 2017

I was referring to the existing and proposed code -- I thought it may clean out the domain before requests sees it but, taking a closer look, I see it is not the case.

Sure, developers would be welcome to register their own domain names if they feel it is appropriate for their project but it seems reasonable to expect that the library would ship with a domain which is not valid, conflicting or confusing.

Requests would be within its rights to reject tld.invalid

On this point, I struggle to imagine that the requests library would start concerning itself with domain validation, maybe @Lukasa could comment?

@Lukasa
Copy link

Lukasa commented Feb 23, 2017

It's definitely extremely unlikely that we'd do it, but that isn't the same as saying it's not something we'd ever consider doing.

@graingert
Copy link

graingert commented Feb 23, 2017 via email

msabramo added a commit that referenced this pull request Aug 15, 2019
This way people can customize it to their liking, as there a lot of
opinions about this, as evidenced by the comments on GH-34.
msabramo added a commit that referenced this pull request Aug 15, 2019
This way people can customize it to their liking, as there a lot of
opinions about this, as evidenced by the comments on GH-34.

The default parsing is still the same as before, so new version don't
break existing code. But the user has the option of passing in a
settings object, which has a `urlparse` attribute that can be set to a
custom function that processes the URL and splits into a `sockpath` and
a `reqpath`.
@msabramo
Copy link
Owner Author

There are differing opinions about how to structure the URLs. So I had the idea of making the URL parsing pluggable so that people can make it work however they see fit. The default is still the old, ugly format from before, so that I don't break code that's been written to use that. But now you can pass a settings object in and that can have a urlparse attribute that can be set to be a function that parses the URL and splits it into a sockpath and a reqpath.

See #41.

What do people think of that?

msabramo added a commit that referenced this pull request Aug 15, 2019
This way people can customize it to their liking, as there a lot of
opinions about this, as evidenced by the comments on GH-34.

The default parsing is still the same as before, so new version don't
break existing code. But the user has the option of passing in a
settings object, which has a `urlparse` attribute that can be set to a
custom function that processes the URL and splits into a `sockpath` and
a `reqpath`.
msabramo added a commit that referenced this pull request Dec 27, 2021
This way people can customize it to their liking, as there a lot of
opinions about this, as evidenced by the comments on GH-34.

The default parsing is still the same as before, so new version don't
break existing code. But the user has the option of passing in a
settings object, which has a `urlparse` attribute that can be set to a
custom function that processes the URL and splits into a `sockpath` and
a `reqpath`.
msabramo added a commit that referenced this pull request Dec 28, 2021
This way people can customize it to their liking, as there a lot of
opinions about this, as evidenced by the comments on GH-34.

The default parsing is still the same as before, so new versions don't
break existing code. But the user has the option of passing in a
settings object, which has a `urlparse` attribute that can be set to a
custom function that processes the URL and splits it into a `sockpath`
and a `reqpath`.

Sem-Ver: feature
msabramo added a commit that referenced this pull request Jan 3, 2022
This way people can customize it to their liking, as there a lot of
opinions about this, as evidenced by the comments on GH-34.

The default parsing is still the same as before, so new versions don't
break existing code. But the user has the option of passing in a
settings object, which has a `urlparse` attribute that can be set to a
custom function that processes the URL and splits it into a `sockpath`
and a `reqpath`.

Sem-Ver: feature
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants