Get rid of default Server header being non-ascii/unicode #27
Comments
@jaraco any ideas? |
After some research, it turned out that people on the Internet use the Punycode representation of IDN, and Python supports it natively. There's just one problem I ran into: the hostname can contain spaces and potentially any odd symbols, such as the many kinds of Unicode whitespace that look visually identical. I'd use a urlencode transformation to get rid of the spaces, but I'm not sure whether that's the correct way to proceed. |
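To illustrate the whitespace concern, here is a small Python 3 sketch. The sample strings are made up for illustration; the point is only that percent-encoding turns visually identical whitespace characters into distinguishable ASCII sequences.

```python
from urllib.parse import quote

# Two hypothetical names that render almost identically:
# one uses an ASCII space, the other a no-break space (U+00A0).
plain = 'my host'
lookalike = 'my\u00a0host'

# quote() percent-encodes the UTF-8 bytes of every character outside
# its safe set, so the two spaces give different, purely ASCII results.
print(quote(plain))      # my%20host
print(quote(lookalike))  # my%C2%A0host
```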
I've been playing with hostname conversion and here's what I've got so far:
Prerequisites
$ sudo hostname 'слава-україні йо!'
$ hostname
слава-україні йо!
Python 2
$ ipython
Python 2.7.10 (default, Nov 12 2015, 11:02:08)
Type "copyright", "credits" or "license" for more information.
IPython 4.1.1 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import socket
In [2]: hn = socket.gethostname()
In [3]: hn
Out[3]: '\xd1\x81\xd0\xbb\xd0\xb0\xd0\xb2\xd0\xb0-\xd1\x83\xd0\xba\xd1\x80\xd0\xb0\xd1\x97\xd0\xbd\xd1\x96 \xd0\xb9\xd0\xbe!'
In [4]: print(hn)
слава-україні йо!
In [8]: hn.decode('utf-8').encode('idna')
Out[8]: 'xn--- !-5cdabl4dgf2aovh9a7zpa'
In [9]: hn.decode('utf-8')
Out[9]: u'\u0441\u043b\u0430\u0432\u0430-\u0443\u043a\u0440\u0430\u0457\u043d\u0456 \u0439\u043e!'
In [13]: type(hn)
Out[13]: str
In [15]: print(hn.decode('utf-8'))
слава-україні йо!
In [16]: print(hn.decode('utf-8').encode('idna'))
xn--- !-5cdabl4dgf2aovh9a7zpa
In [18]: import urllib
In [24]: urllib.quote((hn.decode('utf-8').encode('idna')))
Out[24]: 'xn---%20%21-5cdabl4dgf2aovh9a7zpa'
In [25]: type(urllib.quote((hn.decode('utf-8').encode('idna'))))
Out[25]: str
Python 3
$ ipython
Python 3.6.0 (default, Jan 1 2017, 22:28:32)
Type "copyright", "credits" or "license" for more information.
IPython 5.3.0 -- An enhanced Interactive Python.
? -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help -> Python's own help system.
object? -> Details about 'object', use 'object??' for extra details.
In [1]: import socket
In [2]: hn = socket.gethostname()
In [3]: hn.encode('idna')
Out[3]: b'xn--- !-5cdabl4dgf2aovh9a7zpa'
In [4]: import urllib.parse
In [7]: urllib.parse.quote(hn)
Out[7]: '%D1%81%D0%BB%D0%B0%D0%B2%D0%B0-%D1%83%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%96%20%D0%B9%D0%BE%21'
In [8]: urllib.parse.quote_from_bytes(hn.encode('idna'))
Out[8]: 'xn---%20%21-5cdabl4dgf2aovh9a7zpa'
In [9]: type(urllib.parse.quote_from_bytes(hn.encode('idna')))
Out[9]: str
In [10]: type(urllib.parse.quote_from_bytes(hn.encode('idna')).encode('ISO-8859-1'))
Out[10]: bytes
In [12]: urllib.parse.quote(hn.encode('idna').decode()).encode('ISO-8859-1')
Out[12]: b'xn---%20%21-5cdabl4dgf2aovh9a7zpa' |
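The Python 3 steps above could be folded into one function, roughly as follows. This is only a sketch: `hostname_to_header_bytes` is a hypothetical name, not part of cheroot, and it assumes IDNA followed by percent-encoding is the desired transformation, which is exactly what is still in question here.

```python
from urllib.parse import quote_from_bytes


def hostname_to_header_bytes(hostname):
    """Return an ASCII-only bytes form of ``hostname`` (hypothetical helper)."""
    # The built-in 'idna' codec Punycode-encodes non-ASCII labels; as the
    # session above shows, spaces and '!' pass through it unchanged.
    idna_bytes = hostname.encode('idna')
    # Percent-encode whatever is still unsafe (spaces, '!', ...).
    quoted = quote_from_bytes(idna_bytes)
    # The result is pure ASCII, so encoding to ISO-8859-1 cannot fail.
    return quoted.encode('ISO-8859-1')


print(hostname_to_header_bytes('слава-україні йо!'))
```

Note that the `idna` codec itself can raise `UnicodeError` for some inputs (for example, a label longer than 63 characters), so a real implementation would probably still need a fallback.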
I've tried sending the Server header using IDNA, MIME, and percent-encoding, and here's what I see via HTTP clients (Chrome, Firefox, httpie, requests, curl):
This makes me feel confused:
/cc: @jaraco |
Problem
socket.gethostname() returns the current hostname and is used by cheroot.server.HTTPServer as a default fallback; it is then passed through .encode('ISO-8859-1') when constructing headers, which raises UnicodeEncodeError for hostnames containing characters outside ISO-8859-1.
Proposal
Links
Ref: https://tools.ietf.org/html/rfc2231#section-4
Ref: werwolfby/monitorrent#214
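
For reference, the failure described under Problem can be reproduced in isolation with a few lines of Python 3. This is a minimal sketch: a literal string stands in for the value returned by socket.gethostname(), and cheroot's actual header-building code is not reproduced here.

```python
# Sample value taken from the hostname used in the sessions above;
# it stands in for socket.gethostname().
hostname = 'слава-україні йо!'

try:
    hostname.encode('ISO-8859-1')
except UnicodeEncodeError as exc:
    # Cyrillic characters have no ISO-8859-1 representation.
    print('cannot encode:', exc.reason)
```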