Get rid of default Server header being non-ascii/unicode #27

Closed
webknjaz opened this issue May 5, 2017 · 4 comments
Labels
bug Something is broken

Comments

@webknjaz
Member

webknjaz commented May 5, 2017

Problem

  • socket.gethostname() returns the current hostname, and cheroot.server.HTTPServer uses it as the default fallback value for the Server header;
  • it is user-defined input (one may set it to any value);
  • it may contain non-ASCII (Unicode) characters (confirmed on Linux and Windows);
  • we unconditionally call .encode('ISO-8859-1') on it when constructing headers, so any hostname containing characters outside Latin-1 makes this raise UnicodeEncodeError.
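The failure can be reproduced without changing the machine's real hostname. This is a minimal sketch: build_server_header is a hypothetical stand-in for cheroot's header construction (not its actual code), and the hostname string is the example value used later in this thread.

```python
def build_server_header(server_name: str) -> bytes:
    # Hypothetical stand-in for the header construction described above:
    # the value is always encoded with ISO-8859-1 (Latin-1).
    return b'Server: ' + server_name.encode('ISO-8859-1')


# Example hostname; in cheroot the value would come from socket.gethostname().
hostname = 'слава-україні йо!'

try:
    build_server_header(hostname)
except UnicodeEncodeError as exc:
    # Cyrillic code points are above U+00FF, so Latin-1 cannot represent them.
    print('UnicodeEncodeError:', exc.reason)
```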

Proposal

  • percent-encode (urlencode) the header contents;
  • this looks allowed (and even recommended by RFC 2231);
  • consider whether other headers need the same treatment.
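A minimal sketch of the proposal, assuming plain percent-encoding of the whole value with urllib.parse.quote (percent_encode_header_value is a hypothetical name). Note this is simpler than the full RFC 2231 extended syntax, which also prefixes the value with a charset and language (charset'language'value).

```python
from urllib.parse import quote


def percent_encode_header_value(value: str) -> bytes:
    # quote() encodes the text as UTF-8 and percent-escapes every byte
    # outside the unreserved set, so the result is pure ASCII.
    escaped = quote(value, safe='')
    # ASCII is a subset of ISO-8859-1, so this encode cannot fail.
    return escaped.encode('ISO-8859-1')


print(percent_encode_header_value('слава-україні йо!'))
print(percent_encode_header_value('example-host'))  # ASCII names are unchanged
```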

Links

Ref: https://tools.ietf.org/html/rfc2231#section-4
Ref: werwolfby/monitorrent#214

webknjaz added the bug (Something is broken) label May 5, 2017
@webknjaz
Member Author

webknjaz commented May 5, 2017

@jaraco any ideas?

@webknjaz
Member Author

So after some research it turned out that people on the Internet use the Punycode representation of IDNs (internationalized domain names). Python also supports this natively via its 'idna' codec.
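A short sketch of the built-in 'idna' codec (hostname_to_ascii is a hypothetical helper; 'приклад.укр' is just an example name). The stdlib codec implements IDNA 2003: each dot-separated label containing non-ASCII characters is normalized and converted to Punycode with an 'xn--' prefix, while all-ASCII names are returned as-is.

```python
def hostname_to_ascii(hostname: str) -> str:
    # Python's built-in 'idna' codec (IDNA 2003): non-ASCII labels are
    # nameprep'ed and Punycode-encoded with an 'xn--' prefix.
    return hostname.encode('idna').decode('ascii')


print(hostname_to_ascii('приклад.укр'))   # non-ASCII labels become 'xn--' labels
print(hostname_to_ascii('example.com'))   # ASCII names pass through unchanged
```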

There's just one problem I ran into: a hostname can effectively contain spaces and potentially any weird symbols, such as the many different kinds of Unicode whitespace that look visually identical.
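To illustrate the concern, here is a small sketch (describe_chars is a hypothetical helper) listing three distinct code points that all render as a blank gap, and showing that the 'idna' codec passes an all-ASCII name through unchanged, space included, so IDNA alone does not remove spaces.

```python
import unicodedata


def describe_chars(text: str):
    # Return (code point, Unicode name, isspace) for each character.
    return [(f'U+{ord(c):04X}', unicodedata.name(c, '<unnamed>'), c.isspace())
            for c in text]


# Three different characters that all render as a blank gap.
for row in describe_chars('\u0020\u00a0\u2002'):
    print(row)

# An all-ASCII name is returned by the 'idna' codec as-is, space included.
print('my host'.encode('idna'))
```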

I'd apply a urlencode (percent-encoding) transformation to get rid of the spaces, but I'm not sure whether it's the correct way to proceed.

@webknjaz
Member Author

I've been playing with hostname conversion and here's what I've got so far:

Prerequisites

$ sudo hostname 'слава-україні йо!'
$ hostname
слава-україні йо!

Python 2

$ ipython
Python 2.7.10 (default, Nov 12 2015, 11:02:08) 
Type "copyright", "credits" or "license" for more information.

IPython 4.1.1 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import socket

In [2]: hn = socket.gethostname()

In [3]: hn
Out[3]: '\xd1\x81\xd0\xbb\xd0\xb0\xd0\xb2\xd0\xb0-\xd1\x83\xd0\xba\xd1\x80\xd0\xb0\xd1\x97\xd0\xbd\xd1\x96 \xd0\xb9\xd0\xbe!'

In [4]: print(hn)
слава-україні йо!

In [8]: hn.decode('utf-8').encode('idna')
Out[8]: 'xn--- !-5cdabl4dgf2aovh9a7zpa'

In [9]: hn.decode('utf-8')
Out[9]: u'\u0441\u043b\u0430\u0432\u0430-\u0443\u043a\u0440\u0430\u0457\u043d\u0456 \u0439\u043e!'

In [13]: type(hn)
Out[13]: str

In [15]: print(hn.decode('utf-8'))
слава-україні йо!

In [16]: print(hn.decode('utf-8').encode('idna'))
xn--- !-5cdabl4dgf2aovh9a7zpa

In [18]: import urllib

In [24]: urllib.quote((hn.decode('utf-8').encode('idna')))
Out[24]: 'xn---%20%21-5cdabl4dgf2aovh9a7zpa'

In [25]: type(urllib.quote((hn.decode('utf-8').encode('idna'))))
Out[25]: str

Python 3

$ ipython
Python 3.6.0 (default, Jan  1 2017, 22:28:32) 
Type "copyright", "credits" or "license" for more information.

IPython 5.3.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import socket

In [2]: hn = socket.gethostname()

In [3]: hn.encode('idna')
Out[3]: b'xn--- !-5cdabl4dgf2aovh9a7zpa'

In [4]: import urllib.parse

In [7]: urllib.parse.quote(hn)
Out[7]: '%D1%81%D0%BB%D0%B0%D0%B2%D0%B0-%D1%83%D0%BA%D1%80%D0%B0%D1%97%D0%BD%D1%96%20%D0%B9%D0%BE%21'

In [8]: urllib.parse.quote_from_bytes(hn.encode('idna'))
Out[8]: 'xn---%20%21-5cdabl4dgf2aovh9a7zpa'

In [9]: type(urllib.parse.quote_from_bytes(hn.encode('idna')))
Out[9]: str

In [10]: type(urllib.parse.quote_from_bytes(hn.encode('idna')).encode('ISO-8859-1'))
Out[10]: bytes

In [12]: urllib.parse.quote(hn.encode('idna').decode()).encode('ISO-8859-1')
Out[12]: b'xn---%20%21-5cdabl4dgf2aovh9a7zpa'
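The Python 3 steps above can be wrapped into one helper. This is a sketch, not cheroot code (hostname_header_value is a hypothetical name): IDNA first, then percent-encoding of whatever ASCII spaces or symbols remain. For the hostname used in this session it should reproduce Out[12].

```python
from urllib.parse import quote_from_bytes


def hostname_header_value(hostname: str) -> bytes:
    # 1. IDNA (Punycode) turns non-ASCII labels into 'xn--' ASCII labels,
    #    but leaves ASCII spaces and punctuation in place.
    idna_bytes = hostname.encode('idna')
    # 2. Percent-encoding then escapes the remaining spaces and symbols.
    escaped = quote_from_bytes(idna_bytes)
    # 3. The result is pure ASCII, so the Latin-1 encode cannot fail.
    return escaped.encode('ISO-8859-1')


print(hostname_header_value('слава-україні йо!'))
```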

@webknjaz
Member Author

I've tried sending the Server header using IDNA, MIME (RFC 2047 encoded-word) and percent-encoding, and here's what I see in HTTP clients (Chrome, Firefox, httpie, requests, curl):

Server:xn----7sbabh4ccwzd1a6ula
Server:=?UTF-8?Q?=E2=9C=B0?=
Server:%D0%9F%D1%80%D0%B8%D0%B2%D1%96%D1%82
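For reference, a sketch of how the three encodings can be produced with the standard library (idna_form, mime_q_form and percent_form are hypothetical helper names). With these helpers, 'Привіт' percent-encodes to the third value above and '✰' Q-encodes to the second one (the charset and encoding letters are case-insensitive in RFC 2047). The input behind the first IDNA value isn't given, so it isn't reproduced here.

```python
from email.charset import QP, Charset
from email.header import decode_header
from urllib.parse import quote


def idna_form(text: str) -> str:
    # Punycode via Python's built-in IDNA 2003 codec
    return text.encode('idna').decode('ascii')


def mime_q_form(text: str) -> str:
    # RFC 2047 encoded-word using the "Q" encoding
    charset = Charset('utf-8')
    charset.header_encoding = QP
    return charset.header_encode(text)


def percent_form(text: str) -> str:
    # UTF-8 bytes, percent-escaped
    return quote(text, safe='')


for fn in (idna_form, mime_q_form, percent_form):
    print(fn.__name__, fn('Привіт'))

# A MIME-aware decoder can recover the original text.
[(raw, cs)] = decode_header(mime_q_form('✰'))
print(raw.decode(cs))
```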

This leaves me confused:

  1. None of the clients attempt to decode the Server value. Should we care about that?
  2. Why exactly do we use the current hostname as a fallback? Can't we replace it with just Cheroot/v5.5.2?
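A minimal sketch of what question 2 suggests, assuming a fixed product token replaces the hostname fallback (DEFAULT_SERVER_NAME and server_header_value are hypothetical names, not cheroot's API). Because the default no longer calls socket.gethostname(), it can never trigger the UnicodeEncodeError; an explicitly configured name is still encoded as given.

```python
from typing import Optional

# Hypothetical static default, spelled as proposed in question 2.
DEFAULT_SERVER_NAME = 'Cheroot/v5.5.2'


def server_header_value(server_name: Optional[str] = None) -> bytes:
    # No socket.gethostname() call: the fallback is a fixed ASCII token.
    value = server_name or DEFAULT_SERVER_NAME
    # An explicitly configured name is still the caller's responsibility.
    return value.encode('ISO-8859-1')


print(server_header_value())
print(server_header_value('my-server'))
```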

/cc: @jaraco

webknjaz self-assigned this Jun 19, 2017
webknjaz changed the title from "[TODO] Support non-ascii server header" to "Get rid of default Server header being non-ascii/unicode" Jun 19, 2017