Inconsistent support of IDNA hostnames in Client #1444

Closed
Martiusweb opened this issue Dec 2, 2016 · 2 comments · Fixed by #1445

Long story short

aiohttp's client handles IDNA hostnames in a way that seems inconsistent: the Host header always contains the decoded UTF-8 value, which seems problematic.

For instance:

  • session.get("http://éé.com/") makes a request with Host: éé.com
  • session.get("http://xn--9caa.com/") also makes a request with Host: éé.com.

While it's unclear to me whether a Unicode hostname should always be IDNA encoded (see below), it should at least not be decoded when the caller explicitly passed it encoded.
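For reference, the relation between the two spellings of the hostname can be checked with Python's built-in "idna" codec. This is only a minimal sketch of the encoding itself (the codec implements the older IDNA 2003 rules) and is not taken from aiohttp's code:

```python
# "\u00e9\u00e9.com" is the hostname "éé.com" written with escapes.
unicode_host = "\u00e9\u00e9.com"

# IDNA-encode the hostname into its ASCII form.
ascii_host = unicode_host.encode("idna").decode("ascii")
print(ascii_host)  # xn--9caa.com

# Decoding the ASCII form gives back the Unicode hostname.
decoded = ascii_host.encode("ascii").decode("idna")
print(decoded == unicode_host)  # True
```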

IDNA or not?

The newest HTTP/1 RFCs don't specify the encoding of header values, but they recommend treating them as US-ASCII only, for security reasons (see: https://tools.ietf.org/html/rfc7230#section-3, especially the last paragraph of 3.2.4).
Most of the resources I read from the W3C or the IETF (normative or not) say that the hostname should always be encoded. For instance, https://www.w3.org/International/articles/idn-and-iri/#resolvedomain says:

Finally the user agent sends the request for the page. Since punycode contains no characters outside those normally allowed for protocols such as HTTP, there is no issue with the transmission of the address. This should simply match against a registered domain name.

Browsers I tested (Firefox, Chromium) always encode the hostname in IDNA.

I ran some tests against a random hostname with Unicode characters served by nginx. Nginx doesn't care about the encoding and applies the virtual host rules that match the exact string, i.e. with xn--9caa.com I see the right website, while éé.com returns a 404, probably because only the IDNA-encoded version is specified in the configuration.

Expected behaviour

  • session.get("http://xn--9caa.com/") must make a request with Host: xn--9caa.com (encoded host).
  • session.get("http://éé.com/") should make a request with Host: xn--9caa.com (encoded host)
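The expected behaviour above can be sketched with the standard library only. `expected_host_header` is a hypothetical helper written for illustration, not an aiohttp function, and it is simplified: it keeps any explicit port, even a default one.

```python
from urllib.parse import urlsplit


def expected_host_header(url: str) -> str:
    """Return a Host header value with the hostname in its IDNA (ASCII) form."""
    parts = urlsplit(url)
    # Labels that are already ASCII (including "xn--" labels) pass through
    # the "idna" codec unchanged, so encoding twice is harmless.
    host = parts.hostname.encode("idna").decode("ascii")
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return host


# Both spellings of the URL give the same ASCII-only value.
print(expected_host_header("http://\u00e9\u00e9.com/"))  # xn--9caa.com
print(expected_host_header("http://xn--9caa.com/"))      # xn--9caa.com
```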

Actual behaviour

  • session.get("http://xn--9caa.com/") makes a request with a decoded host: Host: éé.com (UTF-8 encoded host).
  • session.get("http://éé.com/") makes a request with Host: éé.com too.

Suggested fix

It seems that self.url.raw_host should be used rather than self.url.host in ClientRequest:
https://github.com/KeepSafe/aiohttp/blob/master/aiohttp/client_reqrep.py#L168
(according to my quick test, yarl.URL.raw_host always returns the IDNA-encoded version, regardless of the encoding of the input URL).
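A quick way to compare the two properties, assuming yarl is installed. `host_forms` is a hypothetical helper used only for this check:

```python
from yarl import URL


def host_forms(raw_url):
    """Return (URL.host, URL.raw_host) for the given URL string."""
    url = URL(raw_url)
    return url.host, url.raw_host


for raw_url in ("http://\u00e9\u00e9.com/", "http://xn--9caa.com/"):
    host, raw_host = host_forms(raw_url)
    # raw_host is the ASCII-only form that is safe to send in the Host header.
    print(raw_host, raw_host.isascii())
```

In my quick test both URLs give raw_host == "xn--9caa.com", while host is the decoded form.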

asvetlov commented Dec 2, 2016

Nice catch!
I agree with self.url.raw_host usage.
@Martiusweb would you make a Pull Request?

Martiusweb added a commit to Martiusweb/aiohttp that referenced this issue Dec 2, 2016

lock bot commented Oct 29, 2019

This thread has been automatically locked since there has not been
any recent activity after it was closed. Please open a new issue for
related bugs.

If you feel like there are important points made in this discussion,
please include those excerpts in that new issue.

@lock lock bot added the outdated label Oct 29, 2019
@lock lock bot locked as resolved and limited conversation to collaborators Oct 29, 2019