httputil: Header normalization behaves oddly for "ß" #2043

Closed
kfrendrich opened this issue May 17, 2017 · 4 comments

@kfrendrich

Hi.

We fuzz-tested our application and discovered a problem with the normalization of header keys.
If a header key contains special characters, re-normalizing the key causes problems in certain cases.

I am attaching the code that helps you to reproduce the problems.
bug.py.txt

@bdarnell
Member

Ah, interesting. The special character in question here is the German letter ß, whose capital form is the two-letter sequence "SS". This makes Python 3's capitalize() method (which we use for header normalization) non-idempotent (in Python 2 there's no problem because the capitalize method is ASCII-only):

>>> 'ß'.capitalize()
'SS'
>>> 'ß'.capitalize().capitalize()
'Ss'

HTTP header names are (practically speaking) limited to ASCII, so one option is to raise an error if we are given a non-ASCII header name. Or we could normalize the headers to lowercase instead of title case (the use of title case is mainly an aesthetic preference, although I think there are some non-compliant clients in the wild that expect title case in HTTP/1; HTTP/2 mandates lowercase for all headers). Or we could use an ASCII-only capitalization filter (although unless we write it in C we may not be able to match the speed of the built-in capitalize(), and we wouldn't want to slow things down just to improve the handling of a character that isn't even supposed to be there).
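The first option above (rejecting non-ASCII names) could look roughly like the sketch below. The function names `capitalize_words`, `normalize_ascii_only`, and `is_idempotent` are made up for illustration and are not Tornado's API; the per-word `capitalize()` is only an approximation of the normalization described in this thread.

```python
def capitalize_words(name: str) -> str:
    """Title-case each dash-separated word with str.capitalize().

    Hypothetical stand-in for the capitalize()-based normalization
    discussed above; not copied from Tornado.
    """
    return "-".join(word.capitalize() for word in name.split("-"))


def normalize_ascii_only(name: str) -> str:
    """Reject non-ASCII header names, then title-case what remains."""
    if not name.isascii():  # str.isascii() requires Python 3.7+
        raise ValueError(f"non-ASCII header name: {name!r}")
    return capitalize_words(name)


def is_idempotent(normalize, name: str) -> bool:
    """Check that normalizing twice gives the same result as normalizing once."""
    once = normalize(name)
    return normalize(once) == once
```

With the ASCII check in place, a name such as "x-ß" never reaches `capitalize()`, so the non-idempotent case cannot arise. Note that the exact result of `'ß'.capitalize()` depends on the Python version, which is one more reason not to rely on it.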

bdarnell changed the title from "header key normalization problem during fuzzy testing" to "httputil: Header normalization behaves oddly for "ß"" on May 21, 2017

@spaceone
Contributor

An ASCII-only capitalize is as simple as:
'A-ß-C'.encode('ISO8859-1').title().decode('ISO8859-1')
So this should be done at least.

If it is not already the case, Tornado should respond with 400 Bad Request if any header name contains non-ISO8859-1 data, but note that ß can also be sent as ISO8859-1.
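A minimal sketch of the bytes-based approach suggested above, wrapped in a function. The name `ascii_title` is hypothetical. It relies on `bytes.title()` only changing the case of ASCII letters, so the Latin-1 byte for ß passes through untouched.

```python
def ascii_title(name: str) -> str:
    """Title-case a header name, changing only ASCII letters.

    Encoding to ISO-8859-1 raises UnicodeEncodeError (a ValueError
    subclass) for characters outside that charset; a server could
    translate that into a 400 response.
    """
    raw = name.encode("iso-8859-1")
    return raw.title().decode("iso-8859-1")
```

Unlike splitting on "-" and capitalizing each part, `bytes.title()` treats every non-letter byte as a word boundary, so results may differ for names that contain digits or other punctuation.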

@bdarnell
Member

bdarnell commented Oct 1, 2021

Using bytes.title() is a clever way to do the ASCII-only capitalization at C speed, although processing the string three times is unfortunate. (I thought there was no such method when I was doing the original conversion to Python 3, but it turns out it was just undocumented.)

Anyway, as I said before, I think it's better to just raise an error for anything non-ASCII than to try to be smarter about capitalization for certain characters in ISO-8859-1. However, there's another option that may be even better: just lowercase everything and don't try to capitalize the first letters of words at all. This is what HTTP/2 does, and I think it makes more sense to evolve in that direction instead of maintaining capitalization patterns that were traditional with HTTP/1. The only problem is that it is a visible change that may have unforeseen consequences.
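The lowercase-everything option could be sketched as follows. `normalize_lower` and the plain dict used as a header store are illustrative assumptions, not how Tornado stores headers.

```python
def normalize_lower(name: str) -> str:
    """Normalize a header name by lowercasing it, as HTTP/2 requires."""
    return name.lower()


# Differently-cased spellings collapse onto a single key.
headers = {}
for raw_name, value in [("Content-Type", "text/html"),
                        ("CONTENT-TYPE", "text/plain")]:
    headers[normalize_lower(raw_name)] = value
```

Here the second assignment overwrites the first because both names normalize to "content-type". Lowercasing "X-ß" a second time leaves it unchanged, so the problem from the original report does not occur for that input.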

@spaceone
Contributor

spaceone commented Oct 1, 2021

I still like the capitalization. httoop and other HTTP libraries also raise errors on non-ASCII header names.
