-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Encoding headers with ASCII breaks applications expecting them to be encoded in latin-1 #1778
Comments
gunicorn implements the last specification (and it used to work with previous versions of flask). If we want to relax it for old applications maybe we may want yo try to encode to latin1 if encoding to ascii didn't work. Thoughts ? cc @tilgovi |
to be more complete, gunicorn use the updated HTTP 1.1 spec (7230 and related). That could be indeed documented. |
From my position, the spec says "SHOULD" and PEP 333 mentions latin-1 so I think Gunicorn could support latin-1 even if applications are advised not to use non-ASCII characters. That's my initial reading, but I'll wait for others to weigh in. |
By this I mean only that Gunicorn would not be breaking any spec. Right now Gunicorn is more strict than it needs to be. It is not necessarily wrong to be so strict. We can decide together. |
RFC 7320 says that new headers "SHOULD" use ASCII. That most headers use ASCII right now seems to be just information about the current situation. But I think this sentence is more important:
So, latin-1 is clearly allowed. If RFC7320 says that new headers "SHOULD" use ASCII, the question is, who defines the headers? Is it the application using gunicorn or is it gunicorn itself. As I understand, the application defines the headers. |
What annoyed me is that change has been done 3 years ago without much people beeing blocked by it on the different frameworks until that recent change in flask. I would have thought that in 2018 all applications would follow a specification published in 2014. Part of the issue is that PEP3333 allowing ISO-8859-1 aka latin 1. Anyway I don't disagree to be more relaxed on it. I propose to do the following :
Thoughts ? |
@benoitc sorry for causing trouble! It's hard to keep track of all the different specs. 😓 If you show that warning, I predict a regular stream of issues being open on Flask about it though. I think it makes more sense to just allow Latin-1 silently, as it's part of the PEP. |
@davidism well that's kind of the point: bugging the user until it makes the correct change ;) I think that would be better but I have no strong opinion on it. @tilgovi @berkerpeksag ? |
@davidism no worries :) Thanks for bringing it up. I'm inclined towards being tolerant in this case, since it aligns with the PEP and with popular frameworks. |
+1 |
Until when the status quo would be kept though... yes users will open some bugs which prompted our change. I don’t see why it couldn’t be also fixed in other libraries and the pep updated. At least we should document it somewhere so users using a strict client won’t be surprised as well. Also if we do the change it will have to be in a major version since it’s a breaking change. |
I'm happy to make this change. Do we need a major version? It makes requirements looser for applications. Existing applications should continue to work. Applications which did not work before, but which violate the "SHOULD" will now work. In any case, I can document it somehow. |
Maybe 19.9? We can keep making 19.9.x or 19.x.y releases for Python 2 maintenance critical bug fixes for a bit, but still keep 20 for Python 3 only? |
I've been thinking about this more, and I'm ok switching Flask and Werkzeug to use ASCII here. I still think it's probably good if Gunicorn supports it, since it's part of the PEP. I'm assuming you've never had an issue reported about this before now so it seems like it's probably not well known anyway. If you're worried about breaking changes, I can handle it on my side only. I do plan on implementing ASGI in Werkzeug, which leaves the header encoding to the application, so that would be a good time for me to fix inconsistent use of ASCII vs Latin-1 on my side. |
I support the idea of @davidism: Flask should follow the recommendation and use ASCII, but gunicorn should also follow PEP3333 and allow Latin-1, even if other specifications do not recommend to do so. |
I guess I should have done this sooner, but I just tried sending a unicode filename with Flask and Gunicorn and it worked fine: pallets/flask#2766 (comment). Now I'm not clear where the issue is. |
I can't remeber exactly and I don't have a logfile at hand at the moment. But the problem arised with filenames containing german umlauts, like äöü. For example a filename like "Sterntour-Glärnischhütte.xlsx" would cause problems, caused by the "ä" and "ü" in the filename. |
Thanks, that filename caused the issue. The name used in the test in Flask triggered the |
Just for the reference: @davidism fixed the encoding in flask with pull request pallets/flask#2804, which will be released in flask 1.0.3. This will make flask compatible with the current behaviour of gunicorn. At this occasion, i want to thank @benoitc , @tilgovi and @davidism (and all other involved devs!) for their work on gunicorn and flask. I am positively surprised about the discussion which unfold in this issue! The issue was addressed very quick, the discussion was based on facts and a good solution for both packages was found very quickly! I'm allways surprised how quick issues are addressed in the open source world and that, most of the time, good solutions can be found for all involved parties. I think a fix would have taken much longer if both packages would have been closed source libraries, owned by some big coorperations. Thank you very much and keep up the great work! |
flask/werkzeug will encode than decode cookie value to Unicode with latin1:
|
@georgexsh so what's your expectation? |
@benoitc to point out flask/werkzeug is still had incompatible behavior with gunicorn. I will be happier if #1778 (comment) get implemented. btw, your reply is very agile, I admire your diligence. |
Given that it's never had a problem before between that code and Gunicorn, I'm not too concerned, although I may switch more things to ASCII eventually. Do you have a real failing example? |
run it with gunicorn yields an error:
the error is from this header:
when werkzeug dump cookies with
there is more detail in apache/superset#5377 but I think this is a bug in the application (superset), it should not rely on flask/werkzeug to encode CSV string to bytes but encode by itself. |
That's definitely an issue with that application. |
Some notes on the behaviour differences between Werkzeug(+Flask) and Gunicorn(+Flask):
Note that when writing-out the response headers, Werkzeug delegates to Python builtin ... which uses That will explain the behaviour differences. My reading of the considerable number and history of RFCs on this suggests that applications and servers should be restricting themselves to Either way, ideally Gunicorn and Werkzeug should behave the same here if possible, and it seems that a latin-1 character should not error the response. Gunicorn has a superior way of serializing the header values, so if there is a non-latin-1 character present, it detects this before committing 200 OK (like happens in Werkzeug), and can send 500. But it appears to me that it should allow |
Another possibly interesting check in corelibs:
So Python WSGI is very strict on characters allowed in the header name (alpha-num, hyphen, underscore), and bans non-printables from values. |
This commit reverts one aspect changed by 5f4ebd2 (benoitc#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed benoitc#1778. Signed-off-by: Brett Randall <[email protected]>
This commit reverts one aspect changed by 5f4ebd2 (benoitc#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed benoitc#1778. Signed-off-by: Brett Randall <[email protected]>
This commit reverts one aspect changed by 5f4ebd2 (benoitc#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed benoitc#1778. Signed-off-by: Brett Randall <[email protected]>
This commit reverts one aspect changed by 5f4ebd2 (benoitc#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed benoitc#1778. Signed-off-by: Brett Randall <[email protected]>
This commit reverts one aspect changed by 5f4ebd2 (benoitc#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed benoitc#1778. Signed-off-by: Brett Randall <[email protected]>
This commit reverts one aspect changed by 5f4ebd2 (#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed #1778. Signed-off-by: Brett Randall <[email protected]>
This commit reverts one aspect changed by 5f4ebd2 (benoitc#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed benoitc#1778. Signed-off-by: Brett Randall <[email protected]> (cherry picked from commit 879651b)
This commit reverts one aspect changed by 5f4ebd2 (#1151); header-values are again encoded as latin-1 and not ascii. Test is restored but uses a latin-1-mappable test-character, not a general utf8 character. Fixed #1778. Signed-off-by: Brett Randall <[email protected]> (cherry picked from commit 879651b)
Applications, like flask, which expect that the headers are using latin-1 encoding as specified in rfc5987 will break gunicorn because gunicorn uses ASCII encoding.
Issue #1353 was closed with the comment
which is, according to rfc5987 wrong. Headers should be encoded ISO-8859-1 (aka latin-1). However, a comment in the issue also mentioned commit 5f4ebd2, in which the encoding of the headers was switched back from latin-1 to ascii based on a section in rfc7230 which states that US-ASCII should be used:
Now, I'm not into all this RFC specifications and don't know which is the right way to do it. However, as other applications expect latin-1 as specified in rfc5987 but gunicorn implements the version given in rfc7230, it might be worth to discuss this, even if it's just for a clarification on which track gunicorn is.
Related Issues in gunicorn:
#1353 : wsgi.py > send_headers: encoding problem.
Related Issues in Flask:
pallets/flask#2766 : send_file: latin-1 encoding not compatible with gunicorn
pallets/flask#2223 : Fix send_file's attachment_filename to work with non-ascii filenames
Related RFC:
https://tools.ietf.org/html/rfc5987#section-1
The text was updated successfully, but these errors were encountered: