-
-
Notifications
You must be signed in to change notification settings - Fork 948
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Provide a way to gracefully and securely handle non-percent-encoded query string values #1685
Comments
|
Re Nginx, it's at least obvious if there is invalid percent coding: https://github.com/nginx/nginx/blob/37984f0be1a5d608015517a64927d498888fb7ca/src/http/ngx_http_parse.c#L1532 |
Talking this issue in general, I'm admittedly not completely convinced it's something a Python 3 framework should/is able to handle at all as per PEP-3333 mechanics. On Python 2, this hearkens back to #1031 which was IIRC quite hairy to properly address. But regardless of Python version, I OTOH agree it might be worth reviewing how our test client handles path encoding edge cases. |
Not sure if this is the issue this addresses but here's an example:
{'offset': ['0'], 'query': ['pelÃ-culas de terror'], 'limit': ['10']}
from urllib.parse import parse_qs
parse_qs("offset=0&query=películas%20de%20terror&limit=10&version=") returns {'offset': ['0'], 'query': ['películas de terror'], 'limit': ['10']} |
Hi @sirfz, Just checking, which WSGI (or ASGI?) app server are you using? And which HTTP client? |
it's WSGI hosted with the latest uWSGI (testing with curl). update: some extra context about this issue. The encoding of the |
I don't think it is a Falcon or uWSGI issue per se, but we filed this to track scenarios like yours, and discuss whether we as a framework can do anything sensible about it. If, as mentioned, you run it behind Nginx, non-ASCII URLs are rejected outright with an HTTP 400, so probably it is less of an issue in a typical production environment. In PEP-3333 (WSGI), strings are tunneled as Latin-1, and if curl sends a UTF-8 string, things can get messed up completely, as you observed. |
Example of an error I get in production. This isn't a valid URL path in my app, so I can block these upstream, but some option to handle decoding gracefully and route would be ideal.
|
Hi @rokclimb15 , Just to understand this better, which WSGI server are you using? Do you have any reverse proxy or API gateway in front of it? And if possible to reveal it to us, how does the whole URL path look like? |
It's an AWS HTTP Gateway, similar to API Gateway. CloudFront is in front of that. I'm going to run a trace to capture one of these bad URLs. It's possible it's an interaction with the WSGI wrapper for the API environment, which does the following: "PATH_INFO": urllib.parse.unquote(event["rawPath"], encoding="iso-8859-1") |
@rokclimb15 looking at this again, maybe we should file a separate issue for your case. The path that arrives to the application does not comply with the WSGI spec, which says that it must be UTF-8 bytes tunneled as Latin-1 (so no character above Would automatically raising an HTTP 400 fit your use case, or would you like to be able to intercept it regardless? |
@vytas7 I had failed to notice that this ticket has to do with query strings, thanks for pointing that out! If the path arriving doesn't comply with WSGI then the bug is actually in https://github.com/adamchainz/apig-wsgi. I'll work on gathering more details and filing/fixing it there. If Falcon caught this exception and raised a 400 that would be helpful I think. |
Cool beans. I'll file a new issue for rendering an HTTP 400 in the case the path contains characters that are not representable in the ISO-8859-1 encoding. |
@rokclimb15 I filed this as #2340 so that we can make a decision on this. According to the spec, Falcon actually is not doing anything wrong by exploding with an unhandled exception in this case; however, we could still provide a more helpful error, be it an HTTP client error, or a Python exception with a more helpful error message. As to https://github.com/adamchainz/apig-wsgi, you can quote this part of the spec (quoted below) in your bug report:
|
I think a 500 is a better code if we decide to handle this error |
When the web server allows a non-RFC-compliant query string (i.e., one that includes non-ASCII byte values that were not percent-encoded) to be passed to the app, we should provide a way for the app to at least detect such values. We might also provide an option for automatically rejecting them or perhaps attempting to decode them as UTF8.
Should we also add support for automatically percent-encoding query params when using the
testing.simulate_*
methods?The text was updated successfully, but these errors were encountered: