-
Notifications
You must be signed in to change notification settings - Fork 178
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Super slow 206 range requests #359
Comments
I came up with this temporary hot-fix. Does it make sense, or am I way off? from waitress.buffers import ReadOnlyFileBasedBuffer
from werkzeug.wsgi import _RangeWrapper
@app.after_request
def hotfix_wrapper(response):
"""
Intercept the Waitress-Flask interaction so that Flask doesn't read the entire file
from the beginning on each range request (terrible for performance).
"""
f = response.response
if isinstance(f, _RangeWrapper) and isinstance(f.iterable, ReadOnlyFileBasedBuffer):
if hasattr(f.iterable, "seekable") and f.iterable.seekable():
f.seekable = True
f.iterable.seek = f.iterable.file.seek
f.iterable.tell = f.iterable.file.tell
return response |
I don't understand what is going on here. The Somewhere in your app its creating/using a Second it doesn't seem to be monkey patching waitress at all, it is updating some attributes on an instance of a class. My recommendation instead is to just update |
There is no reference in PEP 3333 to [1] https://github.com/Pylons/webob/blob/259230aa2b8b9cf675c996e157c5cf021c256059/src/webob/response.py#L1459 |
I wasn't sure which side is at fault here, if any. Sorry if "monkey-patch Waitress" was the wrong attribution. I changed the naming above. Should I open this ticket with werkzeug then, are they out of line with their wrapper? I'm not familiar with the specifications / interfaces, deferring to you. |
@piskvorky No, it's all good. I'm dealing with COVID brain and just couldn't envision how you got into this situation. Talked about it with @mmerickel and it finally clicked. This is a change we can make in waitress to support this use case! I don't know when I'll have time for it, but thanks for opening the issue! |
Changelog: ========= Python Version Support ---------------------- - Python 3.6 is no longer supported by Waitress - Python 3.10 is fully supported by Waitress Bugfix ------- - "wsgi.file_wrapper" now sets the "seekable", "seek", and "tell" attributes from the underlying file if the underlying file is seekable. This allows WSGI middleware to implement things like range requests for example See Pylons/waitress#359 and Pylons/waitress#363 - In Python 3 "OSError" is no longer subscriptable, this caused failures on Windows attempting to loop to find an socket that would work for use in the trigger. See Pylons/waitress#361 - Fixed an issue whereby "BytesIO" objects were not properly closed, and thereby would not get cleaned up until garbage collection would get around to it. This led to potential for random memory spikes/memory issues, see Pylons/waitress#358 and Pylons/waitress#357 . Features -------- - When the WSGI app starts sending data to the output buffer, we now attempt to send data directly to the socket. This avoids needing to wake up the main thread to start sending data. Allowing faster transmission of the first byte. See Pylons/waitress#364 - Add REQUEST_URI to the WSGI environment. REQUEST_URI is similar to "request_uri" in nginx. It is a string that contains the request path before separating the query string and decoding "%"-escaped characters. Signed-off-by: Wang Mingyu <[email protected]> Signed-off-by: Khem Raj <[email protected]>
Changelog: ========= Python Version Support ---------------------- - Python 3.6 is no longer supported by Waitress - Python 3.10 is fully supported by Waitress Bugfix ------- - "wsgi.file_wrapper" now sets the "seekable", "seek", and "tell" attributes from the underlying file if the underlying file is seekable. This allows WSGI middleware to implement things like range requests for example See Pylons/waitress#359 and Pylons/waitress#363 - In Python 3 "OSError" is no longer subscriptable, this caused failures on Windows attempting to loop to find an socket that would work for use in the trigger. See Pylons/waitress#361 - Fixed an issue whereby "BytesIO" objects were not properly closed, and thereby would not get cleaned up until garbage collection would get around to it. This led to potential for random memory spikes/memory issues, see Pylons/waitress#358 and Pylons/waitress#357 . Features -------- - When the WSGI app starts sending data to the output buffer, we now attempt to send data directly to the socket. This avoids needing to wake up the main thread to start sending data. Allowing faster transmission of the first byte. See Pylons/waitress#364 - Add REQUEST_URI to the WSGI environment. REQUEST_URI is similar to "request_uri" in nginx. It is a string that contains the request path before separating the query string and decoding "%"-escaped characters. Signed-off-by: Wang Mingyu <[email protected]> Signed-off-by: Khem Raj <[email protected]>
Changelog: ========= Python Version Support ---------------------- - Python 3.6 is no longer supported by Waitress - Python 3.10 is fully supported by Waitress Bugfix ------- - "wsgi.file_wrapper" now sets the "seekable", "seek", and "tell" attributes from the underlying file if the underlying file is seekable. This allows WSGI middleware to implement things like range requests for example See Pylons/waitress#359 and Pylons/waitress#363 - In Python 3 "OSError" is no longer subscriptable, this caused failures on Windows attempting to loop to find an socket that would work for use in the trigger. See Pylons/waitress#361 - Fixed an issue whereby "BytesIO" objects were not properly closed, and thereby would not get cleaned up until garbage collection would get around to it. This led to potential for random memory spikes/memory issues, see Pylons/waitress#358 and Pylons/waitress#357 . Features -------- - When the WSGI app starts sending data to the output buffer, we now attempt to send data directly to the socket. This avoids needing to wake up the main thread to start sending data. Allowing faster transmission of the first byte. See Pylons/waitress#364 - Add REQUEST_URI to the WSGI environment. REQUEST_URI is similar to "request_uri" in nginx. It is a string that contains the request path before separating the query string and decoding "%"-escaped characters. Signed-off-by: Wang Mingyu <[email protected]> Signed-off-by: Khem Raj <[email protected]>
Changelog: ========= Python Version Support ---------------------- - Python 3.6 is no longer supported by Waitress - Python 3.10 is fully supported by Waitress Bugfix ------- - "wsgi.file_wrapper" now sets the "seekable", "seek", and "tell" attributes from the underlying file if the underlying file is seekable. This allows WSGI middleware to implement things like range requests for example See Pylons/waitress#359 and Pylons/waitress#363 - In Python 3 "OSError" is no longer subscriptable, this caused failures on Windows attempting to loop to find an socket that would work for use in the trigger. See Pylons/waitress#361 - Fixed an issue whereby "BytesIO" objects were not properly closed, and thereby would not get cleaned up until garbage collection would get around to it. This led to potential for random memory spikes/memory issues, see Pylons/waitress#358 and Pylons/waitress#357 . Features -------- - When the WSGI app starts sending data to the output buffer, we now attempt to send data directly to the socket. This avoids needing to wake up the main thread to start sending data. Allowing faster transmission of the first byte. See Pylons/waitress#364 - Add REQUEST_URI to the WSGI environment. REQUEST_URI is similar to "request_uri" in nginx. It is a string that contains the request path before separating the query string and decoding "%"-escaped characters. Signed-off-by: Wang Mingyu <[email protected]> Signed-off-by: Khem Raj <[email protected]>
Changelog: ========= Python Version Support ---------------------- - Python 3.6 is no longer supported by Waitress - Python 3.10 is fully supported by Waitress Bugfix ------- - "wsgi.file_wrapper" now sets the "seekable", "seek", and "tell" attributes from the underlying file if the underlying file is seekable. This allows WSGI middleware to implement things like range requests for example See Pylons/waitress#359 and Pylons/waitress#363 - In Python 3 "OSError" is no longer subscriptable, this caused failures on Windows attempting to loop to find an socket that would work for use in the trigger. See Pylons/waitress#361 - Fixed an issue whereby "BytesIO" objects were not properly closed, and thereby would not get cleaned up until garbage collection would get around to it. This led to potential for random memory spikes/memory issues, see Pylons/waitress#358 and Pylons/waitress#357 . Features -------- - When the WSGI app starts sending data to the output buffer, we now attempt to send data directly to the socket. This avoids needing to wake up the main thread to start sending data. Allowing faster transmission of the first byte. See Pylons/waitress#364 - Add REQUEST_URI to the WSGI environment. REQUEST_URI is similar to "request_uri" in nginx. It is a string that contains the request path before separating the query string and decoding "%"-escaped characters. Signed-off-by: Wang Mingyu <[email protected]> Signed-off-by: Khem Raj <[email protected]> Signed-off-by: Trevor Gamblin <[email protected]>
2.1.2 ----- Bugfix ~~~~~~ - When expose_tracebacks is enabled waitress would fail to properly encode unicode thereby causing another error during error handling. See Pylons/waitress#378 - Header length checking had a calculation that was done incorrectly when the data was received across multple socket reads. This calculation has been corrected, and no longer will Waitress send back a 413 Request Entity Too Large. See Pylons/waitress#376 Security Bugfix ~~~~~~~~~~~~~~~ - in 2.1.0 a new feature was introduced that allowed the WSGI thread to start sending data to the socket. However this introduced a race condition whereby a socket may be closed in the sending thread while the main thread is about to call select() therey causing the entire application to be taken down. Waitress will no longer close the socket in the WSGI thread, instead waking up the main thread to cleanup. See Pylons/waitress#377 2.1.1 ----- Security Bugfix ~~~~~~~~~~~~~~~ - Waitress now validates that chunked encoding extensions are valid, and don't contain invalid characters that are not allowed. They are still skipped/not processed, but if they contain invalid data we no longer continue in and return a 400 Bad Request. This stops potential HTTP desync/HTTP request smuggling. Thanks to Zhang Zeyu for reporting this issue. See GHSA-4f7p-27jc-3c36 - Waitress now validates that the chunk length is only valid hex digits when parsing chunked encoding, and values such as ``0x01`` and ``+01`` are no longer supported. This stops potential HTTP desync/HTTP request smuggling. Thanks to Zhang Zeyu for reporting this issue. See GHSA-4f7p-27jc-3c36 - Waitress now validates that the Content-Length sent by a remote contains only digits in accordance with RFC7230 and will return a 400 Bad Request when the Content-Length header contains invalid data, such as ``+10`` which would previously get parsed as ``10`` and accepted. This stops potential HTTP desync/HTTP request smuggling Thanks to Zhang Zeyu for reporting this issue. See GHSA-4f7p-27jc-3c36 2.1.0 ----- Python Version Support ~~~~~~~~~~~~~~~~~~~~~~ - Python 3.6 is no longer supported by Waitress - Python 3.10 is fully supported by Waitress Bugfix ~~~~~~ - ``wsgi.file_wrapper`` now sets the ``seekable``, ``seek``, and ``tell`` attributes from the underlying file if the underlying file is seekable. This allows WSGI middleware to implement things like range requests for example See Pylons/waitress#359 and Pylons/waitress#363 - In Python 3 ``OSError`` is no longer subscriptable, this caused failures on Windows attempting to loop to find an socket that would work for use in the trigger. See Pylons/waitress#361 - Fixed an issue whereby ``BytesIO`` objects were not properly closed, and thereby would not get cleaned up until garbage collection would get around to it. This led to potential for random memory spikes/memory issues, see Pylons/waitress#358 and Pylons/waitress#357 . With thanks to Florian Schulze for testing/vaidating this fix! Features ~~~~~~~~ - When the WSGI app starts sending data to the output buffer, we now attempt to send data directly to the socket. This avoids needing to wake up the main thread to start sending data. Allowing faster transmission of the first byte. See Pylons/waitress#364 With thanks to Michael Merickel for being a great rubber ducky! - Add REQUEST_URI to the WSGI environment. REQUEST_URI is similar to ``request_uri`` in nginx. It is a string that contains the request path before separating the query string and decoding ``%``-escaped characters.
Changelog: ========= Python Version Support ---------------------- - Python 3.6 is no longer supported by Waitress - Python 3.10 is fully supported by Waitress Bugfix ------- - "wsgi.file_wrapper" now sets the "seekable", "seek", and "tell" attributes from the underlying file if the underlying file is seekable. This allows WSGI middleware to implement things like range requests for example See Pylons/waitress#359 and Pylons/waitress#363 - In Python 3 "OSError" is no longer subscriptable, this caused failures on Windows attempting to loop to find an socket that would work for use in the trigger. See Pylons/waitress#361 - Fixed an issue whereby "BytesIO" objects were not properly closed, and thereby would not get cleaned up until garbage collection would get around to it. This led to potential for random memory spikes/memory issues, see Pylons/waitress#358 and Pylons/waitress#357 . Features -------- - When the WSGI app starts sending data to the output buffer, we now attempt to send data directly to the socket. This avoids needing to wake up the main thread to start sending data. Allowing faster transmission of the first byte. See Pylons/waitress#364 - Add REQUEST_URI to the WSGI environment. REQUEST_URI is similar to "request_uri" in nginx. It is a string that contains the request path before separating the query string and decoding "%"-escaped characters. Signed-off-by: Wang Mingyu <[email protected]> Signed-off-by: Khem Raj <[email protected]> Signed-off-by: Trevor Gamblin <[email protected]>
Changelog: ========= Python Version Support ---------------------- - Python 3.6 is no longer supported by Waitress - Python 3.10 is fully supported by Waitress Bugfix ------- - "wsgi.file_wrapper" now sets the "seekable", "seek", and "tell" attributes from the underlying file if the underlying file is seekable. This allows WSGI middleware to implement things like range requests for example See Pylons/waitress#359 and Pylons/waitress#363 - In Python 3 "OSError" is no longer subscriptable, this caused failures on Windows attempting to loop to find an socket that would work for use in the trigger. See Pylons/waitress#361 - Fixed an issue whereby "BytesIO" objects were not properly closed, and thereby would not get cleaned up until garbage collection would get around to it. This led to potential for random memory spikes/memory issues, see Pylons/waitress#358 and Pylons/waitress#357 . Features -------- - When the WSGI app starts sending data to the output buffer, we now attempt to send data directly to the socket. This avoids needing to wake up the main thread to start sending data. Allowing faster transmission of the first byte. See Pylons/waitress#364 - Add REQUEST_URI to the WSGI environment. REQUEST_URI is similar to "request_uri" in nginx. It is a string that contains the request path before separating the query string and decoding "%"-escaped characters. Signed-off-by: Wang Mingyu <[email protected]> Signed-off-by: Khem Raj <[email protected]> Signed-off-by: Trevor Gamblin <[email protected]>
After some debugging of slow 206 responses from Flask's
send_file
, I narrowed the issue down to an interface mismatch between Waitress and werkzeug.Deep down, werkzeug expects
wsgi.file_wrapper
to implementseekable()
(andseek()
andtell()
) for efficient 206:https://github.com/pallets/werkzeug/blob/347291802fcf89bc89660cc9dc62eb1303337bc2/src/werkzeug/wsgi.py#L576-L577
But Waitress provides no such methods in its
ReadOnlyFileBasedBuffer
, even when the underlying file-like (a regularopen
file on disk) has them:waitress/src/waitress/buffers.py
Lines 141 to 187 in f41e598
As a result, werkzeug uses a code path without
seek()
, where it keeps reading blocks from the beginning of the file until it reaches the range start offset… yuck:https://github.com/pallets/werkzeug/blob/347291802fcf89bc89660cc9dc62eb1303337bc2/src/werkzeug/wsgi.py#L595-L604
Motivation: We often request 206 ranges from the end of huge files, such as ZIP archives where the "master ZIP record" lives at the end of the file. But because of the problem above, the app reads the entire multi-gigabyte ZIP file just to return the last 2 KB.
I'm not sure if this is an issue with werkzeug or waitress, but the performance is so bad that this is a show-stopper for us. Thanks.
The text was updated successfully, but these errors were encountered: