FileResponse failed to prepare proper headers for Range requests #2844

spcharc · 2018-03-17T14:51:46Z

Long story short

Please describe your problem and why the fix is important.

Users may pause their downloads of large static resources.
Many download managers including those browser-builtin ones check for Content-Range headers. Missing this header would cause downloads to fail.
I checked the source code and found out static resources are handled by FileResponse in web_fileresponse.py. But {0}.format(title)

Expected behaviour

What is the behaviour you expect?

Assume you have a BIG file big_file
big_file.size() == 10000 (9.77kb)
A user downloaded 500 bytes (0-499) and connection lost.
And then the user resumes download with request: "Range:bytes=500-"
Expected header: "Content-Range:bytes 500-9999/10000" <- this one is missing

Actual behaviour

What's actually happening?

The downloads just failed immediately after clicking resume button.
Actually aiohttp handles Range requests very well. HTTP 206 and Content-Length are both correctly returned to users.

Steps to reproduce

Please describe steps to reproduce the issue.
If you have a script that does that please include it here within
markdown code markup

Check the web_fileresponse.py source code, or try a download like this

    from aiohttp import web
    import asyncio
    loop = asyncio.get_event_loop()
    app = web.Application(loop = loop)
    app.add_static('/', folder_containing_large_file, show_index = True)
    web.run_app(app, host = '0.0.0.0', port = 4321)

goto http://your_addr:4321/
click the big_file

Your environment

Describe the environment you have that lead to your issue.
This includes aiohttp version, OS, proxy server and other bits that
are related to your case.

 IMPORTANT: aiohttp is both server framework and client library.
 For getting rid of confusing please put 'server', 'client' or 'both'
 word here.

Server
Python/3.6.4
aiohttp/3.0.9

Temporary workaround

in file web_fileresponse.py:
line 228 == "if count:"
in line 227 I added:
self.headers[hdrs.ACCEPT_RANGES] = 'bytes'
(Accept-Ranges header, which is to indicate that the server supports Range requests, is also missing. But I found out that usually web clients just ignore it. They still send Range requests even though this is missing. Maybe an Accept-Ranges:none header is needed to prevent Range requests?)
in line 229 I added:
if start is not None: self.headers[hdrs.CONTENT_RANGE] = f'bytes {start}-{start+count-1}/{file_size}'
(After this line is added, download managers work like a charm)
These two lines could be a temporary workaround. It has problems left listed as below

Other things to check

Some download managers use this header:

    If-Range:xxxxxx
    Range:bytes=yyyy-zzzz

And some use this (including Chrome browser):

    If-Unmodified-Since:xxxxxx
    Range:bytes=yyyy-zzzz

Both are not properly handled by FileResponse. Nothing is checked and HTTP 206 is returned with requested partial content
As far as I know, what is expected:

if condition:
    return HTTP 206, Content-Length=len(requested_part), Content-Range=start-end/full_len
else:
    return HTTP 200, Content-Length=full_len

Besides, multipart/byterange is not supported
A request with Range:bytes=0-10,20-30 will cause HTTP 416

(EOF. It's my first time using github, please tell me if anything wrong. English is not my first language, please point out if any mistake. Sorry about that)

Edit:
Only If-Range works in that way, but If-Unmodified-Since is like this:

if condition:
    return HTTP 206, Content-Length=len(requested_part), Content-Range=start-end/full_len
else:
    return HTTP 412 Precondition Failed,

The text was updated successfully, but these errors were encountered:

asvetlov · 2018-03-17T15:50:30Z

Thank you for report.
I agree that Content-Range / Accept-Ranges should be supported.
The issue is waiting for a champion for Pull Request :)

spcharc · 2018-03-17T16:03:19Z

What is a pull request? I've seen it a couple of times on github though.
You can fix it nice and easy as you have already made the handler for If-Modified-Since xxxxxx (expected HTTP 304 if unmodified else HTTP 200. FileResponse can handle this occasion already.)
The only problem would be multipart/byterange response for multiple ranges which is really rare. Most of the download managers don't use that kind of Range header

asvetlov · 2018-03-17T16:13:37Z

Pull Request is a way to making a patch for merging.
Quick googling suggest https://gist.github.com/Chaser324/ce0505fbed06b947d962 or https://yangsu.github.io/pull-request-tutorial/
There are a lot of resources describing how GitHub works with pull requests, what is workflow etc.
Please note: the change should contain tests for covering a new functionality.

spcharc · 2018-03-17T16:58:03Z

Thank you for your explanation. I'm trying to figure out how to use git command now.
I've checked this page
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Unmodified-Since
and found out if condition is False If-Unmodified-Since expects HTTP 412
(unlike If-Range expects HTTP 200 when False)

hubo1016 · 2018-03-18T02:37:48Z

If-Unmodified-Since should be processed together with If-Modified-Since, It is not required to be used together with range requests. If-Range could be processed together too, if not matched, remove the range header.

I don't have time for this now. If no one is working on it, I can do It later.

hubo1016 · 2018-03-18T03:06:01Z

If ETag is generated, client will use If-Match instead.

There is an implementation I created years ago:

https://github.com/hubo1016/vlcp/blob/master/vlcp/service/web/static.py#L65

It did not process If-Match header, But works for normal range requests. Only ETag is supported, Last-Modified is not returned.

spcharc · 2018-03-18T03:41:28Z

Thanks for reply. aiohttp does provide Last-Modified header and it doesn't have Etag support yet. So If-Match and If-None-Match handler isn't necessary as clients won't make requests with them. (They won't get any ETag beforehand)
ETag is better than Last-Modified on these occasions:

a file whose last-modified is changed a lot without changing its content.
a file with its content being changed serveral times in a single second. (minimum unit for Last-Modified header is sec)
you cannot get the accurate last-modified from a file.

Usually we don't change static resources a lot. And generating strong ETags requires some caching and hashing work. It's not cheap. Especially when dealing with large files.

hubo1016 · 2018-03-18T08:40:00Z

@spcharc It is not that different. In fact, usually we generate a ETag based on: modification time, inode, file size. They all come from fstat(). HMAC (sha256, md5) are NOT used. At least Apache and nginx uses this kind of ETag.

spcharc · 2018-03-18T13:10:56Z

I see. Thank you!
Actually I just want a fix because this problem is really annoying in my daily use
If ETag doesn't make my computer slower then I'm totally OK with it
I'm trying to make a pull request in about 2 days in this project, providing a simple fix for this issue. I'm learning to how to pull, feel free to close it

asvetlov · 2018-03-21T01:09:41Z

Fixed by #2847

spcharc · 2018-03-21T02:24:49Z

Today I thought about why ETag is better than date check again.
I searched on the web and I found out people mentioned that files extracted from compressed files would have a Last-Modification-Date in the past (before they were packed). If I extract files into the /static/ folder and overwrite something with the same name, it would cause a problem.
Besides, some people said sometimes a file of the same address could be different files on the server (depends on user settings)
For instance:
If user setting of color theme on a website changed from ‘red’ to ‘green’, then ‘/pics/background.jpg’ may changed from actually returning ‘/pics/red/backgroud.jpg’ to ‘/pics/green/backgroud.jpg’. If-Modified-Since would cause problem in this case, and ETag will work as expected
Maybe ETag support would be a necessary one?
But I don’t know how to implement a ETag support. (If someone share me the ETag algorithm I’m happy to help)
Fortunately we have @hubo1016 here who knows the implementation. We could expect a patch from him. He said that ETag would not impact performance, and does not involve hashing. I believe it would be a good choice to have it.

hubo1016 · 2018-03-21T02:56:21Z

@spcharc Currently I don't have enough time, but you can look at an implementation I wrote and tested before:
https://github.com/hubo1016/vlcp/blob/7d22c818a3056fa4265aaf9c309e63357483e439/vlcp/service/web/static.py#L139-L143

I've also modified it to a suitable version (removing Python 2.x compatibility)

def safehex(v):
    h = hex(v)[2:]
    return h

def _createetag(stat_info):
    etag = 'aiohttp-' + safehex(stat_info.st_ino) + '-' + safehex(int(stat_info.st_mtime)) + '-' + safehex(int(stat_info.st_size))
    # return etag.encode('ascii') # if bytes are wanted
    return etag

Call os.stat() on the file path to get the f_stat structure, use it to generate the ETag.

Whether to support ETag or not, and how to implement it is a decision of design. It should be discussed in a separated issue. And adding ETag should be a feature request rather than a bug fix which is not very urgent.

spcharc · 2018-03-21T03:07:46Z

@hubo1016 Thank you for your detailed explanation and suggestion. I will open another issue to discuss it.

lock · 2019-10-28T07:02:30Z

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a [new issue] for related bugs.
If you feel like there's important points made in this discussion, please include those exceprts into that [new issue].
[new issue]: https://github.com/aio-libs/aiohttp/issues/new

asvetlov added enhancement server labels Mar 17, 2018

spcharc mentioned this issue Mar 18, 2018

fileresponse content-range header #2847

Merged

asvetlov closed this as completed Mar 21, 2018

lock bot added the outdated label Oct 28, 2019

lock bot locked as resolved and limited conversation to collaborators Oct 28, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

FileResponse failed to prepare proper headers for Range requests #2844

FileResponse failed to prepare proper headers for Range requests #2844

spcharc commented Mar 17, 2018 •

edited

Loading

asvetlov commented Mar 17, 2018

spcharc commented Mar 17, 2018 •

edited

Loading

asvetlov commented Mar 17, 2018

spcharc commented Mar 17, 2018

hubo1016 commented Mar 18, 2018

hubo1016 commented Mar 18, 2018

spcharc commented Mar 18, 2018 •

edited

Loading

hubo1016 commented Mar 18, 2018

spcharc commented Mar 18, 2018

asvetlov commented Mar 21, 2018

spcharc commented Mar 21, 2018 •

edited

Loading

hubo1016 commented Mar 21, 2018

spcharc commented Mar 21, 2018

lock bot commented Oct 28, 2019

FileResponse failed to prepare proper headers for Range requests #2844

FileResponse failed to prepare proper headers for Range requests #2844

Comments

spcharc commented Mar 17, 2018 • edited Loading

Long story short

Expected behaviour

Actual behaviour

Steps to reproduce

Your environment

Temporary workaround

Other things to check

asvetlov commented Mar 17, 2018

spcharc commented Mar 17, 2018 • edited Loading

asvetlov commented Mar 17, 2018

spcharc commented Mar 17, 2018

hubo1016 commented Mar 18, 2018

hubo1016 commented Mar 18, 2018

spcharc commented Mar 18, 2018 • edited Loading

hubo1016 commented Mar 18, 2018

spcharc commented Mar 18, 2018

asvetlov commented Mar 21, 2018

spcharc commented Mar 21, 2018 • edited Loading

hubo1016 commented Mar 21, 2018

spcharc commented Mar 21, 2018

lock bot commented Oct 28, 2019

spcharc commented Mar 17, 2018 •

edited

Loading

spcharc commented Mar 17, 2018 •

edited

Loading

spcharc commented Mar 18, 2018 •

edited

Loading

spcharc commented Mar 21, 2018 •

edited

Loading