Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed the support for byte range requests #360

Merged
merged 33 commits into from
Jun 1, 2020
Merged

Conversation

veloman-yunkan
Copy link
Collaborator

@veloman-yunkan veloman-yunkan commented May 22, 2020

Will close kiwix/kiwix-tools#148

This PR enables handling of partial content requests with a single byte-range. Requests for two or more byte ranges (even if they effectively constitute a single continuous range) are rejected with a 416 (Range Not Satisfiable) error response. Such behaviour complies with somewhat liberal interpretation of the spec):

The 416 (Range Not Satisfiable) status code indicates that none of the ranges in the request's Range header field (Section 3.1) overlap the current extent of the selected resource or that the set of ranges requested has been rejected due to invalid ranges or an excessive request of small or overlapping ranges.

Support for If-Range conditional requests is not implemented.

Partial requests work only for content (entries) from the served ZIM file(s). If both Range: and Accept-Encoding: deflate headers are present in the request, the former has precedence and uncompressed partial content is sent in a 206 response (even if a 200 response to a respective non-range request would have its content compressed).

The changes have been validated with new test cases in the server unit-test (as well as some manual experiments with kiwix-serve and curl).

This PR should be easy to review in its final state though some issues with the state of the current code in master are more obvious when looking at the first several commits in the PR history. I didn't spend any time cleaning up the development history, since I believe that this PR can be merged as a single squashed commit.

The second component of a byte range, if present, designates the
index of the last byte to be included in the partial response.
In case of a partial response the size of the response is different
from the served entry size.
After this commit valid ranges of the form "bytes=firstbyte-lastbyte" should
be handled correctly.
@veloman-yunkan veloman-yunkan self-assigned this May 22, 2020
@codecov
Copy link

codecov bot commented May 22, 2020

Codecov Report

Merging #360 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #360   +/-   ##
=======================================
  Coverage   57.19%   57.19%           
=======================================
  Files          48       48           
  Lines        3614     3614           
  Branches     1809     1809           
=======================================
  Hits         2067     2067           
  Misses       1538     1538           
  Partials        9        9           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update f52b220...f52b220. Read the comment docs.

@veloman-yunkan veloman-yunkan changed the title [WIP] Fixing the support for byte range requests Fixed the support for byte range requests May 26, 2020
@veloman-yunkan veloman-yunkan marked this pull request as ready for review May 26, 2020 10:06
Copy link
Member

@mgautierfr mgautierfr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is maybe a discussion about your question, but otherwise the code is good.

Comment on lines 317 to 318
// XXX: Which of the Accept-Encoding and Range headers has precedence if both
// XXX: are present? Can partial content be compressed?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Searching (a bit) the internet, I've found nothing about this.
I would say that yes, partial content can be compressed.

I think we should resolve the byteRange first and then compress the content if needed.
However, our use case here is that we need byteRange for media (video) and they are not compressible.
I don't know if we should handle both case in the same time (but we must be sure to answer with the correct headers anyway)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum... I'm not sure what I've reviewed. I've just comment this lines and github say me this is outdated...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Searching (a bit) the internet, I've found nothing about this.

Same for me. I guess I better have mentioned that in the PR description.

I think we should resolve the byteRange first and then compress the content if needed.

The interaction of the Accept-Encoding and Range is a little tricky (and it's strange that there is no formal specification of what the correct interpretation should be). Suppose that the client issues a regular (non-range) request for a large compressible resource (e.g. a huge JSON or XML document). The resource is compressed and streamed to the client but the connection is interrupted at 95%. The client issues a range request to fetch the remaining 5% of the compressed representation of the resource, with the purpose of recombining it with whatever was sent in the first session and being able to decompress it. In such an scenario (which is different from your point of view of how Range should play with Accept-Encoding) the load on the server can be excessive - entire resource must be compressed with the purpose of sending only a small part of it.

Note that for partial/interrupted download of a compressed resource the client has no chance to decompress the received part, find out the byte-range for the uncompressed version of the resource and issue a range request for the remaining part.

That's why the safest bet is to not combine compression with partial responses.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that for partial/interrupted download of a compressed resource the client has no chance to decompress the received part, find out the byte-range for the uncompressed version of the resource and issue a range request for the remaining part

Why ? That's what I was thinking that what it has to be done.

For me the range is about the uncompressed content. If a connection is lost at 95%, client must uncompress the data it received and ask for what is left to have (in uncompressed range). This range will be compressed again in a new stream. But it cannot ask for the last 5% of the compressed stream.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption was that in general it is impossible to decompress a partial archive. That assumption is probably wrong for the compression algorithms/formats used in HTTP. Still, I doubt that if we implement that in kiwix-serve the handling on the client side will be like we expect. Therefore I think that we should not spend any time on this until we know of authoritative formal specification of how it must work.

@veloman-yunkan
Copy link
Collaborator Author

@mgautierfr I re-requested your review so that you can provide your feedback or approve the latest version.

@mgautierfr
Copy link
Member

I've reviewed the full PR. The code is good. We still have this discussion about Range and Accept-Encoding but it is not a blocker.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Problem with byte range access in kiwix-serve
2 participants