
Client explicitly asks server to not compress data #1251

Closed
sechkova opened this issue Dec 22, 2020 · 8 comments · Fixed by #1774
Assignees
Labels
1.0.0 blockers To be addressed before 1.0.0 release backlog Issues to address with priority for current development goals client Related to the client (updater) implementation ngclient
Milestone

Comments

@sechkova
Contributor

Description of issue or feature request:

After the acceptance of TAP 10, the framework no longer handles compressed metadata. Instead,

the specification should instead recommend that compression be implemented at the presentation layer of the OSI protocol stack, which is better suited to handle decompression of data. Once decompressed, metadata can be validated against the hashes and signatures listed in the Snapshot file.

However, the implementation explicitly asks the server not to compress data, to avoid potential issues with some server configurations.

Two questions arise from this observation:

  • Do all the calculations up to now (for example the ones in PEP 458) consider uncompressed metadata?
  • Should the implementation match TAP 10 and allow handling of compressed metadata at a lower level?

Current behavior:
Client explicitly asks server to not compress data

Expected behavior:
Allow handling of compressed metadata at the presentation layer

@joshuagl
Member

joshuagl commented Jan 4, 2021

Looks like this session header was introduced (d03dd0f) back in 2013, when adding support for SSL certificate verification, and has since been carried across multiple rewrites of tuf.download.

Use of this header was originally copied from pip (which introduced it to handle some unexpected behaviour in urllib2). AFAICT from a brief perusal, pip now uses the default headers implemented by requests: 'Accept-Encoding': ', '.join(('gzip', 'deflate')).

@joshuagl
Member

joshuagl commented Jan 4, 2021

However, the implementation explicitly asks server to not compress data to avoid potential issues with some server configurations.

As far as I can tell with some git archaeology, the server configurations in question are those which supply unexpected encodings when Accept-Encoding is empty.

I vote that we remove any setting of 'Accept-Encoding' and use the default settings of the underlying requests framework. In future if users want a non-default encoding, they can implement their own fetcher (see #1250) to achieve this.

@trishankatdatadog
Member

Looks like this session header was introduced (d03dd0f) back in 2013, when adding support for SSL certificate verification, and has since been carried across multiple rewrites of tuf.download.

I did not sign that commit, so it might or might not have been me from a different life...

@jku
Member

jku commented Jan 5, 2021

I vote that we remove any setting of 'Accept-Encoding' and use the default settings of the underlying requests framework. In future if users want a non-default encoding, they can implement their own fetcher (see #1250) to achieve this.

I lean towards the same solution... but I'll still document the alternative (sechkova already said this, but just for emphasis): allow compression only for metadata. We know the metadata is not compressed "on disk", so that should always be safe, even with web servers broken as described in the comment below.
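That alternative could be sketched as a per-URL header policy. This is only an illustration of the idea, not project code: the function name and the assumption that metadata URLs end in .json are hypothetical.

```python
def accept_encoding_for(url: str) -> str:
    """Request compression only for metadata, which is never stored compressed."""
    if url.endswith(".json"):  # TUF metadata files are JSON documents
        return "gzip, deflate"  # safe: the transport can compress/decompress freely
    return "identity"  # target files may already be compressed on disk
```

A fetcher could set this value on each request so targets keep the conservative behaviour while metadata benefits from compression.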

AFAICT from a brief perusal pip now uses the default headers implemented by requests of 'Accept-Encoding': ', '.join(('gzip', 'deflate')).

In pip (src/pip/_internal/network/utils.py), the same 'identity' encoding trick is still used, with the same reasoning:

# We use Accept-Encoding: identity here because requests defaults to
# accepting compressed responses. This breaks in a variety of ways
# depending on how the server is configured.
# - Some servers will notice that the file isn't a compressible file
#   and will leave the file alone and with an empty Content-Encoding
# - Some servers will notice that the file is already compressed and
#   will leave the file alone, adding a Content-Encoding: gzip header
# - Some servers won't notice anything at all and will take a file
#   that's already been compressed and compress it again, and set
#   the Content-Encoding: gzip header
# By setting this to request only the identity encoding we're hoping
# to eliminate the third case.  Hopefully there does not exist a server
# which when given a file will notice it is already compressed and that
# you're not asking for a compressed file and will then decompress it
# before sending because if that's the case I don't think it'll ever be
# possible to make this work.
HEADERS = {'Accept-Encoding': 'identity'}  # type: Dict[str, str]

This is something I will have to deal with in the pip branch (to make sure metadata does get compressed).
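The second case above is the one that would break TUF's verification for targets: if a server labels an already-compressed file with Content-Encoding: gzip, a transport layer that honours the header decodes it once, and the bytes no longer match the hashes in the metadata. A minimal sketch with stdlib gzip/hashlib (the payload is illustrative):

```python
import gzip
import hashlib

target = gzip.compress(b"package payload")     # the .gz file listed in TUF metadata
expected = hashlib.sha256(target).hexdigest()  # hash recorded for the target

# Broken server: sends the bytes untouched but adds "Content-Encoding: gzip",
# so the transport layer decodes once and hands the client this instead:
decoded = gzip.decompress(target)
assert hashlib.sha256(decoded).hexdigest() != expected
```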

@sechkova
Contributor Author

sechkova commented Jan 5, 2021

Another observation is that currently TUF deals with raw response content.

IIUC the requests docs, we would like to deal with raw content for target files but use the requests framework to decode potentially compressed metadata files, as shown in the example:

import requests

r = requests.get(url, stream=True)  # url/filename defined elsewhere; iter_content yields decoded chunks
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)

@trishankatdatadog
Member

IIUC the requests docs, we would like to deal with raw content for target files but use the requests framework to decode potentially compressed metadata files, as shown in the example:

httpx has a similar API. My only concern here might be things like ZIP bomb attacks, but we should be able to avoid them as long as we iterate by reasonably-fixed-size chunks.
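Chunked iteration makes that guard straightforward: a hedged sketch of a reader that aborts once the decoded size passes a limit (the function name and limit handling are illustrative, not the actual ngclient code).

```python
import io
from typing import Iterable


def read_limited(chunks: Iterable[bytes], max_length: int) -> bytes:
    """Accumulate decoded chunks, aborting if the total exceeds max_length."""
    buf = io.BytesIO()
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if received > max_length:
            # Even a decompression bomb can only expand one chunk at a time,
            # so we fail before buffering an unbounded amount of data.
            raise ValueError(f"response exceeded {max_length} bytes")
        buf.write(chunk)
    return buf.getvalue()
```

The same pattern works whether the chunks come from requests' iter_content or an httpx response stream.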

@joshuagl joshuagl added the client Related to the client (updater) implementation label Jan 8, 2021
@joshuagl
Member

joshuagl commented Jan 8, 2021

I'm leaning towards leaving this behaviour as-is for the current codebase, and addressing this in the reference fetcher during our refactor efforts.

@jku
Member

jku commented Jan 8, 2021

I'm leaning towards leaving this behaviour as-is for the current codebase, and addressing this in the reference fetcher during our refactor efforts.

👍 This is fine for the pip use case if we manage to do the network I/O work -- then I can make sure metadata gets compressed in the pip-specific fetcher.

@joshuagl joshuagl added this to the Client Refactor milestone Jan 25, 2021
@jku jku added backlog Issues to address with priority for current development goals ngclient labels May 26, 2021
@sechkova sechkova self-assigned this Aug 27, 2021
@sechkova sechkova removed their assignment Nov 16, 2021
@jku jku added the 1.0.0 blockers To be addressed before 1.0.0 release label Jan 12, 2022
@jku jku self-assigned this Jan 12, 2022
@jku jku modified the milestones: Client Refactor, Sprint 15 Jan 12, 2022
jku pushed a commit to jku/python-tuf that referenced this issue Jan 13, 2022
This commit tries to deal with two interests:
* metadata is highly repetitive and compressible: allowing compression
  would be good
* there may be broken web servers (see
  https://github.com/pypa/pip/blob/404838abcca467648180b358598c597b74d568c9/src/pip/_internal/download.py#L842)
  that have problems with compression on already compressed target files

We can make things better for that first interest while we have no real
data for the second interest -- our current workarounds to avoid
compression are based on hearsay, not testing.

Now that individual fetchers are possible I suggest we simplify
ngclient and allow compression. As an example the pip Fetcher
could still use the pip response chunking code with all their
workarounds -- pip certainly has better capability to maintain
a mountain of workarounds and also has endless amounts of real-world
testing compared to python-tuf.

Details:
* Stop modifying Accept-Encoding (Requests default includes gzip)
* Don't use response.raw in RequestsFetcher as there is no need anymore
* simple_server.py in tests now supports gzip: this gets used in all
  tests as the requests default is to ask for "gzip".
* Fix issue in test_session_get_timeout(): it's not mocking the error
  that requests really raises in this case

Fixes theupdateframework#1251

Signed-off-by: Jussi Kukkonen <[email protected]>
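The simple_server.py bullet above could look roughly like this. This is a minimal sketch, not the project's actual test server: the in-memory CONTENT table and handler name are illustrative assumptions.

```python
import gzip
from http.server import BaseHTTPRequestHandler


class GzipHandler(BaseHTTPRequestHandler):
    """Sketch: gzip the response body only when the client asks for it."""

    # Hypothetical in-memory content; the real test server serves files from disk.
    CONTENT = {"/metadata/root.json": b'{"signed": {}, "signatures": []}'}

    def do_GET(self):
        body = self.CONTENT.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        if "gzip" in (self.headers.get("Accept-Encoding") or ""):
            body = gzip.compress(body)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep output quiet in tests
        pass
```

Since requests sends "gzip, deflate" by default, a server like this exercises the compressed path in every test, which matches the commit's intent.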
jku pushed a commit to jku/python-tuf that referenced this issue Jan 25, 2022
This commit tries to deal with two interests:
* metadata is highly repetitive and compressible: allowing compression
  would be good
* there may be broken web servers (see
  https://github.com/pypa/pip/blob/404838abcca467648180b358598c597b74d568c9/src/pip/_internal/download.py#L842)
  that have problems with compression on already compressed target files

We can make things better for that first interest while we have no real
data for the second interest -- our current workarounds to avoid
compression are based on hearsay, not testing.

Now that individual fetchers are possible I suggest we simplify
ngclient and allow compression. As an example the pip Fetcher
could still use the pip response chunking code with all their
workarounds -- pip certainly has better capability to maintain
a mountain of workarounds and also has endless amounts of real-world
testing compared to python-tuf.

Details:
* Stop modifying Accept-Encoding (Requests default includes gzip)
* Don't use response.raw in RequestsFetcher as there is no need:
  This was a workaround for false "Content-encoding: gzip" inserted by
  a broken server -- and the workaround was only possible because we
  knew we never asked for compression
* Fix issue in test_session_get_timeout(): it's not mocking the error
  that requests really raises in this case

Fixes theupdateframework#1251

Signed-off-by: Jussi Kukkonen <[email protected]>