
Client explicitly asks server to not compress data #1251

Closed
sechkova opened this issue Dec 22, 2020 · 8 comments · Fixed by #1774
Assignees
Labels
1.0.0 blockers To be addressed before 1.0.0 release backlog Issues to address with priority for current development goals client Related to the client (updater) implementation ngclient
Milestone

Comments

@sechkova
Contributor

Description of issue or feature request:

After the acceptance of TAP 10, the framework no longer handles compressed metadata. Instead,

the specification should instead recommend that compression be implemented at the presentation layer of the OSI protocol stack, which is better suited to handle decompression of data. Once decompressed, metadata can be validated against the hashes and signatures listed in the Snapshot file.

However, the implementation explicitly asks the server not to compress data, to avoid potential issues with some server configurations.

Two questions arise from this observation:

  • Do all the calculations up to now (for example the ones in PEP 458) consider uncompressed metadata?
  • Should the implementation match TAP 10 and allow handling of compressed metadata at a lower level?

Current behavior:
Client explicitly asks server to not compress data

Expected behavior:
Allow handling of compressed metadata at the presentation layer

@joshuagl
Member

joshuagl commented Jan 4, 2021

Looks like this session header was introduced (d03dd0f) back in 2013, when adding support for SSL certificate verification, and has since been carried across multiple rewrites of tuf.download.

Use of this header was originally copied from pip (which introduced it to handle some unexpected behaviour in urllib2). AFAICT from a brief perusal, pip now uses the default headers implemented by requests: 'Accept-Encoding': ', '.join(('gzip', 'deflate')).

@joshuagl
Member

joshuagl commented Jan 4, 2021

However, the implementation explicitly asks server to not compress data to avoid potential issues with some server configurations.

As far as I can tell with some git archaeology, the server configurations in question are those which supply unexpected encodings when Accept-Encoding is empty.

I vote that we remove any setting of 'Accept-Encoding' and use the default settings of the underlying requests framework. In future if users want a non-default encoding, they can implement their own fetcher (see #1250) to achieve this.

@trishankatdatadog
Member

Looks like this session header was introduced (d03dd0f) back in 2013, when adding support for SSL certificate verification, and has since been carried across multiple rewrites of tuf.download.

I did not sign that commit, so it might or might not have been me from a different life...

@jku
Member

jku commented Jan 5, 2021

I vote that we remove any setting of 'Accept-Encoding' and use the default settings of the underlying requests framework. In future if users want a non-default encoding, they can implement their own fetcher (see #1250) to achieve this.

I lean towards the same solution... but I'll still document the alternative (sechkova already said this, but just for emphasis): allow compression only for metadata. We know the metadata is not compressed "on disk", so that should always be safe, even with web servers broken as described in the comment below.
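That alternative could be sketched as a per-URL header policy. This is only an illustration of the idea, not project code: the function name and the assumption that metadata URLs end in .json are hypothetical.

```python
def accept_encoding_for(url: str) -> str:
    """Request compression only for metadata, which is never stored compressed."""
    if url.endswith(".json"):  # TUF metadata files are JSON documents
        return "gzip, deflate"  # safe: the transport can compress/decompress freely
    return "identity"  # target files may already be compressed on disk
```

A fetcher could set this value on each request so targets keep the conservative behaviour while metadata benefits from compression.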

AFAICT from a brief perusal pip now uses the default headers implemented by requests of 'Accept-Encoding': ', '.join(('gzip', 'deflate')).

In pip (src/pip/_internal/network/utils.py), the same 'identity' encoding trick is still used, with the same reasoning:

# We use Accept-Encoding: identity here because requests defaults to
# accepting compressed responses. This breaks in a variety of ways
# depending on how the server is configured.
# - Some servers will notice that the file isn't a compressible file
#   and will leave the file alone and with an empty Content-Encoding
# - Some servers will notice that the file is already compressed and
#   will leave the file alone, adding a Content-Encoding: gzip header
# - Some servers won't notice anything at all and will take a file
#   that's already been compressed and compress it again, and set
#   the Content-Encoding: gzip header
# By setting this to request only the identity encoding we're hoping
# to eliminate the third case.  Hopefully there does not exist a server
# which when given a file will notice it is already compressed and that
# you're not asking for a compressed file and will then decompress it
# before sending because if that's the case I don't think it'll ever be
# possible to make this work.
HEADERS = {'Accept-Encoding': 'identity'}  # type: Dict[str, str]

This is something I will have to deal with in the pip branch (to make sure metadata does get compressed).
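The second case above is the one that would break TUF's verification for targets: if a server labels an already-compressed file with Content-Encoding: gzip, a transport layer that honours the header decodes it once, and the bytes no longer match the hashes in the metadata. A minimal sketch with stdlib gzip/hashlib (the payload is illustrative):

```python
import gzip
import hashlib

target = gzip.compress(b"package payload")     # the .gz file listed in TUF metadata
expected = hashlib.sha256(target).hexdigest()  # hash recorded for the target

# Broken server: sends the bytes untouched but adds "Content-Encoding: gzip",
# so the transport layer decodes once and hands the client this instead:
decoded = gzip.decompress(target)
assert hashlib.sha256(decoded).hexdigest() != expected
```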

@sechkova
Contributor Author

sechkova commented Jan 5, 2021

Another observation is that currently TUF deals with raw response content.

IIUC the requests docs, we would like to deal with raw content for target files but use the requests framework to decode potentially compressed metadata files, as shown in the example:

import requests

r = requests.get(url, stream=True)  # url/filename defined elsewhere; iter_content yields decoded chunks
with open(filename, 'wb') as fd:
    for chunk in r.iter_content(chunk_size=128):
        fd.write(chunk)

@trishankatdatadog
Member

IIUC the requests docs, we would like to deal with raw content for target files but use the requests framework to decode potentially compressed metadata files, as shown in the example:

httpx has a similar API. My only concern here might be things like ZIP bomb attacks, but we should be able to avoid them as long as we iterate by reasonably-fixed-size chunks.
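Chunked iteration makes that guard straightforward: a hedged sketch of a reader that aborts once the decoded size passes a limit (the function name and limit handling are illustrative, not the actual ngclient code).

```python
import io
from typing import Iterable


def read_limited(chunks: Iterable[bytes], max_length: int) -> bytes:
    """Accumulate decoded chunks, aborting if the total exceeds max_length."""
    buf = io.BytesIO()
    received = 0
    for chunk in chunks:
        received += len(chunk)
        if received > max_length:
            # Even a decompression bomb can only expand one chunk at a time,
            # so we fail before buffering an unbounded amount of data.
            raise ValueError(f"response exceeded {max_length} bytes")
        buf.write(chunk)
    return buf.getvalue()
```

The same pattern works whether the chunks come from requests' iter_content or an httpx response stream.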

@joshuagl joshuagl added the client Related to the client (updater) implementation label Jan 8, 2021
@joshuagl
Member

joshuagl commented Jan 8, 2021

I'm leaning towards leaving this behaviour as-is for the current codebase, and addressing this in the reference fetcher during our refactor efforts.

@jku
Member

jku commented Jan 8, 2021

I'm leaning towards leaving this behaviour as-is for the current codebase, and addressing this in the reference fetcher during our refactor efforts.

👍 This is fine for the pip use case if we manage to do the network I/O work -- then I can make sure metadata gets compressed in the pip-specific fetcher.

@joshuagl joshuagl added this to the Client Refactor milestone Jan 25, 2021
@jku jku added backlog Issues to address with priority for current development goals ngclient labels May 26, 2021
@sechkova sechkova self-assigned this Aug 27, 2021
@sechkova sechkova removed their assignment Nov 16, 2021
@jku jku added the 1.0.0 blockers To be addressed before 1.0.0 release label Jan 12, 2022
@jku jku self-assigned this Jan 12, 2022
@jku jku modified the milestones: Client Refactor, Sprint 15 Jan 12, 2022
jku pushed a commit to jku/python-tuf that referenced this issue Jan 13, 2022
This commit tries to deal with two interests:
* metadata is highly repetitive and compressible: allowing compression
  would be good
* there may be broken web servers (see
  https://github.com/pypa/pip/blob/404838abcca467648180b358598c597b74d568c9/src/pip/_internal/download.py#L842)
  that have problems with compression on already compressed target files

We can make things better for that first interest while we have no real
data for the second interest -- our current workarounds to avoid
compression are based on hearsay, not testing.

Now that individual fetchers are possible I suggest we simplify
ngclient and allow compression. As an example the pip Fetcher
could still use the pip response chunking code with all their
workarounds -- pip certainly has better capability to maintain
a mountain of workarounds and also has endless amounts of real-world
testing compared to python-tuf.

Details:
* Stop modifying Accept-Encoding (Requests default includes gzip)
* Don't use response.raw in RequestsFetcher as there is no need anymore
* simple_server.py in tests now supports gzip: this gets used in all
  tests as the requests default is to ask for "gzip".
* Fix issue in test_session_get_timeout(): it's not mocking the error
  that requests really raises in this case

Fixes theupdateframework#1251

Signed-off-by: Jussi Kukkonen <[email protected]>
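The simple_server.py bullet above could look roughly like this. This is a minimal sketch, not the project's actual test server: the in-memory CONTENT table and handler name are illustrative assumptions.

```python
import gzip
from http.server import BaseHTTPRequestHandler


class GzipHandler(BaseHTTPRequestHandler):
    """Sketch: gzip the response body only when the client asks for it."""

    # Hypothetical in-memory content; the real test server serves files from disk.
    CONTENT = {"/metadata/root.json": b'{"signed": {}, "signatures": []}'}

    def do_GET(self):
        body = self.CONTENT.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        if "gzip" in (self.headers.get("Accept-Encoding") or ""):
            body = gzip.compress(body)
            self.send_header("Content-Encoding", "gzip")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep output quiet in tests
        pass
```

Since requests sends "gzip, deflate" by default, a server like this exercises the compressed path in every test, which matches the commit's intent.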
jku pushed a commit to jku/python-tuf that referenced this issue Jan 25, 2022
This commit tries to deal with two interests:
* metadata is highly repetitive and compressible: allowing compression
  would be good
* there may be broken web servers (see
  https://github.com/pypa/pip/blob/404838abcca467648180b358598c597b74d568c9/src/pip/_internal/download.py#L842)
  that have problems with compression on already compressed target files

We can make things better for that first interest while we have no real
data for the second interest -- our current workarounds to avoid
compression are based on hearsay, not testing.

Now that individual fetchers are possible I suggest we simplify
ngclient and allow compression. As an example the pip Fetcher
could still use the pip response chunking code with all their
workarounds -- pip certainly has better capability to maintain
a mountain of workarounds and also has endless amounts of real-world
testing compared to python-tuf.

Details:
* Stop modifying Accept-Encoding (Requests default includes gzip)
* Don't use response.raw in RequestsFetcher as there is no need:
  This was a workaround for false "Content-encoding: gzip" inserted by
  a broken server -- and the workaround was only possible because we
  knew we never asked for compression
* Fix issue in test_session_get_timeout(): it's not mocking the error
  that requests really raises in this case

Fixes theupdateframework#1251

Signed-off-by: Jussi Kukkonen <[email protected]>