
Compressed, chunk encoded streams are not flushed immediately #183

Merged: 1 commit merged into mitre:master on Oct 30, 2020

Conversation

matthiasblaesing (Contributor)

It was observed that chunk-encoded streams were not correctly passed
through the proxy servlet. Analysis showed that the stream was gzip
compressed.

The issue lies in Apache HttpClient and the JDK: if the stream delivered
by Apache HttpClient is compressed, it delegates to GZIPInputStream. The
#available implementation of InflaterInputStream (the parent of
GZIPInputStream) returns 1 until EOF is reached.

This is not consistent with the contract of InputStream#available, which
states:

    A single read or skip of this many bytes will not block, but may
    read or skip fewer bytes.

To work around this, the optimization that checks #available is bypassed
and a flush is issued after every write.
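
For context, a minimal sketch (illustrative names, not the servlet's actual code) of the affected copy loop and the flush-always workaround:

  import java.io.IOException;
  import java.io.InputStream;
  import java.io.OutputStream;

  // With a raw socket stream, in.available() == 0 is a usable
  // "the next read may block" signal and thus a good moment to flush.
  // A GZIPInputStream reports 1 until EOF, so a flush guarded by that
  // check never fires and chunks sit in the output buffer until the
  // stream ends. Flushing unconditionally sidesteps the problem.
  final class ChunkedCopy {
    static void copy(InputStream in, OutputStream out) throws IOException {
      byte[] buffer = new byte[8 * 1024];
      int read;
      while ((read = in.read(buffer)) != -1) {
        out.write(buffer, 0, read);
        out.flush(); // workaround: do not trust in.available()
      }
    }
  }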

matthiasblaesing (Contributor, Author) commented Oct 23, 2020

I found an alternative: disable decompression handling in the client, so the stream can be passed through directly:

  /**
   * Called from {@link #init(javax.servlet.ServletConfig)}.
   * HttpClient offers many opportunities for customization.
   * In any case, it should be thread-safe.
   */
  protected HttpClient createHttpClient() {
    HttpClientBuilder clientBuilder = getHttpClientBuilder()
                                        .setDefaultRequestConfig(buildRequestConfig())
                                        .setDefaultSocketConfig(buildSocketConfig());
    
    clientBuilder.setMaxConnTotal(maxConnections);
    clientBuilder.setMaxConnPerRoute(maxConnections);
    // Pass compressed bodies through verbatim: HttpClient no longer
    // decompresses gzip/deflate responses and no longer adds its own
    // Accept-Encoding header to the proxied request.
    clientBuilder.disableContentCompression();

    if (useSystemProperties)
      clientBuilder = clientBuilder.useSystemProperties();
    return buildHttpClient(clientBuilder);
  }

This has the drawback that compression filters running in the web container need to respect already-compressed data. That sounds reasonable, but ...

dsmiley (Collaborator) commented Oct 25, 2020

Passing the stream directly through makes sense to me. Why decompress then re-compress after all?

matthiasblaesing (Contributor, Author)

Updated in place. The benefit is that compression schemes other than gzip and deflate are now supported, provided the backend server can handle them.

The potential problem I see: if a user has a very simple compression filter active that encodes the reply regardless of any Content-Encoding header already set, this will break. I checked Jetty's GzipHandler, and it correctly checks for an existing Content-Encoding header to decide whether it should compress the stream coming from the servlet.
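
To illustrate (a sketch only, not Jetty's actual implementation; the class and method names are made up), the decisive check looks like this:

  import javax.servlet.http.HttpServletResponse;

  // The check a well-behaved compression filter performs before wrapping
  // the response. A "very simple" filter that skips it would gzip the
  // already-encoded body the proxy now passes through.
  final class CompressionGuard {
    static boolean shouldCompress(HttpServletResponse response) {
      // an existing Content-Encoding means the body is already compressed
      return response.getHeader("Content-Encoding") == null;
    }
  }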

dsmiley (Collaborator) commented Oct 25, 2020

    The potential problem I see: if a user has a very simple compression
    filter active that encodes the reply regardless of any Content-Encoding
    header already set, this will break.

I'd argue it would be the fault of the hypothetical filter you speak of. I'm glad Jetty does this right; thanks for investigating.

Let's add a changes entry to mention gzip decompression isn't done any more; the compression (or lack thereof) from the origin is passed right through. Users could configure it back if they want.

matthiasblaesing (Contributor, Author)

I updated the change with a configuration parameter to restore the old behavior, plus a unit test that verifies sane handling of the passed-on Accept-Encoding header. The change is documented in the CHANGES.md file.
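
For illustration, opting back into the old behavior from code could look like the sketch below. The parameter name handleCompression reflects my reading of the change (verify it against CHANGES.md), and the mapping and backend URL are placeholders:

  import javax.servlet.ServletContext;
  import javax.servlet.ServletRegistration;
  import org.mitre.dsmiley.httpproxy.ProxyServlet;

  final class ProxyRegistration {
    // Programmatic equivalent of the web.xml init-params.
    static void register(ServletContext ctx) {
      ServletRegistration.Dynamic proxy = ctx.addServlet("proxy", ProxyServlet.class);
      proxy.addMapping("/proxy/*");
      proxy.setInitParameter("targetUri", "http://backend.example.com");
      // restore the old behavior: let HttpClient decompress gzip/deflate
      proxy.setInitParameter("handleCompression", "true");
    }
  }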

matthiasblaesing (Contributor, Author)

And another update: ChunkedTransferTest is modified to test all combinations of:

  • backend supports compression: true/false
  • compression handling is delegated to Apache HttpClient: true/false

To complete the picture, the workaround for the non-conforming InputStream#available implementation in the JDK is also kept. This should cover all possible combinations with minimal impact.
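
Schematically (an illustrative sketch, not the actual ChunkedTransferTest), the matrix has this shape:

  import java.util.Arrays;
  import org.junit.Test;
  import org.junit.runner.RunWith;
  import org.junit.runners.Parameterized;
  import org.junit.runners.Parameterized.Parameters;

  // Each boolean pair runs the chunked-transfer check once, covering
  // all four backend/client compression combinations.
  @RunWith(Parameterized.class)
  public class ChunkedTransferMatrixSketch {
    @Parameters(name = "backendCompresses={0}, clientHandlesCompression={1}")
    public static Iterable<Object[]> combinations() {
      return Arrays.asList(new Object[][] {
        {false, false}, {false, true}, {true, false}, {true, true}
      });
    }

    private final boolean backendCompresses;
    private final boolean clientHandlesCompression;

    public ChunkedTransferMatrixSketch(boolean backendCompresses,
                                       boolean clientHandlesCompression) {
      this.backendCompresses = backendCompresses;
      this.clientHandlesCompression = clientHandlesCompression;
    }

    @Test
    public void chunksArriveBeforeEndOfStream() {
      // hypothetical body: start a backend that trickles chunks
      // (gzip-compressed when backendCompresses is true), proxy with
      // handleCompression set from clientHandlesCompression, and assert
      // each chunk reaches the client before the stream ends.
    }
  }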

dsmiley (Collaborator) commented Oct 29, 2020

+1 LGTM. Fantastic testing! Ready to merge?

It was observed that chunk-encoded streams were not correctly passed
through the proxy servlet. Analysis showed that the stream was gzip
compressed.

The issue lies in Apache HttpClient and the JDK: if the stream delivered
by Apache HttpClient is compressed, it delegates to GZIPInputStream. The
#available implementation of InflaterInputStream (the parent of
GZIPInputStream) returns 1 until EOF is reached.

This is not consistent with the contract of InputStream#available, which
states:

    A single read or skip of this many bytes will not block, but may
    read or skip fewer bytes.

To work around this, two options are established:

- by default, the stream from the backend is not decompressed and is
  passed through as-is. In this case the problem is avoided because no
  decompression happens.

- an option is provided to enable compression handling by Apache
  HttpClient. If that compression handling is enabled, the workaround
  is to not rely on InputStream#available but to flush always.

matthiasblaesing (Contributor, Author)

Yes, I think this is ready. I just did an additional manual test with my problematic source, and the transfers look sane. I pushed a minimal update that adjusts the name of the unit test for the Accept-Encoding header and makes a slight adjustment to the CHANGES.md entry.

dsmiley merged commit 34edf58 into mitre:master on Oct 30, 2020