
S3: getObject, ResponseInputStream close not working as expected. #2117

Open
JstDust opened this issue Oct 26, 2020 · 5 comments
Labels
bug This issue is a bug. p3 This is a minor priority issue

Comments

@JstDust commented Oct 26, 2020

Describe the bug

When you invoke close on the ResponseInputStream returned by getObject before you reach the end of the data, the stream will not close immediately (or within a reasonable time) on big files.

Expected Behavior

When invoking close on a ResponseInputStream, you would expect it to close within a few seconds at most.

Current Behavior

It does not close until it has completed a full read of the stream internally (this seems to be related to internal connection recycling).

Steps to Reproduce

  • Upload a big file to S3, at least 1–2 GB, to be able to observe the issue on a fast internet connection, then run:

var r = client.getObject(c -> c.bucket(bucket).key(key).versionId(versionId));
r.read(new byte[1024]); // just to make sure we actually opened the resource and did something
r.close(); // now you have to wait until the entire file is read internally by close()

Possible Solution

  • The internal handler closes the stream immediately, possibly by simply closing the underlying connection.
  • It could make the decision based on the number of bytes remaining to be read, to keep the impact small on big files (e.g. if more than 1 MB remains, close the connection; otherwise consume the rest), as sketched below.
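A rough sketch of that decision (illustrative only; EarlyClose, closeEarly and DRAIN_THRESHOLD are hypothetical names, not SDK API):

import java.io.IOException;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

final class EarlyClose {
    private static final long DRAIN_THRESHOLD = 1024 * 1024; // 1 MB, arbitrary cutoff

    /** Abort if draining the remaining body would be expensive, otherwise let close() drain it. */
    static void closeEarly(ResponseInputStream<GetObjectResponse> in, long bytesReadSoFar)
            throws IOException {
        Long contentLength = in.response().contentLength();
        if (contentLength != null && contentLength - bytesReadSoFar > DRAIN_THRESHOLD) {
            in.abort(); // closes the underlying connection immediately, nothing is drained
        }
        in.close(); // cheap now: either already aborted, or only a little data is left to drain
    }
}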

Context

I was trying to create a test case for a wrapper I created for handling network-related issues (e.g. connection reset): it simply tracks the position in the stream and, when the connection fails, uses a range request to continue the stream transparently (see the sketch below).
While testing it with a wrapper around the ResponseInputStream that randomly throws an IOException, I noticed the extreme delay when closing the ResponseInputStream.
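A rough sketch of that wrapper idea (simplified and untested; a real version would cap the number of retries):

import java.io.IOException;
import java.io.InputStream;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

class ResumingS3InputStream extends InputStream {
    private final S3Client client;
    private final String bucket;
    private final String key;
    private ResponseInputStream<GetObjectResponse> in;
    private long position; // bytes successfully read so far

    ResumingS3InputStream(S3Client client, String bucket, String key) {
        this.client = client;
        this.bucket = bucket;
        this.key = key;
        this.in = client.getObject(b -> b.bucket(bucket).key(key));
    }

    @Override
    public int read() throws IOException {
        try {
            int b = in.read();
            if (b >= 0) position++;
            return b;
        } catch (IOException e) {
            reopen();      // resume from the last known position
            return read(); // unbounded retry, for brevity only
        }
    }

    private void reopen() {
        in.abort(); // drop the broken connection without draining it
        in = client.getObject(b -> b.bucket(bucket).key(key)
                                    .range("bytes=" + position + "-"));
    }

    @Override
    public void close() throws IOException {
        in.close(); // and here is where the delay described below shows up
    }
}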

Workaround

var r = client.getObject(c -> c.bucket(bucket).key(key).versionId(versionId));
r.read(new byte[1024]); // just to make sure we actually opened the resource and did something
r.abort(); // <<< workaround, but this is not part of the InputStream contract
r.close(); // now the connection closes almost immediately

Your Environment

  • AWS Java SDK version used: 2.15.14
  • JDK version used: 11
  • Operating System and version: Windows 10
@JstDust added the bug and needs-triage labels Oct 26, 2020
@debora-ito (Member) commented
It looks like abort is what you are looking for, according to the ResponseInputStream documentation:

If it is not desired to read remaining data from the stream, you can explicitly abort the connection via abort(). Note that this will close the underlying connection and require establishing an HTTP connection which may outweigh the cost of reading the additional data.

I'm not sure what the expected behavior of close is after a read has started; I'll do some research.

@debora-ito removed the needs-triage label Nov 10, 2020
@JstDust (Author) commented Nov 17, 2020

@debora-ito It's fine that there is an abort() call for this, but it simply does not work well when you use wrappers to actually read the contents of a stream, e.g. a simple InputStreamReader, a GZIPInputStream, etc.

Those are often returned directly, like below; at that point the abort() and the whole S3 backend are lost to the caller.

public Reader readText(String bucket, String key) {
  return new InputStreamReader(client.getObject(.....), StandardCharsets.UTF_8);
}

And the only thing that remains of the contract is the close() operation. In the Javadocs this is very clearly documented:
https://docs.oracle.com/javase/8/docs/api/java/io/InputStream.html#close--
https://docs.oracle.com/javase/8/docs/api/java/io/InputStreamReader.html#close--

"Closes the stream and releases any system resources associated with it."

I do not consider it an issue that on EOF the connection is returned to the pool. But if for some reason the stream gets closed early, continuing to read inside the close() call is effectively not part of the contract, and it does cause long waits when something fails, e.g. when using a try-with-resources statement like below.

try (BufferedReader reader = (...s3call...)) {
  String line;
  while ((line = reader.readLine()) != null) {
    if (line.length() < 10) throw new IllegalStateException("Line too short");
    ....some code...
  }
} // now have fun waiting if that is the first line of a multi-gigabyte file

So I consider it a violation of the InputStream#close() contract that getObject returns an InputStream whose close() continues reading.
Also, the gains of this aggressive return-to-pool behavior seem minimal at best, especially on an early close. Most of the time it is probably more IO- and CPU-friendly to just close the underlying connection whenever it cannot be returned to the pool immediately when close is invoked.

@shearn89 commented Mar 4, 2022

This appears to happen with other streams as well. I'm reading from a Kinesis Video stream that returns a GetMediaResponse for the important bit (reading the data). When I'm done, for example because I've hit my stopFragmentNumber, I stop processing. When I then call close() on the ResponseInputStream<GetMediaResponse> object, it takes about 3 seconds to close.

@shearn89 commented Mar 4, 2022

I can use the same workaround by doing:

ResponseInputStream<GetMediaResponse> inputStream =
        (ResponseInputStream<GetMediaResponse>) kvsStreamTrackObject.getInputStream();
inputStream.abort();

...although I then also have to call .release(), or else it throws an error on close.
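So the full sequence looks roughly like this (release() as described above; whether and where it is available may depend on the SDK version):

inputStream.abort();   // close the underlying connection immediately
inputStream.release(); // as noted above, avoids an error on the subsequent close()
inputStream.close();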

aws-sdk-java-automation added a commit that referenced this issue Aug 10, 2022
@yasminetalby added the p3 label Nov 12, 2022
@hohwille commented
I fully agree with @JstDust. As a temporary workaround one has to create one's own S3InputStream class that delegates all method calls to a wrapped ResponseInputStream, but whose close() method first calls abort() and then close(), as sketched below.
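A minimal sketch of such a wrapper (untested, assuming the S3 GetObjectResponse stream):

import java.io.FilterInputStream;
import java.io.IOException;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

class S3InputStream extends FilterInputStream {
    private final ResponseInputStream<GetObjectResponse> delegate;

    S3InputStream(ResponseInputStream<GetObjectResponse> delegate) {
        super(delegate); // FilterInputStream forwards all other calls to the delegate
        this.delegate = delegate;
    }

    @Override
    public void close() throws IOException {
        delegate.abort(); // drop the connection instead of draining the remaining body
        super.close();
    }
}

Note that the unconditional abort() also prevents connection reuse when the stream was fully read; a smarter version would only abort when EOF has not yet been reached.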

@debora-ito IMHO there is not much to clarify here, as this is just the contract every Java developer would expect. I can imagine that it is a huge effort to get the S3 clients for all the different programming languages perfect, and that it may be tricky to change existing behavior, but IMHO this is the way to go.
Java developers just want a seamless API experience for S3. After all, S3 stands for "Simple Storage Service", so developers would love to have it simple instead of figuring everything out themselves and adding workarounds for gaps in the SDK, such as also the missing ability to put via an OutputStream.

Could we not expect things like this to be solved out of the box by the SDK of the #1 hyperscaler?
From my PoV this would be lovely...
