[Bug]: Performance degradation of S3 Filesystem when working with large splittable files when closed before all data is consumed. #25991

Closed
2 of 15 tasks
vatanrathi opened this issue Mar 27, 2023 · 8 comments · Fixed by #26114

Comments

vatanrathi commented Mar 27, 2023

What happened?

Our Beam Java pipeline encounters severe performance issues on all Beam versions later than 2.30.0. After troubleshooting, we identified the enhancement made under BEAM-12329 as the likely root cause of the degradation.
Note that the pipeline runs fine for small datasets of a few MBs but is significantly slower for larger datasets (~10 GB and up).
It appears that on version 2.30 and lower, the amazon-web-services module was not draining the input stream, and the AWS SDK warned "Not all bytes were read from .... ", but in our case that warning is overly cautious: it makes an invalid assumption about the amount of remaining data and the cost of reading it versus aborting and reconnecting.
In versions after 2.30.0, the input stream is drained when close() is called, and subsequent requests for the remaining data have to reconnect and seek back to the position. That is significantly slower in our case: the pipeline keeps running for hours, as opposed to a few minutes on version 2.30 and lower.
Do you have any suggestions on how to achieve the same performance? I have tried both amazon-web-services and amazon-web-services2 on all versions after 2.30, and all show the same poor performance.
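
For context, here is a minimal standalone sketch (not Beam code; bucket, key and sizes are made up) of the trade-off described above: a ranged GET with AWS SDK v2 that is closed before the whole range is consumed, contrasting draining with aborting.

import java.io.IOException;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

public class RangedReadCloseSketch {
  public static void main(String[] args) throws IOException {
    String bucket = "my-bucket";              // hypothetical
    String key = "large-splittable-file.csv"; // hypothetical
    long position = 64L * 1024 * 1024;        // resume reading from 64 MiB

    try (S3Client s3 = S3Client.create()) {
      // Request an open-ended byte range, "bytes=<position>-", i.e. everything
      // from the current position to the end of the object.
      GetObjectRequest request = GetObjectRequest.builder()
          .bucket(bucket)
          .key(key)
          .range("bytes=" + position + "-")
          .build();
      ResponseInputStream<GetObjectResponse> in = s3.getObject(request);

      byte[] buffer = new byte[8192];
      int read = in.read(buffer); // consume only a small prefix of the range
      System.out.println("Read " + read + " bytes before closing");

      // Draining before close() would have to download the rest of the range,
      // which for a 10 GB+ object can take minutes per reader. abort() skips
      // that download at the cost of discarding the underlying connection.
      in.abort();
      in.close();
    }
  }
}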

Issue Priority

Priority: 3 (minor)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner
@vatanrathi
Author

@iemejia can you kindly help here?


iemejia commented Mar 30, 2023

Hi, I have been less involved in Beam recently. I wonder if what is going on is that the Amazon library is somehow slow with the connections. One thing to check is whether the delays happen when trying to close/drain the connections in the AWS IO connector. Note that I added this code because the library warns about not doing it, but from a quick look I have seen few connectors in other systems doing it.

Maybe this manifests because the Spark runner has many workers; is that the case? If so, it could definitely be a factor.

I cannot think of an easy fix apart from reverting the change, but we first need to check with the current maintainers, @aromanenko-dev @mosche, whether they are OK with that or whether there are other ideas.

@vatanrathi
Author

@iemejia You might be correct in saying that there could be an underlying issue with the AWS SDK.

This is what I did so far:

  1. beam-sdks-java-io-amazon-web-services - I tried a patch that removes the "drainInputStream" call from close(), and performance is the same across all the latest versions. But then the previous AWS warning about "Not all bytes read" comes back.

  2. beam-sdks-java-io-amazon-web-services2 - Applying the same patch to skip draining resulted in improved performance, but still a lot worse than SDK v1. I noticed there seems to be an issue with closing the ResponseInputStream, which appears to wait for a long time. In a sample test it took around 6 minutes to close, so I added an "abort()" call before the close/drain, and to my surprise that resulted in significantly improved performance, which is what I would expect from the latest Beam + Spark 3.

The logs below suggest that the program waited ~6 minutes to close the ResponseInputStream:
21:27:23 dtime="2023-03-30 21:27:15.978", thread="idle-connection-reaper", lvl="DEBUG", logger="software.amazon.awssdk.http.apache.internal.net.SdkSslSocket", ctx="debug", jobId="xxxxx", executionId="xxxxx", closing xxxxx.s3.ap-southeast-2.amazonaws.com/52.95.131.46:443
21:33:44 dtime="2023-03-30 21:33:33.406", thread="Executor task launch worker for task 4.0 in stage 0.0 (TID 4)", lvl="INFO", logger="org.apache.spark.storage.memory.MemoryStore", ctx="logInfo", jobId="", executionId="", Block rdd_8_4 stored as values in memory (estimated size 67.4 MiB, free 15.8 GiB)

After adding "abort" call before draining (https://github.com/apache/beam/blob/master/sdks/java/io/amazon-web-services2/src/main/java/org/apache/beam/sdk/io/aws2/s3/S3ReadableSeekableByteChannel.java#L168) on sdk2, I did not observe any wait ...
However, I am not sure If adding an "abort" call would cause any issue to my program or is it a bad choice

@aromanenko-dev
Contributor

@vatanrathi

I tried putting patch to remove "drainInputStream" call from close() and performance is same across all latest versions.

Do you mean that it was not improved with that patch?

I'm wondering if it's even possible that close() will be called under normal circumstances before all data is read?

@vatanrathi
Author

@aromanenko-dev Sorry if I was not clear before; let me explain.

Currently we are on Beam 2.23.0 and the given job finishes in around 10 minutes. I tried to upgrade to 2.45.0 and noticed performance issues with both AWS SDK v1 and v2. So I upgraded versions step by step, and that is where I noticed that performance started degrading from version 2.31.0, and where I found this change, which I believe is the root cause.

Below are my final findings, based on several iterations of tests.

  1. With AWS SDK v1, if drainInputStream is removed from the close() call, then execution time is the same across versions.
  2. However, with SDK v2 and the drainInputStream call in close(), the pipeline runs for hours, whereas it takes only ~10 minutes to finish on AWS SDK v1. If drainInputStream is removed, performance improves, but it still takes ~30 minutes to finish. But if s3ResponseInputStream.abort() is called before s3ResponseInputStream.close() in close(), then performance is significantly improved and the pipeline finishes within 3 minutes.
  @Override
  public void close() throws IOException {
    if (s3ResponseInputStream != null) {
      s3ResponseInputStream.abort(); // added: abort before draining/closing
      drainInputStream(s3ResponseInputStream);
      s3ResponseInputStream.close();
    }
    open = false;
  }

I found a bug, aws/aws-sdk-java-v2#2117, raised against aws-sdk-java-v2 for the close() call, which also complains that close() waits unexpectedly.

For your question "I'm wondering if it's even possible that close() will be called under normal circumstances before all data is read?": I don't know the exact answer, but I think because Beam reads data in bursts, S3 tries to close the connection while the data from the first fetch is still being processed.

If you think we can avoid the close() call by tweaking some HTTP connection parameter in the pipeline options, or in some other way, kindly let me know.


mosche commented Apr 3, 2023

@vatanrathi Thanks for raising this, your pointer to aws/aws-sdk-java-v2/issues/2117 is very helpful 👍 Trying to drain the input stream in all cases is certainly dangerous, considering that files might be very large and the byte range requested runs from "position" to the very end.

As a quick workaround, using abort (if position != contentLength) seems perfectly reasonable, particularly when dealing with large files. The drawback, though, is that the respective connections cannot be reused.

If it is not desired to read remaining data from the stream, you can explicitly abort the connection via abort(). Note that this will close the underlying connection and require establishing an HTTP connection which may outweigh the cost of reading the additional data.
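
For illustration, a conditional variant of the patch shared above might look roughly like the following sketch. It assumes the channel tracks position and contentLength as in that snippet, and it is not necessarily the fix that was eventually merged.

  @Override
  public void close() throws IOException {
    if (s3ResponseInputStream != null) {
      if (position != contentLength) {
        // Unread data remains in the requested range: abort rather than drain,
        // accepting that the underlying connection cannot be reused.
        s3ResponseInputStream.abort();
      } else {
        // Range fully consumed: draining is cheap and keeps the connection reusable.
        drainInputStream(s3ResponseInputStream);
      }
      s3ResponseInputStream.close();
    }
    open = false;
  }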

I'll have a closer look in the next few days and think about alternatives. It will probably make sense to read the data in chunks, to minimize the overhead when closing while still allowing connections to be reused.

@vatanrathi
Author

@mosche Thanks a lot for agreeing to look into it; it was causing a lot of trouble for us.

I would also need your comment on connection reuse. We have very large files (some are more than 200 GB) to process.
I have set up my HTTP client configuration as below:

options.setHttpClientConfiguration(HttpClientConfiguration.builder()
    .connectionTimeout(1000 * 60 * 60 * 10) // 10 hours
    .socketTimeout(1000 * 60 * 60 * 10)     // 10 hours
    .connectionMaxIdleTime(1000 * 10)       // 10 seconds
    .build());

This is to ensure that we DO NOT REUSE a connection from the pool that has been idle for more than 10 seconds, since S3 closes idle connections after 20 seconds, which could result in using an already-closed connection. This idle timeout matters because Beam processes data in bursts.

So I think the connection is closed every 10 seconds, which invokes the close() call. Do you think my configuration above is fine for this use case?

At this stage, I have upgraded to Beam 2.45.0 with Spark 3 and AWS SDK v2. I have patched the aws2 module to include the abort() call within the close() function, which is giving me the best performance in my SIT environment. Do you think I can take it to prod until a proper fix/workaround is implemented in the Beam SDK?


mosche commented Apr 3, 2023

@vatanrathi Note that if you use abort, connections won't be kept open; they are force-closed anyway.
But configuring the idle timeout this way should be fine 👍

connectionTimeout should be orders of magnitude lower; that is the timeout for establishing the connection!

The amount of time to wait when initially establishing a connection before giving up and timing out.

socketTimeout can be high. Nevertheless, as your large files are hopefully splittable, they are never read in one go, so you should be fine using a much lower timeout here as well.
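
Applied to the configuration shared earlier, that advice might look roughly like the sketch below (the concrete values are illustrative assumptions, not recommendations; it assumes the same HttpClientConfiguration builder as above, with values in milliseconds):

options.setHttpClientConfiguration(HttpClientConfiguration.builder()
    .connectionTimeout(10 * 1000)     // 10 seconds to establish a connection
    .socketTimeout(5 * 60 * 1000)     // 5 minutes per blocking socket read/write
    .connectionMaxIdleTime(10 * 1000) // 10 seconds, as before
    .build());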

@mosche mosche added aws and removed awaiting triage labels Apr 3, 2023
@mosche mosche changed the title [Bug]: BEAM-12329 causes performance issues [Bug]: Performance degradation of S3 Filesystem when working with large splittable files when closed before all data is consumed. Apr 3, 2023
@mosche mosche self-assigned this Apr 5, 2023
mosche pushed a commit to mosche/beam that referenced this issue Apr 5, 2023
@github-actions github-actions bot added this to the 2.48.0 Release milestone Apr 7, 2023