[BUG] netty OOM with MULTITHREADED shuffle #9153

Closed · abellina opened this issue Aug 31, 2023 · 6 comments · Fixed by #9344
Labels: bug (Something isn't working) · performance (A performance related task/issue) · reliability (Features to improve reliability or bugs that severely impact the reliability of the plugin) · shuffle (things that impact the shuffle plugin)

Comments

@abellina (Collaborator) commented Aug 31, 2023:

I ran NDS at 100TB and hit a Netty OOM in query50. This needs further investigation; either our limits are not working or there is a leak somewhere:

io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 20999453 byte(s) of direct memory (used: 15263094153, max: 15271460864)
  at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:802)
  at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:731)
  at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:632)
  at io.netty.buffer.PoolArena$DirectArena.newUnpooledChunk(PoolArena.java:621)
  at io.netty.buffer.PoolArena.allocateHuge(PoolArena.java:213)
  at io.netty.buffer.PoolArena.allocate(PoolArena.java:141)
  at io.netty.buffer.PoolArena.allocate(PoolArena.java:126)
  at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:395)
  at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:188)
  at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:179)
  at io.netty.buffer.CompositeByteBuf.allocBuffer(CompositeByteBuf.java:1872)
  at io.netty.buffer.CompositeByteBuf.consolidate0(CompositeByteBuf.java:1751)
  at io.netty.buffer.CompositeByteBuf.consolidate(CompositeByteBuf.java:1739)
  at org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:179)
  at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:98)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:365)
  at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:357)
  at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
  at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
abellina added the bug, ? - Needs Triage, reliability, and shuffle labels on Aug 31, 2023
abellina changed the title to "[BUG] netty OOM with MULTITHREADED shuffle" on Aug 31, 2023
mattahrens removed the ? - Needs Triage label on Sep 6, 2023
@abellina (Collaborator, Author) commented Sep 6, 2023:

Note that I also see this with Parquet when the number of cores is high. As a test, I increased spark.executor.cores from 16 to 64. I am not recommending this config yet, but it seems to push the limits of the host memory available for shuffle. With this many cores the shuffle stage has more tasks in flight, so something is definitely amiss with the limits we have in place. A rough sketch of this kind of test setup is below.
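A minimal sketch of the test setup described above, assuming a standard SparkSession and the spark-rapids MULTITHREADED shuffle mode; the values are only the ones mentioned in this comment and are not a recommendation:

```scala
import org.apache.spark.sql.SparkSession

// Illustrative only: raise executor cores from 16 to 64 so more shuffle tasks
// run concurrently per executor, stressing the host memory used by the
// multithreaded shuffle. In a real cluster these settings would normally be
// passed at submit time rather than in the session builder.
val spark = SparkSession.builder()
  .appName("nds-mt-shuffle-stress")
  .config("spark.executor.cores", "64")                 // baseline run used 16
  .config("spark.rapids.shuffle.mode", "MULTITHREADED") // multithreaded shuffle, per the issue title
  .getOrCreate()
```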

@abellina (Collaborator, Author) commented Sep 8, 2023:

Setting spark.rapids.shuffle.multiThreaded.maxBytesInFlight fixed this for me. I calculated a value that stays within what Netty computes as its max size (https://github.com/netty/netty/blob/4.1/common/src/main/java/io/netty/util/internal/PlatformDependent.java#L1231C51-L1231C70). Netty allows overriding this maximum with a Java system property (-Dio.netty.maxDirectMemory); by default it falls back to the maximum the JVM reports is available off heap.

Normally this should default to -Xmx, according to a similar Spark issue (https://issues.apache.org/jira/browse/SPARK-27991). In this case the value is lower than -Xmx (I started with a 16GB heap), so it's not clear why that would be the case. The Spark strategy for dealing with this issue, which we also follow, is to retry when an allocation like this fails.

We may need to decrease our default maxBytesInFlight or configure this dynamically to make it easier to use.
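A minimal sketch, assuming Netty 4.1's internal PlatformDependent API, of how to see the cap Netty is enforcing (the "max" reported in the stack trace above); this is only a diagnostic aid, not part of the fix described in this comment:

```scala
import io.netty.util.internal.PlatformDependent

object NettyDirectMemoryCheck {
  def main(args: Array[String]): Unit = {
    // The cap Netty enforces for direct allocations: io.netty.maxDirectMemory
    // if the property is set (a raw byte count), otherwise a value Netty
    // derives from what the JVM reports is available off heap.
    println(s"Netty max direct memory: ${PlatformDependent.maxDirectMemory()} bytes")
  }
}
```

To raise that cap explicitly on executors, the property would have to reach the executor JVMs, for example via spark.executor.extraJavaOptions=-Dio.netty.maxDirectMemory=17179869184 (16 GiB written as a raw byte count, since the 4.1 code linked above reads the property as a plain long); treat the exact value as an illustrative assumption to be tuned, not a recommendation.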

@revans2 (Collaborator) commented Sep 8, 2023:

Great work @abellina. We really need to figure out how to document this and how it fits in with our off-heap memory limit story.

abellina added the performance label on Sep 18, 2023
@abellina (Collaborator, Author) commented:

I added the reliability tag because we would like to document the behavior of the maxBytesInFlight setting with respect to performance, come up with better defaults, and likely also provide a way to tie this into the host allocator interface.

@abellina (Collaborator, Author) commented:

Given the runs I have made, I believe setting this config to somewhere between 64MB and 128MB will be a good enough default. I am running more iterations, but that's the current status.

@abellina (Collaborator, Author) commented:

Here are the findings at 3TB, running in the cloud (Dataproc) on 4 x n1-standard-16 nodes with 16 threads for the MT shuffle. There is a lot of noise in here, but I have another datapoint from our performance cluster for query67 specifically (this query is particularly sensitive to the MT shuffle). Given the Dataproc and perf cluster data, I am going to set the config to 128MB, and we'll need to continue this work as part of the host memory framework (#8900).

[Screenshot from 2023-09-29 09-17-30 attached]
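A hedged sketch of applying the 128MB value settled on above, assuming a standard SparkSession builder; 134217728 is 128 MiB expressed in bytes, and the shuffle mode key is the one implied by the issue title:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: cap the multithreaded shuffle's in-flight bytes at 128 MiB
// (134217728 bytes), the value this thread converges on.
val spark = SparkSession.builder()
  .config("spark.rapids.shuffle.mode", "MULTITHREADED")
  .config("spark.rapids.shuffle.multiThreaded.maxBytesInFlight", "134217728")
  .getOrCreate()
```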
