We are using the latest code of PMEM-SHUFFLE (branch-1.2) to run TeraSort with Spark 3.1.1 and hit the error below:
2021-08-16 17:47:40,924 INFO scheduler.TaskSetManager: Starting task 696.0 in stage 2.0 (TID 3996) (vsr221, executor 4, partition 696, PROCESS_LOCAL, 4282 bytes) taskResourceAssignments Map()
2021-08-16 17:47:40,952 WARN scheduler.TaskSetManager: Lost task 687.0 in stage 2.0 (TID 3987) (vsr221 executor 4): org.apache.spark.shuffle.FetchFailedException: failed to allocate 16777216 byte(s) of direct memory (used: 64424509440, max: 64424509440)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:770)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:685)
at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:70)
at org.apache.spark.util.CompletionIterator.next(CompletionIterator.scala:29)
at scala.collection.Iterator.foreach(Iterator.scala:941)
at scala.collection.Iterator.foreach$(Iterator.scala:941)
at org.apache.spark.util.CompletionIterator.foreach(CompletionIterator.scala:25)
at scala.collection.TraversableOnce.size(TraversableOnce.scala:110)
at scala.collection.TraversableOnce.size$(TraversableOnce.scala:108)
at org.apache.spark.util.CompletionIterator.size(CompletionIterator.scala:25)
at org.apache.spark.shuffle.BaseShuffleReader.read(BaseShuffleReader.scala:93)
at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:106)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:373)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:337)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:131)
at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 16777216 byte(s) of direct memory (used: 64424509440, max: 64424509440)
at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(PlatformDependent.java:754)
at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(PlatformDependent.java:709)
at io.netty.buffer.PoolArena$DirectArena.allocateDirect(PoolArena.java:755)
at io.netty.buffer.PoolArena$DirectArena.newChunk(PoolArena.java:731)
at io.netty.buffer.PoolArena.allocateNormal(PoolArena.java:247)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:227)
at io.netty.buffer.PoolArena.allocate(PoolArena.java:147)
at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(PooledByteBufAllocator.java:356)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:187)
at io.netty.buffer.AbstractByteBufAllocator.directBuffer(AbstractByteBufAllocator.java:178)
at io.netty.buffer.AbstractByteBufAllocator.ioBuffer(AbstractByteBufAllocator.java:139)
at io.netty.channel.DefaultMaxMessagesRecvByteBufAllocator$MaxMessageHandle.allocate(DefaultMaxMessagesRecvByteBufAllocator.java:114)
at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:147)
at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:714)
at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:650)
at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:576)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:493)
at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:989)
at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
... 1 more
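For context (this reading is ours, not part of the original report): 64424509440 bytes is 60 GiB, so the executor's Netty direct-memory pool is already at its 60 GiB cap when the shuffle fetcher asks for one more 16 MiB chunk. A mitigation that is sometimes tried for this class of failure is to keep large remote shuffle blocks out of direct memory and to bound in-flight fetches. The sketch below uses standard Spark 3.1 configuration keys; the specific values are illustrative assumptions, not a verified fix for PMEM-SHUFFLE:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.sql.SparkSession

// Sketch only: stock Spark 3.1 knobs that reduce Netty direct-memory
// pressure during shuffle fetch. Values are illustrative assumptions.
val conf = new SparkConf()
  // Stream remote shuffle blocks above this size to disk instead of
  // holding them in Netty direct buffers (default 200m in Spark 3.x).
  .set("spark.maxRemoteBlockSizeFetchToMem", "64m")
  // Bound the total size of fetch requests in flight per reduce task
  // (48m is the default; lower it further if OOMs persist).
  .set("spark.reducer.maxSizeInFlight", "48m")
  // Explicitly raise the JVM direct-memory ceiling; 80g is an assumed
  // value chosen above the 60 GiB limit seen in the log.
  .set("spark.executor.extraJavaOptions", "-XX:MaxDirectMemorySize=80g")

val spark = SparkSession.builder().config(conf).getOrCreate()
```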
The Spark configuration is shown below. The cluster contains 1 master and 3 workers; each worker has 96 cores, 384 GB of DRAM, and 3 × 1.8 TB NVMe disks. The network bandwidth is 10 Gbps.