
[SPARK-29435][Core]MapId in Shuffle Block is inconsistent at the writer and reader part when spark.shuffle.useOldFetchProtocol=true #26095

Closed
wants to merge 4 commits into from

Conversation

sandeep-katta
Contributor

@sandeep-katta sandeep-katta commented Oct 11, 2019

What changes were proposed in this pull request?

The shuffle block ID is constructed inconsistently between the shuffle write and shuffle read paths.

Shuffle Map Task (Shuffle Write)
19/10/11 22:07:32| ERROR| [Executor task launch worker for task 3] org.apache.spark.shuffle.IndexShuffleBlockResolver: ####### For Debug ############ /tmp/hadoop-root1/nm-local-dir/usercache/root1/appcache/application_1570422377362_0008/blockmgr-6d03250d-6e7c-4bc2-bbb7-22b8e3981c35/0d/shuffle_0_3_0.index

Result Task (Shuffle Read)
19/10/11 22:07:32| ERROR| [Executor task launch worker for task 6] org.apache.spark.storage.ShuffleBlockFetcherIterator: Error occurred while fetching local blocks
java.nio.file.NoSuchFileException: /tmp/hadoop-root1/nm-local-dir/usercache/root1/appcache/application_1570422377362_0008/blockmgr-6d03250d-6e7c-4bc2-bbb7-22b8e3981c35/30/shuffle_0_0_0.index
at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)

As per SPARK-25341, the mapId of SortShuffleManager.getWriter was changed from partitionId to context.taskAttemptId()

code

But MapOutputTracker.convertMapStatuses returns the wrong ShuffleBlockId: if spark.shuffle.useOldFetchProtocol is enabled, it uses the partitionId as the mapId, which is wrong. Code
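The mismatch can be illustrated with a minimal sketch (plain Java, not Spark code; `shuffleBlockId`, `taskAttemptId`, and `partitionId` here only mirror the Spark concepts): the writer names the index file using the task attempt id, while the reader under the old protocol rebuilds the block id from the partition id, so it looks up a file the writer never produced.

```java
// Illustrative sketch of the writer/reader mapId mismatch described above.
// Not Spark code; the block id format "shuffle_<shuffleId>_<mapId>_<reduceId>"
// matches the file names in the logs (shuffle_0_3_0 vs shuffle_0_0_0).
public class MapIdMismatchSketch {
    static String shuffleBlockId(int shuffleId, long mapId, int reduceId) {
        return "shuffle_" + shuffleId + "_" + mapId + "_" + reduceId;
    }

    public static void main(String[] args) {
        long taskAttemptId = 3L; // what the writer uses since SPARK-25341
        int partitionId = 0;     // what the reader wrongly uses as mapId

        String written = shuffleBlockId(0, taskAttemptId, 0);   // shuffle_0_3_0
        String requested = shuffleBlockId(0, partitionId, 0);   // shuffle_0_0_0

        // The reader asks for a block the writer never created,
        // hence the NoSuchFileException on the read side.
        System.out.println(written + " vs " + requested);
        System.out.println(written.equals(requested)); // false
    }
}
```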

Why are the changes needed?

The MapStatus returned by the ShuffleWriter already carries the mapId, e.g. code here. So it is better to use status.mapTaskId

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing UTs, and manually tested with spark.shuffle.useOldFetchProtocol set to both true and false

image

@sandeep-katta
Contributor Author

cc @cloud-fan @xuanyuanking Please take a look at the fix; if it is okay, I will add a UT and refactor the code.

@@ -905,15 +905,8 @@ private[spark] object MapOutputTracker extends Logging {
for (part <- startPartition until endPartition) {
val size = status.getSizeForBlock(part)
if (size != 0) {
if (useOldFetchProtocol) {
Member

@xuanyuanking xuanyuanking Oct 13, 2019


Thanks for the report and fix!
The root cause is that when we set useOldFetchProtocol=true here, the map id in the shuffle block is inconsistent between the reader side and the writer side.
But we can't fix it like this, because when useOldFetchProtocol=true we'll use the old version of the fetch protocol, OpenBlocks, which considers the map id to be an Integer and will directly parse the string. So for a big, long-running application it will still not work. See the code:

mapIdAndReduceIds[2 * i] = Integer.parseInt(blockIdParts[2]);

So I think the right way is to do the fix in ShuffleWriteProcessor: we should fill mapId with mapTaskId or mapIndex depending on the config spark.shuffle.useOldFetchProtocol.
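The OpenBlocks limitation above can be demonstrated with a small sketch (plain Java, not Spark code): the old protocol parses the mapId field of `shuffle_<shuffleId>_<mapId>_<reduceId>` with `Integer.parseInt`, but a task attempt id is a Long and can exceed `Integer.MAX_VALUE` in a long-running application, at which point parsing fails.

```java
// Sketch of why taskAttemptId cannot be used as mapId under the old
// fetch protocol: Integer.parseInt throws NumberFormatException once
// the id grows past Integer.MAX_VALUE (2147483647).
public class OldProtocolMapIdSketch {
    public static void main(String[] args) {
        long bigTaskAttemptId = (long) Integer.MAX_VALUE + 1; // 2147483648
        String blockId = "shuffle_0_" + bigTaskAttemptId + "_0";
        String[] blockIdParts = blockId.split("_");
        try {
            // Mirrors the parsing done for the OpenBlocks message.
            int mapId = Integer.parseInt(blockIdParts[2]);
            System.out.println("parsed mapId: " + mapId);
        } catch (NumberFormatException e) {
            // Integer.parseInt rejects values outside the int range.
            System.out.println("NumberFormatException for " + blockIdParts[2]);
        }
    }
}
```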

Member


Sorry, could you explain why directly parsing the string as an Integer does not work for big, long-running applications?
Is it a performance problem?
Looking forward to your reply.

@@ -47,8 +47,7 @@ private[spark] class BlockStoreShuffleReader[K, C](
context,
blockManager.blockStoreClient,
blockManager,
mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition,
SparkEnv.get.conf.get(config.SHUFFLE_USE_OLD_FETCH_PROTOCOL)),
mapOutputTracker.getMapSizesByExecutorId(handle.shuffleId, startPartition, endPartition),
Contributor


This is the shuffle read side and we need to know the value of SHUFFLE_USE_OLD_FETCH_PROTOCOL. I think the bug is in the shuffle write side which is fixed in this PR. Do we really need to change the shuffle read side?

Contributor Author

@sandeep-katta sandeep-katta Oct 14, 2019


This is redundant code: since the shuffle writer sets the mapId based on the spark.shuffle.useOldFetchProtocol flag, MapStatus.mapTaskId always gives the mapId that was set by the ShuffleWriter.

((ShuffleBlockId(shuffleId, status.mapTaskId, part), size, mapIndex))
}
splitsByAddress.getOrElseUpdate(status.location, ListBuffer()) +=
((ShuffleBlockId(shuffleId, status.mapTaskId, part), size, mapIndex))
Contributor


Here we always pick status.mapTaskId as the mapId; is this correct?

Contributor


OK I get it now. We should rename MapStatus.mapTaskId to mapId.

@cloud-fan
Contributor

ok to test

@xuanyuanking
Member

Please also update the PR description: the detailed log and code links can be skipped, since code linked against master will keep changing anyway.
We need to mention the consistency between the shuffle reader/writer while using the old protocol, and the limitation of using mapTaskId in the OpenBlocks message.

@sandeep-katta sandeep-katta changed the title [SPARK-29435][Core]Shuffle is not working when spark.shuffle.useOldFetchProtocol=true [SPARK-29435][Core]MapId in Shuffle Block is inconsistent at the writer and reader part when spark.shuffle.useOldFetchProtocol=true Oct 14, 2019
@SparkQA

SparkQA commented Oct 14, 2019

Test build #112010 has finished for PR 26095 at commit 939887d.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@xuanyuanking
Member

@sandeep-katta Given the consistency issue, we'd better add a UT for the old fetch protocol config. I opened a PR against your branch, please have a review: sandeep-katta#3.
On the current master the UT fails, and on your branch it passes.

@SparkQA

SparkQA commented Oct 14, 2019

Test build #112014 has finished for PR 26095 at commit d50744c.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Oct 14, 2019

Test build #112028 has finished for PR 26095 at commit 1e22dc3.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds the following public classes (experimental):
  • class ShuffleOldFetchProtocolSuite extends ShuffleSuite with BeforeAndAfterAll

@cloud-fan
Contributor

thanks, merging to master!


6 participants