Skip to content

Commit

Permalink
Mark wholestageCodeGen as shouldRemove when childs are removed
Browse files Browse the repository at this point in the history
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

Fixes NVIDIA#860

This commit adjusts the accuracy of the Qual tool by targeting the
following issues:
- child nodes of wholeStageCodeGen would not be assigned to stages if
  they have no metrics.
- there is a corner case when all the childs of wholeStageCodeGen are
  marked as shouldRemove. In that case, the node would still be
considered unsupported and contribute to the speedup.

The changes are:
- propagate the stageIDs of wholeStageCodeGen to the child nodes
- a wholeStageCodeGen node is marked as shouldRemove when all the childs
  are marked as shouldRemove.
- fix unit-test which has 4 different wholeStageCodeGen nodes that
  contain only `ColumnarToRow` execs
  • Loading branch information
amahussein committed Jun 26, 2024
1 parent 99e2ffd commit 2478f99
Show file tree
Hide file tree
Showing 3 changed files with 16 additions and 6 deletions.
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,10 @@ case class ExecInfo(
stages = stageIDs
}

def appendToStages(stageIDs: Set[Int]): Unit = {
stages ++= stageIDs
}

def setShouldRemove(value: Boolean): Unit = {
shouldRemove ||= value
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -45,20 +45,26 @@ case class WholeStageExecParser(
val childNodes = node.nodes.flatMap { c =>
SQLPlanParser.parsePlanNode(c, sqlID, checker, app, reusedNodeIds)
}
// if any of the execs in WholeStageCodegen supported mark this entire thing
// as supported
// For the childNodes, we need to append the stages. Otherwise, nodes without metrics won't be
// assigned to stage
childNodes.foreach(_.appendToStages(stagesInNode))
// if any of the execs in WholeStageCodegen supported mark this entire thing as supported
val anySupported = childNodes.exists(_.isSupported == true)
val unSupportedExprsArray = childNodes.filter(_.unsupportedExprs.length > 0 ).map(
x => x.unsupportedExprs).flatten.toArray
val unSupportedExprsArray =
childNodes.filter(_.unsupportedExprs.nonEmpty).flatMap(x => x.unsupportedExprs).toArray
// average speedup across the execs in the WholeStageCodegen for now
val supportedChildren = childNodes.filterNot(_.shouldRemove)
val avSpeedupFactor = SQLPlanParser.averageSpeedup(supportedChildren.map(_.speedupFactor))
// TODO: revisit the logic behind adding the stages of child nodes to the current node.
// can't rely on the wholeStagecodeGen having a stage if children do so aggregate them together
// for now
val allStagesIncludingChildren = childNodes.flatMap(_.stages).toSet ++ stagesInNode.toSet
// Finally, the node should be marked as shouldRemove when all the children of the
// wholeStageCodeGen are marked as shouldRemove.
val removeNode = isDupNode || childNodes.forall(_.shouldRemove)
val execInfo = ExecInfo(node, sqlID, node.name, node.name, avSpeedupFactor, maxDuration,
node.id, anySupported, Some(childNodes), allStagesIncludingChildren,
shouldRemove = isDupNode, unsupportedExprs = unSupportedExprsArray)
shouldRemove = removeNode, unsupportedExprs = unSupportedExprsArray)
Seq(execInfo)
}
}
Original file line number Diff line number Diff line change
@@ -1,2 +1,2 @@
App Name,App ID,Recommendation,Estimated GPU Speedup,Estimated GPU Duration,Estimated GPU Time Saved,SQL DF Duration,SQL Dataframe Task Duration,App Duration,GPU Opportunity,Executor CPU Time Percent,SQL Ids with Failures,Unsupported Read File Formats and Types,Unsupported Write Data Format,Complex Types,Nested Complex Types,Potential Problems,Longest SQL Duration,SQL Stage Durations Sum,NONSQL Task Duration Plus Overhead,Unsupported Task Duration,Supported SQL DF Task Duration,Task Speedup Factor,App Duration Estimated,Unsupported Execs,Unsupported Expressions,Estimated Job Frequency (monthly)
"Spark shell","local-1629442299891","Not Recommended",1.02,19159.68,394.31,1151,920,19554,788,91.72,"","","CSV;JSON","","","",1235,1049,18251,290,630,2.0,false,"Execute InsertIntoHadoopFsRelationCommand csv;Execute InsertIntoHadoopFsRelationCommand json","",30
"Spark shell","local-1629442299891","Not Recommended",1.02,19080.81,473.18,1151,920,19554,788,91.72,"","","CSV;JSON","","","",1235,1049,18251,290,630,2.5,false,"Execute InsertIntoHadoopFsRelationCommand csv;Execute InsertIntoHadoopFsRelationCommand json","",30

0 comments on commit 2478f99

Please sign in to comment.