Fix implementation of processSQLPlanMetrics in Profiler #853
Conversation
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>

Fixes NVIDIA#851

- Fixed the implementation of `processSQLPlanMetrics` to parameterize the information based on the SqlID
- Added a buffer `SqlPlanInfoGraphBuffer` to capture the construction of the SparkSqlPlan. Otherwise, the graph was constructed multiple times, which was not efficient
- Changed the Javadoc in AutoTuner to clean up a build warning
- Refactored the implementation of `jobAndStageMetricsAggregation`, reducing both memory and CPU by 20%
- Made a few changes in `jobAndStageMetricsAggregation` which reduced the total memory allocated by this method:
  - Changed the type of local variables from `Seq` to `Set`
  - Removed some local variables that were causing memory bloat
- Renamed `GenerateDot.SparkPlanGraph` to `SparkPlanGraphForDot` because it was conflicting with the Spark `SparkPlanGraph` class in the imports
- Fixed unit test
- Added a new Set to keep track of missing event classes, which reduces the noise in the log files
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
```diff
 val stagesInJob = app.stageIdToInfo.filterKeys { case (sid, _) =>
-  stageIdsInJob.contains(sid)
+  jc.stageIds.contains(sid)
 }.keys.map(_._1).toSeq
```
The local `val stageIdsInJob = jc.stageIds` was removed to save memory; `jc.stageIds` is now accessed directly instead of through an intermediate binding.
```diff
-  stageIdsInJob.contains(sid)
-}.keys.map(_._1).toSeq
+  jc.stageIds.contains(sid)
+}.keys.map(_._1).toSet
```
Use `toSet` instead of `toSeq` because the result is used mainly for lookups.
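As a minimal sketch of the point (the values here are made up, not the tool's data): a `Set` gives effectively constant-time membership tests, while `Seq.contains` scans linearly.

```scala
// Hypothetical sketch: keys of a (stageId, attemptId) -> info map,
// materialized as a Set because they are used mainly for lookups.
object LookupSketch {
  def main(args: Array[String]): Unit = {
    val stagePairs = Seq((1, 0), (2, 0), (3, 1), (3, 2))
    val stageIds: Set[Int] = stagePairs.map(_._1).toSet
    assert(stageIds.contains(2))   // hash lookup, not a linear scan
    assert(!stageIds.contains(9))
    println(stageIds.size)         // duplicates collapse: prints 3
  }
}
```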
```diff
   }
 }
-val missing = app.stageIdToInfo.keys.toSeq.diff(allStageinJobs.keys.toSeq)
+val missing = app.stageIdToInfo.keys.toSet.diff(allStageInJobs.keys.toSet)
```
Sets are more time-efficient than sequences for membership tests and difference operations.
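A small illustration of the `diff` change above, with hypothetical stage IDs:

```scala
// Hypothetical data: all known stage IDs vs. the stages attached to jobs.
object DiffSketch {
  def main(args: Array[String]): Unit = {
    val allStages: Set[Int] = Set(1, 2, 3, 4)
    val stagesInJobs: Set[Int] = Set(1, 3)
    // With Sets, membership checks during the difference use hashing
    // rather than repeated linear scans of a Seq.
    val missing = allStages.diff(stagesInJobs)
    assert(missing == Set(2, 4))
    println(missing.size)  // prints 2
  }
}
```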
```diff
@@ -231,12 +227,12 @@ class Analysis(apps: Seq[ApplicationInfo]) {
     val allRows = apps.flatMap { app =>
       app.sqlIdToInfo.map { case (sqlId, sqlCase) =>
         val jcs = app.jobIdToInfo.filter { case (_, jc) =>
-          jc.sqlID.getOrElse(-1) == sqlId
+          jc.sqlID.isDefined && jc.sqlID.get == sqlId
```
`getOrElse` means the VM allocates the default value even when `jc.sqlID` is not defined, which gets expensive in large structures. Since this is a predicate filter, it is more memory-efficient to read the sqlID only when it is defined.
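As a hedged sketch of the pattern (the case class and values are hypothetical stand-ins): `Option.contains` expresses the same predicate as the `isDefined && get ==` form in the diff, without materializing a default value.

```scala
// Hypothetical stand-in for the tool's job info case class.
final case class JobInfoCase(jobID: Int, sqlID: Option[Long])

object PredicateSketch {
  def main(args: Array[String]): Unit = {
    val jobs = Map(
      1 -> JobInfoCase(1, Some(7L)),
      2 -> JobInfoCase(2, None),
      3 -> JobInfoCase(3, Some(9L)))
    // Equivalent to jc.sqlID.isDefined && jc.sqlID.get == 7L,
    // but never allocates a default the way getOrElse(-1) would.
    val matched = jobs.filter { case (_, jc) => jc.sqlID.contains(7L) }
    println(matched.keys.toList)  // prints List(1)
  }
}
```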
```diff
-      val allMetaWithSchema = getPlanMetaWithSchema(planInfo)
-      val planGraph = ToolsPlanGraph(planInfo)
-      val allNodes = planGraph.allNodes
+      val allMetaWithSchema = getPlanMetaWithSchema(sqlPlanInfoGraph.planInfo)
```
Use the cached graph instead of reconstructing it again.
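The buffering idea behind this change can be sketched roughly as follows (class and method names here are assumptions, not the PR's exact API): build each plan graph once per sqlID and hand out the cached entry afterwards.

```scala
import scala.collection.mutable

// Hypothetical entry pairing a sqlID with its already-built graph nodes.
final case class PlanGraphEntry(sqlID: Long, nodes: Seq[String])

class PlanGraphBuffer {
  private val cache = mutable.Map[Long, PlanGraphEntry]()
  private var builds = 0
  def buildCount: Int = builds
  // `build` is by-name, so the graph is constructed only on a cache miss.
  def getOrBuild(sqlID: Long, build: => Seq[String]): PlanGraphEntry =
    cache.getOrElseUpdate(sqlID, { builds += 1; PlanGraphEntry(sqlID, build) })
}

object BufferSketch {
  def main(args: Array[String]): Unit = {
    val buf = new PlanGraphBuffer
    buf.getOrBuild(1L, Seq("Scan", "Filter"))
    buf.getOrBuild(1L, Seq("Scan", "Filter"))  // served from cache
    println(buf.buildCount)  // prints 1
  }
}
```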
```diff
-      val allNodes = planGraph.allNodes
+      val allMetaWithSchema = getPlanMetaWithSchema(sqlPlanInfoGraph.planInfo)
+      val allNodes = sqlPlanInfoGraph.sparkPlanGraph.allNodes
+      val results = ArrayBuffer[DataSourceCase]()
```
Cache the results to be returned to the caller. The original code re-read the global data structure to extract what had just been computed here.
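Sketch of the local-buffer pattern (names are hypothetical): accumulate the freshly computed cases and return them, rather than re-reading a shared structure afterwards.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for the tool's DataSourceCase.
final case class DsCase(sqlID: Long, format: String)

object BufferResultsSketch {
  def collectReads(sqlID: Long, nodeNames: Seq[String]): Seq[DsCase] = {
    // The caller receives exactly what was computed in this pass.
    val results = ArrayBuffer[DsCase]()
    nodeNames.foreach { n =>
      if (n.startsWith("Scan ")) results += DsCase(sqlID, n.stripPrefix("Scan "))
    }
    results.toSeq
  }

  def main(args: Array[String]): Unit = {
    println(collectReads(1L, Seq("Scan parquet", "Filter", "Scan csv")).size)  // prints 2
  }
}
```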
```diff
 }

 // This will find scans for DataSource V2, if the schema is very large it
 // will likely be incomplete and have ... at the end.
-protected def checkGraphNodeForReads(sqlID: Long, node: SparkPlanGraphNode): Unit = {
+protected def checkGraphNodeForReads(
+    sqlID: Long, node: SparkPlanGraphNode): Option[DataSourceCase] = {
```
Return the datasource to be used by the caller instead of querying the global data structure.
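The Option-returning shape can be sketched like this (node handling is simplified and the names are hypothetical): the callee reports what it found and the caller decides where it goes, instead of the callee mutating shared state.

```scala
// Hypothetical stand-in for the tool's data source record.
final case class ReadCase(sqlID: Long, format: String)

object OptionReturnSketch {
  // Simplified node check: return the read instead of writing to a global map.
  def checkNodeForReads(sqlID: Long, nodeName: String): Option[ReadCase] =
    if (nodeName.startsWith("Scan ")) Some(ReadCase(sqlID, nodeName.stripPrefix("Scan ")))
    else None  // nothing found, and no side effects either way

  def main(args: Array[String]): Unit = {
    // flatMap drops the Nones and keeps the discovered reads.
    val reads = Seq("Scan orc", "Project").flatMap(checkNodeForReads(2L, _))
    println(reads.map(_.format))  // prints List(orc)
  }
}
```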
```diff
 }
 connectOperatorToStage(createGraphFunc)
+for (sqlPIGEntry <- sqlPlanInfoBuffer.sqlPlanInfoGraphs) {
+  var sqlIsDsOrRDD = false
```
Used to avoid accessing the global `sqlIDToDataSetOrRDDCase`.
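A minimal sketch of the local-flag pattern (node names are illustrative): a mutable local replaces repeated probes of a shared sqlID-to-DataSet/RDD map while walking one plan's nodes.

```scala
// Hypothetical sketch: detect a DataSet/RDD plan with a local var
// instead of querying a shared map on every node.
object FlagSketch {
  def main(args: Array[String]): Unit = {
    val nodeNames = Seq("Scan parquet", "SerializeFromObject", "Filter")
    var sqlIsDsOrRDD = false
    nodeNames.foreach { n =>
      if (!sqlIsDsOrRDD) {  // stop checking once the flag is set
        sqlIsDsOrRDD = n.contains("SerializeFromObject")
      }
    }
    println(sqlIsDsOrRDD)  // prints true
  }
}
```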
Thanks @amahussein ! This helps in reducing the runtime of Profile tool. Agree that we can make some more improvements. Do you plan on creating followon issue for the TODO's mentioned in the description?
Thanks @nartal1!
Signed-off-by: Ahmed Hussein (amahussein) <[email protected]>
Fixes #851
There are still more opportunities to improve the performance, but I limited the scope of this fix to get the highest gains with minimum effort.

- Fixed the implementation of `processSQLPlanMetrics` to parameterize the information based on the SqlID
- Added a buffer `SqlPlanInfoGraphBuffer` to capture the construction of the SparkSqlPlan. Otherwise, the graph was constructed multiple times, which was not efficient
- Refactored the implementation of `jobAndStageMetricsAggregation`, reducing both memory and CPU by 20%
- Made a few changes in `jobAndStageMetricsAggregation` which reduced the total memory allocated by this method:
  - Changed the type of local variables from `Seq` to `Set`
- Renamed `GenerateDot.SparkPlanGraph` to `SparkPlanGraphForDot` because it was conflicting with the Spark `SparkPlanGraph` class in the imports

**Overall Impact**

After Changes: (profiler snapshot)

Before Changes: (profiler snapshot)

The below snapshot indicates the frequency of GC and the idleness of the CPU.

**Does this affect the end user?**

Yes. `sql_duration_and_executor_cpu_time_percent.csv` is different after fixing the bug.

**Does this break the nightly builds?**

Yes. Some of the generated output files have changed.

**Does this require additional followups?**

There is still room to improve the efficiency of the core. Some of the TODOs:

- `Analysis.scala`, where we keep extracting the Jobs/tasks in every function.
- `System.properties` in the profiler.