Deduplicate SQL duration wallclock time for databricks eventlog #810

Merged: 8 commits into NVIDIA:dev, Mar 12, 2024

Conversation

@nartal1 (Collaborator) commented Feb 27, 2024:

This fixes #780.

This PR adds support for detecting root execution IDs (rootExecutionId) when the event logs come from Spark 3.4+ or are Databricks event logs.
If a sub-execution's start time does not fall within the parent SQL execution, we add that sub-execution's duration to the SQL durations as well.
Added a unit test to detect the root execution ID.
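A minimal, self-contained sketch of this dedup rule; SQLInfo and its field names here are illustrative assumptions, not the tool's actual types:

  case class SQLInfo(sqlID: Long, rootExecutionID: Option[Long],
      startTime: Long, endTime: Long, duration: Option[Long])

  def totalSqlDuration(sqlIdToInfo: Map[Long, SQLInfo]): Long =
    sqlIdToInfo.values.map { info =>
      info.rootExecutionID match {
        case Some(rootId) if rootId != info.sqlID =>
          sqlIdToInfo.get(rootId) match {
            // Fully inside the root's window: already covered by the
            // root's own duration, so contribute nothing.
            case Some(root) if info.startTime >= root.startTime &&
                info.endTime <= root.endTime => 0L
            // Root unknown, or the child sticks out of the root's
            // window: count the child's own duration as well.
            case _ => info.duration.getOrElse(0L)
          }
        // No root, or this SQL is itself the root: count it once.
        case _ => info.duration.getOrElse(0L)
      }
    }.sum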

@nartal1 added the bug (Something isn't working) and core_tools (Scope the core module (scala)) labels Feb 27, 2024
@nartal1 self-assigned this Feb 27, 2024
@amahussein (Collaborator) left a comment:

Thanks @nartal1
Did we verify in exactly which Spark version rootExecutionId was added?

      0L
    }
  } else {
    0L
Collaborator:

When rootExecutionInfo is not defined, we should return info.duration.getOrElse(0L).
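A minimal sketch of the suggested fallback, using names from the surrounding diff (the enclosing match shape is an assumption):

  sqlIdToInfo.get(rootExecutionID) match {
    case Some(rootExecutionInfo) =>
      // compare against the root's window as in the diff above
      0L
    case None =>
      // root not (yet) known: fall back to this execution's own
      // duration instead of returning 0L and dropping it
      info.duration.getOrElse(0L)
  }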

var lastSQLEndTime: Option[Long] = None
var lastSQLEndTimeId: Option[Long] = None
Collaborator:

If we have the IDs of the job and the SQL, then we don't really need to store the endTime as well.
We can look it up from the JobInfo and the SQLInfo.
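For instance, a sketch of deriving the end time on demand; the endTime accessor on the stored info is an assumption:

  // Instead of caching lastSQLEndTime next to the ID, look it up when needed.
  def lastSQLEndTime(app: QualificationAppInfo): Option[Long] =
    app.lastSQLEndTimeId.flatMap(id => app.sqlIdToInfo.get(id)).flatMap(_.endTime)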

Comment on lines 133 to 137
val rootExecutionIdOpt = Try {
  val field = event.getClass.getDeclaredField("rootExecutionId")
  field.setAccessible(true)
  Option(field.get(event)).map(_.asInstanceOf[Option[Long]]).getOrElse(None)
}.toOption.flatten
Collaborator:

The code is hard to read. Let's move that to EventUtils; we can add a new method readRootIDFromSQLStartEvent(event).

On a side note, this is a little expensive to do for every sqlStartEvent.
Instead, we can evaluate that field using reflection once and use it directly when loading the rootID.
For example, EventUtils can store the rootExecutionId Field; if it is defined, readRootIDFromSQLStartEvent(event) uses it to read the rootID value.
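A hedged sketch of that suggestion; readRootIDFromSQLStartEvent follows the comment, while the class lookup and types are assumptions:

  object EventUtils {
    import java.lang.reflect.Field
    import scala.util.Try

    // Resolve the reflected Field once; None on Spark versions that
    // predate rootExecutionId (earlier than 3.4).
    private lazy val rootExecutionIdField: Option[Field] = Try {
      val f = Class.forName(
        "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart")
        .getDeclaredField("rootExecutionId")
      f.setAccessible(true)
      f
    }.toOption

    // Reuse the cached Field instead of reflecting on every start event.
    def readRootIDFromSQLStartEvent(event: AnyRef): Option[Long] =
      rootExecutionIdField.flatMap { f =>
        Option(f.get(event)).collect { case Some(id: Long) => id }
      }
  }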

Comment on lines 100 to 101
app.lastSQLEndTime = Some(event.time)
app.lastSQLEndTimeId = Some(event.executionId)
Collaborator:

I remember I had a question about that part.

  1. Why do we check if (!perSqlOnly) before updating the field? Regardless of whether we report at SQL granularity or not, the same result should be maintained, right? Or am I missing something?
  2. Should we check that the new event is actually later than the existing value, i.e. if (event.time > app.lastSQLEndTime)?
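
A minimal sketch of the guard in (2), with field names from the diff:

  if (app.lastSQLEndTime.forall(_ < event.time)) {
    app.lastSQLEndTime = Some(event.time)
    app.lastSQLEndTimeId = Some(event.executionId)
  }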

Collaborator Author:

I looked into the perSqlOnly flag, and the default value false is always used when constructing QualificationAppInfo (link). Per-SQL reporting is captured by the reportSqlLevel argument.
So it has no effect on the if condition, which is always true. I think this condition can be removed.

Collaborator:

RunningQualificationEventProcessor initializes qualApp with perSqlOnly set to true:

 class RunningQualificationEventProcessor(sparkConf: SparkConf) extends SparkListener with Logging {

  private val qualApp = new RunningQualificationApp(true)

QualificationAppInfo has two different flags: perSqlOnly and reportSqlLevel. The second one seems to be specific to the reporting. I wonder what perSqlOnly really does...

Collaborator Author:

perSqlOnly is only used in RunningQualificationEventProcessor, since we print per-SQL output to the output file (i.e., once the SQLExecutionEnd event is processed) and it doesn't track the entire application.
The code that matches the event is here - onOtherEvent:

      case e: SparkListenerSQLExecutionEnd =>
        writeSQLDetails(e.executionId)
      case _ =>

@@ -149,6 +150,7 @@ class QualificationEventProcessor(app: QualificationAppInfo, perSqlOnly: Boolean
    super.doSparkListenerJobEnd(app, event)
    if (!perSqlOnly) {
      app.lastJobEndTime = Some(event.time)
      app.lastJobEndTimeId = Some(event.jobId)
Collaborator:

Same comment as for lastSQLEndTime: why do it conditionally?

Collaborator Author:

This also applies to RunningQualificationEventProcessor, where per-SQL output is written, so lastJobEndTime is skipped in that case as well.

Comment on lines 219 to 231
test("test subexecutionId mapping to rootExecutionId") {
val eventlog = ToolTestUtils.getTestResourcePath("" +
"spark-events-qualification/db_subExecution_id.zstd")
val app = createAppFromEventlog(eventlog)
// In Spark 3.4.0+ and later, all the sub-executions will be grouped if they are part of the
// the same root execution.
if (ToolUtils.isSpark340OrLater()) {
assert(app.sqlIdToInfo.values.exists(_.rootExecutionID.isDefined))
} else {
assert(app.sqlIdToInfo.values.forall(_.rootExecutionID.isEmpty))
}
}

Collaborator:

The test should be executed conditionally for the correct target versions; we have some unit tests that execute conditionally according to the Spark version used in the unit tests.

I don't find the test doing anything valuable:
it is not validating that the values are correct, and it does not validate that the SQL durations are not duplicated.

Collaborator Author:

Updated the test. Now it checks the subExecutionIds for a given rootId and also verifies that we are not double-counting the durations.
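A hedged sketch of what such assertions can look like; the root ID and expected duration are illustrative placeholders, not values from the actual event log:

  if (ToolUtils.isSpark340OrLater()) {
    val rootId = 1L                // placeholder root execution ID
    val expectedWallclock = 1000L  // placeholder duration from the log
    val subIds = app.sqlIdToInfo.collect {
      case (sqlId, info) if info.rootExecutionID.contains(rootId) => sqlId
    }
    assert(subIds.nonEmpty)
    // The root's duration should already cover its contained
    // sub-executions, i.e. they are not counted a second time.
    assert(app.sqlIdToInfo(rootId).duration.contains(expectedWallclock))
  }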

  info.duration.getOrElse(0L)
case Some(rootExecutionID) if rootExecutionID != info.sqlID =>
  val rootExecutionInfo = sqlIdToInfo.get(rootExecutionID)
  if (rootExecutionInfo.nonEmpty) {
Collaborator:

Is this a typo? Usually nonEmpty is for iterable objects, but in this case a value retrieved from the hash map may not be defined.
This might cause an NPE if the SQL event is processed before the root SQL has been added to the hash map.

Collaborator Author:

Thanks! It was a typo. Updated it to get the rootExecutionInfo only if it is present.

Comment on lines 729 to 730
case Some(rootExecutionID) if rootExecutionID == info.sqlID =>
  info.duration.getOrElse(0L)
Collaborator:

This case is redundant because it is already handled by case _ => info.duration.getOrElse(0L).
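A minimal sketch of the collapsed match this implies (names come from the diff context; the overall shape is an assumption):

  info.rootExecutionID match {
    case Some(rootExecutionID) if rootExecutionID != info.sqlID =>
      // handle the sub-execution against its root's window
      ???
    case _ =>
      // covers both "no root" and "this SQL is its own root"
      info.duration.getOrElse(0L)
  }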

Comment on lines 738 to 739
if (sqlStartTime < rootExecutionStartTime || sqlEndTime > rootExecutionEndTime) {
  info.duration.getOrElse(0L)
Collaborator:

We can add a comment here saying why this if condition is enough to check for overlap; later we may forget why we are not applying stricter checks.
For example (see the sketch below):

  • The check will be true if the child is not completely inside the root; nevertheless we still account for its total duration, not just the overlap.
  • The condition will be true if the child starts after the root ends. Is that even a possible case?
  • The condition will be true if the child spans the root. Is that a possible case?
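
A hedged sketch of how that comment could sit on the condition (names from the diff context above):

  // The child contributes its own duration whenever it is not fully
  // contained in the root's [start, end] window. We deliberately add
  // its total duration rather than only the non-overlapping part,
  // trading some over-counting at the boundaries for a simpler check.
  if (sqlStartTime < rootExecutionStartTime || sqlEndTime > rootExecutionEndTime) {
    info.duration.getOrElse(0L)
  } else {
    0L // fully contained: already covered by the root's duration
  }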

@parthosa self-requested a review February 28, 2024
@parthosa (Collaborator) left a comment:

Thanks @nartal1.

@amahussein merged commit 6855297 into NVIDIA:dev Mar 12, 2024
13 checks passed
Successfully merging this pull request may close these issues.

[BUG] Qualification tool to deduplicate the SQL duration wallclock time for a Databricks event log