Refactor Stage info code between Q/P tools #971
Changes from 2 commits
0ee634e
221b46c
5e690a9
674af86
85b7d3f
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,5 +1,5 @@ | ||
/* | ||
- * Copyright (c) 2021-2022, NVIDIA CORPORATION. | ||
+ * Copyright (c) 2021-2024, NVIDIA CORPORATION. | ||
* | ||
* Licensed under the Apache License, Version 2.0 (the "License"); | ||
* you may not use this file except in compliance with the License. | ||
|
@@ -32,7 +32,8 @@ class CompareApplications(apps: Seq[ApplicationInfo]) extends Logging { | |
def findMatchingStages(): (Seq[CompareProfileResults], Seq[CompareProfileResults]) = { | ||
val normalizedByAppId = apps.map { app => | ||
val normalized = app.sqlPlans.mapValues { plan => | ||
- SparkPlanInfoWithStage(plan, app.accumIdToStageId).normalizeForStageComparison | ||
+ SparkPlanInfoWithStage(plan, | ||
+ app.stageManager.reduceAccumMapping()).normalizeForStageComparison | ||
|
||
} | ||
(app.appId, normalized) | ||
}.toMap | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -88,7 +88,7 @@ object GenerateDot { | |
val accumSummary = accums.map { a => | ||
Seq(a.sqlID, a.accumulatorId, a.total) | ||
} | ||
- val accumIdToStageId = app.accumIdToStageId | ||
+ val accumIdToStageId = app.stageManager.reduceAccumMapping() | ||
Review comment: This is a hack to get GenerateDot to work with the 1-to-M accumulator-to-stage map. |
||
val formatter = java.text.NumberFormat.getIntegerInstance | ||
val stageIdToStageMetrics = app.taskEnd.groupBy(task => task.stageId).mapValues { tasks => | ||
val durations = tasks.map(_.duration) | ||
|
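For readers unfamiliar with the change, the snippet below is a minimal sketch of how a 1-to-M accumulator-to-stage map could be collapsed into the 1-to-1 shape that callers such as GenerateDot expected from the old `accumIdToStageId`. The object name, the `reduceToSingleStage` helper, and the policy of keeping the smallest stage ID are all assumptions made for illustration; the actual behavior of `reduceAccumMapping()` is not shown in this diff and may differ.

```scala
// Hypothetical sketch only: the names and the "smallest stage ID wins" policy
// are assumptions, not the real StageManager implementation.
object AccumMappingSketch {
  // 1-to-M: one accumulator ID may be associated with several stage IDs.
  type AccumToStages = Map[Long, Set[Int]]

  // Collapse to 1-to-1 by keeping the smallest stage ID per accumulator.
  // Accumulators with no recorded stage are dropped.
  def reduceToSingleStage(accumToStages: AccumToStages): Map[Long, Int] =
    accumToStages.collect {
      case (accumId, stageIds) if stageIds.nonEmpty => accumId -> stageIds.min
    }

  def main(args: Array[String]): Unit = {
    val mapping: AccumToStages =
      Map(10L -> Set(3, 1), 11L -> Set(2), 12L -> Set.empty[Int])
    println(reduceToSingleStage(mapping))
  }
}
```

Any such reduction loses information when an accumulator really does span several stages, which may be why the review comment calls the call site a hack rather than a final design.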
The old code iterated over all jobs to get their stages, then iterated over all tasks within each stage to aggregate the metrics. That pass produced all the job rows.
It then repeated the same sequence to produce all the stage rows.
Walking the jobs, stages, and tasks twice is clearly very time-consuming.
Instead, the new code does the following: