-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[FEA] Generate Status Report for Profiling Tool #1012
[FEA] Generate Status Report for Profiling Tool #1012
Conversation
Signed-off-by: cindyyuanjiang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cindyyuanjiang
I have some quick notes regarding testing because Profiling had some special cases to handle:
- Generally speaking, the CLI does not produce enough coverage for the changes in this PR. This change needs to be tested using the jar command because we need to verify the different type of Profiling modes. See the different modes (combined, collection, and compare) in the Profiling jar documentation. For all the Profiling modes, the profiling_status.csv should be the same (also use multiple eventlogs)
- Test Profiling Jar cmd with CPU/GPU eventlogs to verify that the app won't be skipped like the qualification
- Test Profiling Jar cmd with Photon and Streamed Applications to confirm the behavior
- Run Qualification jar cmd on the same eventlogs (GPU/CPU/Photon/..etc) to verify that the Qualification has not changed by moving the code into common methods
- Finally, add a unit test to verify the generated results are correct.
// Write status reports for all event logs | ||
val profileOutputWriter = new ProfileOutputWriter(outputDir, Profiler.PROFILE_LOG_NAME, | ||
numOutputRows, outputCSV = outputCSV) | ||
val reportResults = generateStatusProfResults(appStatusReporter.asScala.values.toSeq) | ||
profileOutputWriter.write("Profiling Status", reportResults) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a tricky difference between Qualification and Profiler.
For instance, Profiler generates a profile.log
that contains text formatted data for each table.
Therefore, the above code will create two files profile.log
and profiling_status.csv
. That's going to cause confusion of the purpose of those files that have naming overlap with the subdirectories of the apps. We need only the CSV file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed profile.log
generation by separating writeCSVTable
as an individual function
Signed-off-by: cindyyuanjiang <[email protected]>
case class StatusProfileResult( | ||
path: String, | ||
status: String, | ||
message: String = "") extends ProfileResult { | ||
override val outputHeaders: Seq[String] = Seq("Event Log", "Status", "Description") | ||
|
||
override def convertToSeq: Seq[String] = { | ||
Seq(path, status, message) | ||
} | ||
|
||
override def convertToCSVSeq: Seq[String] = { | ||
Seq(path, status, message) | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that you tried to do some refactor in this PR to make the code reusable.
If we define a new StatusProfileResult
, then we have two classes
QualificationAppInfo.StatusSummaryInfo
: which has an extra fields like appIDProfileClassWarehouse.StatusProfileResult
One of the above should go away because they do the same thing.
So, we should add the extra field appID to StatusProfileResult
, then replace the usage of StatusSummaryInfo
with StatusProfileResult
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! Added appID
in StatusProfileResult
and removed StatusSummaryInfo
* Write a CSV file give the input header and data. | ||
*/ | ||
def writeCSVTable(header: String, outRows: Seq[ProfileResult], | ||
outputCSV: Boolean = true, outputDir: String): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The outputCSV: Boolean = true
should be removed as an argument from this method.
The method's name is writeCSVTable
which makes the argument to writeCSV redundant.
The method should simply prints to CSV without checks. the caller is responsible to do that check.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed outputCSV
from writeCSVTable
def write(headerText: String, outRows: Seq[ProfileResult], | ||
emptyTableText: Option[String] = None, tableDesc: Option[String] = None): Unit = { | ||
writeTextTable(headerText, outRows, emptyTableText, tableDesc) | ||
ProfileOutputWriter.writeCSVTable(headerText, outRows, outputCSV, outputDir) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should check if outputCSV
before calling this and the writeCSVTable
should not be taking an argument outputCSV
if (outputCSV) {
ProfileOutputWriter.writeCSVTable(headerText, outRows, outputDir)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added this check before calling ProfileOutputWriter.writeCSVTable
/** | ||
* For each app status report, generate a StatusProfileResult. | ||
* @return Seq[StatusProfileResult] - Seq[(path, status, description)] | ||
*/ | ||
private def generateStatusProfResults(appStatuses: Seq[AppResult]): Seq[StatusProfileResult] = { | ||
appStatuses.map { | ||
case FailureAppResult(path, message) => StatusProfileResult(path, "FAILURE", message) | ||
case SkippedAppResult(path, message) => StatusProfileResult(path, "SKIPPED", message) | ||
case SuccessAppResult(path, _, message) => StatusProfileResult(path, "SUCCESS", message) | ||
case UnknownAppResult(path, _, message) => StatusProfileResult(path, "UNKNOWN", message) | ||
case profAppResult: AppResult => | ||
throw new UnsupportedOperationException(s"Invalid status for $profAppResult") | ||
} | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
once you use same class StatusProfileResult for both Q/P tools, this code can be used by Qualification and P tools
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reuse generateStatusProfResults
for both Q/P tool, moved definition to Profiler.generateStatusProfResults
case oom: OutOfMemoryError => | ||
logError(s"OOM error while processing large file: ${path.eventLog.toString}." + | ||
s" Increase heap size. Exiting ...", oom) | ||
logError(s"OOM error while processing large file: $pathStr." + | ||
s"Increase heap size.", oom) | ||
sys.exit(1) | ||
case NonFatal(e) => | ||
logWarning(s"Exception occurred processing file: ${path.eventLog.getName}", e) | ||
case o: Throwable => | ||
logError(s"Error occurred while processing file: ${path.eventLog.toString}. Exiting ...", o) | ||
case o: Error => | ||
logError(s"Error occurred while processing file: $pathStr", o) | ||
sys.exit(1) | ||
case e: Exception => | ||
progressBar.foreach(_.reportFailedProcess()) | ||
val failureAppResult = FailureAppResult(pathStr, | ||
s"Unexpected exception processing log, skipping!") | ||
failureAppResult.logMessage(Some(e)) | ||
appStatusReporter.put(pathStr, failureAppResult) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we fix some things with this block?
I prefer that it checks for NoFatal
to create a failureApp new object. Anything else should cause an exit... I am not sure what are the cases that can produce Error
.
Also, this code is pretty much the same we have in Qualification.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, in future we need to be careful about the exit part when we merge profiling tool to qual tool. It should not exit the qual tool process.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks! I updated this block. It is similar to Qualification. We can try to refactor something common out. I also want to get this correct as a starting point.
Signed-off-by: cindyyuanjiang <[email protected]>
case SkippedAppResult(path, message) => StatusProfileResult(path, "SKIPPED", message) | ||
case SuccessAppResult(path, _, message) => StatusProfileResult(path, "SUCCESS", message) | ||
case UnknownAppResult(path, _, message) => StatusProfileResult(path, "UNKNOWN", message) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are storing appId
in SucessAppResult and UnknownAppResult. Can we include the appIds in the csv as well? This help is back tracking appId
--> eventlog
and vice versa.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @parthosa! I will add the appId into message. This is consistent with the qualification output.
Signed-off-by: cindyyuanjiang <[email protected]>
Signed-off-by: cindyyuanjiang <[email protected]>
Signed-off-by: cindyyuanjiang <[email protected]>
Signed-off-by: cindyyuanjiang <[email protected]>
Thanks @amahussein and @parthosa! I addressed all review feedback. I will work on further testing which is outlined in the description. |
@@ -751,6 +751,27 @@ class ApplicationInfoSuite extends FunSuite with Logging { | |||
assert(execInfo.head.maxMem === 5538054144L) | |||
} | |||
|
|||
test("test malformed json eventlog") { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Suggest a different name to reflect the real purpose of that unit-test. the current string may cause someone to think that the unit-test is just testing against malformated eventlogs.
Also, add a comment explaining the unit-test logic and what it tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @amahussein! Updated this.
Signed-off-by: cindyyuanjiang <[email protected]>
Signed-off-by: cindyyuanjiang <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cindyyuanjiang. It would be nice if we can have generic naming like StatusAppResult
or AppStatusResult
instead of StatusProfilerResult
since we use this case in both tools.
I think we can leave the writer classes separate as they require other parameters as well.
@@ -595,4 +574,23 @@ object Profiler { | |||
propStr + s"\nComments:\n$commentsToStr\n" | |||
} | |||
} | |||
|
|||
/** |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this function be present in a common trait since its being used by both Profiler
and Qualification
now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @parthosa! Moving this function to common trait RuntimeReporter
@@ -338,7 +339,7 @@ class QualOutputWriter(outputDir: String, reportReadSchema: Boolean, | |||
} | |||
} | |||
|
|||
def writeStatusReport(statusReports: Seq[StatusSummaryInfo], order: String): Unit = { | |||
def writeStatusReport(statusReports: Seq[StatusProfileResult], order: String): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we make StatusProfileResult
generic since its used by both tools?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable.
If moving that does not require too many changes, then will be good to do it in this PR.
Otherwise, we can keep it the way it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I replaced StatusProfileResult
to a more generic name AppStatusResult
. AppStatusResult
is defined in ProfileClassWareHouse.scala. I think it is okay to keep the definition there because the file documentation says This is a warehouse to store all Classes used for profiling and qualification
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cindyyuanjiang
I was playing around with teh changes and I noticed that Skipped rows will have empty AppID.
Looks like this PR uses FailureApp
to create a skipped App which which does not support an AppID.
IMHO, it is cleaner to have a defined value for a skipped app but I am fine with that given that there is somewhere in the code docs that explains what to expect from those fields.
@@ -338,7 +339,7 @@ class QualOutputWriter(outputDir: String, reportReadSchema: Boolean, | |||
} | |||
} | |||
|
|||
def writeStatusReport(statusReports: Seq[StatusSummaryInfo], order: String): Unit = { | |||
def writeStatusReport(statusReports: Seq[StatusProfileResult], order: String): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds reasonable.
If moving that does not require too many changes, then will be good to do it in this PR.
Otherwise, we can keep it the way it is.
Signed-off-by: cindyyuanjiang <[email protected]>
Thanks @amahussein! Yes both P/Q tool use |
Signed-off-by: cindyyuanjiang <[email protected]>
Thanks @cindyyuanjiang ! |
Signed-off-by: cindyyuanjiang <[email protected]>
Signed-off-by: cindyyuanjiang <[email protected]>
Thanks @amahussein! Resolved conflict. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @cindyyuanjiang for merging the status report generation for both tools.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @cindyyuanjiang
This feature was needed for quite some time.
Fixes #998
Changes
Example Output
spark_rapids profiling --eventlogs <my_event_logs> --tools_jar <my_tools_jar> --verbose
In file
rapids_4_spark_profile/profiling_status.csv
:Testing
Created a directory
sample_eventlogs
, which contains 1 CPU event log, 1 GPU event log, 1 Photon event log, 1 event log with Structured Streaming, and 1 event log without ApplicationInfo. We will use these event logs for all following tests.combined
,collection
, andcompare
modes and verifyprofiling_status.csv
is the sameprofiling_status.csv
is the same before and after this PR