Skip generating timeline for stages that do not have completion time #1290

Merged
merged 1 commit into from
Aug 19, 2024

nartal1
Collaborator

@nartal1 nartal1 commented Aug 15, 2024

This fixes a small bug that occurs when the tool is run with the --generate-timeline argument on an incomplete eventlog.
The issue is that the completionTime of a stage can be None for in-progress eventlogs, which produces the error shown below. The fix is to generate the timeline only for completed stages.
This function already performs similar checks for jobIdToInfo and sqlIdToInfo.
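
As a rough illustration of the kind of guard described above, here is a minimal sketch. The names `StageSketch`, `TimelineEntry`, and `completedStageEntries` are hypothetical and are not the actual classes or methods in GenerateTimeline.scala. The sketch assumes the completion time is modeled as an `Option[Long]`; calling `.get` on a `None` is what raises `NoSuchElementException: None.get`, whereas matching on `Some(...)` simply drops stages that have not finished.

```scala
// Hypothetical, simplified model; not the real classes from the tool.
case class StageSketch(stageId: Int, startTime: Long, completionTime: Option[Long])
case class TimelineEntry(stageId: Int, start: Long, end: Long)

object TimelineSketch {
  // Keep only stages whose completion time is known. Matching on Some(end)
  // avoids calling .get on a None, so incomplete stages are skipped
  // instead of throwing.
  def completedStageEntries(stages: Seq[StageSketch]): Seq[TimelineEntry] =
    stages.collect {
      case StageSketch(id, start, Some(end)) => TimelineEntry(id, start, end)
    }
}
```

With one finished stage and one still running, only the finished stage would produce a timeline entry under this sketch.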

Error without this fix:

24/08/14 17:59:18 INFO ToolTextFileWriter: Profile summary output location: ./rapids_4_spark_profile/application_1723001058316_0011/profile.log
24/08/14 17:59:18 WARN FailureAppResult: File: file:/home/nartal/nvbug/eventlogs/history_spark-events_application_1723001058316_0011.inprogress, Message: Unexpected exception processing log, skipping!
java.lang.Exception: None.get
	at com.nvidia.spark.rapids.tool.profiling.Profiler.com$nvidia$spark$rapids$tool$profiling$Profiler$$profileApp(Profiler.scala:189)
	at com.nvidia.spark.rapids.tool.profiling.Profiler$ProfileProcessThread$1.run(Profiler.scala:263)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:750)
Caused by: java.util.NoSuchElementException: None.get
	at scala.None$.get(Option.scala:529)
	at scala.None$.get(Option.scala:527)
	at com.nvidia.spark.rapids.tool.profiling.GenerateTimeline$.$anonfun$generateFor$20(GenerateTimeline.scala:341)
	at scala.collection.TraversableLike.$anonfun$flatMap$1(TraversableLike.scala:293)
	at scala.collection.immutable.List.foreach(List.scala:431)
	at scala.collection.TraversableLike.flatMap(TraversableLike.scala:293)
	at scala.collection.TraversableLike.flatMap$(TraversableLike.scala:290)

With this PR:

24/08/14 18:00:21 INFO ToolTextFileWriter: Profile summary output location: ./rapids_4_spark_profile/application_1723001058316_0011/profile.log
24/08/14 18:00:21 INFO SuccessAppResult: File: file:/home/nartal/nvbug/eventlogs/history_spark-events_application_1723001058316_0011.inprogress, Message: Took 1298ms to process
24/08/14 18:00:21 INFO ToolTextFileWriter: Profiling Status CSV: output location: ./rapids_4_spark_profile/profiling_status.csv

@nartal1 nartal1 added the bug (Something isn't working) and core_tools (Scope the core module (scala)) labels Aug 15, 2024
@nartal1 nartal1 self-assigned this Aug 15, 2024
@tgravescs
Collaborator

So the downside to this is that the person looking at the graph doesn't see that there was a stage there. Is there an easy way on the graph to show that it runs until the end of the chart? Do we have any end time, like job end?

@nartal1
Collaborator Author

nartal1 commented Aug 15, 2024

Unfortunately, that's the case with eventlogs that don't have a stageEnd or jobEnd time. If the stage is not completed, we won't have the jobEnd time either, so we skip those in the JOBS section of the graph. Another reason we cannot plot them is that the duration is set to zero in these cases, and a duration is required for plotting in the graph.

Collaborator

@cindyyuanjiang cindyyuanjiang left a comment

Thanks @nartal1!

@tgravescs
Collaborator

Right, but arguably, similar to the Spark UI, these should just be shown as RUNNING, so in a graph they would show the start time and no stop time. Otherwise someone looking at the graph could totally miss them. Does the jobs section do this now (please confirm)? I'm OK with this fix short term, but we may want to file a follow-up to investigate whether there is a way to show them from their start to the end of whatever the graph shows.

@nartal1
Collaborator Author

nartal1 commented Aug 16, 2024

Thanks @tgravescs! We are skipping those for both jobs and stages. Since we see them in the history server, we should probably update our code to show them as well instead of skipping them. I have filed a follow-up issue to investigate that: #1295
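
For illustration only, one way the "show them from start to the end of the graph" idea could look is sketched below. This is not what the follow-up issue implements; `RunningStage`, `ChartEntry`, `entriesWithRunning`, and the `chartEnd` parameter are all hypothetical names, and the sketch assumes some "end of chart" timestamp (for example, the latest event time seen in the log) is available.

```scala
// Hypothetical, simplified model; not the real classes from the tool.
case class RunningStage(stageId: Int, startTime: Long, completionTime: Option[Long])
case class ChartEntry(stageId: Int, start: Long, end: Long, running: Boolean)

object RunningStageSketch {
  // chartEnd is an assumed "end of chart" timestamp. Stages without a
  // completion time are drawn up to chartEnd and flagged as running,
  // rather than being dropped from the timeline.
  def entriesWithRunning(stages: Seq[RunningStage], chartEnd: Long): Seq[ChartEntry] =
    stages.map { s =>
      val end = s.completionTime.getOrElse(chartEnd)
      // Guard against a chartEnd earlier than the stage start so the
      // plotted duration is never negative.
      ChartEntry(s.stageId, s.startTime, math.max(end, s.startTime), s.completionTime.isEmpty)
    }
}
```

The `running` flag would let a renderer style unfinished stages differently, so a reader can tell a real end time from an assumed one.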

@nartal1 nartal1 merged commit b30ca6d into NVIDIA:dev Aug 19, 2024
15 checks passed