[SPARK-23121][core] Fix for ui becoming unaccessible for long running streaming apps #20330

Closed
wants to merge 7 commits into from

Conversation

@smurakozi smurakozi commented Jan 19, 2018

What changes were proposed in this pull request?

The allJobs and job pages read the last stage attempt and the DAG visualization data from the store, but for long-running jobs this data is not guaranteed to be retained, which leads to exceptions when the pages are rendered.

To fix this, store.lastStageAttempt(stageId) and store.operationGraphForJob(jobId) are wrapped in store.asOption, and default values are used when the information is missing.
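
The wrapping pattern described above can be sketched as follows. This is a minimal, self-contained illustration rather than the code from the patch: `FakeStore`, its in-memory map, and the `StageInfo` case class are hypothetical stand-ins, and `asOption` is assumed to turn a `NoSuchElementException` thrown by a lookup into `None`.

```scala
// Hypothetical stand-ins for the status store; not Spark's real classes.
final case class StageInfo(name: String, description: Option[String])

object FakeStore {
  // Only stage 2 is still retained; stage 1 is treated as evicted.
  private val stages =
    Map(2 -> StageInfo("count at App.scala:10", Some("word count")))

  // Throws NoSuchElementException when the stage is no longer in the store.
  def lastStageAttempt(stageId: Int): StageInfo =
    stages.getOrElse(stageId, throw new NoSuchElementException(s"No stage $stageId"))

  // Assumed behavior of asOption: evaluate the lookup lazily and map a
  // missing entry to None instead of letting the exception propagate.
  def asOption[T](fn: => T): Option[T] =
    try Some(fn)
    catch { case _: NoSuchElementException => None }
}

object AsOptionDemo {
  // Falls back to an empty name when the stage has been evicted.
  def stageName(stageId: Int): String =
    FakeStore.asOption(FakeStore.lastStageAttempt(stageId)).map(_.name).getOrElse("")
}
```

With this shape, a page that asks for an evicted stage gets a default value back and can still render, instead of failing on the exception.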

How was this patch tested?

Manual testing of the UI, also using the test command reported in SPARK-23121:

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark

Closes #20287

@smurakozi
Contributor Author

@guoxiaolongzte could you please check if this change fixes the issue you have observed?

@smurakozi
Contributor Author

cc @jiangxb1987, @srowen, @vanzin

val (_, lastStageDescription) = lastStageNameAndDescription(store, job)
val displayJobDescription =
  if (lastStageDescription.isEmpty) {
    job.name
Contributor Author

@srowen
Member

srowen commented Jan 19, 2018

Jenkins add to whitelist

@SparkQA

SparkQA commented Jan 19, 2018

Test build #86389 has finished for PR 20330 at commit 6525ef4.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

  .getOrElse("")
val (_, lastStageDescription) = lastStageNameAndDescription(store, job)
val displayJobDescription =
  if (lastStageDescription.isEmpty) {
Contributor

nit: I generally prefer the opposite check.

if (data is good) 
  do something with data 
else 
  fallback to something else
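
In Scala, the shape suggested above might look like the following sketch. `displayDescription` and its parameters are hypothetical names used only for illustration; they are not taken from the patch.

```scala
object DescriptionDemo {
  // Positive check first: use the data when it is present,
  // otherwise fall back to the job name.
  def displayDescription(lastStageDescription: String, jobName: String): String =
    if (lastStageDescription.nonEmpty) {
      lastStageDescription
    } else {
      jobName
    }
}
```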


val formattedJobDescription =
  UIUtils.makeDescription(lastStageDescription, basePath, plainText = false)
val jobDescription = UIUtils.makeDescription(lastStageDescription, basePath, plainText = false)
Contributor

No need to check for empty description here?

Contributor Author

lastStageDescription may be empty, but it will not cause problems, makeDescription will handle it properly, just like in the version before lastStageAttempt was used:

  val jobDescription = UIUtils.makeDescription(jobData.description.getOrElse(""), 
  basePath, plainText = false)

Contributor

Sure, but don't you want the same behavior as above here (falling back to the job name)?

Contributor Author

I've moved this logic to lastStageNameAndDescription, so it's uniform.

val operationGraphContent = store.asOption(store.operationGraphForJob(jobId)) match {
case Some(operationGraph) => UIUtils.showDagVizForJob(jobId, operationGraph)
case None =>
<div id="no-info">
Contributor

Indentation is off.

@@ -18,7 +18,7 @@
 package org.apache.spark.ui.jobs

 import java.net.URLEncoder
-import java.util.Date
+import java.util.{Collections, Date}
Contributor

New import is unused?

@@ -31,6 +31,7 @@ import org.apache.spark.SparkConf
 import org.apache.spark.internal.config._
 import org.apache.spark.scheduler.TaskLocality
 import org.apache.spark.status._
+import org.apache.spark.status.api.v1
Contributor

Just use JobData since it's already imported?

@SparkQA

SparkQA commented Jan 19, 2018

Test build #4063 has finished for PR 20330 at commit 6525ef4.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@SparkQA

SparkQA commented Jan 20, 2018

Test build #86399 has finished for PR 20330 at commit d5fdabb.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

store.asOption(store.lastStageAttempt(job.stageIds.max)) match {
  case Some(lastStageAttempt) =>
    (lastStageAttempt.name, lastStageAttempt.description.getOrElse(job.name))
  case None => ("", "")
Contributor

Before, you were doing if (lastStageDescription.isEmpty) job.name else blah at the call site.

Now, when the last stage is not in the store, the call site is getting an empty string as the description, instead of using the job name.

Contributor

This would probably be simpler:

val stage = store.asOption(...)
(stage.map(_.name).getOrElse(""), stage.flatMap(_.description).getOrElse(job.name))
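
One way to write such a helper is sketched below, under stated assumptions: `Stage` and `Job` are hypothetical simplified types, and the lookup result is passed in as an `Option` instead of being obtained through `store.asOption`. Calling `flatMap` on the optional description flattens the nested `Option`, so the job name serves as the fallback both when the stage is missing from the store and when the stage has no description.

```scala
// Hypothetical simplified types, for illustration only.
final case class Stage(name: String, description: Option[String])
final case class Job(name: String)

object LastStageDemo {
  // `stage` stands in for the store lookup result wrapped in an Option.
  def nameAndDescription(stage: Option[Stage], job: Job): (String, String) =
    (stage.map(_.name).getOrElse(""),
     stage.flatMap(_.description).getOrElse(job.name))
}
```

Keeping the fallback inside the helper means every call site gets the same behavior without repeating the empty-description check.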

Contributor Author

Fixed, thanks for catching.

@vanzin
Contributor

vanzin commented Jan 20, 2018

You could also add 'Closes #20287' to the PR description to close the other PR for the same bug automatically.

@SparkQA

SparkQA commented Jan 20, 2018

Test build #86404 has finished for PR 20330 at commit f19d3a1.

  • This patch fails Spark unit tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@jiangxb1987
Contributor

cc @gengliangwang

@SparkQA

SparkQA commented Jan 21, 2018

Test build #86417 has finished for PR 20330 at commit c733ac9.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

Member

@gengliangwang gengliangwang left a comment

LGTM. I checked the other calls of lastStageAttempt and they all look OK.

@vanzin
Contributor

vanzin commented Jan 22, 2018

Merging to master / 2.3.

asfgit pushed a commit that referenced this pull request Jan 22, 2018
… streaming apps

Author: Sandor Murakozi

Closes #20330 from smurakozi/SPARK-23121.

(cherry picked from commit 446948a)
Signed-off-by: Marcelo Vanzin
@asfgit asfgit closed this in 446948a Jan 22, 2018
@smurakozi
Contributor Author

Thanks for your help @vanzin, @gengliangwang, @jiangxb1987
