
[SPARK-23121][WEB-UI] When the Spark Streaming app is running for a period of time, the page is incorrectly reported when accessing '/jobs' or '/jobs/job?id=13' #20287

Closed

Conversation

guoxiaolongzte

What changes were proposed in this pull request?

When the Spark Streaming app has been running for a period of time, an error page is returned when accessing '/jobs/' or '/jobs/job/?id=13', and the UI cannot be accessed.

Test command:
./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark

After the app has been running for a period of time, the UI cannot be accessed; please see the attached screenshots (taken before the fix).

How was this patch tested?

manual tests

Please review http://spark.apache.org/contributing.html before opening a pull request.

…riod of time, the page is incorrectly reported when accessing '/jobs' or '/jobs/job?id=13'
Member

@srowen srowen left a comment


This can't be committed because it masks all errors. At best, changing this to return a 404 instead of 500 would be ideal. That's not what this does though.

displayJobDescription = store.lastStageAttempt(job.stageIds.max).description
  .map(UIUtils.makeDescription(_, "", plainText = true).text).getOrElse("")
} catch {
  case e => displayJobDescription = job.description.getOrElse("")
Member


No, you're catching all exceptions. This should check whether the last stage attempt exists rather than catching anything that goes wrong.

I don't think it's correct to pretend it exists but is empty. The request is for something that doesn't exist and ideally generates a 404.
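
For illustration only (a sketch of mine, not code from this PR): a narrower version of the block above would fall back only when the stage data is actually missing and let every other failure propagate. It assumes the store lookup signals a missing stage attempt with a NoSuchElementException, which is my assumption, and it reuses the names (store, job, UIUtils) from the snippet under review.

// Hypothetical sketch: fall back to the job's own description only when the
// last stage attempt is missing from the store; any other exception still
// surfaces instead of being silently swallowed.
val displayJobDescription =
  try {
    store.lastStageAttempt(job.stageIds.max).description
      .map(d => UIUtils.makeDescription(d, "", plainText = true).text)
      .getOrElse("")
  } catch {
    case _: NoSuchElementException => job.description.getOrElse("")
  }

This only addresses the error-masking concern; actually returning a 404 for missing data would be a separate, page-level change.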

@guoxiaolongzte
Author

Well, can you tell me what specific changes should be made? I do not have a good idea right now. The problem is that the page crashes; it should be considered a fatal bug.

@guoxiaolongzte
Author

@smurakozi
Please help review the code; this bug results from functionality you added.

@srowen
Member

srowen commented Jan 17, 2018

This is far from a fatal bug. The page doesn't render, but it should give some kind of error anyway. What change are you referring to? In any event, this change is unsuitable.

@AmplabJenkins

Can one of the admins verify this patch?

Contributor

@smurakozi smurakozi left a comment


I agree with @srowen that this change is not the best solution. I was already able to reproduce the issue, and I will check what better alternatives we have.

val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max)
val lastStageDescription = lastStageAttempt.description.getOrElse("")

var lastStageDescription = ""
Contributor


Instead of catching the exception, the logic should be modified to handle a missing stageAttempt.

jobId, store.operationGraphForJob(jobId))
} catch {
  case e => None
}
Contributor


Same here. We should avoid the situation where the exception is thrown in the first place; catching it and doing nothing just hides the problem.

@guoxiaolongzte
Author

@smurakozi @vanzin @srowen
Thanks, I will close this PR.

asfgit pushed a commit that referenced this pull request Jan 22, 2018
… streaming apps

## What changes were proposed in this pull request?

The allJobs and job pages attempt to use stage attempt data and the DAG visualization from the store, but for long-running jobs these are not guaranteed to be retained, leading to exceptions when the pages are rendered.

To fix this, `store.lastStageAttempt(stageId)` and `store.operationGraphForJob(jobId)` are wrapped in `store.asOption`, and default values are used if the info is missing.

## How was this patch tested?

Manual testing of the UI, also using the test command reported in SPARK-23121:

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark

Closes #20287

Author: Sandor Murakozi <[email protected]>

Closes #20330 from smurakozi/SPARK-23121.

(cherry picked from commit 446948a)
Signed-off-by: Marcelo Vanzin <[email protected]>
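
For reference, here is a minimal sketch of the pattern the merged fix describes: wrapping the store lookups in `store.asOption` and substituting defaults when the data is no longer retained. The surrounding names (`store`, `jobData`, `jobId`) come from the snippets quoted earlier in this thread, and the empty-sequence default for the operation graph is my assumption rather than a line taken from the patch.

// Sketch only, not the merged patch itself. asOption turns a failed lookup into None.
val lastStageAttempt = store.asOption(store.lastStageAttempt(jobData.stageIds.max))
val lastStageDescription = lastStageAttempt.flatMap(_.description).getOrElse("")

// Likewise, don't let the DAG visualization lookup throw while the page renders;
// fall back to an empty graph collection instead (assumed default).
val operationGraphs = store.asOption(store.operationGraphForJob(jobId)).getOrElse(Seq.empty)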
ghost pushed a commit to dbtsai/spark that referenced this pull request Jan 22, 2018