[SPARK-23121][WEB-UI] When the Spark Streaming app is running for a period of time, the page is incorrectly reported when accessing '/jobs' or '/jobs/job?id=13' #20287
Conversation
…riod of time, the page is incorrectly reported when accessing '/jobs' or '/jobs/job?id=13'
This can't be committed because it masks all errors. At best, changing this to return a 404 instead of 500 would be ideal. That's not what this does though.
    displayJobDescription = store.lastStageAttempt(job.stageIds.max).description
      .map(UIUtils.makeDescription(_, "", plainText = true).text).getOrElse("")
  } catch {
    case e => displayJobDescription = job.description.getOrElse("")
No, you're catching all exceptions. This should check whether the last stage attempt exists rather than catching anything that goes wrong.
I don't think it's correct to pretend it exists but is empty. The request is for something that doesn't exist and ideally generates a 404.
Well, then can you tell me what specific change you'd suggest? I don't have a good idea right now. The problem is that the page crashes, which seems like a serious bug.
@smurakozi
This is far from a fatal bug. The page doesn't render, but it should give some kind of error anyway. What change are you referring to? In any event, this change is unsuitable.
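For context, the direction the reviewers are pointing at here (and the one the merged change ultimately takes, per the commit message further down) is to check whether the last stage attempt still exists instead of catching everything. A minimal sketch of that idea for the `displayJobDescription` snippet above, written as a fragment of the same page code and assuming an `asOption`-style helper on the store that returns `None` when the lookup fails:

```scala
// Sketch only, not the code in this PR: a missing stage attempt becomes None,
// and the job's own description is used as the fallback (same fallback as above).
val displayJobDescription =
  store.asOption(store.lastStageAttempt(job.stageIds.max))
    .flatMap(_.description)
    .map(UIUtils.makeDescription(_, "", plainText = true).text)
    .getOrElse(job.description.getOrElse(""))
```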
Can one of the admins verify this patch?
I agree with @srowen that this change is not the best solution. I could already reproduce the issue and I will check what better alternatives we have.
    val lastStageAttempt = store.lastStageAttempt(jobData.stageIds.max)
    val lastStageDescription = lastStageAttempt.description.getOrElse("")

    var lastStageDescription = ""
Instead of catching the exception the logic should be modified to be prepared for a missing stageAttempt.
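As a hedged illustration of "being prepared for a missing stageAttempt", the snippet above could be rewritten along these lines (again a fragment of the page code, assuming `store.asOption` yields `None` when the stage is no longer retained):

```scala
// Sketch only: no var, no try/catch; an evicted stage simply produces the default "".
val lastStageDescription =
  store.asOption(store.lastStageAttempt(jobData.stageIds.max))
    .flatMap(_.description)
    .getOrElse("")
```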
        jobId, store.operationGraphForJob(jobId))
    } catch {
      case e => None
    }
Same here. We should avoid the situation when the exception is thrown. Catching the exception and doing nothing just hides the problems.
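A similar sketch for the DAG visualization lookup above: wrap the store call so the caller receives an `Option` and can decide what to render when the graph is gone (assuming the same `asOption`-style helper):

```scala
// Sketch only: a missing operation graph becomes None instead of an exception
// propagating into page rendering.
val maybeOperationGraph = store.asOption(store.operationGraphForJob(jobId))
```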
@smurakozi @vanzin @srowen
… streaming apps

## What changes were proposed in this pull request?

The allJobs and the job pages attempt to use stage attempt and DAG visualization from the store, but for long running jobs they are not guaranteed to be retained, leading to exceptions when these pages are rendered. To fix it `store.lastStageAttempt(stageId)` and `store.operationGraphForJob(jobId)` are wrapped in `store.asOption` and default values are used if the info is missing.

## How was this patch tested?

Manual testing of the UI, also using the test command reported in SPARK-23121:

./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark

Closes #20287

Author: Sandor Murakozi <[email protected]>

Closes #20330 from smurakozi/SPARK-23121.

(cherry picked from commit 446948a)
Signed-off-by: Marcelo Vanzin <[email protected]>
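For completeness, here is a self-contained, hypothetical model of the `asOption` pattern the merged change relies on. The types below are stand-ins invented for this sketch, not Spark's actual classes: a lookup that throws `NoSuchElementException` once old entries have been evicted is converted into an `Option`, and the caller supplies a default.

```scala
// Hypothetical stand-ins for illustration only; not Spark's AppStatusStore.
final case class StageData(description: Option[String])

class TinyStore(retained: Map[Int, StageData]) {
  // Mimics the failure mode from the bug report: evicted stages throw.
  def lastStageAttempt(stageId: Int): StageData =
    retained.getOrElse(stageId, throw new NoSuchElementException(s"No stage $stageId"))

  // The asOption-style wrapper: a failed lookup becomes None instead of an exception.
  def asOption[T](fn: => T): Option[T] =
    try Some(fn) catch { case _: NoSuchElementException => None }
}

object AsOptionSketch {
  def main(args: Array[String]): Unit = {
    val store = new TinyStore(Map(1 -> StageData(Some("kept stage"))))

    // Retained stage: the description comes from the store.
    val kept = store.asOption(store.lastStageAttempt(1)).flatMap(_.description).getOrElse("")
    // Evicted stage: no exception reaches the page code, the default is used instead.
    val evicted = store.asOption(store.lastStageAttempt(99)).flatMap(_.description).getOrElse("")

    println(s"kept='$kept', evicted='$evicted'")  // kept='kept stage', evicted=''
  }
}
```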
What changes were proposed in this pull request?
When the Spark Streaming app has been running for a period of time, an error page is returned when accessing '/jobs' or '/jobs/job?id=13', and the UI cannot be accessed.
Test command:
./bin/spark-submit --class org.apache.spark.examples.streaming.HdfsWordCount ./examples/jars/spark-examples_2.11-2.4.0-SNAPSHOT.jar /spark
After the app has been running for a period of time, the UI can no longer be accessed; please see the attachment.
Before the fix:
How was this patch tested?
Manual tests.
Please review http://spark.apache.org/contributing.html before opening a pull request.