Update error handling in python for parsing cluster information #1394
base: dev
Signed-off-by: Partho Sarthi <[email protected]>
Thanks @parthosa!
# If number of worker nodes is invalid, log error and return
if pd.isna(num_worker_nodes) or num_worker_nodes <= 0:
    self._log_inference_failure(app_id, 'Number of worker nodes cannot be determined. '
                                        'See logs for details.')
My only comment here: which logs do you want the user to look at? Are there actual logs that tell the user something useful? If not, I would just leave off "See logs for details".
For instance, some of this is expected for cases like on-prem, where we don't want to tell them the number since dynamic allocation may be on.
Specifically, I want users to refer to the Scala logs, since they would have the specific reason.
For example, in this case the Scala logs would have:
Qualification-0 WARN QualificationAppInfo: Could not determine if any executors were allocated or the number of cores used per executor. Can't build existing cluster information!
However, from a user's perspective, the Scala and Python logs appear together on the console. I don't think we can distinguish between them and point users to the Scala logs specifically.
Fixes #1392.
The Python code parses cluster information generated by Scala to display it on STDOUT and store it in app_metadata.json. This PR ensures that new changes on the Scala side are properly handled by Python.
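As a rough illustration of the kind of guard discussed in the review thread, here is a minimal standalone sketch. It is not the PR's actual code: `validate_num_worker_nodes` and the module-level `logger` are hypothetical names, and a stdlib check for `None`/NaN stands in for the `pd.isna` call used in the diff so the snippet runs without pandas.

```python
import logging
import math
from typing import Optional, Union

# Hypothetical logger; the real tool routes messages through its own logging helpers.
logger = logging.getLogger('cluster_inference_sketch')


def validate_num_worker_nodes(app_id: str,
                              num_worker_nodes: Union[int, float, None]) -> Optional[int]:
    """Return the worker count as an int, or None if it is missing, NaN, or <= 0."""
    # Stand-in for pd.isna(): treat None and float NaN as missing values.
    is_missing = num_worker_nodes is None or (
        isinstance(num_worker_nodes, float) and math.isnan(num_worker_nodes))
    if is_missing or num_worker_nodes <= 0:
        # Log the failure and bail out instead of building cluster info from bad input.
        logger.error('Unable to infer cluster for app %s: '
                     'number of worker nodes cannot be determined.', app_id)
        return None
    return int(num_worker_nodes)
```

Returning `None` rather than raising lets the caller skip cluster inference for one application and keep processing the rest, which appears to be the intent of "log error and return" in the diff.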
Error Handling Improvements:
Added a check to log an error:
Test