[BUG] Improve log message when qual tool could not access an eventlog #1280

Closed
kuhushukla opened this issue Aug 12, 2024 · 2 comments · Fixed by #1281
Assignees
Labels
usability track issues related to the Tools's user experience

Comments

@kuhushukla
Collaborator

Describe the bug
If a file (for example, one on HDFS) is not accessible to the tool, the verbose output shows the warning below but does not propagate the error message from the underlying exception. The same applies to other exception types, since the catch block handles the generic Exception.

Steps/Code to reproduce bug
Run the qual tool on an eventlog that is not accessible (for example, one the user does not have permissions for):

spark_rapids qualification --eventlogs= hdfs:/nn:8020/my-app-eventlog
 WARN UnknownAppResult: File: hdfs:/nn:8020/my-app-eventlog, Message: AccessControlException: Got unexpected exception processing file: hdfs://nn:8020/my-app-eventlog

Expected behavior
Add more information to the verbose log line. If this seems excessive we can skip it, but I found the output useful while debugging, and it also draws more attention to the warning.
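To illustrate the kind of log line being asked for, here is a minimal Python sketch. It is not the tool's actual code: `describe_failure`, `process_eventlog`, and the `reader` callback are hypothetical names, and the idea of attaching the stack only at debug level is an assumption made for the example.

```python
import logging
import traceback


def describe_failure(path: str, exc: BaseException, with_stack: bool = False) -> str:
    """Build a log line that keeps the exception's own message (hypothetical helper)."""
    line = f"File: {path}, Message: {type(exc).__name__}: {exc}"
    if with_stack:
        # Append the full stack so the failure can be triaged from the log alone.
        stack = "".join(traceback.format_exception(type(exc), exc, exc.__traceback__))
        line = f"{line}\n{stack}"
    return line


def process_eventlog(path: str, reader, logger: logging.Logger) -> bool:
    """Call reader(path); on any failure, log the underlying cause and return False."""
    try:
        reader(path)
        return True
    except Exception as exc:  # broad catch, mirroring the situation described above
        # Assumption for this sketch: include the stack only when debug logging is on.
        include_stack = logger.isEnabledFor(logging.DEBUG)
        logger.warning(describe_failure(path, exc, with_stack=include_stack))
        return False
```

With this shape, the warning carries whatever text the exception itself provides instead of only a generic "unexpected exception" message, and the stack is available when more detail is enabled.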

Environment details (please complete the following information)

  • HDFS file path, qual 24.06.1

@kuhushukla kuhushukla added the "? - Needs Triage" and "usability" labels Aug 12, 2024
@kuhushukla kuhushukla self-assigned this Aug 12, 2024
@parthosa
Collaborator

parthosa commented Aug 12, 2024

#1187 and #1235 resolved the case where the tools would not show any output. Now we show the number of apps that were provided, that were successfully processed, and that are top candidates. Additionally, the status CSV was updated to store the exact cause of the error (if any) for each app/file provided.

CMD:

spark_rapids qualification --platform onprem --eventlogs hdfs:/nn:8020/my-app-eventlog --tools_jar $SPARK_RAPIDS_TOOLS_JAR

Output

Tools Version: latest dev

Console

    - Application status report: /Users/psarthi/Work/tools-run/qual_20240812213733_3EaE3EdF/rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_status.csv

Qualification tool found no successful applications to process.

Report Summary:
----------------------  -
Total applications      1
Processed applications  0
Top candidates          0
----------------------  -

Status File

File: qual_2024xxx/rapids_4_spark_qualification_output/rapids_4_spark_qualification_output_status.csv

|-------------------------------|---------|-------|-------------------------------------------------------------|
| Event Log                     | Status  | AppID | Description                                                 |
|-------------------------------|---------|-------|-------------------------------------------------------------|
| hdfs:/nn:8020/my-app-eventlog | FAILURE | N/A   | Incomplete HDFS URI, no host: hdfs:/nn:8020/my-app-eventlog |
|-------------------------------|---------|-------|-------------------------------------------------------------|

Could you test on the latest dev branch and let us know if the issue still persists?
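For anyone who wants to pull the failure causes out of the status file programmatically, a small Python sketch follows. It assumes the CSV header matches the column titles in the table above (`Event Log`, `Status`, `AppID`, `Description`) and that failed rows carry the status `FAILURE`; the real file's header names may differ, and `failed_entries` is a hypothetical helper.

```python
import csv
import io


def failed_entries(csv_text: str):
    """Return (event log, description) pairs for rows marked FAILURE.

    Column names are assumed from the table shown in this comment.
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return [(r["Event Log"], r["Description"]) for r in rows if r["Status"] == "FAILURE"]


# Sample content built from the single row shown above; the description is
# quoted because it contains a comma.
sample = (
    "Event Log,Status,AppID,Description\n"
    'hdfs:/nn:8020/my-app-eventlog,FAILURE,N/A,'
    '"Incomplete HDFS URI, no host: hdfs:/nn:8020/my-app-eventlog"\n'
)

for event_log, reason in failed_entries(sample):
    print(f"{event_log}: {reason}")
```

To run it against a real report, read the status file's text and pass it to `failed_entries` in place of `sample`.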

@kuhushukla
Collaborator Author

kuhushukla commented Aug 12, 2024

@parthosa my changes are against the dev branch. I'm not sure my change corresponds to the status file you mentioned. When we print verbose output and call out the exception, we should give information on how to triage it by simply including the stack trace.
