
Improve Pipeline logs error states #2300


manaswinidas
Contributor

@manaswinidas manaswinidas commented Dec 12, 2023

JIRA:
RHOAIENG-254
RHOAIENG-255

The 500 Internal Server Error state will be handled in RHOAIENG-1067.

Description

  1. Adds error messages to Pipeline logs for failed and cleaned-up pods
  2. Adds an error message for network issues
  3. Hides the Logs toolbar in these error states
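The three error states above, and the decision to hide the toolbar, can be sketched as a small classification function. This is a minimal TypeScript sketch, not the PR's actual code: the names `LogsErrorState`, `getLogsErrorState`, and `shouldShowLogsToolbar`, and the shape of `LogsFetchResult`, are hypothetical. It assumes that a network failure takes precedence over any pod status, that a cleaned-up pod is detected by the pod no longer existing, and that (per item 3) the toolbar is hidden in every error state.

```typescript
// Hypothetical types and helpers for illustration only; not taken from the PR.

// Kubernetes pod phases.
type PodPhase = 'Pending' | 'Running' | 'Succeeded' | 'Failed' | 'Unknown';

type LogsErrorState = 'none' | 'failed-pod' | 'cleaned-up-pod' | 'network-error';

interface LogsFetchResult {
  networkError: boolean; // assumed: true when the request never reached the server
  podExists: boolean; // assumed: false once the pod has been cleaned up
  podPhase?: PodPhase; // phase reported by the cluster, if the pod still exists
}

function getLogsErrorState(result: LogsFetchResult): LogsErrorState {
  // With no connectivity, nothing about the pod can be known reliably,
  // so the network error is reported first.
  if (result.networkError) {
    return 'network-error';
  }
  // The pod is gone, so its logs can no longer be fetched.
  if (!result.podExists) {
    return 'cleaned-up-pod';
  }
  if (result.podPhase === 'Failed') {
    return 'failed-pod';
  }
  return 'none';
}

function shouldShowLogsToolbar(state: LogsErrorState): boolean {
  // Assumption: the toolbar (step selector, download) is hidden in every error state.
  return state === 'none';
}
```

For example, a result with `podExists: false` and no network error classifies as `'cleaned-up-pod'`, so `shouldShowLogsToolbar` returns `false` and only the inline alert is shown.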

Cleaned-up pods (error message per this Slack message):

Screenshot 2023-12-19 at 7 07 08 PM

No internet (error message per this Slack message):

Screenshot 2024-01-15 at 5 19 15 PM

Failed pod (error message per this Slack message):

Screenshot 2024-01-15 at 8 14 15 PM

How Has This Been Tested?

  1. Open any running pipeline with a failed node and check the error state in the Logs tab.
  2. Click any node of an old pipeline (whose pods have been cleaned up) and check the error state.
  3. Also check the error state when there is no internet connection.

Test Impact

Request review criteria:

Self checklist (all need to be checked):

  • The developer has manually tested the changes and verified that the changes work
  • Commits have been squashed into descriptive, self-contained units of work (e.g. 'WIP' and 'Implements feedback' style messages have been removed)
  • Testing instructions have been added in the PR body (for PRs involving changes that are not immediately obvious).
  • The developer has added tests or explained why testing cannot be added (unit tests & storybook for related changes)

If you have UI changes:

  • Included any necessary screenshots or gifs if it was a UI change.
  • Included tags to the UX team if it was a UI/UX change (find relevant UX in the SMEs section).

After the PR is posted & before it merges:

  • The developer has tested their solution on a cluster by using the image produced by the PR to main

@manaswinidas manaswinidas changed the title Handle failed and cleaned-up pods WIP: Handle failed and cleaned-up pods Dec 12, 2023
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress This PR is in WIP state label Dec 12, 2023
@manaswinidas
Contributor Author

Still WIP: I need to add tests and screenshots for cleaned-up pods.

@manaswinidas manaswinidas changed the title WIP: Handle failed and cleaned-up pods Handle failed and cleaned-up pods Dec 19, 2023
@openshift-ci openshift-ci bot removed the do-not-merge/work-in-progress This PR is in WIP state label Dec 19, 2023
@manaswinidas
Contributor Author

manaswinidas commented Dec 19, 2023

@yih-wang can you check the error states? Do we want to show the toolbar in any of the states above?

@manaswinidas manaswinidas changed the title Handle failed and cleaned-up pods Improve Pipeline logs error states Dec 20, 2023
@yih-wang

@manaswinidas No, I don't think we will include any error state in the toolbar. The only two ways we display the error messages are:

  1. From the inline alert, as shown in your screenshot: this shows the pod error or another general issue, e.g. a network issue
  2. From the step dropdown: this specifies which step is encountering the error, giving users detailed insight into the exact point of failure for more effective troubleshooting

@yih-wang

yih-wang commented Dec 20, 2023

Oops, sorry, I misread the question.
So in the failed-to-fetch case, is it true that all the steps will fail to fetch their logs, or is it possible that only the current step fails while other steps still have logs available?

@manaswinidas
Contributor Author

@yih-wang In the case of network issues, it's highly unlikely that some step logs can be fetched while others cannot.

@openshift-merge-robot openshift-merge-robot added the needs-rebase PR needs to be rebased label Jan 4, 2024
@yih-wang

yih-wang commented Jan 4, 2024

@manaswinidas Then I think we should still show the toolbar in the 'No internet' error case to provide the ability to switch to other steps.

@openshift-merge-robot openshift-merge-robot removed the needs-rebase PR needs to be rebased label Jan 8, 2024
@manaswinidas
Contributor Author

@yih-wang But we can't retrieve logs when there is no internet connection. Even if we can switch steps using the step dropdown or use the download dropdown, nothing happens because there is no internet. Here's a screen recording that demonstrates this. Should we still show the toolbar in this case?

Screen.Recording.2024-01-08.at.11.43.27.PM.mov

@manaswinidas manaswinidas force-pushed the improve-log-error-state branch 2 times, most recently from df0ae8a to 9e76f19 Compare January 8, 2024 19:29
@yih-wang

yih-wang commented Jan 9, 2024

@manaswinidas Oops, I reread your previous message and realized you were saying it's unlikely that some steps have logs while others do not. Then yes, you are right: we don't show the toolbar in the network-issue case either.

Member

@Gkrumbach07 Gkrumbach07 left a comment


On error, you are still polling. I assume this is intended?

@manaswinidas manaswinidas force-pushed the improve-log-error-state branch 3 times, most recently from febc1d8 to 44e0dfa Compare January 15, 2024 14:44
@manaswinidas
Contributor Author

@yih-wang Can you have a final look at the screenshots?

@yih-wang

@manaswinidas Didn't we combine case 1 (failed/cleaned-up pods) and case 3 (failed pod) shown in your screenshots?

@manaswinidas
Contributor Author

@yih-wang I did it according to this discussion we had last month.

@yih-wang

The error messages look good to me.
Should the first case apply only to cleaned-up pods (it currently covers failed/cleaned-up pods), since we have a separate error for failed pods in the third case?

@manaswinidas
Contributor Author

@yih-wang Thanks for pointing it out. I updated it just now.

@Gkrumbach07
Member

Code looks good to me.

/lgtm

Just need @yih-wang's approval and an advisor.

@manaswinidas
Contributor Author

We have @yih-wang's approval here. She was just asking me to update the PR description because it was outdated.

Contributor

openshift-ci bot commented Jan 17, 2024

New changes are detected. LGTM label has been removed.

@manaswinidas
Contributor Author

Rebased, cleaned up a few nits after the last merge.

@Gkrumbach07
Member

/lgtm all still works

Contributor

openshift-ci bot commented Jan 17, 2024

[APPROVALNOTIFIER] This PR is APPROVED

Approval requirements bypassed by manually added approval.

This pull-request has been approved by: mturley

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 991cd37 into opendatahub-io:f/pipelines-enhancement Jan 17, 2024
4 checks passed

5 participants