Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add script to get annotations from errored actions #383

Merged
merged 6 commits into from
Aug 2, 2023

Conversation

rgraber
Copy link
Contributor

@rgraber rgraber commented Aug 2, 2023

See diff for a fuller description of what the script does.
To run, make sure you have your GITHUB_TOKEN set in the environment.
Example output: https://docs.google.com/spreadsheets/d/1GBfLjpxx-r8lAohevpNCZhYFeETpk4imAxUbn87uumg/edit#gid=1755866686

Merge checklist:
Check off if complete or not applicable:

  • Version bumped
  • Changelog record added
  • Documentation updated (not only docstrings)
  • Fixup commits are squashed away
  • Unit tests added/updated
  • Manual testing instructions provided
  • Noted any: Concerns, dependencies, migration issues, deadlines, tickets

Copy link
Contributor

@robrap robrap left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice.

  1. I did a quick pass and seems good. I'm relying on the fact that it produced an example spreadsheet to say it is much better than what we have now. :)
  2. Consider having the script point to an example spreadsheet, or maybe copying in an example header and row, or maybe just pointing to our runbook and doing any of this from the runbook. It's useful to see an example of what I should expect.
  3. Are we able to actually see how often some of the problems we were seeing were happening, or are we not getting that level of detail?
  4. Is there any use in adding conditional output for admins to gather more of what we need? Non-blocking, but curious.

I wonder how and if you plan to update the runbook, and whether you've tried using this data.

Thank you.

@rgraber
Copy link
Contributor Author

rgraber commented Aug 2, 2023

Nice.

1. I did a quick pass and seems good. I'm relying on the fact that it produced an example spreadsheet to say it is much better than what we have now. :)

That's what I was going for.

2. Consider having the script point to an example spreadsheet, or maybe copying in an example header and row, or maybe just pointing to our runbook and doing any of this from the runbook. It's useful to see an example of what I should expect.

Good call. Added headers and example in the docstring.

3. Are we able to actually see how often some of the problems we were seeing were happening, or are we not getting that level of detail?

Some of them. We can see how often we get that lost connection error. Not as much other ones.

4. Is there any use in adding conditional output for admins to gather more of what we need? Non-blocking, but curious.

There probably would be but it's not code I can test so I didn't feel comfortable adding it in. It would be great to be able to get full logs for failed runs so we could see other errors and maybe do some processing there.

I wonder how and if you plan to update the runbook, and whether you've tried using this data.

Planning on updating the runbook for pipeline failures. Something like "To see historical data about GitHub checks against edx-platform, run this script." I did notice after running that the lost connection error doesn't happen nearly as often as I was worried it did. I'm going to run it for a longer time to see if I can get a better pattern but that takes a bit.

Thank you.

@rgraber rgraber merged commit 2ccf0fd into main Aug 2, 2023
4 checks passed
@rgraber rgraber deleted the rsgraber/20230802-get-frequency-of-action-errors branch August 2, 2023 20:38
@rgraber
Copy link
Contributor Author

rgraber commented Aug 2, 2023

On further reflection I could probably have tested the admin code against edx-arch-experiments or another repository where I have admin rights but I'd already gone past the 2-day timebox so it's left to future devs.

@robrap
Copy link
Contributor

robrap commented Aug 2, 2023

Planning on updating the runbook for pipeline failures. Something like "To see historical data about GitHub checks against edx-platform, run this script." I did notice after running that the lost connection error doesn't happen nearly as often as I was worried it did. I'm going to run it for a longer time to see if I can get a better pattern but that takes a bit.

Great. I wonder if it would be useful to append to the same spreadsheet, and add a link to the Runbook, so we can collect more and more data over time? Could be in new tabs, or however.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants