Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Send overall job status in init-post status report #2097

Merged
merged 17 commits into from
Jan 26, 2024

Conversation

angelapwen
Copy link
Contributor

@angelapwen angelapwen commented Jan 24, 2024

This PR sends a job_status field in the init-post status report so that we will be able to more accurately measure our SLOs (see internal backlinked issue for more).

To populate this field, the change:

  • sets the job status environment variable to Failure or ConfigurationError the first time an error has been established. This is already checked for in the getActionsStatus method that is called when we establish status of each individual action, so we additionally set the environment variable at this time.
  • in the init-post Action, if the ANALYZE_DID_COMPLETE_SUCCESSFULLY environment variable was not true, and the job status environment variable was not already set, we set it to ConfigurationError. This catches the case shown in the submit SARIF after failure PR check, where the failure occurs within the job but outside of the Action steps where we send status reports. We consider this a configuration error for the purposes of telemetry as these steps are not owned by the Action.
  • We also explicitly set the environment variable to Success if ANALYZE_DID_COMPLETE_SUCCESSFULLY is true. Note that in the case where analyze was successful, but some step between analyze and the init-post action fails, we'll incorrectly mark the job status as Success. Presumably this is a very uncommon scenario though 🤔
  • sends the value of this environment variable, or Unknown if not previously populated, to the init-post status report.

At each step, if the environment variable already has a value, we never override it.

I logged the job status in a debug statement to confirm that the appropriate statuses are being sent as expected:

  • Success: successful run log
  • Failure: debug artifacts failure run (failure in analyze step) log
  • Configuration Error: submit SARIF after run (failure occurs in step where we do not send a status report) log

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Confirm the readme has been updated if necessary.
  • Confirm the changelog has been updated if necessary.

@angelapwen angelapwen marked this pull request as ready for review January 24, 2024 12:51
@angelapwen angelapwen requested a review from a team as a code owner January 24, 2024 12:51
Copy link
Contributor

@henrymercer henrymercer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, a couple of thoughts:

src/environment.ts Outdated Show resolved Hide resolved
src/status-report.ts Outdated Show resolved Hide resolved
src/init-action-post-helper.ts Outdated Show resolved Hide resolved
src/init-action-post.ts Outdated Show resolved Hide resolved
@henrymercer
Copy link
Contributor

  • We also explicitly set the environment variable to Success if ANALYZE_DID_COMPLETE_SUCCESSFULLY is true. Note that in the case where analyze was successful, but some step between analyze and the init-post action fails, we'll incorrectly mark the job status as Success. Presumably this is a very uncommon scenario though 🤔

Just to check my understanding here, if one of the Actions we maintain is called after we successfully uploaded a SARIF file and it fails, then this will set the job status environment variable, so we'll have a failure. If a third-party Action is called after we successfully uploaded a SARIF file, then we won't know about that so we'll mark this as a success. This seems fine as code scanning has succeeded in that situation at analyzing the repo and uploading the analysis results.

@angelapwen
Copy link
Contributor Author

Just to check my understanding here, if one of the Actions we maintain is called after we successfully uploaded a SARIF file and it fails, then this will set the job status environment variable, so we'll have a failure. If a third-party Action is called after we successfully uploaded a SARIF file, then we won't know about that so we'll mark this as a success. This seems fine as code scanning has succeeded in that situation at analyzing the repo and uploading the analysis results.

Yes!

henrymercer
henrymercer previously approved these changes Jan 25, 2024
Copy link
Contributor

@henrymercer henrymercer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

src/environment.ts Outdated Show resolved Hide resolved
src/init-action-post-helper.ts Outdated Show resolved Hide resolved
src/status-report.ts Show resolved Hide resolved
Co-authored-by: Henry Mercer <[email protected]>
@angelapwen
Copy link
Contributor Author

Will hold on merging this until the monolith PR is approved and merged.

henrymercer
henrymercer previously approved these changes Jan 25, 2024
src/status-report.ts Outdated Show resolved Hide resolved
Co-authored-by: Henry Mercer <[email protected]>
@henrymercer henrymercer added the Rebuild Re-transpile JS & re-generate workflows label Jan 25, 2024
@henrymercer
Copy link
Contributor

Ah, rebuild doesn't have permission to write to forks.

@angelapwen
Copy link
Contributor Author

The failures are actually unrelated upstream API failures. The comment change doesn't affect the transpilation files (I think)

@henrymercer henrymercer removed the Rebuild Re-transpile JS & re-generate workflows label Jan 25, 2024
@angelapwen
Copy link
Contributor Author

Monolith PR is merged so I'll merge this now and keep an eye on our telemetry to make sure the change is looking okay!

@angelapwen angelapwen merged commit 61bf025 into github:main Jan 26, 2024
318 of 319 checks passed
@angelapwen angelapwen deleted the send-job-status branch January 26, 2024 14:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants