-
Notifications
You must be signed in to change notification settings - Fork 137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create more detailed error responses when submitting batch jobs #2940
Create more detailed error responses when submitting batch jobs #2940
Conversation
* Update docker stuff for latest airflow 2.7.0 * Use AirflowException instead of AirflowFailException to allow for retries Signed-off-by: Rich Scott <[email protected]>
Signed-off-by: Rich Scott <[email protected]>
* update * update pulsar client * Fix bug causing server spinning * Abstract out the retry until success logic for testing (armadaproject#2901) * Respond to review --------- Co-authored-by: Chris Martin <[email protected]> Co-authored-by: Daniel Rastelli <[email protected]> Signed-off-by: Rich Scott <[email protected]>
) Signed-off-by: Rich Scott <[email protected]>
* allow logger to report caller * allow logger to report caller * lint --------- Co-authored-by: Chris Martin <[email protected]> Signed-off-by: Rich Scott <[email protected]>
…2907) Signed-off-by: Rich Scott <[email protected]>
Co-authored-by: Adam McArthur <[email protected]> Signed-off-by: Rich Scott <[email protected]>
…roject#2743) Snyk has created this PR to upgrade @typescript-eslint/parser from 5.52.0 to 5.61.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
Snyk has created this PR to upgrade @types/react from 16.14.32 to 16.14.43. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
…aproject#2316) Bumps [github.com/go-openapi/jsonreference](https://github.com/go-openapi/jsonreference) from 0.20.0 to 0.20.2. - [Release notes](https://github.com/go-openapi/jsonreference/releases) - [Commits](go-openapi/jsonreference@v0.20.0...v0.20.2) --- updated-dependencies: - dependency-name: github.com/go-openapi/jsonreference dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
This will ensure the job leased first, gets send to the cluster first Currently we just order by postgres default sorting - which often picks the most recently leased - causing the first lease jobs to get stuck - This only occurs when scheduling is faster than leasing Signed-off-by: Rich Scott <[email protected]>
…ject#2302) Bumps [webpack](https://github.com/webpack/webpack) from 5.75.0 to 5.77.0. - [Release notes](https://github.com/webpack/webpack/releases) - [Commits](webpack/webpack@v5.75.0...v5.77.0) --- updated-dependencies: - dependency-name: webpack dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
…ject#2806) Bumps [word-wrap](https://github.com/jonschlinkert/word-wrap) from 1.2.3 to 1.2.5. - [Release notes](https://github.com/jonschlinkert/word-wrap/releases) - [Commits](jonschlinkert/word-wrap@1.2.3...1.2.5) --- updated-dependencies: - dependency-name: word-wrap dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
Co-authored-by: Adam McArthur <[email protected]> Signed-off-by: Rich Scott <[email protected]>
…rmadaproject#2744) Snyk has created this PR to upgrade @typescript-eslint/eslint-plugin from 5.52.0 to 5.61.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
Snyk has created this PR to upgrade react-router-dom from 6.9.0 to 6.14.1. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
…t#2661) Bumps [semver](https://github.com/npm/node-semver) from 6.3.0 to 6.3.1. - [Release notes](https://github.com/npm/node-semver/releases) - [Changelog](https://github.com/npm/node-semver/blob/v6.3.1/CHANGELOG.md) - [Commits](npm/node-semver@v6.3.0...v6.3.1) --- updated-dependencies: - dependency-name: semver dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Adam McArthur <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
Signed-off-by: Rich Scott <[email protected]>
* Helm chart update: executor At the moment the helm chart for the executor doesn't include priorityClass even though one is created in the chart. This means that the executor deployment is unable to set the priorityClass. Signed-off-by: Rich Scott <[email protected]>
* Bump github.com/go-openapi/strfmt from 0.21.3 to 0.21.7 Bumps [github.com/go-openapi/strfmt](https://github.com/go-openapi/strfmt) from 0.21.3 to 0.21.7. - [Release notes](https://github.com/go-openapi/strfmt/releases) - [Commits](go-openapi/strfmt@v0.21.3...v0.21.7) --- updated-dependencies: - dependency-name: github.com/go-openapi/strfmt dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/go-openapi/runtime from 0.24.2 to 0.26.0 Bumps [github.com/go-openapi/runtime](https://github.com/go-openapi/runtime) from 0.24.2 to 0.26.0. - [Release notes](https://github.com/go-openapi/runtime/releases) - [Commits](go-openapi/runtime@v0.24.2...v0.26.0) --- updated-dependencies: - dependency-name: github.com/go-openapi/runtime dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/goreleaser/nfpm/v2 from 2.25.1 to 2.29.0 Bumps [github.com/goreleaser/nfpm/v2](https://github.com/goreleaser/nfpm) from 2.25.1 to 2.29.0. - [Release notes](https://github.com/goreleaser/nfpm/releases) - [Changelog](https://github.com/goreleaser/nfpm/blob/main/.goreleaser.yml) - [Commits](goreleaser/nfpm@v2.25.1...v2.29.0) --- updated-dependencies: - dependency-name: github.com/goreleaser/nfpm/v2 dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> * Bump github.com/go-playground/validator/v10 from 10.11.1 to 10.14.1 Bumps [github.com/go-playground/validator/v10](https://github.com/go-playground/validator) from 10.11.1 to 10.14.1. - [Release notes](https://github.com/go-playground/validator/releases) - [Commits](go-playground/validator@v10.11.1...v10.14.1) --- updated-dependencies: - dependency-name: github.com/go-playground/validator/v10 dependency-type: direct:production update-type: version-update:semver-minor ... Signed-off-by: dependabot[bot] <[email protected]> * Bump Grpc.Net.Client in /client/DotNet/ArmadaProject.Io.Client Bumps [Grpc.Net.Client](https://github.com/grpc/grpc-dotnet) from 2.47.0 to 2.52.0. - [Release notes](https://github.com/grpc/grpc-dotnet/releases) - [Changelog](https://github.com/grpc/grpc-dotnet/blob/master/doc/release_process.md) - [Commits](grpc/grpc-dotnet@v2.47.0...v2.52.0) --- updated-dependencies: - dependency-name: Grpc.Net.Client dependency-type: direct:production ... Signed-off-by: dependabot[bot] <[email protected]> * fix: upgrade @mui/material from 5.10.17 to 5.13.6 Snyk has created this PR to upgrade @mui/material from 5.10.17 to 5.13.6. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade prettier from 2.7.1 to 2.8.8 Snyk has created this PR to upgrade prettier from 2.7.1 to 2.8.8. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade @mui/icons-material from 5.10.16 to 5.14.3 Snyk has created this PR to upgrade @mui/icons-material from 5.10.16 to 5.14.3. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade eslint-plugin-import from 2.26.0 to 2.28.0 Snyk has created this PR to upgrade eslint-plugin-import from 2.26.0 to 2.28.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * fix: upgrade eslint-config-prettier from 8.5.0 to 8.10.0 Snyk has created this PR to upgrade eslint-config-prettier from 8.5.0 to 8.10.0. See this package in npm: See this project in Snyk: https://app.snyk.io/org/dave-gantenbein/project/5064983e-fa14-4803-8fc2-cfd6f1fa81b6?utm_source=github&utm_medium=referral&page=upgrade-pr * Trying to update klog * go mod fix --------- Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: snyk-bot <[email protected]> Co-authored-by: Mohamed Abdelfatah <[email protected]> Signed-off-by: Rich Scott <[email protected]>
* Add error message of final job run to JobFailedMessage When we hit the maximum retry limit, the JobFailedMessage just says something along the lines of "Job has been retried too many times, giving up" Now we include the final run error in that message - to make it easier to work out the cause of retries * Fix bug causing GetJobSetEvents to get stuck GetJobSetEvents only increments its fromId variable on sending new messages However now all redis events produce api events that will be sent downstream The issue here is if we get 500 redis events in a row that don't produce api events, then the fromId never gets updated - Meaning the watching gets stuck here To fix this, ReadEvents now returns a lastMessageId. So if there are no messages to process, the fromId should be updated using the lastMessageId * Formatting Signed-off-by: Rich Scott <[email protected]>
…madaproject#2931) Bumps [@adobe/css-tools](https://github.com/adobe/css-tools) from 4.0.1 to 4.3.1. - [Changelog](https://github.com/adobe/css-tools/blob/main/History.md) - [Commits](https://github.com/adobe/css-tools/commits) --- updated-dependencies: - dependency-name: "@adobe/css-tools" dependency-type: indirect ... Signed-off-by: dependabot[bot] <[email protected]> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Signed-off-by: Rich Scott <[email protected]>
for i, item := range request.JobRequestItems { | ||
jobId := getUlid() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a change in behaviour in that previously we used to return nil
for the id when we rejected jobs but now we return a jobId. I think this is good in that we now have an id that the user can give us to search logs etc, but does have a couple of downsides:
- Users may get confused if they see jobIds that don't appear anywhere else in the system (e.g. lookout)
- It's a mildly breaking interface change- e.g. if clients previously had code that did
if jobResponse.JobId == nil...
this will now behave differently.
I think I'm mildly in favour of this change, but it might be good to get a second opinion- @severinson ?
@@ -253,13 +268,30 @@ func (server *SubmitServer) DeleteQueue(ctx context.Context, request *api.QueueD | |||
func (server *SubmitServer) SubmitJobs(ctx context.Context, req *api.JobSubmitRequest) (*api.JobSubmitResponse, error) { | |||
principal := authorization.GetPrincipal(ctx) | |||
|
|||
jobs, e := server.createJobs(req, principal.GetName(), principal.GetGroupNames()) | |||
jobs, responseItems, e := server.createJobs(req, principal.GetName(), principal.GetGroupNames()) | |||
if e != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any limit to the size of the response we can generate here? E.g. if the user submits a batch of 100 jobs and they all fail with some massive error are we sure we're not going to generate a response that exceeds the max grpc response size? Likewise (although admittedly loess likely) could we end up generaling such a massive message that we would OOM.
If it were me I'd try and limit the size of the error message by saying:
- how may jobs failed validation (out of how may submitted)
- detailed error messages for the first n jobs that failed. (maybe n=5 or something)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@d80tb7 I'll replace that potentially-huge error with a concise/distilled one, using that logic as you suggest.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Signed-off-by: Rich Scott <[email protected]>
Signed-off-by: Rich Scott <[email protected]>
Signed-off-by: Rich Scott <[email protected]>
Generate and return more detailed submission and/or validation errors. If there are numerous jobs with errors, just give the number of failed jobs (and the total number originally submitted), and truncate the list of failed jobs errors to just the first 5 (this is defined in a single constant variable, if neededed to change later). Signed-off-by: Rich Scott <[email protected]>
NOTE: this PR is a continuation of the work/PR that our summer 2023 MLH fellow/intern Raaj Patel had created. Unfortunately, his internship ended and he has returned to university before that PR (#2623) was fully finished, approved, and merged. The only change to this code since he departed I have made is to fix a conflict in go.mod and go.sum.
Following is his description in the original PR:
What type of PR is this?
This PR adds functionality to JobSubmitResponse when an error has occured. Previously the only thing returned was the error. The changes made in this PR allow the code to do 2 things:
Which issue(s) this PR fixes:
Fixed #1748
Original Description from issue 1748 in JIRA, opened by @d80tb7 :