-
Notifications
You must be signed in to change notification settings - Fork 2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: acl bootstrap errors aren't 500 #6421
Conversation
c1b3668
to
94adf89
Compare
nomad/acl_endpoint.go
Outdated
} | ||
if out == nil { | ||
return fmt.Errorf("cannot find token %s", token.AccessorID) | ||
return structs.NewErrRPCCodedf(400, "cannot find token %s", token.AccessorID) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be a 404?
Noticed that ACL endpoints return 500 status code for user errors. This is confusing and can lead to false monitoring alerts. Here, I introduce a concept of RPCCoded errors to be returned by RPC that signal a code in addition to error message. Codes for now match HTTP codes to ease reasoning. ``` $ nomad acl bootstrap Error bootstrapping: Unexpected response code: 500 (ACL bootstrap already done (reset index: 9)) $ nomad acl bootstrap Error bootstrapping: Unexpected response code: 400 (ACL bootstrap already done (reset index: 9)) ```
94adf89
to
6b25483
Compare
nomad/structs/errors.go
Outdated
@@ -25,6 +26,8 @@ const ( | |||
ErrUnknownJobPrefix = "Unknown job" | |||
ErrUnknownEvaluationPrefix = "Unknown evaluation" | |||
ErrUnknownDeploymentPrefix = "Unknown deployment" | |||
|
|||
errRPCCodedErrorPrefix = "RPC_ERROR::" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A quick rg --fixed-strings 'RPC Error: '
yielded no results, so we could use that instead. My concern is that I'd love to start using this pattern more widely, but it could easily leak into CLI or client agent logs if we aren't careful about our checks/conversions.
errRPCCodedErrorPrefix = "RPC_ERROR::" | |
errRPCCodedErrorPrefix = "RPC Error: " |
Not a big deal either way. Users aren't going to get upset from seeing RPC_ERROR::
once or twice if it ever even happens. 😅
@@ -144,3 +147,31 @@ func IsErrUnknownNomadVersion(err error) bool { | |||
func IsErrNodeLacksRpc(err error) bool { | |||
return err != nil && strings.Contains(err.Error(), errNodeLacksRpc) | |||
} | |||
|
|||
func NewErrRPCCoded(code int, msg string) error { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comments on exported funcs.
Don't forget a changelog entry for the incorrect error code bug fix. |
I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions. |
Noticed that ACL endpoints return 500 status code for user errors. This
is confusing and can lead to false monitoring alerts.
Here, I introduce a concept of RPCCoded errors to be returned by RPC
that signal a code in addition to error message. Codes for now match
HTTP codes to ease reasoning.
Here, I start with ACL endpoints, but we can propagate status code for other endpoints as necessary.
I see that we have special cased permission and not found errors for some endpoints, but I didn't want to follow this pattern as it seems brittle and wrapping seems like too much magic for my taste.