-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ccl,sql,util: Fix jwt auth and add sensitive error logs #123697
Conversation
e25ad0e
to
8a7c2d7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @dhartunian)
a discussion (no related file):
Should we support cluster settings to control http client behaviour for jwt?
Do we log err desc. (which contains sensitive info) in db logs but redact in sentry? Also do we send err desc to client?
9e1c365
to
806b46e
Compare
@@ -71,6 +71,8 @@ type jwtAuthenticatorConf struct { | |||
jwks jwk.Set | |||
claim string | |||
jwksAutoFetchEnabled bool | |||
//clientTimeout time.Duration |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented code may be intended to be removed ?
@@ -91,6 +93,8 @@ func (authenticator *jwtAuthenticator) reloadConfigLocked( | |||
jwks: mustParseJWKS(JWTAuthJWKS.Get(&st.SV)), | |||
claim: JWTAuthClaim.Get(&st.SV), | |||
jwksAutoFetchEnabled: JWKSAutoFetchEnabled.Get(&st.SV), | |||
// clientTimeout: JWTClientTimeout.Get(&st.SV), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Commented code may be intended to be removed ?
@@ -179,7 +186,7 @@ func (authenticator *jwtAuthenticator) ValidateJWTLogin( | |||
// Now that both the issuer and key-id are matched, parse the token again to validate the signature. | |||
parsedToken, err := jwt.Parse(tokenBytes, jwt.WithKeySet(jwkSet), jwt.WithValidate(true), jwt.InferAlgorithmFromKey(true)) | |||
if err != nil { | |||
return errors.Newf("JWT authentication: invalid token") | |||
return errors.WithDetailf(errors.Newf("JWT authentication: invalid token"), "unable to parse token: %v", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought WithDetailf is going to send the error details all the way to client. With this change, looks like we are sending more details visible to client ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BabuSrithar, @bdarnell, and @souravcrl)
pkg/ccl/jwtauthccl/authentication_jwt.go
line 189 at r3 (raw file):
Previously, BabuSrithar (BabuSrithar) wrote…
I thought WithDetailf is going to send the error details all the way to client. With this change, looks like we are sending more details visible to client ?
What we did in the OIDC code was put the detailed error behind a conditional log using log.V(1)
and keep the simple error for returning to client.
cockroach/pkg/ccl/oidcccl/authentication_oidc.go
Lines 343 to 345 in 44d7da5
if log.V(1) { | |
log.Infof(ctx, "check redirect URL OIDC cluster setting: "+OIDCRedirectURLSettingName) | |
} |
This way a customer can turn on the logs using vmodule when necessary for debugging and then turn them off again.
vmodule levels can be enabled via a flag to cockroach like: --vmodule=authentication_oidc=1
(It's based on filename) or via SQL on a running cluster like this: SELECT crdb_internal.set_vmodule('authentication_oidc=1');
pkg/ccl/jwtauthccl/authentication_jwt_test.go
line 797 at r3 (raw file):
// happen to create an http client, then the DefaultTransport will have // been been initialized with an empty Proxy. So, set proxy directly. http.DefaultTransport.(*http.Transport).Proxy = func(_ *http.Request) (*url.URL, error) {
try using testutils.TestingHook
for this pattern just to keep things consistent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 8 files at r1, 4 of 7 files at r2, 2 of 2 files at r3, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BabuSrithar and @souravcrl)
-- commits
line 45 at r3:
In general, we want to unify all the HTTP client usage in the system and not add new knobs that are specific to one use or another. We could add a custom_ca setting if we need it (following the example of cloudstorage.http.custom_ca) but we shouldn't have a new system_proxy setting that nothing else uses (the NO_PROXY env var can be use to control when the proxy is and is not used).
pkg/ccl/jwtauthccl/authentication_jwt.go
line 189 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
What we did in the OIDC code was put the detailed error behind a conditional log using
log.V(1)
and keep the simple error for returning to client.cockroach/pkg/ccl/oidcccl/authentication_oidc.go
Lines 343 to 345 in 44d7da5
if log.V(1) { log.Infof(ctx, "check redirect URL OIDC cluster setting: "+OIDCRedirectURLSettingName) } This way a customer can turn on the logs using vmodule when necessary for debugging and then turn them off again.
vmodule levels can be enabled via a flag to cockroach like:
--vmodule=authentication_oidc=1
(It's based on filename) or via SQL on a running cluster like this:SELECT crdb_internal.set_vmodule('authentication_oidc=1');
Yes, we want to return limited data to the client, and specifically the client should not be able to learn any details about URLs or proxy setup from our error messages. I think errors relating to the network fetch should use log.Errorf (without log.V(1) - that just makes it harder to see because no one knows which vmodule options to turn on).
Parsing could go either way - in theory the parse error would not contain any information that the client didn't just give to us, so it would be safe to return as an error detail, but it's probably better to play it safe and leave the details in the log. This would be a better use for log.V(1) so that a malicious client can't easily spam the logs.
pkg/sql/pgwire/auth.go
line 556 at r3 (raw file):
var errStr redact.RedactableString if detailedErr != nil { // should we apply redact here?
I don't think so - redaction is about what can be sent to CRL. For error messages we generally want to either send the whole thing to the client or none of it.
pkg/sql/pgwire/auth_methods.go
line 789 at r3 (raw file):
c.LogAuthFailed(ctx, eventpb.AuthFailReason_CREDENTIALS_INVALID, err) // we are returning unsafe error to client in description but only safe // log parts are sent to sentry. is this correct way to handle this?
Yes, it can be (although it depends on what the "unsafe" parts are). For example, usernames are generally marked as redacted and should not be sent to sentry, but they could be appropriate to return in an error message.
pkg/util/httputil/client.go
line 46 at r3 (raw file):
var tlsConf *tls.Config if customCAPem != "" { roots, err := x509.SystemCertPool()
I think custom CAs are generally used with a new empty cert pool, not SystemCertPool.
81f294c
to
a54cea7
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PTAL. I have addressed the previous comments.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BabuSrithar, @bdarnell, and @dhartunian)
Previously, bdarnell (Ben Darnell) wrote…
In general, we want to unify all the HTTP client usage in the system and not add new knobs that are specific to one use or another. We could add a custom_ca setting if we need it (following the example of cloudstorage.http.custom_ca) but we shouldn't have a new system_proxy setting that nothing else uses (the NO_PROXY env var can be use to control when the proxy is and is not used).
removed cluster settings client.timeout
and client.system_proxy.enabled
pkg/ccl/jwtauthccl/authentication_jwt.go
line 74 at r3 (raw file):
Previously, BabuSrithar (BabuSrithar) wrote…
Commented code may be intended to be removed ?
removed commented code post discussion
pkg/ccl/jwtauthccl/authentication_jwt.go
line 96 at r3 (raw file):
Previously, BabuSrithar (BabuSrithar) wrote…
Commented code may be intended to be removed ?
removed commented code post discussion
pkg/ccl/jwtauthccl/authentication_jwt.go
line 189 at r3 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Yes, we want to return limited data to the client, and specifically the client should not be able to learn any details about URLs or proxy setup from our error messages. I think errors relating to the network fetch should use log.Errorf (without log.V(1) - that just makes it harder to see because no one knows which vmodule options to turn on).
Parsing could go either way - in theory the parse error would not contain any information that the client didn't just give to us, so it would be safe to return as an error detail, but it's probably better to play it safe and leave the details in the log. This would be a better use for log.V(1) so that a malicious client can't easily spam the logs.
If we log it here, we would be double logging since we are also logging in pgwire/auth_methods.go
cockroach/pkg/sql/pgwire/auth_methods.go
Lines 783 to 786 in b8ba30a
if err = jwtVerifier.ValidateJWTLogin(ctx, execCfg.Settings, user, []byte(token), identMap); err != nil { | |
c.LogAuthFailed(ctx, eventpb.AuthFailReason_CREDENTIALS_INVALID, err) | |
return err | |
} |
Incase when we don't want to send back these details to client which are then logged with
c.LogAuthFailed()
, I am sending only the authLog back to client. c.LogAuthFailed() has the option of logging only unlogged errors but I don't have access to AuthConn
object in authentication_jwt.go
which can configure this parameter. The AuthConn
object can be added but I thought of handling this in the higher layer of pgwire inline with previous comment:https://reviewable.io/reviews/cockroachdb/cockroach/117054#-Nne7jnh5oeb9srWuZw2
Also in pgwire, other authentication errors are not behind log.V(1), so I am assuming we were not focussed about malicious clients or overlogging. Only level 2 logs are being used for logging some info logs. LMK if we would still need this because our use case here is different from pwd or cert authentication.
pkg/sql/pgwire/auth_methods.go
line 789 at r3 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Yes, it can be (although it depends on what the "unsafe" parts are). For example, usernames are generally marked as redacted and should not be sent to sentry, but they could be appropriate to return in an error message.
errors.WithDetailf()
mentions that error details is supposed to contain PII data.
WithDetail decorates an error with a textual detail. The detail may contain PII and thus will not reportable. The suggested use case for detail is to augment errors with information useful for debugging.
Since we have the detailed error here which contains token related information which client itself sends, it makes sense to send back these to client, anyways sentry would be redacting these.
I think specifically for c.LogAuthFailed()
we should add back the detailed error(like from http clients which might contain data sql client did not send us), so that it could be logged. I have made these changes here.
pkg/util/httputil/client.go
line 46 at r3 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I think custom CAs are generally used with a new empty cert pool, not SystemCertPool.
updated to use NewCertPool()
pkg/ccl/jwtauthccl/authentication_jwt_test.go
line 797 at r3 (raw file):
Previously, dhartunian (David Hartunian) wrote…
try using
testutils.TestingHook
for this pattern just to keep things consistent.
updated to use testutils.TestingHook
pkg/sql/pgwire/auth.go
line 556 at r3 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
I don't think so - redaction is about what can be sent to CRL. For error messages we generally want to either send the whole thing to the client or none of it.
noted. sending the logged error as is, the handling is done is auth_methods.go
directly.
a54cea7
to
fb57f5c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated auth_methods
logging, since I found we were redacting in LogAuthFailed
called inside authenticator fn. If we put the error in details part of errors.WithDetailf or had the error from http at a different level only the first level of error is logged. So I performed join operation on errors for logging. I have updated commit message to reflect that
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BabuSrithar, @bdarnell, and @dhartunian)
pkg/ccl/jwtauthccl/authentication_jwt.go
line 189 at r3 (raw file):
Previously, souravcrl wrote…
If we log it here, we would be double logging since we are also logging in
pgwire/auth_methods.go
cockroach/pkg/sql/pgwire/auth_methods.go
Lines 783 to 786 in b8ba30a
if err = jwtVerifier.ValidateJWTLogin(ctx, execCfg.Settings, user, []byte(token), identMap); err != nil { c.LogAuthFailed(ctx, eventpb.AuthFailReason_CREDENTIALS_INVALID, err) return err }
Incase when we don't want to send back these details to client which are then logged withc.LogAuthFailed()
, I am sending only the authLog back to client. c.LogAuthFailed() has the option of logging only unlogged errors but I don't have access toAuthConn
object inauthentication_jwt.go
which can configure this parameter. TheAuthConn
object can be added but I thought of handling this in the higher layer of pgwire inline with previous comment:
https://reviewable.io/reviews/cockroachdb/cockroach/117054#-Nne7jnh5oeb9srWuZw2Also in pgwire, other authentication errors are not behind log.V(1), so I am assuming we were not focussed about malicious clients or overlogging. Only level 2 logs are being used for logging some info logs. LMK if we would still need this because our use case here is different from pwd or cert authentication.
We also have a cluster setting for logging auth errors: server.auth_log.sql_sessions.enabled
which needs to be set to true
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 4 of 10 files at r4, 3 of 3 files at r5, 6 of 6 files at r6, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BabuSrithar, @dhartunian, and @souravcrl)
pkg/ccl/jwtauthccl/authentication_jwt.go
line 142 at r6 (raw file):
tokenBytes []byte, identMap *identmap.Conf, ) (detailedErrorMsg []byte, authError error) {
Why is detailedErrorMsg a []byte
instead of a string
? or even another error
?
pkg/ccl/jwtauthccl/authentication_jwt.go
line 333 at r6 (raw file):
var getHttpResponse = func(ctx context.Context, url string, authenticator *jwtAuthenticator) ([]byte, error) { defaultTimeout := httputil.StandardHTTPTimeout httpClient := httputil.NewClientWithTimeoutsCustomCA(
This reparses the custom CA every time, which seems kind of expensive. I don't think we have to fix it in this PR (do we have a customer planning to use custom CAs?) but we should make a note to cache the httpClient somewhere (could do it in a callback attached to the cluster setting).
pkg/ccl/jwtauthccl/settings.go
line 91 at r6 (raw file):
settings.ApplicationLevel, JWTAuthClientCustomCASettingName, "sets the custom root CA (appended to system's default CAs) for verifying "+
"Appended to system's default CAs" is not accurate now that you're not using SystemCertPool below. I'd just remove the parenthetical.
c015dba
to
cfae89f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BabuSrithar, @bdarnell, and @dhartunian)
pkg/ccl/jwtauthccl/authentication_jwt.go
line 142 at r6 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Why is detailedErrorMsg a
[]byte
instead of astring
? or even anothererror
?
we cannot send as error
because lint doesn't allow it inside a block where err != nil, but one of the returned variables is nil.
Reconsidered this and updated to use string to avoid unnecessary typecasts.
pkg/ccl/jwtauthccl/authentication_jwt.go
line 333 at r6 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
This reparses the custom CA every time, which seems kind of expensive. I don't think we have to fix it in this PR (do we have a customer planning to use custom CAs?) but we should make a note to cache the httpClient somewhere (could do it in a callback attached to the cluster setting).
the proxy server client is using may have custom CA requirement. adding a note to cache the http client in authenticator which could be set on change for customCA cluster setting.
pkg/ccl/jwtauthccl/settings.go
line 91 at r6 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
"Appended to system's default CAs" is not accurate now that you're not using SystemCertPool below. I'd just remove the parenthetical.
remove the part in parentheses
17d2645
to
d526a7a
Compare
@bdarnell @souravcrl - I am wondering if custom-ca is a relevant thing in case of JWT auth. The only place I see this being useful is JWT server having slef-signed cert. Even in that case, can't the self signed cert be imported in to system's cert store ? |
d526a7a
to
e053c5a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BabuSrithar, @dhartunian, and @souravcrl)
pkg/util/httputil/client.go
line 74 at r8 (raw file):
Previously, BabuSrithar (BabuSrithar) wrote…
@bdarnell @souravcrl - I am wondering if custom-ca is a relevant thing in case of JWT auth. The only place I see this being useful is JWT server having slef-signed cert. Even in that case, can't the self signed cert be imported in to system's cert store ?
Yes, this is useful when the JWT server has a cert that is either self-signed or comes from a private CA. You could add the private CA to the system cert store in this case (and I think most of the time it would be a good thing), but you might not want to. The system cert store is generally intended to store CAs for the public web. Mixing private CAs here might be undesirable from either direction. It would give your private CA the ability to impersonate public websites (probably not a problem in the context of a database server), and more worryingly it could give public CAs the ability to impersonate your private websites (this matters to the kinds of organizations that have to worry about high-level nation-state security threats).
I think the question is whether we have customer feedback asking for this. I assumed we did if Sourav went to the trouble of adding it, but if not I think we could remove it and see if the system cert store alone can be a good solution.
e053c5a
to
bb1923b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to remove the cluster setting for custom CA. PTAL @BabuSrithar
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @BabuSrithar, @bdarnell, and @dhartunian)
pkg/ccl/jwtauthccl/authentication_jwt.go
line 333 at r9 (raw file):
// and other http client cluster settings as re parsing the custom CA every // time is expensive httpClient := httputil.NewClientWithTimeout(httputil.StandardHTTPTimeout)
sticking to standard http timeout of 3 seconds here.
pkg/util/httputil/client.go
line 74 at r8 (raw file):
Previously, bdarnell (Ben Darnell) wrote…
Yes, this is useful when the JWT server has a cert that is either self-signed or comes from a private CA. You could add the private CA to the system cert store in this case (and I think most of the time it would be a good thing), but you might not want to. The system cert store is generally intended to store CAs for the public web. Mixing private CAs here might be undesirable from either direction. It would give your private CA the ability to impersonate public websites (probably not a problem in the context of a database server), and more worryingly it could give public CAs the ability to impersonate your private websites (this matters to the kinds of organizations that have to worry about high-level nation-state security threats).
I think the question is whether we have customer feedback asking for this. I assumed we did if Sourav went to the trouble of adding it, but if not I think we could remove it and see if the system cert store alone can be a good solution.
removed the additional cluster setting and code for custom/private CA and sticking to system cert store solution. I have created a ticket with link to this discussion for future reference: https://cockroachlabs.atlassian.net/browse/CRDB-38682
We are running into issues with jwt authentication and currently unable to provide support as we are not logging the error from the http client used in the authenticator. The PR looks to propagate this obtained error from `ValidateJWTLogin` http client. The http client now also respects the system http proxy if set. Validated the error details when presenting an expired token ``` ERROR: JWT authentication: invalid token SQLSTATE: 28000 DETAIL: unable to parse token: exp not satisfied Failed running "sql" ``` Validated error on setting wrong proxy params ``` ERROR: JWT authentication: unable to validate token SQLSTATE: 28000 Failed running "sql" ``` and logged error: ``` I240510 08:31:28.604141 1473 4@util/log/event_log.go:32 ⋮ [T1,Vsystem,n1,client=127.0.0.1:56289,hostssl,user=‹sourav.sarangi›] 3 ={"Timestamp":1715329888604122000,"EventType":"client_authentication_failed","InstanceID":1,"Network":"tcp","RemoteAddress":"‹127.0.0.1:56289›","SessionID":"17ce136f2a8ecd480000000000000001","Transport":"hostssl","User":"‹sourav.sarangi›","SystemIdentity":"‹sourav.sarangi›","Reason":"CREDENTIALS_INVALID","Detail":"JWT authentication: unable to validate token\nunable to fetch jwks: Get \"https://accounts.google.com/.well-known/openid-configuration\": proxyconnect tcp: dial tcp [::1]:3129: connect: connection refused","Method":"jwt_token"} ``` Verified access logs after setting up squid proxy and passing env HTTP_PROXY and HTTPS_PROXY params ``` 1715103871.761 144 ::1 TCP_TUNNEL/200 5708 CONNECT accounts.google.com:443 - HIER_DIRECT/74.125.200.84 - 1715103871.836 73 ::1 TCP_TUNNEL/200 5964 CONNECT www.googleapis.com:443 - HIER_DIRECT/142.250.182.10 - ``` fixes cockroachdb#123575, CRDB-38386, CRDB-38408 Epic None Release note: None
bb1923b
to
dadee6d
Compare
TFTR! bors r+ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 2 files at r3, 4 of 7 files at r9, 3 of 3 files at r10, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @bdarnell and @dhartunian)
bors r- |
Canceled. |
bors r+ |
Encountered an error creating backports. Some common things that can go wrong:
You might need to create your backport manually using the backport tool. error creating merge commit from dadee6d to blathers/backport-release-23.1-123697: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.1.x failed. See errors above. error creating merge commit from dadee6d to blathers/backport-release-23.2-123697: POST https://api.github.com/repos/cockroachdb/cockroach/merges: 409 Merge conflict [] you may need to manually resolve merge conflicts with the backport tool. Backport to branch 23.2.x failed. See errors above. error setting reviewers, but backport branch blathers/backport-release-24.1-123697 is ready: POST https://api.github.com/repos/cockroachdb/cockroach/pulls/124103/requested_reviewers: 422 Reviews may only be requested from collaborators. One or more of the teams you specified is not a collaborator of the cockroachdb/cockroach repository. [] Backport to branch 24.1.x failed. See errors above. 🦉 Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf. |
ccl,sql,util: Fix jwt auth and add sensitive error logs
We are running into issues with jwt authentication and currently unable to
provide support as we are not logging the error from the http client used in the
authenticator. The PR looks to propagate this obtained error from
ValidateJWTLogin
http client. The http client now also respects the systemhttp proxy if set.
Validated the error details when presenting an expired token
Validated error on setting wrong proxy params
and logged error:
Verified access logs after setting up squid proxy and passing env HTTP_PROXY and
HTTPS_PROXY params
fixes #123575, CRDB-38386, CRDB-38408
Epic None
Release note: Noneccl,sql,util: Fix jwt auth and add sensitive error logs
We are running into issues with jwt authentication and currently unable to
provide support as we are not logging the error from the http client used in the
authenticator. The PR looks to propagate this obtained error from
ValidateJWTLogin
http client. The http client now also respects the systemhttp proxy if set.
Validated the error details when presenting an expired token
Validated error on setting wrong proxy params
and logged error:
Verified access logs after setting up squid proxy and passing env HTTP_PROXY and
HTTPS_PROXY params
fixes #123575, CRDB-38386, CRDB-38408
Epic None
Release note: None