Bug 2097830: macOS: certificate is untrusted error #1207

Closed
wants to merge 1 commit into from

Conversation

rosspeoples

On Mac, attempting to connect to an API server with a SHA1 signature (even if it also has a SHA256 signature) will cause oc login to respond with:

certificate is not trusted

This error is of type *errors.errorString, so there's not a great way to detect it, other than to check the error message string. Worse yet, the error must be created (not wrapped) as x509.CertificateInvalidError with reason as x509.Expired in order to include the original error message. While the user will likely not see this error, it may appear in rare conditions.

I'm open to suggestions on how to improve this.

/assign @soltysh

@openshift-ci openshift-ci bot added bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jul 15, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jul 15, 2022

@deejross: This pull request references Bugzilla bug 2097830, which is invalid:

  • expected the bug to target the "4.12.0" release, but it targets "---" instead
  • expected the bug to be in one of the following states: NEW, ASSIGNED, ON_DEV, POST, POST, but it is MODIFIED instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

Bug 2097830: macOS: certificate is untrusted error

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@openshift-ci openshift-ci bot requested review from mfojtik and soltysh July 15, 2022 19:53
@rosspeoples
Author

/bugzilla refresh

@openshift-ci
Contributor

openshift-ci bot commented Jul 15, 2022

@deejross: This pull request references Bugzilla bug 2097830, which is invalid:

  • expected the bug to target the "4.12.0" release, but it targets "---" instead

Comment /bugzilla refresh to re-evaluate validity if changes to the Bugzilla bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/bugzilla refresh

@rosspeoples
Author

/bugzilla refresh

@openshift-ci openshift-ci bot added bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. and removed bugzilla/invalid-bug Indicates that a referenced Bugzilla bug is invalid for the branch this PR is targeting. labels Jul 15, 2022
@openshift-ci
Contributor

openshift-ci bot commented Jul 15, 2022

@deejross: This pull request references Bugzilla bug 2097830, which is valid. The bug has been moved to the POST state. The bug has been updated to refer to the pull request using the external bug tracker.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target release (4.12.0) matches configured target release for branch (4.12.0)
  • bug is in the state NEW, which is one of the valid states (NEW, ASSIGNED, ON_DEV, POST, POST)

Requesting review from QA contact:
/cc @zhouying7780

In response to this:

/bugzilla refresh

@openshift-ci openshift-ci bot requested a review from zhouying7780 July 15, 2022 20:00
@@ -136,6 +137,10 @@ func (o *LoginOptions) getClientConfig() (*restclient.Config, error) {
// try to TCP connect to the server to make sure it's reachable, and discover
// about the need of certificates or insecure TLS
if err := dialToServer(*clientConfig); err != nil {
if strings.Contains(err.Error(), "certificate is not trusted") {
err = x509.CertificateInvalidError{Reason: x509.Expired, Detail: err.Error()}
Member

@ingvagabund ingvagabund Jul 17, 2022

I am new to the code so my comment in the PR may be off. The BZ reports "Login is not successful and error message is seen". When you wrap the error with x509.CertificateInvalidError can I assume the error is handled properly as before? So the BZ issue is really about properly handling "certificate is not trusted" so the oc code can choose a different way of going around the untrusted certificate? What's the connection between forcing x509.Expired and the actual "certificate is not trusted" error message?

Author

Creating the x509.CertificateInvalidError causes the switch case to catch this like any other invalid certificate issue. The idea was to use the best descriptive error type, and unfortunately, x509.CertificateInvalidError.Error() method only outputs the Detail string when a certain Reason is set. In this case, x509.Expired was the closest reason I could find.

I haven't found anything in kubectl that handles this yet, but oc and kubectl have very different authentication mechanisms. I agree that reusing x509.CertificateInvalidError in such a way is not ideal. I was trying to keep the number of changes as small as possible, but I may have to create a custom error for this condition to better handle it and reduce the user confusion of getting both an "expired" and "not trusted" message in the same error.
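
To illustrate the point about Reason and Detail, here is a minimal sketch of how `x509.CertificateInvalidError.Error()` behaves for two reasons. The helper names `expiredMessage` and `notAuthorizedMessage` are made up for this example and are not part of oc; the detail string is the message discussed in this thread.

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// expiredMessage returns the Error() text for Reason x509.Expired,
// which appends Detail to a fixed prefix.
func expiredMessage(detail string) string {
	return x509.CertificateInvalidError{Reason: x509.Expired, Detail: detail}.Error()
}

// notAuthorizedMessage returns the Error() text for Reason
// x509.NotAuthorizedToSign, which ignores Detail.
func notAuthorizedMessage(detail string) string {
	return x509.CertificateInvalidError{Reason: x509.NotAuthorizedToSign, Detail: detail}.Error()
}

func main() {
	detail := "certificate is not trusted" // message text taken from this thread
	fmt.Println(expiredMessage(detail))
	fmt.Println(notAuthorizedMessage(detail))
}
```

The first line printed carries the original message after the "expired or is not yet valid" prefix, which is why the user can end up seeing both an "expired" and a "not trusted" message at once.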

Member

Although it is hacky, we might move forward with this fix if we add a TODO comment that links to the real upstream bug. Once the upstream bug is fixed, we can remove the patch.

Are we sure this fixes the problem completely, and that once the user logs in successfully there won't be other certificate issues with other commands?

Author

I tested this on a Mac, and it does resolve the issue for logins, I'm not 100% sure about other commands though. It should be possible to replicate this fix globally if it is an issue outside of login.

Member

@ingvagabund ingvagabund Nov 8, 2022

Based on https://go-review.googlesource.com/c/go/+/418835, the Go release cycle has entered its code freeze phase (Nov-Jan), so I don't expect the fix to get reviews anytime soon. It's still worth reaching out to the upstream Go community so they are aware of the change request; it seems no one is aware of the PR. @deejross, it's worth checking the thread in https://groups.google.com/g/golang-dev/c/K7oGURi0wTM/. Comment from Ian in https://groups.google.com/g/golang-dev/c/K7oGURi0wTM/m/HTyfcmY0BwAJ:

In general it should not be necessary to ask for an additional Googler
review. If your CL shows up on https://go.dev/s/needs-review, then
some Googler will get to it soon.

If your CL does not show up on that list, then most likely it has an
unresolved comment, or the trybots have not been run.

Please do ping if your CL is not on that list and you don't understand
why, or if it has been on the list for several days. But in general
pinging should not be required.

Member

@ingvagabund ingvagabund Nov 8, 2022

As for resolving this issue: @deejross, if you are confident this change fixes the login issue without any negative side effects, let's proceed. @kasturinarra, would you please perform pre-merge testing of oc login on all OSes? Just to be sure we are not breaking anything.

Member

I think we can add a TODO, with links, explaining why we did this and when we can remove this hacky solution.

@ingvagabund thanks, I will ask @zhouying7780 to take a look at it. Thanks!

@rosspeoples
Author

Found the upstream bug: golang/go#52010

Hopefully this gets fixed and we can close this PR.

@rosspeoples
Author

I have submitted a PR upstream to resolve this issue, so hopefully this PR can get closed: golang/go#53986

@@ -136,6 +137,10 @@ func (o *LoginOptions) getClientConfig() (*restclient.Config, error) {
// try to TCP connect to the server to make sure it's reachable, and discover
// about the need of certificates or insecure TLS
if err := dialToServer(*clientConfig); err != nil {
if strings.Contains(err.Error(), "certificate is not trusted") {
Member

@ardaguclu ardaguclu Nov 8, 2022

Can't we apply this workaround only if the err is not already of type x509.CertificateInvalidError?

if strings.Contains(err.Error(), "certificate is not trusted") {
	if _, ok := err.(x509.CertificateInvalidError); !ok {
		// apply the workaround here
	}
}

Otherwise, once the problem is fixed upstream, we would still be re-wrapping an error that is already correct.
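
A minimal sketch of that guard, under some assumptions: `wrapIfUntyped` and `sampleErr` are hypothetical names, and `errors.As` is used in place of the plain type assertion so that wrapped errors are also recognized. It is not the code from this PR.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"strings"
)

// sampleErr simulates the untyped macOS error; the signer name is made up.
var sampleErr = errors.New(`x509: "example-signer" certificate is not trusted`)

// wrapIfUntyped (hypothetical name) applies the workaround only when err is
// not already an x509.CertificateInvalidError, so an error that is already
// typed passes through untouched.
func wrapIfUntyped(err error) error {
	if err == nil {
		return nil
	}
	var invalid x509.CertificateInvalidError
	if errors.As(err, &invalid) {
		return err // already typed, nothing to do
	}
	if strings.Contains(err.Error(), "certificate is not trusted") {
		return x509.CertificateInvalidError{Reason: x509.Expired, Detail: err.Error()}
	}
	return err
}

func main() {
	once := wrapIfUntyped(sampleErr)
	twice := wrapIfUntyped(once)
	// Both lines are identical: the second call does not add another prefix.
	fmt.Println(once)
	fmt.Println(twice)
}
```

Without the guard, a second pass over an already converted error would match the substring again and nest the message one level deeper.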

Member

@ingvagabund ingvagabund Nov 8, 2022

The more narrowly the fix is restricted to Mac, the better. The smaller the affected error space, the better.
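
One way to narrow the scope, sketched here as an assumption rather than as what the PR does, is to gate the conversion on `runtime.GOOS`. `wrapOnDarwin` and `sampleErr` are hypothetical names; the OS is passed in as a parameter so the behavior can be exercised on any platform.

```go
package main

import (
	"crypto/x509"
	"errors"
	"fmt"
	"runtime"
	"strings"
)

// sampleErr simulates the untyped macOS error; the signer name is made up.
var sampleErr = errors.New(`x509: "example-signer" certificate is not trusted`)

// wrapOnDarwin (hypothetical name) converts the untyped error only when goos
// is "darwin"; on every other OS the error is returned unchanged.
func wrapOnDarwin(goos string, err error) error {
	if goos != "darwin" || err == nil {
		return err
	}
	if strings.Contains(err.Error(), "certificate is not trusted") {
		return x509.CertificateInvalidError{Reason: x509.Expired, Detail: err.Error()}
	}
	return err
}

func main() {
	// In real code the first argument would always be runtime.GOOS.
	fmt.Println(wrapOnDarwin(runtime.GOOS, sampleErr))
}
```

A build-constrained file (for example one ending in `_darwin.go`) would be another option; the runtime check is used here only because it keeps the sketch in a single file.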

@openshift-ci
Contributor

openshift-ci bot commented Nov 8, 2022

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: deejross
Once this PR has been reviewed and has the lgtm label, please ask for approval from soltysh by writing /assign @soltysh in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci
Contributor

openshift-ci bot commented Nov 8, 2022

@deejross: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name | Commit | Details | Required | Rerun command
ci/prow/e2e-metal-ipi-ovn-ipv6 | 0aae5f4 | link | false | /test e2e-metal-ipi-ovn-ipv6

Full PR test history. Your PR dashboard.

@cfergeau
Contributor

I tested this on a Mac, and it does resolve the issue for logins, I'm not 100% sure about other commands though. It should be possible to replicate this fix globally if it is an issue outside of login.

Actually I think I'm hitting this problem in a different codepath.
https://github.com/crc-org/crc/blob/641ff18f521344a35df9e48f82fe31a49e2b20bd/pkg/crc/machine/kubeconfig.go#L130-L152
We call tokencmd.RequestToken with only one root CA, which should be fine, but looking at RequestToken implementation, it adds the root certs anyway:

// we are going to use this transport to talk
// with a server that may not be the api server
// thus we need to include the system roots
// in our ca data otherwise an external
// oauth server with a valid cert will fail with
// error: x509: certificate signed by unknown authority
rt, err := transportWithSystemRoots(o.Issuer, o.ClientConfig)
if err != nil {
	return "", err
}

and transportWithSystemRoots has this code:
	_, err = verifyServerCertChain(issuerURL.Hostname(), resp.TLS.PeerCertificates)
	switch err.(type) {
	case nil:
		// copy the config so we can freely mutate it
		configWithSystemRoots := restclient.CopyConfig(clientConfig)
		// explicitly unset CA cert information
		// this will make the transport use the system roots or OS specific verification
		// this is required to have reasonable behavior on windows (cannot get system roots)
		// in general there is no good way to say "I want system roots plus this CA bundle"
		// so we just try system roots first before using the kubeconfig CA bundle
		configWithSystemRoots.CAFile = ""
		configWithSystemRoots.CAData = nil
		// no error meaning the system roots work with the OAuth server
		klog.V(4).Info("using system roots as no error was encountered")
		systemRootsRT, err := restclient.TransportFor(configWithSystemRoots)
		if err != nil {
			return nil, err
		}
		return systemRootsRT, nil
	case x509.UnknownAuthorityError, x509.HostnameError, x509.CertificateInvalidError, x509.SystemRootsError,
		tls.RecordHeaderError, *net.OpError:
		// fallback to the CA in the kubeconfig since the system roots did not work
		// we are very broad on the errors here to avoid failing when we should fallback
		klog.V(4).Infof("falling back to kubeconfig CA due to possible x509 error: %v", err)
		return restclient.TransportFor(clientConfig)
	default:
		switch err {
		case io.EOF, io.ErrUnexpectedEOF, io.ErrNoProgress:
			// also fallback on various io errors
			klog.V(4).Infof("falling back to kubeconfig CA due to possible IO error: %v", err)
			return restclient.TransportFor(clientConfig)
		}
		// unknown error, fail (ideally should never occur)
		klog.V(4).Infof("unexpected error during system roots probe: %v", err)
		return nil, err
	}
}

which returns a certificate is not trusted error which is not currently caught.
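
To make the failure mode concrete, here is a stripped-down sketch of that type switch. `classify`, the branch labels, and the sample errors are illustrative only, and the inner `io.EOF` switch is omitted; the case list is copied from the code quoted above.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net"
)

// classify mirrors the shape of the type switch quoted above and reports
// which branch an error would take. The branch labels are illustrative.
func classify(err error) string {
	switch err.(type) {
	case nil:
		return "system-roots"
	case x509.UnknownAuthorityError, x509.HostnameError, x509.CertificateInvalidError, x509.SystemRootsError,
		tls.RecordHeaderError, *net.OpError:
		return "fallback"
	default:
		return "fail"
	}
}

// Sample errors: untypedErr simulates the plain-string macOS error from this
// thread (the signer name is made up); typedErr is the converted form.
var (
	untypedErr = errors.New(`x509: "example-signer" certificate is not trusted`)
	typedErr   = x509.CertificateInvalidError{Reason: x509.Expired, Detail: untypedErr.Error()}
)

func main() {
	fmt.Println(classify(nil))        // system roots worked
	fmt.Println(classify(untypedErr)) // matches no typed case
	fmt.Println(classify(typedErr))   // caught by the x509 case
}
```

Because the untyped error matches none of the listed types, it lands in the default branch and the fallback to the kubeconfig CA never happens.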

@rosspeoples
Author

The upstream changes on Mac that led to this were quite extensive, so I'm not surprised there would be issues with some use cases. The real question becomes, do we push upstream to resolve them, or do we try and fix oc-specific use cases ourselves?

@cfergeau
Contributor

The real question becomes, do we push upstream to resolve them, or do we try and fix oc-specific use cases ourselves?

This all depends on how fast a fix would land upstream and be backported to the 1.18/1.19 releases. My feeling is that this would take time (say, more than a few weeks). An alternative could be to get a patch into the Fedora/RHEL packages, but that might be a hard sell.
We have a lot more control over when it gets fixed if we add oc-specific changes to avoid the issue.

In short, I would pursue both. The oc changes would be a short-term fix and would avoid annoying user-visible failures on macOS. The real fix belongs in Go, but it may take some time to land, so that is the longer-term fix. Once the issue is fixed in the Go compiler(s) we use to build oc releases, the oc changes can be removed.

@ardaguclu
Member

In my opinion, we need to fix this problem in oc, because even if we fix it in Go, there is no guarantee that some versions of oc (such as 4.12) will pick up the fix.

@cfergeau
Contributor

cfergeau commented Dec 7, 2022

This is also impacting odo, and users of odo redhat-developer/vscode-openshift-tools#2693
brew install odo-dev && odo login https://api.crc.testing:6443 is currently broken because of this bug.

@cfergeau
Contributor

cfergeau commented Dec 8, 2022

I rebuilt odo with the patch, it helps a bit, but is not enough.

  • odo.git built with golang 1.19, no oc patches
% ./odo login -u developer https://api.crc.testing:6443
Connecting to the OpenShift cluster

 ✗  x509: “kube-apiserver-lb-signer” certificate is not trusted

  • odo.git built with golang 1.19 with the patch from this PR
% ./odo login -u developer https://api.crc.testing:6443
Connecting to the OpenShift cluster

The server is using an invalid certificate: x509: certificate has expired or is not yet valid: x509: “kube-apiserver-lb-signer” certificate is not trusted
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): y

 ✗  x509: “ingress-operator@1669088888” certificate is not trusted

  • odo.git built with go 1.17, no patch
% ./odo login -u developer https://api.crc.testing:6443
Connecting to the OpenShift cluster

The server uses a certificate signed by an unknown authority.
You can bypass the certificate check, but any data you send to the server could be intercepted by others.
Use insecure connections? (y/n): y

Authentication required for https://api.crc.testing:6443 (openshift)
Username: developer
Password:

@cfergeau
Contributor

cfergeau commented Dec 19, 2022

I did the same change as in this PR in a few more places in https://github.com/cfergeau/oc/commits/macos-cert-not-trusted , see commit cfergeau@d6ee395
I used git grep "x509.*Error" to search for places in oc code which could need patching.

This fixes both crc and odo when I patch them to use this branch.

@soltysh
Contributor

soltysh commented Jan 19, 2023

/hold
this should not merge in the current form

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jan 19, 2023
@gbraad
Contributor

gbraad commented Jan 19, 2023

Can you explain what the follow-up actions will be? As it is currently held, what does that mean for a possible solution?

@cfergeau
Contributor

cfergeau commented Feb 6, 2023

Newer go versions should no longer have this problem, see golang/go#56891

@ingvagabund
Member

golang/go#57427 mentions 1.19.5

@cfergeau
Contributor

golang/go#57427 mentions 1.19.5

1.18.10, 1.19.5 and 1.20 should all have this fixed.
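
For anyone deciding whether a given build still needs a workaround, a rough sketch of a version check against those three releases follows. `hasUpstreamFix` is a hypothetical helper; it only understands plain `goMAJOR.MINOR[.PATCH]` strings and treats anything it cannot parse (release candidates, development builds) as not fixed. It also returns false for releases older than 1.18, even though the 1.17 build shown earlier in this thread did not hit the problem.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// hasUpstreamFix reports whether a Go version string such as "go1.19.5" is at
// or beyond the releases named in this thread (1.18.10, 1.19.5, 1.20).
// Hypothetical helper; unparsable versions are treated as not fixed.
func hasUpstreamFix(version string) bool {
	parts := strings.SplitN(strings.TrimPrefix(version, "go"), ".", 3)
	if len(parts) < 2 {
		return false
	}
	major, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	minor, err := strconv.Atoi(parts[1])
	if err != nil {
		return false
	}
	patch := 0
	if len(parts) == 3 {
		if patch, err = strconv.Atoi(parts[2]); err != nil {
			return false
		}
	}
	switch {
	case major != 1:
		return major > 1
	case minor >= 20:
		return true
	case minor == 19:
		return patch >= 5
	case minor == 18:
		return patch >= 10
	default:
		return false
	}
}

func main() {
	// runtime.Version() reports the toolchain that built this binary.
	fmt.Println(runtime.Version(), hasUpstreamFix(runtime.Version()))
}
```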

@ingvagabund
Member

Fixed in golang 1.19.5
/close

@openshift-ci openshift-ci bot closed this Mar 30, 2023
@openshift-ci
Contributor

openshift-ci bot commented Mar 30, 2023

@ingvagabund: Closed this PR.

In response to this:

Fixed in golang 1.19.5
/close

@openshift-ci
Contributor

openshift-ci bot commented Mar 30, 2023

@deejross: This pull request references Bugzilla bug 2097830. The bug has been updated to no longer refer to the pull request using the external bug tracker.

In response to this:

Bug 2097830: macOS: certificate is untrusted error

Labels
bugzilla/severity-high Referenced Bugzilla bug's severity is high for the branch this PR is targeting. bugzilla/valid-bug Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command.
7 participants