collector-tls-config-volume Not Created #1029

Closed
chaospuppy opened this issue Apr 20, 2020 · 14 comments
Labels
help wanted Extra attention is needed

Comments

@chaospuppy

chaospuppy commented Apr 20, 2020

Summary:
Jaeger pods fail to start because the "<deployment-name>-collector-headless-tls" secret backing the "<deployment-name>-collector-tls-config-volume" volume is missing.

Platform:
OSE 3.11

Images:
jaegertracing/jaeger-operator:1.17.1
jaegertracing/all-in-one:1.17.1

CSV:
1.17.1

OLM Version:
0.12.0

Jaeger Custom Resource yaml:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: jaeger
spec:
  strategy: allInOne
  storage:
    type: badger
    options:
      badger:
        ephemeral: true
        directory-key: /badger/key
        directory-value: /badger/data
  volumeMounts:
  - name: data
    mountPath: /badger
  volumes:
  - name: data
    emptyDir: {}

What is happening:
Jaeger pods fail to start, citing the following error:

  • MountVolume.SetUp failed for volume "<deployment-name>-collector-tls-config-volume" : secrets "<deployment-name>-collector-headless-tls" not found

What is expected:
Jaeger pods start successfully

Also of note:
The Custom Resource described above worked fine up until last week, when cached images refreshed, including the jaegertracing/all-in-one:1.17.1 image.

@ghost ghost added the needs-triage New issues, in need of classification label Apr 20, 2020
@AdrieVanDijk
Contributor

I have the same problem on OpenShift 3.11.154.
The operator was installed with the Helm chart https://github.com/jaegertracing/helm-charts/tree/master/charts/jaeger-operator
I have tried this CR:

apiVersion: jaegertracing.io/v1
kind: Jaeger
metadata:
  name: allinone-jaeger

and several others. In all cases the operator generates a deployment that refers to secret <CR-name>-collector-headless-tls, but the secret itself is not created.
The log of the operator shows this:

time="2020-04-30T10:27:11Z" level=info msg="Storage type not provided. Falling back to 'memory'" instance=allinone-jaeger namespace=ienw-sp-ont
time="2020-04-30T10:27:13Z" level=warning msg="failed to reconcile pod autoscalers" error="no matches for kind \"HorizontalPodAutoscaler\" in version \"autoscaling/v2beta2\"" instance=allinone-jaeger namespace=ienw-sp-ont
time="2020-04-30T10:32:13Z" level=error msg="failed to apply the changes" error="timed out waiting for the condition" execution="2020-04-30 10:27:13.237989133 +0000 UTC" instance=allinone-jaeger namespace=ienw-sp-ont

The HPA warning appears because Kubernetes 1.11 only supports API version v2beta1, but I don't think this is the reason for the problem with the secret.
This setup worked until recently; I don't know what has changed.

@AdrieVanDijk
Contributor

The problem is only in version 1.17.1 of the operator. When I use version 2.14.1 of the helm chart, which uses version 1.17.0 of the operator, the problem does not occur.

@chaospuppy
Author

I was also able to avoid the issue by rolling back to CSV 1.17.0, which in turn rolled back to the 1.17.0 tag for the operator image and the all-in-one image. I suspect something in the all-in-one image update from 1.17.0 to 1.17.1 is the culprit.

@objectiser
Contributor

This appears to be related to #914, which relies on the annotation service.beta.openshift.io/serving-cert-secret-name to auto-create a secret. That annotation is likely not supported on OpenShift 3.11, so we need to enable that mechanism only for OCP 4.x.
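
To make the relationship between the annotation and the missing secret concrete, here is a minimal Go sketch. It is not the operator's actual code: `collectorTLSSecretName` and `headlessServiceAnnotations` are hypothetical helpers, and the secret name pattern is taken only from the error message reported in this thread.

```go
package main

import "fmt"

// servingCertAnnotation is the annotation key named in this thread.
const servingCertAnnotation = "service.beta.openshift.io/serving-cert-secret-name"

// collectorTLSSecretName is a hypothetical helper. It follows the
// "<name>-collector-headless-tls" pattern seen in the reported error.
func collectorTLSSecretName(instance string) string {
	return instance + "-collector-headless-tls"
}

// headlessServiceAnnotations is a hypothetical helper that returns the
// annotation a service would carry to request the serving-cert secret.
// If the platform does not act on this key, the secret never appears and
// any volume that mounts it fails to set up.
func headlessServiceAnnotations(instance string) map[string]string {
	return map[string]string{
		servingCertAnnotation: collectorTLSSecretName(instance),
	}
}

func main() {
	for key, value := range headlessServiceAnnotations("jaeger") {
		fmt.Printf("%s: %s\n", key, value)
	}
}
```

For the CR named `jaeger` from the original report, the annotation value matches the secret the pod is waiting for.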

@annanay25 Do you agree with the assessment? Would you be able to provide a fix?

@annanay25
Member

Hi @objectiser, sorry I didn't catch this notification. Unfortunately, I will not be able to work on the operator for now. I hope someone else can pick this up.

@objectiser objectiser added help wanted Extra attention is needed and removed needs-triage New issues, in need of classification labels May 14, 2020
@MarekPokornyOva

I have the same issue. This thread helped me understand the problem.
I'd like to write a fix for this, but unfortunately I don't know Go and I don't have access to a test environment.

I guess these might help to make the fix:

  1. "github.com/RHsyseng/operator-utils/pkg/utils/openshift"
  2. set "service.alpha.openshift.io/serving-cert-secret-name" annotation if either
    a) openshift.CompareOpenShiftVersion() version<"4"
    OR
    b) openshift.GetPlatformInfo() + openshift.MapKnownVersion() starts with "3."
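
The version check suggested above could look roughly like the following sketch. It is an illustration only: it takes the OpenShift version as a plain string instead of calling the operator-utils package, and `openShiftMajor` and `servingCertAnnotationFor` are hypothetical names, not functions from the operator or from that library.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const (
	// Annotation key suggested in this thread for OpenShift 3.x.
	alphaAnnotation = "service.alpha.openshift.io/serving-cert-secret-name"
	// Annotation key the thread says the operator relies on.
	betaAnnotation = "service.beta.openshift.io/serving-cert-secret-name"
)

// openShiftMajor extracts the major version from strings such as
// "3.11" or "v4.5.2". It is a hypothetical helper for this sketch.
func openShiftMajor(version string) (int, error) {
	trimmed := strings.TrimPrefix(strings.TrimSpace(version), "v")
	majorPart := strings.SplitN(trimmed, ".", 2)[0]
	major, err := strconv.Atoi(majorPart)
	if err != nil {
		return 0, fmt.Errorf("cannot parse OpenShift version %q: %w", version, err)
	}
	return major, nil
}

// servingCertAnnotationFor picks the alpha key for versions below 4
// and the beta key otherwise, mirroring condition (a) in the list above.
func servingCertAnnotationFor(version string) (string, error) {
	major, err := openShiftMajor(version)
	if err != nil {
		return "", err
	}
	if major < 4 {
		return alphaAnnotation, nil
	}
	return betaAnnotation, nil
}

func main() {
	for _, v := range []string{"3.11", "4.5.2"} {
		key, err := servingCertAnnotationFor(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("OpenShift %s -> %s\n", v, key)
	}
}
```

Returning an error for an unparseable version leaves the caller free to decide whether to skip the TLS provisioning entirely, rather than guessing a platform.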

@basch255

basch255 commented Aug 6, 2020

Hi,
this topic is important for me too.
Could someone fix this?

@jpkrohling
Contributor

@basch255 are you also using OpenShift 3.x?

@basch255

@jpkrohling Recently yes.

@jpkrohling
Contributor

As far as I know, OpenShift 3.x isn't receiving updates anymore, so you are really encouraged to migrate to 4.x. In the meantime, you may want to try setting the platform to kubernetes explicitly, which should prevent the operator from attempting to provision the TLS certs.

@jpkrohling
Contributor

I'm closing this, as we don't want OpenShift 3.x specific code in the operator:

  1. it's an ancient version by now and people should really be encouraged to move to newer versions
  2. having OpenShift 3.x support in the code might give the false impression that we expect it to work there. If it works, it's only incidental
  3. code is the easy part: making sure it keeps working is harder. For instance, we'd have to make sure that CI is in place and that it "always" works. We have had OpenShift 3.x support in our CI in the past and removed it, perhaps for a good reason (can't remember exactly why)

@BlackTX

BlackTX commented Aug 12, 2020

Openshift 3.11 has maintenance support till June 2022 (1), which is why corporations choose to stay on it instead of migrating to 4.x, which has only 9 months of support and does not yet have all important features migrated.

I don't understand why annotation cannot have a simple condition there.

(1) https://access.redhat.com/support/policy/updates/openshift_noncurrent

@jpkrohling
Contributor

Openshift 3.11 has maintenance support till June 2022 (1),

Interesting, didn't know that. I stand corrected then.

I don't understand why annotation cannot have a simple condition there.

Would you be willing to send a PR with this change + e2e tests + a change to the CI to make sure this feature doesn't break in the future? If so, we could certainly consider incorporating it!

@woland7

woland7 commented Sep 30, 2021

Rolling back to 1.17.0 solved the problem for me on an old OpenShift 3.11 cluster, though I agree everyone should upgrade to 4.x.

9 participants