Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow option to create opentelemetry-collector CR along with otel-operator helm installation #69

Open
VineethReddy02 opened this issue Aug 23, 2021 · 17 comments
Labels
chart:operator Issue related to opentelemetry-operator helm chart help wanted Extra attention is needed

Comments

@VineethReddy02
Copy link
Contributor

VineethReddy02 commented Aug 23, 2021

I am currently evaluating opentelemetry-operator helm chart. I am stuck on an issue. I wanted to add the opentelemetry-collector custom resource along with otel-operator helm chart installation.

The otel-col CR creation fails during installation with the below error:

Internal error occurred: conversion webhook for cert-manager.io/v1alpha2, Kind=Certificate failed: Post "https://cert-manager-webhook.cert-manager.svc:443/convert?timeout=30s": dial tcp 10.102.92.197:443: connect: connection refused

Because the CR creation triggers the mutating webhook request to the operator and on creating otel-col CR during installation the otel-operator is still in the container creating phase, so the CR creation is failing. Is there any solution to fix this issue? I want to create otelcol CR along with the otel operator installation.

As a workaround, I am installing otel-col CR alongside otel-operator installation by adding helm pre-install webhook to create otel-col CR prior to the creation of webhooks and I am myself adding the otel-col default mode as deployment and labels that are configured using mutating webhook as below

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry-collector
  labels:
    "app.kubernetes.io/managed-by": "opentelemetry-operator"
  annotations:
    "helm.sh/hook": "pre-install"
spec:
  mode: deployment

Are there any alternative solutions to do this? Is my workaround an appropriate fix?
cc @Saber-W @jpkrohling

@Saber-W
Copy link
Member

Saber-W commented Aug 23, 2021

Are there any alternative solutions to do this? Is my workaround an appropriate fix?

I think using pre-install mechnism is an appropriate workaround if you want to "create otelcol CR along with the otel operator installation". If you can accept Collector to be installed slight after the Operator's readiness, I think kubectl wait would also be a good fix.

@VineethReddy02
Copy link
Contributor Author

But how can I use kubectl wait inside helm chart installation?

I'm adding a otel-col CR inside /templates directory to create a otel-col with provided config alongside the otel-operator installation.

@jpkrohling
Copy link
Member

If you install the CR before the hook is installed, your CR will not be validated and we might not normalize it (defaulter). Not sure how to do it with Helm, but you do need to create the CR after the operator is ready.

@VineethReddy02
Copy link
Contributor Author

I am trying to create otel-col in deployment mode. So the validating webhook has very little to do with type deployment as more validations logic is on other types of deployment modes. And mutating webhook is responsible for injecting the labels
"app.kubernetes.io/managed-by": "opentelemetry-operator" I am adding it myself in config as with pre-hook CR creation cannot perform mutation on existing resources.

It works for now. But I know adding new functionality into operator webhooks will definitely break this. :)

And I am not sure of any other alternatives to include otel-col CR within the otel-operator helm installation. :/

@jpkrohling
Copy link
Member

It might be a short-term solution for you, but we definitely need a proper solution for the wider community.

@VineethReddy02 VineethReddy02 changed the title Question: Create opentelemetry-collector CR along with otel-operator helm installation Allow option to create opentelemetry-collector CR along with otel-operator helm installation Aug 25, 2021
@mars64
Copy link

mars64 commented Aug 25, 2021

Isn't this more of an issue with helm/your release process, than an issue with this chart? (though, if your pre-install hook works for you then awesome!)

I think its generally recommended to not tie together the CRD and CR in the same chart (see Helm's docs on this topic and its caveats -- specifically that bit about unintentional data loss).

Also according to helm:

there is currently no community consensus around how to handle CRDs and their lifecycle

... which is why I generally prefer Method 2 - Separate Charts.

Precisely how to make that happen in a single step depends on your release system, which is why I think its not exactly an issue with this chart.

For example in my case I can use ArgoCD sync waves to ensure the CRD is stood up in a way that admission/validation webhooks fire correctly by the time I install the relevant CustomResource -- and I prefer keeping the CRD chart separate from the CR anyway, for the reasons above.

@VineethReddy02
Copy link
Contributor Author

@mars64 thanks for the input.

But dealing with separate charts makes it more complex right? I mean the upgrades and management of multiple charts, which represent a single system i.e. the actual components and the supporting components.

@mars64
Copy link

mars64 commented Sep 7, 2021

But dealing with separate charts makes it more complex right? I mean the upgrades and management of multiple charts, which represent a single system i.e. the actual components and the supporting components.

The way I read your request, the complexity comes from the goal of combining two different resources (CRD and the CR that relies on the CRD -- each with different lifecycles) into a single step (i.e. install a single helm chart). I'm trying to point out that the helm community generally doesn't seem to have a good answer for this. In other words, it's additional complexity either way.

Personally, I think its less complex to rely on two different charts that more directly serve their purposes, than to attempt to make a chart designed for the CRD lifecycle also responsible for the lifecycle of what is potentially many CR's depending on the use case. This is why I mentioned my use of ArgoCD sync waves to control CRD and CR installation in separate charts but in a single "installation step". I'd love to hear if your or others have had success to the contrary!

@dmitryax dmitryax added the chart:operator Issue related to opentelemetry-operator helm chart label Mar 3, 2022
@dmitryax
Copy link
Member

dmitryax commented Mar 4, 2022

I like the method 2 with separate helm charts, but not sure what's the established practice in other helm charts out there

@jaronoff97
Copy link
Contributor

Any updates here? I'm running in to the same problem now (as described here)

@jaronoff97
Copy link
Contributor

jaronoff97 commented Apr 19, 2022

For context, the kube-prometheus-stack helm chart is able to install the CRDs and an instance of a Prometheus CRD in the same chart. To me, it's easier to manage a single chart that installs the whole prometheus ecosystem rather than two charts. I would like to do the same with Otel. They do specific orchestration around helm to achieve this

@dmitryax dmitryax added the help wanted Extra attention is needed label Apr 20, 2022
@dmitryax
Copy link
Member

I don't think anyone is working on it. Any help would be appreciated

@jaronoff97
Copy link
Contributor

@TylerHelmuth I think this is still open in the operator world, I don't have a lot of cycles right now to work on fixing this, but can bring this up at the next operator SIG and see if someone else does.

@TylerHelmuth
Copy link
Member

@jaronoff97 sounds good. I can dedicate cycles to this as well I think.

@Allex1
Copy link
Contributor

Allex1 commented Jan 17, 2023

@jaronoff97 @VineethReddy02 I'm really interested in the use cases for having the CR's as part of the operator chart.

In the case of kube-prometheus-stack the bundle makes sense as the core prometheus/alertmanager config is pretty generic and the custom config is handled in decoupled CR's (servicemonitor/prometheusrule).
In the case of otel where the otelcol/otelinst reuses vanilla config which varies wildly I don't yet see a generic use case.

We use both kube-prometheus-stack and otel-operator in different setups:

  1. 1 operator + 1 prometheus pair + webhooks per cluster for cluster/infra monitoring. Everything, including CRD's are deployed by the cluster admin on cluster create.
  2. multiple operator/prometheus pairs for multiple tenants that own multiple namespaces in the same cluster. CRD's are reused from 1 and webhooks are disabled.
  3. 1 admin operator per cluster that watches all namespaces for tenants to create their custom CR's via the same kps helm (with operator disabled) or other helm charts/templating tools/manifests.
    Everything except CR's are deployed by the cluster admin.

For otel-operator I see 3 as the best option or 1 if you'd want to replace Prometheus cluster monitoring with a collector+several backends. I believe this chart is trying to achieve 1.
2 doesn't really work in multi tenant k8s clusters as users cannot deploy webhooks which are crucial for the otelinst and otelcol sidecar mode to work.
For 3 we'd need to find a way to reuse the collector chart for creating otelcol CR's while avoiding duplication.

@illrill
Copy link

illrill commented May 29, 2023

Bumped into the same problem. After reading comments by @mars64, I wanted to pitch in a clarification that this problem isn't the classic "CRD must be installed before CR" scenario. For me it's caused by the operator's webhook not yet being ready when the CR is applied (when the operator and CR are both installed in the same release).

Outline of the sequential steps in my CI/CD pipeline:

  1. helm dependency update . to download opentemeletry-operator as a subchart.
  2. helm show crds . | kubectl apply --server-side --force-conflicts -f - to install CRDs separately as per helm method #2. This ensures that CRDs are there before the CR comes.
  3. helm upgrade -i opentelemetry . --skip-crds to install my parent chart which includes the downloaded opentemeletry-operator chart, as well as an OpenTelemetryCollector CR (all in the same release).

  • On the first pipeline run, this fails with:
Internal error occurred: failed calling webhook "mopentelemetrycollector.kb.io": failed to call webhook: Post "[https://opentelemetry-operator-webhook.opentelemetry.svc:443/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector?timeout=10s"](https://opentelemetry-operator-webhook.opentelemetry.svc/mutate-opentelemetry-io-v1alpha1-opentelemetrycollector?timeout=10s%22): no endpoints available for service "opentelemetry-operator-webhook"
  • On the second pipeline run (retry), everything works perfect, because the webhook is up and ready at that time.

./Chart.yaml

apiVersion: v2
name: opentelemetry
type: application
version: 1.0.0
  - name: opentelemetry-operator
    version: 0.30.0
    repository: https://open-telemetry.github.io/opentelemetry-helm-charts/

./templates/opentelemetrycollector.yaml

apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: opentelemetry
spec:
  mode: deployment 
  config: |
    receivers:
      otlp:
        protocols:
          grpc:
          http:
    processors:
    exporters:
      logging:
    service:
      pipelines:
        traces:
          receivers: [otlp]
          processors: []
          exporters: [logging]

Ideally I'd like to deploy the operator and instantiate an operand in the same release, but the only reasonable workaround I can think of is to have them in separate releases/charts 😢

@jaronoff97
Copy link
Contributor

@illrill see here for more info on this... I have some open work i need to do to make a helm chart that enables exactly what you're talking about. I haven't had the time unfortunately.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chart:operator Issue related to opentelemetry-operator helm chart help wanted Extra attention is needed
Projects
None yet
Development

No branches or pull requests

9 participants