[receiver/k8scluster] Use newer v2 HorizontalPodAutoscaler for Kubernetes 1.26 #20480
Comments
Pinging code owners for receiver/k8scluster: @dmitryax. See Adding Labels via Comments if you do not have permissions to add labels yourself. |
This also happens on Collector version v0.73.0, and it is not limited to the HPA: it also affects v1beta1.CronJob. See the example from my log file. |
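@AchimGrolimund can you please provide more details about your Kubernetes environment? I didn't see this issue in my Kops-created Kubernetes 1.25 cluster. We have support for batchv1.CronJob (https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/315fdf3e571088c855f359b85e79cfd6d3ad9e50/receiver/k8sclusterreceiver/internal/collection/collector.go#L136), so I'm wondering how this is happening. |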
Hello @jvoravong
We are using ROSA 4.12
https://docs.openshift.com/container-platform/4.12/release_notes/ocp-4-12-release-notes.html
I can provide more information next week.
We are using the splunk-otel-collector v0.72.0
|
I can help with supporting HorizontalPodAutoscaler v2. |
@jvoravong We are currently using the following version:
and here are the logs, which still show the errors:
Can we expect a solution soon? |
Please provide an ETA for when this will be supported. |
Here is some additional information:
|
Looking into this, will get back here soon. |
Thanks @jvoravong. I am the support engineer on case 3182925, and I appreciate your help on this. |
I missed adding a watcher for the HPA v2 code and have started a fix for it. I verified that k8s.hpa.* and k8s.job.* metrics are exported in Kubernetes 1.25 and 1.26. |
That's fine. We have the same for jobs when both versions are supported by the k8s API. |
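The idea discussed here, watching whichever HPA API version the cluster actually serves, can be sketched roughly as follows. This is an illustrative Python sketch and not the receiver's Go code: `pick_hpa_api_version`, `HPA_API_PREFERENCE`, and the `served` argument are hypothetical names, and `served` stands in for the group/versions an API discovery call would return.

```python
from typing import Iterable, Optional

# Assumed preference order: the stable API first, then the deprecated beta
# that is unavailable from Kubernetes 1.26 onward.
HPA_API_PREFERENCE = ("autoscaling/v2", "autoscaling/v2beta2")


def pick_hpa_api_version(served: Iterable[str]) -> Optional[str]:
    """Return the preferred HPA group/version the cluster serves, or None."""
    served_set = set(served)
    for group_version in HPA_API_PREFERENCE:
        if group_version in served_set:
            return group_version
    # Neither version is served, so no HPA watcher should be started.
    return None
```

When a cluster serves both versions, preferring `autoscaling/v2` avoids the deprecation warning while still falling back to `autoscaling/v2beta2` on older clusters.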
Closing as resolved by #21497 |
@AchimGrolimund, looking at the log output splunk-otel-collector-agent-96r7z-splunk-otel-collector-agent.log, it seems like the errors are coming from |
Hey @dmitryax
---
# Source: splunk-otel-collector/templates/configmap-agent.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: splunk-otel-collector-agent-configmap
  namespace: xxxxxxxx-splunk-otel-collector
  labels:
    app: splunk-otel-collector-agent
data:
  relay: |
    exporters:
      sapm:
        access_token: ${SPLUNK_OBSERVABILITY_ACCESS_TOKEN}
        endpoint: https://xxxxxx:443/ingest/v2/trace
      signalfx:
        access_token: ${SPLUNK_OBSERVABILITY_ACCESS_TOKEN}
        api_url: https://xxxxxxx:443/api/
        correlation: null
        ingest_url: https://xxxxxxx:443/ingest/
        sync_host_metadata: true
    extensions:
      health_check: null
      k8s_observer:
        auth_type: serviceAccount
        node: ${K8S_NODE_NAME}
      memory_ballast:
        size_mib: ${SPLUNK_BALLAST_SIZE_MIB}
      zpages: null
    processors:
      batch: null
      filter/logs:
        logs:
          exclude:
            match_type: strict
            resource_attributes:
            - key: splunk.com/exclude
              value: "true"
      groupbyattrs/logs:
        keys:
        - com.splunk.source
        - com.splunk.sourcetype
        - container.id
        - fluent.tag
        - istio_service_name
        - k8s.container.name
        - k8s.namespace.name
        - k8s.pod.name
        - k8s.pod.uid
      k8sattributes:
        extract:
          annotations:
          - from: pod
            key: splunk.com/sourcetype
          - from: namespace
            key: splunk.com/exclude
            tag_name: splunk.com/exclude
          - from: pod
            key: splunk.com/exclude
            tag_name: splunk.com/exclude
          - from: namespace
            key: splunk.com/index
            tag_name: com.splunk.index
          - from: pod
            key: splunk.com/index
            tag_name: com.splunk.index
          labels:
          - key: app
          metadata:
          - k8s.namespace.name
          - k8s.node.name
          - k8s.pod.name
          - k8s.pod.uid
          - container.id
          - container.image.name
          - container.image.tag
        filter:
          node_from_env_var: K8S_NODE_NAME
        pod_association:
        - sources:
          - from: resource_attribute
            name: k8s.pod.uid
        - sources:
          - from: resource_attribute
            name: k8s.pod.ip
        - sources:
          - from: resource_attribute
            name: ip
        - sources:
          - from: connection
        - sources:
          - from: resource_attribute
            name: host.name
      memory_limiter:
        check_interval: 2s
        limit_mib: ${SPLUNK_MEMORY_LIMIT_MIB}
      resource:
        attributes:
        - action: insert
          key: k8s.node.name
          value: ${K8S_NODE_NAME}
        - action: upsert
          key: k8s.cluster.name
          value: HCP-ROSA-PROD1
      resource/add_agent_k8s:
        attributes:
        - action: insert
          key: k8s.pod.name
          value: ${K8S_POD_NAME}
        - action: insert
          key: k8s.pod.uid
          value: ${K8S_POD_UID}
        - action: insert
          key: k8s.namespace.name
          value: ${K8S_NAMESPACE}
      resource/logs:
        attributes:
        - action: upsert
          from_attribute: k8s.pod.annotations.splunk.com/sourcetype
          key: com.splunk.sourcetype
        - action: delete
          key: k8s.pod.annotations.splunk.com/sourcetype
        - action: delete
          key: splunk.com/exclude
      resourcedetection:
        detectors:
        - env
        - ec2
        - system
        override: true
        timeout: 10s
    receivers:
      smartagent/openshift-cluster:
        type: openshift-cluster
        alwaysClusterReporter: true
        kubernetesAPI:
          authType: serviceAccount
        datapointsToExclude:
        - dimensions:
          metricNames:
          - '*appliedclusterquota*'
          - '*clusterquota*'
        extraMetrics:
        - kubernetes.container_cpu_request
        - kubernetes.container_memory_request
        - kubernetes.job.completions
        - kubernetes.job.active
        - kubernetes.job.succeeded
        - kubernetes.job.failed
      hostmetrics:
        collection_interval: 10s
        scrapers:
          cpu: null
          disk: null
          filesystem: null
          load: null
          memory: null
          network: null
          paging: null
          processes: null
      jaeger:
        protocols:
          grpc:
            endpoint: 0.0.0.0:14250
          thrift_http:
            endpoint: 0.0.0.0:14268
      kubeletstats:
        auth_type: serviceAccount
        collection_interval: 10s
        endpoint: ${K8S_NODE_IP}:10250
        extra_metadata_labels:
        - container.id
        metric_groups:
        - container
        - pod
        - node
      otlp:
        protocols:
          grpc:
            endpoint: 0.0.0.0:4317
          http:
            endpoint: 0.0.0.0:4318
      prometheus/agent:
        config:
          scrape_configs:
          - job_name: otel-agent
            scrape_interval: 10s
            static_configs:
            - targets:
              - 127.0.0.1:8889
      receiver_creator:
        receivers:
          smartagent/coredns:
            config:
              extraDimensions:
                metric_source: k8s-coredns
              port: 9154
              skipVerify: true
              type: coredns
              useHTTPS: true
              useServiceAccount: true
            rule: type == "pod" && namespace == "openshift-dns" && name contains "dns"
          smartagent/kube-controller-manager:
            config:
              extraDimensions:
                metric_source: kubernetes-controller-manager
              port: 10257
              skipVerify: true
              type: kube-controller-manager
              useHTTPS: true
              useServiceAccount: true
            rule: type == "pod" && labels["app"] == "kube-controller-manager" && labels["kube-controller-manager"]
              == "true"
          smartagent/kubernetes-apiserver:
            config:
              extraDimensions:
                metric_source: kubernetes-apiserver
              skipVerify: true
              type: kubernetes-apiserver
              useHTTPS: true
              useServiceAccount: true
            rule: type == "port" && port == 6443 && pod.labels["app"] == "openshift-kube-apiserver"
              && pod.labels["apiserver"] == "true"
          smartagent/kubernetes-proxy:
            config:
              extraDimensions:
                metric_source: kubernetes-proxy
              #port: 29101
              port: 9101
              useHTTPS: true
              skipVerify: true
              useServiceAccount: true
              type: kubernetes-proxy
            rule: type == "pod" && labels["app"] == "sdn"
          smartagent/kubernetes-scheduler:
            config:
              extraDimensions:
                metric_source: kubernetes-scheduler
              # port: 10251
              port: 10259
              type: kubernetes-scheduler
              useHTTPS: true
              skipVerify: true
              useServiceAccount: true
            rule: type == "pod" && labels["app"] == "openshift-kube-scheduler" && labels["scheduler"]
              == "true"
        watch_observers:
        - k8s_observer
      signalfx:
        endpoint: 0.0.0.0:9943
      smartagent/signalfx-forwarder:
        listenAddress: 0.0.0.0:9080
        type: signalfx-forwarder
      zipkin:
        endpoint: 0.0.0.0:9411
    service:
      extensions:
      - health_check
      - k8s_observer
      - memory_ballast
      - zpages
      pipelines:
        metrics:
          exporters:
          - signalfx
          processors:
          - memory_limiter
          - batch
          - resourcedetection
          - resource
          receivers:
          - hostmetrics
          - kubeletstats
          - otlp
          - receiver_creator
          - signalfx
          - smartagent/openshift-cluster
        metrics/agent:
          exporters:
          - signalfx
          processors:
          - memory_limiter
          - batch
          - resource/add_agent_k8s
          - resourcedetection
          - resource
          receivers:
          - prometheus/agent
        traces:
          exporters:
          - sapm
          - signalfx
          processors:
          - memory_limiter
          - k8sattributes
          - batch
          - resourcedetection
          - resource
          receivers:
          - otlp
          - jaeger
          - smartagent/signalfx-forwarder
          - zipkin
      telemetry:
        metrics:
          address: 127.0.0.1:8889
Best Regards
Achim |
@AchimGrolimund Thank you. This is coming from |
It looks like the k8scluster receiver supports scraping additional OpenShift metrics (https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/k8sclusterreceiver#openshift), but it should be run separately as a 1-replica deployment. @AchimGrolimund did you try it by chance? |
Just to add: on Azure you will not be able to upgrade from 1.25.* to 1.26.* because the agent is still querying the v2beta2 autoscaler API. Azure prevents upgrades while deprecated APIs are still in use, so the upgrade fails. You either have to force the upgrade, or remove the signalfx agent, wait for 12 hours, and then try again. It would be nice if the agent checked the Kubernetes version and, if it is higher than 1.25, did not monitor the |
The customer xxx updated the Splunk OTC agent to version 0.77.0 and still gets the same error messages. W0522 06:11:24.226426 1 reflector.go:533] k8s.io/[email protected]/tools/cache/reflector.go:231: failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find the requested resource |
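When triaging reports like the one above, it can help to pull the API version and kind out of each "failed to list" warning, since the thread mentions several of them (v2beta1 and v2beta2 HPA, v1beta1 CronJob). The following is a minimal sketch, assuming the message format matches the line quoted in this thread; `parse_list_failure` is a hypothetical helper and not part of the collector or client-go.

```python
import re
from typing import Optional, Tuple

# Matches the tail of a reflector warning such as:
#   "... failed to list *v2beta1.HorizontalPodAutoscaler: the server could not find ..."
_LIST_FAILURE = re.compile(
    r"failed to list \*(?P<version>\w+)\.(?P<kind>\w+): (?P<reason>.+)$"
)


def parse_list_failure(line: str) -> Optional[Tuple[str, str, str]]:
    """Return (api_version, kind, reason) for a 'failed to list' line, else None."""
    match = _LIST_FAILURE.search(line.strip())
    if match is None:
        return None
    return match.group("version"), match.group("kind"), match.group("reason")
```

Grouping the parsed results by version and kind shows at a glance which removed APIs a given agent build still tries to watch.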
Update on Deprecated Endpoint Removal:
Additional Context: |
Component(s)
receiver/k8scluster
What happened?
Description
Right now we only support v2beta2 HPA. To support Kubernetes v1.26, we need to add support for v2 HPA.
Kubernetes v1.26 was released in December 2022. This version is still new, and distributions like AKS, EKS, OpenShift, and GKE will start using it soon (if not already).
Related Startup Log Warning Message:
autoscaling/v2beta2 HorizontalPodAutoscaler is deprecated in v1.23+, unavailable in v1.26+; use autoscaling/v2 HorizontalPodAutoscaler
Steps to Reproduce
Spin up a Kubernetes 1.25 cluster.
Deploy the k8scluster receiver to your cluster.
Follow the startup logs of the collector and you will notice the warning mentioned above.
Expected Result
The k8scluster receiver can monitor v2 HorizontalPodAutoscaler objects.
Actual Result
In Kubernetes 1.25, you get a warning in the collector logs.
In Kubernetes 1.26, you get an error in the logs, and users might notice that HPA metrics they were expecting are missing.
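The difference between the 1.25 and 1.26 behavior follows from the version thresholds in the warning message quoted above (deprecated in v1.23+, unavailable in v1.26+). As a rough illustration, assuming the cluster version is available as a string such as "v1.25.4", the status of the v2beta2 HPA API could be classified like this; `v2beta2_hpa_status` and the returned labels are hypothetical names.

```python
import re

# Thresholds taken from the startup warning:
# autoscaling/v2beta2 HPA is deprecated in v1.23+ and unavailable in v1.26+.
DEPRECATED_SINCE = (1, 23)
UNAVAILABLE_SINCE = (1, 26)


def v2beta2_hpa_status(kubernetes_version: str) -> str:
    """Classify autoscaling/v2beta2 HPA as 'served', 'deprecated', or 'unavailable'."""
    match = re.match(r"v?(\d+)\.(\d+)", kubernetes_version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {kubernetes_version!r}")
    version = (int(match.group(1)), int(match.group(2)))
    if version >= UNAVAILABLE_SINCE:
        return "unavailable"  # listing the resource fails; HPA metrics go missing
    if version >= DEPRECATED_SINCE:
        return "deprecated"  # still served, but the API server emits a warning
    return "served"
```

For a 1.25 cluster this returns "deprecated" (the warning case), and for 1.26 it returns "unavailable" (the error case).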
Collector version
v0.72.0
Environment information
Environment
This will affect all Kubernetes 1.26 clusters.
I tested and found the related log warnings in ROSA 4.12 (OpenShift 4.12, Kubernetes 1.25).
OpenTelemetry Collector configuration
Log output
Additional context
Related to: signalfx/splunk-otel-collector#2457