
Remove/propose different "no metrics known for pod" log #349

Closed
serathius opened this issue Nov 5, 2019 · 12 comments

@serathius (Contributor) commented Nov 5, 2019

There are a lot of issues where users see metrics-server reporting the error "no metrics known for pod" and ask for help.

To my understanding this error is expected to occur in a normal, healthy metrics-server.
Metrics Server periodically scrapes all nodes to gather metrics and populate its internal cache. When there is a request to the Metrics API, metrics-server looks up the pod in this cache. If there is no cached value for a pod that exists in k8s, metrics-server reports the error "no metrics known for pod". This means the error can happen when:

  • a fresh metrics-server is deployed with a clean cache
  • the query is about a fresh pod/node that has not been scraped yet

Providing better information to users would greatly reduce the volume of tickets.
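For illustration, here is a minimal sketch of the lookup described above. The names (podCache, PodMetrics, getPodMetrics) are hypothetical and do not mirror the actual metrics-server code; the point is only to show where the message comes from.

```go
// Hypothetical sketch of the cache lookup described above; names and types are
// illustrative only and do not mirror the real metrics-server implementation.
package main

import (
	"fmt"
	"time"
)

type PodMetrics struct {
	CPUMillicores int64
	MemoryBytes   int64
	Timestamp     time.Time
}

// podCache is only populated after a scrape of the pod's node has completed.
var podCache = map[string]PodMetrics{}

func getPodMetrics(namespace, name string) (PodMetrics, error) {
	key := namespace + "/" + name
	m, ok := podCache[key]
	if !ok {
		// A pod that exists in the cluster but has not been scraped yet ends up
		// here, e.g. right after metrics-server starts or right after pod creation.
		return PodMetrics{}, fmt.Errorf("no metrics known for pod %q", key)
	}
	return m, nil
}

func main() {
	if _, err := getPodMetrics("kube-system", "example-pod"); err != nil {
		fmt.Println(err) // prints: no metrics known for pod "kube-system/example-pod"
	}
}
```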

@serathius serathius added this to the v0.4.0 milestone Nov 5, 2019
@serathius (Contributor, Author)

/help

@k8s-ci-robot (Contributor)

@serathius:
This request has been marked as needing help from a contributor.

Please ensure the request meets the requirements listed here.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added the help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. label Nov 5, 2019
@tomknock

I am seeing the same metrics-server problem in my cluster: unable to fetch pod metrics for pod kube-system/POD_NAME: no metrics known for pod.
Also, Dashboard couldn't be deployed successfully on k8s v1.16.2 without metrics-server/heapster: No metric client provided. Skipping metrics.

@zhangyu84848245

@serathius

I had the same problem before. Adding the following resources block made it work for me; you can try it:

resources:
  limits:
    cpu: 100m
    memory: 300Mi
  requests:
    cpu: 5m
    memory: 50Mi

@serathius (Contributor, Author)

Hey @zhangyu84848245,
This message is related to metrics-server not having its cache pre-filled. Increasing resources as you suggested can reduce the chance of the log message appearing, but will not fully remove it (additional CPU will shorten the time needed to generate the self-signed cert).

@serathius serathius added the kind/bug Categorizes issue or PR as related to a bug. label Dec 12, 2019
@serathius (Contributor, Author)

/assign

@serathius serathius added kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. and removed kind/bug Categorizes issue or PR as related to a bug. labels Feb 7, 2020
@serathius (Contributor, Author) commented Mar 15, 2020

Possible solutions (in order of implementation complexity):

  1. [Preferred] Don't treat missing metrics as errors (remove the log calls)
  2. Don't report the error for newly created nodes/containers (time.Now().Sub(startTime) < metricResolution + cAdvisorHousekeepingTime); a rough sketch of this check is included below
  3. Don't report the error if the node was not scraped long enough after the container start (keep the last scrape time per node and check scrapeTime.Sub(startTime) < cAdvisorHousekeepingTime)

Where cAdvisorHousekeepingTime = 15s
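A rough sketch of what the option 2 check could look like; the constant values, names, and the klog call are assumptions for illustration, not the actual metrics-server implementation:

```go
// Rough sketch of option 2: suppress the "no metrics known" error for targets
// that are too young to have been scraped. Names and values are illustrative.
package scraper

import "time"

const (
	metricResolution         = 60 * time.Second // example scrape interval
	cAdvisorHousekeepingTime = 15 * time.Second
)

// tooYoungToHaveMetrics reports whether missing metrics are expected simply
// because the container/node started very recently.
func tooYoungToHaveMetrics(startTime time.Time) bool {
	return time.Since(startTime) < metricResolution+cAdvisorHousekeepingTime
}

// The error would then only be reported for targets that are old enough to
// have been scraped at least once, e.g.:
//
//	if !tooYoungToHaveMetrics(pod.StartTime) {
//	    klog.Errorf("no metrics known for pod %s/%s", pod.Namespace, pod.Name)
//	}
```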

I propose to remove the error logs, as they can be caused by:

  • A failure in scraping. Even then, they don't provide useful information for debugging and still require reading the scrape failure logs.
  • Metric availability delay. In that case they misinform users that there is a problem with the pipeline, instead of informing them that this is expected behavior.

Other ways we can improve visibility into metric availability delay:

  • Document the expected delay of metrics for freshly created containers and nodes.
  • Create a histogram metric that measures the freshness of served metrics (sketched below).
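Such a freshness histogram could be built with the standard Prometheus client. A minimal sketch, where the metric name and bucket layout are assumptions rather than an existing metrics-server metric:

```go
// Minimal sketch of a metric-freshness histogram using prometheus/client_golang.
// Metric name and buckets are assumptions, not an existing metrics-server metric.
package metrics

import (
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

var metricFreshness = prometheus.NewHistogram(prometheus.HistogramOpts{
	Namespace: "metrics_server",
	Name:      "api_metric_freshness_seconds",
	Help:      "Age of the metric values returned by the Metrics API.",
	Buckets:   prometheus.ExponentialBuckets(1, 2, 8), // 1s, 2s, 4s, ..., 128s
})

func init() {
	prometheus.MustRegister(metricFreshness)
}

// observeFreshness records how old the served value is; it would be called
// when handling a Metrics API request, with the scrape timestamp of the
// returned sample.
func observeFreshness(scrapedAt time.Time) {
	metricFreshness.Observe(time.Since(scrapedAt).Seconds())
}
```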

Both options 2 & 3 try to add logic that guesses the health of the metrics pipeline. They complicate the code without providing any additional benefit. Measuring the health of the pipeline should be done by defining proper metrics and externally monitored SLOs.

/cc @s-urbaniak
Do you agree with this approach?

@serathius (Contributor, Author)

/cc @kawych

@JoseThen

Thank you for this @serathius. I was concerned about the issue, but noticed it stops logging after some time; everything is looking good so far 🙇

@serathius serathius added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Apr 5, 2020
@serathius (Contributor, Author)

ping @s-urbaniak

@s-urbaniak (Contributor)

I agree with just going forward with option 1. I think options 2 and 3 should be solved via a higher-level alerting system.

@serathius (Contributor, Author)

Looks like the work was done.
