@piosz @DirectXMan12 @mwielgus PTAL. |
@timstclair @vishh can one of you take a look? Does it make sense to have those metrics? |
In on-prem clusters, disk is often a contended resource. We do not provide
any isolation for disk yet. The best we can do is to expose usage metrics.
Disk IO is a common issue, and exposing those metrics to users will enable
them to identify offending pods and take action. So I'd recommend adding
metrics for "space", "inodes" and "IO" for disk.
|
Sorry for the delay.
|
So there's good news and bad news. 👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there. 😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request. Note to project maintainer: This is a terminal state, meaning the |
CLAs look good, thanks! |
Comments addressed. Friendly ping @timstclair @vishh |
Friendly ping again. :) @timstclair @vishh |
Removing out of 1.3 |
This PR will be very useful especially if one runs database shards in kubernetes. Please do merge it :) |
Have there been any updates on this? When running a self-hosted cluster with spindle disks, this is very useful information to have. |
It needs to be rebased before it can be merged. |
@andyxning, will you pick up this PR again? |
@jingxu97 Yes, we need this too. I will rebase and resolve the conflicts next week. BTW, do you have any thoughts or suggestions on the implementation? |
@andyxning apologies for the delay. I'm fine with this PR. As @DirectXMan12 wrote, please keep in mind that At the same time it's not deprecated in pure Heapster to some degree - we just need to tune Heapster a bit to work with standalone cadvisor. |
/lgtm |
Thanks @DirectXMan12 @piosz. Yes, it becomes quite a question as the cluster grows larger and larger. With the original cAdvisor Btw, the summary API needs to be enhanced to meet our basic metrics requirements. :) I will do some work on that. :) |
This needs legacy source for kubelet instead of the summary one. |
I don't see them in the recorded metrics |
@weikinhuang How is your configuration for |
@andyxning Here is my deployment spec for heapster (mostly from the kubernetes/addons dir):
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster-v1.5.0
  namespace: kube-system
  labels:
    k8s-app: heapster
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.5.0
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: heapster
      version: v1.5.0
  template:
    metadata:
      labels:
        k8s-app: heapster
        version: v1.5.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
        - image: gcr.io/google_containers/heapster-amd64:v1.5.0
          name: heapster
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8082
              scheme: HTTP
            initialDelaySeconds: 180
            timeoutSeconds: 5
          command:
            - /heapster
            - --source=kubernetes.summary_api:''
            - --sink=influxdb:http://monitoring-influxdb:8086
        - image: gcr.io/google_containers/heapster-amd64:v1.5.0
          name: eventer
          command:
            - /eventer
            - --source=kubernetes:''
            - --sink=influxdb:http://monitoring-influxdb:8086
        - image: gcr.io/google_containers/addon-resizer:1.7
          name: heapster-nanny
          resources:
            limits:
              cpu: 50m
              memory: 90Mi
            requests:
              cpu: 50m
              memory: 90Mi
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - /pod_nanny
            - --cpu=80m
            - --extra-cpu=0.5m
            - --memory=140Mi
            - --extra-memory=4Mi
            - --deployment=heapster-v1.5.0
            - --container=heapster
            - --poll-period=300000
        - image: gcr.io/google_containers/addon-resizer:1.7
          name: eventer-nanny
          resources:
            limits:
              cpu: 50m
              memory: 200Mi
            requests:
              cpu: 50m
              memory: 200Mi
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - /pod_nanny
            - --cpu=100m
            - --extra-cpu=0m
            - --memory=190Mi
            - --extra-memory=500Ki
            - --deployment=heapster-v1.5.0
            - --container=eventer
            - --poll-period=300000
      serviceAccountName: heapster
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"
|
Ah. It looks like I'm using the summary API, which does not have these metrics. Is there any issue or PR I can subscribe to for disk metrics in the summary API? Thanks @andyxning |
@weikinhuang As of now there is no PR for adding disk io metrics to the summary API. I am planning to add this next week. If you can help and make a PR for this, I will be happy to review it. :) |
I would love to help, but my knowledge of golang's syntax is limited. |
@weikinhuang In that case, I suggest waiting another week or so for this feature to be added to the summary API source. I will give it a try. |
No problem, thanks @andyxning! |
@andyxning just wondering, are you still working on adding disk metrics to the summary API source? |
I use the "kubernetes" source and am still not seeing any disk metrics, any idea? I'm using gcr.io/google_containers/heapster:v1.5.0 running on kubernetes 1.8.6. |
Any logs for heapster? |
Yes, I will work on this, but I have not had time for it recently. :( |
@andyxning no error or warning messages in heapster's log. |
@Gimi Which sink backend do you use? |
@andyxning sorry for the late response, I use statsd. |
@Gimi It just occurred to me: do you use the summary data source? Disk io is currently only available on the legacy data source. The summary API is used when starting heapster with |
@Gimi I have tested the master branch with the statsd sink and the legacy data source, and the disk io metrics are available like:
|
Thanks @andyxning for checking. I did use the summary API source at the beginning, but I switched back to the legacy one when I found this ticket. I still did not see any disk metrics, though. I'll check again and see if I missed anything. Thank you. |
Address #690.
This PR will add disk io metrics to heapster. It will add these metrics:
disk/io_read_bytes: Number of bytes read from a disk partition
disk/io_write_bytes: Number of bytes written to a disk partition
disk/io_read_bytes_rate: Number of bytes read from a disk partition per second
disk/io_write_bytes_rate: Number of bytes written to a disk partition per second
All these metrics will add a resource_id label with values in major:minor format.
FYI: kernel blkio cgroup