Skip to content
This repository has been archived by the owner on Dec 1, 2018. It is now read-only.

add disk io metric #1450

Merged
merged 1 commit into from
Dec 7, 2017
Merged

Conversation

andyxning
Copy link
Contributor

@andyxning andyxning commented Jan 9, 2017

Address #690.

This PR will add disk io metrics to heapster. It will add these metrics:

  • disk/io_read_bytes: Number of bytes read from a disk partition
  • disk/io_write_bytes: Number of bytes written to a disk partition
  • disk/io_read_bytes_rate: Number of bytes read from a disk partition per second
  • disk/io_write_bytes_rate: Number of bytes written to a disk partition per second

all these metrics will add resource_id label with values in major:minor format.

FYI:kernel blkio cgroup

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 9, 2017
@k8s-reviewable
Copy link

This change is Reviewable

@andyxning andyxning force-pushed the add_disk_io_metrics branch 2 times, most recently from aaaec54 to 9403449 Compare January 9, 2017 15:02
@andyxning
Copy link
Contributor Author

@piosz @DirectXMan12 @mwielgus PTAL.

@piosz
Copy link
Contributor

piosz commented Jan 13, 2017

@timstclair @vishh can one of you take a look? Does it make sense to have those metrics?

@vishh
Copy link
Contributor

vishh commented Jan 13, 2017 via email

@andyxning
Copy link
Contributor Author

andyxning commented Feb 21, 2017

Sorry for the delay.

@vishh

So I'd recommend adding metrics for "space", "inodes" and "IO" for disk

  • There already exists filesystem/usage, filesystemd/limit and filesystem/available metrics for space.
  • As for inodes, it seems that blkio cgroup has no info about this. I have found that inodes info is available in cAdvisor FsStats. Will add this in another PR(add filesystem inode metrics #1542).
  • blkio cgroup supports setting resource Throttling/Upper limit policy for IO. This is what this PR does.

@googlebot
Copy link

So there's good news and bad news.

👍 The good news is that everyone that needs to sign a CLA (the pull request submitter and all commit authors) have done so. Everything is all good there.

😕 The bad news is that it appears that one or more commits were authored by someone other than the pull request submitter. We need to confirm that they're okay with their commits being contributed to this project. Please have them confirm that here in the pull request.

Note to project maintainer: This is a terminal state, meaning the cla/google commit status will not change from this state. It's up to you to confirm consent of the commit author(s) and merge this pull request when appropriate.

@googlebot
Copy link

CLAs look good, thanks!

@andyxning
Copy link
Contributor Author

Comments addressed. Friendly ping @timstclair @vishh

@andyxning
Copy link
Contributor Author

Friendly ping again. :) @timstclair @vishh

@DirectXMan12 DirectXMan12 added this to the v1.3 milestone Mar 9, 2017
@piosz
Copy link
Contributor

piosz commented Mar 16, 2017

Removing out of 1.3

@piosz piosz removed this from the v1.3 milestone Mar 16, 2017
@dnavre
Copy link
Contributor

dnavre commented Jul 9, 2017

This PR will be very useful especially if one runs database shards in kubernetes. Please do merge it :)

@weikinhuang
Copy link

Has there been any updates on this? When running a self hosted cluster with spindle disks this is very useful information to have.

@DirectXMan12
Copy link
Contributor

it needs to be rebased before it can be merged.

@jingxu97
Copy link
Contributor

@andyxning, will you pick up this PR again?

@andyxning
Copy link
Contributor Author

@jingxu97 Yes. We need this also. I will rebase and resolve the conflicts in next week.

BTW, any thoughts on the implementations or suggestions.

@andyxning andyxning force-pushed the add_disk_io_metrics branch from e5f1618 to 3ffb7af Compare December 6, 2017 15:16
@piosz piosz added this to the v1.5 milestone Dec 7, 2017
@piosz
Copy link
Contributor

piosz commented Dec 7, 2017

@andyxning apologies for the delay. I'm fine with this PR. As @DirectXMan12 wrote please have in mind that /stats endpoint is deprecated in Kubelet and will be removed from there at some point, especially Heapster while using it consumes much more resources.

At the same time it's not deprecated in pure Heapster to some degree - we just need to tune Heapster a bit to work with standalone cadvisor.

@piosz
Copy link
Contributor

piosz commented Dec 7, 2017

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 7, 2017
@piosz piosz merged commit 02692f8 into kubernetes-retired:master Dec 7, 2017
@andyxning andyxning deleted the add_disk_io_metrics branch December 7, 2017 15:12
@andyxning
Copy link
Contributor Author

Thanks @DirectXMan12 @piosz .

Yes, it is quite a question when the cluster becomes more and more large. With the original cAdvisor stats endpoint, more and more redundant data needs to be scraped as the cluster grows. This is quite a huge overhead.

Btw, the summary api needs to be enhanced to accomplish our basic metrics requirements. :) Will do some works on that. :)

@weikinhuang
Copy link

How do I access this in v1.5.0 within influxdb/grafana? When I try to add a new dashboard graph, that field seems to be missing:

screen shot 2017-12-13 at 12 41 36 pm

@andyxning
Copy link
Contributor Author

andyxning commented Dec 14, 2017

@weikinhuang

disk/io_read_bytes_rate: Number of bytes read from a disk partition per second
disk/io_write_bytes_rate: Number of bytes written to a disk partition per second

This needs legacy source for kubelet instead of the summary one.

@weikinhuang
Copy link

I don't see them in the recorded metrics

@andyxning
Copy link
Contributor Author

@weikinhuang How is your configuration for --source?

@weikinhuang
Copy link

@andyxning Here is my deployment spec for heapster (mostly from the kubernetes/addons dir)

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: heapster-v1.5.0
  namespace: kube-system
  labels:
    k8s-app: heapster
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.5.0
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: heapster
      version: v1.5.0
  template:
    metadata:
      labels:
        k8s-app: heapster
        version: v1.5.0
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ''
    spec:
      containers:
        - image: gcr.io/google_containers/heapster-amd64:v1.5.0
          name: heapster
          livenessProbe:
            httpGet:
              path: /healthz
              port: 8082
              scheme: HTTP
            initialDelaySeconds: 180
            timeoutSeconds: 5
          command:
            - /heapster
            - --source=kubernetes.summary_api:''
            - --sink=influxdb:http://monitoring-influxdb:8086
        - image: gcr.io/google_containers/heapster-amd64:v1.5.0
          name: eventer
          command:
            - /eventer
            - --source=kubernetes:''
            - --sink=influxdb:http://monitoring-influxdb:8086
        - image: gcr.io/google_containers/addon-resizer:1.7
          name: heapster-nanny
          resources:
            limits:
              cpu: 50m
              memory: 90Mi
            requests:
              cpu: 50m
              memory: 90Mi
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - /pod_nanny
            - --cpu=80m
            - --extra-cpu=0.5m
            - --memory=140Mi
            - --extra-memory=4Mi
            - --deployment=heapster-v1.5.0
            - --container=heapster
            - --poll-period=300000
        - image: gcr.io/google_containers/addon-resizer:1.7
          name: eventer-nanny
          resources:
            limits:
              cpu: 50m
              memory: 200Mi
            requests:
              cpu: 50m
              memory: 200Mi
          env:
            - name: MY_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            - name: MY_POD_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
          command:
            - /pod_nanny
            - --cpu=100m
            - --extra-cpu=0m
            - --memory=190Mi
            - --extra-memory=500Ki
            - --deployment=heapster-v1.5.0
            - --container=eventer
            - --poll-period=300000
      serviceAccountName: heapster
      tolerations:
        - key: "CriticalAddonsOnly"
          operator: "Exists"

@weikinhuang
Copy link

Ah. It looks like I'm using the summary API. Which does not have these metrics. Is there any issue or pr I can subscribe to for disk metrics in the summary API? Thanks @andyxning

@andyxning
Copy link
Contributor Author

@weikinhuang As for now there is not pr for adding disk io metric in summary api. I am planning to add this in next week. If you can help and make a PR about this, i will be ok to review it. :)

@weikinhuang
Copy link

I would love to help, but my knowledge of golang's syntax is limited.

@andyxning
Copy link
Contributor Author

@weikinhuang To that situation, i suggest to wait for another week maybe for this feature added to summary api source. I will give it a try.

@weikinhuang
Copy link

No problem, thanks @andyxning!

@zliang-min
Copy link

@andyxning just wondering that are you still working on adding disk metrics to summary api source?

@zliang-min
Copy link

I use "kubernetes" source and still am not seeing any disk metrics, any idea? I'm using gcr.io/google_containers/heapster:v1.5.0 running on kubernetes 1.8.6.

@andyxning
Copy link
Contributor Author

Any logs for heapster?

@andyxning
Copy link
Contributor Author

just wondering that are you still working on adding disk metrics to summary api source?

Yes, i will work on this. But no time recently available working on this. :(
Maybe one week later.

@zliang-min
Copy link

@andyxning no error nor warning message from heapster's log.

@andyxning
Copy link
Contributor Author

@Gimi Which sink backend do you use?

@zliang-min
Copy link

@andyxning sorry for the late response, I use statsd.

@andyxning
Copy link
Contributor Author

andyxning commented Feb 7, 2018

@Gimi Just have thought that do you use the summary data source? The disk io is currently only available on legacy data source. The summary api is used when start with heapster with --source=kubernetes.summary_api:''

@andyxning
Copy link
Contributor Author

@Gimi I have test master branch with statsd sink and legacy data source and the disk io metrics is available like:

node.192_168_0_1.batch.sandbox;beta_kubernetes_io/arch.disk/io_read_bytes_rate./dev/sdb:15381.686|g
node.192_168_0_1.batch.sandbox;beta_kubernetes_io/arch.disk/io_read_bytes_rate./dev/sda:-1.3200837e+09|g
node.192_168_0_1.batch.sandbox;beta_kubernetes_io/arch.disk/io_write_bytes_rate./dev/sdb:2.1783398e+06|g
node.192_168_0_1.batch.sandbox;beta_kubernetes_io/arch.disk/io_write_bytes_rate./dev/sda:-5.901337e+10|g

@zliang-min
Copy link

thanks @andyxning for checking. I at the beginning did use the summary api resource, but I switched back to the legacy one when I found this ticket. But still did not see any disk metrics. I'll check again and see if I missed anything. Thank you.

@cblecker cblecker unassigned vishh and piosz Nov 30, 2018
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm Indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.