Merge pull request #756 from haiyanmeng/runtimeclass
Add `Monitoring` section into RuntimeClass KEP
k8s-ci-robot authored Mar 6, 2019
2 parents 8373ca8 + 37e5e17 commit 7a9618c
Showing 1 changed file with 28 additions and 1 deletion.
29 changes: 28 additions & 1 deletion keps/sig-node/runtime-class.md
@@ -30,6 +30,7 @@ status: implementable
* [Runtime Handler](#runtime-handler)
* [Versioning, Updates, and Rollouts](#versioning-updates-and-rollouts)
* [Implementation Details](#implementation-details)
* [Monitoring](#monitoring)
* [Risks and Mitigations](#risks-and-mitigations)
* [Graduation Criteria](#graduation-criteria)
* [Implementation History](#implementation-history)
@@ -272,6 +273,32 @@ an error.

[runpodsandbox]: https://github.com/kubernetes/kubernetes/blob/b05a61e299777c2030fbcf27a396aff21b35f01b/pkg/kubelet/apis/cri/runtime/v1alpha2/api.proto#L344

#### Monitoring

The first round of monitoring for `RuntimeClass` is complete (tracked in
[#73058](https://github.com/kubernetes/kubernetes/issues/73058)). It covers the
following two areas:

- How robust is each runtime? A new metric,
[RunPodSandboxErrors](https://github.com/kubernetes/kubernetes/blob/596a48dd64bcaa01c1d2515dc79a558a4466d463/pkg/kubelet/metrics/metrics.go#L351),
tracks errors from RunPodSandbox operations, broken down by RuntimeClass.
- How expensive is each runtime in terms of latency? A new metric,
[RunPodSandboxDuration](https://github.com/kubernetes/kubernetes/blob/596a48dd64bcaa01c1d2515dc79a558a4466d463/pkg/kubelet/metrics/metrics.go#L341),
tracks the duration of RunPodSandbox operations, broken down by RuntimeClass.
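
The per-runtime breakdown described above can be sketched as follows. This is a
minimal, standard-library-only illustration and not the kubelet implementation:
the `sandboxMetrics` type, its `observe` and `summary` methods, the `errSandbox`
value, and the handler names `runc` and `gvisor` are all hypothetical, and the
sketch assumes that both latency and errors are recorded for every call, keyed
by a runtime handler string.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// errSandbox is a hypothetical failure used only to exercise the error counter.
var errSandbox = errors.New("sandbox creation failed")

// sandboxMetrics is a hypothetical, stdlib-only stand-in for the two metrics:
// an error count and a list of observed latencies, each keyed by runtime handler.
type sandboxMetrics struct {
	mu         sync.Mutex
	errorCount map[string]int
	durations  map[string][]time.Duration
}

func newSandboxMetrics() *sandboxMetrics {
	return &sandboxMetrics{
		errorCount: make(map[string]int),
		durations:  make(map[string][]time.Duration),
	}
}

// observe runs op (standing in for a RunPodSandbox call), records its latency
// under the given handler, and increments that handler's error count on failure.
func (m *sandboxMetrics) observe(handler string, op func() error) error {
	start := time.Now()
	err := op()
	elapsed := time.Since(start)

	m.mu.Lock()
	defer m.mu.Unlock()
	m.durations[handler] = append(m.durations[handler], elapsed)
	if err != nil {
		m.errorCount[handler]++
	}
	return err
}

// summary reports the number of calls and errors recorded for one handler.
func (m *sandboxMetrics) summary(handler string) string {
	m.mu.Lock()
	defer m.mu.Unlock()
	return fmt.Sprintf("%s: calls=%d errors=%d",
		handler, len(m.durations[handler]), m.errorCount[handler])
}

func main() {
	m := newSandboxMetrics()
	_ = m.observe("runc", func() error { return nil })
	_ = m.observe("gvisor", func() error { return errSandbox })
	fmt.Println(m.summary("runc"))   // runc: calls=1 errors=0
	fmt.Println(m.summary("gvisor")) // gvisor: calls=1 errors=1
}
```

Keying both records by the same handler string is what allows the error rate
and the latency of one runtime to be compared against another.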

The following monitoring areas are out of scope for now, but may be reconsidered
once RuntimeClass scheduling is implemented:

- How many runtimes does a cluster support?
- How many scheduling failures were caused by unsupported runtimes, or by
insufficient resources for a particular runtime?

Currently, we assume that all nodes in a cluster are homogeneous. Once
heterogeneous clusters are supported, we may also need to monitor how many
runtimes each node supports.

### Risks and Mitigations

**Scope creep.** RuntimeClass has a fairly broad charter, but it should not become a default
@@ -329,7 +356,7 @@ Beta:
- [ ] [CRI validation tests][cri-validation]
- [ ] RuntimeClasses are configured in the E2E environment with test coverage of a non-default
RuntimeClass
- [ ] Comprehensive coverage of RuntimeClass metrics. Details TBD. [#73058](http://issue.k8s.io/73058)
- [x] Comprehensive coverage of RuntimeClass metrics. [#73058](http://issue.k8s.io/73058)
- [ ] The update & upgrade story is revisited, and a longer-term approach is implemented as necessary.

[cri-validation]: https://github.com/kubernetes-sigs/cri-tools/blob/master/docs/validation.md