This charter adheres to the conventions described in the Kubernetes Charter README and uses the Roles and Organization Management outlined in sig-governance.
SIG Scalability's primary responsibility is to define and drive scalability goals for Kubernetes. This involves defining, testing, and measuring performance- and scalability-related Service Level Indicators (SLIs) and ensuring that every Kubernetes release meets Service Level Objectives (SLOs) built on top of those SLIs.
We also coordinate and contribute to general system-wide scalability and performance improvements (those that don't fall under the charter of another individual SIG) by driving large architectural changes and finding bottlenecks, and we provide consultation on scalability- and performance-related aspects of Kubernetes.
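For illustration only, the sketch below shows the relationship between an SLI (a measured quantity, here request latencies) and an SLO (a target the measurement must satisfy, here "99th percentile at or below 1 second"). It is not the SIG's actual test tooling (which lives in the kubernetes/perf-tests repository); all names and values are made up.

```go
package main

import (
	"fmt"
	"math"
	"sort"
	"time"
)

// percentile returns the p-th percentile (0 < p <= 100) of the samples
// using the nearest-rank method.
func percentile(samples []time.Duration, p float64) time.Duration {
	sorted := append([]time.Duration(nil), samples...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	rank := int(math.Ceil(p/100*float64(len(sorted)))) - 1
	if rank < 0 {
		rank = 0
	}
	return sorted[rank]
}

func main() {
	// SLI: observed latencies of some operation (illustrative values only).
	latencies := []time.Duration{
		120 * time.Millisecond, 340 * time.Millisecond, 80 * time.Millisecond,
		950 * time.Millisecond, 410 * time.Millisecond, 200 * time.Millisecond,
	}

	// SLO: the 99th percentile of the SLI must not exceed 1 second.
	const slo = 1 * time.Second
	p99 := percentile(latencies, 99)
	fmt.Printf("p99 = %v, SLO met: %v\n", p99, p99 <= slo)
}
```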
- Scalability and performance testing frameworks. Examples include:
- Scalability and performance tests:
- Defining what “Kubernetes scales” means by defining (or approving) individual performance SLIs/SLOs, and ensuring they are all oriented toward user experience and consistent with each other:
- Ensuring that each official Kubernetes release satisfies all scalability- and performance-related requirements, as stated in the "Kubernetes scalability" definition.
- Establishing and documenting best practices for designing and/or implementing Kubernetes features in a scalable and performant way. Educating contributors and consulting on individual designs/implementations to ensure that those practices are widely followed. Example artifacts:
- Finding system bottlenecks and coordinating improvements through cross-cutting architectural changes.
- Improving the performance/scalability of features that fall under the charters of individual SIGs.
Scalability and performance are horizontal aspects of the system - a change in a single place in Kubernetes may affect the whole system. As a result, to effectively ensure Kubernetes scales, we need special cross-SIG privileges.
- We can roll back any merged PR if it has been identified as a cause of a performance/scalability SLO regression (as identified by the set of release-blocking scalability/performance tests). The offending PR should only be merged again after it has been shown to pass tests at scale.
- In the event of a performance regression, we can block all PRs from being merged into the relevant repos until the cause of the regression is identified and mitigated. The “rules of engagement” for pausing the merge queue, and the rationale for why it is necessary, are explained in a separate doc.
- We require that significant changes (in terms of impact, such as an etcd update, a Go version update, or major architectural changes) may only be merged:
- with explicit approval from a SIG Scalability tech lead, and
- after having passed performance testing on the biggest supported clusters (unless the approver deems this unnecessary)
- We can block a feature from transitioning:
- to Beta status, if (when turned on) it causes a violation of existing performance/scalability SLOs;
- to GA status, until it has been demonstrated that it can be used at scale. That means:
- in rare cases, introducing a new SLI and SLO and ensuring it is met at scale
- in most cases, extending the scalability tests to use it and ensuring that existing SLOs are still met
- We can require a SIG to introduce a regression-catching benchmark test for scalability-critical functionality (an illustrative sketch follows this list).
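As an illustration only, a minimal sketch of what such a regression-catching benchmark could look like, using Go's standard testing package, is shown below. The package, types, and findFit function are hypothetical stand-ins rather than actual Kubernetes code; real benchmarks would live in the owning SIG's component repository.

```go
package scheduling

import (
	"fmt"
	"testing"
)

// node and pod are hypothetical stand-ins for the real API types.
type node struct {
	name          string
	freeCPUMillis int64
}

type pod struct {
	cpuMillis int64
}

// findFit is a placeholder for the scalability-critical code path under
// test; it returns the first node with enough free CPU.
func findFit(p pod, nodes []node) (string, bool) {
	for _, n := range nodes {
		if n.freeCPUMillis >= p.cpuMillis {
			return n.name, true
		}
	}
	return "", false
}

// BenchmarkFindFit5000Nodes exercises the hot path at a fixed input size so
// that results are comparable run-over-run.
func BenchmarkFindFit5000Nodes(b *testing.B) {
	nodes := make([]node, 5000)
	for i := range nodes {
		nodes[i] = node{name: fmt.Sprintf("node-%d", i), freeCPUMillis: int64(i % 1000)}
	}
	p := pod{cpuMillis: 900}

	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, ok := findFit(p, nodes); !ok {
			b.Fatal("expected a fitting node")
		}
	}
}
```

Tracking ns/op for a benchmark like this across commits (for example, in a periodic CI job) is what makes it regression-catching: a sustained increase flags the offending change before it reaches a release.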
This SIG adheres to the Roles and Organization Management outlined in sig-governance and opts in to updates and modifications to sig-governance.
SIG Scalability delegates subproject approval to Technical Leads. See Subproject creation - Option 1.