Skip to content

Commit

Permalink
Introduce a formal policy for maintaining cloudproviders
Browse files Browse the repository at this point in the history
The policy largely codifies what we've already been doing for years
(including the requirements we've already imposed on new providers). It
also introduces a new 'cloudprovider maintenance request' mechanism,
following the general idea discussed recently in the sig-meeting.
  • Loading branch information
MaciekPytel committed Sep 19, 2022
1 parent c38cc74 commit 7ac5564
Showing 1 changed file with 127 additions and 0 deletions.
127 changes: 127 additions & 0 deletions cluster-autoscaler/cloudprovider/POLICY.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
# Cloudprovider policy

As of the moment this policy is written (September 2022) Cluster Autoscaler has
integrations with almost 30 different cloudproviders. At the same time there
are only a handful of core CA maintainers. The maintainers don't have the
capacity to build new integrations or maintain existing ones. In most cases they
also have no experience with particular clouds and no access to a test
environment.

Due to above reasons each integration is required to have a set of OWNERS who
are responsible for development and maintenance of the integration. This
document describes the role and responsibilities of core maintainers and
integration owners. A lot of what is described below has been unofficial
practice for multiple years now, but this policy also introduces some new
requirements for cloudprovider maintenance.

## Responsbilities

Cloudprovider owners are responsible for:

* Maintaining their integrations.
* Testing their integrations. Currently any new CA release is tested e2e on
GCE, testing on other platforms is the responsibility of cloudprovider
maintainers (note: there is an effort to make automated e2e tests possible
to run on other providers, so this may improve in the future).
* Addressing any issues raised in autoscaler github repository related to a
given provider.
* Reviewing any pull requests to their cloudprovider.
* Pull requests that only change cloudprovider code do not require any
review or approval from core maintainers.
* Pull requests that change cloudprovider and core code require approval
from both the cloudprovider owner and core maintainer.

The core maintainers will generally not interfere with cloudprovider
development, but they may take the following actions without seeking approval
from cloudprovider owners:

* Make trivial changes to cloudproviders when needed to implement changes in
CA core (ex. updating function signatures when a go interface
changes).
* Revert any pull requests that break tests, prevent CA from compiling, etc.

## Adding new cloud provider integration

In order to add new integration you need to open a pull request implementing
the interfaces defined in cloud\_provider.go. This policy requires that any new
cloudprovider follows the following rules:

* Cloudprovider needs to have an OWNERS file that lists its maintainers.
Kubernetes policy requires that code OWNERS are members of the Kubernetes
organization.
* This can create a chicken and egg problem, where adding a cloudprovider
requires being a member of Kubernetes org and becoming a member of the
organization requires a history of code contributions. For this reason it
is allowed for the OWNERS file to temporarily contain
commented out github handles. There is an expectation that at least some of
the owners will ultimately join Kubernetes organization (by following the
[process](https://github.com/kubernetes/community/blob/master/community-membership.md))
so that they can approve PRs to their cloudprovider.
* Cloudprovider shouldn't introduce new dependencies (such as clients/SDKs)
to top-level go.mod vendor, unless those dependencies are already imported
by kubernetes/kubernetes repository and the same version of the library is
used by CA and Kubernetes. This requirement is mainly driven by
the problems with version conflicts in transitive dependencies we've
experienced in the past.

Note: Any functions in cloud\_provider.go marked as 'Implementation optional'
may be left unimplemented. Those functions provide additional functionality, but
are not critical. To leave a function unimplemented just have it return
cloudprovider.ErrNotImplemented.

### External provider

An alternative to implementing an in-tree cloudprovider is to use existing
[External
gRPC](https://github.com/kubernetes/autoscaler/tree/master/cluster-autoscaler/cloudprovider/externalgrpc)
provider. Integrating with gRPC interface may be easier than implementing an
in-tree cloudprovider and the gRPC provider comes with some essential caching
built in.

An external cloudprovider implementation doesn't live in this repository and is
not a part of CA image. As such it is also not a subject to this policy.

## Cloudprovider maintenance requirements

In order to allow code changes to Cluster Autoscaler that would require
non-trivial changes in cloudproviders this policy introduces _Cloudprovider
maintenance request_ (CMR) mechanism.

* CMR will be issued via a github issue tagging all
cloudprovider owners and describing the problem being solved and the changes
requested.
* CMR will clearly state the minor version in which the changes are expected
(ex. 1.26).
* CMR will need to be discussed on sig-autoscaling meeting and approved by
sig leads before being issued. It will also be announced on sig-autoscaling
slack channel and highlited in sig-autoscaling meeting notes.
* A CMR may be issued no later then [enhancements
freeze](https://github.com/kubernetes/sig-release/blob/master/releases/release_phases.md#enhancements-freeze)
of a given Kubernetes minor version.

Cloudprovider owners will be required to address CMR or request an exception via
the CMR github issue. A failure to take any action will result in cloudprovider
being considered abandoned and marking it as deprecated as described below.

### Empty maintenance request

If no CMRs are issued in a given minor release, core maintainers will issue an
_empty CMR_. The purpose of an empty CMR is to verify that cloudprovider owners
are still actively maintaining their integration. The only action required for
an empty CMR is replying on the github issue. Only one owner from each
cloudprovider needs to reply on the issue.

Empty CMR follows the same rules as any other CMR. In particular it needs to be
issued by enhancements freeze.

### Cloudprovider deprecation and deletion

If cloudprovider owners fail to take actions described above, the particular
integration will be marked as deprecated in the next CA minor release. A
deprecated cloudprovider will be completely removed after 1 year as per
[Kubernetes deprecation
policy](https://kubernetes.io/docs/reference/using-api/deprecation-policy/#deprecating-a-feature-or-behavior).

A deprecated cloudprovider may become maintained again if the owners become
active again or new owners step up. In order to regain maintained status any
outstanding CMRs will need to be addressed.

0 comments on commit 7ac5564

Please sign in to comment.