Add optional highAvailability feature for registry cache #298
base: main
Conversation
The Gardener project currently lacks enough active contributors to adequately respond to all PRs.
You can:
/lifecycle stale
/remove-lifecycle stale
Force-pushed from 3fe41e1 to 2c2264c.
[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
/test pull-gardener-extension-registry-cache-unit
-The registry cache runs with a single replica. This may raise high availability concerns such as "What happens when the registry cache is down? Does containerd fail to pull the image?". As outlined in the [How does it work? section](#how-does-it-work), containerd is configured to fall back to the upstream registry if it fails to pull the image from the registry cache. Hence, when the registry cache is unavailable, containerd's image pull operations are not affected because containerd falls back to pulling from the upstream registry.
+By default, the registry cache runs with a single replica. This may raise high availability concerns such as "What happens when the registry cache is down? Does containerd fail to pull the image?". As outlined in the [How does it work? section](#how-does-it-work), containerd is configured to fall back to the upstream registry if it fails to pull the image from the registry cache. Hence, when the registry cache is unavailable, containerd's image pull operations are not affected because containerd falls back to pulling from the upstream registry.
+
+In special cases where this is not enough (for example, when using the registry cache with a proxy), it is possible to set `providerConfig.caches[].highAvailability` to `true`. This adds the label `high-availability-config.resources.gardener.cloud/type=server` and scales the StatefulSet to 2 replicas. The `topologySpreadConstraints` are added according to the cluster configuration. See also [High Availability of Deployed Components](https://gardener.cloud/docs/gardener/high-availability/#system-components).
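For illustration, a shoot's registry-cache `providerConfig` using the new field might look like this (a sketch based on the field path above; the `apiVersion` shown is an assumption, not part of this PR):

apiVersion: registry.extensions.gardener.cloud/v1alpha3
kind: RegistryConfig
caches:
  - upstream: docker.io
    highAvailability: true    # the optional field added by this PR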
Can you elaborate more on the use case that requires running the registry cache in HA mode? I am not able to understand how a registry cache running against an upstream behind a proxy requires the registry cache to run in HA mode.
@@ -100,6 +100,8 @@ The `providerConfig.caches[].proxy.httpProxy` field represents the proxy server
 
 The `providerConfig.caches[].proxy.httpsProxy` field represents the proxy server for HTTPS connections which is used by the registry cache. It must include an `https://` or `http://` scheme.
 
+The `providerConfig.caches[].highAvailability` defines if the StatefulSet is scaled to 2 replicas. See [Increase the cache disk size](#high-availability) for more information.
Suggested change:
-The `providerConfig.caches[].highAvailability` defines if the StatefulSet is scaled to 2 replicas. See [Increase the cache disk size](#high-availability) for more information.
+The `providerConfig.caches[].highAvailability` defines if the registry cache is scaled to 2 replicas. See [High Availability](#high-availability) for more information.
@@ -40,6 +40,8 @@ type RegistryCache struct {
 	SecretReferenceName *string
 	// Proxy contains settings for a proxy used in the registry cache.
 	Proxy *Proxy
+	// HighAvailability defines if the StatefulSet is scaled with the [High Availability](https://gardener.cloud/docs/gardener/high-availability/#system-components) feature.
+	HighAvailability *bool
I would suggest the new field structure to be:

highAvailability:
  enabled: true

This keeps the API open for future backwards-compatible modifications, for example adding another high-availability-related field.
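Under that proposal, a cache entry would look like this (an illustrative sketch of the suggested shape, not code from this PR):

caches:
  - upstream: docker.io
    highAvailability:
      enabled: true
      # further backwards-compatible fields could be added here later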
@@ -18,6 +18,7 @@ import (
 	gardencorev1beta1 "github.com/gardener/gardener/pkg/apis/core/v1beta1"
 	v1beta1constants "github.com/gardener/gardener/pkg/apis/core/v1beta1/constants"
 	v1beta1helper "github.com/gardener/gardener/pkg/apis/core/v1beta1/helper"
+	"github.com/gardener/gardener/pkg/apis/resources/v1alpha1"
Nit:

Suggested change:
-	"github.com/gardener/gardener/pkg/apis/resources/v1alpha1"
+	resourcesv1alpha1 "github.com/gardener/gardener/pkg/apis/resources/v1alpha1"
PS: We should enable the import alias linter as done in gardener/gardener. (cc @dimitar-kostadinov)
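For reference, one way to enable such a linter (a sketch assuming golangci-lint's `importas` linter; the exact configuration used in gardener/gardener may differ) would be a `.golangci.yaml` entry like:

linters:
  enable:
    - importas
linters-settings:
  importas:
    alias:
      # enforce the alias suggested above for this package
      - pkg: github.com/gardener/gardener/pkg/apis/resources/v1alpha1
        alias: resourcesv1alpha1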
@@ -40,6 +40,8 @@ type RegistryCache struct {
 	SecretReferenceName *string
 	// Proxy contains settings for a proxy used in the registry cache.
 	Proxy *Proxy
+	// HighAvailability defines if the StatefulSet is scaled with the [High Availability](https://gardener.cloud/docs/gardener/high-availability/#system-components) feature.
Suggested change:
-	// HighAvailability defines if the StatefulSet is scaled with the [High Availability](https://gardener.cloud/docs/gardener/high-availability/#system-components) feature.
+	// HighAvailability defines if the registry cache is scaled with the [High Availability](https://gardener.cloud/docs/gardener/high-availability/#system-components) feature.
@@ -46,6 +46,9 @@ type RegistryCache struct {
 	// Proxy contains settings for a proxy used in the registry cache.
 	// +optional
 	Proxy *Proxy `json:"proxy,omitempty"`
+	// HighAvailability defines if the StatefulSet is scaled with the [High Availability](https://gardener.cloud/docs/gardener/high-availability/#system-components) feature.
Suggested change:
-	// HighAvailability defines if the StatefulSet is scaled with the [High Availability](https://gardener.cloud/docs/gardener/high-availability/#system-components) feature.
+	// HighAvailability defines if the registry cache is scaled with the [High Availability](https://gardener.cloud/docs/gardener/high-availability/#system-components) feature.
(On the same documentation hunk quoted above:)
It would make sense to clarify that the replicas will use different volumes.
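For background, this follows from generic StatefulSet semantics: each replica gets its own PersistentVolumeClaim from the `volumeClaimTemplates`, so the two cache replicas fill their caches independently. A sketch (the claim name and size here are illustrative, not taken from this PR):

volumeClaimTemplates:
  - metadata:
      name: cache-volume              # illustrative name
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi
# With replicas: 2, Kubernetes creates one PVC per pod, e.g.
# cache-volume-<statefulset-name>-0 and cache-volume-<statefulset-name>-1.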
 	statefulSet := &appsv1.StatefulSet{
 		ObjectMeta: metav1.ObjectMeta{
 			Name:      name,
 			Namespace: metav1.NamespaceSystem,
-			Labels:    registryutils.GetLabels(name, upstreamLabel),
+			Labels:    utils.MergeStringMaps(registryutils.GetLabels(name, upstreamLabel), additionalStatefulSetLabels),
It could also be implemented as:
diff --git a/pkg/component/registrycaches/registry_caches.go b/pkg/component/registrycaches/registry_caches.go
index 36166f40..297a649b 100644
--- a/pkg/component/registrycaches/registry_caches.go
+++ b/pkg/component/registrycaches/registry_caches.go
@@ -18,7 +18,7 @@ import (
 	gardencorev1beta1 "github.com/gardener/gardener/pkg/apis/core/v1beta1"
 	v1beta1constants "github.com/gardener/gardener/pkg/apis/core/v1beta1/constants"
 	v1beta1helper "github.com/gardener/gardener/pkg/apis/core/v1beta1/helper"
-	"github.com/gardener/gardener/pkg/apis/resources/v1alpha1"
+	resourcesv1alpha1 "github.com/gardener/gardener/pkg/apis/resources/v1alpha1"
 	"github.com/gardener/gardener/pkg/client/kubernetes"
 	"github.com/gardener/gardener/pkg/component"
 	"github.com/gardener/gardener/pkg/utils"
@@ -297,16 +297,11 @@ func (r *registryCaches) computeResourcesDataForRegistryCache(ctx context.Contex
 	}
 	utilruntime.Must(kubernetesutils.MakeUnique(tlsSecret))
 
-	additionalStatefulSetLabels := map[string]string{}
-	if cache.HighAvailability != nil && *cache.HighAvailability {
-		additionalStatefulSetLabels[v1alpha1.HighAvailabilityConfigType] = v1alpha1.HighAvailabilityConfigTypeServer
-	}
-
 	statefulSet := &appsv1.StatefulSet{
 		ObjectMeta: metav1.ObjectMeta{
 			Name:      name,
 			Namespace: metav1.NamespaceSystem,
-			Labels:    utils.MergeStringMaps(registryutils.GetLabels(name, upstreamLabel), additionalStatefulSetLabels),
+			Labels:    registryutils.GetLabels(name, upstreamLabel),
 		},
 		Spec: appsv1.StatefulSetSpec{
 			ServiceName: name,
@@ -481,6 +476,10 @@ source /entrypoint.sh /etc/distribution/config.yml
 		}
 	}
 
+	if cache.HighAvailability != nil && *cache.HighAvailability {
+		metav1.SetMetaDataLabel(&statefulSet.ObjectMeta, resourcesv1alpha1.HighAvailabilityConfigType, resourcesv1alpha1.HighAvailabilityConfigTypeServer)
+	}
+
 	var vpa *vpaautoscalingv1.VerticalPodAutoscaler
 	if r.values.VPAEnabled {
 		vpa = &vpaautoscalingv1.VerticalPodAutoscaler{
The e2e tests job also failed due to network issues.
/test pull-gardener-extension-registry-cache-e2e-kind
@dergeberl: The following test failed, say `/retest` to rerun all failed tests:
The e2e test failure is an occurrence of #283.
We could also add an e2e test to cover this scenario. @dimitar-kostadinov raised the point that the helper funcs that perform the checks have to be adapted to now consider the 2 replicas of the StatefulSet.
How to categorize this PR?
/area high-availability
/kind enhancement
What this PR does / why we need it:
I reopened this PR due to a special `A` in the branch name of the other PR (#247). This PR adds the optional `highAvailability` setting for the registry cache. As mentioned in #246, in some cases the registry cache becomes more critical (like the proxy use case). In such cases it makes sense to scale the registry cache to more replicas. This PR uses the High Availability of Deployed Components feature to also set a working `topologySpreadConstraints`.

Which issue(s) this PR fixes:
N/A

Special notes for your reviewer:

Release note: