
Enable HPA for Envoy Proxy introduce unexpected behavior #2807

Closed
ardikabs opened this issue Mar 6, 2024 · 7 comments · Fixed by #2816
Labels
area/infra-mgr Issues related to the provisioner used for provisioning the managed Envoy Proxy fleet. kind/bug Something isn't working road-to-ga


ardikabs commented Mar 6, 2024

Description:
Relates to #2257

Once envoyHpa is enabled on the EnvoyProxy API, some unexpected behavior is observed.

During reconciliation, Envoy Gateway successfully creates the HorizontalPodAutoscaler resource. However, whenever any operation touches a Gateway API resource, Envoy Gateway unexpectedly resets the replicas of the Envoy Proxy deployment to the value specified in EnvoyProxy.spec.provider.kubernetes.envoyDeployment.replicas; if that field is undefined, it resets to 1 replica. After some time, when the HPA controller kicks in again, it scales the deployment back to its minReplicas, but I am concerned that under high traffic, any operation applied to a Gateway API resource forces the Envoy Proxy deployment to be scaled down.

Repro steps:

Apply the EnvoyProxy manifest below. Then, assuming some HTTPRoute resources exist, perform any operation on one of them, such as a modification. You will observe a scale-down of the Envoy Proxy deployment from 3 replicas (specified in the HPA's minReplicas field) to 1 replica (the default EnvoyProxy replica count).

apiVersion: gateway.envoyproxy.io/v1alpha1
kind: EnvoyProxy
metadata:
  name: default
  namespace: envoy-gateway-system
spec:
  logging:
    level:
      default: warn
  provider:
    kubernetes:
      envoyDeployment:
        pod:
          labels:
            service_name: envoy-proxy
            service_group: envoy
        container:
          resources:
            limits:
              cpu: 2
              memory: 2Gi
            requests:
              cpu: 1
              memory: 1Gi
      envoyHpa:
        maxReplicas: 12
        minReplicas: 3
        metrics:
          - resource:
              name: cpu
              target:
                averageUtilization: 60
                type: Utilization
            type: Resource
    type: Kubernetes

Proposal:

Even though the code is already set up to avoid reverting replicas to their original value when the HPA controller kicks in (to adjust the Envoy Proxy deployment replicas), Envoy Gateway still seems unaware of this in the createOrUpdateDeployment function during reconciliation:

func (i *Infra) createOrUpdateDeployment(ctx context.Context, r ResourceRender) error {
	deployment, err := r.Deployment()
	if err != nil {
		return err
	}

	current := &appsv1.Deployment{}
	key := types.NamespacedName{
		Namespace: deployment.Namespace,
		Name:      deployment.Name,
	}

	hpa, err := r.HorizontalPodAutoscaler()
	if err != nil {
		return err
	}

	var opts cmp.Options
	if hpa != nil {
		opts = append(opts, cmpopts.IgnoreFields(appsv1.DeploymentSpec{}, "Replicas"))
	}

The createOrUpdateDeployment function seems to reset the replicas from deploymentConfig at L239:

Spec: appsv1.DeploymentSpec{
	Replicas: deploymentConfig.Replicas,
	Strategy: *deploymentConfig.Strategy,
	...
It seems to me that we need to set the replicas to nil if envoyHpa is utilized, something like the below:

if hpa != nil {
	deployment.Spec.Replicas = nil
	opts = append(opts, cmpopts.IgnoreFields(appsv1.DeploymentSpec{}, "Replicas"))
}

This way the Envoy Gateway would refrain from further replica adjustments when envoyHpa is utilized.
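To make the proposed rule concrete, here is a minimal self-contained Go sketch of the replica-resolution logic. This is a hypothetical helper, not the actual Envoy Gateway function; it assumes the semantics described above: when an HPA is rendered, the Deployment's replica count is left nil so the HPA controller owns scaling, otherwise the user-specified value (or the default of 1) is kept.

```go
package main

import "fmt"

// desiredReplicas sketches the proposed rule (hypothetical helper): when an
// HPA is enabled for the proxy fleet, return nil so the API server keeps the
// current scale instead of resetting it on every reconcile; otherwise return
// the user-specified value, falling back to the default of 1.
func desiredReplicas(specified *int32, hpaEnabled bool) *int32 {
	if hpaEnabled {
		// nil Replicas means "do not manage this field"; the HPA
		// controller remains the sole owner of the replica count.
		return nil
	}
	if specified != nil {
		return specified
	}
	def := int32(1) // default replica count when nothing is specified
	return &def
}

func main() {
	three := int32(3)
	fmt.Println(desiredReplicas(&three, true))   // <nil>: HPA owns scaling
	fmt.Println(*desiredReplicas(&three, false)) // 3: explicit value kept
	fmt.Println(*desiredReplicas(nil, false))    // 1: default applied
}
```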

Environment:
latest


@ardikabs ardikabs added the triage label Mar 6, 2024

ardikabs commented Mar 6, 2024

if this is reasonable, I would like to work on this

@zirain zirain added area/infra-mgr Issues related to the provisioner used for provisioning the managed Envoy Proxy fleet. and removed triage labels Mar 7, 2024
@arkodg arkodg added kind/bug Something isn't working road-to-ga labels Mar 7, 2024
@arkodg arkodg added this to the v1.0.0 milestone Mar 7, 2024

arkodg commented Mar 7, 2024

looks like we've discussed this a while ago #703 (comment)
imo the fix here is to not set any default replica, which we are doing today, this code needs to be removed

Replicas: DefaultKubernetesDeploymentReplicas(),


ardikabs commented Mar 7, 2024

imo the fix here is to not set any default replica, which we are doing today, this code needs to be removed

However @arkodg , it seems to me that unsetting the default replica only addresses the default value behavior, but when a user accidentally sets EnvoyProxy.spec.provider.kubernetes.envoyDeployment.replicas, it will still revert to its replica value, right? wdyt?


arkodg commented Mar 7, 2024

imo the fix here is to not set any default replica, which we are doing today, this code needs to be removed

However @arkodg , it seems to me that unsetting the default replica only addresses the default value behavior, but when a user accidentally sets EnvoyProxy.spec.provider.kubernetes.envoyDeployment.replicas, it will still revert to its replica value, right? wdyt?

yeah, that is the user being explicit; if the user sets both replica and HPA, they will encounter the battle of the controllers. We can flag it, but it's legal in k8s today.
we could add a note similar to this https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#migrating-deployments-and-statefulsets-to-horizontal-autoscaling
saying don't set replica if hpa is set


ardikabs commented Mar 7, 2024

yeah, that is the user being explicit; if the user sets both replica and HPA, they will encounter the battle of the controllers. We can flag it, but it's legal in k8s today.
we could add a note similar to this https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#migrating-deployments-and-statefulsets-to-horizontal-autoscaling
saying don't set replica if hpa is set

makes sense, let me adjust it then


ardikabs commented Mar 21, 2024

hi folks @arkodg @zirain, it looks like this issue is still happening. As far as I observed, it appears to be because we are using the Update API instead of the Patch API in the infra-client code, below:

if updateChecker() {
	specific.SetUID(current.GetUID())
	if err := cli.Client.Update(ctx, specific); err != nil {
		return fmt.Errorf("for Update: %w", err)
	}
}

Therefore, I tried changing it locally to use the Patch API, and it worked as expected. Should I reopen this issue, or create a new one?
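To illustrate why Update reverts the HPA-managed replica count while a patch does not, here is a toy Go model. This is a sketch only, not the Kubernetes client API: it simulates specs as flat maps, where update replaces the whole stored object and mergePatch only applies the fields present in the patch.

```go
package main

import "fmt"

// update models a full-object replacement: any field the caller rendered
// with its old/default value overwrites what another controller (e.g. the
// HPA) wrote in the meantime.
func update(stored, desired map[string]int) map[string]int {
	out := map[string]int{}
	for k, v := range desired {
		out[k] = v
	}
	return out
}

// mergePatch models a merge patch: only fields present in the patch are
// overwritten, so fields managed by other controllers survive.
func mergePatch(stored, patch map[string]int) map[string]int {
	out := map[string]int{}
	for k, v := range stored {
		out[k] = v
	}
	for k, v := range patch {
		out[k] = v
	}
	return out
}

func main() {
	// The HPA has scaled the live object to 3 replicas.
	live := map[string]int{"replicas": 3, "port": 8080}
	// The controller re-renders the object with its default replica count.
	rendered := map[string]int{"replicas": 1, "port": 9090}
	fmt.Println(update(live, rendered)["replicas"]) // 1: HPA value clobbered

	// A patch that omits replicas preserves the HPA-managed value.
	patch := map[string]int{"port": 9090}
	fmt.Println(mergePatch(live, patch)["replicas"]) // 3: HPA value preserved
}
```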


arkodg commented Mar 21, 2024

hi folks @arkodg @zirain, it looks like this issue is still happening. As far as I observed, it appears to be because we are using the Update API instead of the Patch API in the infra-client code, below:

if updateChecker() {
	specific.SetUID(current.GetUID())
	if err := cli.Client.Update(ctx, specific); err != nil {
		return fmt.Errorf("for Update: %w", err)
	}
}

Therefore, I tried changing it locally to use the Patch API, and it worked as expected. Should I reopen this issue, or create a new one?

yes please open a new issue for this ....
