[RFE] Cluster maximums OCP workload #4

rsevilla87 · 2023-07-10T14:56:43Z

Is your feature request related to a problem? Please describe.

A workload that reproduces the documented cluster maximums would be very useful for detecting regressions in components that the current workloads do not exercise heavily.

For example:

Benchmarking the max number of CRDs: A high number of CRDs has been shown to degrade API performance, both in API responsiveness and in resource usage. We're not tracking this scenario at the moment.
Max number of endpoints per service: Our current workloads test a high number of services, but they don't add a high number of endpoints to each one. This scenario is currently tracked upstream with kube-proxy implemented services, but we're not tracking it with OVNKubernetes.

There are more examples like the above. This new workload shouldn't be used as a rule of thumb to demonstrate the limits of a cluster, but as a helper to detect and verify scenarios we're not currently tracking.
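As a rough illustration of the CRD scenario above, a job could create one CustomResourceDefinition per iteration. This is only a sketch: the job name `max-crds`, the `CRDS` variable, the `crd.yml` file name, and the `example.com` group are all hypothetical, and how cluster-scoped objects are handled may depend on the kube-burner version in use.

```yaml
# Hypothetical job entry; CRDS is an assumed input variable
  - name: max-crds
    jobIterations: {{.CRDS}}
    qps: {{.QPS}}
    burst: {{.BURST}}
    namespacedIterations: false      # CRDs are cluster-scoped
    objects:
      - objectTemplate: crd.yml      # assumed template name
        replicas: 1
```

```yaml
# crd.yml (hypothetical): each iteration needs a unique kind and plural name
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: maxcrds{{.Iteration}}.example.com   # must be <plural>.<group>
spec:
  group: example.com
  scope: Namespaced
  names:
    kind: MaxCrd{{.Iteration}}
    plural: maxcrds{{.Iteration}}
    singular: maxcrd{{.Iteration}}
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          x-kubernetes-preserve-unknown-fields: true
```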

Describe the solution you'd like

The cluster-maximums workload should be self-contained and based on a multi-job benchmark. This approach makes it easier to maintain and update.

I started coding this workload; an initial approach to how it would look is in the following snippet:

# Would test 10k namespaces, 10k routes, 10k services, 20k pods and 30k network policies
  - name: max-namespaces                                                                                                                                      
    namespace: max-namespaces
    jobIterations: {{.NAMESPACES}}                                                                                     
    qps: {{.QPS}}                                     
    burst: {{.BURST}}                                                                                                  
    namespacedIterations: true                   
    waitWhenFinished: true                       
    preLoadImages: false                   # We don't need to preload since this job is reusing images previously used                                        
    jobPause: 2m                                                                                                                                              
    namespaceLabels:                   
      security.openshift.io/scc.podSecurityLabelSync: false                                                                                                   
      pod-security.kubernetes.io/enforce: privileged                                                                                                          
      pod-security.kubernetes.io/audit: privileged  
      pod-security.kubernetes.io/warn: privileged 
    objects:                                     
      - objectTemplate: deployment-server.yml              
        replicas: 1                                     
        inputVars: 
          podReplicas: 1                                                                                               
      - objectTemplate: deployment-client.yml
        replicas: 1                          
        inputVars:  
          podReplicas: 1                         
          ingressDomain: {{.INGRESS_DOMAIN}}            
      - objectTemplate: service.yml                        
        replicas: 1                                                                                                                     
      - objectTemplate: route.yml                                   
        replicas: 1                                                 
      - objectTemplate: np-deny-all.yml                             
        replicas: 1                                                 
      - objectTemplate: np-allow-from-clients.yml                   
        replicas: 1                                                 
      - objectTemplate: np-allow-from-ingress.yml                              
        replicas: 1                                                            
                                                                               
  - name: remove-max-namespaces                                                
    qps: 5                                                                     
    burst: 5                                                                   
    jobType: delete                                                            
    jobPause: 2m                                                               
    objects:                                                                   
      - kind: Namespace                                                                                                                                       
        labelSelector: {kube-burner-job: max-namespaces}                       

# 5k backends per service: Five times -> 5k server pods + 1 client pod + 1 route + 3 network policies
  - name: max-backends                                                         
    namespace: max-backends                                                    
    jobIterations: 5                                                           
    qps: {{.QPS}}                                                              
    burst: {{.BURST}}                                                          
    namespacedIterations: true                                                 
    waitWhenFinished: true                                                     
    preLoadImages: false             # We don't need to preload since this job is reusing images previously used                                              
    jobPause: 2m                                                               
    namespaceLabels:                                                           
      security.openshift.io/scc.podSecurityLabelSync: false                                                                                                   
      pod-security.kubernetes.io/enforce: privileged                           
      pod-security.kubernetes.io/audit: privileged                             
      pod-security.kubernetes.io/warn: privileged                              
    objects:                                                                   
      - objectTemplate: deployment-server.yml                                  
        replicas: 1                                                            
        inputVars:                                                             
          podReplicas: {{.BACKENDS}}                                           
      - objectTemplate: deployment-client.yml                                  
        replicas: 1                                                            
        inputVars:                                                             
          podReplicas: 1                                                       
          ingressDomain: {{.INGRESS_DOMAIN}}                                   
      - objectTemplate: service.yml                                            
        replicas: 1                                                            
      - objectTemplate: route.yml                                              
        replicas: 1                                                            
      - objectTemplate: np-deny-all.yml                                        
        replicas: 1                                                            
      - objectTemplate: np-allow-from-clients.yml                              
        replicas: 1                                                            
      - objectTemplate: np-allow-from-ingress.yml                              
        replicas: 1                                                            

  - name: remove-max-backends                                                  
    jobType: delete                                                            
    objects:                                                                   
      - kind: Namespace                                                        
        labelSelector: {kube-burner-job: max-backends}
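The max-backends job gets its 5k endpoints per service from the `podReplicas` input variable. A minimal sketch of how `deployment-server.yml` and `service.yml` might consume it is shown below; the object names, labels, image, and port are placeholders for illustration and are not the actual templates.

```yaml
# deployment-server.yml (hypothetical sketch)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: server-{{.Replica}}
spec:
  replicas: {{.podReplicas}}          # 1 in max-namespaces, {{.BACKENDS}} in max-backends
  selector:
    matchLabels:
      app: server-{{.Replica}}
  template:
    metadata:
      labels:
        app: server-{{.Replica}}
    spec:
      containers:
        - name: server
          image: registry.example.com/sample-server:latest   # placeholder image
          ports:
            - containerPort: 8080
---
# service.yml (hypothetical sketch): selects every server pod of the deployment,
# so each pod becomes one endpoint of this service
apiVersion: v1
kind: Service
metadata:
  name: server-{{.Replica}}
spec:
  selector:
    app: server-{{.Replica}}
  ports:
    - port: 8080
      targetPort: 8080
```

With this layout, raising `podReplicas` increases the number of endpoints behind a single service without changing the number of services.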

github-actions · 2023-10-09T12:32:19Z

This issue has become stale and will be closed automatically within 7 days.

qiliRedHat · 2023-11-28T06:52:26Z

Bug https://issues.redhat.com/browse/MON-3394 was discovered in ROSA on a cluster with a large number of namespaces and a large number of secrets ("there are a lot of secrets on the cluster: 24464"). I suggest we add at least 3 secrets per namespace to cover this (3 secrets x 10k namespaces = 30k secrets > 24464).
In the old max-namespaces workload, there are 10 secrets in each namespace: https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/kube-burner/workloads/max-namespaces/max-namespaces.yml#L95C12-L95C12
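One way the suggestion could look, as a sketch only: an extra entry in the `objects` list of the max-namespaces job plus a small secret template. The `secret.yml` file name, the secret naming, and its contents are assumptions and are not taken from the old workload.

```yaml
# Hypothetical extra entry under objects: in the max-namespaces job
      - objectTemplate: secret.yml   # assumed template name
        replicas: 3                  # 3 secrets x 10k namespaces = 30k secrets
```

```yaml
# secret.yml (hypothetical)
apiVersion: v1
kind: Secret
metadata:
  name: {{.JobName}}-{{.Replica}}
type: Opaque
stringData:
  key: placeholder-value             # dummy payload, only the object count matters here
```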

github-actions · 2024-04-22T12:03:16Z

This issue has become stale and will be closed automatically within 7 days.

rsevilla87 added the enhancement label Jul 10, 2023

rsevilla87 self-assigned this Jul 10, 2023

github-actions bot closed this as not planned Oct 17, 2023

rsevilla87 reopened this Nov 21, 2023

rsevilla87 mentioned this issue Nov 21, 2023

Cluster-maximums workload kube-burner/kube-burner#521

Closed

7 tasks

rsevilla87 transferred this issue from kube-burner/kube-burner Jan 23, 2024

rsevilla87 mentioned this issue Jan 23, 2024

Cluster-maximums workload #5

Closed

7 tasks

github-actions bot added the stale label Apr 22, 2024

github-actions bot closed this as not planned Apr 30, 2024
