Velero: Throttling request errors #3191
Hi @srajput1991 - Apologies for the delay in getting back to you on this. These messages are coming from the rate limiter in the Kubernetes Go client that Velero uses. My understanding is that this shouldn't impact the functionality of Velero. We should investigate, though, to see if there are better default settings for the client's rate limits. See the following settings for the velero server: `--client-qps` and `--client-burst`.
After doing some digging in the code, the underlying struct used is this Golang rate limiter (`golang.org/x/time/rate.Limiter`) and its token-bucket behavior. The key idea from its documentation is that the limiter maintains a bucket of size b (the burst), initially full and refilled at a rate of r tokens per second (the QPS).
So what I believe is happening is that the bucket of available tokens is being drained faster than it refills, so requests get throttled. As @zubron said, the size of the bucket can be increased with the `--client-burst` flag. I don't have a good handle on what values to set at the moment. I'm going to do some experimenting, but I think moving up to 50 burst/40 QPS would be a good start.
Looking around, Prometheus uses 100/100, and ingress-nginx used to use 1,000,000 for both, though it appears they've since removed any default values. We don't want Velero to spam the API server, so I think a value of 1,000,000 is too much, but moving up into the order of 100 or so is a reasonable start.
@srajput1991 the following values made the throttling messages go away in a test cluster; they're highlighted inline.

```yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "3"
  creationTimestamp: "2021-01-11T23:19:08Z"
  generation: 3
  labels:
    component: velero
  name: velero
  namespace: velero
  resourceVersion: "3678"
  selfLink: /apis/apps/v1/namespaces/velero/deployments/velero
  uid: 2ccad69e-f561-4497-98d8-9b4604073580
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      deploy: velero
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8085"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        component: velero
        deploy: velero
    spec:
      containers:
      - args:
        - server
        - --client-qps=75.0   # <----- HERE
        - --client-burst=100  # <----- HERE
        - --features=
        command:
        - /velero
        env:
        - name: VELERO_SCRATCH_DIR
          value: /scratch
        - name: VELERO_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: LD_LIBRARY_PATH
          value: /plugins
        - name: GOOGLE_APPLICATION_CREDENTIALS
          value: /credentials/cloud
        - name: AWS_SHARED_CREDENTIALS_FILE
          value: /credentials/cloud
        - name: AZURE_CREDENTIALS_FILE
          value: /credentials/cloud
        - name: ALIBABA_CLOUD_CREDENTIALS_FILE
          value: /credentials/cloud
        image: velero/velero:main
        imagePullPolicy: IfNotPresent
        name: velero
        ports:
        - containerPort: 8085
          name: metrics
          protocol: TCP
        resources:
          limits:
            cpu: "1"
            memory: 256Mi
          requests:
            cpu: 500m
            memory: 128Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /plugins
          name: plugins
        - mountPath: /scratch
          name: scratch
        - mountPath: /credentials
          name: cloud-credentials
      dnsPolicy: ClusterFirst
      initContainers:
      - image: velero/velero-plugin-for-gcp:v1.1.0
        imagePullPolicy: IfNotPresent
        name: velero-plugin-for-gcp
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /target
          name: plugins
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      serviceAccount: velero
      serviceAccountName: velero
      terminationGracePeriodSeconds: 30
      volumes:
      - emptyDir: {}
        name: plugins
      - emptyDir: {}
        name: scratch
      - name: cloud-credentials
        secret:
          defaultMode: 420
          secretName: cloud-credentials
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2021-01-11T23:19:26Z"
    lastUpdateTime: "2021-01-11T23:19:26Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2021-01-11T23:19:08Z"
    lastUpdateTime: "2021-01-11T23:21:40Z"
    message: ReplicaSet "velero-6d9c7fc787" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 3
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
```

This is something that you can change in your own deployment, though I think we'll get the defaults updated within Velero's code, too.
Running into this issue too. I've created vmware-tanzu/helm-charts#222 to allow each user to conveniently customize the client's QPS and burst rate from chart values.
Increase the k8s client QPS/burst to avoid throttling request errors Fixes vmware-tanzu#7127 Fixes vmware-tanzu#3191 Signed-off-by: Wenkai Yin(尹文开) <[email protected]>
[carlisia] Update on 1/13: this became a task to update our server defaults.
Hi,
I am using Velero to take backups of my Kubernetes resources. I am seeing a lot of errors in Datadog from Velero, as below: