Configure dogpile.cache to handle memcached pod failures

Whenever one of the memcached pods disappears, either because of a rolling restart during a minor update or as the result of a failure, the APIs can take a long time to detect that the pod went away, and they keep trying to reconnect. In a quick round of tests we saw downtimes of up to ~150s.

Enabling the retry client (enable_retry_client) and limiting the number of retries seems to make the behavior much more acceptable. Similarly, when TLS is not in use, we may want to set a lower value for memcache_dead_retry so that the client reconnects to a new pod (same DNS name, different IP) much faster.

Jira: https://issues.redhat.com/browse/OSPRH-11935
openstack-k8s-operators · Nov 29, 2024 · 9616563
1 parent 21d8169
commit 9616563
Showing 1 changed file with 4 additions and 0 deletions.
diff --git a/templates/nova.conf b/templates/nova.conf
@@ -172,8 +172,12 @@ enabled = True
 # on contoler we prefer to use memcache when its deployed
 {{if .MemcachedTLS}}
 backend = dogpile.cache.pymemcache
+enable_retry_client = true
+retry_attempts = 2
+retry_delay = 0
 {{else}}
 backend = dogpile.cache.memcached
+memcache_dead_retry = 10
 {{end}}
 memcache_servers={{ .MemcachedServers }}
 tls_enabled={{ .MemcachedTLS }}
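For reference, the hunk above would render roughly as follows for the two values of .MemcachedTLS. This is only a sketch: the [cache] section header is assumed from where oslo.cache options normally live, the memcache_servers value is a placeholder rather than a real endpoint, and the two variants are alternatives, not parts of one file.

```ini
# Variant A: .MemcachedTLS is true (pymemcache backend with bounded retries)
[cache]  # assumed section name; not visible in the hunk
backend = dogpile.cache.pymemcache
enable_retry_client = true
retry_attempts = 2
retry_delay = 0
memcache_servers=<memcached-host>:<port>  # placeholder, supplied by .MemcachedServers
tls_enabled=true

# Variant B: .MemcachedTLS is false (memcached backend, dead servers retried after 10s)
[cache]  # assumed section name; not visible in the hunk
backend = dogpile.cache.memcached
memcache_dead_retry = 10
memcache_servers=<memcached-host>:<port>  # placeholder, supplied by .MemcachedServers
tls_enabled=false
```

With retry_delay = 0 the retry client fails fast after two attempts instead of blocking the API request, while memcache_dead_retry = 10 shortens the time a server stays marked as dead before the client tries it again.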