Configure dogpile.cache to deal with memcached pod failures

Whenever one of the memcached pods disappears, whether because of a rolling restart during a minor update or as the result of a failure, the APIs can take a long time to detect that the pod is gone and keep trying to reconnect. In a quick round of tests we saw downtimes of up to ~150s. Enabling the retry client (enable_retry_client) and limiting the number of retries appears to make the behavior much more acceptable. Similarly, when TLS is not in use, we may want to set a lower value for memcache_dead_retry so that the API reconnects much faster to a new pod (which has the same DNS name but a different IP).
openstack-k8s-operators · Nov 27, 2024 · 25b5f98
1 parent 4b0bf2f
Showing 1 changed file with 4 additions and 0 deletions.
diff --git a/templates/keystoneapi/config/keystone.conf b/templates/keystoneapi/config/keystone.conf
@@ -4,8 +4,12 @@ use_stderr=true
 [cache]
 {{if .memcachedTLS}}
 backend = dogpile.cache.pymemcache
+enable_retry_client = true
+retry_attempts = 2
+retry_delay = 0
 {{else}}
 backend = dogpile.cache.memcached
+memcache_dead_retry = 10
 {{end}}
 enabled=true
 memcache_servers={{ .memcachedServers }}
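To illustrate what the two groups of settings are meant to achieve, here is a minimal sketch of the two behaviors described in the commit message: a call that gives up after a fixed number of attempts (as with retry_attempts = 2 and retry_delay = 0), and a server that is skipped until a dead-retry window has passed (as with memcache_dead_retry = 10). This is not the code used by pymemcache, python-memcached, or oslo.cache; the names call_with_retries and DeadServerTracker are hypothetical and exist only for this example.

```python
import time


def call_with_retries(func, attempts=2, retry_delay=0.0):
    """Call func up to `attempts` times in total, then re-raise the last error.

    Hypothetical helper: it only models the idea of a bounded retry loop.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except ConnectionError as exc:
            last_error = exc
            # Sleep only when another attempt will follow.
            if attempt < attempts - 1 and retry_delay > 0:
                time.sleep(retry_delay)
    raise last_error


class DeadServerTracker:
    """Skip a failed server until `dead_retry` seconds have elapsed.

    Hypothetical class: it only models the idea of a dead-retry window.
    """

    def __init__(self, dead_retry=10, clock=time.monotonic):
        self.dead_retry = dead_retry
        self.clock = clock          # injectable so the window can be tested
        self.dead_until = 0.0       # timestamp before which the server is skipped

    def mark_dead(self):
        # Called after a connection failure: start the dead-retry window.
        self.dead_until = self.clock() + self.dead_retry

    def is_usable(self):
        # The server is tried again once the window has passed.
        return self.clock() >= self.dead_until
```

With a bounded retry loop, a request against a pod that has gone away fails after a known number of attempts instead of continuing to reconnect. With a shorter dead-retry window, a replacement pod behind the same DNS name is tried again sooner.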