Configure dogpile.cache to deal with memcached pods failures
Whenever one of the memcached pods disappears, because of a rolling
restart during a minor update or as a result of a failure, APIs can
take a long time to detect that the pod went away and keep trying
to reconnect.

In a quick round of tests we saw downtimes of up to ~150s.

With enable_retry_client set and the number of retry attempts
limited, the behavior seems much more acceptable.

Similarly, when TLS is not in use, we may want to set a lower
value for memcache_dead_retry so that we reconnect to a new pod
(same DNS name, different IP) much faster.
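
As a rough illustration of why a lower memcache_dead_retry helps, the
sketch below models a dead-server timer: a server that failed is skipped
until the timer expires, so the timer bounds how long a replacement pod
goes unused. The class and the server name are hypothetical, and this is
a toy model, not the client library's actual implementation (it does not
model DNS resolution at all).

```python
class DeadServerTracker:
    """Toy model of a dead-retry timer (hypothetical, for illustration only)."""

    def __init__(self, dead_retry):
        self.dead_retry = dead_retry  # seconds a failed server is skipped
        self.dead_until = {}          # server name -> earliest retry time

    def mark_dead(self, server, now):
        # Record the failure; the server is skipped until now + dead_retry.
        self.dead_until[server] = now + self.dead_retry

    def is_usable(self, server, now):
        # Usable if the server never failed or its timer has expired.
        return now >= self.dead_until.get(server, 0.0)


tracker = DeadServerTracker(dead_retry=10)
tracker.mark_dead("memcached-0", now=100.0)
print(tracker.is_usable("memcached-0", now=105.0))  # still skipped
print(tracker.is_usable("memcached-0", now=110.0))  # eligible for a reconnect
```

With a timer of 10 seconds, the first reconnect attempt to the same name
can happen at most 10 seconds after the failure was noticed.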

Jira: https://issues.redhat.com/browse/OSPRH-11935
lmiccini committed Nov 29, 2024
1 parent 21d8169 commit 9616563
Showing 1 changed file with 4 additions and 0 deletions.
4 changes: 4 additions & 0 deletions templates/nova.conf
@@ -172,8 +172,12 @@ enabled = True
 # on contoler we prefer to use memcache when its deployed
 {{if .MemcachedTLS}}
 backend = dogpile.cache.pymemcache
+enable_retry_client = true
+retry_attempts = 2
+retry_delay = 0
 {{else}}
 backend = dogpile.cache.memcached
+memcache_dead_retry = 10
 {{end}}
 memcache_servers={{ .MemcachedServers }}
 tls_enabled={{ .MemcachedTLS }}
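
The effect of the retry settings in the TLS branch can be sketched as a
bounded retry loop. This is a minimal sketch, assuming retry_attempts
counts the total number of tries and retry_delay is the pause between
them; `call_with_retries` and `always_down` are hypothetical names, not
part of oslo.cache or pymemcache.

```python
import time


def call_with_retries(operation, attempts, delay):
    """Call operation up to `attempts` times, pausing `delay` seconds between tries."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError as exc:
            last_error = exc
            # Sleep only if another try is still to come.
            if attempt < attempts - 1:
                time.sleep(delay)
    # Every try failed: give up instead of reconnecting indefinitely.
    raise last_error


calls = []


def always_down():
    # Stand-in for a cache call against a pod that has gone away.
    calls.append(1)
    raise ConnectionError("memcached pod is gone")


try:
    call_with_retries(always_down, attempts=2, delay=0)
except ConnectionError:
    print("gave up after", len(calls), "tries")
```

Because the loop gives up after a fixed number of tries with no pause in
between, the time spent on a dead pod is bounded by the per-try
connection timeout rather than by an open-ended reconnect cycle.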