Configure keystonemiddleware/oslo to deal with memcached pod failures

Whenever one of the memcached pods disappears, because of a rolling
restart during a minor update or as a result of a failure, the APIs can
take a long time to detect that the pod went away and keep trying
to reconnect.

In a quick round of tests we saw downtimes of up to ~150s.

By tuning memcache_pool_dead_retry and memcache_pool_conn_get_timeout
the behavior seems much more acceptable.

Since neutron also uses memcached directly, we also need to tweak
the [cache] section, enabling the retry mechanism in the client and
applying similar defaults.

Jira: https://issues.redhat.com/browse/OSPRH-11935
lmiccini committed Dec 12, 2024
1 parent 383b1d3 commit 8b5705f
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions templates/neutronapi/config/01-neutron.conf
@@ -54,6 +54,8 @@ ovn_sb_ca_cert = /etc/pki/tls/certs/ovndbca.crt
www_authenticate_uri = {{ .KeystonePublicURL }}
auth_url = {{ .KeystoneInternalURL }}
memcached_servers={{ .MemcachedServersWithInet }}
+memcache_pool_dead_retry = 10
+memcache_pool_conn_get_timeout = 2
auth_type = password
project_domain_name = Default
user_domain_name = Default
@@ -91,9 +93,13 @@ lock_path = /var/lib/neutron/tmp
{{if .MemcachedTLS}}
backend = dogpile.cache.pymemcache
memcache_servers = {{ .MemcachedServers }}
+enable_retry_client = true
+retry_attempts = 2
+retry_delay = 0
{{else}}
backend = dogpile.cache.memcached
memcache_servers = {{ .MemcachedServersWithInet }}
+memcache_dead_retry = 10
{{end}}
enabled=true
tls_enabled={{ .MemcachedTLS }}