What is the current behavior?
On cluster restart, one of the Tyk Gateway pods failed to reconnect to Redis. It attempted to reconnect twice and then appeared to stop retrying (NB: IP addresses replaced with dummy entries):
time="02:44:41" level=error msg="Redis disconnected or error received, attempting to reconnect: read tcp 1.2.3.4:44424->5.6.7.8:6379: read: connection reset by peer"
time="02:44:41" level=error msg="Redis disconnected or error received, attempting to reconnect: read tcp 1.2.3.4:44424->5.6.7.8:6379: read: connection reset by peer"
time="02:44:41" level=error msg="Connection to Redis failed, reconnect in 10s" err="read tcp 1.2.3.4:44424->5.6.7.8:6379: read: connection reset by peer"
time="02:44:51" level=warning msg=Reconnecting
time="02:44:54" level=error msg="Connection to Redis failed, reconnect in 10s" err="dial tcp 9.10.11.12:6379: connect: no route to host"
time="02:45:04" level=warning msg=Reconnecting
After this point, requests to that Tyk Gateway instance failed to authenticate (presumably because the Gateway was disconnected from Redis and so didn't recognise the key):
time="14:35:21" level=warning msg="Key not found in storage engine" err="key not found" inbound-key="********"
time="14:35:21" level=info msg="Attempted access with non-existent key."
However, the /hello Gateway health check endpoint continued to return HTTP 200 responses. Since we use this endpoint for Readiness and Liveness probes in Kubernetes, the instance continued to take traffic but returned HTTP 403 to all requests:
< HTTP/1.1 403 Forbidden
< Content-Type: application/json
< Date: xyz
< Content-Length: 57
<
{
"error": "Access to this API has been disallowed"
}
What is the expected behavior?
Health Check Endpoint Should Fail
If the Tyk Gateway is configured to use Redis storage but cannot connect to the Redis cluster, it should not return an HTTP 200 response on the health check endpoint.
We can then use higher-level orchestration to determine that something is wrong and take action (take the instance out of service, restart it and/or trigger an alert).
Gateway should continue to reconnect or terminate
If the Tyk Gateway is configured to use Redis storage, it should continue trying to reconnect more than twice (potentially with exponential backoff). If it is going to stop attempting to reconnect, it should terminate itself.
If the current behavior is a bug, please provide the steps to reproduce and if possible a minimal demo of the problem
Configure a Tyk Gateway instance to use a Redis cluster
Do you want to request a feature or report a bug?
Bug:
We have two Tyk Gateway instances running in Kubernetes configured to use a Redis cluster for storage, similar to the example:
Which versions of Tyk are affected by this issue? Did this work in previous versions of Tyk?
Observed in Tyk v2.7.4. Not yet tested on latest version (there appear to be some redis-related improvements in v2.7.7).