-
Notifications
You must be signed in to change notification settings - Fork 99
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[1.x] Adds support for reconnecting to Redis if disconnected by server #281
base: main
Are you sure you want to change the base?
Conversation
82d87b3
to
54e2a0b
Compare
@nathanaelytj I still have some tests to write here, but wondering if you could give it an early test and see if it works as you expect. Details of the behaviour are in the PR description. |
Thank you @joedixon I have checked the fix. It's working as intended for issue when redis is disconnected and then reconnected. I think this is okay to ignore, since orchestration like docker swarm / kubernetes has it's own healthcheck and after container crashed, it will restarted automatically. |
Thanks @nathanaelytj! I'm not super familiar with your configuration - do you have any idea why the code I'm introducing didn't handle that situation? |
@joedixon I'm using docker swarm setup. When the container all gone from the service / replica 0, the bound service DNS along with it's accosiated virtual-IP is flushed, and that's when name or service not known. After a moment, a new container pop and it re-registered within Docker Swarm Internal DNS, sometimes the propagation is not fast enough so when reverb internal healthcheck run, it meet with name or service not known or something similar. For non docker swarm / kubernetes setup, this error probably can be replicated when we are connecting to redis server through DNS, and then after connection established, we delete the DNS record. It will probably return error something similar to that. |
Do you think there is a different exception/error I can catch to trigger the timeout as opposed to a hard fail? |
After analyzing the stacktrace and experiment with the code I found that this line reverb/src/Servers/Reverb/Http/Server.php Lines 36 to 42 in 9469732
must caught ErrorException for timeout to trigger. Here's my experiment: /**
* Start the Http server
*/
public function start(): void
{
try {
$this->loop->run();
} catch (\ErrorException $e) {
Log::info("Error caught during restart");
}
} The final result is: It successfully trigger the timeout sequence and reconnection. |
Btw bellow is full stack trace I use during analysis: ErrorException
Redis::get(): php_network_getaddresses: getaddrinfo for base_redis failed: Name or service not known
at vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php:116
112▕ public function command($method, array $parameters = [])
113▕ {
114▕ $start = microtime(true);
115▕
➜ 116▕ $result = $this->client->{$method}(...$parameters);
117▕
118▕ $time = round((microtime(true) - $start) * 1000, 2);
119▕
120▕ $this->events?->dispatch(new CommandExecuted(
1 vendor/laravel/framework/src/Illuminate/Redis/Connections/Connection.php:116
Redis::get()
2 vendor/laravel/framework/src/Illuminate/Redis/Connections/PhpRedisConnection.php:531
Illuminate\Redis\Connections\Connection::command()
3 vendor/laravel/framework/src/Illuminate/Redis/Connections/PhpRedisConnection.php:54
Illuminate\Redis\Connections\PhpRedisConnection::command()
4 vendor/laravel/framework/src/Illuminate/Cache/RedisStore.php:71
Illuminate\Redis\Connections\PhpRedisConnection::get()
5 vendor/laravel/framework/src/Illuminate/Cache/Repository.php:117
Illuminate\Cache\RedisStore::get()
6 vendor/laravel/framework/src/Illuminate/Cache/CacheManager.php:453
Illuminate\Cache\Repository::get()
7 vendor/laravel/framework/src/Illuminate/Support/Facades/Facade.php:361
Illuminate\Cache\CacheManager::__call()
8 vendor/laravel/reverb/src/Servers/Reverb/Console/Commands/StartServer.php:106
Illuminate\Support\Facades\Facade::__callStatic()
9 vendor/react/event-loop/src/Timer/Timers.php:102
Laravel\Reverb\Servers\Reverb\Console\Commands\StartServer::Laravel\Reverb\Servers\Reverb\Console\Commands\{closure}()
10 vendor/react/event-loop/src/StreamSelectLoop.php:185
React\EventLoop\Timer\Timers::tick()
11 vendor/laravel/reverb/src/Servers/Reverb/Http/Server.php:41
React\EventLoop\StreamSelectLoop::run()
12 vendor/laravel/reverb/src/Servers/Reverb/Console/Commands/StartServer.php:74
Laravel\Reverb\Servers\Reverb\Http\Server::start()
13 vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:36
Laravel\Reverb\Servers\Reverb\Console\Commands\StartServer::handle()
14 vendor/laravel/framework/src/Illuminate/Container/Util.php:43
Illuminate\Container\BoundMethod::Illuminate\Container\{closure}()
15 vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:95
Illuminate\Container\Util::unwrapIfClosure()
16 vendor/laravel/framework/src/Illuminate/Container/BoundMethod.php:35
Illuminate\Container\BoundMethod::callBoundMethod()
17 vendor/laravel/framework/src/Illuminate/Container/Container.php:694
Illuminate\Container\BoundMethod::call()
18 vendor/laravel/framework/src/Illuminate/Console/Command.php:213
Illuminate\Container\Container::call()
19 vendor/symfony/console/Command/Command.php:279
Illuminate\Console\Command::execute()
20 vendor/laravel/framework/src/Illuminate/Console/Command.php:182
Symfony\Component\Console\Command\Command::run()
21 vendor/symfony/console/Application.php:1094
Illuminate\Console\Command::run()
22 vendor/symfony/console/Application.php:342
Symfony\Component\Console\Application::doRunCommand()
23 vendor/symfony/console/Application.php:193
Symfony\Component\Console\Application::doRun()
24 vendor/laravel/framework/src/Illuminate/Foundation/Console/Kernel.php:198
Symfony\Component\Console\Application::run()
25 vendor/laravel/framework/src/Illuminate/Foundation/Application.php:1205
Illuminate\Foundation\Console\Kernel::handle()
26 artisan:13
Illuminate\Foundation\Application::handleCommand() |
So the issue is that exception is being thrown too high up the stack to be caught by the reconnection code? |
I think the connection class at laravel/framework throwing Exception earlier than reconnection code try catch block. Using reactphp the code is like operating in a closed loop. My experimental code caught the exception, but it's doing nothing waiting for the next tick. At the next tick the whole run sequence triggered. During this tick it threw another exception but caught by reconnection code try catch block because now DNS resolved but reverb is not subscibed. Then the reconnection triggered restoring all function to healthy state. What do you think? |
This PR adds support for attempting to reconnect to Redis if the connection is dropped.
It does this by listening for a disconnection and starting a configurable timeout. During the timeout, Reverb will attempt to reconnect every second. Any events that are triggered along the way are queued up and will be processed if the connection is re-established. If at the end of that timeout, it has not bee possible to reconnect, the server is stopped and it's up to the process monitor to attempt to restart Reverb which will kick off the process again.
Allowing Reverb to attempt the reconnection should prevent backoff from the process monitor when attempting to restart the server fails.