Bot does not rejoin channels reliably after a netsplit #1335

Open
kwaaak opened this issue May 27, 2018 · 11 comments

Comments

@kwaaak
Contributor

kwaaak commented May 27, 2018

It works most of the time but often enough it doesn't. Maybe this is the responsibility of the network?
In any case, someone more in tune with the IRC protocol might have an elegant way of making sure the bot stays in the channel(s).

@dgw
Member

dgw commented May 27, 2018

Even "someone more in tune with the IRC protocol" can't do anything about this without raw logs of what's happening between Sopel and the IRC server.

@kwaaak
Contributor Author

kwaaak commented Jun 13, 2018

The bot recognises that something is wrong and initiates a reconnect:

>>1528894738.2752254	PING irc.efnet.nl
<<1528894738.2765565	:irc.efnet.nl PONG irc.efnet.nl :irc.efnet.nl
>>1528894858.3953335	PING irc.efnet.nl
>>1528894918.4552155	PING irc.efnet.nl
>>1528895021.7282913	CAP LS 302
>>1528895021.7284808	NICK botnickname
[server connection stuff]
<<1528895025.5272648	:irc.efnet.nl 001 botnickname :Welcome to the EFNet Internet Relay Chat Network botnickname
>>1528895025.527987	MODE  botnickname +B
>>1528895025.528177	JOIN #channelname
[server connection stuff, MOTD]
<<1528895025.5832474	:irc.efnet.nl 437 botnickname #channelname :Nick/channel is temporarily unavailable
[...]
>>1528896821.9092314	PRIVMSG #channelname :example message
<<1528896821.9106731	:irc.efnet.nl 404 botnickname #channelname :Cannot send to channel

The expected message after the JOIN would be:

<<1528925459.723694	:[email protected] JOIN :#channelname

Due to the state of the network, joining a channel is not possible at the time of the connection.
Should the bot retry periodically to join the channels?
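
For reference, a minimal sketch of how the 437 line in the log above breaks down into its parts. `parse_line` is a hypothetical helper written for illustration, not part of Sopel, and it assumes a line without IRCv3 message tags and without the log's timestamp prefix:

```python
def parse_line(line):
    """Split a raw IRC server line into (source, command, params).

    Hypothetical helper for illustration; ignores IRCv3 message tags.
    """
    source = None
    if line.startswith(":"):
        # The optional prefix names the server or user that sent the line.
        source, _, line = line[1:].partition(" ")
    # Everything after the first " :" is a single trailing parameter.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    command, params = parts[0], parts[1:]
    if sep:
        params.append(trailing)
    return source, command, params


raw = ":irc.efnet.nl 437 botnickname #channelname :Nick/channel is temporarily unavailable"
source, command, params = parse_line(raw)
# command is the numeric; params[1] is the nick or channel that was unavailable.
print(command, params[1])
```

The second parameter carries the channel name, so the reply by itself says which JOIN failed.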

@dgw
Member

dgw commented Jun 14, 2018

I think EFNet is one of very few networks that lock channels during a netsplit. Most IRCds (AFAIK) just let users on the lost segment join channels anyway and resolve ops collisions with timestamps and/or services.

Adding this sort of logic to core doesn't seem especially worthwhile. For the majority of users, it would just waste CPU time. Once restarting (#1333) is done, a plugin could probably do it though. Or, run Sopel behind a bouncer (ZNC?) and let the bouncer handle channel joining and retries for free.

@dgw dgw added this to the 7.0.0 milestone Nov 13, 2018
@dgw dgw added the Tweak label Nov 13, 2018
@dgw
Member

dgw commented Nov 13, 2018

Handling numeric 437 (ERR_UNAVAILRESOURCE) shouldn't be too difficult, as it does include the nick/channel that was unavailable (so there's no need for Sopel to do a lot of complicated state tracking).

I don't think there are any situations where Sopel would receive a 437 for something that isn't a channel, but this feature would definitely need someone to commit to testing it on a network that handles netsplits this way for some time before release. I'm not in a position to do so, realistically.
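
A rough sketch of the channel-or-nick check this would need, assuming the handler receives the numeric's parameters as a list (`[own_nick, target, text]`). The function name and the hardcoded prefix set are assumptions for illustration; a real implementation would probably prefer the CHANTYPES value the server advertises:

```python
# Channel prefixes defined by RFC 2812. Servers may advertise a different
# set through the CHANTYPES token in RPL_ISUPPORT (005).
DEFAULT_CHANTYPES = "#&+!"


def unavailable_channel(params, chantypes=DEFAULT_CHANTYPES):
    """Return the channel named in a 437 reply, or None for a nick target."""
    if len(params) < 2:
        return None
    target = params[1]
    if target and target[0] in chantypes:
        return target
    return None
```

With the log above, `unavailable_channel(["botnickname", "#channelname", "..."])` would give back `#channelname`, while a 437 about a nick would give `None`.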

@dgw dgw modified the milestones: 7.0.0, 7.1.0 Nov 16, 2019
@dgw
Member

dgw commented Nov 16, 2019

Punting relatively minor enhancement with an existing workaround.

@Exirel
Contributor

Exirel commented Oct 2, 2020

I suggest to punt even further, to Sopel 8.x.

@dgw
Member

dgw commented Feb 25, 2021

> I suggest to punt even further, to Sopel 8.x.

Belatedly, I agree.

@dgw dgw modified the milestones: 7.1.0, 8.0.0 Feb 25, 2021
@dgw dgw modified the milestones: 8.0.0, 8.1.0 Jul 14, 2022
@dgw
Member

dgw commented Jul 14, 2022

Let's consider this part of the asyncio rewrite's shakedown, to be revisited when work starts on 8.1.

@Exirel
Contributor

Exirel commented Oct 21, 2024

From reading this conversation, it feels like Sopel could:

  • just log if it gets a 437
  • try to rejoin a channel if it gets a 404 and the channel is in its known channels list
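
The two bullets above could be sketched as a single decision function. This is only an illustration under assumptions: `channel_to_rejoin` and its arguments are hypothetical names, and the params list is assumed to be `[own_nick, target, text]` as in the log earlier in the thread:

```python
import logging

logger = logging.getLogger(__name__)


def channel_to_rejoin(numeric, params, known_channels):
    """Return a channel to rejoin, or None if no action is needed.

    437: only log. 404: rejoin if the target is a channel we expect to be in.
    """
    target = params[1] if len(params) > 1 else None
    if numeric == "437":
        logger.warning("Received 437 for %s", target)
        return None
    if numeric == "404" and target is not None:
        # IRC channel names are case-insensitive; compare lowercased
        # (this ignores the RFC 1459 casemapping of []\~ for brevity).
        known = {name.lower() for name in known_channels}
        if target.lower() in known:
            return target
    return None
```

The caller would then send a JOIN for whatever channel comes back.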

@dgw
Member

dgw commented Oct 21, 2024

Logging a 437 seems like about the only sane option, since different IRCds use it for conflicting things (RFC 2812 = ERR_UNAVAILRESOURCE, ircu and possibly others = ERR_BANNICKCHANGE).

I wonder how Sopel would get a 404 ERR_CANNOTSENDTOCHAN if it couldn't rejoin the channel in the first place? With rare exceptions (like reminders from sopel-remind), Sopel only sends to a channel in response to a triggering event from that channel.

If eventual consistency is OK, maybe there's also some mechanism that Sopel could use to make sure it's actually joined to all the channels it expects, using a self-WHO or something like that. (I'm not a fan of this idea because it sounds fairly complicated and potentially fragile, but it is a possibility.)
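
However the list of joined channels ends up being obtained, the consistency check itself would be a small comparison. A sketch, with `missing_channels` as a hypothetical helper name and both arguments assumed to be plain channel names:

```python
def missing_channels(expected, joined):
    """Return configured channels that the bot is not currently in.

    Comparison is case-insensitive (plain lowercasing, which ignores the
    RFC 1459 casemapping of a few punctuation characters).
    """
    joined_lower = {name.lower() for name in joined}
    # Preserve the configured order so joins are retried predictably.
    return [name for name in expected if name.lower() not in joined_lower]
```

The complicated and potentially fragile part is everything around this: querying the server and deciding when the answer can be trusted.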

@dgw
Member

dgw commented Oct 22, 2024

Another idea, maybe better: If Sopel receives 437 and is not in one or more of the core.channels, begin trying to rejoin the missing channel(s) periodically up to some timeout value (an hour?).

I'm loosely thinking of ZNC's behavior, where if it can't join a channel it will retry several times before logging a message and disabling future join attempts. (I don't think Sopel should quit trying to join the channel forever, e.g. after a restart it should try joining every channel in the settings again.)
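
A minimal sketch of that retry bookkeeping, independent of Sopel's API. `RejoinTracker`, the 60-second interval, and the injected `clock` are assumptions made for illustration; only the one-hour timeout comes from the idea above:

```python
import time


class RejoinTracker:
    """Track channels that still need a JOIN retry (hypothetical helper)."""

    def __init__(self, interval=60.0, timeout=3600.0, clock=time.monotonic):
        self.interval = interval  # seconds between attempts
        self.timeout = timeout    # give up after this many seconds
        self.clock = clock
        self._pending = {}        # channel -> (first_seen, next_attempt)

    def mark_unavailable(self, channel):
        """Record a 437 for a channel; keeps the original deadline if known."""
        now = self.clock()
        first_seen, _ = self._pending.get(channel, (now, now))
        self._pending[channel] = (first_seen, now + self.interval)

    def mark_joined(self, channel):
        """Stop retrying once the server confirms the JOIN."""
        self._pending.pop(channel, None)

    def due(self):
        """Return channels to JOIN now; drop those past the timeout."""
        now = self.clock()
        ready = []
        for channel, (first_seen, next_attempt) in list(self._pending.items()):
            if now - first_seen >= self.timeout:
                del self._pending[channel]  # stop until the next restart
            elif now >= next_attempt:
                self._pending[channel] = (first_seen, now + self.interval)
                ready.append(channel)
        return ready
```

A periodic job would call `due()` and send a JOIN for each returned channel. Because the state lives only in memory, a restart starts over from the configured channel list, which matches the "don't give up forever" behavior described above.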

If I shamelessly take a look at what another bot (Limnoria) does: in its Admin plugin, do437 triggers a join retry if the target is a channel. Sopel could just ignore 437s about nicks, for now. It's simple, but I think it could work.
