Report to Station when the node cannot connect to the network #63

bajtos · 2022-08-30T15:16:55Z

Detect the situation when the L2 node cannot connect to any L1 node and report the problem back to Station. Let's keep the first version simple:

When the app starts, or whenever the number of L1 connections drops to zero, we start a 3-second timer.
If there is still no L1 connection when the timer expires, we log an event about the problem. Proposed message:
```
fmt.Print("ERROR: Saturn Node is not able to connect to the network\n")
```
In the current backoff-based retry implementation, we give up connecting after several unsuccessful attempts. When this happens, the L2 node should report the problem to Station:
```
fmt.Printf("ERROR: Saturn Node was not able to connect to the network after %v attempts, giving up.\n", l.maxReconnectAttempts)
```

In both cases, it's important to print the message only once. We don't want the message to be printed for each L1 client we have, as that would print each message three times.

Related: #62

juliangruber · 2022-08-30T15:19:07Z

What about instead of adding a timeout we log this for every attempt?

`fmt.Print("ERROR: Saturn Node is not able to connect to the network, retrying...\n")`

This gives faster but noisier feedback, with the benefit of a simpler implementation.

bajtos · 2022-08-30T15:27:49Z

> What about instead of adding a timeout we log this for every attempt?

I have already tried that and got the error messages in bursts of three: nothing happens for a second or two, then three messages appear at once, then there is another pause, then another three messages, and so on.

I am fine with looking for a simpler solution as long as we report the problem at the L2 node level, not at the level of every L1 client.

In other words, we can rework the part about the 3-second timeout to use the current backoff retry mechanism, but then we need to report only the first error and not the duplicates that follow soon after it.

That would work in the case where we cannot reach the network at all.

However, it would not work if we can reach only some of the L1 nodes. In that case, we don't want to report connection errors mixed with messages like `Saturn Node is online and connected to 1 peer(s)`.
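One possible way to cover both cases without a timer is sketched below: node-level state shared by all L1 clients reports only the first failure of an outage and stays silent while at least one L1 connection is up. The names `NodeStatus`, `NewNodeStatus`, `AttemptFailed`, `Connected`, and `Disconnected` are assumptions made for illustration; they do not come from the existing code.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeStatus (hypothetical) aggregates connection results from all L1 clients.
type NodeStatus struct {
	mu        sync.Mutex
	connected int  // L1 clients currently connected
	reported  bool // true once the current outage has been logged
	log       func(msg string) // must not call back into NodeStatus
}

func NewNodeStatus(log func(string)) *NodeStatus {
	return &NodeStatus{log: log}
}

// AttemptFailed is called by an L1 client after each failed connection attempt.
func (s *NodeStatus) AttemptFailed() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.connected > 0 || s.reported {
		return // partially online, or this outage was already reported
	}
	s.reported = true
	s.log("ERROR: Saturn Node is not able to connect to the network, retrying...")
}

// Connected is called by an L1 client once its connection is established.
func (s *NodeStatus) Connected() {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.connected++
	s.reported = false // the next full outage will be reported again
	s.log(fmt.Sprintf("Saturn Node is online and connected to %d peer(s)", s.connected))
}

// Disconnected is called by an L1 client when an established connection is lost.
func (s *NodeStatus) Disconnected() {
	s.mu.Lock()
	defer s.mu.Unlock()
	if s.connected > 0 {
		s.connected--
	}
}

func main() {
	s := NewNodeStatus(func(m string) { fmt.Println(m) })
	for i := 0; i < 3; i++ {
		s.AttemptFailed() // three L1 clients fail at about the same time
	}
	s.Connected()     // one client gets through
	s.AttemptFailed() // another client is still failing: suppressed
}
```

In this sketch the three simultaneous failures produce a single error line, and failures from the remaining clients are suppressed once one peer is connected, so error lines are not interleaved with the online message.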

juliangruber · 2022-08-30T15:32:46Z

Ah gotcha, I thought the timeout was there to reduce log messages on the L2-node level. Whatever is easiest then 👍

juliangruber · 2022-09-14T11:13:08Z

Closed by #68, which replaced #67

bajtos added the Station (Work related to Filecoin Station) label Aug 30, 2022

bajtos assigned aarshkshah1992 Aug 30, 2022

bajtos mentioned this issue Aug 30, 2022

Saturn L2 requirements filecoin-station/desktop#84

Closed

6 tasks

aarshkshah1992 mentioned this issue Aug 31, 2022

Log l1 activity #67

Closed

juliangruber closed this as completed Sep 14, 2022
