routing: shutdown chanrouter correctly. #8497
Walkthrough
The recent changes introduce robust error handling and state management in various components of the Lightning Network Daemon (LND). Key enhancements include ensuring that certain methods are only executed when their corresponding components have been initialized, preventing nil pointer dereferences and managing lifecycle states with atomic boolean flags. These improvements enhance the stability and reliability of the system during startup and shutdown processes.
Sequence Diagram(s)
sequenceDiagram
participant User
participant Server
participant Component
User->>Server: Start
Server->>Component: Initialize
Component-->>Server: Initialized
Server->>User: Success
User->>Server: Stop
Server->>Component: Cleanup
Component-->>Server: Cleaned
Server->>User: Success
But this is only part of the fix for why the other node is not able to sync the graph to the chain: we definitely need to retry the block fetch and not fail immediately if we cannot get the block from the first peer. That problem is already tracked in this issue:
801e50f
to
3223097
Compare
3223097
to
85a52aa
Compare
Swapped the order when we add the cleanup |
f21e62c
to
8c831e5
Compare
@yyforyongyu while adding interruptibility to the startup of the server, I figured out that we need to make sure that each stop call is atomic (only happens once), otherwise we first call it in the
But I think that if the tests pass, switching the cleanup order should have no side effects and can prevent some cases where subsystems depend on each other and therefore cannot shut down correctly in case one of them does not close the
@ziggie1984, remember to re-request review from reviewers when ready |
Left some comments, will check the itest logs to understand more about the new behavior.
0ac6836
to
5508ca3
Compare
Let's see whether all the itests pass after the change to error out when a start/stop is called twice. |
Looking good, just a few nits and needs a rebase - think there's a new subserver added, we may need to change that here too.
928ef1a
to
73c6a3b
Compare
Just gonna make a note of the ones I've run into:
Found these by basically commenting out all the Start calls and thus only calling the Stop calls.
Good observation - I think it means if we want to safely move
Thank you for this important analysis, I did not think about this. I will try to analyze all the cases and provide a proper solution.
a488da4
to
5209f80
Compare
Went through the list of stop/start methods and mostly added nil pointer checks in the
Also went through your list of examples above and addressed them. The only exception was the panic you referred to in point 2, which was caused by the chainnotifier not running. However, we already start the chainnotifier before the SubSwapper, which is then able to subscribe to the channel events. I don't think I covered every case in the code base where the stop method is called before the start method, but I focused on the subsystems changed by this PR.
Thanks for the updates 🙏
invoices/invoiceregistry.go
Outdated
    if i.expiryWatcher == nil {
        return fmt.Errorf("InvoiceRegistry expiryWatcher not " +
            "initialized")
    }
i think we still want the rest of the function to run though. iiuc, the whole reason we want to be able to call Stop before Start is so that quit channels can be closed and hence synchronous processes in Start methods can be stopped
agree changed it to:
if i.expiryWatcher != nil {
i.expiryWatcher.Stop()
}
We could also change the constructor of the invoice registry to catch the case where we have a nil pointer for the expiryWatcher. Went with the above for now, but happy to change it.
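The point made in this thread (skip the nil dependency, but let the rest of Stop run so the quit channel still gets closed) can be sketched roughly as follows. This is a minimal illustration, not lnd's actual code: the `registry` and `expiryWatcher` types, `newRegistry`, and `quitClosed` are hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"sync"
)

// expiryWatcher is a hypothetical dependency that may be nil if it was
// never wired up.
type expiryWatcher struct{ stopped bool }

func (w *expiryWatcher) Stop() { w.stopped = true }

// registry is a hypothetical subsystem, not lnd's InvoiceRegistry.
type registry struct {
	watcher *expiryWatcher
	quit    chan struct{}
	wg      sync.WaitGroup
}

func newRegistry(w *expiryWatcher) *registry {
	return &registry{watcher: w, quit: make(chan struct{})}
}

// Stop guards only the nil dependency. It does not return early, so the
// quit channel is closed even when Start never ran.
func (r *registry) Stop() error {
	if r.watcher != nil {
		r.watcher.Stop()
	}

	close(r.quit)
	r.wg.Wait()

	return nil
}

// quitClosed reports whether the quit channel has been closed.
func (r *registry) quitClosed() bool {
	select {
	case <-r.quit:
		return true
	default:
		return false
	}
}

func main() {
	r := newRegistry(nil) // no watcher configured
	fmt.Println(r.Stop(), r.quitClosed())
}
```

Returning an error from the nil branch, as in the outdated diff above, would skip `close(r.quit)`, which is exactly what the reviewer wanted to avoid.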
5209f80
to
8c3abec
Compare
@coderabbitai review
Actions performed
Review triggered.
Actionable comments posted: 3
Outside diff range, codebase verification and nitpick comments (2)
sweep/fee_bumper.go (1)
703-703: Ensure correct initialization of the new fields in the `NewTxPublisher` function.
The `NewTxPublisher` function should correctly initialize the new atomic boolean fields `started` and `stopped`. Currently, these fields are not being initialized.
- File: sweep/fee_bumper.go
- Function: `NewTxPublisher`
Please add the initialization for the `started` and `stopped` fields.
Analysis chain
Scripts executed
The following scripts were executed for the analysis:
Script:
    #!/bin/bash
    # Description: Verify the `NewTxPublisher` function initializes the new fields correctly.
    # Test: Search for the `NewTxPublisher` function implementation. Expect: Correct initialization of the `started` and `stopped` fields.
    rg --type go -A 10 $'func NewTxPublisher(cfg TxPublisherConfig) *TxPublisher {'
Length of output: 258
Script:
    #!/bin/bash
    # Description: Verify the `NewTxPublisher` function initializes the new fields correctly.
    # Test: Search for the `NewTxPublisher` function implementation. Expect: Correct initialization of the `started` and `stopped` fields.
    rg --type go -A 20 'func NewTxPublisher\(cfg TxPublisherConfig\) \*TxPublisher {'
Length of output: 267
Script:
    #!/bin/bash
    # Description: Verify the `NewTxPublisher` function initializes the new fields correctly.
    # Test: Search for the `NewTxPublisher` function implementation. Expect: Correct initialization of the `started` and `stopped` fields.
    rg --type go -A 20 'func NewTxPublisher(cfg TxPublisherConfig) *TxPublisher {'
Length of output: 258
Script:
    #!/bin/bash
    # Description: Verify the `NewTxPublisher` function initializes the new fields correctly.
    # Test: Search for the `NewTxPublisher` function implementation. Expect: Correct initialization of the `started` and `stopped` fields.
    rg --type go -A 20 'func NewTxPublisher'
Length of output: 1276
invoices/invoiceregistry.go (1)
273-273: Ensure correct initialization of the new fields in the `NewRegistry` function.
The `NewRegistry` function does not initialize the new atomic boolean fields `started` and `stopped`. Please add the initialization for these fields.
- invoices/invoiceregistry.go: `NewRegistry` function
Analysis chain
Scripts executed
The following scripts were executed for the analysis:
Script:
    #!/bin/bash
    # Description: Verify the `NewRegistry` function initializes the new fields correctly.
    # Test: Search for the `NewRegistry` function implementation. Expect: Correct initialization of the `started` and `stopped` fields.
    rg --type go -A 10 $'func NewRegistry(idb InvoiceDB, expiryWatcher *InvoiceExpiryWatcher, cfg *RegistryConfig) *InvoiceRegistry {'
Length of output: 411
Script:
    #!/bin/bash
    # Description: Verify the `NewRegistry` function initializes the new fields correctly.
    # Test: Search for the `NewRegistry` function implementation. Expect: Correct initialization of the `started` and `stopped` fields.
    rg --type go -A 20 'func NewRegistry'
Length of output: 1439
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (11)
- chainntnfs/bitcoindnotify/bitcoind.go (1 hunks)
- chainntnfs/neutrinonotify/neutrino.go (1 hunks)
- chanfitness/chaneventstore.go (4 hunks)
- discovery/gossiper.go (1 hunks)
- docs/release-notes/release-notes-0.18.3.md (1 hunks)
- graph/builder.go (3 hunks)
- htlcswitch/interceptable_switch.go (4 hunks)
- invoices/invoiceregistry.go (2 hunks)
- lnd.go (1 hunks)
- server.go (6 hunks)
- sweep/fee_bumper.go (3 hunks)
Additional context used
LanguageTool
docs/release-notes/release-notes-0.18.3.md
[grammar] ~40-~40: The word “shutdown” is a noun. The verb is spelled with a space.
Context: ...pull/8497) a case where LND would not shutdown properly when interrupted via e.g. SIGT...(NOUN_VERB_CONFUSION)
[uncategorized] ~40-~40: A comma may be missing after the conjunctive/linking adverb ‘Moreover’.
Context: ...erly when interrupted via e.g. SIGTERM. Moreover LND now shutsdown correctly in case o...(SENT_START_CONJUNCTIVE_LINKING_ADVERB_COMMA)
Markdownlint
docs/release-notes/release-notes-0.18.3.md
39-39: Expected: dash; Actual: asterisk
Unordered list style(MD004, ul-style)
Additional comments not posted (57)
chanfitness/chaneventstore.go (4)
15-17: LGTM: Import of `sync/atomic`.
The import of `sync/atomic` is necessary for the atomic boolean flags used in the file.
53-55: LGTM: Addition of `started` and `stopped` fields.
The addition of `started` and `stopped` fields of type `atomic.Bool` enhances the lifecycle management of the `ChannelEventStore`.
150-154: LGTM: Prevent multiple invocations of `Start`.
The check using `c.started.Swap(true)` ensures that the `Start` method cannot be invoked more than once, preventing potential issues.
216-238: LGTM: Prevent multiple invocations of `Stop` and return an error.
The check using `c.stopped.Swap(true)` ensures that the `Stop` method cannot be invoked more than once, and returning an error adds robustness to the shutdown process.
htlcswitch/interceptable_switch.go (4)
7-7: LGTM: Import of `sync/atomic`.
The import of `sync/atomic` is necessary for the atomic boolean flags used in the file.
37-39: LGTM: Addition of `started` and `stopped` fields.
The addition of `started` and `stopped` fields of type `atomic.Bool` enhances the lifecycle management of the `InterceptableSwitch`.
208-213: LGTM: Prevent multiple invocations of `Start`.
The check using `s.started.Swap(true)` ensures that the `Start` method cannot be invoked more than once, preventing potential issues.
236-251: LGTM: Prevent multiple invocations of `Stop` and return an error.
The check using `s.stopped.Swap(true)` ensures that the `Stop` method cannot be invoked more than once, and returning an error adds robustness to the shutdown process.
lnd.go (4)
677-684: Good use of asynchronous server startup.
Starting the server within a goroutine allows for better control flow and graceful shutdown capabilities.
685-691: Good use of deferred function for graceful shutdown.
The deferred function ensures that the server is stopped gracefully, logging any errors that occur during the shutdown process.
693-702: Improved control flow with `select` statement.
The `select` statement allows for better error handling and graceful shutdown capabilities by listening for either an error from `errChan` or a shutdown signal from `interceptor.ShutdownChannel()`.
700-702: Proper use of break statement in `select` block.
The break statement ensures that the function proceeds normally if the server starts successfully.
chainntnfs/bitcoindnotify/bitcoind.go (1)
154-159: Robustness improvement with nil check for `txNotifier`.
The conditional statement ensures that `txNotifier` is not nil before attempting to call `TearDown`, preventing runtime errors associated with dereferencing a nil pointer.
chainntnfs/neutrinonotify/neutrino.go (1)
156-160: Robustness Improvement: Conditional Check for `txNotifier`.
The added conditional check ensures that `n.txNotifier` is not nil before calling its `TearDown` method. This prevents potential nil pointer dereference errors and enhances the robustness of the `Stop` method.
sweep/fee_bumper.go (4)
264-265: LGTM! The addition of atomic boolean fields improves state management.
The `started` and `stopped` fields ensure that the `TxPublisher` can only be started or stopped once, enhancing robustness.
Line range hint 672-686: LGTM! The `Start` method now ensures single invocation.
The use of `t.started.Swap(true)` prevents multiple starts, and the logging statement aligns better with the logical flow.
691-703: LGTM! The `Stop` method now ensures single invocation and returns an error.
The use of `t.stopped.Swap(true)` prevents multiple stops, and the method signature update aligns with Go best practices.
703-703: Ensure proper handling of the `quit` channel in the `monitor` method.
The `monitor` method should correctly handle the `quit` channel, which is closed in the `Stop` method, to ensure a graceful shutdown.
Verification successful
The `monitor` method correctly handles the `quit` channel.
The `monitor` method includes a `select` statement that properly handles the `quit` channel, ensuring a graceful shutdown when the `quit` signal is received.
- The `quit` channel is checked within the `select` statement, and the method returns when a message is received on the `quit` channel.
Code Snippet:
    for {
        select {
        case epoch, ok := <-blockEvent.Epochs:
            if !ok {
                log.Error("Block epoch channel closed, exit monitor")
                return
            }
            log.Debugf("TxPublisher received new block: %v", epoch.Height)
            t.currentHeight.Store(epoch.Height)
            t.processRecords()
        case <-t.quit:
            log.Debug("Fee bumper stopped, exit monitor")
            return
        }
    }
Scripts executed
The following scripts were executed for the analysis:
Script:
    #!/bin/bash
    # Description: Verify the `monitor` method handles the `quit` channel correctly.
    # Test: Search for the `quit` channel usage in the `monitor` method. Expect: Proper handling of the `quit` channel.
    rg --type go -A 10 $'func (t *TxPublisher) monitor(blockEvent *chainntnfs.BlockEpochEvent) {'
Length of output: 300
Script:
    #!/bin/bash
    # Description: Verify the `monitor` method handles the `quit` channel correctly.
    # Test: Search for the `monitor` method and include more lines to capture the full method implementation.
    rg --type go -A 50 'func (t *TxPublisher) monitor(blockEvent *chainntnfs.BlockEpochEvent) {'
Length of output: 300
Script:
    #!/bin/bash
    # Description: Verify the `monitor` method handles the `quit` channel correctly.
    # Test: Search for the `monitor` method and include more lines to capture the full method implementation.
    rg --type go -A 50 'func (t \*TxPublisher\) monitor(blockEvent \*chainntnfs.BlockEpochEvent) {'
Length of output: 309
Script:
    #!/bin/bash
    # Description: Verify the `monitor` method handles the `quit` channel correctly.
    # Test: Search for the `monitor` method and include more lines to capture the full method implementation.
    rg --type go -A 50 'func (t *TxPublisher) monitor(blockEvent *chainntnfs.BlockEpochEvent) {'
Length of output: 300
Script:
    #!/bin/bash
    # Description: Verify the `monitor` method handles the `quit` channel correctly.
    # Test: Use `ast-grep` to search for the `monitor` method and include more lines to capture the full method implementation.
    ast-grep --lang go --pattern $'func (t *TxPublisher) monitor(blockEvent *chainntnfs.BlockEpochEvent) { $$$ }'
Length of output: 1650
invoices/invoiceregistry.go (4)
104-105: LGTM! The addition of atomic boolean fields improves state management.
The `started` and `stopped` fields ensure that the `InvoiceRegistry` can only be started or stopped once, enhancing robustness.
219-250: LGTM! The `Start` method now ensures single invocation.
The use of `i.started.Swap(true)` prevents multiple starts, and the logging statement provides clear information about the state transitions.
255-273: LGTM! The `Stop` method now ensures single invocation and includes logging.
The use of `i.stopped.Swap(true)` prevents multiple stops, and the logging statements enhance visibility into the state transitions.
273-273: Ensure proper handling of the `quit` channel in the `invoiceEventLoop` method.
The `invoiceEventLoop` method should correctly handle the `quit` channel, which is closed in the `Stop` method, to ensure a graceful shutdown.
Verification successful
Let's correct the regex pattern and rerun the script to verify the handling of the `quit` channel in the `invoiceEventLoop` method.
Let's simplify the regex pattern and rerun the script to verify the handling of the `quit` channel in the `invoiceEventLoop` method.
Let's simplify the regex pattern even further and rerun the script to verify the handling of the `quit` channel in the `invoiceEventLoop` method.
Let's use `ast-grep` to search for the `invoiceEventLoop` method and verify the handling of the `quit` channel.
Ensure proper handling of the `quit` channel in the `invoiceEventLoop` method.
The `invoiceEventLoop` method correctly handles the `quit` channel by returning from the loop when the `quit` channel is closed, ensuring a graceful shutdown.
- invoices/invoiceregistry.go:348-349: The `select` statement listens for the `quit` channel and returns from the method when it receives a signal.
Scripts executed
The following scripts were executed for the analysis:
Script:
    #!/bin/bash
    # Description: Verify the `invoiceEventLoop` method handles the `quit` channel correctly.
    # Test: Search for the `quit` channel usage in the `invoiceEventLoop` method. Expect: Proper handling of the `quit` channel.
    rg --type go -A 10 $'func (i *InvoiceRegistry) invoiceEventLoop() {'
Length of output: 225
Script:
    #!/bin/bash
    # Description: Verify the `invoiceEventLoop` method handles the `quit` channel correctly.
    # Test: Search for the `quit` channel usage in the `invoiceEventLoop` method. Expect: Proper handling of the `quit` channel.
    rg --type go -A 20 'func (i \*InvoiceRegistry\) invoiceEventLoop\(\) {'
Length of output: 237
Script:
    #!/bin/bash
    # Description: Verify the `invoiceEventLoop` method handles the `quit` channel correctly.
    # Test: Search for the `quit` channel usage in the `invoiceEventLoop` method. Expect: Proper handling of the `quit` channel.
    rg --type go -A 20 'func \(i \*InvoiceRegistry\) invoiceEventLoop\(\) {'
Length of output: 240
Script:
    #!/bin/bash
    # Description: Verify the `invoiceEventLoop` method handles the `quit` channel correctly.
    # Test: Search for the `quit` channel usage in the `invoiceEventLoop` method. Expect: Proper handling of the `quit` channel.
    rg --type go -A 20 'func .*invoiceEventLoop.*\(\) {'
Length of output: 180
Script:
    #!/bin/bash
    # Description: Verify the `invoiceEventLoop` method handles the `quit` channel correctly.
    # Test: Search for the `invoiceEventLoop` method and its handling of the `quit` channel.
    ast-grep --lang go --pattern $'func (i *InvoiceRegistry) invoiceEventLoop() {\n $$$\n}'
Length of output: 3609
graph/builder.go (2)
303-303: Approved: Log statement addition in `Start` function.
The debug log statement `log.Debug("Builder started")` enhances visibility into the startup process of the builder.
329-329: Approved: Log statement modification in `Stop` function.
The debug log statement `log.Debug("Builder shutdown complete")` is now placed correctly to log immediately after the shutdown sequence is concluded.
discovery/gossiper.go (1)
756-761: Robustness Improvement: Added nil check for `d.blockEpochs`.
The addition of the nil check before calling `d.blockEpochs.Cancel()` prevents potential panics if `d.blockEpochs` is not initialized. This enhances the robustness of the `Stop` method and ensures a safer shutdown process.
server.go (32)
Line range hint 1883-1893: Initialize cleanup with the first subsystem.
The `cleanup` variable is initialized and the first subsystem (`customMessageServer`) is added to the cleanup list. This ensures that if any subsequent subsystem fails to start, the already started subsystems will be stopped in reverse order.
1900-1900: Add host announcer to cleanup list.
The `hostAnn` subsystem is conditionally added to the cleanup list, ensuring it is stopped if the startup process fails.
1908-1908: Add liveness monitor to cleanup list.
The `livenessMonitor` subsystem is conditionally added to the cleanup list, ensuring it is stopped if the startup process fails.
1920-1920: Add signature pool to cleanup list.
The `sigPool` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1926-1926: Add write pool to cleanup list.
The `writePool` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1932-1932: Add read pool to cleanup list.
The `readPool` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1938-1938: Add chain notifier to cleanup list.
The `cc.ChainNotifier` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1944-1944: Add best block tracker to cleanup list.
The `cc.BestBlockTracker` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1950-1950: Add channel notifier to cleanup list.
The `channelNotifier` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1956-1958: Add peer notifier to cleanup list.
The `peerNotifier` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1964-1964: Add HTLC notifier to cleanup list.
The `htlcNotifier` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1971-1971: Add tower client manager to cleanup list.
The `towerClientMgr` subsystem is conditionally added to the cleanup list, ensuring it is stopped if the startup process fails.
1978-1978: Add transaction publisher to cleanup list.
The `txPublisher` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1984-1984: Add UTXO sweeper to cleanup list.
The `sweeper` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1990-1990: Add UTXO nursery to cleanup list.
The `utxoNursery` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
1996-1996: Add breach arbitrator to cleanup list.
The `breachArbitrator` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2002-2002: Add funding manager to cleanup list.
The `fundingMgr` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2011-2011: Add HTLC switch to cleanup list.
The `htlcSwitch` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2017-2017: Add interceptable switch to cleanup list.
The `interceptableSwitch` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2023-2023: Add chain arbitrator to cleanup list.
The `chainArb` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2029-2030: Add graph builder to cleanup list.
The `graphBuilder` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2035-2036: Add channel router to cleanup list.
The `chanRouter` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2042-2043: Add authenticated gossiper to cleanup list.
The `authGossiper` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2048-2048: Add invoices registry to cleanup list.
The `invoices` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2054-2054: Add sphinx to cleanup list.
The `sphinx` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2060-2060: Add channel status manager to cleanup list.
The `chanStatusMgr` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2066-2066: Add channel event store to cleanup list.
The `chanEventStore` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2113-2113: Add channel sub swapper to cleanup list.
The `chanSubSwapper` subsystem is added to the cleanup list, ensuring it is stopped if the startup process fails.
2120-2120: Add Tor controller to cleanup list.
The `torController` subsystem is conditionally added to the cleanup list, ensuring it is stopped if the startup process fails.
2137-2137: Start connection manager last.
The `connMgr` is started last to prevent connections before initialization is complete. This ensures that all necessary subsystems are up and running before accepting connections.
2324-2326: Add error handling for `txPublisher.Stop`.
The `txPublisher.Stop` method now includes error handling to log any issues encountered during the stop process.
2346-2349: Add channel event store to stop process.
The `chanEventStore.Stop` method is now included in the stop process, ensuring it is properly stopped and any errors are logged.
8c3abec
to
5639468
Compare
Think we are missing a few nil checks:
diff --git a/htlcswitch/link.go b/htlcswitch/link.go
index f39a12b2b..7b5b60295 100644
--- a/htlcswitch/link.go
+++ b/htlcswitch/link.go
@@ -533,6 +533,7 @@ func (l *channelLink) Start() error {
}()
}
+ // Needs to check this.
l.updateFeeTimer = time.NewTimer(l.randomFeeUpdateTimeout())
l.wg.Add(1)
diff --git a/lnwallet/chainfee/estimator.go b/lnwallet/chainfee/estimator.go
index d9a402964..0f291b724 100644
--- a/lnwallet/chainfee/estimator.go
+++ b/lnwallet/chainfee/estimator.go
@@ -860,6 +860,7 @@ func (w *WebAPIEstimator) Start() error {
log.Infof("Web API fee estimator using update timeout of %v",
feeUpdateTimeout)
+ // Needs to check this.
w.updateFeeTicker = time.NewTicker(feeUpdateTimeout)
w.wg.Add(1)
diff --git a/tor/controller.go b/tor/controller.go
index 47ea6e129..9c5eb13d6 100644
--- a/tor/controller.go
+++ b/tor/controller.go
@@ -164,6 +164,7 @@ func (c *Controller) Start() error {
return fmt.Errorf("unable to connect to Tor server: %w", err)
}
+ // Need check this.
c.conn = conn
return c.authenticate()
Make sure that each subsystem only starts and stops once. This makes sure we don't close e.g. quit channels twice.
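A minimal sketch of the start-once/stop-once guard, using `atomic.Bool` from the standard library (Go 1.19 or later). The `service` type and its error messages are hypothetical and only illustrate the `Swap(true)` pattern mentioned in the review.

```go
package main

import (
	"errors"
	"fmt"
	"sync/atomic"
)

// service is a hypothetical subsystem with a quit channel.
type service struct {
	started atomic.Bool
	stopped atomic.Bool
	quit    chan struct{}
}

func newService() *service {
	return &service{quit: make(chan struct{})}
}

// Start returns an error on every call after the first. Swap sets the flag
// and returns the previous value in one atomic step.
func (s *service) Start() error {
	if s.started.Swap(true) {
		return errors.New("service already started")
	}
	return nil
}

// Stop closes quit exactly once. A second call returns an error instead of
// panicking on a double close.
func (s *service) Stop() error {
	if s.stopped.Swap(true) {
		return errors.New("service already stopped")
	}
	close(s.quit)
	return nil
}

func main() {
	s := newService()
	fmt.Println(s.Start(), s.Start())
	fmt.Println(s.Stop(), s.Stop())
}
```

The zero value of `atomic.Bool` is false, so the flags need no explicit initialization in the constructor.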
This commit does two things. It starts up the server in a way that it can be interrupted and shut down gracefully. Moreover, it makes sure that subsystems clean themselves up when they fail to start. This makes sure that dependent subsystems can shut down gracefully as well and the shutdown process does not get stuck.
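The interruptible startup described in this commit message can be sketched as follows: Start runs in a goroutine, and the caller selects on either its result or a shutdown signal. This is a hedged illustration only. The `run` helper, `errInterrupted`, and the plain `shutdown` channel are hypothetical and stand in for the server and signal interceptor used in lnd.go.

```go
package main

import (
	"errors"
	"fmt"
)

// errInterrupted is a hypothetical sentinel for a shutdown during startup.
var errInterrupted = errors.New("interrupted during startup")

// run starts the server asynchronously and waits for either the start
// result or a shutdown signal. Stop is deferred, so it also runs when the
// startup is interrupted or fails.
func run(start, stop func() error, shutdown <-chan struct{}) error {
	// Buffered so the start goroutine never blocks if we return early.
	errChan := make(chan error, 1)
	go func() {
		errChan <- start()
	}()

	defer func() {
		if err := stop(); err != nil {
			fmt.Println("error stopping server:", err)
		}
	}()

	select {
	case err := <-errChan:
		if err != nil {
			return fmt.Errorf("unable to start server: %w", err)
		}

	case <-shutdown:
		return errInterrupted
	}

	// Started successfully: run until a shutdown is requested.
	<-shutdown

	return nil
}

func main() {
	quit := make(chan struct{})
	shutdown := make(chan struct{})
	close(shutdown) // simulate an interrupt arriving during startup

	start := func() error {
		<-quit // a slow startup step that only aborts when quit closes
		return errors.New("startup aborted")
	}
	stop := func() error {
		close(quit)
		return nil
	}

	fmt.Println(run(start, stop, shutdown))
}
```

In this sketch Stop can run while Start is still in progress, which is why the Stop methods need the once-only guard and nil checks discussed elsewhere in this PR.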
5639468
to
db1b09f
Compare
With this PR we might call the stop method even when the start method of a subsystem did not finish successfully. Therefore we need to guard the stop methods against potential panics if some variables are not initialized in the constructors of the subsystems.
db1b09f
to
0adcb5c
Compare
🙏
chanfitness/chaneventstore.go
Outdated
        err = fmt.Errorf("ChannelEventStore FlapCountTicker not " +
            "initialized")
    } else {
        c.cfg.FlapCountTicker.Stop()
    }

    log.Debugf("ChannelEventStore shutdown complete")

-   return nil
+   return err
non blocking: i'd say this is an error worth logging but not returning. The Stop function itself did not error here, it was just that Start never ran/completed. Returning it makes it look like "error stopping chanEventStore" even though there wasn't really an error stopping it. But defs not a big deal
same comment for a few other spots in this commit
Fixes #8489
EDIT: Fixes #8721
So in the above linked issue, the channel graph could not be synced correctly so the ChanRouter:
...
so the 34 query failed and therefore the startup of the chanrouter failed as well.
We fail here and never call the `Stop` function of the channel router.
https://github.com/lightningnetwork/lnd/blob/master/routing/router.go#L628
When cleaning up all the other subsystems we get stuck, however:
because we don't close the quit channel of the channel router, the `Authenticated Gossiper` cannot stop either. The cleanup process is stuck, holding up the shutdown of all subsystems and causing some side effects because other subsystems are still running.
Goroutine Dump:
So we need to think about how to prevent those situations, because I think we don't close the `quit channel` for almost all subsystems when the start fails.
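The stuck shutdown described here can be reproduced in miniature: one subsystem has a goroutine blocked on another subsystem's quit channel, so its own Stop cannot finish until that channel is closed. The `router` and `gossiper` types below are hypothetical stand-ins, not the lnd implementations, and `stopWithinMillis` exists only to make the hang observable with a timeout.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// router is a hypothetical subsystem whose quit channel is only closed by
// Stop.
type router struct{ quit chan struct{} }

func newRouter() *router { return &router{quit: make(chan struct{})} }

func (r *router) Stop() { close(r.quit) }

// gossiper is a hypothetical dependent subsystem.
type gossiper struct{ wg sync.WaitGroup }

// Start launches a goroutine that blocks until the router quits. It stands
// in for a call into the router that only returns on router shutdown.
func (g *gossiper) Start(r *router) {
	g.wg.Add(1)
	go func() {
		defer g.wg.Done()
		<-r.quit
	}()
}

// stopWithinMillis waits for the gossiper's goroutines and reports whether
// they finished before the timeout.
func (g *gossiper) stopWithinMillis(ms int) bool {
	done := make(chan struct{})
	go func() {
		g.wg.Wait()
		close(done)
	}()

	select {
	case <-done:
		return true
	case <-time.After(time.Duration(ms) * time.Millisecond):
		return false
	}
}

func main() {
	r := newRouter()
	g := &gossiper{}
	g.Start(r)

	// The router failed to start and Stop was never called on it.
	fmt.Println("gossiper stopped:", g.stopWithinMillis(50))

	// Closing the router's quit channel unblocks the gossiper.
	r.Stop()
	fmt.Println("gossiper stopped:", g.stopWithinMillis(1000))
}
```

This is why the fix calls Stop on a subsystem even when its Start failed: closing the quit channel is what releases the goroutines of dependent subsystems.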