Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Store channels per-peer #1507

Conversation

ViktorTigerstrom
Copy link
Contributor

@ViktorTigerstrom ViktorTigerstrom commented Jun 1, 2022

Closes #105 and enables #422 and #1278.

Moves the storage of our channels from the ChannelManager::ChannelHolder to the ChannelHolder::PeerState to enable that we store the channels per-peer instead of a single map.

This allows us to have channels with the same temporary_channel_id for different peers, as specified by BOLT2:
https://github.com/lightning/bolts/blob/4b62d26af93a07166d6a9de2cb5eff80849d9c19/02-peer-protocol.md?plain=1#L172

@codecov-commenter
Copy link

codecov-commenter commented Jun 1, 2022

Codecov Report

Base: 90.74% // Head: 90.72% // Decreases project coverage by -0.02% ⚠️

Coverage data is based on head (edef5e8) compared to base (d8a20ed).
Patch coverage: 86.25% of modified lines in pull request are covered.

❗ Current head edef5e8 differs from pull request most recent head 23fdcca. Consider uploading reports for the commit 23fdcca to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1507      +/-   ##
==========================================
- Coverage   90.74%   90.72%   -0.03%     
==========================================
  Files          96       96              
  Lines       50133    50523     +390     
  Branches    50133    50523     +390     
==========================================
+ Hits        45493    45835     +342     
- Misses       4640     4688      +48     
Impacted Files Coverage Δ
lightning/src/ln/channelmanager.rs 87.18% <ø> (+0.52%) ⬆️
lightning/src/ln/functional_test_utils.rs 91.46% <45.00%> (-2.31%) ⬇️
lightning-persister/src/lib.rs 93.51% <92.85%> (+0.06%) ⬆️
lightning/src/ln/functional_tests.rs 96.98% <95.94%> (-0.08%) ⬇️
lightning/src/ln/chanmon_update_fail_tests.rs 97.59% <100.00%> (ø)
lightning/src/ln/monitor_tests.rs 99.56% <100.00%> (ø)
lightning/src/ln/onion_route_tests.rs 97.64% <100.00%> (+<0.01%) ⬆️
lightning/src/ln/payment_tests.rs 98.73% <100.00%> (-0.01%) ⬇️
lightning/src/ln/reload_tests.rs 95.58% <100.00%> (+0.01%) ⬆️
lightning/src/ln/reorg_tests.rs 100.00% <100.00%> (ø)
... and 10 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@ViktorTigerstrom
Copy link
Contributor Author

Snap, will address the failing check_commits check tomorrow. Sorry about that!

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some comments with some examples of a few high-level patterns here - you've added a lot of unreachable branches, some of which I think look quite reachable, at a minimum there should probably be a comment in a few of the cases explaining why you're confident those are unreachable. Similarly, there are quite a few places you take the per_peer_state write lock where I think you only need the read lock. I'd really like to see some kind of writeup in the documentation for the various internal fields explaining the consistency guarantees between the channel_state fields and the per_peer_state stuff, especially given we'd really like to remove stuff from channel_state over time and take individual locks for a shorter period to get better parallelism.

/// added to `ChannelMonitor`s in the future, this map should be removed, as only the
/// `ChannelManager::per_peer_state` is required to access the channel if the
/// `counterparty_node_id` is passed with `MonitorEvent`s.
pub(super) id_to_peer: HashMap<[u8; 32], PublicKey>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe lets move this out of the ChannelHolder, not sure what reason we'd have to put it in there and easy to just track it separately? Once you do that, I think we can drop the $id_to_peer argument in most of the macros and just take the lock for a very short period by accessing $self.

@@ -511,31 +511,32 @@ macro_rules! get_htlc_update_msgs {

#[cfg(test)]
macro_rules! get_channel_ref {
($node: expr, $lock: ident, $channel_id: expr) => {
($node: expr, $peer_state_lock: ident, $channel_id: expr) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep the old style here where we do the lock inside of the macro to DRY up the callsites, which are now a bit more verbose.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I opted for this version as I simply had too many problems in getting your suggested version to work. But I'll give it another go and try to fix that :)

hash_map::Entry::Vacant(_) => return Err(APIError::ChannelUnavailable{err: "No such channel".to_owned()})
break Ok(());
},
hash_map::Entry::Vacant(_) => return Err(APIError::ChannelUnavailable{err: format!("No such channel for the passed counterparty_node_id {}", counterparty_node_id) })
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe update the error messages in a separate commit? This single commit is...kinda huge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Yeah, huge is a bit of an understatement hah 😄

/// the handling of the events.
///
/// TODO:
/// The `counterparty_node_id` can't be passed with `MonitorEvent`s currently, as adding it to
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requiring it in MonitorEvents breaks backwards compat, adding it does not :) Maybe phrase this as a "we should add this to the monitor events and then eventually rely on it".

/// Because adding or removing an entry is rare, we usually take an outer read lock and then
/// operate on the inner value freely. Sadly, this prevents parallel operation when opening a
/// new channel.
/// Currently, we usually take an outer read lock and then operate on the inner value freely.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh? Multiple threads can take the read lock, so that still allows for parallel operation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Snap thanks! That ment to say write lock, but given that I'll update that, I'll change this as well :)

},
hash_map::Entry::Vacant(entry) => { entry.insert(channel); }
}
} else { unreachable!() }
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this unreachable? I don't think it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given your feedback on "taking the write lock, then dropping it, then taking it again", I'd agree :). I'll make sure to be more careful about this and other unreachable clauses.

if is_permanent {
remove_channel!(self, channel_state, chan_entry);
break result;
let per_peer_state = self.per_peer_state.write().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this taking a write lock? In general it seems you take a write lock in quite a few places where its not necessary.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, the reasoning behind taking the write lock here as well as in other places where it's not really needed is because of what i said in the PR message above. But I'll make sure to change that given that it shouldn't be necessary then :)

Copy link
Contributor Author

@ViktorTigerstrom ViktorTigerstrom Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading through this again I realize that I should probably specify a scenario I'm worried about without digging into the message/request thread execution further:
If we grab the read lock only, in the places where we only need operate on the inner lock, to I'm a bit worried for a scenario where for example large routing nodes would have threads that grab the outer read lock rapidly, so that there's almost always at least one read lock acquired. This would then cause issues for the threads that want to access the outer lock with write access, as they would need to wait an extensive amount of time to access the write lock, locking the thread meanwhile.
Though, like I stated in my PR message, I'm haven't digged into that deep enough to understand if this is a cause for concern, or if that's nothing we need to worry about :)!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yea, writer starvation could be a concern. We have a FairRwLock that we use in PeerManager for this, which would let you take read locks without worrying too much about it :).

},
None => { return Err(APIError::ChannelUnavailable { err: "No such channel".to_owned() }) },
let (res, chan) = {
let per_peer_state = self.per_peer_state.write().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets avoid taking the write lock, then dropping it, then taking it again, it makes it really hard to reason about the consistency across the two parts of a single function, and I think it likely makes the unreachable down in the else branch of the second per_peer_state lock actually reachable.

@ViktorTigerstrom
Copy link
Contributor Author

Thanks a lot for the feedback @TheBlueMatt! Will address it as soon as I can. I will therefore push the failing check commits fix together with the feedback addressing as well :)

@ViktorTigerstrom
Copy link
Contributor Author

Would you like me to update the name of the ChannelHolder as well as channel_state given that the channels aren't stored there any longer?

/// SCIDs (and outbound SCID aliases) to the real channel id. Outbound SCID aliases are added
/// here once the channel is available for normal use, with SCIDs being added once the funding
/// transaction is confirmed at the channel's required confirmation depth.
pub(super) short_to_id: HashMap<u64, [u8; 32]>,
pub(super) short_to_id: HashMap<u64, (PublicKey, [u8; 32])>,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe update the variable name and consequence now you're storing the counterparty_node_id here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok sure :)

@ariard
Copy link

ariard commented Jun 6, 2022

Would you like me to update the name of the ChannelHolder as well as channel_state given that the channels aren't stored there any longer?

Could be something like CrossChanStateHolder or CrossStateHolder.

@TheBlueMatt
Copy link
Collaborator

Would you like me to update the name of the ChannelHolder as well as channel_state given that the channels aren't stored there any longer?

I guess you could, no strong opinion. Ideally we'll just remove it sooner or later anyway, so dunno if its worth touching every line with channel_state, but up to you :).

See-also #1403 (comment) - dunno if you want to take that part of this PR out to help move #1403 forward.

@ViktorTigerstrom
Copy link
Contributor Author

ViktorTigerstrom commented Jun 6, 2022

Could be something like CrossChanStateHolder or CrossStateHolder.

I guess you could, no strong opinion. Ideally we'll just remove it sooner or later anyway, so dunno if its worth touching every line with channel_state, but up to you :).

Thanks a lot for the review @ariard!

Ok, given that there's no immediate need to update the name, I'll leave it for now as this PR is getting quite huge, and it might therefore be better to leave the name change for a follow-up later because of that.

Sorry for the delay on this one, I'm gonna try to get time to address the feedback later today!

@TheBlueMatt
Copy link
Collaborator

Looks like this needs rebase after #1527. I do wonder if we can split this PR up a bit more - either more commits or ideally even more PRs - to do various smaller, more reviewable new-parameters and such changes before the big honking "move things around" change.

@ViktorTigerstrom
Copy link
Contributor Author

ViktorTigerstrom commented Jun 9, 2022

Looks like this needs rebase after #1527. I do wonder if we can split this PR up a bit more - either more commits or ideally even more PRs - to do various smaller, more reviewable new-parameters and such changes before the big honking "move things around" change.

Sure I'll try to split it up as much as possible! I'll push the new maps in a separate PR and such. Unfortunately though, I think it's quite unavoidable to not make the big "move things around" commit huge, as long as we want every separate commit to build. I'll try to focus it as much as possible on only moving things around, and not changing logic, error messages and such in the same commit though :).

@TheBlueMatt
Copy link
Collaborator

Yea, there's no question that final commit is going to be big no matter what we do. My only point here is if there's anything we can split at all, we absolutely should, and if possible even into a new PR.

@ViktorTigerstrom
Copy link
Contributor Author

ViktorTigerstrom commented Jun 14, 2022

Apologies for taking so long time to address the feedback! This PR has now been split up and is now based on #1542. So this can be marked as blocked by dependent PR.

I'd really like to see some kind of writeup in the documentation for the various internal fields explaining the consistency guarantees between the channel_state fields and the per_peer_state stuff, especially given we'd really like to remove stuff from channel_state over time and take individual locks for a shorter period to get better parallelism.

@TheBlueMatt, could you please specify a bit what consistency guarantees you would like to have documented? Is it mainly the consistency guarantees between the different maps you're interested in :)?

I still need to update some of the commit messages, but will get to that when i squash the commits :).

Happy to address any feedback :)

@ViktorTigerstrom ViktorTigerstrom marked this pull request as ready for review June 14, 2022 20:21
@TheBlueMatt
Copy link
Collaborator

@TheBlueMatt, could you please specify a bit what consistency guarantees you would like to have documented? Is it mainly the consistency guarantees between the different maps you're interested in :)?

Yea, basically that - in what order do locks need to be taken, what lock ensures what is consistent with what, etc.

@TheBlueMatt
Copy link
Collaborator

Needs rebase 🚀

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2022-05-store-channels-per-peer branch from ae665ce to cdc0f13 Compare July 13, 2022 08:01
@ViktorTigerstrom
Copy link
Contributor Author

Rebased on main now that the blocking PR has been merged :)!

@TheBlueMatt
Copy link
Collaborator

Can you go ahead and squash fixups down and clean up the git history? Then I'll do another round of review.

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2022-05-store-channels-per-peer branch from 1e60496 to d536b69 Compare July 13, 2022 19:14
@ViktorTigerstrom
Copy link
Contributor Author

Can you go ahead and squash fixups down and clean up the git history? Then I'll do another round of review.

Sure squashed :)! Though I've realized I still haven't written the docs for the consistency guarantees. A question before I write the documentation though. Do we want this PR to not change the lock order for the channel_state and the peer_state locks, and leave that for a follow-up PR?

@TheBlueMatt
Copy link
Collaborator

TheBlueMatt commented Jul 13, 2022

On second thought, I'm gonna try to get #1420 passing CI and we'll want to test this with that patch as well (doesn't need to wait on #1420 to land, just want to at least run it with both to make sure there's no errors caught with the additions there).

This passes.

@ViktorTigerstrom ViktorTigerstrom force-pushed the 2022-05-store-channels-per-peer branch from c06abcd to 23fdcca Compare January 9, 2023 22:50
@ViktorTigerstrom
Copy link
Contributor Author

Feel free to always squash in fixups since this is super close :)

Thanks, squashed!

Copy link
Collaborator

@TheBlueMatt TheBlueMatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooookkayyy this looks good to me. There's some notes here that I'd like to see addressed in a followup, but they're not bugs, and in the interest of delaying this, I'm happy to see it land.

@@ -1651,24 +1686,18 @@ where

let mut failed_htlcs: Vec<(HTLCSource, PaymentHash)>;
let result: Result<(), _> = loop {
let mut channel_state_lock = self.channel_state.lock().unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general, when we remove a channel we need to check and see if the peer the channel was connected to has since disconnected and remove the per_peer entry for it. ie if we force-close a channel for a peer we aren't connected to we have to clean the per_peer entry. I'm not 100% sure how best to do this without always taking the per_peer_state write lock, which would kill paralleism, so maybe we can do it in the timer. In any case, we should ensure it doesn't leave us vulnerable to a DoS like in #1889.

break;
}

let peer_state_mutex = per_peer_state.as_ref().unwrap().get(&counterparty_node_id).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please don't do redundant lookups back-to-back :)

@@ -3684,73 +3771,95 @@ where
}

fn claim_funds_from_hop<ComplFunc: FnOnce(Option<u64>) -> Option<MonitorUpdateCompletionAction>>(&self,
mut channel_state_lock: MutexGuard<ChannelHolder<<K::Target as SignerProvider>::Signer>>,
per_peer_state_lock: RwLockReadGuard<HashMap<PublicKey, Mutex<PeerState<<K::Target as SignerProvider>::Signer>>>>,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar to claim_funds_internal, I think we should just drop the lock argument here.

};

let (found_channel, mut peer_state_opt) = if counterparty_node_id_opt.is_some() && per_peer_state_lock.get(&counterparty_node_id_opt.unwrap()).is_some() {
let peer_mutex = per_peer_state_lock.get(&counterparty_node_id_opt.unwrap()).unwrap();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, you're doing two lookups in a row.


if found_channel {
let peer_state = &mut *peer_state_opt.as_mut().unwrap();
if let hash_map::Entry::Occupied(mut chan) = peer_state.channel_by_id.entry(chan_id) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, we look up twice.

let per_peer_state = self.per_peer_state.read().unwrap();
let peer_state_mutex_opt = per_peer_state.get(counterparty_node_id);
if let None = peer_state_mutex_opt {
return Err(MsgHandleErrInternal::send_err_msg_no_close(format!("Can't find a peer matching the passed counterparty node_id {}", counterparty_node_id), msg.temporary_channel_id))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the message handling functions, we should debug_assert!(false) here - the peer handler should never pass us messages for peers we don't think are connected.

}});
if ev_index.is_some() {
let mut updated_msg_events = msg_events.to_vec();
let ev = updated_msg_events.remove(ev_index.unwrap());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're cloning the entire event vec just to remove one and return it? Let's instead pass in a mut reference to msg_events and remove that way.

let mut peer_state_lock = peer_state_mutex.lock().unwrap();
let peer_state = &mut *peer_state_lock;
if peer_state.pending_msg_events.len() > 0 {
let mut peer_pending_events = Vec::new();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: you can drop this entirely, no? append "leaves the other vec empty"

Copy link
Contributor

@valentinewallace valentinewallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First pass minus tests, nothing major. I think we're going to land this and I'll do more post-merge review this week, so we can get this out of the way :)

channel_state.pending_msg_events.push(events::MessageSendEvent::SendOpenChannel {
node_id: their_network_key,
msg: res,
});
Ok(temporary_channel_id)
}

fn list_channels_with_filter<Fn: FnMut(&(&[u8; 32], &Channel<<K::Target as SignerProvider>::Signer>)) -> bool>(&self, f: Fn) -> Vec<ChannelDetails> {
fn list_channels_with_filter<Fn: FnMut(&(&[u8; 32], &Channel<<K::Target as SignerProvider>::Signer>)) -> bool + Copy>(&self, f: Fn) -> Vec<ChannelDetails> {
let mut res = Vec::new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use with_capacity for conciseness

/// Because adding or removing an entry is rare, we usually take an outer read lock and then
/// operate on the inner value freely. Sadly, this prevents parallel operation when opening a
/// new channel.
/// operate on the inner value freely. This opens up for parallel per-peer operation for
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// operate on the inner value freely. This opens up for parallel per-peer operation for
/// operate on the inner value freely. This opens up parallel per-peer operation for

Comment on lines +742 to +744
/// The bulk of our storage will eventually be here (message queues and the like). Currently
/// the `per_peer_state` stores our channels on a per-peer basis, as well as the peer's latest
/// features.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't message queues move in this PR?

Suggested change
/// The bulk of our storage will eventually be here (message queues and the like). Currently
/// the `per_peer_state` stores our channels on a per-peer basis, as well as the peer's latest
/// features.
/// The bulk of our storage. Stores our channels on a per-peer basis, as well as the peer's latest
/// features.

let per_peer_state = self.per_peer_state.read().unwrap();

let peer_state_mutex_opt = per_peer_state.get(&their_network_key);
if let None = peer_state_mutex_opt {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use is_none() and elsewhere

}
);
let per_peer_state = self.per_peer_state.read().unwrap();
if let Some(peer_state_mutex) = per_peer_state.get(&counterparty_node_id) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should error if the peer is not found in this method, fine to save for followup though

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm I'm not 100% sure what the correct result is here. We successfully managed to force_close the channel, but we're unable send this message to the peer. The message would just be dropped anyways if we're no longer connected to the peer, so it maybe should be an OK(()) res?

I'll return Err here in the follow-ups though, and then we can take decide there what the correct approach is :)!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second glance, I think you're right and we shouldn't bother Erring :)

@@ -2187,7 +2225,7 @@ where
/// public, and thus should be called whenever the result is going to be passed out in a
/// [`MessageSendEvent::BroadcastChannelUpdate`] event.
///
/// May be called with channel_state already locked!
/// May be called with peer_state already locked!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment confuses me because since a Channel is being passed in, peer_state must be locked, but not a big deal

Comment on lines +2714 to +2718
let peer_state_mutex_opt = per_peer_state.get(counterparty_node_id);
if let None = peer_state_mutex_opt {
return Err(APIError::APIMisuseError{ err: format!("Can't find a peer matching the passed counterparty node_id {}", counterparty_node_id) });
}
let mut peer_state_lock = peer_state_mutex_opt.unwrap().lock().unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something like this would be more idiomatic/concise:

Suggested change
let peer_state_mutex_opt = per_peer_state.get(counterparty_node_id);
if let None = peer_state_mutex_opt {
return Err(APIError::APIMisuseError{ err: format!("Can't find a peer matching the passed counterparty node_id {}", counterparty_node_id) });
}
let mut peer_state_lock = peer_state_mutex_opt.unwrap().lock().unwrap();
let peer_state_mutex = per_peer_state.get(counterparty_node_id)
.ok_or_else(|| APIError::APIMisuseError{ err: format!("Can't find a peer matching the passed counterparty node_id {}", counterparty_node_id) })?;
let mut peer_state_lock = peer_state_mutex.lock().unwrap();

and elsewhere

@ViktorTigerstrom
Copy link
Contributor Author

ViktorTigerstrom commented Jan 10, 2023

Thanks a lot @TheBlueMatt and @valentinewallace for reviewing this and the preceding PRs! Super happy to see this finally getting in ready state as it's been quite a while since this was initially opened 🚀!

I won't be able to address the feedback today as it's already night here. I'll start working on it tomorrow though :).
If you want to merge this to avoid further merge conflicts with other PRs, I'm super ok with addressing it all in follow-ups instead, but either way is ok for me. Most of it should be quick and straight forward with the exception of #1507 (comment) :).

@TheBlueMatt
Copy link
Collaborator

#1944 should fix the windows issues. Once that gets a review or two will land this.

@ViktorTigerstrom
Copy link
Contributor Author

ViktorTigerstrom commented Jan 10, 2023

Thanks a lot for digging into the issue @TheBlueMatt!
Would you like me to rebase this on top of #1944 just to see that the regex matching fixes the issue, or are we ok with assuming it's solved?

@TheBlueMatt
Copy link
Collaborator

I think don't bother touching this PR, I tested an earlier variant of #1944 on my own branch based on this and it seemed to work so I'm happy with it, and it should be easy enough to fix if I'm wrong.

@ViktorTigerstrom
Copy link
Contributor Author

I think don't bother touching this PR, I tested an earlier variant of #1944 on my own branch based on this and it seemed to work so I'm happy with it, and it should be easy enough to fix if I'm wrong.

Ok great!

@TheBlueMatt TheBlueMatt merged commit 5221e4a into lightningdevkit:main Jan 10, 2023
@TheBlueMatt TheBlueMatt mentioned this pull request Jan 10, 2023
Copy link
Contributor

@valentinewallace valentinewallace left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, will take another look after the follow-up

#[cfg(debug_assertions)]
{
if let None = per_peer_state.get(&$counterparty_node_id) {
// This shouldn't occour in tests unless an unkown counterparty_node_id
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: s/occour/occur, s/unkown/unknown

@@ -3481,11 +3557,12 @@ where
fn fail_htlc_backwards_internal(&self, source: &HTLCSource, payment_hash: &PaymentHash, onion_error: &HTLCFailReason, destination: HTLCDestination) {
#[cfg(debug_assertions)]
{
// Ensure that the `channel_state` lock is not held when calling this function.
// Ensure that no peer state channel storage lock is not held when calling this
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/no peer state/the peer state

if let Some(channel) = by_id.get_mut(&previous_channel_id) {
channel.claim_htlc_while_disconnected_dropping_mon_update(claimable_htlc.prev_hop.htlc_id, payment_preimage, &args.logger);
if let Some(peer_node_id) = id_to_peer.get(&previous_channel_id){
let peer_state_mutex = per_peer_state.get(peer_node_id).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know this unwrap is safe?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The id_to_peer map isn't written/read from disk, its built as we deserialize channels, so at this point it should be guaranteed to be consistent with our channel set.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah agree that this is a bit confusing :). Even the channel existence check below is actually redundant, as we're guaranteed to have the channel here if it's exists in the id_to_peer map. I can of course add an extra if clause here though if you think it makes the code less confusing :).

Another general problem here going forward though: I'm a little worried here what we're gonna do here once we remove the id_to_peer map. Ideally we should be using the short_to_chan_info map here instead. Gonna look more into if we're guaranteed to have all channels we can have serialized claims for in the short_to_chan_info map here, especially channels that's not fully closed but about to (ie. is_usable is false).

@ViktorTigerstrom
Copy link
Contributor Author

Aiming to have the follow-ups ready by tomorrow!

@ViktorTigerstrom ViktorTigerstrom mentioned this pull request Jan 18, 2023
@ViktorTigerstrom
Copy link
Contributor Author

Sorry for the delay. I've now opened the followup PR #1966 were the feedback has been addressed :)!

k0k0ne pushed a commit to bitlightlabs/rust-lightning that referenced this pull request Sep 30, 2024
0.0.114 - Mar 3, 2023 - "Faster Async BOLT12 Retries"

API Updates
===========

 * `InvoicePayer` has been removed and its features moved directly into
   `ChannelManager`. As such it now requires a simplified `Router` and supports
   `send_payment_with_retry` (and friends). `ChannelManager::retry_payment` was
   removed in favor of the automated retries. Invoice payment utilities in
   `lightning-invoice` now call the new code (lightningdevkit#1812, lightningdevkit#1916, lightningdevkit#1929, lightningdevkit#2007, etc).
 * `Sign`/`BaseSign` has been renamed `ChannelSigner`, with `EcdsaChannelSigner`
   split out in anticipation of future schnorr/taproot support (lightningdevkit#1967).
 * The catch-all `KeysInterface` was split into `EntropySource`, `NodeSigner`,
   and `SignerProvider`. `KeysManager` implements all three (lightningdevkit#1910, lightningdevkit#1930).
 * `KeysInterface::get_node_secret` is now `KeysManager::get_node_secret_key`
   and is no longer required for external signers (lightningdevkit#1951, lightningdevkit#2070).
 * A `lightning-transaction-sync` crate has been added which implements keeping
   LDK in sync with the chain via an esplora server (lightningdevkit#1870). Note that it can
   only be used on nodes that *never* ran a previous version of LDK.
 * `Score` is updated in `BackgroundProcessor` instead of via `Router` (lightningdevkit#1996).
 * `ChainAccess::get_utxo` (now `UtxoAccess`) can now be resolved async (lightningdevkit#1980).
 * BOLT12 `Offer`, `InvoiceRequest`, `Invoice` and `Refund` structs as well as
   associated builders have been added. Such invoices cannot yet be paid due to
   missing support for blinded path payments (lightningdevkit#1927, lightningdevkit#1908, lightningdevkit#1926).
 * A `lightning-custom-message` crate has been added to make combining multiple
   custom messages into one enum/handler easier (lightningdevkit#1832).
 * `Event::PaymentPathFailure` is now generated for failure to send an HTLC
   over the first hop on our local channel (lightningdevkit#2014, lightningdevkit#2043).
 * `lightning-net-tokio` no longer requires an `Arc` on `PeerManager` (lightningdevkit#1968).
 * `ChannelManager::list_recent_payments` was added (lightningdevkit#1873).
 * `lightning-background-processor` `std` is now optional in async mode (lightningdevkit#1962).
 * `create_phantom_invoice` can now be used in `no-std` (lightningdevkit#1985).
 * The required final CLTV delta on inbound payments is now configurable (lightningdevkit#1878)
 * bitcoind RPC error code and message are now surfaced in `block-sync` (lightningdevkit#2057).
 * Get `historical_estimated_channel_liquidity_probabilities` was added (lightningdevkit#1961).
 * `ChannelManager::fail_htlc_backwards_with_reason` was added (lightningdevkit#1948).
 * Macros which implement serialization using TLVs or straight writing of struct
   fields are now public (lightningdevkit#1823, lightningdevkit#1976, lightningdevkit#1977).

Backwards Compatibility
=======================

 * Any inbound payments with a custom final CLTV delta will be rejected by LDK
   if you downgrade prior to receipt (lightningdevkit#1878).
 * `Event::PaymentPathFailed::network_update` will always be `None` if an
   0.0.114-generated event is read by a prior version of LDK (lightningdevkit#2043).
 * `Event::PaymentPathFailed::all_paths_removed` will always be false if an
   0.0.114-generated event is read by a prior version of LDK. Users who rely on
   it to determine payment retries should migrate to `Event::PaymentFailed`, in
   a separate release prior to upgrading to LDK 0.0.114 if downgrading is
   supported (lightningdevkit#2043).

Performance Improvements
========================

 * Channel data is now stored per-peer and channel updates across multiple
   peers can be operated on simultaneously (lightningdevkit#1507).
 * Routefinding is roughly 1.5x faster (lightningdevkit#1799).
 * Deserializing a `NetworkGraph` is roughly 6x faster (lightningdevkit#2016).
 * Memory usage for a `NetworkGraph` has been reduced substantially (lightningdevkit#2040).
 * `KeysInterface::get_secure_random_bytes` is roughly 200x faster (lightningdevkit#1974).

Bug Fixes
=========

 * Fixed a bug where a delay in processing a `PaymentSent` event longer than the
   time taken to persist a `ChannelMonitor` update, when occurring immediately
   prior to a crash, may result in the `PaymentSent` event being lost (lightningdevkit#2048).
 * Fixed spurious rejections of rapid gossip sync data when the graph has been
   updated by other means between gossip syncs (lightningdevkit#2046).
 * Fixed a panic in `KeysManager` when the high bit of `starting_time_nanos`
   is set (lightningdevkit#1935).
 * Resolved an issue where the `ChannelManager::get_persistable_update_future`
   future would fail to wake until a second notification occurs (lightningdevkit#2064).
 * Resolved a memory leak when using `ChannelManager::send_probe` (lightningdevkit#2037).
 * Fixed a deadlock on some platforms at least when using async `ChannelMonitor`
   updating (lightningdevkit#2006).
 * Removed debug-only assertions which were reachable in threaded code (lightningdevkit#1964).
 * In some cases when payment sending fails on our local channel retries no
   longer take the same path and thus never succeed (lightningdevkit#2014).
 * Retries for spontaneous payments have been fixed (lightningdevkit#2002).
 * Return an `Err` if `lightning-persister` fails to read the directory listing
   rather than panicing (lightningdevkit#1943).
 * `peer_disconnected` will now never be called without `peer_connected` (lightningdevkit#2035)

Security
========

0.0.114 fixes several denial-of-service vulnerabilities which are reachable from
untrusted input from channel counterparties or in deployments accepting inbound
connections or channels. It also fixes a denial-of-service vulnerability in rare
cases in the route finding logic.
 * The number of pending un-funded channels as well as peers without funded
   channels is now limited to avoid denial of service (lightningdevkit#1988).
 * A second `channel_ready` message received immediately after the first could
   lead to a spurious panic (lightningdevkit#2071). This issue was introduced with 0conf
   support in LDK 0.0.107.
 * A division-by-zero issue was fixed in the `ProbabilisticScorer` if the amount
   being sent (including previous-hop fees) is equal to a channel's capacity
   while walking the graph (lightningdevkit#2072). The division-by-zero was introduced with
   historical data tracking in LDK 0.0.112.

In total, this release features 130 files changed, 21457 insertions, 10113
deletions in 343 commits from 18 authors, in alphabetical order:
 * Alec Chen
 * Allan Douglas R. de Oliveira
 * Andrei
 * Arik Sosman
 * Daniel Granhão
 * Duncan Dean
 * Elias Rohrer
 * Jeffrey Czyz
 * John Cantrell
 * Kurtsley
 * Matt Corallo
 * Max Fang
 * Omer Yacine
 * Valentine Wallace
 * Viktor Tigerström
 * Wilmer Paulino
 * benthecarman
 * jurvis
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Need to relax temporary_channel_id check
5 participants