Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Ability to Run as Full Simulator #165

Merged
merged 7 commits into from
Mar 13, 2024

Conversation

carlaKC
Copy link
Contributor

@carlaKC carlaKC commented Feb 1, 2024

This PR is a WIP that adds the ability to run SimLN without any real lightning nodes, instead providing "simulated nodes" that copy the the routing policies and liquidity constraints of real nodes. The last commit hard-codes a graph to run this "full simulation" on as an example, whereas in reality we'd load from a json description of the desired topology/policies.

High Level Overview

Lightning implementations in SimLN are abstracted using the LightningNode trait. The steps that we follow to support a simulated implementation of this trait (SimNode) are as follows:

  • Run the simulator with a –from_graph option, which allows starting the simulator with a graph file (format tbd) that provides channel policies and topology (not included in this PR).
  • Use a single, top-level SimGraph struct to manage:
    • A hashmap of simulated channels available
    • Propagation of payments through the simulated channels
  • Implement LightningNode on a SimNode struct which:
    • Handles implementation of send/tracking functionality
    • Points to the coordinating SimGraph struct to manage payment dispatch.

Graph Abstraction

Management of payments through the simulated network is separated managed with the following layers:

  • SimGraph: high level coordinator that keeps a hashmap of each simulated channel (see below) in the network, along with their forwarding policies. Responsible for coordinating addition/removal of HTLCs as they propagate through the network.
  • SimNode: implementation of the LightningNode trait which rely on the SimGraph to propagate payments on their behalf.

Screenshot 2024-01-30 at 4 43 35 PM

These structs are primarily used to provide send/track payment APIs, receiving updates on the state of their payments from SimGraph.

Channel State Machine

Implementation of channel state is broken down into three layers:

  • Channel Policy: representation of the directional policy of a participant (fees etc)
  • Channel State: tracks the “live” state of the outgoing state of the channel
  • Simulated Channel: tracks state for each direction of the channel, and high level information

Screenshot 2024-01-30 at 4 19 13 PM

Each channel state is responsible for handling tracking of liquidity and HTLC limits in one direction, and the simulated channel is responsible for enforcing sanity checks across the channel (eg, that we don’t exceed our capacity).

An example state machine update is provided in the table below:

Step Alice Channel State Bob Channel State
Channel Opened by Alice Local_balance: 100_000 / In_Flight: nil Local_balance: 0 / In_Flight: Nil
Alice sends 100 sats to Bob Local_balance: 999_000 / In_Flight: 100 sats Local_balance: 0 / In_Flight: Nil
Alice payment succeeds Local_balance: 999_000 / In_Flight: nil Local_balance: 100 / In_Flight: Nil
Bob sends 50 sats to Alice Local_balance: 999_000 / In_Flight: nil Local_balance: 50 / In_Flight: 50
Bob payment fails Local_balance: 999_000 / In_Flight: nil Local_balance: 100 / In_Flight: nil

tl;dr

Screenshot 2024-01-30 at 4 01 34 PM

@@ -15,7 +15,9 @@ expanduser = "1.2.2"
serde = { version="1.0.183", features=["derive"] }
serde_json = "1.0.104"
bitcoin = { version = "0.30.1", features=["serde"] }
lightning = { version = "0.0.116" }
# LDK is on a different version to bitcoin to us, which leads to some issues, import so we can use its types.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there reasons why bitcoin_ldk types are being used as opposed to sticking with types from bitcoin?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long and painful story that's now resolved, but for prosperity:

We were relying of a different version of the bitcoin dependency to LDK:

  • sim-ln -> bitcoin-30
  • sim-ln -> lightning-117 -> bitocin-29

We couldn't downgrade bitcoin-30, because we're using a feature it it. But to use lightning-117 (specifically in pathfinding) we needed to provide the public key struct type in bitcoin-29. So this alias allowed me to import both types. This was messy and gross and thankfully solved by another LDK release that updated the bitcoin version.

sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
Copy link
Collaborator

@enigbe enigbe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a handful of small comments on code style and a question about using types from LDK's bitcoin library.

This would benefit from documentation comments across several structs and methods but expect they'd be updated before this is ready

@okjodom
Copy link
Collaborator

okjodom commented Feb 9, 2024

oh, my! 🎆 🥇

@carlaKC carlaKC force-pushed the sim-node branch 2 times, most recently from d58c28d to 8785cf6 Compare February 9, 2024 21:54
@carlaKC
Copy link
Contributor Author

carlaKC commented Feb 9, 2024

Pushed some additional docs + cleanup, no semantic changes!

@carlaKC carlaKC requested a review from sr-gi February 12, 2024 14:20
@carlaKC carlaKC force-pushed the sim-node branch 2 times, most recently from 4dc5455 to 3fb2d0d Compare February 16, 2024 16:17
@carlaKC
Copy link
Contributor Author

carlaKC commented Feb 16, 2024

Pushed a off by one bug that I caught writing tests!

Copy link
Member

@sr-gi sr-gi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the design makes sense overall. I may need to give this a second pass down the line (the diff is huge 😅 ) but my current understanding is as follows:

SimNode is the main class that mimics our current node implementations, and hence it implements LightningNode. It has a shared reference to a network (Arc<Mutex<SimGraph>>) that represents the simulated network. The network includes all channels, which have a breakdown in classes all the way down to the channel policies. The construction of a simulation works as usual: we create a list of clients that are instances that implement LightningNode (in this case SimNodes), when a payment is made, the simulated node accesses the graph via its shared pointer and triggers all the state transitions, which are managed by the SimGraph.

I left a bunch of suggestions, most (if not all of them) can be found here: sr-gi@67f810e. Feel free to cherry-pick as desired.

Some general comments:

  • I don't think 604019d should be needed. We can have a different constructor for the simulator when calling it with a simulated graph (Simulator::with_sim_channels or something) that receives all the usual but instead of a map of clients, receives a collection of simulated channels. That constructor could have the functionality of ln_node_from_graph to build the "clients", plus construct (sr-gi@67f810e#diff-f891364aff98268ac6d94109bd61594c23af50e37e547fec9b0ea44a816459a0R390-R421). Doing this gets rids of a bunch of imports, plus having to define the triggers in the CLI and so on.

  • I think d2ea936 should be squashed, I don't think it's meaningful enough to be a commit on its own (but up to you in the end)

sim-lib/Cargo.toml Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
@carlaKC carlaKC marked this pull request as ready for review February 27, 2024 13:52
@carlaKC
Copy link
Contributor Author

carlaKC commented Feb 27, 2024

Thanks for the mega-review @sr-gi 🫶

Not sure if this is helpful, but I've pushed a single fixup for each commit with the suggested changes (each fixup directly follows its original commit) with the hope of making round two a little easier! If it's not useful I'll just go ahead and squash.

Added a few responses inline where the big-picture stuff needs some discussion.

@carlaKC carlaKC changed the title [draft]: Add Ability to Run as Full Simulator Add Ability to Run as Full Simulator Feb 27, 2024
@carlaKC carlaKC requested a review from sr-gi February 27, 2024 13:58
@carlaKC
Copy link
Contributor Author

carlaKC commented Feb 27, 2024

Also, I've started on some unit test coverage for this but going to put it in as a separate PR because this one is already far too big :')

sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
Copy link
Member

@sr-gi sr-gi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! 🎉

I can confirm most of the comments have been addressed or refuted, just left a few comments on the open discussion plus things that may have been passed unnoticed.

I'm happy with the fixups to be squashed at this point.

sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
Comment on lines 489 to 513
let preimage = PaymentPreimage(rand::random());
let preimage_bytes = Sha256::hash(&preimage.0[..]).to_byte_array();
let payment_hash = PaymentHash(preimage_bytes);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I opened a PR upstream to add this, feels like a nice utility to have as part of the API: lightningdevkit/rust-lightning#2916

Comment on lines 626 to 629
for channel in graph_channels.iter() {
channels.insert(channel.short_channel_id, channel.clone());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterating taking ownership (into_iter) and moving the insert after the internal for loop will allow you to drop the channel clone

sim-lib/src/sim_node.rs Show resolved Hide resolved
sim-lib/src/sim_node.rs Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
sim-lib/src/sim_node.rs Outdated Show resolved Hide resolved
Comment on lines 714 to 796
match self.nodes.get(node) {
Some(channels) => Ok((node_info(*node), channels.clone())),
None => Err(LightningError::GetNodeInfoError(
"Node not found".to_string(),
)),
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was not addressed, but feel free to disregard it (just keeping track of what was covered)

sim-lib/src/sim_node.rs Show resolved Hide resolved
Comment on lines 202 to 210
let (shutdown_trigger, shutdown_listener) = triggered::trigger();
let sim = Simulation::new(
clients,
validated_activities,
cli.total_time,
cli.expected_pmt_amt,
cli.capacity_multiplier,
write_results,
(shutdown_trigger, shutdown_listener),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh very nice, I think I'm going to leave this for a follow up because I'll drop the commit where we actually use the simulator before merge (to be followed up with surfacing the config options to use it).

@carlaKC
Copy link
Contributor Author

carlaKC commented Mar 11, 2024

Squashed + addressed last comments. I've removed the commit which actually uses the simulator, because we need to add a configuration option for an end user to provide that format (--simulate-network, or something like that). Only major change is that I added a ShortChannelID and went with a struct so that we can impl Display on it.

Will follow up with a PR adding tests + that option.

@carlaKC carlaKC requested review from sr-gi and enigbe March 11, 2024 20:46
@sr-gi
Copy link
Member

sr-gi commented Mar 12, 2024

ACK f199c2f

Last comment, feel free to add it as a follow-up if you find this useful:

You can add a couple of utility functions to ShortChannelID to convert back and forth from/to u64, that way you can simply call x.into() in the code when converting, without having to even use the wrappers/inner values:

/// Utility function to easily convert from u64 to `ShortChannelID`
impl From<u64> for ShortChannelID {
    fn from(value: u64) -> Self {
        ShortChannelID(value)
    }
}

/// Utility function to easily convert `ShortChannelID` into u64
impl From<ShortChannelID> for u64 {
    fn from(scid: ShortChannelID) -> Self {
        scid.0
    }
}

Here are a couple of examples:

- short_channel_id: channel.short_channel_id.0,
+ short_channel_id: channel.short_channel_id.into(),
- let scid = ShortChannelID(hop.short_channel_id);
+ let scid = hop.short_channel_id.into();
- return Err(ForwardingError::ChannelNotFound(ShortChannelID(
-                    hop.short_channel_id,
-                )))
+ return Err(ForwardingError::ChannelNotFound(
+                   hop.short_channel_id.into(),
+              ))

This commit adds a ChannelState struct which is used to track the
policy and state of a channel in the *outgoing* direction. This will
be used to check forwards against the node's advertised policy and
track the movement of outgoing HTLCs through the channel.

Note that we choose to implement this state *unidirectionally*, so a
single channel will be represented by two ChannelState structs (one in
each direction).
Add a single representation of a simulated lightning channel which
uses our state representation to add and remove htlcs. For simplicity,
each side of the channel is represented as a separate state, with the
only interaction between the two through the changes to local balance
that happen when we settle htlcs.
Add an implementation of the LightningNode trait that represents
the underlying lightning node. This implementation is intentionally
kept simple, depending on some SimNetwork trait to handle the mechanics
of actually simulating the flow of payments through a simulated graph.
We want to be able to distinguish between expected and critical payment
errors in our simulated payments. An expected error occurs due to a
lightning-related failure (such as running out of liquidity), and a
critical one happens because something has gone wrong with our
simulator (for example, an assertion failure about balances).

This commit adds an some utilities to ForwardingError to make this
distinction and display errors properly.
Add an implementation of our SimNetwork trait that will do the heavy
lifting of propagating htlcs through a simulated network.
@carlaKC
Copy link
Contributor Author

carlaKC commented Mar 13, 2024

You can add a couple of utility functions to ShortChannelID to convert back and forth from/to u64, that way you can simply call x.into() in the code when converting, without having to even use the wrappers/inner values:

Added in last push!

@carlaKC carlaKC merged commit b66a84a into bitcoin-dev-project:main Mar 13, 2024
2 checks passed
@carlaKC
Copy link
Contributor Author

carlaKC commented Mar 13, 2024

Thanks for the great review on this @sr-gi 🫶🫶🫶

@sr-gi
Copy link
Member

sr-gi commented Mar 13, 2024

Anytime :D

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants