CIP-0078? | Extended Local Chain Sync Protocol #375

Closed
111 changes: 111 additions & 0 deletions cip-ledger-state.md
@@ -0,0 +1,111 @@
---
CIP: ?
Title: Extended Local Chain Sync Protocol
Authors: Erik de Castro Lopo <[email protected]>
Discussions-To: Erik de Castro Lopo <[email protected]>
Comments-Summary: Extend Local Chain Sync Protocol
Comments-URI:
Status: Draft
Type: ?
Created: 2022-11-15
License: CC-BY-4.0
---
## Abstract

Modify the `cardano-node` (and underlying code) to provide an extended version of the existing
local chain sync protocol.

## Motivation

Applications that provide insight into the Cardano block chain (like db-sync, exporters, Kupo, and
smart contracts reacting to events) often need access to the current state of the Cardano ledger
(stake distribution, reward and wallet balances etc). This information is known to the node, partly
on the block chain and partly in what is referred to as ledger state (described more fully below).
Extracting block chain data is relatively easy, but ledger state data is not. Currently these
applications have to recreate and maintain ledger state themselves based on the block information
they stream from the node over the local chain sync protocol. Recreation and maintenance of the
ledger state is not only complex and a source of bugs but more importantly requires significant
resources, specifically RAM. Ledger state in memory currently consumes 10 GB of RAM and that is
growing. In situations where the node and any such application run on the same machine, the machine
ends up with twice the resource usage. The following proposal hopes to reduce resource usage and
complexity for chain following applications like db-sync.

## Current Situation

Currently there is a local chain sync protocol which is really just the peer-to-peer protocol
using a local domain socket rather than the TCP/IP socket normally used for P2P transport.

The data transported over this local chain sync protocol is limited to block chain data. However, a
Cardano node also maintains ledger state which includes:

* The current UTxO state.
* Current amount of ADA delegated to each stake pool.
* Which stake address is currently delegated to each pool.
* Rewards account balances for each stake address.
* Current protocol parameters.

The first of these ledger state components is by far the largest component and is probably not
needed outside the node (and definitely not needed by db-sync). However, the others are needed and
stored by `cardano-db-sync` which gets these data sets by maintaining its own copy of ledger state
and periodically extracting the parts required.
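
As a rough illustration, the non-UTxO components listed above could be pictured as a small record. This is only a sketch: `LedgerSummary`, its field names and the simplified `String`/`Integer` types are assumptions made for this example and are not the types used by the ledger or by `cardano-db-sync`.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Illustrative stand-ins; the real ledger uses much richer types.
type StakeAddress = String
type PoolId = String
type Lovelace = Integer

-- The ledger state components listed above, minus the UTxO (hypothetical record).
data LedgerSummary = LedgerSummary
  { poolStake      :: Map PoolId Lovelace        -- ADA delegated to each stake pool
  , delegations    :: Map StakeAddress PoolId    -- pool each stake address delegates to
  , rewardBalances :: Map StakeAddress Lovelace  -- reward account balance per stake address
  , protocolParams :: Map String Integer         -- heavily simplified protocol parameters
  } deriving (Eq, Show)

-- Look up the reward balance of a stake address, defaulting to zero.
rewardBalance :: StakeAddress -> LedgerSummary -> Lovelace
rewardBalance addr = Map.findWithDefault 0 addr . rewardBalances
```

Even in this simplified form, three of the four fields are maps keyed by stake address or pool, which hints at why keeping a second copy outside the node is expensive.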

This means that when `node` and `db-sync` are run on the same machine (which is the recommended
configuration) that machine has two basically identical copies of ledger state. Ledger state is a
Member

I think there's little use-case for the UTxO state or the protocol parameters because these are straightforward to obtain from the chain-sync protocol itself.

However, anything that regards rewards is indeed a pain in the *** to work with; because rewards are implicit in the protocol. I'd love to see a quick justification here (instead of "basically" 😅 ...) stating just that. There's no way for a client application to provide reliable information on rewards without maintaining the entire ledger state; for one needs to know what everyone owns in order to calculate rewards snapshots on each era.

Author

I am pretty sure that the protocol parameters are not available on chain, only the voting for the changes. Yes, the ledger rules can be applied to whatever voting is seen on chain but that is way more fragile than getting the parameters from the source.

Agreed that nobody in their right mind would want the UTxO state.

Contributor

the protocol parameters are available on chain, NewEpochState -> EpochState -> PParams.

Author

It would be really nice to have a function to pull the protocol parameters out of ledger state so that I did not have to go digging around inside it.

Contributor

you're in luck, I have one right here: `nesEs . esPp`
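
For readers unfamiliar with the path mentioned in this thread (NewEpochState -> EpochState -> PParams), here is a minimal sketch using toy stand-in records. The three types below are assumptions for illustration only: the real ledger records carry many more fields and an era parameter, and `minFeeA` is used here merely as an example field. Note that with plain record accessors, function composition applies right to left, so the path is written `esPp . nesEs`.

```haskell
-- Toy stand-ins for the ledger types named in the thread (not the real definitions).
newtype PParams = PParams { minFeeA :: Integer } deriving (Eq, Show)

newtype EpochState = EpochState { esPp :: PParams } deriving (Eq, Show)

newtype NewEpochState = NewEpochState { nesEs :: EpochState } deriving (Eq, Show)

-- Follow the path NewEpochState -> EpochState -> PParams.
-- Composition reads right to left, so nesEs is applied first, then esPp.
currentPParams :: NewEpochState -> PParams
currentPParams = esPp . nesEs
```

A named wrapper like the hypothetical `currentPParams` is the kind of small, stable entry point the Author asks for below, as opposed to callers composing internal accessors themselves.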

Author: @erikd, Dec 1, 2022

I would call that "digging into the internals".

I am asking for an officially maintained function that is part of an official API.

Contributor

The lack of a quality API is indeed an issue, but a much bigger issue, imo, is the need to replicate resources outside the node, especially for ledger state data. An API cannot help there; this requires improvements at the protocol level. API and protocol are two different topics.

*HUGE* data structure and the mainnet version currently consumes over 10 Gigabytes of memory.
Furthermore, maintaining ledger state duplicates code functionality that is in `ouroboros-consensus`

An alternative solution that won't solve the duplicate use of RAM but would solve the duplicate use of code would be to pack consensus + ledger into a "Slim node" that works in read-only mode with an existing ChainDB, that would be packaged and built along side the node.


Then db-sync or any client could use that "ghost node" to replay blocks from any point in time, generate events, query ledger state and what not. I am even surprised such a tool does not already exist: Is this not somewhat akin to what db-analyser does?

Contributor

Can't this idea be extended even more to solve the duplicate RAM problem? What if db-sync maintained its own ChainDB, connected to peers itself, ran ouroboros etc? That way it doesn't need a separate trusted node process, there is no replication, no need for protocol extension, deployment is greatly simplified etc. At the same time this adds no complexity to the node that @dcoutts understandably wants to avoid, as this complexity will live in the db-sync repository.


Actually, this is an idea that popped into my head recently: Make the full node's logic available as a library instead of only as an executable. Then db-sync could just embed the node's logic, and provided there's some way to plug "decorators" here and there, could just tap on node processing stuff, or poke at the current ledger state when it needs, etc...

Member

Another alternative -- and straightforward -- solution would be to simply "dump" the information required for the calculation of implicit stuff (such as rewards) into some file as the node processes it, and then either document the format of that file or provide a thin API layer to query its content.

Having just that would solve many problems down the line; client applications will almost always be connected to a local node and that node will have to do this work anyway. So instead of duplicating that work, simply record the information when it's done and give a way to access that information later on. It could be a mere .csv or .json file recording the stake distribution on each epoch transition.

Author

Oh, ok, so you would not need to run a node on a machine running db-sync at all. Not sure how this compares to my proposal in terms of amount of work or complexity.


It's probably (much) more work so I'd not consider this as an alternative for this particular CIP

Author: @erikd, Dec 2, 2022

Funny, in the last company I worked for, everything was written as a library first (makes testing much easier) and then a thin wrapper was added to make it into an executable.

Contributor

> Then db-sync or any client could use that "ghost node" to replay blocks from any point in time, generate events, query ledger state and what not. I am even surprised such a tool does not already exist: Is this not somewhat akin to what db-analyser does?

Is this similar to what Santiago from TxPipe proposed a while back called Dolos, a Data Node?


I don't know, I wasn't there a while back ;) But I guess this idea is pretty much obvious and not novel so I am pretty sure other people have thought of it.

and the maintenance of ledger state has been the cause of about 90% of the bugs in `db-sync` over
the last two years. The maintenance of ledger state also makes updating, running and maintaining
`db-sync` by operators more difficult than it should be. Finally, if `db-sync` did not have to
maintain ledger state, the size of the `db-sync` code base would probably decrease by about 50% and
the bits removed are some of the most complicated parts.


## Specification

The proposed solution is an enhanced local chain sync protocol that would only be served over a
local domain socket. The enhanced chain sync protocol would include the existing block chain
data as well as events containing the ledger data that db-sync needs. This enhanced local chain
Member

This has always been said to be "impossible" because the ledger does not record past events and wouldn't be able to replay any of those events that are too far in the past.

Wouldn't it make sense to start by recording the events somewhere?

Author

That is effectively part of this proposal.

sync protocol is useful for many applications other than just db-sync.

Smart contract developers would like an application that turns block chain and ledger state changes
into an event stream. With this enhanced local chain sync protocol, generating an easily consumable
event stream simply requires a conversion of the binary enhanced local chain sync protocol into
JSON.
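
As a hedged illustration of that conversion step, the sketch below renders one already-decoded event as a single-line JSON object. The `RewardEvent` record, its field names and the JSON key names are all assumptions made for this example; the binary encoding of the enhanced protocol is not specified in this proposal, so decoding is left out entirely.

```haskell
import Data.List (intercalate)

-- Hypothetical decoded event; the real wire format is not specified here.
data RewardEvent = RewardEvent
  { epochNo      :: Integer
  , stakeAddress :: String   -- assumed plain ASCII (e.g. bech32), so no JSON escaping is done
  , lovelace     :: Integer
  } deriving (Eq, Show)

-- Render one key/value pair; the value must already be valid JSON text.
field :: String -> String -> String
field key value = "\"" ++ key ++ "\":" ++ value

-- Render an event as a single-line JSON object.
toJson :: RewardEvent -> String
toJson ev =
  "{" ++ intercalate ","
    [ field "epoch" (show (epochNo ev))
    , field "stakeAddress" ("\"" ++ stakeAddress ev ++ "\"")
    , field "lovelace" (show (lovelace ev))
    ] ++ "}"
```

A real converter would more likely use an established JSON library with proper string escaping; the hand-rolled rendering here only keeps the example dependency-free.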

This enhanced local chain sync protocol is basically the data that would be provided by the
proposed Scientia program (which as far as I am aware has been dropped).
Member

What is this "Scientia program" about?

Author: @erikd, Dec 1, 2022

It was a proposal put forward by John Woods during his temporary tenure at IOG.


@KtorZ The output document isn't public, but it was a user research report on which chain indexers exist, which ones people use and why, their pros and cons, etc.

Member

If it isn't public, I don't see why it is mentioned here 👍


The ledger state data that would be provided over the extended chain sync protocol is limited to:

* Per epoch stake distribution (trickle fed in a deterministic manner).
Contributor

Should we specify that the user may wish to request one of the three (labeled mark, set, go) snapshots? Is the live, up-to-date distribution ever needed (though note that this one changes block by block)?

Contributor

I think yes. It is very useful to request only 1 of mark, set, go. @papacarp at pooltool needs mark. At dripdropz, I need just the go snapshot.

Contributor

db-sync extracts slices of ~2000 per block from the set snapshot starting from the epoch boundary. The size may be bigger based on the expected number of blocks in the epoch.

https://github.com/input-output-hk/cardano-db-sync/blob/master/cardano-db-sync/src/Cardano/DbSync/Era/Shelley/Generic/StakeDist.hs#L44

Author

For the sake of simplicity, I really think this should be push only from the node to the consumer.

The live up-to-date distribution is not currently needed by db-sync.

As for which one (and making sure I have this correct, mark is the stake distributions that will be used in current_epoch + 2, set in current_epoch + 1 and set in the current_epoch). My understanding is that set is not guaranteed stable in the first part of an epoch, but I would suggest this is the one we want, but only as soon as it is stable.


We need both mark and set in mithril for example. Different use cases will require different sets so providing all of them would just make things simpler because AFAIK they are maintained together in the Ledger?

Contributor

> As for which one (and making sure I have this correct, mark is the stake distributions that will be used in current_epoch + 2, set in current_epoch + 1 and set in the current_epoch). My understanding is that set is not guaranteed stable in the first part of an epoch, but I would suggest this is the one we want, but only as soon as it is stable.

Slight correction here:
mark = current_epoch + 1 (this is next epoch's stake distribution)
set = current_epoch (this is the stake distribution being used CURRENTLY to make blocks)
go = current_epoch - 1 (this is used for reward calculations)

mark snapshot would not be guaranteed until after the stability window I presume but it would be good to pull whenever the consumer wants as each application will have different accuracy requirements.
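
The corrected mapping above can be written down as a small lookup. This is only a sketch of the relationship described in this comment; the `Snapshot` type and the function names are made up for the example and are not taken from the ledger code.

```haskell
-- The three stake distribution snapshots discussed in this thread (illustrative type).
data Snapshot = Mark | Set | Go
  deriving (Eq, Show, Enum, Bounded)

-- Offset, relative to the current epoch, of the epoch a snapshot applies to.
epochOffset :: Snapshot -> Integer
epochOffset Mark = 1    -- next epoch's stake distribution
epochOffset Set  = 0    -- currently used to make blocks
epochOffset Go   = -1   -- used for reward calculations

-- Epoch that a snapshot applies to, given the current epoch number.
appliesToEpoch :: Integer -> Snapshot -> Integer
appliesToEpoch currentEpoch snapshot = currentEpoch + epochOffset snapshot
```

For example, with the current epoch at 100, `map (appliesToEpoch 100) [minBound .. maxBound]` walks the snapshots in the order mark, set, go.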

Author: @erikd, Nov 17, 2022

Sorry, @papacarp, about half the time I look at the mark|set|go terminology I get it wrong :/.

* Per epoch rewards (trickle fed in a deterministic manner).
Contributor

Should we specify an event on the epoch boundary that lists the stake credentials which appeared in the reward calculation, but were de-registered when the rewards were given out?

Contributor

I'm not sure on this one. @erikd would know if db-sync needs it in that format.

Contributor

> Should we specify an event on the epoch boundary that lists the stake credentials which appeared in the reward calculation, but were de-registered when the rewards were given out?

I believe this exists already. It's called RestrainedRewards

Author

Yes, there currently is an event like this and we need to keep it.

* Per epoch protocol parameters (tiny so provided in a single event).
* Per epoch reaped pool list (single event).
* Per epoch MIR distribution (single event).
* Per epoch pool deposit refunds (single event).

The small bits of data above are sent as single per epoch events. Large bits of data like the epoch
stake distribution map (which can have millions of entries) are trickle fed as they are calculated
by the ledger. The feed is deterministic in that, given the same ledger state LS and the same block B,
the events generated will always be identical. This is so that if the consumer is stopped and
restarted and needs to roll back a block or two, the replay will be identical.
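
To make the event list and the determinism requirement more concrete, here is a minimal sketch under stated assumptions. The `LedgerEvent` type, its constructor names, the payload shapes and the fixed slice size are all hypothetical; the proposal does not define a concrete event schema or say how the ledger would slice large maps. The sketch only shows one way a large map can be split so that the same input always yields the same sequence of events.

```haskell
import qualified Data.Map.Strict as Map
import Data.Map.Strict (Map)

-- Illustrative stand-ins; the real ledger uses much richer types.
type StakeAddress = String
type PoolId = String
type Lovelace = Integer

-- Hypothetical event type covering the per epoch data listed above.
data LedgerEvent
  = StakeDistSlice Int [(StakeAddress, Lovelace)]  -- slice index and entries
  | RewardsSlice Int [(StakeAddress, Lovelace)]    -- slice index and entries
  | ProtocolParams [(String, Integer)]             -- single event
  | ReapedPools [PoolId]                           -- single event
  | MirDistribution [(StakeAddress, Lovelace)]     -- single event
  | PoolDepositRefunds [(StakeAddress, Lovelace)]  -- single event
  deriving (Eq, Show)

-- Split a list into chunks of at most n elements.
chunksOf :: Int -> [a] -> [[a]]
chunksOf n xs
  | n <= 0 = error "chunksOf: chunk size must be positive"
  | null xs = []
  | otherwise = let (h, t) = splitAt n xs in h : chunksOf n t

-- Trickle feed: Map.toAscList yields entries in ascending key order, so the
-- same map and slice size always produce the same sequence of events.
stakeDistEvents :: Int -> Map StakeAddress Lovelace -> [LedgerEvent]
stakeDistEvents sliceSize dist =
  zipWith StakeDistSlice [0 ..] (chunksOf sliceSize (Map.toAscList dist))
```

Because the slices depend only on the map contents and the slice size, a consumer that rolls back and replays would see identical events, which is the property the paragraph above asks for.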


## Rationale

The recommended configuration for `db-sync` is to run it on the same machine as the `node`.
Currently this means that there are two copies of the *HUGE* ledger state data structure (each being
at least 10G in size) on the machine. In addition, `db-sync` and other applications only need about
1% of that data. The rest is


## Test Cases



## Implementations

A possible implementation first step would be to write a prototype funnelling existing events (or only one of them for simplicity's sake because I assume there would be a need to implement serialisation) through the consensus code down to the network stack, in order to properly understand the impact of this change.



## Copyright

This CIP is licensed under [CC-BY-4.0](https://creativecommons.org/licenses/by/4.0/legalcode)