soroban-rpc: Add in-memory events storage #355
Conversation
CREATE INDEX ledger_entries_key ON ledger_entries(key);

CREATE TABLE ledger_close_meta (
I don't see why both the ledger entries and the events need to live in the same database; they have quite distinct and unrelated characteristics and are never accessed together.
I'm concerned that placing both in the same database would result in database-level locking between the two that isn't needed.
They are not accessed together, but they are updated together using the same stream of ledgers coming from captive core. It's better if the two tables are in the same database because we can ensure both are updated within the same transaction. Otherwise, if one table is behind the other, it will complicate the ingestion code to ensure both are synced to the same ledger.
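A minimal sketch of that single-transaction guarantee with database/sql (the entry and meta column names are assumptions for illustration, not necessarily this PR's actual schema):

```go
package ingest

import "database/sql"

// Both tables advance to the same ledger inside one transaction, so a crash
// can never leave one table ahead of the other. Illustrative only.
func ingestLedger(db *sql.DB, seq uint32, entryKey, entryXDR, ledgerMetaXDR string) error {
	tx, err := db.Begin()
	if err != nil {
		return err
	}
	defer tx.Rollback() // harmless no-op after a successful Commit

	if _, err := tx.Exec(
		"INSERT OR REPLACE INTO ledger_entries (key, entry) VALUES (?, ?)",
		entryKey, entryXDR,
	); err != nil {
		return err
	}
	if _, err := tx.Exec(
		"INSERT INTO ledger_close_meta (sequence, meta) VALUES (?, ?)",
		seq, ledgerMetaXDR,
	); err != nil {
		return err
	}
	return tx.Commit()
}
```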
I do like the fact that we can commit the entire ledger changes in a single commit. In this particular case, I don't believe it really gives us much advantage. The events are only being "added" and don't really require any synchronization with the rest of the data. You have already noticed that the current ledger data tends to become very large. That's because we have lots of writes/deletes. In order to avoid that, we need to perform periodic full vacuuming.
Unfortunately, performing a full vacuum on a large database would take a long time, which is why it's advisable to break out the tables that have a different insert/delete profile into separate database files.
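For reference, the kind of periodic maintenance being described, sketched with database/sql (illustrative, not code from this PR):

```go
package maintenance

import (
	"context"
	"database/sql"
	"log"
	"time"
)

// VACUUM rewrites the whole database file, so it gets slower as unrelated
// tables grow; that is the motivation for splitting tables with different
// insert/delete profiles into separate files.
func vacuumPeriodically(ctx context.Context, db *sql.DB, every time.Duration) {
	ticker := time.NewTicker(every)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			if _, err := db.ExecContext(ctx, "VACUUM"); err != nil {
				log.Printf("vacuum failed: %v", err)
			}
		}
	}
}
```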
You're right that the ledgers (from which we derive the events) don't need to be synchronized with the ledger entries for any of our SQL queries. My point was that it would require more code to manage the ingestion.
If we cannot assume that the ledger entries and ledgers tables are synchronized to the same ledger sequence, then we need to catch up whichever table is behind and then ingest into both tables ledger by ledger. I was thinking that at this stage of soroban-rpc we could avoid that complexity. Would you be OK if I created an issue for this concern so we can decide to implement it at a later point?
That sounds like the right thing to do. It would allow us to release this as a working prototype while we improve on the implementation.
reader.Rewind()
It seems wrong that we need to rewind and re-process. But maybe I'm not understanding the code flow here correctly.
I've made several comments. I'd be happy to review them together if any questions come up!
	continue
}
for eventIndex, opEvent := range op.Events {
	events = append(events, event{
Is it worthwhile to offer an external config to be used here as event topic filters? Since this is crunching events for the whole network, maybe RPC hosting use cases will only be interested in events for their contract or known topics, etc.
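A hypothetical sketch of such a filter, assuming the ingested events are xdr.ContractEvent values; neither the eventFilter type nor its configuration exists in this PR:

```go
package ingest

import "github.com/stellar/go/xdr"

// eventFilter keeps only events from configured contracts; an empty set
// means "keep everything" (hypothetical, for illustration only).
type eventFilter struct {
	contractIDs map[xdr.Hash]bool
}

func (f eventFilter) matches(ev xdr.ContractEvent) bool {
	if len(f.contractIDs) == 0 {
		return true
	}
	return ev.ContractId != nil && f.contractIDs[*ev.ContractId]
}
```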
events *events.MemoryStore
retentionWindow uint32
Adding events processing to ledgerentry_storage.go doesn't feel right, both because of the naming and because it complicates the logic.
If possible, I would separate the event ingestion from ledgerentry_storage.go. We can have a common daemon consuming CloseMeta entries, which are passed to ledgerentry_storage/ledgerentry_storage.go and, say, events_storage/events_storage.go.
Something similar to (but probably more lightweight than) the Horizon processors would do.
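Something along these lines, with illustrative names (not actual soroban-rpc code): a single loop consumes LedgerCloseMeta and hands each ledger to independent consumers such as a ledger-entry store and an event store.

```go
package ingest

import (
	"context"

	"github.com/stellar/go/xdr"
)

// ledgerConsumer is an illustrative interface each storage backend could implement.
type ledgerConsumer interface {
	IngestLedger(meta xdr.LedgerCloseMeta) error
}

// runIngestion fans each closed ledger out to every consumer in order.
func runIngestion(ctx context.Context, ledgers <-chan xdr.LedgerCloseMeta, consumers ...ledgerConsumer) error {
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case meta, ok := <-ledgers:
			if !ok {
				return nil
			}
			for _, c := range consumers {
				if err := c.IngestLedger(meta); err != nil {
					return err
				}
			}
		}
	}
}
```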
TrimLedgers(retentionWindow uint32) error
InsertLedger(ledger xdr.LedgerCloseMeta) error
I would move this into a separate interface since this is for Ledger entries
I have a similar request: can we avoid calling these "Ledgers"?
i.e.
ApplyLedgerEventsRetentionWindow(retentionWindow uint32) error
InsertLedgerEvents(ledger xdr.LedgerCloseMeta) error
@tsachiherman we are actually storing the entire xdr.LedgerCloseMeta in the db. We don't store events in the db; we only keep events in memory. The reason is that I remember some discussion of eventually exposing an HTTP endpoint to serve ledgers in soroban-rpc, and that this endpoint could be used by other services/clients instead of captive core.
I think that mixing up the Ledger Entry and Events logic into the same files makes the code complicated. I would change the file hierarchy to decouple them and do the processing of each separately.
}

// GetLedger fetches a single ledger from the db.
func (s *sqlDB) GetLedger(sequence uint32) (xdr.LedgerCloseMeta, bool, error) {
Before the SQL gets more verbose, is it worth considering a type-safe, compile-time binding like go-jet?
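For context, a hand-written database/sql version of such a lookup looks roughly like this (a sketch with assumed column names, not the PR's exact code); go-jet's pitch is to replace the SQL strings and manual Scan calls with query builders generated from the schema and checked at compile time.

```go
package storage

import (
	"database/sql"

	"github.com/stellar/go/xdr"
)

// getLedger is a sketch: the sequence/meta column names and base64-encoded
// XDR storage are assumptions for illustration.
func getLedger(db *sql.DB, sequence uint32) (xdr.LedgerCloseMeta, bool, error) {
	var metaB64 string
	err := db.QueryRow(
		"SELECT meta FROM ledger_close_meta WHERE sequence = ?", sequence,
	).Scan(&metaB64)
	if err == sql.ErrNoRows {
		return xdr.LedgerCloseMeta{}, false, nil
	}
	if err != nil {
		return xdr.LedgerCloseMeta{}, false, err
	}
	var result xdr.LedgerCloseMeta
	if err := xdr.SafeUnmarshalBase64(metaB64, &result); err != nil {
		return xdr.LedgerCloseMeta{}, false, err
	}
	return result, true, nil
}
```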
Is this closed in favor of #361, or still relevant?
@paulbellamy @tsachiherman I will close this PR to minimize confusion and spin up new PRs for the remaining work.
Add in-memory events db with a retention window specified in number of ledgers. For example, the event store can be configured to have a retention window of 17280 ledgers, which corresponds to approximately 24 hours assuming an average ledger close time of 5 seconds.
The event store is implemented using a circular buffer.
Close stellar/go#4718
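A simplified sketch of how such a per-ledger circular buffer can work (the actual MemoryStore in this PR may differ in detail):

```go
package events

// event stands in for whatever per-event record the store keeps
// (topics, value, transaction/operation indexes, etc.).
type event struct{}

// bucket holds all events emitted in a single ledger.
type bucket struct {
	ledgerSeq uint32
	events    []event
}

// MemoryStore keeps the last retentionWindow ledgers of events in a ring.
type MemoryStore struct {
	retentionWindow uint32   // e.g. 17280 ledgers ~ 24h at 5s per ledger
	buckets         []bucket // grows until it reaches retentionWindow
	start           int      // index of the oldest bucket once the ring is full
}

// IngestLedger records one ledger's events, evicting the oldest ledger's
// bucket once the retention window is full.
func (m *MemoryStore) IngestLedger(seq uint32, evs []event) {
	b := bucket{ledgerSeq: seq, events: evs}
	if uint32(len(m.buckets)) < m.retentionWindow {
		m.buckets = append(m.buckets, b)
		return
	}
	m.buckets[m.start] = b
	m.start = (m.start + 1) % len(m.buckets)
}
```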