-
Notifications
You must be signed in to change notification settings - Fork 501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add LedgerEntryChangeCache #2004
Add LedgerEntryChangeCache #2004
Conversation
case xdr.LedgerEntryTypeData: | ||
rowsAffected, err = c.HistoryQ.RemoveAccountData(entry.LedgerKey().MustData()) | ||
case xdr.LedgerEntryTypeOffer: | ||
rowsAffected, err = c.HistoryQ.RemoveOffer(entry.LedgerKey().MustOffer().OfferId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, we could build BatchRemoveBuilder
that would remove rows in batches (DELETE ... WHERE ... IN (...)
). This could be helpful when removing many offers after a trade.
services/horizon/internal/expingest/ledger_entry_change_cache.go
Outdated
Show resolved
Hide resolved
services/horizon/internal/expingest/ledger_entry_change_cache.go
Outdated
Show resolved
Hide resolved
approach looks good |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Going in a good direction, I had a few comments/questions mostly for my understanding.
services/horizon/internal/expingest/ledger_entry_change_cache.go
Outdated
Show resolved
Hide resolved
services/horizon/internal/expingest/ledger_entry_change_cache.go
Outdated
Show resolved
Hide resolved
services/horizon/internal/expingest/ledger_entry_change_cache.go
Outdated
Show resolved
Hide resolved
services/horizon/internal/expingest/ledger_entry_change_cache.go
Outdated
Show resolved
Hide resolved
services/horizon/internal/expingest/ledger_entry_change_cache.go
Outdated
Show resolved
Hide resolved
services/horizon/internal/expingest/ledger_entry_change_cache.go
Outdated
Show resolved
Hide resolved
// If existing type is removed it means that this entry does exist | ||
// in a DB so we update entry change. | ||
c.cache[ledgerKeyString] = xdr.LedgerEntryChange{ | ||
Type: xdr.LedgerEntryChangeTypeLedgerEntryUpdated, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be LedgerEntryChangeTypeLedgerEntryCreated
since it is being added for the first time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If so, also change comment above to in the DB so we add entry change
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The case here is:
- Entry exists in the previous ledger.
- It was removed in the current ledger (thus
removed
in the cache). - It is now created so given it exists in the previous ledger we need to
update
it instead ofcreate
it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I think the confusing part about this code is that the previous ledger has a state, and so does the cache (representing current ledger), and they can of course be different. Is there a way to make it clearer which state we're dealing with?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added some updates to the code that should make it more clear.
// queries sent to a DB to update the current state of the ledger. | ||
// It has integrity checks built in so ex. removing an account that was | ||
// previously removed returns an error. In such case verify.StateError is | ||
// returned. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if we could add a small section explaining the possible state transitions here. Something like:
Existing Type Add Update Remove
EntryCreated X entry exists, can update entry exists, can remove
EntryUpdated X entry exists, can update entry exists, can remove
EntryRemoved no entry, can add X X
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The rationale is to try and help with understanding when we're talking about previous vs current ledger (in the form of the cache). Also because the logic is currently distributed through several functions, and it depends on the XDR state transitions of stellar core, so the truth table is hard to verify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Expanded the comment to explain exactly how the algorithm looks like.
ledgerKeyString, | ||
)) | ||
case xdr.LedgerEntryChangeTypeLedgerEntryRemoved: | ||
// If existing type is removed it means that this entry does exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// If existing type is removed it means that this entry does exist | |
// If existing type is removed it means that this entry does not exist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current comment is correct. We want to add create
change to the cache and existing change in the cache is removed
. This means that the entry exists in the DB/previous ledger (probably received removed
before this change).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. I'm sure you're right but this is very confusing. :/ Is there a way to make it clearer?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check the latest code and let me know if it's still confusing.
Co-Authored-By: Eric Saunders <[email protected]>
I checked performance of this solution by replaying ingestion of 26988927-26989439 ledgers range in pubnet (when DRA spam happened). Before we explore the data it's important to understand that the test was performed on my dev machine so both Horizon and it's DB were on the same server. This means that connection round trip time was minimal. The actual savings this PR introduces are in the round trip times because it makes much smaller number of DB queries. Full data can be found in this spreadsheet. Observations:
I believe results will be much better in staging where the DB is on a different server. |
This looks very promising, and reinforces the rationale for doing this. I think we should merge. After that, It would be great to get an unstable version running on the staging server to see real-world numbers with the remote DB server. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! Two optional suggestions.
services/horizon/internal/expingest/processors/database_processor.go
Outdated
Show resolved
Hide resolved
This commit changes `DatabaseProcessor` to insert/update (upsert) accounts and trust lines in batches. In #2004 we added `LedgerEntryChangeCache` that aims to decrease number of DB updates: all the changes connected to a single ledger entry are squashed into just one DB query. Even though it gives a nice performance boost when large number of ops change a small number of ledger entries, it turns out this is not enough. When many ledger entries are changed in a new ledger, DB connection round trip time takes significant percentage of time in overall ledger processing time. The SQL query was borrowed [1] from stellar-core. Known limitations: * This code shouldn't be the final code that is released. The aim of this PR is to deploy it to stg environment to check it's performance. * This is adding upsert queries for accounts and trust lines only. These types are the most common changes in the recent ledgers. The final solution should add batch upsert and batch delete for all ledger entries. * There's a potential to improve the code: autogenerating upsert query for any ledger entry type, better tests. Will be done in a separate PR. * It removes a code that check a number of rows affected by a query. Unfortunately, the performance of the current solution is too bad to keep it. [1] https://github.com/stellar/stellar-core/blob/21469f90da1eacc6845017a520e179afb3772e65/src/ledger/LedgerTxnAccountSQL.cpp#L306-L336
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
Adds
exp/ingest/io.LedgerEntryChangeCache
that squashes all the ledger entry changes. This can be later used to decrease number of DB queries when applying them. See #2003.Close #2003.
Why
Some ledgers that add a lot of changes connected to a small set of entries are causing a performance issues because every ledger entry change is applied to a DB.
LedgerEntryChangeCache
solves this problem because it makes holds a final version of a ledger entry after all the changes.Before this fix, extreme cases when two accounts send a payment between each other 1000 times in a ledger required 3000 DB updates (2000 account changes due to payment and 500 fee meta per account). After the fix, it requires just 2 DB updates.
Algorithm used in
LedgerEntryChangeCache
is explained in the comment: