-
Notifications
You must be signed in to change notification settings - Fork 970
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid persisting redundant tx sets #3501
Conversation
note: I realized some functionality is currently missing from the PR, so please don't review it yet |
0a06b7c
to
a8fc982
Compare
This is up-to-date now. Note that protocol-next doesn't build at the moment because we need to update XDR upsteam. I'll do it after a round of review though, just in case XDR needs to change. |
src/main/PersistentState.cpp
Outdated
@@ -20,18 +21,52 @@ using namespace std; | |||
std::string PersistentState::mapping[kLastEntry] = { | |||
"lastclosedledger", "historyarchivestate", "lastscpdata", | |||
"databaseschema", "networkpassphrase", "ledgerupgrades", | |||
"rebuildledger", "lastscpdataxdr"}; | |||
"rebuildledger", "lastscpdataxdr", "txset"}; | |||
|
|||
std::string PersistentState::kSQLCreateStatement = | |||
"CREATE TABLE IF NOT EXISTS storestate (" | |||
"statename CHARACTER(32) PRIMARY KEY," |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, it looks like sqlite (which is what we use for tests) doesn't actually enforce this limit, but it's now too small because we reference tx sets by hash. I got a failure on Postgres though, so am currently fixing this.
UPD: this is fixed now (added extra tests for Postgres as well)
src/herder/TxSetFrame.h
Outdated
@@ -57,6 +58,9 @@ class TxSetFrame : public NonMovableOrCopyable | |||
makeFromWire(Hash const& networkID, | |||
GeneralizedTransactionSet const& xdrTxSet); | |||
|
|||
static TxSetFrameConstPtr | |||
getTxSetFramePtr(StoredTransactionSet const& storedSet, Application& app); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is also a factory method, isn't it? I would rather call it makeFromStoredTxSet
for consistency with other factory methods. Also, could you please add a short doc comment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, done.
src/main/PersistentState.cpp
Outdated
{ | ||
auto txSetPtr = TxSetFrame::getTxSetFramePtr(txSet, mApp); | ||
txSets.emplace( | ||
std::pair(txSetPtr->getContentsHash(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: there is no need to use the pair constructor when you're using emplace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, just a habit of writing things down explicitly. Updated (in the other place too)
src/main/ApplicationImpl.cpp
Outdated
@@ -308,6 +308,9 @@ ApplicationImpl::initialize(bool createNewDB, bool forceRebuild) | |||
upgradeToCurrentSchemaAndMaybeRebuildLedger(true, forceRebuild); | |||
} | |||
|
|||
// Subtle: restore persistent state after a potential DB schema upgrade | |||
mPersistentState->restoreState(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to double-check: this should be safe for restarts without upgrades, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, when a node upgrades its version to include this change, it'll do a schema upgrade, which will replace all of the old-style rows with new ones. If there is no upgrade, it means the schema is already up-to-date, which is what restoreState
expects (as it queries PersistentStateV1)
src/herder/HerderImpl.cpp
Outdated
} | ||
txSetsToPersist.emplace(std::pair( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: no need to use pair constructor with emplace
continue; | ||
} | ||
auto msgs = herder.getSCP().getLatestMessagesSend(i); | ||
if (expectedSCPState.has_value()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: if (expectedSCPState)
is sufficient, though I suppose you could use the explicit check for expressivity, so up to you
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kept it as-is if you don't mind: has_value is used pretty commonly in the codebase, and I do like the explicitness of it
c29ea2d
to
862b15c
Compare
src/database/Database.cpp
Outdated
// Sqlite does not enforce size limits | ||
if (!isSqlite()) | ||
{ | ||
PersistentState::upgradeSizeLimit(mApp.getDatabase()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't you make the schema upgrade on sqlite instead of skipping? That way in the future (when we squash sql upgrades) we can use the same code to create tables (and the schema is the same everywhere)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALTER COLUMN doesn't exist in sqlite. Instead, you have to create a new table and copy its contents from the old table. It doesn't really make sense to do this, given that this upgrade is a no-op on sqlite.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh I see, so a sad sqlite limitation.
So maybe update the comment to something a little more specific: "sqlite does not enforce size constraints and does not support updating columns".
Also, you should just move this to upgradeSCPDataV1Format
: there is no reason to split the schema upgrade into 2 parts.
src/main/PersistentState.cpp
Outdated
xdr::xdr_from_opaque(buffer, scpState); | ||
for (auto const& e : scpState.v1().scpEnvelopes) | ||
{ | ||
for (auto const& hash : getTxSetHashes(e)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably not the right place for this code: it's supposed to match logic that lives right now in HerderImpl. That way we let PersistentState
be a "dumb" key/value store.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved it to Herder.
src/main/PersistentState.cpp
Outdated
} | ||
|
||
void | ||
PersistentState::startTxSetGCTimer() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like having this GC timer setup independently of what Herder is doing (at a minimum this should be managed by Herder).
Do we really need to GC using a timer?
It can be done on startup and every X times we store SCP state or maybe it should be based on the number of TxSets that we have in the database (we can estimate that from herder, doest not need to be exact)?
Maybe we should just reuse the code in HerderPersistence
for all this (there is quite a bit of overlap, but I understand it may not be worth it)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a fan of the suggested approaches, because they both make it harder to reason about what's happening:
- we don't know how many times SCP persistence is called per ledger, so we can't really reason about how often tx sets are cleaned up (same thing with the number of tx sets).
- It makes testing harder as now we have to into account cases where purging triggers at different times depending on consensus.
- I went with the timer, as it seems to be the simplest approach that avoids anything complicated where we have to map Herder's state to the database and try to estimate a good time to do the cleanup.
I suggest we keep the timer approach. I do like the idea of moving it to Herder, this way we don't need extra start/shutdown logic in PersistentState, and it'll be easy to refactor the purging function based on your other suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually you're right, I was thinking that we could get arbitrary number of values from other validators, so we should try to GC more often in some situations, but this is only tracking what the current validator is doing.
src/herder/HerderImpl.cpp
Outdated
} | ||
|
||
void | ||
HerderImpl::restoreSCPState() | ||
{ | ||
ZoneScoped; | ||
|
||
// First, load all known tx sets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code and GC may be able to share code (and we want to GC before loading all those txsets)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought, but ultimately decided to keep them separate: the functions are sufficiently different, so refactoring didn't actually simplify much.
c01b46d
to
be65ec2
Compare
3e7d437
to
f13ddfe
Compare
r+ f13ddfe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah
Resolves #3499