-
Notifications
You must be signed in to change notification settings - Fork 502
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
exp/ingest: Add processor to ingest ledger headers #1949
Conversation
0bf92b2
to
1bbd3ff
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
exp/ingest/io/ledger_reader_test.go
Outdated
t.Fatalf("unexpected error %v", err) | ||
} | ||
backend.AssertExpectations(t) | ||
if got := reader.CloseTime(); got != expectedCloseTime { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tamirms is there a specific reason for not using Assert.Equal
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no, I should have used Assert.Equal
I think I want to go back on the decision to use the same table ( To address this issue we could add an My concern with implementing one of those two solutions is that there might be some negative consequences if the I think the safest solution might be to configure the experimental ingestion system to write to another table When running horizon with both the legacy and experimental ingestion systems enabled, we can have a routine which compares recently inserted rows in I think with this approach we reduce the risk of bugs while also verifying that the ingestion of ledgers in the experimental system is completely consistent with the legacy system. |
I think there should be a code that turns off old ingestion system when exp ingestion is enabled (when it has processors for all of existing historical data types). To ensure that new system works correctly we can reingest entire history and compare DB dumps of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think it's possible to improve the code by moving code to different files.
// LegacyIngestionVersion reflects the latest version of the non-experimental ingestion | ||
// algorithm. As rows are ingested into the horizon database, this version is | ||
// used to tag them. In the future, any breaking changes introduced by a | ||
// developer should be accompanied by an increase in this value. | ||
// | ||
// Scripts, that have yet to be ported to this codebase can then be leveraged | ||
// to re-ingest old data with the new algorithm, providing a seamless | ||
// transition when the ingested data's structure changes. | ||
const LegacyIngestionVersion = 16 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this today, I think we should keep a single version so update expingest.CurrentVersion
to 16.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think using the same version for both ingestion systems makes sense. If we need to bump the experimental ingestion version wouldn't that trigger re-ingeston on the legacy system?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reingestion of historical data in the legacy system is manual (horizon db reingest ...
).
Possible solution here: #1969 |
Forgot about one last thing: The base branch should be 0.25.0 or 0.26.0 whichever is planned to be released next year. |
I don't like this approach because we cannot gradually release the remaining ingestion components. We will need to wait until all the historical data processors are ready and then do a large release which transitions between the old system and the new ingestion system. |
02095e4
to
fea685b
Compare
if we change the processor to write to |
bb8cdac
to
0f6c3ba
Compare
I think we could do something similar to what @bartekn did on the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
services/horizon/internal/db2/schema/migrations/26_exp_history_ledgers.sql
Show resolved
Hide resolved
services/horizon/internal/expingest/processors/database_processor.go
Outdated
Show resolved
Hide resolved
services/horizon/internal/expingest/processors/database_processor.go
Outdated
Show resolved
Hide resolved
Also, use same importer version as the legacy ingestion system
0f6c3ba
to
29805bf
Compare
29805bf
to
8826967
Compare
…sor.go Co-Authored-By: Bartek Nowotarski <[email protected]>
@bartekn I have addressed the code review feedback, please take another look |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just two issues connected to error reporting. The rest of the code LGTM!
return verify.NewStateError(errors.Errorf( | ||
"No rows affected when ingesting new ledger: %v", | ||
r.GetSequence(), | ||
)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similarly to the previous StateError issue we probably shouldn't mark state as invalid because of this.
Also, before the new table contents is used in actions we probably should return errors here because it will block meta ingestion. Instead, we probably need to log it directly like log.WithField("service", "expingest").Error(...)
.
"ledger %v in exp_history_ledgers does not match ledger in history_ledgers", | ||
seq, | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For two return errors above:
Before the new table contents is used in actions we probably should return errors here because it will block meta ingestion. Instead, we probably need to log it directly like log.WithField("service", "expingest").Error(...)
.
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
This PR extends the experimental ingestion system to cover ledger headers. The database processor in the ledger pipeline will populate the
history_ledgers
table. The processor intends to insert the samehistory_ledgers
rows as theservices/horizon/internal/ingest
package.Why
The
/ledgers
and/ledgers/{ledger_id}
endpoints serve data from thehistory_ledgers
table. Currently, the table is populated by the legacy ingestion system. The experimental ingestion system should be populating the table if we want to migrate away from the legacy ingestion system.Note that unlike other experimental ingestion processors, the ledgers processor writes to the same table as the legacy ingestion system. The ledger processor should be interchangeable with the legacy ingestion system in that they both produce identical rows in the
history_ledgers
table. The advantage of this approach is that we don't need to backfill old ledgers when we switch over to the experimental ingestion system.Backfilling old ledgers is a relatively expensive operation because we need to process every single ledger in Stellar's history. With the other experimental ingestion processors, this was not a concern because it was possible to backfill all state using the history archive snapshot for a single ledger checkpoint.
Known limitations
When starting ingestion it is possible that the most recent
history_ledgers
table row is not consistent withhistoryQ.GetLastLedgerExpIngest()
. Handling this problem requires some more thought and will be tackled in a separate PR.The interplay between
ledgerReaderWrapper
andreaderWrapperLedger
is really confusing. I have tried to mimic the style and conventions of the existing codebase but I think we should get rid of these wrapper pipeline objects