recovery scenario #1024

Merged
merged 16 commits into from
Jun 16, 2022

Conversation

shiqizng
Contributor

Summary

This PR implements a recovery mechanism for the local ledger. If the local disk fails and the data directory is lost, the indexer reinitializes the ledger.

Test Plan

Run the daemon manually.

@shiqizng shiqizng changed the base branch from shiqi/migration to shiqi/fastcatchup June 10, 2022 17:06
@@ -133,6 +136,19 @@ var daemonCmd = &cobra.Command{
fmt.Fprint(os.Stderr, "missing indexer data directory")
os.Exit(1)
}

// sync local ledger
Contributor

Not required now, but I think we'll need to figure out a way to put this in an Init function that's part of the block processor.

There is also an edge case we could check to fail fast: if the catchpoint is ahead of nextDBRound we should avoid fast catchup, and probably inform the user that they might want to finish catchup with an earlier version of Indexer.

@shiqizng shiqizng changed the base branch from shiqi/fastcatchup to localledger/integration June 13, 2022 19:54
@shiqizng shiqizng marked this pull request as ready for review June 14, 2022 16:56
@shiqizng shiqizng changed the title [WIP] recovery scenario recovery scenario Jun 14, 2022
@codecov

codecov bot commented Jun 14, 2022

Codecov Report

❗ No coverage uploaded for pull request base (localledger/integration@5a56ad2).
The diff coverage is n/a.

@@                    Coverage Diff                     @@
##             localledger/integration    #1024   +/-   ##
==========================================================
  Coverage                           ?   57.98%           
==========================================================
  Files                              ?       48           
  Lines                              ?     8906           
  Branches                           ?        0           
==========================================================
  Hits                               ?     5164           
  Misses                             ?     3255           
  Partials                           ?      487           

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5a56ad2...d057e1c.

initState, err := util.CreateInitState(genesis, genesisBlock)
if err != nil {
	return nil, fmt.Errorf("MakeProcessor() err: %w", err)
}
l, err := ledger.OpenLedger(logging.NewLogger(), filepath.Join(path.Dir(datadir), "ledger"), false, initState, algodConfig.GetDefaultLocal())
if dbRound != 0 && !ledgerExists(datadir, prefix) {
	msg := fmt.Sprintf("%s\n%s\n%s\n%s\n",
Contributor

Is the added friction of causing an error here worth it? Why not just create the files in sequential migration mode and print a warning to the user saying that there are faster alternatives?
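
The trade-off raised in this comment can be sketched in Go. This is a minimal illustration, not the PR's code: `chooseInit`, `initMode`, and the `strict` flag are hypothetical names, and only the `dbRound != 0 && !ledgerExists(...)` condition is taken from the diff.

```go
package main

import (
	"errors"
	"fmt"
)

// initMode is a hypothetical enum naming how the ledger gets initialized.
type initMode int

const (
	openExisting initMode = iota // ledger files are present: just open them
	freshStart                   // database is empty: nothing to recover
	rebuild                      // database has rounds but the ledger is gone
)

// errLedgerMissing models the strict behavior in the diff: refuse to start.
var errLedgerMissing = errors.New("ledger files missing for a non-empty database")

// chooseInit decides what to do from the two facts the diff checks: the
// database round and whether ledger files exist. With strict=false it follows
// the reviewer's suggestion: rebuild sequentially and only warn.
func chooseInit(dbRound uint64, ledgerPresent, strict bool) (initMode, string, error) {
	switch {
	case ledgerPresent:
		return openExisting, "", nil
	case dbRound == 0:
		return freshStart, "", nil
	case strict:
		return rebuild, "", errLedgerMissing
	default:
		warning := fmt.Sprintf("ledger missing at round %d: rebuilding sequentially, a catchpoint may be faster", dbRound)
		return rebuild, warning, nil
	}
}

func main() {
	_, warning, err := chooseInit(1000, false, false)
	fmt.Println("warning:", warning, "err:", err)
}
```

Returning the warning as a value instead of logging it keeps the decision testable without a logger; the caller chooses how to surface it.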

"ledger.block.sqlite-wal",
"ledger.tracker.sqlite",
"ledger.tracker.sqlite-shm",
"ledger.tracker.sqlite-wal",
Contributor Author

The shm and wal files are also needed; otherwise openLedger in the block processor runs into a disk I/O error.

Contributor

Are you still seeing an I/O error? I wasn't able to reproduce this.
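
As a rough illustration of the file check discussed in this thread, the sketch below reports which expected ledger files are absent. `missingFiles` and `fileExists` are hypothetical helpers (the PR's `ledgerExists` is not shown in full), the `data` directory is a placeholder, and the list contains only the file names visible in the diff above.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// fileExists reports whether path names a regular file.
func fileExists(path string) bool {
	info, err := os.Stat(path)
	return err == nil && info.Mode().IsRegular()
}

// missingFiles returns every name in want for which exists(dir/name) is false.
// The exists callback is injectable so the logic can be tested without a disk.
func missingFiles(dir string, want []string, exists func(string) bool) []string {
	var missing []string
	for _, name := range want {
		if !exists(filepath.Join(dir, name)) {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	// Only the names visible in the diff; the full list is not reproduced here.
	names := []string{
		"ledger.block.sqlite-wal",
		"ledger.tracker.sqlite",
		"ledger.tracker.sqlite-shm",
		"ledger.tracker.sqlite-wal",
	}
	fmt.Println("missing:", missingFiles("data", names, fileExists))
}
```

One thing that may bear on the question above: SQLite in WAL mode typically removes the `-wal` and `-shm` files when the last connection closes cleanly, so whether they are present can depend on how the ledger was last shut down.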

@shiqizng shiqizng self-assigned this Jun 16, 2022
nextDBRound, err := db.GetNextRoundToAccount()
maybeFail(err, "Error getting DB round")
if nextDBRound > 0 {
	if catchpoint != "" {
Contributor

There's another edge case where the catchpoint is > nextDBRound

Contributor Author

We have a check for this in the processor:

    if uint64(l.Latest()) > dbRound {
        return nil, fmt.Errorf("MakeProcessor() err: the ledger cache is ahead of the required round and must be re-initialized")
    }

Maybe also add one in the migration method?

Contributor

This is the opposite check: we need to avoid catching up if the catchpoint is ahead of the desired round (i.e. if you're starting a new node and provide a catchpoint, we shouldn't initialize anything).
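
A minimal Go sketch of the fail-fast check described in this thread, assuming the catchpoint label has the form `<round>#<hash>`. `catchpointRound` and `checkCatchpoint` are hypothetical helpers, not functions from this PR, and the label in `main` is a made-up placeholder.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// catchpointRound extracts the round from a label assumed to look like
// "<round>#<hash>".
func catchpointRound(label string) (uint64, error) {
	roundPart, _, found := strings.Cut(label, "#")
	if !found {
		return 0, fmt.Errorf("invalid catchpoint label %q: missing '#'", label)
	}
	round, err := strconv.ParseUint(roundPart, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("invalid catchpoint round in %q: %w", label, err)
	}
	return round, nil
}

// checkCatchpoint returns an error when the catchpoint round is ahead of the
// next round the database expects, the case in which fast catchup should be
// skipped.
func checkCatchpoint(label string, nextDBRound uint64) error {
	round, err := catchpointRound(label)
	if err != nil {
		return err
	}
	if round > nextDBRound {
		return fmt.Errorf("catchpoint round %d is ahead of next database round %d", round, nextDBRound)
	}
	return nil
}

func main() {
	// "EXAMPLEHASH" is a placeholder, not a real catchpoint hash.
	if err := checkCatchpoint("1000#EXAMPLEHASH", 500); err != nil {
		fmt.Println("skipping fast catchup:", err)
	}
}
```

The comparison uses a strict `>` to match the "catchpoint is > nextDBRound" wording in the review; whether equality should also be rejected is left open here.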

Contributor

@winder winder left a comment

I think we can merge this in and have a follow-up handling the catchpoint > next-round case.

@shiqizng shiqizng merged commit 5a03dc8 into localledger/integration Jun 16, 2022
@winder winder deleted the shiqi/recovery branch June 17, 2022 13:26
Eric-Warehime added a commit that referenced this pull request Jul 21, 2022
* Local Ledger (#1011)

* integrate block processor

* Local Ledger Deployment (#1013)

* add simple local ledger migration

* add deleted opts

* fast catchup (#1023)

* add fast catchup

* Localledger merge (#1036)

* return empty lists from fetchApplications and fetchAppLocalStates (#1010)

* Update model to converge with algod (#1005)

* New Feature: Adds Data Directory Support (#1012)

- Updates the go-algorand submodule hash to point to rel/beta
- Moves the cpu profiling file, pid file and indexer configuration file
  to be options of only the daemon sub-command
- Changes os.Exit() to be a panic with a special handler.  This is so
  that deferred calls are run instead of being skipped.
- Detects auto-loading configuration files in the data directory and
  issues errors if equivalent command line arguments are supplied.
- Updates the README with instructions on how to use the auto-loading
  configuration files and the data directory.
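
The os.Exit() change in the list above follows a common Go pattern, which can be sketched as below. This is an illustration under assumptions: `exitCode` and `catchExit` are hypothetical names, and the actual handler in the commit may be structured differently.

```go
package main

import (
	"fmt"
	"os"
)

// exitCode is a hypothetical panic payload carrying the desired exit status.
type exitCode int

// catchExit runs f and converts an exitCode panic into a return value.
// Deferred calls inside f have already run by the time the panic reaches
// this function. Any other panic is re-raised untouched.
func catchExit(f func()) (code int) {
	defer func() {
		if r := recover(); r != nil {
			c, ok := r.(exitCode)
			if !ok {
				panic(r)
			}
			code = int(c)
		}
	}()
	f()
	return 0
}

func main() {
	code := catchExit(func() {
		// A direct os.Exit(1) here would skip this deferred cleanup.
		defer fmt.Println("cleanup ran")
		panic(exitCode(1))
	})
	// os.Exit is called once, at the very end, after all defers have run.
	os.Exit(code)
}
```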

* Update mockery version

Co-authored-by: erer1243 <[email protected]>
Co-authored-by: AlgoStephenAkiki <[email protected]>

* recovery scenario (#1024)

* handle ledger recovery scenario

* refactor create genesis block (#1026)

* refactor create genesis block

* Adds Local Ledger Readme (#1035)

* Adds Local Ledger Readme

Resolves #4109

Starts Readme docs

* Update docs/LocalLedger.md

Co-authored-by: Will Winder <[email protected]>

* Update docs/LocalLedger.md

Co-authored-by: Will Winder <[email protected]>

* Update docs/LocalLedger.md

Co-authored-by: Will Winder <[email protected]>

* Removed troubleshooting section

Co-authored-by: Will Winder <[email protected]>

* update ledger file path and migration (#1042)

* LocalLedger Refactoring + Catchpoint Service (#1049)

Part 1

    cleanup genesis file access.
    put node catchup into a function that can be swapped out with the catchup service.
    pass the indexer logger into the block processor.
    move open ledger into a util function, and move the initial state util function into a new ledger util file.
    add initial catchupservice implementation.
    move ledger init from daemon.go to constructor.
    Merge multiple read genesis functions.

Part 2

    Merge local_ledger migration package into blockprocessor.
    Rename Migration to Initialize
    Use logger in catchup service catchup

Part 3

    Update submodule and use NewWrappedLogger.
    Make util.CreateInitState private

* build: merge develop into localledger/integration (#1062)

* Ledger init status (#1058)

* Generate an error if the catchpoint is not valid for initialization. (#1075)

* Use main logger in handler and fetcher. (#1077)

* Switch from fullNode catchup to catchpoint catchup service. (#1076)

* Refactor daemon, add more tests (#1039)

Refactors daemon cmd into separate, testable pieces.

* Merge develop into localledger/integration (#1083)

* Misc Local Ledger cleanup (#1086)

* Update processor/blockprocessor/initialize.go

Co-authored-by: Zeph Grunschlag <[email protected]>

* commit

* fix function call args

* RFC-0001: Rfc 0001 impl (#1069)

Adds an Exporter interface and a noop exporter implementation with factory methods for construction
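
For readers unfamiliar with the pattern, an interface with a no-op implementation and name-based factories might look roughly like the Go sketch below. Every name and method signature here (`Exporter`, `Receive`, `constructors`, `newExporter`) is hypothetical and chosen for illustration; the interface actually added by the commit is not shown on this page.

```go
package main

import "fmt"

// Exporter is a hypothetical, minimal stand-in for an exporter interface.
type Exporter interface {
	Name() string
	Receive(round uint64) error
}

// noopExporter accepts every round and does nothing with it.
type noopExporter struct{}

func (noopExporter) Name() string         { return "noop" }
func (noopExporter) Receive(uint64) error { return nil }

// constructors maps an exporter name to a factory function.
var constructors = map[string]func() Exporter{
	"noop": func() Exporter { return noopExporter{} },
}

// newExporter builds the exporter registered under name, or returns an error
// if no factory is registered.
func newExporter(name string) (Exporter, error) {
	ctor, ok := constructors[name]
	if !ok {
		return nil, fmt.Errorf("no exporter registered under %q", name)
	}
	return ctor(), nil
}

func main() {
	exp, err := newExporter("noop")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("using exporter:", exp.Name())
}
```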

* Fix test errors

* Add/fix tests

* Add postgresql_exporter tests

* Update config loading

* Change BlockExportData to pointers

* Move and rename ExportData

* Add Empty func to BlockData

* Add comment

Co-authored-by: shiqizng <[email protected]>
Co-authored-by: [email protected] <[email protected]>
Co-authored-by: erer1243 <[email protected]>
Co-authored-by: AlgoStephenAkiki <[email protected]>
Co-authored-by: Will Winder <[email protected]>
Co-authored-by: Zeph Grunschlag <[email protected]>

3 participants