
/services/horizon/ingest: captive core on-disk ingestion, optimize catchup times #4454

Closed
sreuland opened this issue Jul 11, 2022 · 8 comments · Fixed by #4471

sreuland commented Jul 11, 2022

What problem does your feature solve?

Long ingestion catchup times were observed during Horizon startup when running captive core in on-disk mode for pubnet.

These times were observed in the following environment:
stellar-core 19.0.1
horizon 2.17.1 with an already fully ingested pubnet history db
CPU - 3 cores
32GB ram
500GB disk, 3K IOPS

on-disk pubnet(1h15min for catchup, 1h25min total for latest ledger sync)

20:30:46 stellar-core new-db
20:30:46 stellar-core catchup 41746401/0
21:43:39 stellar-core run
21:45:52 horizon reports "Range prepared duration=4506.688054431 from=41746403"
... ledgers emitted on txmeta, horizon is logging ingested ledgers to history
21:55:10 stellar-core run reports "LM_CATCHING_UP_STATE -> LM_SYNCED_STATE"

on-disk logs

in-memory pubnet(2min total for catchup and latest ledger sync)

20:18:27 stellar-core run --in-memory --start-at-ledger 41746276 --start-at-hash xyz
20:20:44 horizon reports "Range prepared" duration=137.273946913 from=41746277
... ledgers emitted on txmeta, horizon is logging ingested ledgers to history
20:20:54 stellar-core run reports "LM_CATCHING_UP_STATE -> LM_SYNCED_STATE"

in-mem logs

These on-disk times were observed in an environment with a faster SSD disk, for comparison:
stellar-core 19.1.0
horizon 2.18.1 with an empty history db
CPU - 8 cores
16GB ram
500GB disk, 2Gb/s IOPS (avg read/write rate)

on-disk pubnet with faster disk(35min for catchup)

14:09:59 stellar-core new-db
14:09:59 stellar-core catchup 41746749/0
14:45:12 stellar-core run
14:46:44 horizon reports "Range prepared" duration=2206.337728458 from=41746751
... ledgers emitted on txmeta, horizon is repeatedly logging 
Processing ledger entry changes pid=4073 processed_entries=XXX progress="XX.XX%"
...
Let it run for another 45min; it stayed in this pattern of log output and never emitted LM_SYNCED_STATE, so the process was stopped at this point.

ssd on-disk logs

Summary observations:

  • The stellar-core run --in-memory invocation takes a couple of minutes at most to catch up and start emitting the latest ledgers on the tx meta output.
  • The on-disk mode was observed to take much longer to perform catchup: 35 minutes to over an hour.
  • On-disk catchup times vary and depend on the IOPS of the disk where stellar-core's stellar.db is located; the fastest catchup observed so far was 35 minutes on an SSD disk with 2Gb/s IOPS.

What would you like to see?

Identify solutions to address the on-disk ingestion catchup time.

  • The operator has a new optional Horizon command-line flag which can be set to force Horizon to tell captive core to use an existing on-disk core db rather than initializing an empty one at startup. This could result in faster startup times, as syncing is only needed from the youngest ledger in the db. It needs to be validated whether a populated db from a prior core process can be re-used by a new core process, or whether core ignores/wipes it or fails. (A rough sketch of this idea follows this list.)
  • other ideas
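
As a rough illustration of the first bullet, here is a minimal sketch of what that startup decision could look like, assuming a hypothetical --captive-core-reuse-db flag, db path, and helper; none of these are existing Horizon flags or APIs.

// Hypothetical sketch only: flag name, path, and helper are illustrative.
package main

import (
    "flag"
    "fmt"
    "os"
)

var reuseCoreDB = flag.Bool("captive-core-reuse-db", false,
    "reuse an existing captive core on-disk db instead of running new-db at startup")

// coreDBExists reports whether a captive core db file is already present.
func coreDBExists(path string) bool {
    _, err := os.Stat(path)
    return err == nil
}

func main() {
    flag.Parse()

    if *reuseCoreDB && coreDBExists("captive-core/stellar.db") {
        // Skip `stellar-core new-db`; core only needs to sync forward from
        // the youngest ledger already in its db.
        fmt.Println("reusing existing captive core db")
        return
    }
    // Current behaviour: reset the db and replay state from scratch.
    fmt.Println("running stellar-core new-db followed by catchup")
}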

What alternatives are there?

Use the older in-memory configuration for captive core, but that runs into limitations due to the 30+Gb of RAM required; the on-disk mode of captive core is preferred.

sreuland (Contributor, Author) commented:

This feature request came out of a prior discussion on Slack that recognized the issue.

@sreuland sreuland changed the title /services/horizon/ingest: captive core on-disk ingestion, make core db initialization optional /services/horizon/ingest: captive core on-disk ingestion, optimize catchup times Jul 13, 2022
sreuland (Contributor, Author) commented:

@jacekn, I captured ingestion run times of in-memory and on-disk captive core using the k8s horizon-dev horizon-pubnet-ingest-0 pod as the compute environment; the measurements are in the description above. The results show a similar contrast between disk and memory modes to what you reported.


jacekn commented Jul 14, 2022

@sreuland something to note is that with in-memory config core syncs almost instantly on restarts. This was one of the things we spent time optimising for when captive core was first introduced.

@sreuland sreuland self-assigned this Jul 14, 2022
sreuland (Contributor, Author) commented:

@jacekn , thanks for that context.

@bartekn, I wanted to check on ideas. I tested skipping new-db locally by running stellar-core catchup <ledger_no>/0 against an existing db that already has some ledger state and an LCL (last closed ledger). Core will run those invocations, but only if ledger_no is greater than its internal LCL; it does not allow catchup to go backwards in the db. That becomes an issue for the ingesting-Horizon restart scenario: in that mode, stellar-core run is the active process and it updates the LCL to the most recent ledger from the network pretty quickly, while Horizon's ingestion history stays a couple of ledgers behind, since Horizon has to receive each ledger from core's tx meta stream first and run it through the ingestion pipeline. On a restart, Horizon therefore tries to resume ingestion at <last_history_ledger> + 1, which will likely be a sequence behind core's LCL in its db, and then the Horizon ingest state machine is just stuck, as it can't get this captive core instance to rewind the LCL back to ledger sequence <last_history_ledger> + 1 and start emitting tx meta from there. This is why stellar-core new-db is important: it resets core's LCL to genesis (sequence=2), so the next catchup can fast-forward core to any sequence.
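
To make the stuck scenario above concrete, here's a toy sketch of the constraint; the ledger numbers and function name are made up for illustration and this is not Horizon code.

// Toy model of the forward-only catchup constraint described above.
package main

import "fmt"

const genesisLCL = 2 // stellar-core new-db resets the LCL back to genesis

// canFastForward reports whether core can catch up to the requested ledger:
// catchup only moves the LCL forward, it never rewinds.
func canFastForward(coreLCL, requested uint32) bool {
    return requested > coreLCL
}

func main() {
    coreLCL := uint32(41746410)  // `stellar-core run` advanced the LCL quickly
    resumeAt := uint32(41746405) // horizon wants <last_history_ledger> + 1

    if !canFastForward(coreLCL, resumeAt) {
        // Stuck: core can't rewind, so the only way out is new-db, which
        // drops the LCL to genesis so the next catchup can fast-forward.
        fmt.Printf("cannot resume at %d with LCL=%d; new-db resets LCL to %d\n",
            resumeAt, coreLCL, genesisLCL)
    }
}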

Seems like the options are steering towards looking into why stellar-core catchup <ledger_no>/0 on-disk takes so long. It has been observed in tests that the timing is proportional to disk IOPS, but even on 2Gb/s SSD disks I am observing 30+ minutes of catchup time. Maybe there are other sqlite3 features that can be leveraged, like localhost mode, an o/s pipe rather than going over tcp/ip, etc.


bartekn commented Jul 19, 2022

@sreuland great research! So the interesting bit is what you said about Stellar-Core only being able to close ledgers moving forward. This complicates the problem because, currently, Horizon ingestion is behind Stellar-Core in most cases: first because Horizon keeps all new ledgers in a buffer (of 20 ledgers), and second because when Horizon is up to date and a new ledger is closed, Horizon can only ingest that ledger once Stellar-Core has closed it.

I think there's another option that should work in many cases, though it can still require an occasional new-db call. If we remove the internal buffer of ledgers in Captive-Core (so Core will block until Horizon picks up the ledger), it should help in most cases. The only exception is when Horizon crashes during ledger ingestion, but this shouldn't normally happen because a graceful shutdown allows the currently processed ledger to finish ingesting.

The only problem I can see with the solution above is that it can slow down catchup in some cases, because Stellar-Core will have to wait until Horizon ingests the ledgers. However, it shouldn't be a problem in normal operations.
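
A toy sketch of the buffered vs. unbuffered handoff described above, using a plain Go channel; this is illustrative only and is not the actual Captive Core meta pipe code.

// Illustration only: models the core -> Horizon handoff with a Go channel.
package main

import "fmt"

type ledger struct{ seq uint32 }

func main() {
    // Today core can run ahead of Horizon because closed ledgers sit in a
    // buffer (roughly a `make(chan ledger, 20)`), so core's LCL moves past
    // what Horizon has ingested. With an unbuffered handoff the producer
    // blocks until the consumer takes each ledger, keeping core at most
    // one ledger ahead of Horizon.
    pipe := make(chan ledger) // unbuffered: core waits for Horizon

    go func() {
        for seq := uint32(1); seq <= 3; seq++ {
            pipe <- ledger{seq: seq} // blocks until the consumer reads it
        }
        close(pipe)
    }()

    for l := range pipe {
        fmt.Println("ingested ledger", l.seq)
    }
}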


bartekn commented Jul 20, 2022

OK, I think the solution in my previous comment works and if an instance is fully synced we don't even need to remove the buffer! Here's what I did:

  1. I removed the code responsible for removing the storage dir on shutdown, shown below:
                    // We also remove the storage path if there was an error terminating the
                    // process (files can be corrupted).
                    // We remove all files when reingesting to save disk space.
    -               return os.RemoveAll(storagePath)
    +               // return os.RemoveAll(storagePath)
            }
    and started an ingesting instance with CAPTIVE_CORE_USE_DB=true. I waited for it to catch up and then stopped it. Thanks to my code change, the on-disk DB wasn't removed on shutdown.
  2. Then, in stellarCoreRunner.runFrom, I removed the entire if r.useDB ... else ... condition and left:
    	r.cmd = r.createCmd(
    		"run",
    		"--metadata-output-stream", r.getPipeName(),
    	)
    This makes Stellar-Core start closing ledgers from where it left off before shutdown, which is exactly the range Horizon needs.

It works but there are several problems:

  • If Horizon was closed during catchup mode (i.e. when catching up to the latest ledger after some time offline), it's possible that the buffered meta pipe would consume some ledgers, which would make ingestion of those older ledgers impossible. To solve this I think we can remove buffered meta pipe usage outside of reingestion. Usually in such cases Horizon needs to ingest just a small number of ledgers.
  • To simplify things I changed the code between Horizon executions. If we decide to go with this solution, we probably need to implement code that checks whether there's a core DB in the storage dir (and, if possible, checks the latest ledger in this DB) and runs the new flow only if the requested ledger is exactly N+1, where N is the latest ledger in the core DB. (A sketch of this check follows below.)
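
A rough sketch of the N+1 check from the second bullet; the function name is made up, and a real implementation would also verify that the storage dir actually contains a core DB.

// Illustration of the "requested ledger is exactly N+1" check, not the
// actual Horizon implementation.
package main

import "fmt"

// canSkipNewDB reports whether the existing on-disk core DB can be reused:
// the ledger Horizon wants to start from must be exactly one past the DB's
// last closed ledger (LCL), since core can only close ledgers forward.
func canSkipNewDB(coreLCL, startFrom uint32) bool {
    return startFrom == coreLCL+1
}

func main() {
    fmt.Println(canSkipNewDB(41746750, 41746751)) // true: run directly, no reset
    fmt.Println(canSkipNewDB(41746750, 41746760)) // false: new-db + catchup needed
}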


sreuland commented Jul 20, 2022

@bartekn @paulbellamy, per the design chat in the team mtg and @bartekn's second bullet on the design idea, I checked core's offline-info response and it looks like it provides the LCL from the db:

stellar-core --conf stellar-core.conf offline-info

returns json, with:

"info": {
  "ledger" : {
         "age" : 1658336492,
         "baseFee" : 100,
         "baseReserve" : 100000000,
         "closeTime" : 0,
         "hash" : "39c2a3cd4141b2853e70d84601faa44744660334b48f3228e0309342e3f4eb48",
         "maxTxSetSize" : 100,
         "num" : 1,
         "version" : 0
   }
}

I don't recall the exact meaning of LCL; I think it refers to the internal state of what core has last processed rather than the latest network status, or vice versa. If so, perhaps it can be used in Horizon's stellar-core-runner, in conjunction with @bartekn's suggestion, to dynamically figure out whether to run new-db or not.
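
For reference, a minimal sketch of pulling the LCL sequence out of the offline-info JSON shown above; the struct is ad hoc, maps only the fields used here, and is not the actual Horizon implementation.

// Parses the offline-info JSON shape shown above to read the LCL sequence.
package main

import (
    "encoding/json"
    "fmt"
)

type offlineInfo struct {
    Info struct {
        Ledger struct {
            Num  uint32 `json:"num"`
            Hash string `json:"hash"`
        } `json:"ledger"`
    } `json:"info"`
}

func main() {
    // In practice this would be the stdout of:
    //   stellar-core --conf stellar-core.conf offline-info
    raw := []byte(`{"info": {"ledger": {"num": 1, "hash": "39c2a3cd..."}}}`)

    var oi offlineInfo
    if err := json.Unmarshal(raw, &oi); err != nil {
        panic(err)
    }
    // Compare this against <last_history_ledger> + 1 to decide whether
    // new-db is needed before starting core.
    fmt.Println("core LCL:", oi.Info.Ledger.Num)
}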


bartekn commented Jul 21, 2022

PR ready: #4471.

@sreuland sreuland assigned bartekn and unassigned sreuland Jul 22, 2022
bartekn added a commit that referenced this issue Jul 26, 2022
This commit changes the behaviour of `stellarCoreRunner` when using an on-disk
DB in online mode to check whether the existing storage dir contains a DB in a state
that allows Captive Core to start without rebuilding Stellar-Core state. In
short, it checks (using the `stellar-core offline-info` command) whether the LCL of
Stellar-Core matches the requested ledger in `startFrom`.

This was done because, while applying state from buckets was relatively fast in
the in-memory mode of Captive Core, it can be extremely slow when using disk. This
change allows reusing existing state in most cases.

Close #4454.
sreuland pushed a commit to sreuland/go that referenced this issue Aug 7, 2022