/services/horizon/ingest: captive core on-disk ingestion, optimize catchup times #4454
Comments
This feature came from a prior discussion on Slack which recognized the issue
@jacekn, I captured ingestion run times of in-memory and on-disk captive core using the k8s horizon-dev
@sreuland something to note is that with the in-memory config, core syncs almost instantly on restarts. This was one of the things we spent time optimising for when captive core was first introduced.
@jacekn, thanks for that context. @bartekn, I wanted to check on ideas, I tested skipping of Seems like options are steering towards looking into why
@sreuland great research! So the interesting bit is what you said about Stellar-Core only being able to close ledgers moving forward. This complicates the problem because, currently, in most cases Horizon ingestion is behind Stellar-Core. First, because it keeps all new ledgers in a buffer (of 20 ledgers), and second, because when Horizon is up to date and a new ledger is closed, it can only ingest this ledger once Stellar-Core closes it. I think there's another option that should work in many cases, however it can still require occasional The only problem I can see with the solution above is that it can slow catchup in some cases, because Stellar-Core will have to wait until Horizon ingests the ledgers. However, it shouldn't be a problem in normal operations.
OK, I think the solution in my previous comment works and if an instance is fully synced we don't even need to remove the buffer! Here's what I did:
It works but there are several problems:
@bartekn @paulbellamy, per design chat in team mtg, and @bartekn's second bullet on design idea, I checked core's
returns json, with:
I don't recall exact meaning of LCL, I think it refers to the internal state of what core has last processed and not latest network status or vice versa. If so, perhaps it can be used in horizon
PR ready: #4471.
This commit changes the behaviour of `stellarCoreRunner` when using an on-disk DB in online mode: it now checks whether the existing storage dir contains the DB in a state that allows Captive Core to start without rebuilding Stellar-Core state. In short, it checks (by using the `stellar-core offline-info` command) whether the LCL of Stellar-Core matches the requested ledger in `startFrom`. This was done because, while applying state from buckets was relatively fast in the in-memory mode of Captive Core, it can be extremely slow when using disk. This change allows reusing existing state in most cases. Close #4454.
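The check described in this commit message can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the helper name `canReuseStorage`, the `offlineInfo` struct, and the assumed JSON layout (`info.ledger.num` holding the LCL) are hypothetical, and the sketch takes the already-captured output of `stellar-core offline-info` as bytes instead of running the command.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// offlineInfo mirrors only the part of the `stellar-core offline-info`
// output that this sketch needs. The layout (info.ledger.num) is an
// assumption made for illustration.
type offlineInfo struct {
	Info struct {
		Ledger struct {
			Num uint32 `json:"num"`
		} `json:"ledger"`
	} `json:"info"`
}

// canReuseStorage (hypothetical helper) reports whether the LCL found in
// the raw offline-info JSON equals the ledger requested in startFrom.
// A parse error is returned to the caller, who would then be expected to
// fall back to rebuilding the on-disk state.
func canReuseStorage(raw []byte, startFrom uint32) (bool, error) {
	var out offlineInfo
	if err := json.Unmarshal(raw, &out); err != nil {
		return false, fmt.Errorf("parsing offline-info output: %w", err)
	}
	return out.Info.Ledger.Num == startFrom, nil
}
```

A caller would keep the existing storage dir when the function returns true, and wipe it and start from scratch when it returns false or an error.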
What problem does your feature solve?
Long ingestion catchup times from captive core on-disk mode for pubnet observed during horizon start up.
These times were observed on the following environment:
stellar-core 19.0.1
horizon 2.17.1 with an already full ingested history db of pubnet
CPU - 3 cores
32GB ram
500GB disk, 3K IOPS
on-disk pubnet (1h15min for catchup, 1h25min total for latest ledger sync)
on-disk logs
in-memory pubnet(2min total for catchup and latest ledger sync)
in-mem logs
These on-disk times were observed on environment with faster SSD disk for comparison:
stellar-core 19.1.0
horizon 2.18.1 with an empty history db
CPU - 8 cores
16GB ram
500GB disk, 2G/s IOPS(avg rate read/write)
on-disk pubnet with faster disk(35min for catchup)
ssd on-disk logs
Summary observations:
The `stellar-core run --in-memory` invocation takes a couple of minutes at most to catch up and start emitting the latest ledgers on tx meta output.
On-disk mode was observed taking much longer to perform catchup, from 35 minutes to over an hour.
What would you like to see?
Identify solutions to reduce the on-disk ingestion catchup time.
What alternatives are there?
Use the older in-memory configuration for captive core, but that runs into limitations because of the 30+ GB of RAM required; the on-disk mode of captive core is preferred.