Partner Re-Ingestion issue #4319

Closed
jcx120 opened this issue Apr 4, 2022 · 3 comments

jcx120 commented Apr 4, 2022

A partner is experiencing issues with both re-ingestion and ingestion on their Horizon instance (backed by an Azure PostgreSQL instance).

Re-ingestion:

  1. Unable to re-ingest the full history without a failure of this type (instance "03"):

/usr/bin/stellar-horizon db reingest range 2 40325759 --parallel-workers 10 --retries 10

Using these specs:

We were using Standard_D48s_v3 (48 vCPUs, 192 GB RAM) with 42 workers and 10 retries when we encountered this issue.

We consistently get this error:

time="2022-04-04T21:43:00.487Z" level=error msg="error in reingest worker" error="error when processing [11496322, 11596289] range: error preparing range: Error fast-forwarding to 11496322: error reading frame length: unmarshalling XDR frame header: xdr:DecodeUint: EOF while decoding 4 bytes - read: '[]'" pid=1 service=ingest

job failed, recommended restart range: [10696578, 40325759]: error when processing [10696578, 10796545] range: error preparing range: Error fast-forwarding to 10696578: error reading frame length: unmarshalling XDR frame header: xdr:DecodeUint: EOF while decoding 4 bytes - read: '[]'
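The failure message above includes a recommended restart range, so a re-ingestion job can be resumed from that point instead of from ledger 2. A minimal sketch of pulling the range out of the log line and rebuilding the command follows. The helper name `restart_command` and its defaults are hypothetical; the binary path and flags are copied from the command quoted earlier in this issue.

```python
import re

# Matches the "recommended restart range: [from, to]" hint in the failure message.
RESTART_RE = re.compile(r"recommended restart range: \[(\d+), (\d+)\]")


def restart_command(log_line, workers=10, retries=10):
    """Return a reingest command for the recommended range, or None if no hint is present.

    Hypothetical helper: it only formats a string and does not run anything.
    """
    match = RESTART_RE.search(log_line)
    if match is None:
        return None
    start, end = match.groups()
    args = [
        "/usr/bin/stellar-horizon", "db", "reingest", "range", start, end,
        "--parallel-workers", str(workers),
        "--retries", str(retries),
    ]
    return " ".join(args)


line = (
    "job failed, recommended restart range: [10696578, 40325759]: "
    "error when processing [10696578, 10796545] range"
)
print(restart_command(line))
```

Resuming from the recommended range only saves time if the underlying cause of the failure has been addressed; otherwise the job is likely to fail again in a similar place.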

  1. Ingestion (sync) is always behind and unable to catch up (instances "01" and "02"), consistently lagging by more than 25 ledgers.

Specs:

The stellar-ingest and stellar-reingest containers use a 256GB Premium SSD with up to 1100 IOPS and 128 MBps throughput. The disks are mounted as volumes inside the containers and used as CAPTIVE_CORE_STORAGE_PATH.
The postgres container uses a 16TB Premium SSD with up to 18000 IOPS and 750 MBps throughput. The disk is mounted as a volume inside the container and used by the PostgreSQL instance to store Horizon's database.

jcx120 added the support label Apr 4, 2022
jcx120 changed the title from "Partner Ingestion issue" to "Partner Re-Ingestion issue" Apr 5, 2022
Contributor

2opremio commented Apr 6, 2022

Related: #4255

The parsing error is probably a red herring, likely caused by Stellar Core crashing. My guess is that it is being OOM-killed.
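One way to test the OOM guess is to scan the host's kernel log for OOM-killer entries that name the Core process. The sketch below is an assumption-heavy illustration: the function name `find_oom_kills` is hypothetical, the sample lines are made up, and the exact kernel wording varies between kernel versions, so the pattern may need adjusting for a given host.

```python
import re

# Assumed shape of a Linux OOM-killer entry, e.g.
# "Out of memory: Killed process 12345 (stellar-core) ...".
# Older kernels print "Kill process"; cgroup limits prefix "Memory cgroup".
OOM_RE = re.compile(r"out of memory: kill(?:ed)? process (\d+) \(([^)]+)\)", re.IGNORECASE)


def find_oom_kills(lines, process_name="stellar-core"):
    """Return (pid, name) pairs for OOM kills whose process name matches."""
    hits = []
    for text in lines:
        match = OOM_RE.search(text)
        if match and match.group(2) == process_name:
            hits.append((int(match.group(1)), match.group(2)))
    return hits


# Made-up sample input for illustration; real input would come from the kernel log.
sample = [
    "[1234.5] Memory cgroup out of memory: Killed process 12345 (stellar-core) total-vm:1kB",
    "[1240.1] Out of memory: Killed process 222 (postgres) total-vm:1kB",
    "[1300.0] eth0: link up",
]
print(find_oom_kills(sample))
```

If entries like these line up with the timestamps of the XDR decoding errors, the parsing error is a symptom of Core dying mid-stream rather than of corrupt data.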

2opremio self-assigned this Apr 6, 2022
Contributor

2opremio commented Apr 6, 2022

Note that there is an issue in Core versions > 18.2.0 and < 18.5.0 that breaks reingestion.

See stellar/stellar-core#3360

Please upgrade to Core 18.5.0 or downgrade to Core 18.2.0.
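For anyone checking a fleet of nodes, the affected range stated above (strictly greater than 18.2.0 and strictly less than 18.5.0) can be expressed as a small check. This is only a sketch: the function names are hypothetical, and it assumes plain dotted version strings, optionally prefixed with "v", with no release-candidate or build suffixes.

```python
def parse_version(text):
    """Turn a dotted version such as '18.3.0' or 'v18.3.0' into a tuple of ints."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def is_affected(core_version):
    """True if the version lies strictly between 18.2.0 and 18.5.0."""
    return (18, 2, 0) < parse_version(core_version) < (18, 5, 0)


for version in ("18.2.0", "18.3.0", "18.4.1", "18.5.0"):
    print(version, is_affected(version))
```

Comparing tuples of integers avoids the pitfalls of comparing version strings lexically.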

@AlexeyShchukinSecurrency

After the upgrade to Core 18.5.0 and horizon 2.15.1, we're not experiencing this issue anymore. Thank you!

jcx120 closed this as completed Apr 14, 2022