update verify-range for captive core use-disk #4444

Merged: 10 commits on Jul 13, 2022

@sreuland sreuland commented Jun 30, 2022

Running verify-range against current pubnet for a minimal range of ledgers, 20,100, caused the verify-range image to fail on AWS Batch jobs:

error reading frame length: unmarshalling XDR frame header: xdr:DecodeUint: EOF while decoding 4 bytes

The same error also shows up in the CI 'verify-range' step on most current PRs for any branch: the step fails with the frame header error when it tries to run horizon verify-range.

From master or any branch of the go repo, it can be reproduced locally from the top-level folder:

$ docker build -f services/horizon/docker/verify-range/Dockerfile -t stellar/horizon-verify-range services/horizon/docker/verify-range/
$ docker run -e FROM=10000063 -e TO=10000127 stellar/horizon-verify-range

Per prior issues (see #4255), this message means captive core has crashed.
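As a rough illustration of why a crashed captive core can surface as this particular error (this is not Horizon's actual reader code): a reader that expects a 4-byte frame header sees EOF as soon as the producing process exits without writing one. The function name `read_frame_header` is hypothetical, and `head -c` is assumed to be available on the host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: try to consume a 4-byte frame header from stdin.
# Returns 0 if all 4 bytes arrived, 1 if the stream ended early.
read_frame_header() {
  local n
  # Count how many of the 4 requested bytes were actually available.
  n=$(head -c 4 | wc -c)
  n=$((n))  # normalize: some wc implementations pad the count with spaces
  if [ "$n" -lt 4 ]; then
    echo "EOF while decoding 4 bytes (got $n)" >&2
    return 1
  fi
  echo "frame header read"
}

# A healthy producer writes a header before anything else.
printf '\000\000\000\004' | read_frame_header

# A crashed producer closes the pipe without writing; the reader hits EOF.
if ! : | read_frame_header 2>/dev/null; then
  echo "producer exited before sending a frame"
fi
```

The point of the sketch is only that the EOF is a symptom seen by the reader; the cause has to be found on the producer side.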

The root cause is captive core (cc) running out of memory while storing current ledger state, so this PR converts verify-range to launch ingest with cc keeping current state on disk instead, via --captive-core-use-db=true.
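A minimal sketch of what the resulting ingest invocation might look like. Only `--captive-core-use-db=true` comes from this PR; the `horizon` binary name on PATH, the `ingest verify-range` subcommand form, and the `--from`/`--to` spelling are assumptions here, and `verify_range_cmd` is a hypothetical helper that only prints the command rather than running it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print (do not run) a verify-range command line that
# asks captive core to keep current ledger state on disk instead of in memory.
# Assumption: the binary is invoked as `horizon ingest verify-range --from --to`.
verify_range_cmd() {
  local from="$1" to="$2"
  printf 'horizon ingest verify-range --from=%s --to=%s --captive-core-use-db=true\n' \
    "$from" "$to"
}

# Same ledger range as the local repro above.
verify_range_cmd 10000063 10000127
```

Printing the command first makes it easy to check the flags in a CI log before the real run starts.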

During testing of this PR on AWS Batch, the jobs were also observed to run out of disk space. This was because verify-range hosts horizon's postgres locally on the VM's root volume, and cc stores its archives and its on-disk state on the same volume; docker allocates only about 30GB of ephemeral space on '/'. These other filesystem paths had to be broken out onto different volumes with external volume mounts, with corresponding changes made to the new AWS Job Def verify-range-c5-9xlarge-job:9.
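For a local run, the same idea of moving postgres and cc storage off the root volume could be sketched with docker bind mounts. The container paths and the example host paths below are hypothetical (the image's real directories may differ), and `build_run_cmd` is an illustrative helper that only prints the command.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a `docker run` command that bind-mounts separate
# host volumes for postgres data and captive core storage, so neither fills
# the ephemeral space docker allocates on '/'.
build_run_cmd() {
  local from="$1" to="$2" pg_host_dir="$3" cc_host_dir="$4"
  local pg_container_dir="/var/lib/postgresql"  # assumed postgres data location
  local cc_container_dir="/captive-core"        # assumed captive core storage location
  printf 'docker run -v %s:%s -v %s:%s -e FROM=%s -e TO=%s stellar/horizon-verify-range\n' \
    "$pg_host_dir" "$pg_container_dir" \
    "$cc_host_dir" "$cc_container_dir" \
    "$from" "$to"
}

# Example host directories on two separately mounted volumes (hypothetical).
build_run_cmd 10000063 10000127 /mnt/vol1/postgres /mnt/vol2/captive-core
```

On AWS Batch the equivalent would live in the job definition's volume and mount point settings rather than on a command line.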

@@ -12,6 +12,8 @@ RUN ["chmod", "+x", "dependencies"]
RUN /dependencies

ADD stellar-core.cfg /
Contributor

can drop this now, right?

Contributor Author

@sreuland sreuland Jun 30, 2022

At the moment I'm just testing changes to use 'on disk' cc mode on CI to see whether verify-range gets resolved, but it doesn't seem to be affecting it; it's still wedged on these frame header errors.

@sreuland sreuland changed the title from "Trying cc use disk db on verify-range" to "captive core crashing on CI verify range step" Jun 30, 2022
@sreuland sreuland force-pushed the trying-ccdisk-verify-range branch from e6c39a6 to 9deb6e6 July 2, 2022 21:48
@sreuland sreuland changed the base branch from horizon-release-2.18.1 to master July 2, 2022 21:49
@sreuland sreuland changed the title from "captive core crashing on CI verify range step" to "update verify-range for captive core use-disk" Jul 5, 2022
@sreuland sreuland requested a review from a team July 5, 2022 16:28
@sreuland sreuland merged commit 118efe4 into stellar:master Jul 13, 2022
sreuland added a commit to sreuland/go that referenced this pull request Aug 7, 2022