update verify-range for captive core use-disk #4444
Conversation
@@ -12,6 +12,8 @@ RUN ["chmod", "+x", "dependencies"]
 RUN /dependencies
 
+ADD stellar-core.cfg /
can drop this now, right?
At the moment I'm just testing out changes to use the 'on disk' captive core mode on CI, to see if verify-range gets resolved, but it doesn't seem to be affecting it; it's still wedged on these frame header errors.
force-pushed from e6c39a6 to 9deb6e6 (commits: "…or to verify range", "… parameter for docker build")
Running verify-range against the current pubnet for a minimal range of ledgers, 20,100, resulted in the verify-range image failing on AWS Batch jobs:
The same error is also showing up on the CI 'verify-range' step on most current PRs for any branch; you'll see the verify-range step fail with the frame header error when it tries to run horizon verify-range. From master or any branch of the go repo, it can be replicated locally from the top folder:
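The exact command isn't captured above. As a rough illustration only, a local run might look something like the following, assuming the verify-range image is built from services/horizon/docker/verify-range/Dockerfile and that the container reads BRANCH/FROM/TO from the environment (both assumptions here), with placeholder ledger values:

# Illustrative sketch only -- not the exact command from this thread.
# Assumes the verify-range Dockerfile path and the BRANCH/FROM/TO env vars; adjust to match CI.
docker build -f services/horizon/docker/verify-range/Dockerfile -t horizon-verify-range .
docker run -e BRANCH=$(git rev-parse HEAD) -e FROM=20000000 -e TO=20000100 horizon-verify-range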
Per prior issues, this message means captive core has crashed; per #4255, the root cause of that is captive core running out of memory to store current ledger state. The effort here is to convert verify-range to launch ingestion with captive core using disk for current state instead, via --captive-core-use-db=true.
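As a rough sketch only, assuming the underlying Horizon reingestion command is what the verify-range script ultimately drives (the actual entrypoint isn't shown here, and the ledger range below is a placeholder), the flag would be passed along these lines:

# Sketch, assuming Horizon's reingest command is what gets launched; the
# verify-range script's real invocation may differ. The flag makes captive core
# keep current ledger state in an on-disk database instead of in memory.
horizon db reingest range 20000000 20000100 --captive-core-use-db=true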
During testing of this PR on AWS Batch, it was also observed that the jobs ran out of disk space. This was due to verify-range hosting horizon's postgres locally on the VM's root volume, combined with captive core storing history archives and its database on the same volume; docker only allocates about 30GB of ephemeral space on '/'. These other filesystem paths had to be broken out onto different volumes with external volume mounts, with corresponding changes in the new AWS Job Definition verify-range-c5-9xlarge-job:9.
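As an illustration of the volume-mount approach, assuming a docker run invocation with hypothetical host paths, container paths, and image name (none of these names come from this PR), the idea is to keep postgres data and captive core's on-disk state off the ~30GB ephemeral space on '/':

# Illustrative sketch; host paths, container paths, env vars, and image name are assumptions.
docker run \
  -v /mnt/horizon-pgdata:/var/lib/postgresql/data \
  -v /mnt/captive-core:/captive-core-storage \
  -e FROM=20000000 -e TO=20000100 \
  horizon-verify-range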