-
Notifications
You must be signed in to change notification settings - Fork 501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
services/horizon: Allow captive core to start from any ledger. #3160
services/horizon: Allow captive core to start from any ledger. #3160
Conversation
25ad1bf
to
eee405a
Compare
@@ -290,14 +290,18 @@ func (c *CaptiveStellarCore) runFromParams(from uint32) (runFrom uint32, ledgerH | |||
// ledger and then fast-forward to the desire ledger | |||
// | |||
// | |||
runFrom = roundDownToFirstReplayAfterCheckpointStart(from) - 1 | |||
runFrom = from - 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If from
is 1, wouldn't this always fail because runFrom
will be 0 and this is invalid ledger sequence? While this will never happen in Horizon I think it can be called by other users. I think we need a table in the comment with return values for the following arguments: 1, 2, 3 (corner cases because core would start from ledger 2 instead of 1), 62, 63, 64 (corner cases around first checkpoint), 126, 127, 128 (corner cases for general case). We also need tests for each of these.
But before that I think there's another problem with the approach here. Let's say that we need to restart Horizon (ex. version upgrade) and from
is 2 ledgers after checkpoint, for simplicity here: 127+2=129. Then we won't see the ledger 128 (from-1) in the archives until the next checkpoint is closed so in 5 minutes. This is a long time. What we can do is start from previous checkpoint (it should be in the archive already) and fast forward from there. Surprisingly, I think this should simplify this function.
Finally, we still have the trust issue: it's possible that bad actor has changed archives and we'll learn about it only when core errors. And as Nicolas mentioned internally it can be after, possibly, hours of catchup.
To sum it up I think we should:
- Fix the wait-for-checkpoint issue I explained above because this will be a problem anyway.
- Add more tests to check corner cases as above (1, 2, 3, 62, 63, 64, 126, 127, 128).
- In another PR, let's allow CaptiveCore to get ledger hashes we know about. This can be done by passing a store that implement a known interface with a method to return hash based on sequence number (@tamirms's idea).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Implemented 1 and 2 in 8adeebf. Also added a small tool to run tests against running standalone network to ensure params are correct.
6bf5b55
to
8b3a263
Compare
eee405a
to
560a216
Compare
Thanks! This seems to fix #3157 ! (which I tested using #3144 ). However, I find confusing that the stats obtained from
The root stats are still at 0 (including the CoreSequence and IngestSequence): I would expect the |
@2opremio I think I run into this while working on this PR but haven't debugged it much yet. I suspect that changes in #3106 broke something. If you |
It's strange, because after ledger 64 is reached (according to the logs) the |
True.
|
@2opremio I noticed there is a new env variable: I'm going to approve this PR but please 👍 too because I worked on this partially. And maybe let's more discussion about the issue with a container to a new issue. |
OK, just for the record. This seems to be the problem:
I think it should be |
PR Checklist
PR Structure
otherwise).
services/friendbot
, orall
ordoc
if the changes are broad or impact manypackages.
Thoroughness
.md
files, etc... affected by this change). Take a look in the
docs
folder for a given service,like this one.
Release planning
needed with deprecations, added features, breaking changes, and DB schema changes.
semver, or if it's mainly a patch change. The PR is targeted at the next
release branch if it's not a patch change.
What
Allow captive core to start from any ledger.
Why
Previously we were limiting the ledgers where online captive core could start since we were always trying to start (captive core) from the previous check-point ledger.
This was probably problematic since this wouldn't work for ledgers smaller than 63.
Known limitations
[TODO or N/A]