Fix captive core integration tests #3144
After some changes suggested by @bartekn (3d5c083 + setting
5ab15ff to 1e7f62f
I have finally managed to get the captive core tests to start after fast-forwarding to the first checkpoint. However, transaction submissions time out because it takes captive core ~50 seconds to obtain the new ledger :( Core seems to be happy and Captive core seems to be connected (note the
However, from the Horizon logs, it seems like Horizon cannot obtain new ledgers quickly enough and captive core loses consensus:
After a while, captive core seems to recover consensus and Horizon obtains the ledger, but this takes about 50 seconds.
Full logs: horizon-stdout---supervisor-XKwdGb.log.gz
@@ -309,7 +310,8 @@ func (i *Test) waitForCore() {

// For some reason, we need to do some waiting blackmagic for Core to publish the checkpoint.
// Otherwise, it indefinitely stays in `"status" : [ "Publishing 1 queued checkpoints [63-63]: Waiting: prepare-snapshot" ]`
time.Sleep(5 * time.Second)
I merged a PR with a method that can help here. The problem with the code above (closing ledgers in a loop) is that manualclose is not blocking (TIL!), so running it 62 times doesn't mean 62 new ledgers will be closed.
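To illustrate the point about manualclose being non-blocking, here is a minimal sketch of a loop that only issues the next close request once the previous one has visibly landed. The function name `closeLedgersUntil` and its two callback parameters are hypothetical stand-ins for whatever Core client the test uses; this is not the method from the merged PR.

```go
package main

import (
	"errors"
	"time"
)

// closeLedgersUntil keeps requesting manual ledger closes until the latest
// closed ledger reaches target. Both callbacks are hypothetical placeholders:
// manualClose asks Core to close a ledger and may return before the close
// lands; latestLedger reports the last closed ledger sequence.
// It returns the number of close requests that were issued.
func closeLedgersUntil(
	manualClose func() error,
	latestLedger func() (int, error),
	target int,
	poll, timeout time.Duration,
) (int, error) {
	deadline := time.Now().Add(timeout)
	requests := 0
	for {
		seq, err := latestLedger()
		if err != nil {
			return requests, err
		}
		if seq >= target {
			return requests, nil
		}
		if err := manualClose(); err != nil {
			return requests, err
		}
		requests++
		// The request is asynchronous, so poll until the sequence advances
		// instead of assuming that one request equals one closed ledger.
		for {
			cur, err := latestLedger()
			if err != nil {
				return requests, err
			}
			if cur > seq {
				break
			}
			if time.Now().After(deadline) {
				return requests, errors.New("timed out waiting for ledger to close")
			}
			time.Sleep(poll)
		}
	}
}
```

Counting observed ledger sequences rather than requests is what makes the loop independent of how long each close takes to land.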
I don't think it will help in this case because, even if the ledgers keep advancing, the publishing status doesn't disappear unless you wait long enough before closing a ledger.
}
}

// For some reason, we need to do some waiting blackmagic for Core to publish the checkpoint.
It's not blackmagic. TIL, when I worked on the other PR, that Core publishes a checkpoint only after closing the ledger that follows the checkpoint ledger. Maybe it's just a behaviour in manual mode 🤷.
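For reference, the ledger arithmetic behind that observation can be sketched as follows. The sketch assumes a checkpoint frequency of 64 ledgers, which matches the `[63-63]` range in the status message quoted in the diff; the function names are hypothetical, and the "publish after the next ledger closes" rule is taken from the comment above rather than from Core's documentation.

```go
package main

// checkpointFrequency is assumed to be 64, so checkpoint ledgers are
// 63, 127, 191, ... (networks configured differently would use another value).
const checkpointFrequency uint32 = 64

// isCheckpoint reports whether seq is a checkpoint ledger.
func isCheckpoint(seq uint32) bool {
	return (seq+1)%checkpointFrequency == 0
}

// nextCheckpoint returns the first checkpoint ledger at or after seq.
func nextCheckpoint(seq uint32) uint32 {
	return (seq/checkpointFrequency+1)*checkpointFrequency - 1
}

// publishTriggerLedger returns the ledger whose close, per the comment above,
// lets Core publish the checkpoint at or after seq: the one right after it.
func publishTriggerLedger(seq uint32) uint32 {
	return nextCheckpoint(seq) + 1
}
```

Under these assumptions a test starting near ledger 2 would need to reach ledger 64, not 63, before expecting the first checkpoint to be published.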
The waiting times seem like blackmagic ...
Also, the problem here isn't only advancing the ledger sequences but also getting Core to publish the checkpoint. I have observed that, unless you wait enough before manually closing a ledger, Core will remain in the checkpoint publishing state even if the ledger sequences keep advancing.
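One way to replace the fixed sleep with an explicit condition would be to inspect Core's reported status and wait until no "Publishing ..." entry remains. The sketch below only covers the parsing step. It assumes the info response wraps the `"status"` array (as quoted in the diff comment) under a top-level `"info"` object; that JSON shape and the name `isPublishing` are assumptions for illustration.

```go
package main

import (
	"encoding/json"
	"strings"
)

// coreInfo models only the part of the (assumed) info response needed here.
type coreInfo struct {
	Info struct {
		Status []string `json:"status"`
	} `json:"info"`
}

// isPublishing reports whether any status line says that Core still has
// queued checkpoints to publish, e.g.
// "Publishing 1 queued checkpoints [63-63]: Waiting: prepare-snapshot".
func isPublishing(body []byte) (bool, error) {
	var parsed coreInfo
	if err := json.Unmarshal(body, &parsed); err != nil {
		return false, err
	}
	for _, line := range parsed.Info.Status {
		if strings.HasPrefix(line, "Publishing ") {
			return true, nil
		}
	}
	return false, nil
}
```

A test could poll this check between manual closes and only proceed once it returns false, which would make the wait depend on Core's state rather than on a hard-coded duration.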
New log dump, including
After a few conversations with @rokopt and @graydon, it turns out that Core's
@graydon was nice enough to collect the issues I encountered at stellar/stellar-core#2778. The Core team will try to resolve those issues so that
To move forward with the captive-core integration tests we have two options:
Regardless, I think I am going to park this work for a while and clear my mind by working on higher-priority issues.
556c695 to b9be268
b9be268 to 9723255
Closes #3153
Followup to #3137