Improve the detection of captive-core failures #3158
I was about to create another issue for this. Here's a small repro script I wrote (based on

package main

import (
	"fmt"

	"github.com/stellar/go/ingest/ledgerbackend"
	"github.com/stellar/go/network"
)

func main() {
	check(6218)
}

// check starts captive core at the given ledger and verifies that the
// first ledger it returns has the requested sequence number.
func check(ledger uint32) bool {
	c, err := ledgerbackend.NewCaptive(
		"stellar-core",
		"stellar-core-testnet.cfg",
		network.TestNetworkPassphrase,
		[]string{"http://history.stellar.org/prd/core-testnet/core_testnet_001/"},
	)
	if err != nil {
		panic(err)
	}
	defer c.Close()

	// Ask captive core to stream ledgers starting at `ledger`.
	err = c.PrepareRange(ledgerbackend.UnboundedRange(ledger))
	if err != nil {
		fmt.Println(err)
		return false
	}

	ok, meta, err := c.GetLedger(ledger)
	if err != nil {
		fmt.Println(err)
		return false
	}
	if !ok {
		fmt.Println("no ledger")
		return false
	}
	if meta.LedgerSequence() != ledger {
		fmt.Println("wrong ledger", meta.LedgerSequence())
		return false
	}

	fmt.Println(ledger, "ok")
	return true
}

After starting it find |
bartekn added a commit that referenced this issue on Nov 9, 2020
…ore backend (#3187)

This commit introduces `bufferedLedgerMetaReader`, which decouples buffering and unmarshaling from `stellarCoreRunner` and `CaptiveStellarCore`. `bufferedLedgerMetaReader` fixes multiple issues:

* It fixes #3132 by increasing the internal buffers' sizes to hold more ledgers. This makes the catchup code much faster.
* It fixes #3158: `bufferedLedgerMetaReader` allowed rewriting the shutdown code into a much simpler version. Now `bufferedLedgerMetaReader` and `CaptiveStellarCore` listen to a single shutdown signal: `stellarCoreRunner.getProcessExitChan()`. When the Stellar-Core process terminates, the `bufferedLedgerMetaReader.Start` goroutine stops and `CaptiveStellarCore` returns a user-friendly error from the `PrepareRange` and `GetLedger` methods. When `CaptiveStellarCore.Close()` is called, it kills the Stellar-Core process, triggering the shutdown code explained above.
* It moves buffering and unmarshaling into a single struct. This makes `stellarCoreRunner` and `CaptiveStellarCore` simpler.
* It fixes a possible OOM issue when the network closes a series of large ledgers. In such a case `bufferedLedgerMetaReader` waits for a buffer to be consumed before reading more ledgers into memory, which prevents increased memory usage.
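The shutdown and backpressure behaviour described in the commit message can be illustrated with a minimal sketch. This is not the code from #3187: `ledgerReader`, `newLedgerReader`, `start`, `next`, and `errProcessExited` are hypothetical names, ledgers are reduced to sequence numbers, and a plain channel stands in for the process-exit channel. It only shows the general idea of a bounded buffer whose producer and consumer both watch one exit signal.

```go
package main

import (
	"errors"
	"fmt"
)

// errProcessExited is a hypothetical error returned once the (simulated)
// core process has exited and no buffered ledgers remain.
var errProcessExited = errors.New("core process exited unexpectedly")

// ledgerReader is a simplified, hypothetical stand-in for a buffered
// ledger reader; it is not the real bufferedLedgerMetaReader.
type ledgerReader struct {
	buf  chan uint32     // bounded buffer: the producer blocks when it is full
	exit <-chan struct{} // closed when the simulated core process exits
}

func newLedgerReader(size int, exit <-chan struct{}) *ledgerReader {
	return &ledgerReader{buf: make(chan uint32, size), exit: exit}
}

// start moves ledgers from source into the bounded buffer until the source
// is closed or the exit channel is closed.
func (r *ledgerReader) start(source <-chan uint32) {
	for {
		select {
		case <-r.exit:
			return
		case seq, ok := <-source:
			if !ok {
				return
			}
			// Blocks while the buffer is full (backpressure), but still
			// reacts to the exit signal.
			select {
			case r.buf <- seq:
			case <-r.exit:
				return
			}
		}
	}
}

// next returns the next buffered ledger, or errProcessExited if the process
// has exited and nothing is buffered at the time of the call.
func (r *ledgerReader) next() (uint32, error) {
	// Prefer data that is already buffered over the exit signal.
	select {
	case seq := <-r.buf:
		return seq, nil
	default:
	}
	select {
	case seq := <-r.buf:
		return seq, nil
	case <-r.exit:
		return 0, errProcessExited
	}
}

func main() {
	exit := make(chan struct{})
	source := make(chan uint32, 3)
	for seq := uint32(1); seq <= 3; seq++ {
		source <- seq
	}

	r := newLedgerReader(2, exit)
	go r.start(source)

	for i := 0; i < 3; i++ {
		seq, err := r.next()
		fmt.Println(seq, err)
	}

	close(exit) // simulate the core process terminating
	_, err := r.next()
	fmt.Println(err)
}
```

Because both the producer and the consumer select on the same exit channel, a caller blocked in `next` gets an error instead of hanging when the process dies, which is the failure mode this issue is about.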
In the context of adding captive core integration tests (#3153), I found that captive core's `run-from` execution failed but the failure went unnoticed.
Most relevant log lines:
Thus, Horizon seems satisfied with Core's preparation of the range. However, after that, we get:
My guess is that captive-core returned a 0 exit code. However, it seems like captive core's output was empty, and we could have detected that.
Furthermore, I think that the execution of captive core (with and without HTTP) could probably be more defensive.
Full log context (using `run-from`)