Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

soroban-rpc: Check DB on health endpoint #449

Merged
merged 1 commit into from
Feb 23, 2023

Conversation

2opremio
Copy link
Contributor

@2opremio 2opremio commented Feb 23, 2023

What

Make sure that the DB is ready in the getHealth() endpoint

Why

soroban-rpc is not really ready until we reach the first checkpoint (before which the DB won't be ready, and ingestion won't be able to start)

Known limitations

The getHealth endpoint now becomes a bit heavier

Copy link
Contributor

@sreuland sreuland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice, this will be super helpful for clients to leverage the health endpoint.

@2opremio 2opremio merged commit 6dc54bd into stellar:main Feb 23, 2023
@2opremio 2opremio deleted the check-db-on-health branch February 23, 2023 18:32
return HealthCheckResult{Status: "healthy"}
func NewHealthCheck(reader db.LedgerEntryReader) jrpc2.Handler {
return handler.New(func(ctx context.Context) (HealthCheckResult, error) {
if _, err := reader.GetLatestLedgerSequence(ctx); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry for the late comment but I was thinking that it might be better to declare soroban-rpc healthy if the latest ledger close time is relatively recent (e.g. within 20 seconds) because we have seen a bug where soroban-rpc is able to process the history archive snapshot and commit that to the db but then it fails to the prepare range operation on captive core. in that scenario we would have a latest ledger sequence in the db but it could be stale

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does that threshold of 20 seconds depend on checkpoint acceleration also, suggestion sounds like a worthwhile internal enhancement if the system can determine the best threshold internally, i.e. don't need to provide externalized config parameter?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, making the threshold configurable makes sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

4 participants