
Add recent uptimes to validator metadata in the client api #800

Draft · wants to merge 1 commit into base: main
6 changes: 6 additions & 0 deletions analyzer/consensus/consensus.go
@@ -38,6 +38,8 @@ import (

const (
consensusAnalyzerName = "consensus"

validatorUptimesViewRefreshQuery = `REFRESH MATERIALIZED VIEW CONCURRENTLY views.validator_uptimes`
)

type EventType = apiTypes.ConsensusEventType // alias for brevity
@@ -316,6 +318,10 @@ func (m *processor) queueDbUpdates(batch *storage.QueryBatch, data allData) error {
}
}

if m.mode == analyzer.SlowSyncMode {
batch.Queue(validatorUptimesViewRefreshQuery)

Member commented:

I don't think we need to run this after every processed block. Since the "partial windows" are 1200 blocks in size, refreshing every ~12 blocks (1% of the window size) or ~60 blocks (5% of the window size) should be reasonable. Any change is unlikely to be noticeable before that.

It might be clearer to run this periodically, such as every minute or so, with the interval being configurable.

Collaborator commented:

Let's refresh this in a new ItemBasedAnalyzer, see accounts_list.go for an example. Alternatively, feel free to refactor that analyzer to refresh both views; I think we'll soon have more views that need refreshing, and it'd be good to have them all in one spot so we can tweak how often they get recalculated.

}

return nil
}

27 changes: 26 additions & 1 deletion api/spec/v1.yaml
@@ -2085,8 +2085,33 @@ components:
description: An array containing details of the last 100 consensus blocks, indicating whether each block was signed by the validator. Only available when querying a single validator.
items:
allOf: [$ref: '#/components/schemas/ValidatorSignedBlock']
uptime:
allOf: [$ref: '#/components/schemas/ValidatorUptime']
description: The validator's uptime statistics over a recent window of blocks, ending at the latest block.
description: |
- An validator registered at the consensus layer.
+ A validator registered at the consensus layer.

ValidatorUptime:
type: object
properties:
window_length:
type: integer
format: uint64
description: The length of the window this object describes, in blocks.

Member commented:

Maybe mention that this is currently always 14400 (~last 24 hours).

Collaborator commented:

Suggested change
- description: The length of the window this object describes, in blocks.
+ description: The length of the historical window for which this object provides uptime information, in blocks. Currently defaults to 14400 blocks, or approximately 24 hours.

partial_length:
type: integer
format: uint64
description: The length of the partial windows within window_length, in blocks.

Member commented:

Maybe mention that this will currently always be 1200 (~2 hours).

Comment on lines +2101 to +2104
Collaborator commented:

Suggested change
- partial_length:
-   type: integer
-   format: uint64
-   description: The length of the partial windows within window_length, in blocks.
+ segment_length:
+   type: integer
+   format: uint64
+   description: The length of the window segment, in blocks. We subdivide the window into segments of equal length and aggregate the uptime of each segment into `segment_uptimes`. Currently defaults to 1200 blocks, which is approximately 2 hours.

Collaborator commented:

I'm not very enthusiastic about "segment"; I considered "chunk" or "subwindow" too. Feel free to change it.

overall_uptime:
type: integer
format: uint64
description: The number of blocks signed by the validator out of the last window_length blocks.
partial_uptimes:
type: array
description: An array showing the signed block counts for each partial slot within window_length.
Comment on lines +2109 to +2111
Collaborator commented:

Suggested change
- partial_uptimes:
-   type: array
-   description: An array showing the signed block counts for each partial slot within window_length.
+ segment_uptimes:
+   type: array
+   description: An array showing the signed block counts for each sub-segment within window_length. The segments are in reverse-chronological order; i.e., the first element represents the most recent segment of blocks.

items:
type: integer
format: uint64

ValidatorSignedBlock:
type: object
40 changes: 40 additions & 0 deletions storage/client/client.go
@@ -44,6 +44,12 @@ const (
maxTotalCount = 1000
)

var (
// These two should be kept the same as in the views.validator_uptimes materialized view.

Member commented:

These two can be made constants instead of variables.

uptimeWindowBlocks = uint64(14400)
uptimeSlotBlocks = uint64(1200)
Comment on lines +49 to +50
Collaborator commented:

Hm wdyt of hardcoding these into the postgres query instead of here? It'd be nice to colocate them together so that if it changes, we only need to make changes in one place.
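
One way to read that suggestion, as a sketch (window_blocks and slot_blocks are hypothetical column aliases the client would scan instead of keeping the Go constants):

	SELECT signer_entity_id, slot_signed, overall_signed,
	       14400 AS window_blocks, -- hypothetical; keep in sync with the LIMIT in the view definition
	       1200 AS slot_blocks     -- hypothetical; keep in sync with the slot divisor in the view
	FROM views.validator_uptimes
	WHERE ($1::text IS NULL OR signer_entity_id = $1::text)
	ORDER BY signer_entity_id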

)

// StorageClient is a wrapper around a storage.TargetStorage
// with knowledge of network semantics.
type StorageClient struct {
@@ -1191,6 +1197,11 @@ func (c *StorageClient) ProposalVotes(ctx context.Context, proposalID uint64, p
return &vs, nil
}

type validatorBlockCounts struct {
OverallCount uint64
SlotCounts []uint64
}

// Validators returns a list of validators, or optionally the single validator matching `address`.
func (c *StorageClient) Validators(ctx context.Context, p apiTypes.GetConsensusValidatorsParams, address *staking.Address) (*ValidatorList, error) {
var epoch Epoch
@@ -1213,6 +1224,26 @@ func (c *StorageClient) Validators(ctx context.Context, p apiTypes.GetConsensusValidatorsParams, address *staking.Address) (*ValidatorList, error) {
return nil, wrapError(err)
}

// Prepare validator uptime metadata.
uptimes, err := c.withTotalCount(ctx, queries.ValidatorUptimes, address)
if err != nil {
return nil, wrapError(err)
}
defer uptimes.rows.Close()
uptimesBySigner := map[string]validatorBlockCounts{}
for uptimes.rows.Next() {
var signerEntityID string
var signedSlots []uint64
var signedOverall uint64
if err = uptimes.rows.Scan(&signerEntityID, &signedSlots, &signedOverall); err != nil {
return nil, wrapError(err)
}
uptimesBySigner[signerEntityID] = validatorBlockCounts{
OverallCount: signedOverall,
SlotCounts: signedSlots,
}
}

res, err := c.withTotalCount(
ctx,
queries.ValidatorsData,
@@ -1282,6 +1313,15 @@ func (c *StorageClient) Validators(ctx context.Context, p apiTypes.GetConsensusValidatorsParams, address *staking.Address) (*ValidatorList, error) {
}
}

if uptime, ok := uptimesBySigner[v.EntityID]; ok {
v.Uptime = &apiTypes.ValidatorUptime{
OverallUptime: &uptime.OverallCount,
PartialUptimes: &uptime.SlotCounts,
WindowLength: &uptimeWindowBlocks,
PartialLength: &uptimeSlotBlocks,
}
}

if next > 0 {
v.CurrentCommissionBound.EpochEnd = next
}
52 changes: 52 additions & 0 deletions storage/client/queries/queries.go
@@ -433,6 +433,58 @@ const (
LIMIT $4::bigint
OFFSET $5::bigint`

ValidatorUptimes = `
SELECT * FROM views.validator_uptimes
WHERE ($1::text IS NULL OR signer_entity_id = $1::text)

Collaborator commented:

Looks like the input param $1 is the staking address, can add a join with chain.entities.address to filter by address here. Alternatively, it might be cleaner to just include the entity address in the materialized view to avoid the join.
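
A sketch of that join, assuming chain.entities has an id column matching signer_entity_id:

	SELECT u.*
	FROM views.validator_uptimes u
	-- Assumption: entities.id holds the entity ID that blocks record in signer_entity_ids.
	JOIN chain.entities e ON e.id = u.signer_entity_id
	WHERE ($1::text IS NULL OR e.address = $1::text)
	ORDER BY u.signer_entity_id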

ORDER BY signer_entity_id`
/*ValidatorUptimes = `

Collaborator commented:

nit: remove

-- With a limit of 14400 blocks, this is the last ~24 hrs of signatures.
WITH last_window_blocks AS (
SELECT height, signer_entity_ids
FROM chain.blocks
ORDER BY height DESC
LIMIT $2
),
-- Generate a series of 12 slots representing ~2 hours within the window.
all_slots AS (
SELECT generate_series(0, 11) AS slot_id
),
-- Slots of blocks of ~2 hours within the main window, with expanded signers.
slot_blocks AS (
SELECT
height,
UNNEST(signer_entity_ids) AS signer_entity_id,
(ROW_NUMBER() OVER (ORDER BY height DESC) - 1) / $3 AS slot_id
FROM last_window_blocks
),
-- Count signed blocks in each slot.
slot_counts AS (
SELECT
signer_entity_id,
slot_id,
COUNT(height) AS signed_blocks_count
FROM
slot_blocks
WHERE
($1::text IS NULL OR signer_entity_id = $1::text)
GROUP BY
signer_entity_id, slot_id
)
-- Group windows per signer and calculate overall percentage.
SELECT
signers.signer_entity_id,
ARRAY_AGG(COALESCE(slot_counts.signed_blocks_count, 0) ORDER BY slot_counts.slot_id) AS slot_signed,
COALESCE(SUM(signed_blocks_count), 0) AS overall_signed
FROM
-- Ensure we have all windows for each signer, even if they didn't sign in a particular window.
(SELECT DISTINCT signer_entity_id FROM slot_counts) AS signers
CROSS JOIN all_slots
LEFT JOIN slot_counts ON signers.signer_entity_id = slot_counts.signer_entity_id AND all_slots.slot_id = slot_counts.slot_id
GROUP BY
signers.signer_entity_id
ORDER BY
signers.signer_entity_id`*/

RuntimeBlocks = `
SELECT round, block_hash, timestamp, num_transactions, size, gas_used
FROM chain.runtime_blocks
50 changes: 50 additions & 0 deletions storage/migrations/07_validator_uptimes.up.sql
@@ -0,0 +1,50 @@
BEGIN;

CREATE MATERIALIZED VIEW views.validator_uptimes AS
-- With a limit of 14400 blocks, this is the last ~24 hrs of signatures.
WITH last_window_blocks AS (
SELECT height, signer_entity_ids
FROM chain.blocks
ORDER BY height DESC
LIMIT 14400
),
-- Generate a series of 12 slots representing ~2 hours within the window.
all_slots AS (
SELECT generate_series(0, 11) AS slot_id
),
-- Slots of blocks of ~2 hours within the main window, with expanded signers.
slot_blocks AS (
SELECT
height,
UNNEST(signer_entity_ids) AS signer_entity_id,
(ROW_NUMBER() OVER (ORDER BY height DESC) - 1) / 1200 AS slot_id
Comment on lines +19 to +20
Collaborator commented:

Can you double check that the unnest does not cause unexpected results with row_number here? I'm wondering if each (signer x height) will have its own row which would then cause row_number to vastly surpass 14400 and thus overflow the slot_id.
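
For what it's worth: since PostgreSQL 10, set-returning functions in the target list are expanded after window functions, so each block should keep a single row number here. Still, numbering the blocks before unnesting sidesteps the question entirely; a sketch:

	slot_blocks AS (
	  SELECT
	    b.height,
	    UNNEST(b.signer_entity_ids) AS signer_entity_id,
	    b.slot_id
	  FROM (
	    -- Number the 14400 window blocks first, so the UNNEST fan-out cannot inflate slot_id.
	    SELECT height, signer_entity_ids,
	           (ROW_NUMBER() OVER (ORDER BY height DESC) - 1) / 1200 AS slot_id
	    FROM last_window_blocks
	  ) b
	)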

FROM last_window_blocks
),
-- Count signed blocks in each slot.
slot_counts AS (
SELECT
signer_entity_id,
slot_id,
COUNT(height) AS signed_blocks_count
FROM
slot_blocks
-- Compute this for all validators; the client can select from the view if needed.
GROUP BY
signer_entity_id, slot_id
)
-- Group slots per signer and calculate the overall signed-block count.
SELECT
signers.signer_entity_id,
ARRAY_AGG(COALESCE(slot_counts.signed_blocks_count, 0) ORDER BY slot_counts.slot_id) AS slot_signed,
COALESCE(SUM(signed_blocks_count), 0) AS overall_signed
FROM
-- Ensure we have all windows for each signer, even if they didn't sign in a particular window.
(SELECT DISTINCT signer_entity_id FROM slot_counts) AS signers
CROSS JOIN all_slots
LEFT JOIN slot_counts ON signers.signer_entity_id = slot_counts.signer_entity_id AND all_slots.slot_id = slot_counts.slot_id
Comment on lines +37 to +44
Collaborator commented:

Neat sql-fu! Seems like the e2e tests might not be suited to testing this properly; mind validating it locally by running a local nexus instance and printing the table / querying the api? Thanks!
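
A quick local check along those lines (names as defined in this migration):

	REFRESH MATERIALIZED VIEW views.validator_uptimes;
	SELECT signer_entity_id, overall_signed, slot_signed
	FROM views.validator_uptimes
	ORDER BY overall_signed DESC
	LIMIT 10;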

GROUP BY
signers.signer_entity_id;

CREATE UNIQUE INDEX ix_views_validator_uptimes_signer_entity_id ON views.validator_uptimes (signer_entity_id); -- A unique index is required for CONCURRENTLY refreshing the view.

END;