feat(zksync_cli): Health checkpoint improvements #3193

Merged on Nov 26, 2024 (70 commits)
Changes from 24 commits
Commits
8dd58dc
feat: add dummy node version heathcheck
manuelmauro Oct 29, 2024
98167a1
feat: add version information to healthcheck
manuelmauro Oct 30, 2024
d08ecae
refactor: simplify static health check
manuelmauro Oct 30, 2024
a058388
feat: add last migration to system_dal
manuelmauro Oct 30, 2024
a5f571b
feat: add database healthcheck
manuelmauro Oct 31, 2024
a5db49a
feat: add more information to database heathcheck
manuelmauro Oct 31, 2024
b57027c
style: format code
manuelmauro Oct 31, 2024
5d23bd6
chore: prepare sqlx queries
manuelmauro Oct 31, 2024
959542b
fix: remove outdated query file
manuelmauro Oct 31, 2024
28ecc96
feat: improve bytes encoding in healthcheck
manuelmauro Oct 31, 2024
3d7e82c
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Oct 31, 2024
de3a5e1
feat: add dummy health check tasks for state keeper and eth sender
manuelmauro Oct 31, 2024
582754b
fix: do not unwrap
manuelmauro Oct 31, 2024
1b8e50e
feat: retrieve failed L1 transactions and next operator nonce
manuelmauro Oct 31, 2024
68210aa
feat: add information on last saved/mined batches to healthcheck
manuelmauro Nov 4, 2024
2fc3da9
feat: get last miniblock number from DB
manuelmauro Nov 4, 2024
098e54f
feat: add protocol version information to healthcheck
manuelmauro Nov 4, 2024
4cbb890
feat: add last processed L1 batch to health check
manuelmauro Nov 4, 2024
f60e3a9
refactor: use SELECT MAX instead of ORDER BY
manuelmauro Nov 4, 2024
927cddb
refactor: rename LastBatchIndex to BatchNumbers
manuelmauro Nov 4, 2024
80db2ce
fix: revert code committed by mistake
manuelmauro Nov 6, 2024
2c9ca8d
feat: add config parameters for healthcheck polling intervals
manuelmauro Nov 6, 2024
98f8d72
fix: fix house keeper config from env test
manuelmauro Nov 6, 2024
988b957
fix: fix house keeper config parameters naming in unit test
manuelmauro Nov 6, 2024
293484d
fix: use u64 for failed_l1_txns
manuelmauro Nov 7, 2024
3ad3fd2
fix: return u64 in get_number_of_failed_transactions
manuelmauro Nov 7, 2024
b31233b
feat: use connection_tagged for better code instumentation
manuelmauro Nov 7, 2024
5396e5a
feat: add reactive health check to state keeper
manuelmauro Nov 7, 2024
f51785c
fix: update state keeper health at the right moment
manuelmauro Nov 7, 2024
bb5acfc
refactor: use ORDER BY to query last database migration
manuelmauro Nov 7, 2024
fd35543
feat: add reactive health check to eth sender
manuelmauro Nov 8, 2024
416ea50
style: clippy
manuelmauro Nov 8, 2024
d0ac511
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 8, 2024
8d4c176
style: move field before reserved ones
manuelmauro Nov 13, 2024
eb2fb68
refactor: rename PostgresMetricsLayer to PostgresLayer
manuelmauro Nov 13, 2024
80986e4
refactor: rename postgres_metrics_layer to postgres_layer
manuelmauro Nov 13, 2024
9bffd99
refactor: rename module postgres_layer to postgres
manuelmauro Nov 13, 2024
5e235b1
refactor: move database health check task to postgres layer
manuelmauro Nov 13, 2024
884b864
feat: implement Serialize and Deserialize directly on AggregatedActio…
manuelmauro Nov 13, 2024
c701399
refactor: make DatabaseHealthTask fields private
manuelmauro Nov 13, 2024
53cca2f
refactor: remove redundant health status updates
manuelmauro Nov 13, 2024
fe339c2
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 13, 2024
9fe98d2
feat: make health mod private
manuelmauro Nov 13, 2024
36ad7c5
refactor: remove StateKeeperTask constructor
manuelmauro Nov 13, 2024
1585d5d
refactor: use getter for health updater
manuelmauro Nov 13, 2024
f9e2ebf
refactor: clippy
manuelmauro Nov 14, 2024
26aa824
feat: add git information to RustcMetadata
manuelmauro Nov 14, 2024
f06a415
refactor: rename rustc module to binary
manuelmauro Nov 14, 2024
bdc50f4
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 14, 2024
39f3864
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 15, 2024
195aee4
refactor: remove redundant health status update
manuelmauro Nov 15, 2024
4e14447
refactor: remove redundant health check update
manuelmauro Nov 15, 2024
2acadca
fix: remove unused dependencies
manuelmauro Nov 15, 2024
d5d78fc
revert: revert formatting changes
manuelmauro Nov 15, 2024
f1d618c
Merge branch 'main' into manuel-add-more-components-to-healthcheck
Deniallugo Nov 15, 2024
ec9ae20
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 15, 2024
416c187
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 18, 2024
9964ee7
feat: use Duration for migrations' execution_time
manuelmauro Nov 18, 2024
19275b6
refactor: use same interval for Postgres metrics exporter and healthc…
manuelmauro Nov 18, 2024
8911cc2
refactor: do not split use and mod declarations
manuelmauro Nov 18, 2024
6bc90e2
refactor: use Option type instead of "unknown"
manuelmauro Nov 18, 2024
d2c983c
feat: update state keeper health from cursor right away
manuelmauro Nov 18, 2024
25d8b26
refactor: merge tx status into tx details
manuelmauro Nov 18, 2024
d7d62ec
refactor: clippy
manuelmauro Nov 18, 2024
a994cc6
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 18, 2024
93b3153
Merge branch 'main' into manuel-add-more-components-to-healthcheck
manuelmauro Nov 19, 2024
16d1358
feat: integrate binary metadata into AppHealth
manuelmauro Nov 19, 2024
b6bdda4
feat: split git metrics from rust metrics
manuelmauro Nov 21, 2024
785fe5b
style: format code
manuelmauro Nov 21, 2024
e30ebac
refactor: nit BinMetadata creation
manuelmauro Nov 21, 2024
10 changes: 10 additions & 0 deletions Cargo.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 2 additions & 0 deletions Cargo.toml
@@ -53,6 +53,7 @@ members = [
    "core/lib/da_client",
    "core/lib/eth_client",
    "core/lib/eth_signer",
    "core/lib/git_version_macro",
    "core/lib/l1_contract_interface",
    "core/lib/mempool",
    "core/lib/merkle_tree",
@@ -272,6 +273,7 @@ zksync_eth_client = { version = "0.1.0", path = "core/lib/eth_client" }
zksync_da_client = { version = "0.1.0", path = "core/lib/da_client" }
zksync_eth_signer = { version = "0.1.0", path = "core/lib/eth_signer" }
zksync_health_check = { version = "0.1.0", path = "core/lib/health_check" }
zksync_git_version_macro = { version = "0.1.0", path = "core/lib/git_version_macro" }
zksync_l1_contract_interface = { version = "0.1.0", path = "core/lib/l1_contract_interface" }
zksync_mempool = { version = "0.1.0", path = "core/lib/mempool" }
zksync_merkle_tree = { version = "0.1.0", path = "core/lib/merkle_tree" }
3 changes: 3 additions & 0 deletions core/lib/config/src/configs/house_keeper.rs
@@ -4,4 +4,7 @@ use serde::Deserialize;
#[derive(Debug, Deserialize, Clone, PartialEq)]
pub struct HouseKeeperConfig {
    pub l1_batch_metrics_reporting_interval_ms: u64,
    pub database_health_polling_interval_ms: u64,
    pub eth_sender_health_polling_interval_ms: u64,
    pub state_keeper_health_polling_interval_ms: u64,
}
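The three new fields are raw millisecond counts. As a minimal sketch, assuming a consumer prefers `std::time::Duration` values, the conversion could look like the following. `HealthPollingIntervals` and its accessor methods are hypothetical names that only mirror the fields above; they are not part of this PR.

```rust
use std::time::Duration;

/// Hypothetical mirror of the polling fields added to `HouseKeeperConfig`.
#[derive(Debug, Clone, PartialEq)]
pub struct HealthPollingIntervals {
    pub database_health_polling_interval_ms: u64,
    pub eth_sender_health_polling_interval_ms: u64,
    pub state_keeper_health_polling_interval_ms: u64,
}

impl HealthPollingIntervals {
    /// Database health polling interval as a `Duration`.
    pub fn database_health_polling_interval(&self) -> Duration {
        Duration::from_millis(self.database_health_polling_interval_ms)
    }

    /// Eth sender health polling interval as a `Duration`.
    pub fn eth_sender_health_polling_interval(&self) -> Duration {
        Duration::from_millis(self.eth_sender_health_polling_interval_ms)
    }

    /// State keeper health polling interval as a `Duration`.
    pub fn state_keeper_health_polling_interval(&self) -> Duration {
        Duration::from_millis(self.state_keeper_health_polling_interval_ms)
    }
}
```

Keeping the `_ms` suffix on the stored fields and converting at the point of use avoids ambiguity about units in the serialized config.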
3 changes: 3 additions & 0 deletions core/lib/config/src/testonly.rs
@@ -637,6 +637,9 @@ impl Distribution<configs::house_keeper::HouseKeeperConfig> for EncodeDist {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> configs::house_keeper::HouseKeeperConfig {
        configs::house_keeper::HouseKeeperConfig {
            l1_batch_metrics_reporting_interval_ms: self.sample(rng),
            database_health_polling_interval_ms: self.sample(rng),
            eth_sender_health_polling_interval_ms: self.sample(rng),
            state_keeper_health_polling_interval_ms: self.sample(rng),
        }
    }
}
37 changes: 37 additions & 0 deletions core/lib/dal/src/system_dal.rs
@@ -1,5 +1,7 @@
use std::{collections::HashMap, time::Duration};

use chrono::DateTime;
use serde::{Deserialize, Serialize};
use zksync_db_connection::{connection::Connection, error::DalResult, instrument::InstrumentExt};

use crate::Core;
@@ -12,6 +14,16 @@ pub(crate) struct TableSize {
    pub total_size: u64,
}

#[derive(Debug, Serialize, Deserialize)]
pub struct DatabaseMigration {
    pub version: i64,
    pub description: String,
    pub installed_on: DateTime<chrono::Utc>,
    pub success: bool,
    pub checksum: String,
    pub execution_time: i64,
}

#[derive(Debug)]
pub struct SystemDal<'a, 'c> {
    pub(crate) storage: &'a mut Connection<'c, Core>,
@@ -86,4 +98,29 @@ impl SystemDal<'_, '_> {
        });
        Ok(table_sizes.collect())
    }

    pub async fn get_last_migration(&mut self) -> DalResult<DatabaseMigration> {
        let row = sqlx::query!(
            r#"
            SELECT *
            FROM _sqlx_migrations
            WHERE _sqlx_migrations.version = (
                SELECT MAX(_sqlx_migrations.version)
                FROM _sqlx_migrations
            );
            "#
        )
        .instrument("get_last_migration")
        .fetch_one(self.storage)
        .await?;

        Ok(DatabaseMigration {
            version: row.version,
            description: row.description,
            installed_on: row.installed_on,
            success: row.success,
            checksum: hex::encode(row.checksum),
            execution_time: row.execution_time,
        })
    }
}
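`get_last_migration` maps two raw columns into friendlier forms: the checksum bytes become a hex string, and `execution_time` stays an `i64` at this point in the PR (a later commit, 9964ee7, switches it to `Duration`). The sketch below illustrates both conversions using only the standard library. `to_hex` and `execution_time_to_duration` are hypothetical helpers, and the nanosecond unit is an assumption about how sqlx records migration timings, not something this diff states.

```rust
use std::time::Duration;

/// Hex-encodes bytes in lowercase, similar in spirit to `hex::encode`.
fn to_hex(bytes: &[u8]) -> String {
    bytes.iter().map(|b| format!("{:02x}", b)).collect()
}

/// Converts a raw `execution_time` value into a `Duration`.
/// Assumption: the column holds nanoseconds. Negative values yield `None`.
fn execution_time_to_duration(raw_nanos: i64) -> Option<Duration> {
    if raw_nanos < 0 {
        None
    } else {
        Some(Duration::from_nanos(raw_nanos as u64))
    }
}
```

Returning `Option` rather than casting blindly keeps a corrupted or unexpected negative value from silently turning into a huge duration.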
6 changes: 6 additions & 0 deletions core/lib/env_config/src/house_keeper.rs
@@ -18,6 +18,9 @@ mod tests {
    fn expected_config() -> HouseKeeperConfig {
        HouseKeeperConfig {
            l1_batch_metrics_reporting_interval_ms: 10_000,
            database_health_polling_interval_ms: 10_000,
            eth_sender_health_polling_interval_ms: 10_000,
            state_keeper_health_polling_interval_ms: 10_000,
        }
    }

@@ -26,6 +29,9 @@ mod tests {
        let mut lock = MUTEX.lock();
        let config = r#"
            HOUSE_KEEPER_L1_BATCH_METRICS_REPORTING_INTERVAL_MS="10000"
            HOUSE_KEEPER_DATABASE_HEALTH_POLLING_INTERVAL_MS="10000"
            HOUSE_KEEPER_ETH_SENDER_HEALTH_POLLING_INTERVAL_MS="10000"
            HOUSE_KEEPER_STATE_KEEPER_HEALTH_POLLING_INTERVAL_MS="10000"
        "#;
        lock.set_env(config);
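
The test above implies a naming convention: each environment variable is the `HOUSE_KEEPER` prefix plus the upper-cased field name. The loader itself is not shown in this diff, so the following is only a sketch of that convention. `env_var_name` and `read_u64` are hypothetical helpers, and a `HashMap` stands in for the process environment.

```rust
use std::collections::HashMap;

/// Builds the variable name for a config field, e.g.
/// ("house_keeper", "x_ms") -> "HOUSE_KEEPER_X_MS".
fn env_var_name(prefix: &str, field: &str) -> String {
    format!("{}_{}", prefix.to_uppercase(), field.to_uppercase())
}

/// Looks up a `u64` field in a map standing in for the environment.
fn read_u64(env: &HashMap<String, String>, prefix: &str, field: &str) -> Result<u64, String> {
    let name = env_var_name(prefix, field);
    let raw = env
        .get(&name)
        .ok_or_else(|| format!("{} is not set", name))?;
    raw.parse::<u64>().map_err(|e| format!("{}: {}", name, e))
}
```

Because the new fields are plain `u64` values rather than `Option<u64>`, a deployment that omits any of the three variables would presumably fail to load the config, which is why the test sets all of them.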

16 changes: 16 additions & 0 deletions core/lib/git_version_macro/Cargo.toml
@@ -0,0 +1,16 @@
[package]
name = "zksync_git_version_macro"
edition = "2021"
description = "Procedural macro to generate metainformation about build in compile time"
version.workspace = true
homepage.workspace = true
license.workspace = true
authors.workspace = true
repository.workspace = true
keywords.workspace = true

[lib]
proc-macro = true

[dependencies]
chrono.workspace = true
81 changes: 81 additions & 0 deletions core/lib/git_version_macro/src/lib.rs
@@ -0,0 +1,81 @@
extern crate proc_macro;
use std::{process::Command, str::FromStr};

use proc_macro::TokenStream;

/// Outputs the current date and time as a string literal.
/// Can be used to include the build timestamp in the binary.
#[proc_macro]
pub fn build_timestamp(_item: TokenStream) -> TokenStream {
    let now = chrono::Local::now().format("%Y-%m-%d %H:%M:%S").to_string();
    encode_as_str(&now)
}

/// Outputs the current git branch as a string literal.
#[proc_macro]
pub fn build_git_branch(_item: TokenStream) -> TokenStream {
    let out = run_cmd("git", &["rev-parse", "--abbrev-ref", "HEAD"]);
    encode_as_str(&out)
}

/// Outputs the current git commit hash as a string literal.
#[proc_macro]
pub fn build_git_revision(_item: TokenStream) -> TokenStream {
    let out = run_cmd("git", &["rev-parse", "--short", "HEAD"]);
    encode_as_str(&out)
}

/// Creates a slice of `&[(&str, &str)]` tuples that correspond to
/// the submodule name -> revision.
/// Results in an empty list if there are no submodules or if
/// the command fails.
#[proc_macro]
pub fn build_git_submodules(_item: TokenStream) -> TokenStream {
    let Some(out) = run_cmd_opt("git", &["submodule", "status"]) else {
        return TokenStream::from_str("&[]").unwrap();
    };
    let submodules = out
        .lines()
        .filter_map(|line| {
            let parts: Vec<&str> = line.split_whitespace().collect();
            // Index 0 is commit hash, index 1 is the path to the folder, and there
            // may be some metainformation after that.
            if parts.len() >= 2 {
                let folder_name = parts[1].split('/').last().unwrap_or(parts[1]);
                Some((folder_name, parts[0]))
            } else {
                None
            }
        })
        .collect::<Vec<_>>();
    let submodules = submodules
        .iter()
        .map(|(name, rev)| format!("(\"{}\", \"{}\")", name, rev))
        .collect::<Vec<_>>()
        .join(", ");
    TokenStream::from_str(format!("&[{}]", submodules).as_str())
        .unwrap_or_else(|_| panic!("Unable to encode submodules: {}", submodules))
}

/// Runs the command and returns its trimmed output, or `"unknown"`
/// if the command failed or the output was not valid utf8.
fn run_cmd(cmd: &str, args: &[&str]) -> String {
    run_cmd_opt(cmd, args).unwrap_or("unknown".to_string())
}

/// Tries to run the command, only returns `Some` if the command
/// succeeded and the output was valid utf8.
fn run_cmd_opt(cmd: &str, args: &[&str]) -> Option<String> {
    let output = Command::new(cmd).args(args).output().ok()?;
    if output.status.success() {
        String::from_utf8(output.stdout)
            .ok()
            .map(|s| s.trim().to_string())
    } else {
        None
    }
}

/// Encodes string as a literal.
fn encode_as_str(s: &str) -> TokenStream {
    TokenStream::from_str(format!("\"{}\"", s).as_str())
        .unwrap_or_else(|_| panic!("Unable to encode string: {}", s))
}
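The submodule parsing in `build_git_submodules` runs inside a proc macro, which makes it awkward to unit test. A possible way to exercise the same logic is to pull it into a plain function, sketched below. `parse_submodule_status` is a hypothetical helper, not code from this PR. It also differs from the macro in one respect: `git submodule status` may prefix the hash with `-`, `+`, or `U` (uninitialized, out of sync, or merge conflict), and this sketch strips that marker, whereas the macro keeps it attached to the revision.

```rust
/// Parses `git submodule status` output into (folder name, revision) pairs.
/// Hypothetical helper; lines with fewer than two fields are skipped.
fn parse_submodule_status(output: &str) -> Vec<(String, String)> {
    output
        .lines()
        .filter_map(|line| {
            let mut parts = line.split_whitespace();
            // Field 0 is the commit hash (possibly with a status prefix),
            // field 1 is the submodule path; anything after is ignored.
            let rev = parts.next()?;
            let path = parts.next()?;
            // Variation on the macro: drop the `-`, `+`, or `U` status marker.
            let rev = rev.trim_start_matches(|c: char| matches!(c, '-' | '+' | 'U'));
            // Use the last path component as the submodule name.
            let name = path.rsplit('/').next().unwrap_or(path);
            Some((name.to_string(), rev.to_string()))
        })
        .collect()
}
```

Whether stripping the status marker is desirable depends on how the revision is consumed; keeping it preserves the information that a submodule was not initialized at build time.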
20 changes: 20 additions & 0 deletions core/lib/protobuf_config/src/house_keeper.rs
@@ -12,6 +12,21 @@ impl ProtoRepr for proto::HouseKeeper {
                &self.l1_batch_metrics_reporting_interval_ms,
            )
            .context("l1_batch_metrics_reporting_interval_ms")?,

            database_health_polling_interval_ms: *required(
                &self.database_health_polling_interval_ms,
            )
            .context("database_health_polling_interval_ms")?,

            eth_sender_health_polling_interval_ms: *required(
                &self.eth_sender_health_polling_interval_ms,
            )
            .context("eth_sender_health_polling_interval_ms")?,

            state_keeper_health_polling_interval_ms: *required(
                &self.state_keeper_health_polling_interval_ms,
            )
            .context("state_keeper_health_polling_interval_ms")?,
        })
    }

@@ -20,6 +35,11 @@ impl ProtoRepr for proto::HouseKeeper {
            l1_batch_metrics_reporting_interval_ms: Some(
                this.l1_batch_metrics_reporting_interval_ms,
            ),
            database_health_polling_interval_ms: Some(this.database_health_polling_interval_ms),
            eth_sender_health_polling_interval_ms: Some(this.eth_sender_health_polling_interval_ms),
            state_keeper_health_polling_interval_ms: Some(
                this.state_keeper_health_polling_interval_ms,
            ),
        }
    }
}
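The read path treats each protobuf field as an `Option` and rejects a missing value with an error that names the field. The sketch below reproduces that pattern with the standard library only. `ProtoHouseKeeper`, this `required`, and `read_database_interval` are hypothetical stand-ins; the real code uses the `required` helper and `anyhow`'s `context`, whose definitions are not part of this diff.

```rust
/// Hypothetical stand-in for the generated protobuf message.
#[derive(Debug, Default)]
struct ProtoHouseKeeper {
    database_health_polling_interval_ms: Option<u64>,
}

/// Returns a reference to the value, or an error if the field is absent.
fn required<T>(field: &Option<T>) -> Result<&T, String> {
    field.as_ref().ok_or_else(|| "missing field".to_string())
}

/// Reads the interval, attaching the field name to any error.
fn read_database_interval(proto: &ProtoHouseKeeper) -> Result<u64, String> {
    required(&proto.database_health_polling_interval_ms)
        .map(|value| *value)
        .map_err(|e| format!("database_health_polling_interval_ms: {}", e))
}
```

Attaching the field name at the call site is what makes a failure actionable when several required fields share one helper.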
3 changes: 3 additions & 0 deletions core/lib/protobuf_config/src/proto/config/house_keeper.proto
@@ -17,4 +17,7 @@ message HouseKeeper {
  reserved 15; reserved "prover_job_archiver_archive_after_secs";
  reserved 16; reserved "fri_gpu_prover_archiver_archiving_interval_ms";
  reserved 17; reserved "fri_gpu_prover_archiver_archive_after_secs";
  optional uint64 database_health_polling_interval_ms = 18; // required; ms
  optional uint64 eth_sender_health_polling_interval_ms = 19; // required; ms
  optional uint64 state_keeper_health_polling_interval_ms = 20; // required; ms
}
3 changes: 3 additions & 0 deletions core/node/house_keeper/Cargo.toml
@@ -11,8 +11,11 @@ keywords.workspace = true
categories.workspace = true

[dependencies]
serde.workspace = true
vise.workspace = true
zksync_dal.workspace = true
zksync_git_version_macro.workspace = true
zksync_health_check.workspace = true
zksync_shared_metrics.workspace = true
zksync_types.workspace = true
zksync_config.workspace = true
42 changes: 42 additions & 0 deletions core/node/house_keeper/src/database.rs
@@ -0,0 +1,42 @@
use async_trait::async_trait;
use serde::{Deserialize, Serialize};
use zksync_dal::{system_dal::DatabaseMigration, ConnectionPool, Core, CoreDal};
use zksync_health_check::{Health, HealthStatus, HealthUpdater};

use crate::periodic_job::PeriodicJob;

#[derive(Debug, Serialize, Deserialize)]
pub struct DatabaseInfo {
    last_migration: DatabaseMigration,
}

impl From<DatabaseInfo> for Health {
    fn from(details: DatabaseInfo) -> Self {
        Self::from(HealthStatus::Ready).with_details(details)
    }
}

#[derive(Debug)]
pub struct DatabaseHealthTask {
    pub polling_interval_ms: u64,
    pub connection_pool: ConnectionPool<Core>,
    pub database_health_updater: HealthUpdater,
}

#[async_trait]
impl PeriodicJob for DatabaseHealthTask {
    const SERVICE_NAME: &'static str = "DatabaseHealth";

    async fn run_routine_task(&mut self) -> anyhow::Result<()> {
        let mut conn = self.connection_pool.connection().await?;
        let last_migration = conn.system_dal().get_last_migration().await?;

        self.database_health_updater
            .update(DatabaseInfo { last_migration }.into());
        Ok(())
    }

    fn polling_interval_ms(&self) -> u64 {
        self.polling_interval_ms
    }
}
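`DatabaseHealthTask` plugs into the house keeper's `PeriodicJob` trait: run a routine task, then wait for the polling interval. The trait definition is not part of this diff, so the following is a simplified, synchronous sketch of that shape rather than the real async trait. The trait body, `run_iterations`, and `CountingHealthTask` are all hypothetical.

```rust
use std::{thread, time::Duration};

/// Simplified, synchronous stand-in for a periodic job trait.
trait PeriodicJob {
    const SERVICE_NAME: &'static str;

    /// One unit of work, e.g. querying the database and updating health.
    fn run_routine_task(&mut self) -> Result<(), String>;

    /// Delay between two consecutive runs, in milliseconds.
    fn polling_interval_ms(&self) -> u64;

    /// Runs the task a fixed number of times, sleeping between runs.
    /// Stops at the first error and prefixes it with the service name.
    fn run_iterations(&mut self, iterations: usize) -> Result<(), String> {
        for _ in 0..iterations {
            self.run_routine_task()
                .map_err(|e| format!("{}: {}", Self::SERVICE_NAME, e))?;
            thread::sleep(Duration::from_millis(self.polling_interval_ms()));
        }
        Ok(())
    }
}

/// Hypothetical task that records how often it was polled.
struct CountingHealthTask {
    polling_interval_ms: u64,
    polls: u32,
    last_status: Option<String>,
}

impl PeriodicJob for CountingHealthTask {
    const SERVICE_NAME: &'static str = "CountingHealth";

    fn run_routine_task(&mut self) -> Result<(), String> {
        self.polls += 1;
        self.last_status = Some(format!("ready (poll #{})", self.polls));
        Ok(())
    }

    fn polling_interval_ms(&self) -> u64 {
        self.polling_interval_ms
    }
}
```

In this polling design the reported health is only as fresh as the interval, which is the trade-off the later "reactive health check" commits for the state keeper and eth sender appear to address.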