Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support opening an in-use rocksdb as secondary #10209

Merged
merged 12 commits into from
Jun 3, 2020

Conversation

ryoqun
Copy link
Member

@ryoqun ryoqun commented May 24, 2020

Problem

it seems that the rocksdb's notorious limitation is now lifted: facebook/rocksdb#4899 ㊗️

Maybe our bitter experience caused blindness for the rocksdb; it's certainly improving. ;)

Summary of Changes

Fallback to secondary access when primary failed from ledget-tool's read-only subcommands. solana-validator and other subcommands behavior hasn't been changed.

This realizes to run create-snapshot while a validator is running and other various debugging subcommands.

PS: I know my half-backed ledger-tool pr #9490 is laying around... Sorry for our longer pr queue. ;) I thought this worth to interrupt. :)

@ryoqun ryoqun requested a review from mvines May 24, 2020 09:18
@@ -873,7 +881,7 @@ fn main() {
let starting_slot = value_t_or_exit!(arg_matches, "starting_slot", Slot);
let allow_dead_slots = arg_matches.is_present("allow_dead_slots");
output_ledger(
open_blockstore(&ledger_path),
open_blockstore(&ledger_path, AccessType::AllowSecondary),
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: carefully audit for the appropriate AccessType for all of subcommands.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I've tested all the subcommands and they worked as intended.

@mvines FYI, I found graph to be very useful to create a graph right from a still running validator's ledger. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

totally, this is gonna be yuuuuge!

@ryoqun
Copy link
Member Author

ryoqun commented May 24, 2020

$ ./target/release/solana-ledger-tool bounds --ledger config/ledger/                                                                                                                                   [2020-05-24T09:24:33.743850403Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 65000
[2020-05-24T09:24:33.743862041Z INFO  solana_ledger::blockstore] Opening database at "/home/ryoqun/work/solana/solana/config/ledger/rocksdb"
[2020-05-24T09:24:34.165603805Z INFO  solana_ledger::blockstore] "/home/ryoqun/work/solana/solana/config/ledger/rocksdb" open took 421ms
Ledger has data for slots 0 to 307
$ ./target/release/solana-ledger-tool bounds --ledger config/ledger/
[2020-05-24T09:25:02.343769343Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 65000
[2020-05-24T09:25:02.343783292Z INFO  solana_ledger::blockstore] Opening database at "/home/ryoqun/work/solana/solana/config/ledger/rocksdb"
[2020-05-24T09:25:02.826810286Z WARN  solana_ledger::blockstore_db] Error when opening as primary: IO error: While lock file: /home/ryoqun/work/solana/solana/config/ledger/rocksdb/LOCK: Resource temporarily unavailable
[2020-05-24T09:25:02.826845191Z WARN  solana_ledger::blockstore_db] Trying as secondary at : "/home/ryoqun/work/solana/solana/config/ledger/rocksdb/solana-secondary"
[2020-05-24T09:25:02.826866554Z WARN  solana_ledger::blockstore_db] This active secondary db use may temporarily cause the performance of another db use (like by validator) to degrade
[2020-05-24T09:25:02.837985233Z INFO  solana_ledger::blockstore] "/home/ryoqun/work/solana/solana/config/ledger/rocksdb" open took 494ms
Ledger has data for slots 0 to 313

$ ./target/release/solana-ledger-tool verify --ledger config/ledger/ --halt-at-slot 303
genesis hash: FuSiXHRk5ueQ9oWBzi1t1i6GZCaGWQMM3hRncprZ7DkM
[2020-05-24T09:25:33.386127560Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 65000
[2020-05-24T09:25:33.386137069Z INFO  solana_ledger::blockstore] Opening database at "/home/ryoqun/work/solana/solana/config/ledger/rocksdb"
[2020-05-24T09:25:33.387410733Z WARN  solana_ledger::blockstore_db] Error when opening as primary: IO error: While lock file: /home/ryoqun/work/solana/solana/config/ledger/rocksdb/LOCK: Resource temporarily unavailable
[2020-05-24T09:25:33.387419105Z WARN  solana_ledger::blockstore_db] Trying as secondary at : "/home/ryoqun/work/solana/solana/config/ledger/rocksdb/solana-secondary"
[2020-05-24T09:25:33.387425640Z WARN  solana_ledger::blockstore_db] This active secondary db use may temporarily cause the performance of another db use (like by validator) to degrade
[2020-05-24T09:25:33.423248225Z INFO  solana_ledger::blockstore] "/home/ryoqun/work/solana/solana/config/ledger/rocksdb" open took 37ms
Default accounts path is switched aligning with Blockstore's non-primary access: "/home/ryoqun/work/solana/solana/config/ledger/accounts.ledger-tool"
[2020-05-24T09:25:33.423352152Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "/home/ryoqun/work/solana/solana/config/ledger/snapshot"
[2020-05-24T09:25:33.424167639Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "/home/ryoqun/work/solana/solana/config/ledger/snapshot-301-4Wg71pDqRpiBzqkLALqPDJRpLQSyrFQqQCYxdRMk4t9w.tar.bz2"
[2020-05-24T09:25:33.435946713Z INFO  solana_ledger::snapshot_utils] snapshot untar took 11ms
[2020-05-24T09:25:33.435967384Z INFO  solana_ledger::snapshot_utils] snapshot version: 1.1.0
[2020-05-24T09:25:33.435984389Z INFO  solana_ledger::snapshot_utils] Loading bank from "/home/ryoqun/work/solana/solana/config/ledger/snapshot/.tmpwI0M5H/snapshots/301/301"
[2020-05-24T09:25:33.436465012Z INFO  solana_ledger::snapshot_utils] Rebuilding accounts...
[2020-05-24T09:25:33.438081941Z INFO  solana_ledger::snapshot_utils] Rebuilding status cache...
[2020-05-24T09:25:33.438795063Z INFO  solana_ledger::snapshot_utils] Loaded bank for slot: 301
[2020-05-24T09:25:33.438831362Z INFO  solana_runtime::accounts_db] total_stores: 6, newest_slot: 301, oldest_slot: 0, max_slot: 0 (num=5), min_slot: 301 (num=1)
[2020-05-24T09:25:33.438844764Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2020-05-24T09:25:33.438947576Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=6i
[2020-05-24T09:25:33.439057557Z INFO  solana_metrics::metrics] datapoint: accounts_db-perf-stats delta_hash_num=0i delta_hash_scan_us=0i delta_hash_merge_us=0i delta_hash_accumulate_us=0i
[2020-05-24T09:25:33.441309245Z INFO  solana_runtime::bank] bank frozen: 301 hash: LKXnwzu6gEg9v3YbV3WxemCQZEa2umqjZgtVosDqimF accounts_delta: 5wGmVMdRPVKZFfhDP3TKGt5VEGTDv1e8F4BT9LhGGrx2 signature_count: 1 last_blockhash: HoSQozeXUGL1MPs4yxKuBd9aMiKhx7c1dB6yUaNEusZX capitalization: 999999999999252500
[2020-05-24T09:25:33.441332341Z INFO  solana_runtime::bank] accounts hash slot: 301 stats: BankHashStats { num_updated_accounts: 8, num_removed_accounts: 0, num_lamports_stored: 1000025361145, total_data_len: 161372, num_executable_accounts: 0 }
[2020-05-24T09:25:33.441340438Z INFO  solana_ledger::snapshot_utils] bank rebuild from snapshot took 5ms
[2020-05-24T09:25:33.442264216Z INFO  solana_ledger::blockstore_processor] processing ledger from slot 301...
[2020-05-24T09:25:33.442727341Z INFO  solana_ledger::blockstore_processor] ledger holds data through slot 385
[2020-05-24T09:25:33.443071843Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=302i block_height=301i
[2020-05-24T09:25:33.445336463Z INFO  solana_metrics::metrics] datapoint: collect_rent_eagerly accounts=0i
[2020-05-24T09:25:33.446527731Z INFO  solana_runtime::bank] bank frozen: 302 hash: BCsYDV9zWbPLT4URQ2rPEbNz62xJXjW2RqSyVMcCe25c accounts_delta: DAeQf1KESANwU4Pqh8BX4aDqaF3ypC9r4x38TyN882xq signature_count: 1 last_blockhash: S9DCk2biCzfSTN1EintdhcatB9dXTBT7JM1tipRaKTh capitalization: 999999999999250000
[2020-05-24T09:25:33.446547925Z INFO  solana_runtime::bank] accounts hash slot: 302 stats: BankHashStats { num_updated_accounts: 8, num_removed_accounts: 0, num_lamports_stored: 1000025356145, total_data_len: 161372, num_executable_accounts: 0 }
[2020-05-24T09:25:33.446601988Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=303i block_height=302i
[2020-05-24T09:25:33.447783013Z INFO  solana_metrics::metrics] datapoint: collect_rent_eagerly accounts=0i
[2020-05-24T09:25:33.449290451Z INFO  solana_runtime::bank] bank frozen: 303 hash: 7kV9D6BnNtonmm5pXNUwdmMiNyaFNmdCKAqEyyGsACdp accounts_delta: DLS5xCaFo1rWeLkmwQYsQbrJtov27VkeqMyoQY26zK1h signature_count: 1 last_blockhash: BWsqbPEfmZUopzzPhhh88ZgRwiu6isPAa9t4PJ6jn4TE capitalization: 999999999999247500
[2020-05-24T09:25:33.449309966Z INFO  solana_runtime::bank] accounts hash slot: 303 stats: BankHashStats { num_updated_accounts: 8, num_removed_accounts: 0, num_lamports_stored: 1000025351145, total_data_len: 161372, num_executable_accounts: 0 }
[2020-05-24T09:25:33.449364519Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=304i block_height=303i
[2020-05-24T09:25:33.450015220Z INFO  solana_ledger::blockstore_processor] ledger processed in 7ms. 2 MB allocated. root=303, 1 fork at 303, with 1 frozen bank
Ok
$ ./target/release/solana-ledger-tool verify --ledger config/ledger/ --halt-at-slot 303
genesis hash: FuSiXHRk5ueQ9oWBzi1t1i6GZCaGWQMM3hRncprZ7DkM
[2020-05-24T09:25:46.726147071Z INFO  solana_ledger::blockstore] Maximum open file descriptors: 65000
[2020-05-24T09:25:46.726156682Z INFO  solana_ledger::blockstore] Opening database at "/home/ryoqun/work/solana/solana/config/ledger/rocksdb"
[2020-05-24T09:25:46.971791300Z INFO  solana_ledger::blockstore] "/home/ryoqun/work/solana/solana/config/ledger/rocksdb" open took 245ms
[2020-05-24T09:25:46.971818110Z INFO  solana_ledger::bank_forks_utils] Initializing snapshot path: "/home/ryoqun/work/solana/solana/config/ledger/snapshot"
[2020-05-24T09:25:46.972948911Z INFO  solana_ledger::bank_forks_utils] Loading snapshot package: "/home/ryoqun/work/solana/solana/config/ledger/snapshot-301-4Wg71pDqRpiBzqkLALqPDJRpLQSyrFQqQCYxdRMk4t9w.tar.bz2"
[2020-05-24T09:25:46.985352990Z INFO  solana_ledger::snapshot_utils] snapshot untar took 12ms
[2020-05-24T09:25:46.985377482Z INFO  solana_ledger::snapshot_utils] snapshot version: 1.1.0
[2020-05-24T09:25:46.985419222Z INFO  solana_ledger::snapshot_utils] Loading bank from "/home/ryoqun/work/solana/solana/config/ledger/snapshot/.tmplmqaYi/snapshots/301/301"
[2020-05-24T09:25:46.986100468Z INFO  solana_ledger::snapshot_utils] Rebuilding accounts...
[2020-05-24T09:25:46.988328357Z INFO  solana_ledger::snapshot_utils] Rebuilding status cache...
[2020-05-24T09:25:46.989179617Z INFO  solana_ledger::snapshot_utils] Loaded bank for slot: 301
[2020-05-24T09:25:46.989224409Z INFO  solana_runtime::accounts_db] total_stores: 6, newest_slot: 301, oldest_slot: 0, max_slot: 0 (num=5), min_slot: 301 (num=1)
[2020-05-24T09:25:46.989240849Z INFO  solana_metrics::metrics] metrics disabled: SOLANA_METRICS_CONFIG: environment variable not found
[2020-05-24T09:25:46.989433776Z INFO  solana_metrics::metrics] datapoint: accounts_db-stores total_count=6i
[2020-05-24T09:25:46.989474575Z INFO  solana_metrics::metrics] datapoint: accounts_db-perf-stats delta_hash_num=0i delta_hash_scan_us=0i delta_hash_merge_us=0i delta_hash_accumulate_us=0i
[2020-05-24T09:25:46.990981535Z INFO  solana_runtime::bank] bank frozen: 301 hash: LKXnwzu6gEg9v3YbV3WxemCQZEa2umqjZgtVosDqimF accounts_delta: 5wGmVMdRPVKZFfhDP3TKGt5VEGTDv1e8F4BT9LhGGrx2 signature_count: 1 last_blockhash: HoSQozeXUGL1MPs4yxKuBd9aMiKhx7c1dB6yUaNEusZX capitalization: 999999999999252500
[2020-05-24T09:25:46.991000390Z INFO  solana_runtime::bank] accounts hash slot: 301 stats: BankHashStats { num_updated_accounts: 8, num_removed_accounts: 0, num_lamports_stored: 1000025361145, total_data_len: 161372, num_executable_accounts: 0 }
[2020-05-24T09:25:46.991008305Z INFO  solana_ledger::snapshot_utils] bank rebuild from snapshot took 5ms
[2020-05-24T09:25:46.992385735Z INFO  solana_ledger::blockstore_processor] processing ledger from slot 301...
[2020-05-24T09:25:46.992747372Z INFO  solana_ledger::blockstore_processor] ledger holds data through slot 394
[2020-05-24T09:25:46.993146334Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=302i block_height=301i
[2020-05-24T09:25:46.994945217Z INFO  solana_metrics::metrics] datapoint: collect_rent_eagerly accounts=0i
[2020-05-24T09:25:46.996229490Z INFO  solana_runtime::bank] bank frozen: 302 hash: BCsYDV9zWbPLT4URQ2rPEbNz62xJXjW2RqSyVMcCe25c accounts_delta: DAeQf1KESANwU4Pqh8BX4aDqaF3ypC9r4x38TyN882xq signature_count: 1 last_blockhash: S9DCk2biCzfSTN1EintdhcatB9dXTBT7JM1tipRaKTh capitalization: 999999999999250000
[2020-05-24T09:25:46.996248345Z INFO  solana_runtime::bank] accounts hash slot: 302 stats: BankHashStats { num_updated_accounts: 8, num_removed_accounts: 0, num_lamports_stored: 1000025356145, total_data_len: 161372, num_executable_accounts: 0 }
[2020-05-24T09:25:46.996302896Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=303i block_height=302i
[2020-05-24T09:25:46.997372698Z INFO  solana_metrics::metrics] datapoint: collect_rent_eagerly accounts=0i
[2020-05-24T09:25:46.998545309Z INFO  solana_runtime::bank] bank frozen: 303 hash: 7kV9D6BnNtonmm5pXNUwdmMiNyaFNmdCKAqEyyGsACdp accounts_delta: DLS5xCaFo1rWeLkmwQYsQbrJtov27VkeqMyoQY26zK1h signature_count: 1 last_blockhash: BWsqbPEfmZUopzzPhhh88ZgRwiu6isPAa9t4PJ6jn4TE capitalization: 999999999999247500
[2020-05-24T09:25:46.998563620Z INFO  solana_runtime::bank] accounts hash slot: 303 stats: BankHashStats { num_updated_accounts: 8, num_removed_accounts: 0, num_lamports_stored: 1000025351145, total_data_len: 161372, num_executable_accounts: 0 }
[2020-05-24T09:25:46.998616640Z INFO  solana_metrics::metrics] datapoint: bank-new_from_parent-heights slot_height=304i block_height=303i
[2020-05-24T09:25:46.999281222Z INFO  solana_ledger::blockstore_processor] ledger processed in 6ms. 2 MB allocated. root=303, 1 fork at 303, with 1 frozen bank
Ok

@ryoqun
Copy link
Member Author

ryoqun commented May 24, 2020

CI is broken... ;)

ledger-tool/src/main.rs Outdated Show resolved Hide resolved
@ryoqun
Copy link
Member Author

ryoqun commented May 24, 2020

TODO: maybe don't fallback by default; instead explicit --allow-secondary-access or similar?

@mvines
Copy link
Member

mvines commented May 24, 2020

This looks very promising, thanks @ryoqun

@ryoqun ryoqun force-pushed the rocksdb-secondary-open branch from 3be691a to e88a3c9 Compare May 25, 2020 01:19
@codecov
Copy link

codecov bot commented May 25, 2020

Codecov Report

Merging #10209 into master will decrease coverage by 0.0%.
The diff coverage is 86.2%.

@@            Coverage Diff            @@
##           master   #10209     +/-   ##
=========================================
- Coverage    81.3%    81.3%   -0.1%     
=========================================
  Files         288      288             
  Lines       67000    67135    +135     
=========================================
+ Hits        54510    54603     +93     
- Misses      12490    12532     +42     

ledger/src/blockstore.rs Outdated Show resolved Hide resolved
@ryoqun
Copy link
Member Author

ryoqun commented May 30, 2020

TODO: maybe don't fallback by default; instead explicit --allow-secondary-access or similar?

I've tested enough. Be confident to start using the secondary fallback by default. --allow-secondary-access was too tedious from the ops perspective.

@ryoqun ryoqun changed the title [wip] Support opening an in-use rocksdb as secondary Support opening an in-use rocksdb as secondary May 30, 2020
@ryoqun ryoqun requested a review from mvines May 30, 2020 14:37
@ryoqun ryoqun marked this pull request as ready for review May 30, 2020 14:37
@ryoqun ryoqun force-pushed the rocksdb-secondary-open branch from 1042882 to 890e823 Compare May 30, 2020 14:53
@ryoqun
Copy link
Member Author

ryoqun commented May 30, 2020

@mvines This is ready for review!! :)

Copy link
Member

@mvines mvines left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like solana-validator opens with AccessType::OnlyPrimary (Blockstore::open_with_signal)?

ledger-tool/src/main.rs Outdated Show resolved Hide resolved
Self::do_open_with_signal(ledger_path, AccessType::OnlyPrimary)
}

pub fn open_with_signal_with_access_type(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
pub fn open_with_signal_with_access_type(
pub fn open_with_signal_and_access_type(

@ryoqun
Copy link
Member Author

ryoqun commented Jun 1, 2020

It looks like solana-validator opens with AccessType::OnlyPrimary (Blockstore::open_with_signal)?

@mvines Yeah, I deliberately do so because...:

  1. I didn't want to change the behavior in that regard for solana-validator because I don't see any use case running validator as secondary database access.
  2. In fact, rocksdb's secondary doesn't support writes at all. It's almost like a read-only, with the exception of tailing functionality via explicit API call from the secondary process (this might be fun for the ledger-tool version of live-slots). So I think it's not possible to run validator to begin with...
  3. secondary might operate under the degraded performance; so fail hard and early for the tried attempt to start solana-validatro in the rare case of other running solana-ledger-tool's subcommand.

ledger/src/blockstore_db.rs Outdated Show resolved Hide resolved
@ryoqun ryoqun requested a review from mvines June 2, 2020 04:50
mvines
mvines previously approved these changes Jun 2, 2020
@mergify mergify bot dismissed mvines’s stale review June 3, 2020 02:38

Pull request has been modified.

@ryoqun ryoqun added the automerge Merge this Pull Request automatically once CI passes label Jun 3, 2020
@solana-grimes solana-grimes merged commit caa7f7a into solana-labs:master Jun 3, 2020
mergify bot pushed a commit that referenced this pull request Jun 3, 2020
automerge

(cherry picked from commit caa7f7a)

# Conflicts:
#	ledger-tool/Cargo.toml
#	ledger-tool/src/main.rs
mergify bot pushed a commit that referenced this pull request Jun 3, 2020
solana-grimes pushed a commit that referenced this pull request Jun 3, 2020
solana-grimes pushed a commit that referenced this pull request Jun 3, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
automerge Merge this Pull Request automatically once CI passes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants