Aug40 #27344

Closed
wants to merge 59 commits into from
Conversation

jeffwashington
Contributor

Problem

Summary of Changes

Fixes #

mergify bot and others added 30 commits August 5, 2022 08:09
…lana-labs#26930) (solana-labs#26942)

Remove runtime dependency from solana-transaction-status (solana-labs#26930)

* Move RewardType out of runtime

* Move collect_token_balances to solana-ledger

* Remove solana-runtime dependency

(cherry picked from commit 2dca239)

Co-authored-by: Tyera Eulberg <[email protected]>
Io stats v2 (solana-labs#26898)

* Use sysfs instead of procfs for disk stats

* Filter map to filter dmcrypt and mdraid volumes

* Unit test cover different kernel formats

(cherry picked from commit 5bc81a6)

Co-authored-by: Brennan Watt <[email protected]>
solana-labs#26952)

* spl: Bump token to 3.5.0 and ata to 1.1.0 (solana-labs#26921)

(cherry picked from commit b725b86)

* Bump spl-token dependency in solana-ledger

Co-authored-by: Jon Cinque <[email protected]>
…olana-labs#26522) (solana-labs#26958)

Fix sol_get_processed_sibling_instruction on 32-bit hosts (solana-labs#26522)

(cherry picked from commit a9a3c62)

Co-authored-by: Richard Patel <[email protected]>
…abs#26965)

Unpin tokio for non-rpc crates (solana-labs#26957)

(cherry picked from commit 66919e5)

Co-authored-by: Tyera Eulberg <[email protected]>
…bs#26968)

bpf-loader: make syscalls pub (solana-labs#26918)

(cherry picked from commit 35f04db)

Co-authored-by: Richard Patel <[email protected]>
solana-labs#26528) (solana-labs#26970)

transaction-status, storage-proto: add compute_units_consumed (solana-labs#26528)

* transaction-status, storage-proto: add compute_units_consumed

* fix bpf test

Co-authored-by: Justin Starry <[email protected]>
(cherry picked from commit 270315a)

Co-authored-by: Richard Patel <[email protected]>
…26991)

Fix windows release builds (solana-labs#26986)

* Don't try to build protobuf-src on windows

* Set protoc envar

(cherry picked from commit 46b3ece)

Co-authored-by: Tyera Eulberg <[email protected]>
…solana-labs#26065) (solana-labs#27007)

* Add API docs for secp256k1_instruction and secp256k1_recover (solana-labs#26065)

* Add API docs for secp256k1_instruction and secp256k1_recover

* typo

* Remove unused variable from secp256k1 program test

* Bump solana_bpf_rust_secp256k1_recover ix count

Co-authored-by: Tyera Eulberg <[email protected]>
(cherry picked from commit ebe25fd)

# Conflicts:
#	programs/bpf/Cargo.lock
#	programs/bpf/rust/secp256k1_recover/Cargo.toml
#	programs/bpf/tests/programs.rs

* Fix conflicts

Co-authored-by: Brian Anderson <[email protected]>
Co-authored-by: Tyera Eulberg <[email protected]>
solana-labs#27021)

Add `Signers` impls for `Arc<dyn Signer>` (solana-labs#27000)

* Add `Signers` impls for `Arc<dyn Signer>`

* Reformat

(cherry picked from commit 632752d)

Co-authored-by: Justin Malčić <[email protected]>
…uild" (backport solana-labs#27011) (solana-labs#27024)

Revert "Remove resolver=2 from Cargo.toml and add it to the Windows build" (solana-labs#27011)

Revert "Remove resolver=2 from Cargo.toml and add it to the Windows build (solana-labs#26706)"

This reverts commit 2f6f5b1.

(cherry picked from commit ecda3be)

Co-authored-by: Ryo Onodera <[email protected]>
…bs#27006) (solana-labs#27026)

adds number of coding shreds to broadcast metrics (solana-labs#27006)

(cherry picked from commit e2a2d27)

Co-authored-by: behzad nouri <[email protected]>
…bs#27012) (solana-labs#27038)

tracer-packet-stats reporting should not reset id (solana-labs#27012)

(cherry picked from commit b6d38aa)

Co-authored-by: apfitzge <[email protected]>
…7042)

Fix VoteInstruction order (solana-labs#27035)

(cherry picked from commit f8d610d)

Co-authored-by: Tyera Eulberg <[email protected]>
…struction type name (backport solana-labs#27034) (solana-labs#27041)

Correct StakeInstruction::DeactivateDelinquent instruction type

(cherry picked from commit b0c61e8)

Co-authored-by: Michael Vines <[email protected]>
…#27040) (solana-labs#27047)

Implement nonblocking version of BlockhashQuery (solana-labs#27040)

(cherry picked from commit b9a5af0)

Co-authored-by: hana <[email protected]>
…7046) (solana-labs#27053)

Fix quic client on TestValidator, alternative (solana-labs#27046)

Add new method to enable custom offset

(cherry picked from commit 45c0da8)

Co-authored-by: Tyera Eulberg <[email protected]>
…solana-labs#27048)

ancestor hashes socket ping/pong support (solana-labs#26866)

(cherry picked from commit 370de81)

Co-authored-by: Jeff Biseda <[email protected]>
… waiting for supermajority (backport solana-labs#27055) (solana-labs#27060)

`solana-validator monitor` now displays slot and gossip stake % while waiting for supermajority

(cherry picked from commit 4e79d78)

Co-authored-by: Michael Vines <[email protected]>
…olana-labs#27077)

Fix local cluster tests for QUIC usage (solana-labs#27071)

(cherry picked from commit ce23003)

Co-authored-by: Pankaj Garg <[email protected]>
…solana-labs#25807) (solana-labs#27082)

removes buffering when generating coding shreds in broadcast (solana-labs#25807)

Given the 32:32 erasure recovery schema, the current implementation requires
exactly 32 data shreds to generate coding shreds for the batch (except
for the final erasure batch in each slot).
As a result, when serializing ledger entries to data shreds, if the
number of data shreds is not a multiple of 32, the coding shreds for the
last batch cannot be generated until there are more data shreds to
complete the batch to 32 data shreds. This adds latency in generating
and broadcasting coding shreds.

In addition, with Merkle variants for shreds, data shreds cannot be
signed and broadcast until coding shreds are also generated. As a
result *both* code and data shreds will be delayed before broadcast if
we still require exactly 32 data shreds for each batch.

This commit instead always generates and broadcasts coding shreds as soon
as any number of data shreds is available. When serializing entries
to shreds:
* if the number of resulting data shreds is less than 32, then more
  coding shreds will be generated so that the resulting erasure batch
  has the same recovery probabilities as a 32:32 batch.
* if the number of data shreds is more than 32, then the data shreds are
  split uniformly into erasure batches with _at least_ 32 data shreds in
  each batch. Each erasure batch will have the same number of code and
  data shreds.

For example:
* If there are 19 data shreds, 27 coding shreds are generated. The
  resulting 19(data):27(code) erasure batch has the same recovery
  probabilities as a 32:32 batch.
* If there are 107 data shreds, they are split into 3 batches of 36:36,
  36:36 and 35:35 data:code shreds each.
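
The uniform split described above can be sketched as follows. This is a minimal illustration, not the code from the commit: `split_into_batches` and `num_coding_shreds` are hypothetical names, and the number of coding shreds for a batch of fewer than 32 data shreds (for example 27 for 19) comes from a mapping that the commit message does not spell out, so the sketch returns `None` for that case rather than guessing.

```rust
/// Minimum number of data shreds per erasure batch, per the commit message.
const MIN_DATA_SHREDS_PER_BATCH: usize = 32;

/// Hypothetical helper: splits `num_data_shreds` uniformly into erasure
/// batches with at least 32 data shreds each. Fewer than 32 data shreds
/// form one short batch.
fn split_into_batches(num_data_shreds: usize) -> Vec<usize> {
    if num_data_shreds == 0 {
        return Vec::new();
    }
    // As many batches as fit while keeping every batch at 32 or more.
    let num_batches = (num_data_shreds / MIN_DATA_SHREDS_PER_BATCH).max(1);
    let base = num_data_shreds / num_batches;
    let extra = num_data_shreds % num_batches;
    // The first `extra` batches take one additional data shred.
    (0..num_batches)
        .map(|i| if i < extra { base + 1 } else { base })
        .collect()
}

/// Hypothetical helper: coding shreds for one batch. Batches of 32 or more
/// get as many coding shreds as data shreds. Short batches need a lookup
/// that is not given in the text, so `None` is returned for them.
fn num_coding_shreds(num_data_in_batch: usize) -> Option<usize> {
    if num_data_in_batch >= MIN_DATA_SHREDS_PER_BATCH {
        Some(num_data_in_batch)
    } else {
        None
    }
}
```

With this sketch, 107 data shreds split into batches of 36, 36 and 35, matching the second example in the list.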

A consequence of this change is that code and data shred indices will
no longer align, as there will be more coding shreds than data shreds
(not only in the last batch in each slot but also in the intermediate
ones).

(cherry picked from commit ac91cda)

Co-authored-by: behzad nouri <[email protected]>
…labs#27085)

Bump rust-rocksdb to 0.19.0 tag (solana-labs#26949)

(cherry picked from commit 14d9922)

Co-authored-by: steviez <[email protected]>
…labs#26650) (solana-labs#27143)

test-validator: improve multi-value arg help output (solana-labs#26650)

(cherry picked from commit b28657f)

Co-authored-by: Trent Nelson <[email protected]>
…st (backport solana-labs#27008) (solana-labs#27156)

Increase timeout to reduce the flakiness of rpc signature receiving test (solana-labs#27008)

* Increase timeout to reduce the flakiness of rpc signature receiving test

* Minor fmt fix

(cherry picked from commit 52d8a20)

Co-authored-by: Xiang Zhu <[email protected]>
…labs#27150)

Bump sbf-tools version to v1.29

Co-authored-by: Dmitri Makarov <[email protected]>
…labs#27103)

Fix local-cluster for QUIC more (solana-labs#27096)

(cherry picked from commit 1aa8e06)

Co-authored-by: Tyera Eulberg <[email protected]>
…na-labs#27117)

adjusts max coding shreds per slot (solana-labs#27083)

As a consequence of removing buffering when generating coding shreds:
solana-labs#25807
more coding shreds are generated than data shreds, and so
MAX_CODE_SHREDS_PER_SLOT needs to be adjusted accordingly.

The respective value is tied to ERASURE_BATCH_SIZE.

(cherry picked from commit b3b57a0)

Co-authored-by: behzad nouri <[email protected]>
…#27144) (solana-labs#27178)

adds Shred{Code,Data}::SIZE_OF_HEADERS trait constants (solana-labs#27144)

(cherry picked from commit 0e30609)

Co-authored-by: behzad nouri <[email protected]>
…ana-labs#27159)

Add stats for readonly cache evicts (solana-labs#26938)

* add stats for readonly cache evicts

* bump up account cache to 400M

* aggregate num_evicts in the loop

(cherry picked from commit 1a90cff)

Co-authored-by: HaoranYi <[email protected]>
…olana-labs#26359) (solana-labs#27183)

reverts wide fanout in broadcast when the root node is down (solana-labs#26359)

A change included in
solana-labs#20480
was that when the root node in turbine broadcast tree is down, the
leader will broadcast the shred to all nodes in the first layer.
The intention was to mitigate the impact of dead nodes on shred
propagation, because if the root node is down, then the entire cluster
will miss the shred.
On the other hand, if x% of stake is down, this will cause a
packets-per-shred ratio of 200*x% + 1 at the broadcast stage, which might
contribute to line-rate saturation and packet drops.
To avoid this bandwidth saturation issue, this commit reverts that logic
and always broadcasts shreds from the leader only to the root node.
As before we rely on erasure codes to recover shreds lost due to staked
nodes being offline.
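
To make the 200*x% + 1 estimate concrete, here is a small arithmetic sketch. It assumes the 200 in the formula is the fanout of the first turbine layer. `FANOUT` and `packets_per_shred` are illustrative names, not identifiers from the codebase.

```rust
/// Assumed size of the first turbine layer, taken from the "200*x%" term.
const FANOUT: f64 = 200.0;

/// Approximate packets sent by the leader per shred under the reverted
/// wide-fanout logic, given the fraction of stake that is offline.
fn packets_per_shred(fraction_of_stake_down: f64) -> f64 {
    FANOUT * fraction_of_stake_down + 1.0
}
```

Under these assumptions, 5% of stake being offline gives roughly 11 packets per shred at the leader instead of 1, which is the amplification the revert removes.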

(cherry picked from commit 3b87aa9)

Co-authored-by: behzad nouri <[email protected]>
willhickey and others added 28 commits August 16, 2022 23:12
…solana-labs#26927)

* Enable QUIC client by default. Add arg to disable QUIC client.
* Deprecate --disable-quic-servers arg
* Add #[ignore] annotation to failing tests
…7052) (solana-labs#27198)

Fix windows build after crossbeam-epoch patch (solana-labs#27052)

(cherry picked from commit 773a4dd)

Co-authored-by: Ryo Onodera <[email protected]>
…olana-labs#27197)

Patch crossbeam-epoch to avoid overhead (solana-labs#26555)

(cherry picked from commit ad3e10f)

Co-authored-by: Ryo Onodera <[email protected]>
…kport solana-labs#27208) (solana-labs#27220)

derives Error trait for ClusterInfoError and core::result::Error (solana-labs#27208)

(cherry picked from commit fea66c8)

Co-authored-by: behzad nouri <[email protected]>
…solana-labs#27204) (solana-labs#27227)

chore: only buildkite pipelines use sccache in docker-run.sh (solana-labs#27204)

chore: only buildkite ci use sccache
(cherry picked from commit d2d4d4a)

Co-authored-by: Yihau Chen <[email protected]>
…abs#27221) (solana-labs#27225)

sdk: Fix args after "--" in build-bpf and test-bpf (solana-labs#27221)

(cherry picked from commit 68a5e05)

Co-authored-by: Jon Cinque <[email protected]>
…olana-labs#27203)

snapshots: serialize version file first (solana-labs#27192)

serialize version file first

(cherry picked from commit c1111fa)

Co-authored-by: apfitzge <[email protected]>
…olana-labs#27236)

Flaky Unit Test test_rpc_subscriptions (solana-labs#27214)

Increase unit test timeout from 5 seconds to 10 seconds

(cherry picked from commit 5c9d612)

Co-authored-by: Brennan Watt <[email protected]>
…olana-labs#27195) (solana-labs#27232)

Fix a corner-case panic in get_entries_in_data_block() (solana-labs#27195)

#### Problem
get_entries_in_data_block() panics when there's inconsistency between
slot_meta and data_shred.

However, as we don't lock on reads, reading across multiple column families is
not atomic (especially for older slots) and thus does not guarantee consistency
as the background cleanup service could purge the slot in the middle.  Such a
panic was reported in solana-labs#26980 when the validator serves a high load of RPC calls.

#### Summary of Changes
This PR makes get_entries_in_data_block() panic only when the inconsistency
between slot-meta and data-shred happens on a slot older than lowest_cleanup_slot.

(cherry picked from commit 6d12bb6)

Co-authored-by: Yueh-Hsuan Chiang <[email protected]>
…olana-labs#27244)

adds hash domain to ping-pong protocol (solana-labs#27193)

In order to maintain backward compatibility, for now the responding node
will hash the token both with and without domain so that the other node
will accept the response regardless of its upgrade status.
Once the cluster has upgraded to the new code, we will remove the legacy
domain = false case.
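
The backward-compatibility scheme can be illustrated with a rough sketch. It is not the protocol code: the standard library's `DefaultHasher` stands in for the real cryptographic hash so the example needs no external crates, and `DOMAIN_PREFIX`, `hash_token`, `pong_hashes` and `accepts_pong` are hypothetical names.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

/// Hypothetical domain-separation prefix; the real value is not given here.
const DOMAIN_PREFIX: &[u8] = b"EXAMPLE_PING_PONG";

/// Hashes a ping token, optionally prefixed with the domain.
/// `DefaultHasher` is a stand-in for the actual cryptographic hash.
fn hash_token(token: &[u8], with_domain: bool) -> u64 {
    let mut hasher = DefaultHasher::new();
    if with_domain {
        hasher.write(DOMAIN_PREFIX);
    }
    hasher.write(token);
    hasher.finish()
}

/// Responder side: answer with both variants during the transition period.
fn pong_hashes(token: &[u8]) -> [u64; 2] {
    [hash_token(token, true), hash_token(token, false)]
}

/// Requester side: an upgraded node expects the domain-prefixed hash,
/// a legacy node expects the plain hash.
fn accepts_pong(token: &[u8], upgraded: bool, responses: &[u64]) -> bool {
    responses.contains(&hash_token(token, upgraded))
}
```

Because the responder sends both hashes, the requester finds the one it expects whether or not it has upgraded. Dropping the legacy case later amounts to returning only the domain-prefixed hash.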

(cherry picked from commit 6928b2a)

Co-authored-by: behzad nouri <[email protected]>
…ckport solana-labs#27152) (solana-labs#27249)

slots_connected: check if the range is connected (>= ending_slot) (solana-labs#27152)

(cherry picked from commit 40b9f2f)

Co-authored-by: apfitzge <[email protected]>
…7153) (solana-labs#27250)

create-snapshot check if snapshot slot exists (solana-labs#27153)

(cherry picked from commit 6da3eb0)

Co-authored-by: apfitzge <[email protected]>
solana-labs#27274)

* recovers merkle shreds from erasure codes (solana-labs#27136)

The commit
* Identifies Merkle shreds when recovering from erasure codes and
  dispatches specialized code to reconstruct shreds.
* Coding shred headers are added to recovered erasure shards.
* Merkle tree is reconstructed for the erasure batch and added to
  recovered shreds.
* The common signature (for the root of Merkle tree) is attached to all
  recovered shreds.

(cherry picked from commit c0b6335)

# Conflicts:
#	ledger/Cargo.toml

* removes mergify merge conflicts

Co-authored-by: behzad nouri <[email protected]>
…e usable (backport solana-labs#27237) (solana-labs#27280)

Standardize thread names

Tenets:
1. Limit thread names to 15 characters
2. Prefix all Solana-controlled threads with "sol"
3. Use Camel case. It's more character dense than Snake or Kebab case
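
As an illustration of these tenets, the sketch below checks a name before spawning a thread with it. The helper names and the example thread name are hypothetical. The camel-case check is only approximated by rejecting separators such as `_` and `-`. The 15-character limit matches the Linux thread-name limit of 16 bytes including the terminating NUL.

```rust
use std::io;
use std::thread::{self, JoinHandle};

/// Tenet 1: thread names are limited to 15 characters.
const MAX_THREAD_NAME_LEN: usize = 15;

/// Hypothetical validator for the three tenets. Camel case is approximated
/// by allowing only ASCII letters and digits (no `_` or `-`).
fn is_valid_thread_name(name: &str) -> bool {
    name.len() <= MAX_THREAD_NAME_LEN
        && name.starts_with("sol")
        && name.chars().all(|c| c.is_ascii_alphanumeric())
}

/// Spawns a thread after checking that its name follows the tenets.
fn spawn_named<F>(name: &str, f: F) -> io::Result<JoinHandle<()>>
where
    F: FnOnce() + Send + 'static,
{
    assert!(is_valid_thread_name(name), "bad thread name: {}", name);
    thread::Builder::new().name(name.to_string()).spawn(f)
}
```

For example, `spawn_named("solExampleWrk", || {})` passes the check, while `sol_example_worker` fails on both length and separators.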

(cherry picked from commit 3f4731b)

Co-authored-by: Michael Vines <[email protected]>
…na-labs#27266) (solana-labs#27281)

patches metrics for invalid cached vote/stake accounts (solana-labs#27266)

patches invalid cached vote/stake accounts metrics

The invalid cached vote accounts metric is overcounting actual mismatches,
and the invalid cached stake accounts metric is undercounting.

(cherry picked from commit 544a957)

Co-authored-by: behzad nouri <[email protected]>
…lana-labs#27264) (solana-labs#27291)

Update `solana deploy` subcommand to warn non-upgradable (solana-labs#27264)

Update subcommand text to warn deploy deprecated

Update the about text for `solana deploy` to warn this is only for non-upgradeable deploys. Fixes solana-labs#27228

(cherry picked from commit 65070df)

Co-authored-by: Chris Coudron <[email protected]>
…kport solana-labs#27286) (solana-labs#27292)

checks that cached vote accounts are consistent with accounts-db (solana-labs#27286)

The commit adds sanity checks that when loading a bank from snapshots:
* cached vote accounts are consistent with accounts-db.
* all valid vote-accounts referenced in stake delegations are already
  cached.
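
The two checks can be sketched with plain maps. Everything here is a simplified stand-in: `Pubkey` and `AccountData` are toy type aliases, `check_cached_vote_accounts` is a hypothetical function, and a delegation target counts as a "valid vote account" simply if it exists in the accounts-db map, which is an assumption of this sketch.

```rust
use std::collections::{HashMap, HashSet};

// Toy stand-ins for the real key and account types.
type Pubkey = u32;
type AccountData = Vec<u8>;

/// Hypothetical sanity check run after loading a bank from a snapshot.
fn check_cached_vote_accounts(
    cached: &HashMap<Pubkey, AccountData>,
    accounts_db: &HashMap<Pubkey, AccountData>,
    delegation_targets: &HashSet<Pubkey>,
) -> Result<(), String> {
    // Check 1: every cached vote account matches accounts-db.
    for (key, data) in cached {
        if accounts_db.get(key) != Some(data) {
            return Err(format!("cached vote account {} is inconsistent", key));
        }
    }
    // Check 2: every valid vote account referenced by a stake delegation
    // is already cached. "Valid" here just means present in accounts-db.
    for key in delegation_targets {
        if accounts_db.contains_key(key) && !cached.contains_key(key) {
            return Err(format!("vote account {} is not cached", key));
        }
    }
    Ok(())
}
```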

(cherry picked from commit 7fda028)

Co-authored-by: behzad nouri <[email protected]>
…a-labs#27118) (solana-labs#27270)

banking stage: actually aggregate tracer packet stats (solana-labs#27118)

* aggregated_tracer_packet_stats_option was always None

* Actually accumulate tracer packet stats

(cherry picked from commit eb06bb6)

Co-authored-by: apfitzge <[email protected]>
…ce::cleanup_ledger (backport solana-labs#26651) (solana-labs#27304)

Delete files older than the lowest_cleanup_slot in LedgerCleanupService::cleanup_ledger (solana-labs#26651)

#### Problem
LedgerCleanupService requires compactions to propagate & digest range-delete tombstones
to eventually reclaim disk space.

#### Summary of Changes
This PR makes LedgerCleanupService::cleanup_ledger delete any file whose slot-range is
older than the lowest_cleanup_slot.  This allows us to reclaim disk space more often with
fewer IOps.  Experimental results on mainnet validators show that the PR can reduce
ledger disk size by 33% to 40%.
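
The selection rule can be sketched as a filter over per-file slot ranges. This is a simplified model, not the blockstore code: `FileSlotRange` and `files_to_delete` are hypothetical, and treating "older than" as a strict comparison on the file's last slot is an assumption.

```rust
/// Hypothetical description of one storage file and the slots it covers.
#[derive(Debug, Clone, PartialEq)]
struct FileSlotRange {
    name: String,
    first_slot: u64,
    last_slot: u64,
}

/// Returns the files whose whole slot range lies below `lowest_cleanup_slot`.
/// Such files can be dropped outright instead of waiting for a compaction
/// to digest range-delete tombstones.
fn files_to_delete(files: &[FileSlotRange], lowest_cleanup_slot: u64) -> Vec<&FileSlotRange> {
    files
        .iter()
        .filter(|f| f.last_slot < lowest_cleanup_slot)
        .collect()
}
```

A file that straddles `lowest_cleanup_slot` is kept in this sketch, since it still holds slots that have not been cleaned up.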

(cherry picked from commit 99ef218)

Co-authored-by: Yueh-Hsuan Chiang <[email protected]>
…na-labs#27324)

Add documentation for JSON parsing (solana-labs#27268)

* Add documentation about json parsing

* Link jsonParsed to info section

* Include version information

(cherry picked from commit 322fbc1)

Co-authored-by: Tyera Eulberg <[email protected]>
…#26555)" (solana-labs#27327)

Revert "Patch crossbeam-epoch to avoid overhead (backport solana-labs#26555) (solana-labs#27197)"

This reverts commit e48d8a9.
…a-labs#27052)" (solana-labs#27328)

Revert "Fix windows build after crossbeam-epoch patch (backport solana-labs#27052) (solana-labs#27198)"

This reverts commit ed38458.
…7341)

Update config parsing doc (solana-labs#27340)

(cherry picked from commit deb13ab)

Co-authored-by: Tyera Eulberg <[email protected]>
…ana-labs#27212)

serialize incremental_snapshot_hash (solana-labs#26839)

* serialize incremental_snapshot_hash

* pr feedback

(cherry picked from commit 225cddc)

Co-authored-by: Jeff Washington (jwash) <[email protected]>
@mergify
Contributor

mergify bot commented Aug 23, 2022

⚠️ The sha of the head commit of this PR conflicts with #27345. Mergify cannot evaluate rules on this PR. ⚠️


4 participants