polygon/sync: message listener to preserve peer events ordering #10032

Merged
merged 1 commit into from
Apr 23, 2024

Conversation

taratorio
Member

@taratorio taratorio commented Apr 23, 2024

Observed the following issue in a long running Astrid sync on bor-mainnet:

[DBUG] [04-17|14:25:43.504] [p2p.peerEventObserver] received new peer event id=Disconnect peerId=51935aa1eeabdb73b70d36c7d5953a3bfdf5c84e88241c44a7d16d508b281d397bdd8504c934bfb45af146b86eb5899ccea85e590774f9823d056a424080b763
[DBUG] [04-17|14:25:43.504] [p2p.peerEventObserver] received new peer event id=Connect peerId=51935aa1eeabdb73b70d36c7d5953a3bfdf5c84e88241c44a7d16d508b281d397bdd8504c934bfb45af146b86eb5899ccea85e590774f9823d056a424080b763

Note the timestamps are the same on the millisecond level, however the disconnect was processed before the connect which is wrong (connect should always be first).

This then got the PeerTracker in a bad state - it kept on returning peer 51935aa1eeabdb73b70d36c7d5953a3bfdf5c84e88241c44a7d16d508b281d397bdd8504c934bfb45af146b86eb5899ccea85e590774f9823d056a424080b763 as a valid peer to download from, which caused repeated peer not found errors when sending messages to it.

Fix is to have the message listener wait for all observers to finish processing peer event 1 before proceeding to notifying them about peer event 2.
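
The fix described above can be illustrated with a minimal sketch. It is not the actual `polygon/p2p` code: `PeerEvent`, `Observer`, `notifyInOrder` and `trackedAfter` are hypothetical names, and the only point shown is the barrier between consecutive events.

```go
package main

import (
	"fmt"
	"sync"
)

// PeerEvent is a hypothetical, simplified stand-in for a p2p peer event.
type PeerEvent struct {
	ID     string // "Connect" or "Disconnect"
	PeerID string
}

// Observer is a hypothetical callback invoked for every peer event.
type Observer func(PeerEvent)

// notifyInOrder delivers events one at a time. Observers run concurrently
// for a single event, but the next event is not delivered until every
// observer has returned from the current one.
func notifyInOrder(events []PeerEvent, observers []Observer) {
	for _, ev := range events {
		var wg sync.WaitGroup
		for _, obs := range observers {
			wg.Add(1)
			go func(obs Observer, ev PeerEvent) {
				defer wg.Done()
				obs(ev)
			}(obs, ev)
		}
		wg.Wait() // barrier: event N is fully processed before event N+1
	}
}

// trackedAfter replays events through a toy peer tracker and returns the
// set of peers it still considers connected.
func trackedAfter(events []PeerEvent) map[string]bool {
	var mu sync.Mutex
	tracked := map[string]bool{}
	tracker := func(ev PeerEvent) {
		mu.Lock()
		defer mu.Unlock()
		switch ev.ID {
		case "Connect":
			tracked[ev.PeerID] = true
		case "Disconnect":
			delete(tracked, ev.PeerID)
		}
	}
	notifyInOrder(events, []Observer{tracker})
	return tracked
}

func main() {
	tracked := trackedAfter([]PeerEvent{
		{ID: "Connect", PeerID: "peer-1"},
		{ID: "Disconnect", PeerID: "peer-1"},
	})
	fmt.Println("peers still tracked:", len(tracked))
}
```

With the barrier in place, a Connect followed by a Disconnect for the same peer always leaves the tracker empty, regardless of how the observer goroutines are scheduled.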

@battlmonstr battlmonstr merged commit 190cbfa into devel Apr 23, 2024
6 checks passed
@battlmonstr battlmonstr deleted the astrid-peer-events-preserve-order branch April 23, 2024 15:49
hexoscott added a commit to 0xPolygonHermez/cdk-erigon that referenced this pull request Oct 9, 2024
* mdbx: `Batch()` (erigontech#9999)

This task is mostly implemented to be used in
`erigon/erigon-lib/downloader/mdbx_piece_completion.go` and maybe in
`nodesDB` (where we need many parallel RwTx)

I was against adding this "trick"/"api" in past years, because I thought
we could make our app more 1-big-rwtx-friendly. And we did it
in Erigon - StagedSync. TxPool did too, though with a less happy face
- by "map+mutex with periodic flush to db". But `anacrolix/torrent` is
an external library and is unlikely to survive such a big mental-model
change. Maybe it's time to add `db.Batch()`.

#### Batch Rw transactions

Each `DB.Update()` waits for disk to commit the writes. This overhead
can be minimized by combining multiple updates with the `DB.Batch()`
function:

```go
err := db.Batch(func(tx *bolt.Tx) error {
	...
	return nil
})
```

Concurrent Batch calls are opportunistically combined into larger
transactions. Batch is only useful when there are multiple goroutines
calling it.

The trade-off is that `Batch` can call the given
function multiple times, if parts of the transaction fail. The
function must be idempotent and side effects must take effect only
after a successful return from `DB.Batch()`.

For example: don't display messages from inside the function, instead
set variables in the enclosing scope:

```go
var id uint64
err := db.Batch(func(tx *bolt.Tx) error {
	// Find last key in bucket, decode as bigendian uint64, increment
	// by one, encode back to []byte, and add new key.
	...
	id = newValue
	return nil
})
if err != nil {
	return ...
}
fmt.Printf("Allocated ID %d\n", id)
```


---- 

Implementation mostly taken from
https://github.com/etcd-io/bbolt/?tab=readme-ov-file#batch-read-write-transactions

Maybe in the future it can be pushed down to
https://github.com/erigontech/mdbx-go

* downloader: rename TorrentFiles to AtomicTorrentFS (erigontech#10005)

* Caplin: indexing to use right buf size  (erigontech#9998)

- PutUvarint can produce 10 bytes
- re-using buffer - faster and less gc
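
A minimal sketch of the two points above, not the Caplin indexing code itself: `encodeUvarints` is a hypothetical helper. It sizes one scratch buffer with `binary.MaxVarintLen64` (10 bytes, the worst case for `PutUvarint`) and reuses it for every value instead of allocating per call.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeUvarints appends the uvarint encoding of every value to one output
// slice, reusing a single worst-case-sized scratch buffer.
func encodeUvarints(values []uint64) []byte {
	var scratch [binary.MaxVarintLen64]byte // 10 bytes: enough for any uint64
	out := make([]byte, 0, len(values)*binary.MaxVarintLen64)
	for _, v := range values {
		n := binary.PutUvarint(scratch[:], v)
		out = append(out, scratch[:n]...)
	}
	return out
}

func main() {
	enc := encodeUvarints([]uint64{1, 300, ^uint64(0)})
	fmt.Println("encoded bytes:", len(enc))
}
```

A smaller scratch buffer (for example 8 or 9 bytes) would make `PutUvarint` panic on large values, which is why the worst-case size matters.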

* First round of fixes in making gossip publishing good for the validator: See comment (erigontech#9972)

* Fixed and simplified unaggregated bits check.
* There are 2 bits on, one for the attester and one for the
End-of-bitlist, needed to account for end of bitlist bit
* Wrong publishing topic for sync_committee_ messages
* Added more Ignore by receiving specific errors to avoid forwarding
useless data.
* Replaced `validateAttestation` with full message processing
* Fixed forwarding of sync committee aggregates
* Fixed subnet announcements

---------

Co-authored-by: kewei <[email protected]>

* Downloader: atomic-fs to be less smart. if app called - Create() - don't check .lock. Otherwise can't create .torrent for existing .seg files. (erigontech#10004)

* Implement the optional output field on ots_traceTransaction (erigontech#10014)

This is for E2.

It implements the backward compatible output field for traces on
ots_traceTransaction: otterscan/execution-apis#1

It'll be consumed by Otterscan in an upcoming release of this feature:
otterscan/otterscan#1530

* polygon/sync: Clean shutdown (erigontech#10017)

* re-gen mock files (erigontech#10007)

there was error:
```
prog.go:12:2: missing go.sum entry for module providing package github.com/golang/mock/mockgen/model; to add:
	go mod download github.com/golang/mock
```

* rename aggv3 to agg (erigontech#10011)

* chain-config: capital IsOsaka (erigontech#9989)

To follow suit with the rest of the naming

* move more services out from ForkchoiceStore (erigontech#9981)

- voluntary_exit
- bls_to_execution_change
- proposer_slashing
- expirable lru

---------

Co-authored-by: Giulio <[email protected]>

* WP - dvovk/diagnostics downloader print (erigontech#10000)

Added a command which prints diagnostics data to the console. In this
initial version it is possible to print the stages list and snapshot
download progress. Erigon should be running with the --metrics flag.
There are two available commands:
- "downloader"
- "stages" "current"
There are two possible options for output: text and json
Run command - ./build/bin/diag [command] [text | json]

---------

Co-authored-by: Mark Holt <[email protected]>

* move `temporal` package to erigon-lib (erigontech#10015)

Co-authored-by: awskii <[email protected]>

* downloader: more durable db mode  (erigontech#10010)

* Added body close on retry for downloader round trip (erigontech#10008)

Add missing body close method when webseed roundtrip is retried
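
A sketch of the pattern, under assumptions: `retryTransport` is a hypothetical retrying `http.RoundTripper`, not the downloader's webseed transport, and `flakyTransport` and `countingBody` are test doubles. The point shown is that a response being discarded before a retry has its body drained and closed.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// retryTransport is a hypothetical RoundTripper that retries 5xx responses.
type retryTransport struct {
	next     http.RoundTripper
	attempts int
}

func (t *retryTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	for attempt := 1; ; attempt++ {
		resp, err := t.next.RoundTrip(req)
		last := attempt >= t.attempts
		if err != nil {
			if last {
				return nil, err
			}
			continue
		}
		if resp.StatusCode < 500 || last {
			return resp, nil // the caller owns and closes this body
		}
		// This response is being discarded: drain and close its body so the
		// underlying connection can be reused instead of leaking.
		_, _ = io.Copy(io.Discard, resp.Body)
		_ = resp.Body.Close()
	}
}

// countingBody records how many times Close was called.
type countingBody struct {
	io.Reader
	closed *int
}

func (b countingBody) Close() error { *b.closed++; return nil }

// flakyTransport is a test double that fails the first `failures` calls.
type flakyTransport struct {
	failures, calls, closed int
}

func (f *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	f.calls++
	status := http.StatusOK
	if f.calls <= f.failures {
		status = http.StatusServiceUnavailable
	}
	body := countingBody{Reader: strings.NewReader("payload"), closed: &f.closed}
	return &http.Response{StatusCode: status, Body: body, Request: req}, nil
}

// demo runs one request and reports the final status, the number of calls
// made, and the number of bodies closed by the retry loop itself.
func demo(failures, attempts int) (status, calls, closed int, err error) {
	flaky := &flakyTransport{failures: failures}
	rt := &retryTransport{next: flaky, attempts: attempts}
	req, err := http.NewRequest(http.MethodGet, "http://example.invalid/", nil)
	if err != nil {
		return 0, 0, 0, err
	}
	resp, err := rt.RoundTrip(req)
	if err != nil {
		return 0, flaky.calls, flaky.closed, err
	}
	status, calls, closed = resp.StatusCode, flaky.calls, flaky.closed
	_ = resp.Body.Close() // the final body is the caller's to close
	return status, calls, closed, nil
}

func main() {
	fmt.Println(demo(2, 5))
}
```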

* Set block baseFeePerGas value in graphql response (erigontech#9974)

Set baseFeePerGas value in graphql resolver for block

* vm: Rename stateTransition gas to gasRemaining (erigontech#10025)

The `StateTransition` property `gas` actually tracks the remaining gas
in the current context. This PR is to improve code readability.
Geth also uses similar naming.

* chore: fix function names in comment (erigontech#9987)

Signed-off-by: fuyangpengqi <[email protected]>

* sonar: add test coverage (erigontech#9988)

- attempt to integrate sonar with test coverage by following 
-
https://sonarcloud.io/project/configuration/GitHubActions?id=ledgerwatch_erigon
-
https://docs.sonarsource.com/sonarcloud/advanced-setup/ci-based-analysis/github-actions-for-sonarcloud/
- adds sonar properties file to specify code coverage output
- also properties file can be used to filter out generated code from
sonar scan
    - protobuf
    - graphql
    - ignore pedersen hash bindings code
- ... there will be more ignores coming in later PRs (e.g. some c/c++
code we dont need to scan, some js code, some contract gen code, etc.)

* sonar: disable c/c++ scanning (erigontech#10033)

Fixes error in Sonar GitHub action:
<img width="1645" alt="Screenshot 2024-04-23 at 17 46 01"
src="https://github.com/ledgerwatch/erigon/assets/94537774/3833db1c-6a8a-4db2-8bb7-5de58b57e638">

* Caplin: Added `SyncAggregate` computation to block production (erigontech#10009)

This PR adds computation of the `SyncAggregate` field in block
production. Proof of the code working:
https://sepolia.beaconcha.in/slot/4832922 - Caplin validators can now
include sync aggregates in their blocks.

Things modified:
* We do not aggregate pre-aggregated `SyncContributionAndProof`s,
instead we just listen to the network and pick the most profitable ones
for each sub sync committee (4 sync subcommittees on mainnet).
Profitability == most bits set in `AggregationBits` field.
* Separate aggregates set for contribution to be included in a block
from the ones constructed from `SyncCommitteeMessage`s, combining the
two causes some contributions to be marked as invalid and not
aggregable.
* Remove SyncContributionMock in favor of gomock
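
The "most profitable contribution per subcommittee" rule from the list above could look roughly like this. It is a simplified sketch: `contribution` is a hypothetical struct holding only the two fields needed here, not Caplin's SSZ type.

```go
package main

import (
	"fmt"
	"math/bits"
)

// contribution is a hypothetical, reduced view of a sync committee
// contribution: only the fields needed for ranking.
type contribution struct {
	SubcommitteeIndex uint64
	AggregationBits   []byte
}

// setBits counts the participation bits that are set.
func setBits(b []byte) int {
	n := 0
	for _, x := range b {
		n += bits.OnesCount8(x)
	}
	return n
}

// pickMostProfitable keeps, for each subcommittee, the contribution with
// the most bits set. No merging of contributions is attempted.
func pickMostProfitable(pool []contribution) map[uint64]contribution {
	best := make(map[uint64]contribution)
	for _, c := range pool {
		cur, ok := best[c.SubcommitteeIndex]
		if !ok || setBits(c.AggregationBits) > setBits(cur.AggregationBits) {
			best[c.SubcommitteeIndex] = c
		}
	}
	return best
}

func main() {
	best := pickMostProfitable([]contribution{
		{SubcommitteeIndex: 0, AggregationBits: []byte{0x01}},
		{SubcommitteeIndex: 0, AggregationBits: []byte{0x0f}},
		{SubcommitteeIndex: 1, AggregationBits: []byte{0xff}},
	})
	fmt.Println(len(best), setBits(best[0].AggregationBits))
}
```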

* polygon/sync: message listener to preserve peer events ordering (erigontech#10032)

Observed the following issue in a long running Astrid sync on
bor-mainnet:
```
[DBUG] [04-17|14:25:43.504] [p2p.peerEventObserver] received new peer event id=Disconnect peerId=51935aa1eeabdb73b70d36c7d5953a3bfdf5c84e88241c44a7d16d508b281d397bdd8504c934bfb45af146b86eb5899ccea85e590774f9823d056a424080b763
[DBUG] [04-17|14:25:43.504] [p2p.peerEventObserver] received new peer event id=Connect peerId=51935aa1eeabdb73b70d36c7d5953a3bfdf5c84e88241c44a7d16d508b281d397bdd8504c934bfb45af146b86eb5899ccea85e590774f9823d056a424080b763
```

Note the timestamps are the same on the millisecond level, however the
disconnect was processed before the connect which is wrong (connect
should always be first).

This then got the `PeerTracker` in a bad state - it kept on returning
peer
`51935aa1eeabdb73b70d36c7d5953a3bfdf5c84e88241c44a7d16d508b281d397bdd8504c934bfb45af146b86eb5899ccea85e590774f9823d056a424080b763`
as a valid peer to download from, which caused repeated `peer not found`
errors when sending messages to it.

Fix is to have the message listener wait for all observers to finish
processing peer event 1 before proceeding to notifying them about peer
event 2.

* check attestation signature (erigontech#10018)

* sonar: fix warnings (erigontech#10034)

Fixes Sonar warnings:
<img width="550" alt="Screenshot 2024-04-23 at 19 37 53"
src="https://github.com/ledgerwatch/erigon/assets/94537774/b85c9607-3800-408d-8a1b-c5bf80da38b2">

* sonar: fix js warnings and exclude mocks (erigontech#10042)

- Excludes go mock generated files from analysis
- Excludes broken js files (valid as they are used for tracers and test
data) to fix below warnings
<img width="1658" alt="Screenshot 2024-04-24 at 11 12 04"
src="https://github.com/ledgerwatch/erigon/assets/94537774/7925d07f-37f3-43c9-b34a-9a5361e48a8a">

* tests: Support iterations in Heimdall simulator (erigontech#10040)

Accept a slice of block numbers that represents the final block number
that will be available to the client of the simulator. Any data after the
iteration stage end is not accessible to the client.

The iteration moves to the next stage under certain conditions:
- requesting the latest span via `FetchSpan`
- requesting state sync events beyond current last iteration block's
timestamp

* Fix forward bor snaps (erigontech#10027)

This fixes this issue:

erigontech#9499

which is caused by restarting erigon during the bor-heimdall stage.

Previously, after the initial call to bor-heimdall (header 0), forward
downloading was disabled, but backward downloading recursively collects
headers, holding results in memory until it can roll them forward. This
should only be called for a limited number of headers, otherwise it
leads to a large memory footprint (>45GB for bor mainnet) if the process
is stopped at block 1.

* Added downloader request count (erigontech#10036)

The downloader is not complete until all of its requested files have
been downloaded.

This change adds a request count to the downloader stats to be checked
for completeness, otherwise the downloader
may appear complete before all required torrents have been added.

* StageSenders: `--sync.loop.block.limit` support (erigontech#9982)

We reverted support of this flag in `updateForkChoice` because
implementation was too complex and fragile:
erigontech#9900

But it's good-enough if StageSenders will preserve this flag - then next
stages (exec) will also follow (because they look at prev stage
progress).

It's good-enough - because users just want to save some partial progress
after restoring node from backup (long downtime). And enforce "all
stages progress together" invariant

* chore:fix typo (erigontech#9952)

* Optimize prune old chunks (erigontech#10019)

**Summary**
Fixes prune point for log (+index)
- Unnecessary to use ETL again for deleting `kv.Log` entries, can just
introduce `RwCursor` in the initial loop
- Put the last `pruneTo` block number in the `PruneState` - this will
begin pruning from that point. Earlier the `pruneFrom` point being
passed in was buggy as it used some other assumption for this value

* [ots] Fix block rewards calculation on post-merge blocks (erigontech#10038)

This is for E2.

The block rewards returned by the Otterscan API have been incorrect
since the merge. This PR replaces very old code with the same
calculation used for trace_block.

This code probably won't work with Aura consensus, but that's ok since
the current one doesn't work either. It would actually require exposing
more code from block execution and I don't want to handle it for now,
let's fix only the post-merge calc for now.

Co-authored-by: sealer3 <[email protected]>

* sonar: use fixed version for sonarcloud-github-action (erigontech#10046)

* standardize mock file name (erigontech#10043)

* chore: remove repetitive words (erigontech#10044)

* mdbx, erigon backup: fix typo (erigontech#10031)

* Build Silkworm RpcDaemon settings from Erigon ones (erigontech#10002)

This PR introduces support for customising Silkworm RpcDaemon settings
in Erigon++.

Common RPC settings between Erigon and Silkworm are simply translated
from the existing Erigon command-line options. They include:
- `--http.addr`
- `--http.port`
- `--http.compression`
- `--http.corsdomain`
- `--http.api`
- `--ws`
- `--ws.compression`

Moreover, the following Silkworm-specific command-line options are
added:
- `--silkworm.verbosity`
- `--silkworm.contexts`
- `--silkworm.rpc.log`
- `--silkworm.rpc.log.maxsize`
- `--silkworm.rpc.log.maxfiles`
- `--silkworm.rpc.log.response`
- `--silkworm.rpc.workers`
- `--silkworm.rpc.compatibility`

Default values cover the common usages of Erigon++ experimental
features, yet such options can be useful for testing some corner cases
or collecting information.

Finally, this PR adds a new `LogDirPath` function to `logging` module
just to determine the log dir path used by Erigon and put there also
Silkworm RPC interface logs, when enabled.

* Optimized attestation processing (erigontech#10020)

* Decrease memory footprint on chain tip
* Fix a race
* Better times on `Attestation` processing. 1 sec -> 54 ms

* Revert "Fix new_heads Events Emission on Block Forks (erigontech#9738)" (erigontech#10055)

This reverts commit f4aefdc.

See PR erigontech#9738

* chore: fix comments (erigontech#9958)

Fix some comments

* Revert "Added downloader request count" (erigontech#10053)

Reverts erigontech#10036

* drop go 1.20 support (erigontech#10052)

drop go 1.20 support 

use ` github.com/erigontech/torrent v1.54.2-alpha` - to simplify future
support and features backport

* cmd/integration: print_table_sizes (erigontech#10061)

* Revert "StageSenders: `--sync.loop.block.limit` support" (erigontech#10060)

Reverts erigontech#9982

* downloader: remove deprecated manual fsync  (erigontech#10064)

After switching to more durable db mode
erigontech#10010 - we don't need manual
fsync anymore.

* cmd/integration: import erigon-lib/kv to execute init func (erigontech#10065)

* Caplin: fixed attestation broadcasting (erigontech#10041)

This PR fixes 2 things:
* Superset handling (should ignore)
* SSZ offset not set for custom ssz in attestation encoding after json
unmarshalling

* feat: add `fullTx` params to `NewPendingTransactions` (erigontech#9204)

feat: add `fullTx` params to `NewPendingTransactions`

Closes erigontech#9203

* backward compatibility of .lock (erigontech#10006)

In PR: 
- new .lock format introduced by
erigontech#9766 is not backward
compatible. In the past “empty .lock” did mean “all prohibited” and it
was changed to “all allowed”.
- commit

Not in PR: I have an idea to make .lock also forward compatible - by
making it a whitelist instead of a blacklist: after adding a new snap
type it will not be downloaded by accident. Will do it in the next PR.

But I need one more confirmation - why do we need exceptions from .lock?
Why are we breaking the "download once" invariant for some types of
files? Can we avoid it?

* Make logs subscription channel size configurable (erigontech#9810)

This PR makes the channel that is used to send logs to subscriptions
configurable so logs are not dropped when the channel gets filled. See
issue 9699.
This is just an initial version since I wanted to gather some feedback
and was unsure if this is the correct approach to solve this.
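
To illustrate why the buffer size matters, here is a toy sketch. It assumes that a full channel causes entries to be dropped through a non-blocking send, and `logSub` and `droppedAfterBurst` are hypothetical names, not the RPC subscription code.

```go
package main

import "fmt"

// logSub is a hypothetical subscription with a configurable buffer size.
type logSub struct {
	ch      chan string
	dropped int
}

func newLogSub(size int) *logSub {
	return &logSub{ch: make(chan string, size)}
}

// publish never blocks: if the buffer is full the entry is dropped.
func (s *logSub) publish(entry string) bool {
	select {
	case s.ch <- entry:
		return true
	default:
		s.dropped++
		return false
	}
}

// droppedAfterBurst sends `burst` entries with no consumer attached and
// reports how many did not fit into a buffer of the given size.
func droppedAfterBurst(size, burst int) int {
	sub := newLogSub(size)
	for i := 0; i < burst; i++ {
		sub.publish(fmt.Sprintf("log-%d", i))
	}
	return sub.dropped
}

func main() {
	fmt.Println(droppedAfterBurst(2, 5), droppedAfterBurst(8, 5))
}
```

A larger buffer only absorbs bursts. A consumer that is permanently slower than the producer will still lose entries.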

* cmd/integration: print table sizes to filter deprecated tables (erigontech#10066)

* [ots] Fix incorrect return type and overflow on total block fees calc (erigontech#10070)

For E2: fix incorrect type + overflow in certain blocks

Corresponding otterscan issue:
otterscan/otterscan#1658

* RPC: `--http.dbg.single=true` and custom HTTP header `dbg: true` (erigontech#10039)

- Added method `tx.Context()` - because Tx is already bound to a context
by `db.BeginRo(ctx)`
- Removed ctx parameter from `BlockWithSenders` method in interfaces
- Added `dbg.ToContext()` and `dbg.Enabled(ctx)` methods to set/get
debugging tag to `ctx`.

Added a way to debug a single HTTP request:
To print more detailed logs for 1 request - add the
`--http.dbg.single=true` flag. Then send the HTTP header `"dbg: true"`:

```
curl -X POST -H "dbg: true" -H "Content-Type: application/json" --data '{"jsonrpc": "2.0", "method": "eth_blockNumber", "params": [], "id":1}' localhost:8545
```
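
How a header like this might be carried through a request context can be sketched as follows. The signatures of the real `dbg.ToContext()` and `dbg.Enabled(ctx)` are not shown in this PR text, so `toContext`, `enabled` and `withDebugHeader` below are hypothetical stand-ins.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// dbgKey is an unexported context key type, so no other package can collide.
type dbgKey struct{}

// toContext tags ctx with a debugging flag (hypothetical helper).
func toContext(ctx context.Context, on bool) context.Context {
	return context.WithValue(ctx, dbgKey{}, on)
}

// enabled reports whether the debugging tag is set on ctx.
func enabled(ctx context.Context) bool {
	on, _ := ctx.Value(dbgKey{}).(bool)
	return on
}

// withDebugHeader tags the request context when the "dbg: true" header is sent.
func withDebugHeader(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.Header.Get("dbg") == "true" {
			r = r.WithContext(toContext(r.Context(), true))
		}
		next.ServeHTTP(w, r)
	})
}

// demoDebugTag sends one in-memory request and reports what the inner
// handler saw in its context.
func demoDebugTag(header string) bool {
	seen := false
	h := withDebugHeader(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen = enabled(r.Context())
	}))
	req := httptest.NewRequest(http.MethodPost, "/", nil)
	if header != "" {
		req.Header.Set("dbg", header)
	}
	h.ServeHTTP(httptest.NewRecorder(), req)
	return seen
}

func main() {
	fmt.Println(demoDebugTag("true"), demoDebugTag(""))
}
```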

---------

Co-authored-by: battlmonstr <[email protected]>

* all: use the built-in slices library (erigontech#9842)

In the current go 1.21 version used in the project, slices are no longer
an experimental feature and have entered the standard library
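
For illustration, a small example using only the standard `slices` package (Go 1.21+), with no `golang.org/x/exp/slices` import. `sortedUnique` is a hypothetical helper, not code from this PR.

```go
package main

import (
	"fmt"
	"slices"
)

// sortedUnique returns a sorted copy of in with duplicates removed.
func sortedUnique(in []uint64) []uint64 {
	out := slices.Clone(in) // do not mutate the caller's slice
	slices.Sort(out)
	return slices.Compact(out) // drops adjacent duplicates
}

func main() {
	blocks := sortedUnique([]uint64{30, 10, 30, 20})
	fmt.Println(blocks, slices.Contains(blocks, 20))
}
```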

Co-authored-by: alex.sharov <[email protected]>

* chore(config): json marshal chainName (erigontech#9865)

As the other fields are JSON-marshaled in lowerCamelCase, we should
use the same style.

---------

Signed-off-by: jsvisa <[email protected]>

* Fix new_heads Events Emission on Block Forks (erigontech#10072)

TL;DR: on a reorg, the common ancestor block is not being published to
subscribers of newHeads

#### Expected behavior

if the reorg's common ancestor is 2, I expect 2 to be republished

1, 2, **2**, **3**, **4**

#### Actual behavior

2 is not republished, and 3's parentHash points to a 2 header that was
never received

1, 2, **3**, **4**

This PR is the same thing as
erigontech#9738 except with a test.

Note... the test passes, but **this does not actually work in
production** (for Ethereum mainnet with prysm as external CL).

Why? Because in production, `h.sync.PrevUnwindPoint()` is always nil:
https://github.com/ledgerwatch/erigon/blob/a5270bccf5e69a6beaaab9a0663bdad80e989505/turbo/stages/stageloop.go#L291
which means the initial "if block" is never entered, and thus we have
**no control** of increment/decrement `notifyFrom` during reorgs
https://github.com/ledgerwatch/erigon/blob/a5270bccf5e69a6beaaab9a0663bdad80e989505/eth/stagedsync/stage_finish.go#L137-L146

I don't know why `h.sync.PrevUnwindPoint()` is seemingly always nil, or
how the test can pass if it fails in prod. I'm hoping to pass the baton
to someone who might. Thank you @indanielo for original fix.

If we can figure this bug out, it closes erigontech#8848 and closes erigontech#9568 and
closes erigontech#10056

---------

Co-authored-by: Daniel Gimenez <[email protected]>

* chore: remove repetitive words with tools (erigontech#10076)

use https://github.com/Abirdcfly/dupword to check repetitive words

* grafana: configurable datasource (erigontech#10073)

* Revert "Fix new_heads Events Emission on Block Forks" (erigontech#10081)

Reverts erigontech#10072

* AggregateAndProof put aggregated data into attestationsPool (erigontech#10079)

* downloader: docs on MMAP for data-files r/w and experiments with bufio (erigontech#10074)

Pros:
- it allows to not pre-alloc files:
erigontech#8688
- it allows to not "sig-bus" when no space left on disk (return
user-friendly error). see:
erigontech#8500 - but DB will be MMAP
anyway and may get "sig-bus"

FYI:
- seems no perf difference (but i tested only on cloud drives)
- erigon will anyway open it as mmap 
 
Cons: 
- I implemented `fsync` for mmap (
anacrolix/torrent#755 ) - will probably need to
implement it for bufio: anacrolix/torrent#937
- no zero-copy: more `alloc` memory will be held by the app (PageCache
starvation). I see 2x mem usage (at `--torrent.download.slots=500` 20gb
vs 40gb)
- I see the "10K threads exhausted" error earlier (on
`--torrent.download.slots=500`).
- what else?

* polygon/p2p: Add blk/s and bytes/s to periodic log (erigontech#9976)

* wrong ttl value initialization in expirable lru cache (erigontech#10090)

fix issue erigontech#10089

* Fetch and skip sync events (erigontech#10051)

For periods where there are not many sync events (mostly testnets), sync
event fetching can be slow because sync events are fetched at the end of
every sprint.

Fetching the next event and looking at its block number optimizes this
because fetches can be skipped until the next known block with sync
events.

* EIP-2537 (BLS12-381): use gnark instead of kilic (erigontech#10082)

Cherry pick ethereum/go-ethereum#29441

---------

Co-authored-by: Marius van der Wijden <[email protected]>
Co-authored-by: Martin Holst Swende <[email protected]>

* abi: fix abigen issue with make devtools (erigontech#10091)

fixes erigontech#7593 

it introduced a regression: `"fmt"` and `"reflect"` imports were added
for all files generated by `abigen` assuming that they will be used in
all cases, however that assumption was wrong for some cases resulting in
invalid code being generated (in this case after running `make
devtools`):
<img width="982" alt="Screenshot 2024-04-27 at 10 50 37"
src="https://github.com/ledgerwatch/erigon/assets/94537774/9a1b93a5-2141-40d9-8c9e-01a1ff6c031c">

* Caplin: Inclusion of `VoluntaryExits`, `AttesterSlashing`s, `ProposerSlashing`s, `BlsExecutionToChange`s and `Attestation`s into block production (erigontech#10071)

This PR adds operations inclusion.
## Normal operations

* BlsExecutionChange
* VoluntaryExit
* Slashings

Each of these operations blacklists the index it works on, so we do not
have repeating indices for the same operation. We assume all
signatures are pre-validated and just see if it is a good time to
produce a block with them (by looking at their slot).
## Aggregated Attestations

There are a lot of trash attestations on the network, so we separate our
algorithm into 3 steps:

### Eligibility

We iterate over the entire pool of accumulated attestations, filter
out all attestations that cannot be included at the current slot, and
compute the expected reward of the rest (filtering out any with reward 0).

### Ranking

We rank the `Attestation`s by their expected reward (we just sort the
array of candidates by expected reward in ascending order).

### Filtering by superset

We may have some supersets left over, so we filter out attestations that
end up being supersets of others. This process is done from highest
reward down to lowest reward.

* mdbx: Return err early in iter.Next() (erigontech#10078)

`HasNext` will return true even with existing error and the application
will expect a next entry. The `Next` function can get into an internal
error (such as a `panic()`) while fetching the next cursor item and thus
fail to return the error.
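
A minimal sketch of the early-return idea, assuming a sticky error field. `keyIter` is a hypothetical toy iterator, not the mdbx cursor iterator.

```go
package main

import (
	"errors"
	"fmt"
)

// errCursor simulates a failure recorded by an earlier cursor operation.
var errCursor = errors.New("cursor failed")

// keyIter is a hypothetical, simplified cursor-backed iterator.
type keyIter struct {
	keys []string
	pos  int
	err  error // sticky error
}

// HasNext reports true when an error is pending, so that the caller goes
// on to call Next and observes the error.
func (it *keyIter) HasNext() bool {
	return it.err != nil || it.pos < len(it.keys)
}

// Next returns the pending error before touching the underlying data.
// Without this early return, the index below could panic.
func (it *keyIter) Next() (string, error) {
	if it.err != nil {
		return "", it.err
	}
	k := it.keys[it.pos]
	it.pos++
	return k, nil
}

// drain collects all keys, or stops at the first error.
func drain(it *keyIter) ([]string, error) {
	var out []string
	for it.HasNext() {
		k, err := it.Next()
		if err != nil {
			return out, err
		}
		out = append(out, k)
	}
	return out, nil
}

func main() {
	_, err := drain(&keyIter{err: errCursor})
	fmt.Println("error surfaced:", err != nil)
}
```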

---------

Co-authored-by: alex.sharov <[email protected]>

* make: mocks using mockgen (erigontech#10098)

- replaces usages of `moq` in `erigon-lib` with `mockgen` (gomock)
- adds a `make mocks` and `mocks-clean` command for `erigon`
- updates existing `make mocks` command and adds a `mocks-clean` command
for `erigon-lib`

* mockgen: use typed mocks for compile time check (erigontech#10103)

Use `mockgen -typed=true` to generate mocks with type-safe `Return`,
`Do`, `DoAndReturn` function -
https://github.com/uber-go/mock?tab=readme-ov-file#flags

* make: add gen commands (erigontech#10106)

adds:
- `make gen`
- `make solc`
- `make abigen`
- `make codecgen`
- `make gencodec`
- `make graphql`

tidies up `make devtools`

* added print DBs table sizes (erigontech#10111)

Added a command to print basic info about database tables. There are two
options:
- print all info: ./build/bin/diag dbs all
- print only populated tables and dbs: ./build/bin/diag dbs pop   

Here is example output:
![Screenshot 2024-04-28 at 21 38
18](https://github.com/ledgerwatch/erigon/assets/29065143/f0a04931-8d87-4c45-b71a-71d75404f3fc)


@taratorio if you want I can add flag which will print specific DB.

* nodedb: UpdateNode method to create 1 rwtx instead of 2 (erigontech#10109)

* Caplin: tweaks to make staking more stable. (erigontech#10097)

Tweaks I did:
1) Decreased attestation expiry down to 30 minutes
2) Removed slot check in committeeSubAggregation
3) More reliable algorithm for the dependent root

Results:
* Better aggregates
* Less strain on the node
* No blocks/attestations missed

* mdbx: pre-open read pagesize from db (erigontech#10113)

Problem: if the --pageSize parameter is not set, we use the
`default pagesize` instead of the `real pagesize of db`. This causes a
different `dirtySpace` size (because it's accounted in "pages")

* RPC: Receipts LRU cache (erigontech#10112)

for erigontech#10099
for things like `eth_getTransactionReceipt`,
`ots_searchTransactionsAfter`, etc...

Also moved:
- moved `api.chainConfig()` inside `api.getReceipts()`
- switched `ots` to use blocks/receipts lru
- switched price oracle to use blocks/receipts

* use sonar for code coverage badge (erigontech#10107)

- use sonar badge for code coverage
- remove unnecessary "Coverage" GitHub action and unnecessary duplicate
test run on "devel" CI for it
- the existing coverage job + badge didn't seem to be accurate (wasn't
taking into account `erigon-lib` sub-module)

<img width="982" alt="Screenshot 2024-04-29 at 12 06 46"
src="https://github.com/ledgerwatch/erigon/assets/94537774/e47367ed-340d-42b5-ad00-2f59edce100c">

* dvovk/limit mem usage (erigontech#10069)

Implemented a limit on the number of peers kept in an Erigon node's
memory, so that diagnostics data collection can be turned on by default.

* chore: fix some function names (erigontech#10117)

Signed-off-by: luchenhan <[email protected]>

* Revert "backward compatibility of .lock" and Backward compatibility by Giulio (erigontech#10077)

Reverts erigontech#10006 and add a proper migration routine

* dvovk/enable_dignostic (erigontech#10083)

Enabled diagnostics by default to collect data. This allows connecting
to a node and getting the stored data. It includes three new flags:
- "diagnostics.disabled" - it's set to "false" by default. Set to "true"
if you want to disable diagnostics.
- "diagnostics.endpoint.addr" - address of HTTP endpoint to get
diagnostics data
- "diagnostics.endpoint.port" - port of HTTP endpoint to get diagnostics
data

[DO NOT MERGE] as it depends on:
- erigontech#10069
- update support command
- update diagnostics UI

* Revert "mdbx: pre-open read pagesize from db" (erigontech#10125)

Reverts erigontech#10113

* Bor waypoint storage (erigontech#9793)

Implementation of db and snapshot storage for additional synced Heimdall
waypoint types

* Checkpoint
* Milestones

This is targeted at the Astrid downloader which uses waypoints to verify
headers during syncing and fork choice selection.

Post milestones for heimdall these types are currently downloaded by
erigon but not persisted locally. This change adds persistence for these
types.

In addition to the pure persistence changes, this PR also contains a
refactor step which is part of the process of extracting polygon related
types from erigon core into a separate package which may eventually be
extracted to a separate module and possibly repo.

The aim is that, rather than the core `turbo/snapshotsync/freezeblocks`
having to know about the types it manages and how to extract and index
their contents, it can concern itself with a set of macro shard
management actions.

This process is partially completed by this PR. A final step will be to
remove BorSnapshots and to simplify the places in the code which have to
remember to deal with them. This requires further testing so has been
left out of this PR to avoid delays in delivering the base types.

# Status

* Waypoint types and storage are complete and integrated into the
BorHeimdall stage. The code has been tested to check that types are
inserted into mdbx, extracted and merged correctly
* I have verified that when produced from block 0 the new snapshots
correctly follow the merging strategy of existing snapshots
* The functionality is enabled by **--bor.waypoints=true**; this is
false by default.

# Testing

This has been tested as follows:

* Run a Mumbai instance to the tip and check current processing for
milestones and checkpoints

# Post merge steps

* Produce and release snapshots for mumbai and bor mainnet
* Check existing node upgrades
* Remove --bor.waypoints flags

* Replace snaptype.AllTypes with local definitions (erigontech#10132)

When adding bor waypoint types I removed snaptype.AllTypes because
it causes package cross-dependencies.

This fixes the places where all types were used after the merge
changes.

* Caplin: process new attesting indices before block comes in to avoid occasional Reorg (erigontech#10085)

* qa-tests: small improvements (erigontech#10127)

This PR
- avoids installing Golang on every test run
- cleans up the testbed datadir at the end of the test

* fix some flags parsing  (erigontech#10134)

* align deps of e35 and devel (erigontech#10136)

- upgrade docker 
- remove tendermint

* core/types: disable go:generate codecgen for receipts and logs (erigontech#10105)

running `go generate ./...` fails with:
```
codecgen error: error running 'go run codecgen-main-2.generated.go': exit status 1, console: panic: encoding alphabet includes duplicate symbols

goroutine 1 [running]:
encoding/base64.NewEncoding(...)
    /usr/local/go/src/encoding/base64/base64.go:82
github.com/ugorji/go/codec.init()
    /Users/milen/go/pkg/mod/github.com/ugorji/go/[email protected]/gen.go:168 +0xf1c
exit status 2
```

this is a problem when using go1.22 and it has been fixed here:
-
ugorji/go@8286c2d
- issue: ugorji/go#407

* fix concurrent rw on map in operation_pool (erigontech#10140)

relates to erigontech#10139

* Refactored types to force runtime registrations to be type dependent (erigontech#10147)

This resolves erigontech#10135

All enums are constrained by their owning type which forces package
inclusion and hence type registration.

Added tests for each type to check the construction cycle.

* protection from starting e2 git branch on e3 db (erigontech#10150)

* Set existing torrent webseeds after download (erigontech#10149)

Fix a timing hole where torrents that get created before webseeds have
been downloaded don't get webseeds set.

* eth, txpool: enforce 30gwei for gas related configs for polygon (erigontech#10158)

Cherry-pick PR erigontech#10119 into the release

Co-authored-by: Marcello Ardizzone <[email protected]>

* make: fix gen issue with mockgen not found in PATH (erigontech#10162) (erigontech#10166)

Fixes
erigontech#10157 (comment)

Problem was:
```
grep -r -l --exclude-dir=erigon-lib "^// Code generated by MockGen. DO NOT EDIT.$" . | xargs rm -r
```

was deleting the `mockgen` binary after it was built 🙃

* abigen: fix duplicate struct definitions (erigontech#10157) (erigontech#10164)

fixes a 2nd regression introduced by erigontech#7593:

- it generates duplicate struct types in the same package (check
screenshot below)
- also found a better way to fix the first regression with unused
imports (improvement over erigontech#10091)

<img width="1438" alt="Screenshot 2024-04-30 at 17 30 42"
src="https://github.com/ledgerwatch/erigon/assets/94537774/154d484b-4b67-4104-8a6e-eac2423e1c0e">

* dvovk/pprof fix (erigontech#10155) (erigontech#10178)

Cherry pick PR erigontech#10155 into the release

Co-authored-by: Dmytro <[email protected]>

* Engine API: NewPayload fails with a "context canceled" error in Current/GetHeader (erigontech#9786) (erigontech#9894)

* improved logging
* check ctx in ServeHTTP: The context might be cancelled if the client's
connection was closed while waiting for ServeHTTP.
* If execution API returns ExecutionStatus_Busy, limit retry attempts to
10 seconds. This timeout must be lower than a typical client timeout (30
sec), in order to give the client feedback about the server status.
* If execution API returns ExecutionStatus_Busy, increase retry delay
from 10 ms to 100 ms to avoid stalling ourselves with multiple busy
loops. IMO this delay should be higher (e.g. 1 sec). Ideally we
shouldn't do polling at all, but doing a blocking ctx call requires
rearchitecting the ExecutionStatus_Busy logic.

see erigontech#9786

* torrent v1.54.2-alpha -> v1.54.2-alpha-7 (release/2.60) (erigontech#10183)

* Unnecessary Logs in sentry removed (erigontech#10190)

Cherry pick PR erigontech#10187 into the release

Co-authored-by: Giulio rebuffo <[email protected]>

* nil block during execution (erigontech#10193)

release cherry pick

* qa-tests: updating test workflow on release/2.60 (erigontech#10196)

This PR brings the changes of erigontech#10195 to the branch release/2.60 with the
necessary modifications

* qa-tests: fix workflows for release 2.60 (erigontech#10217)

Running a test every day doesn't make sense on an inactive branch. 
It also seems that the schedule trigger favours the main branch if the
test workflow has the same name on the main and other branches.
So this PR changes the test trigger to "push events".

* Release: fix logs spam (erigontech#10211)

for erigontech#10203

* Blocks snaps - see 0 indices after reopen (erigontech#10219)

Cherry pick PR erigontech#10214 into the release

Co-authored-by: Alex Sharov <[email protected]>

* torrent v1.54.2-alpha-7 -> v1.54.2-alpha-8 (release/2.60) (erigontech#10224)

This adds torrent fixes that remove bad peers due to non handling of
http errs.

* fixed start diag server (erigontech#10236)

Fixed starting the diag server when the metrics address is different
from the pprof address.

---------

Co-authored-by: taratorio <[email protected]>

* params: version 2.60.0-rc1 (erigontech#10230)

* downloader: --seedbox doesn't init snaptypes (erigontech#10245)

Cherry pick PR erigontech#10215 into the release

Co-authored-by: Alex Sharov <[email protected]>

* e2: bor-mainnet fix broken v1-054600-054700-borspans.seg  (erigontech#10243)

Pick erigontech/erigon-snapshot#160

* test

* e2: set dirty-space for chaindb to 512mb (erigontech#10269)

* Fix potential index out of bounds in decodeBlobVersionedHashes (erigontech#10294)

* remove nils from p2p logs (erigontech#10303)

fix for 
```
[p2p] Server                             protocol=68 peers=2 trusted=0 inbound=1 LOG15_ERROR= LOG15_ERROR= LOG15_ERROR= LOG15_ERROR= LOG15_ERROR= i/o timeout=53 EOF=65 closed by remote=215 too many peers=6 ecies: invalid message=5
```

* params: version 2.60.0 (erigontech#10330)

* Fix tests

* fix Consensus specification tests CI (erigontech#10391) (erigontech#10396)

Cherry-pick:
erigontech@bc5fa6f

Need this to get PR CI green for v2.60.1 patches, e.g. -
erigontech#10390

Co-authored-by: Andrew Ashikhmin <[email protected]>

* rpc/handler: do not append null to stream when json may be valid (erigontech#10390)

Cherry-pick:
erigontech@4d1c954
Relates to: erigontech#10376

* Remove files that should have been ignored

* Bump go version to 1.21

* Add erigon-lib back

* Fixed Bor Log appearing on Ethereum Mainnet (erigontech#10405) (erigontech#10420)

Cherry-pick:
erigontech@be889f6

Co-authored-by: Giulio rebuffo <[email protected]>

* Fix dynamic config

* Fix APIList

* Upstream merge fix

Fix txpool and metrics

* Fix nil pointer dereference during first stage cycle

* fix gas price not right problem (erigontech#10456)

Cherry pick PR erigontech#10451 into the release branch

Co-authored-by: mars <[email protected]>

* eth_estimateGas: default feeCap to base fee (erigontech#10499)

Copy PR erigontech#10495 into the release branch

* Add flag for bor waypoint types (erigontech#10501)

Cherry pick PR erigontech#10281 into the release branch

Co-authored-by: Mark Holt <[email protected]>
Co-authored-by: alex.sharov <[email protected]>

* try to fix 'method handler crashed' for debug_traceCall of erigontech#9090 (erigontech#10502)

Cherry pick PR erigontech#10401 into the release branch

Co-authored-by: mars <[email protected]>

* diagnostics: cherry pick speedtest disable (erigontech#10509)

Cherry pick PR erigontech#10449 into the release branch

* Enable DNS p2p discovery on holesky (erigontech#10507)

Cherry pick PR erigontech#10460 into the release branch

Co-authored-by: Willian Mitsuda <[email protected]>

* fix eth_call 'method handler crashed' error when tx has set maxFeePerBlobGas (erigontech#10506)

Cherry pick PR erigontech#10452 into the release branch

Co-authored-by: mars <[email protected]>

* e2: remove overlapped files only after merge (erigontech#10487)

Otherwise, if the node starts after a `kill -9` in the middle of a merge,
it may remove the small files of one file type but leave the small files
of another type (whose merge was not finished), leaving the node in an
un-mergeable state: erigontech#10485

---------

Co-authored-by: awskii <[email protected]>

* add flag checking for pruning waypoints (erigontech#10508)

Cherry pick PR erigontech#10468 into the release branch

Co-authored-by: Mark Holt <[email protected]>

* p2p/sentry: sentry doesn't start with ErrNoHead (erigontech#10454) (erigontech#10523)

cherry-pick erigontech#10494 to
release/2.60

* add lock to purgeMilestoneIDsList (erigontech#10524)

Cherry pick PR erigontech#10493 into the release branch

Co-authored-by: Mark Holt <[email protected]>

* polygon/heimdall: fix checkpoint json marshalling (erigontech#10530)

Fixes a recent regression causing unwinds due to checkpoints having zero
root hash:
```
[WARN] [05-18|23:58:54.662] [bor] Root hash mismatch while whitelisting checkpoint expected=ac1c57270479250af3ce8eee90075cd8b2ba1bac55353105e063d9a4c87c743e got=0000000000000000000000000000000000000000000000000000000000000000
[WARN] [05-18|23:58:54.662] [bor] Rewinding chain due to checkpoint root hash mismatch number=57125727
```

Note this has already been fixed on Erigon 3 branch but as part of a
non-related PR -
https://github.com/ledgerwatch/erigon/pull/10124/files#diff-47d4532f399f2d6a45e6f19944a45c80bac573b4d1b5cb51485d0254229d1b16

* Fix capacity for immediate appends (erigontech#10539)

Cherry pick PR erigontech#10528 into the release branch

Co-authored-by: Shoham Chakraborty <[email protected]>

* core/vm: set tracer-observable value of a delegatecall to match parent value (erigontech#10370)

requested by erigontech#9549

port of ethereum/go-ethereum#26632

* Remove unused binary

* params: version 2.60.1 (erigontech#10555)

* remove: externalcl flag from default configs

* blobGasPrice should be marshalled as hex (erigontech#10571)

Cherry pick PR erigontech#10551 into the release branch

* Caplin: Fixed reforwarding of Bls Execution changes (erigontech#10577)

Cherry pick PR erigontech#10546 into the release branch

Co-authored-by: Giulio rebuffo <[email protected]>

* Caplin: Proper "Normalization" of length of ForkVersions to 8 hex characters (erigontech#10578)

Cherry pick PR erigontech#10512 into the release branch

Co-authored-by: Giulio rebuffo <[email protected]>

* Caplin: Update BlobSidecars Beacon API endpoint to the latest specs (erigontech#10580)

Cherry pick PR erigontech#10576 into the release branch

Co-authored-by: Giulio rebuffo <[email protected]>

* wip

* resolve conflicts/issues after merge

* bor blocks retire: infinity loop fix (erigontech#10596)

Problem: `+1` was added to maxBlockNum instead of minBlockNum
for: erigontech#10554

* add modes in acl db

* refactored code

* acl cli

* Uts

* Revert "Uts"

This reverts commit 807bc91.

Revert "acl cli"

This reverts commit c8eead9.

Revert "refactored code"

This reverts commit aa4b8a4.

Revert "add modes in acl db"

This reverts commit 09c512a.

Revert "wip"

This reverts commit ca6db9e.

* bor blocks retire: infinity loop fix (erigontech#10596)

Problem: `+1` was added to maxBlockNum instead of minBlockNum
for: erigontech#10554

* txpool: EIP-3860 should only apply to create transactions (erigontech#10609)

This fixes Issue erigontech#10607

* qa-tests: update 2.60.x test workflows from main (erigontech#10627)

* Fix potential p2p shutdown hangup (erigontech#10626)

This is a fix for: 

erigontech#10192

This fixes a deadlock in v4_udp.go where
* Thread A waits on mutex.Lock() in resetTimeout() called after reading
listUpdate channel.
* Thread B waits on listUpdate <- plist.PushBack(p) called after locking
mutex.Lock()
  
This fix decouples the list operations which need locking from the
channel operations which don't by storing the changes in local
variables. These updates are used for resetting a timeout - which is not
order dependent.

* downloader: Number of DNS requests seem excessive (erigontech#5145) (erigontech#10739)

cherry-pick erigontech#10693 to release

* rpc: Fix incorrect txfeecap (erigontech#10643)

Cherry pick PR erigontech#10636 to
Erigon 2

* downloader: don't block erigon startup if devs deploy new hashes (of same files) (erigontech#10761)

* skip hidden files when list files with given extension  (erigontech#10654)

for erigontech#10644

* qa-tests: backport to release/2.60 improvements made to e3 github action workflows (erigontech#10778)

This PR backports improvements that we added to the E3 tests: recording
runner name and db version used for testing on MongoDB database.

* Fix tests

* Fix docker file and make file

* Fix configs

* Add a dummy flag 'externalcl' for backward compatibility

* Enable CI in upstream-merge PRs

* Exclude non-buildable targets from CI

* e2: more snaps (all networks) (erigontech#10794)

* Update ubuntu version in CI

* e2: configurable hashers amount (erigontech#10785)

* Revert "e2: configurable hashers amount" (erigontech#10834)

* diagnostics: move E3 changes to E2 (erigontech#10806)

Merged all the work done from main branch to keep diagnostics up to
date.

* Downloader: fix staticpeers flag (erigontech#10798)

Cherry pick erigontech#10792

* Fix NewPayload Validation during header download (erigontech#10837)

Cherry pick PR erigontech#10093 into the release branch

Co-authored-by: Minhyuk Kim <[email protected]>

* e2: mainnet blob 9.3M (erigontech#10842)

* Fix gas fee calculation for debug calls (erigontech#10880)

Cherry pick PR erigontech#10825 into the release branch

Co-authored-by: Minhyuk Kim <[email protected]>

* Revert "eth_estimateGas: default feeCap to base fee (erigontech#10499)" (erigontech#10904)

This reverts PR erigontech#10499. See
erigontech#10495 (comment)
and PR erigontech#10901

* Change CI branches

* params: version 2.60.2 (erigontech#10905)

* Add bridge test to CI

* Remove matrix

* [bugfix] Fix gas estimation bug where EVM was not correctly used in the interpreter

* Changing Caplin Finality Checkpoint API response to match spec (erigontech#10944)

Cherry pick PR erigontech#10843 into the release branch

Co-authored-by: Angus Scott <[email protected]>

* Add zero check in tx.Sender func (erigontech#10737)

This is an additional check as erigontech#9990 could not be reliably reproduced.
The conjecture is that at some point there is a race condition somewhere
related to either storing snapshot file for an older block or updating
the DB for a more recent block.
Somewhere the code sets sender value directly to zero or overwrites a
pointer, leading to sender address being incorrectly assigned to ZERO.

* Add Normalcy hardfork (#676)

* Add Normalcy fork

* Add a few missed functions

* eth/tracers: fix prestate tracer bug with create with value (erigontech#10960)

fixes erigontech#9531

Changes:
- fixes a bug with the prestate tracer where we were incorrectly
subtracting the value of a transaction from the "to" address balance in
the "pre" state (should not be done for CREATE calls)
- fixes a bug with the prestate tracer where we were incorrectly adding
the value of a transaction to the "from" address balance in the "pre"
state (should not be done for CREATE calls)
- fixes a bug with the prestate tracer where we were incorrectly
decrementing the nonce value of a transaction's "from" address in the
"pre" state (should not be done for CREATE calls)
- adds a test generator that can generate the test files for us based on
real life transaction hash and node rpc url - check README
https://github.com/ledgerwatch/erigon/blob/fix-prestate-tracer-on-create-e2/eth/tracers/internal/tracetest/testgenerator/README.md
- adds test cases
- fixes some existing test cases that were setup with incorrect data

* eth/tracers: add optional includePrecompiles flag to callTracer - default true is preserved (erigontech#10986)

relates to erigontech#9784

- Adds support for an optional `"includePrecompiles"` tracer config
option for `callTracer` that users can use to control behaviour
(previous default of including precompile traces is preserved)
- Adds tests for default and for `"includePrecompiles": false` based on
https://etherscan.io/tx/0x536434786ace02697118c44abf2835f188bf79902807c61a523ca3a6200bc350

* Cherry-pick: Caplin's past finalization check (erigontech#11006)

* turbo/jsonrpc: add optional includePrecompiles flag to trace_* apis (erigontech#10979)

relates to erigontech#9784

- Adds support for an optional `"includePrecompiles"` tracer config
option for our OeTracer (OpenEthereum) that users can use to match
output of debug_* apis with callTracer (by default it includes
precompiles). Note that the default spec for OpenEthereum traces is to not
include precompiles - this is preserved by this PR
- Note geth has support for `"includePrecompiles"` so we are getting
more aligned as well -
https://github.com/ethereum/go-ethereum/blob/master/eth/tracers/native/call_flat.go#L124
- Adds tests for OeTracer

* eth/tracers: always pop precompiles stack in callTracer (erigontech#11004)

Made a mistake in the previous PR erigontech#10986: the precompiles stack
should always be popped for correctness.

* allow to gracefully exit from CL downloading stage (erigontech#10887) (erigontech#11020)

Duplicating erigontech#10887

Co-authored-by: awskii <[email protected]>

* Less troublesome way of identifying content-type (erigontech#10770) (erigontech#11018)

Co-authored-by: Giulio rebuffo <[email protected]>

* Diagnostics: loglevel (erigontech#11015)

Changed log level

* dl: additional pre-check for having info  (erigontech#11012)

cherry-pick of erigontech#10853

* Diagnostics: Optimize db write (erigontech#11016)

Fix for erigontech#10932

* qa-tests: add Tip-Tracking test for Gnosis (erigontech#11053)

This adds a Tip-Tracking test on Erigon v2 for Gnosis chain/network

* params: version 2.60.3 (erigontech#11069)

* Fix issues for post-london hardforks

* Fix unit test

* Add dynamic gas fee tx to CI for post-london

* Fix kurtosis in CI

* Disable blob opcodes and point evaluation precompile (#1147)

* Disable blob tx (#1150)

* Avoid log padding when normalcy is enabled

* Bump kurtosis version

* Remove auto claim

* update with feat/zero branch zero.go file (#1109)

Co-authored-by: Jerry <[email protected]>

* Fix stage_batches bug

* Fix unwind ci

* Fix RPC doc check ci

* Fix unwind ci build

---------

Signed-off-by: fuyangpengqi <[email protected]>
Signed-off-by: jsvisa <[email protected]>
Signed-off-by: luchenhan <[email protected]>
Co-authored-by: Alex Sharov <[email protected]>
Co-authored-by: Giulio rebuffo <[email protected]>
Co-authored-by: kewei <[email protected]>
Co-authored-by: Willian Mitsuda <[email protected]>
Co-authored-by: Shoham Chakraborty <[email protected]>
Co-authored-by: Somnath <[email protected]>
Co-authored-by: Dmytro <[email protected]>
Co-authored-by: Mark Holt <[email protected]>
Co-authored-by: awskii <[email protected]>
Co-authored-by: Mark Holt <[email protected]>
Co-authored-by: Stuart Corring <[email protected]>
Co-authored-by: fuyangpengqi <[email protected]>
Co-authored-by: milen <[email protected]>
Co-authored-by: goofylfg <[email protected]>
Co-authored-by: sealer3 <[email protected]>
Co-authored-by: mcfx <[email protected]>
Co-authored-by: canepat <[email protected]>
Co-authored-by: Andrew Ashikhmin <[email protected]>
Co-authored-by: persmor <[email protected]>
Co-authored-by: galois <[email protected]>
Co-authored-by: adytzu2007 <[email protected]>
Co-authored-by: battlmonstr <[email protected]>
Co-authored-by: carehabit <[email protected]>
Co-authored-by: Delweng <[email protected]>
Co-authored-by: Jonathan Otto <[email protected]>
Co-authored-by: Daniel Gimenez <[email protected]>
Co-authored-by: Marius van der Wijden <[email protected]>
Co-authored-by: Martin Holst Swende <[email protected]>
Co-authored-by: luchenhan <[email protected]>
Co-authored-by: Michelangelo Riccobene <[email protected]>
Co-authored-by: Marcello Ardizzone <[email protected]>
Co-authored-by: Anshal Shukla <[email protected]>
Co-authored-by: mars <[email protected]>
Co-authored-by: awskii <[email protected]>
Co-authored-by: Rachit Sonthalia <[email protected]>
Co-authored-by: Goran Rojovic <[email protected]>
Co-authored-by: Minhyuk Kim <[email protected]>
Co-authored-by: Igor Mandrigin <[email protected]>
Co-authored-by: Angus Scott <[email protected]>
Co-authored-by: VBulikov <[email protected]>
Co-authored-by: Arpit Temani <[email protected]>
Co-authored-by: Scott Fairclough <[email protected]>