Tune RPS of eth_getTransactionReceipt #11090
Comments
See #9486. Try re-indexing the files. After re-indexing set --db.read.concurrency=5000; see also --rpc.batch.limit and --rpc.batch.concurrency. After this we will see. You can also compare it with eth_getBlockByNumber - to exclude TxnLookup index use.
eth_getBlockByNumber runs at about 31k RPS with ~85% CPU utilization and almost no change in IOPS, with the flags our rpc daemon is configured with.
But in these tests we are not using batches. Are there any params to make it faster - use more threads, utilize the HW more? (It is not using all the CPUs, nor the IOPS, nor the disk bandwidth.)
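The actual tests were run with vegeta, but as a rough sketch of this non-batched load pattern, a minimal Go client could look like the following. The endpoint URL, worker count, and tx hash list are placeholders, not values from this issue:

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"sync"
	"sync/atomic"
	"time"
)

// Placeholder list of tx hashes; in the real tests these came from existing blocks.
var txHashes = []string{"0x...", "0x..."}

func main() {
	const workers = 6 // illustrative concurrency, similar in spirit to vegeta's -max-workers
	var done atomic.Int64
	start := time.Now()

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			// Each worker takes every workers-th hash from the shared list.
			for i := w; i < len(txHashes); i += workers {
				body := fmt.Sprintf(
					`{"jsonrpc":"2.0","id":1,"method":"eth_getTransactionReceipt","params":["%s"]}`,
					txHashes[i])
				resp, err := http.Post("http://localhost:8545", "application/json",
					bytes.NewBufferString(body))
				if err != nil {
					continue
				}
				io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
				resp.Body.Close()
				done.Add(1)
			}
		}(w)
	}
	wg.Wait()

	elapsed := time.Since(start).Seconds()
	fmt.Printf("requests: %d, RPS: %.2f\n", done.Load(), float64(done.Load())/elapsed)
}
```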
Please avoid too much emotional involvement. Indexing thread count: the number of threads is chosen based on GOMAXPROCS/GOMEMLIMIT. Maybe you forgot to remove these env variables? Or you can try increasing them to increase the number of indexing threads.
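Not the actual Erigon indexing code, just an illustration of the idea that the worker count follows GOMAXPROCS (which the Go runtime takes from the GOMAXPROCS env variable when it is set):

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// indexJob is a placeholder for one unit of indexing work.
type indexJob func()

func main() {
	// runtime.GOMAXPROCS(0) reports the current setting without changing it;
	// it reflects the GOMAXPROCS env variable when that is set.
	workers := runtime.GOMAXPROCS(0)
	fmt.Println("indexing workers:", workers)

	jobs := make(chan indexJob)
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for job := range jobs {
				job()
			}
		}()
	}

	for i := 0; i < 10; i++ {
		i := i
		jobs <- func() { fmt.Println("job", i) }
	}
	close(jobs)
	wg.Wait()
}
```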
Sorry for being emotional, but I have already been digging into this for almost a week with no progress. After re-indexing: results for eth_getBlockByNumber, eth_getBlockReceipts and eth_getTransactionReceipt.
For comparison: Geth eth_getTransactionReceipt.
Nice.
Thank you!
Addendum 1: I have tried the same test with -max-workers 6 and different db.read.concurrency values:
--db.read.concurrency 1: Requests [total, rate, throughput] 662, 22.04, 21.91
--db.read.concurrency 2: RPS 38.58 (+75%), Requests [total, rate, throughput] 1158, 38.58, 38.37
--db.read.concurrency 5: RPS 43.99 (+14%), Requests [total, rate, throughput] 1321, 43.99, 43.85
--db.read.concurrency 20000: 53.56
--db.read.concurrency 200000: 58.56
The change between 1 and 2 is reasonable, but after that performance doesn't grow as expected - it looks like it is waiting for something that is executed in only 2 threads (very likely just some part of the processing). And even in the highest configuration the HW is pretty calm (CPU 10%, peak read IOPS at most 3k plus 20k write; we should have 36k available, 72k at peak). I have tried profiling with release version 2.60.3-1f73ed55, but the mem, block and mutex profiles are empty - are the commands you shared ok?
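About the empty block and mutex profiles: in Go these two profiles are disabled by default (their sampling rates are zero), so they stay empty regardless of which pprof commands are used, unless the profiled process itself enables them. A minimal sketch of what has to run inside the process (the port here is only an example):

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/ handlers on the default mux
	"runtime"
)

func main() {
	// Without these, /debug/pprof/block and /debug/pprof/mutex stay empty.
	runtime.SetBlockProfileRate(1)     // record every blocking event (high overhead; fine for a test run)
	runtime.SetMutexProfileFraction(1) // record every mutex contention event

	// Expose the pprof HTTP endpoints.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```

The heap ("mem") profile is sampled by default, so if that one is also empty, the pprof endpoint itself may not be reachable in the build being tested.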
I have identified the problem ...
What I have tried is not locking for TxnLookup:
The perf rocketed to the sky: we got 800+ RPS and the HW is finally getting utilized (CPU 75%, peak read IOPS 25k). Is the locking necessary for a read-only process (rpcdaemon)? Is it necessary even for historical segments? Or can it lock segments one by one? And I have noticed that "View" uses a read/write lock - would it improve perf if we used RLock (I guess that is the read-only lock)?
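To illustrate the RLock question (a sketch of the locking pattern being discussed, not the actual code in block_reader.go): with an exclusive Lock() every reader is serialized even though nothing is mutated, while RLock() lets read-only callers such as rpcdaemon run in parallel and only blocks while a writer holds the lock:

```go
package main

import "sync"

// segments stands in for the snapshot segment list that View() guards.
type segments struct {
	mu    sync.RWMutex
	files []string
}

// viewExclusive mimics a View() that takes the write lock: only one
// caller at a time, even though f never mutates anything.
func (s *segments) viewExclusive(f func(files []string)) {
	s.mu.Lock()
	defer s.mu.Unlock()
	f(s.files)
}

// viewShared takes the read lock instead: many concurrent readers,
// while writers (e.g. segment rotation) still get exclusivity via Lock().
func (s *segments) viewShared(f func(files []string)) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	f(s.files)
}

func main() {
	s := &segments{files: []string{"headers.seg", "bodies.seg", "transactions.seg"}}
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.viewShared(func(files []string) { _ = len(files) }) // reads proceed in parallel
		}()
	}
	wg.Wait()
}
```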
OK, I dug a bit deeper into it; this is what I have tried: tholcman@d677685#diff-72924d5ce1ba1962b50ba71cd48ca3c4ac4c54c1322e7db1bbac2108f760e517 Instead of using r.sn.View(), which locks all segment types from the call until the defer (return), I lock only the segment types that are actually needed.
RPS: 651.52, while keeping all the necessary locks (I hope); with some tuning I guess we can get even better. Can anyone more experienced than me review my approach, or properly review the locking in turbo/snapshotsync/freezeblocks/block_reader.go?
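A sketch of the granularity change described above (hypothetical types, not the real turbo/snapshotsync code): instead of one View() that locks every segment type for the whole call, each segment type carries its own RWMutex and a lookup only takes the lock it actually needs:

```go
package main

import (
	"fmt"
	"sync"
)

// segmentSet is a hypothetical per-type collection of snapshot files,
// each guarded by its own lock instead of one global mutex.
type segmentSet struct {
	mu    sync.RWMutex
	files []string
}

type roSnapshots struct {
	headers, bodies, txs segmentSet
}

// txnLookup only touches the transactions segments, so it read-locks
// just that set; headers and bodies stay untouched and unlocked.
func (s *roSnapshots) txnLookup(find func(files []string) bool) bool {
	s.txs.mu.RLock()
	defer s.txs.mu.RUnlock()
	return find(s.txs.files)
}

func main() {
	s := &roSnapshots{}
	s.txs.files = []string{"transactions-000-100.seg"}
	found := s.txnLookup(func(files []string) bool { return len(files) > 0 })
	fmt.Println("found:", found)
}
```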
@tholcman yes. I can do it today.
The only open question is: does it work better than "1 rwmutex for all types" or not.
You also can ...
FYI: I have also created #11157.
I would say it is not ready for a PR. I would probably either move the methods into ..., OR change View() to use RLock (for all the segments & types) - that may have a significant impact (since the original RW lock is exclusive) with minimal effort. Which path would you suggest? (I don't have that much time during weekdays to properly test both.) And do I understand correctly that I should prepare the PR against release/2.60? Will the change also get into E3? I see the same code in main, so I guess it is affected as well.
@tholcman "all the uses of View()" - did it in my PR. "OR change the View() to use RLock" - also done in my PR. |
Oh great, thank you! When is the next minor release expected? Until then I will use a custom-built image from ...
@tholcman it's because you test on hot data. Try testing on cold data (various block numbers across all of history) and you will likely see that a bigger read.concurrency helps - if the disk can handle parallel reads.
for: #11090 thank you [tholcman](https://github.com/tholcman) for finding
for #11090 thank you [tholcman](https://github.com/tholcman) for finding
Actually my test simulates the traffic which we see from our clients.
Also, the list of transactions I used for the vegeta tests came from random blocks in the last 1/3 of the chain (as I found this the most problematic). Can I port your changes to node-real/bsc-erigon? I guess it is no longer in sync with this repo, and this fix would be beneficial there as well (and on our BSC nodes).
Yes, please.
rpc bottleneck: block files mutex (e3) (erigontech#11156) - for erigontech#11090, thank you [tholcman](https://github.com/tholcman) for finding
System information
Erigon version:
./erigon --version
erigon version 2.60.3-1f73ed55
The same behavior was present in older versions:
erigon version 2.59.x, 2.60.x
OS & Version: Linux, docker image thorax/erigon:v2.60.3
Erigon, Prysm, and rpcdaemon are running in one pod in Kubernetes as the only workload on the node (plus some system stuff).
Erigon Command (with flags/config):
RPCDaemon flags:
We have also tried omitting db.read.concurrency and setting GOMAXPROCS to many different values ... no change.
Consensus Layer: prysm
Chain/Network: ethereum mainnet
HW Specs:
GCP n4-standard-16 = 16 vCPUs, 64 GB RAM
disk: 4 TB (Hyperdisk Balanced), 36,000 IOPS, 500 MB/s
Actual behaviour
We are running some performance tests, calling eth_getTransactionReceipt with many different txids. It looks like rpcdaemon is doing something synchronously, or in one thread only. To emphasize:
= 24 RPS
If we increase the concurrency of the test, performance goes up only very slightly, but responses start to slow down significantly:
= 34 RPS
With more workers (10, 20, ...) it stays about the same, ~35 RPS.
If we try to hold a fixed request rate, the API starts to slow down and eventually starts failing:
The daemon is running 22 threads:
CPU utilization goes from 5% (idle) to ~10-14% during tests
Disk:
peak write IOPS ~15k, unchanged during the test
peak read IOPS ~1.5k (idle) -> ~3k (during tests)
^ far, far away from the limit of 36k (we actually ran an additional workload (fio) during the tests and were able to reach ~70k peak IOPS)
It is nowhere near using all the available hardware.
Expected behaviour
With increasing client concurrency I would expect performance to increase approximately linearly until the HW is saturated, i.e. the available parallelism should be utilized.
pprof during the -max-workers 10 tests: