Snapshot acceleration seems to have no impact on eth_call performance in both ethspam | versus and my application
#21772
Comments
Some (perhaps too many) updates: There is an error log printed out towards the bottom here that might be of interest (upon restarting the node for the first time after an unbroken snapshot building process), but here is my best effort at a somewhat comprehensive info-dump. I managed to re-generate the entire snapshot, this time without ever interrupting the generation by
Once the snapshot was generated, without stopping/restarting the process I did the following and got the listed output:
This is well below what I've seen the node push without the snapshot; the highest I've seen using
Throughput is far worse with no concurrency, though we can see latency is improved even further (expected):
Here is the output of
I suppose it is worth noting that For posterity, I tried restarting the node, which produced the following output during shutdown:
The ERROR log does jump out to me, but I'm not sure what it means. Starting
Output once it found a peer and caught up:
Resulting
So performance increased a bit (~50%) and I was able to reproduce this result. There was also a slight gain in the throughput of my application of about 15%. Both of these metrics represent a return to virtually the same throughput I was able to get before building the snapshot. Some additional notes:
Let me know if there is any more diagnostic visibility I might be able to provide here; getting these numbers up is a functional requirement for me right now :-) I'm going to do my best to keep looking into it, but I'm obviously not an expert on the internals of |
So, I think we need to dive into what this means. But essentially, if it queries about accounts / slots that have been modified in the last 127 blocks or so, geth is almost guaranteed to have the data in memory, regardless of whether So the performance increase, in that case, would be:
If the queries were instead focused on 'random' data, which has not been touched recently, the differences between these two approaches would be a lot larger, in absolute times. As it is, I think the differences are relatively minor in comparison with other stuff. |
(shorter) Update 2: I did an
My laptop has weaker specs than my desktop so a performance hit is to be expected. However, if the acceleration were actually "doing the thing," we should probably see some increased throughput (perhaps at least the same) regardless. By my estimate, this number is more reflective of the performance I would expect without acceleration. Here are my laptop's specs, for reference: Just for the sake of it, I tried messing around with some of the cache allocations (though I'll be the first to admit I don't entirely understand exactly what each one means). Here's me giving almost the entire cache to the snapshot:
Wow, surprising result!! Performance was 100% identical. Given the almost non-existence of the "vanilla" caches here, I had expected some godawful performance. Things to consider:
I think that about covers it. If anyone would like, I'd be willing to scp/ftp/whatever the chain data directory to you for your own perusal. For now, I think I'm going to try experimenting with Turbo-Geth and see if it can crank out some improved performance 🤞 |
I was just digging into ethspam a bit. These seem to be the queries: https://github.com/shazow/ethspam/blob/master/queries.go . These would not be affected, as it's not state-data but block data:
These should be improved by
These depend on
func (s *liveState) RandomAddress() string {
	if len(s.transactions) == 0 {
		return ""
	}
	idx := int(s.randSrc.Int63()) % len(s.transactions)
	return s.transactions[idx].From.String()
}
As far as I understand it, it selects an account from a list of transactions, and the list of transactions is re-filled from the
So my theory of why you don't experience any orders-of-magnitude performance gain is:
It definitely would be interesting to see how these numbers compare with other clients, like turbo-geth, but I would also be very interested to see the stats broken down in query-types, so we could better pinpoint which query types are slowest, or have highest variance etc. Anyway, thanks for all the work so far! |
@holiman just seeing your comment now. Indeed my application is accessing recent data rather than random data, not sure about the specifics of I think given the identical performance of Edit: looks like you are way ahead of me, thanks for the insight. Once I have the chance I'll see if I can't set up some (ideally rather more share-able) scripts focusing on the specific cases you listed. |
I seem to have a knack for posting right before you do, so you don't see it until later :) |
😂 Every time. |
I forked
or
By using this mode, one can more easily benchmark individual querytypes. |
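To make the per-query-type runs less error-prone, the long flag lists can be generated rather than typed by hand. The following is only a sketch: the `-m=method:weight` syntax and the method list are copied from the benchmark script posted later in this thread, while `METHODS` and `spam_flags` are hypothetical names introduced here for illustration.

```shell
#!/bin/bash
# Hypothetical helper, not part of ethspam: build the -m=method:weight flags
# so that only the selected methods get weight 100 and all others get 0.
# The method list is the one used in the benchmark script in this thread.
METHODS="eth_getCode eth_getLogs eth_getTransactionByHash eth_blockNumber eth_getTransactionCount eth_getBlockByNumber eth_getBalance eth_getTransactionReceipt eth_call"

spam_flags() {
  local flags="" m weight sel
  for m in $METHODS; do
    weight=0
    # Enable the method if it was named on the command line.
    for sel in "$@"; do
      if [ "$m" = "$sel" ]; then
        weight=100
      fi
    done
    flags="$flags -m=$m:$weight"
  done
  # Drop the leading space before printing.
  echo "${flags# }"
}
```

Assuming the forked ethspam binary is on the path, a single query type could then be benchmarked with something like `ethspam $(spam_flags eth_call) | versus --concurrency=5 --stop-after=10000 http://localhost:8545`.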
Here are some results running against a node which doesn't have State only
Blockdata only
GetLogs only
Block data without getLogs
So apparently |
And if I run it in
Which indicates that even though my state accesses are at |
So, I made some measurements on the machine (
I'm now generating the snapshot on it, will do another run in a few days. |
It might also be a good idea to add a reference speed test. I opted to use
#!/bin/bash
spam=/home/user/go/src/github.com/shazow/ethspam/ethspam
v="--concurrency=5 --stop-after=10000 http://localhost:8545"
yes "{\"jsonrpc\":\"2.0\",\"id\":1,\"method\":\"web3_clientVersion\",\"params\":[]}" | versus $v > version.txt
echo "Reference speed:" && cat version.txt | grep Requests
$spam -m=eth_getCode:100 -m=eth_getLogs:0 -m=eth_getTransactionByHash:0 -m=eth_blockNumber:0 -m=eth_getTransactionCount:100 -m=eth_getBlockByNumber:0 -m=eth_getBalance:100 -m=eth_getTransactionReceipt:0 -m=eth_call:100 | versus $v > all.txt
echo "State (all) speed:" && cat all.txt | grep Requests
$spam -m=eth_getCode:100 -m=eth_getLogs:0 -m=eth_getTransactionByHash:0 -m=eth_blockNumber:0 -m=eth_getTransactionCount:0 -m=eth_getBlockByNumber:0 -m=eth_getBalance:0 -m=eth_getTransactionReceipt:0 -m=eth_call:0 | versus $v > code.txt
echo "State (code) speed:" && cat code.txt | grep Requests
$spam -m=eth_getCode:0 -m=eth_getLogs:0 -m=eth_getTransactionByHash:0 -m=eth_blockNumber:0 -m=eth_getTransactionCount:100 -m=eth_getBlockByNumber:0 -m=eth_getBalance:0 -m=eth_getTransactionReceipt:0 -m=eth_call:0 | versus $v > nonce.txt
echo "State(nonce) speed:" && cat nonce.txt | grep Requests
$spam -m=eth_getCode:0 -m=eth_getLogs:0 -m=eth_getTransactionByHash:0 -m=eth_blockNumber:0 -m=eth_getTransactionCount:0 -m=eth_getBlockByNumber:0 -m=eth_getBalance:100 -m=eth_getTransactionReceipt:0 -m=eth_call:0 | versus $v > balance.txt
echo "State (balance) speed:" && cat balance.txt | grep Requests
$spam -m=eth_getCode:0 -m=eth_getLogs:0 -m=eth_getTransactionByHash:0 -m=eth_blockNumber:0 -m=eth_getTransactionCount:0 -m=eth_getBlockByNumber:0 -m=eth_getBalance:0 -m=eth_getTransactionReceipt:0 -m=eth_call:100 | versus $v > call.txt
echo "State (call) speed:" && cat call.txt | grep Requests
So if I plug that into my earlier numbers:
I find that the requests I made were basically instantaneous, even on the non- Could you please try the bash script above? |
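One possible way to use the reference speed is to express each measured throughput as a fraction of it, so that RPC and transport overhead is factored out before comparing query types. This is a sketch under that assumption, not necessarily the calculation used above; `relative_speed` is a hypothetical helper, and both arguments are requests-per-second figures read off the versus output by hand.

```shell
#!/bin/bash
# Hypothetical helper: relative_speed REFERENCE_RPS MEASURED_RPS
# Prints MEASURED_RPS / REFERENCE_RPS with two decimals, or "n/a" if the
# reference is not a positive number.
relative_speed() {
  awk -v ref="$1" -v got="$2" 'BEGIN {
    if (ref <= 0) { print "n/a"; exit }
    printf "%.2f\n", got / ref
  }'
}
```

A value close to 1 would suggest the query costs about as much as the trivial reference call, in which case the node's state access is unlikely to be the bottleneck for that query type.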
Finally got around to try your script out. Will update this with numbers once my laptop is synced up again. Appreciate your work here, thank you! |
These are my numbers on my laptop:
Which is, of course, wildly interesting. I'll have to write up a script to run the particular calls I'm interested in and see how that last number changes. The similarity between the second ( Edit: I ran it a few more times and the numbers between
|
Yeah, I don't know what to make of those numbers, tbh. Like, All in all, I'm starting to wonder if it isn't other factors having a large contribution to the speeds here. Like, the RPC endpoint may be a lot slower if it's actively processing a block, or reorganizing transactions. If it's importing a block, it uses several threads to derive transaction signatures. If you are doing this against a remote node, are you tunneling over ssh? Using wifi? |
This isn't really an issue report about a bug. Based on the discussions here, we don't have any reason to believe the snapshots are broken; it's a more nuanced question about what types of data are requested. The speeds vary a lot for different types: some are accelerated, some not. Some are cached, some not. |
System information
Geth version: 1.9.23-stable-8c2f2715
OS & Version: Ubuntu 20.04.1 LTS
Commit hash : (see above)
Expected behaviour
After Geth creates the snapshot, performance of eth_call should increase substantially per the blog post and others online. This should be reflected by increased performance using ethspam and versus.
Actual behaviour
The snapshot seems to have no impact on performance. ethspam | versus --stop-after=10000 --concurrency=30 is still showing roughly 200 calls/s, as is my application. This is certainly within the margin of error of performance prior to the snapshot. In theory this means that, statistically speaking, something is probably not right.
Steps to reproduce the behaviour
Run geth as usual without --snapshot and perform an ethspam | versus --stop-after=10000 --concurrency=30 test, and note the requests-per-second performance.
Run geth with the following flags: geth --http --cache 5315 --datadir /some/path/here --snapshot
eta of about 3 hours from that point.
Edit: can confirm that, through a possible screwup on my part, the snapshot wipe deleted about 106_695_498 accounts
Restart Geth for good measure using ctrl + c and wait for it to gracefully exit; then run the same command line to start it back up. Give it time to build up internal caches and such.
Run ethspam | versus --stop-after=10000 --concurrency=30 again.
Observe that the performance has not changed; as if the snapshot either doesn't exist, doesn't work, or isn't getting used.
For reference, my hardware:
Intel Xeon E3-1200 v6 (4 cores, 4.2 GHz) official link
2 Nvidia 1080 Ti GPUs
16 GB RAM
1 TB NVMe drive
It's possible that I have something misconfigured or that I'm misunderstanding something, but this seems like a usability problem at the very least. Help would be appreciated! Thank you.
Edit 2: It may also be worth noting that my application is executing only view (non-pure) functions which themselves are heavy on state reads