Archive node running 1.8.6-unstable from source syncing 3-5x slower relative to 1.7.11-stable #7489
Comments
Thanks for your report. cc @andresilva. I noticed something similar with the latest master (1.9.0-unstable) versus 1.8.5, which does not include #7348; I'll do some benchmarks today.
It could be the initial compaction from the updated RocksDB settings, which takes a bit longer. We should check whether this also happens on a fresh db. If it is indeed the compaction, then performance should go back to normal once you let it run long enough.
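If it helps, one low-tech way to check whether RocksDB is actually compacting (a sketch, assuming a default Linux base path; the db directory layout differs between versions and pruning modes, so adjust the path) is to grep the RocksDB info log inside the chain database directory, since flush and compaction events are recorded there:

```sh
# Assumed default base path on Linux -- override if you run with -d/--base-path.
DB_DIR="$HOME/.local/share/io.parity.ethereum/chains/ethereum/db"

# RocksDB writes an info log named LOG into each database directory;
# compaction activity shows up there as it happens.
find "$DB_DIR" -name LOG | xargs tail -n 100 | grep -i compact
```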
Is there some way I can export the raw block RLPs from parity and import them back into a clean database? I'd like to get an empirical test going between the two versions, but it's difficult to keep it consistent because two clean DBs won't necessarily have the same peer set and block availability characteristics.
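Something like the following is what I have in mind for the export side (a sketch; the flags are from memory of the 1.x CLI and may differ between versions, and older builds may only have a plain `parity export` subcommand, so check `parity export --help`):

```sh
# Export raw block RLPs from the existing (1.7.11-created) database as binary.
# The pruning mode should match the database being read; the block range is illustrative.
parity export blocks blocks.rlp --format binary --from 1 --to 1000000 --pruning archive
```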
Thanks @andresilva! Already noticed something interesting while exporting: the block export in 1.8.6 is S L O W! The block DB is RocksDB as well, I presume? Check this out (this is using the DB originally created with 1.7.11 as well):
I didn't even bother going past 10k, because doing the same with 1.7.11:
Something's definitely fishy here! I'd understand it being slower if it were writing to RocksDB, since it has to use a different strategy to avoid space amplification, but reading as well? I'm wondering if this performance hit is actually a read bottleneck! Anyway, will be back with import logs as soon as those runs finish!
@iostat I'm not sure if the compaction is triggered when you open RocksDB (even if you're just reading). Could you let it export completely and then do the import into a fresh database and compare results? This was actually one of the benchmarks I used when tuning RocksDB (block import and snapshot restore).
I should probably clarify that in the original post by compaction I really meant "cleaning up the 200+ GB wasted space due to amplification." I guess garbage collection would be more appropriate; I'll update the OP accordingly. As far as a full export/restore goes, I'm only synced up to ~3800000, and I know that even with 1.7.11 it took my hardware several days to clear the state bloat attacks. Is there a more limited test we can do to get a viable result? I'm trying the first 1M blocks at the moment.
Yes, by compaction I mean RocksDB's garbage collection mechanism. For my benchmarks I used the first 2M blocks; as far as time taken to import goes, I didn't see a lot of variability between the different versions (assuming a fresh database).
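Roughly speaking, that benchmark amounts to timing a clean import of the exported blocks into a throwaway base path with each binary. A sketch (the paths and the archive pruning choice are illustrative, not my exact setup):

```sh
# Time importing the exported RLP file into a fresh database with a given binary.
time ./target/release/parity import blocks.rlp \
    --base-path /tmp/parity-bench-1.8.6 \
    --pruning archive
```

Run the same command with the other binary and a different `--base-path`, then compare wall-clock times.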
Well, usually the whole node crashes once the disk space limit is hit, so it's possible the GC I observed is part of crash recovery or something :/. Running the clean import of 1M blocks, I'm actually seeing significantly increased performance in 1.8.6! https://gist.github.com/iostat/c655f3cc7127f5ed9811d7c724436786 I took a closer look at my original issue while running htop and iostat, and I noticed that with 1.8.6 one of my cores is always maxed out, but with nowhere near the same IOPS saturation (300-400 with 1.8.6 as opposed to ~2500 with 1.7.11). I'm assuming that's RocksDB's doing; maybe it's slowly chugging its compaction/optimization along? Perhaps I should let it crawl for a day or two and see if performance recovers?
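For reference, nothing fancy behind those observations, just standard tools plus keeping an eye on the database directory (the path is an assumption for a default Linux install):

```sh
# Per-device utilisation and IOPS, refreshed every 5 seconds.
iostat -x 5

# Watch on-disk growth of the chain database over time
# (path assumes the default base dir; adjust for -d/--base-path).
watch -n 60 du -sh "$HOME/.local/share/io.parity.ethereum/chains/ethereum/db"
```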
Yes, this is what I observed in my benchmarks too: a marginal performance increase (though I didn't see that big a difference). If you could let it keep running, that would be great, so we can check whether performance eventually goes back to normal (assuming it's caused by RocksDB compaction) and how long that takes (I wasn't able to test this migration on archive nodes).
Closing. The obvious workaround would be killing the DB. It looks like resyncing an archive node with 1.8.6 would only take a couple of days; I'm already at block 4.2 million after 24 hours on an HDD with my fresh (non-archive) no-warp node.
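For anyone landing here later, the workaround boils down to something like this (a sketch; whether `parity db kill` is available depends on your version, and simply deleting the chains directory under the base path achieves the same thing):

```sh
# Drop the existing chain database.
parity db kill

# Resync from scratch as an archive node; --no-warp forces a full block-by-block sync.
parity --pruning archive --no-warp
```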
Hi!
I was initially attempting to sync a full archive node with v1.7.11-stable, and was bitten by the space amplification issues described in #6280. My node would regularly run out of disk space, and I would have to make a little bit of room so that I could restart Parity, whereupon RocksDB would run garbage collection on startup and free up 200 or so GB, allowing the sync to continue. To this end I had actually created a 1GB dummy file that I could delete when the node inevitably crashed, giving it just enough breathing room to survive a restart and GC, and then recreated the 1GB dummy file for when RocksDB ate up all the disk space again.
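In case it's useful to anyone in the same boat, the dummy-file trick was nothing fancier than this (a sketch; the file name is made up and `fallocate` is Linux-specific -- `dd` or `truncate` work just as well):

```sh
# Reserve 1 GB of breathing room up front.
fallocate -l 1G ~/parity-ballast

# When the node dies with a full disk: free the ballast and restart so RocksDB can GC.
rm ~/parity-ballast
parity --pruning archive

# Once space has been reclaimed, re-create the ballast for the next crash.
fallocate -l 1G ~/parity-ballast
```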
Having noticed that greener pastures may lie ahead in #7348 (and subsequently backported to 1.8 in #7438), I went ahead and compiled what was the latest `HEAD` (364bf48) of the `beta` branch last night, using `cargo build --release --verbose`. Firing up the 1.8.6-unstable binary against my partially synced (around block 3700000) archive node's storage directory (because who wants to relive blocks 2200000-2800000?), it picked it up just fine and continued syncing from where it left off. I immediately noticed it was only hovering around the 1 blk/s mark instead of the usual 3-4, but figured the shiny new RocksDB was compacting stuff behind the scenes and left it to its own devices for a few hours. Coming back to it, I noticed absolutely no space amplification (awesome work on that!), but unfortunately saw it was still creeping along at around 1 blk/s with tx/s always in the high double digits. Switching back to 1.7.11 (again, same working directory), it picked up where the 1.8.6 binary left off but had a significantly peppier 3-4 blk/s and tx/s almost consistently in the triple digits.

Some sample logs:
v1.8.6-unstable:
v1.7.11-stable:
I recognize that this may simply be due to the fact that the DBs were created with v1.7.11 RocksDB tuning parameters, but since v1.8.6 doesn't hit the space-amplification issues, I'm inclined to believe that's not entirely what's at play here. I'd love to see if anyone else is experiencing this behavior or if it's just me, and I'd be glad to help prod this further in any way I can!