This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Parity warp sync is no longer very warpy. #6372

Closed
MicahZoltu opened this issue Aug 24, 2017 · 80 comments
Labels
F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue.
F7-footprint 🐾 An enhancement to provide a smaller (system load, memory, network or disk) footprint.
M4-core ⛓ Core client code / Rust.
P0-dropeverything 🌋 Everyone should address the issue now.
Q7-involved 💪 Can be fixed by a team of developers and probably takes some time.
Milestone
1.8
Comments

@MicahZoltu
Contributor

MicahZoltu commented Aug 24, 2017

I'm running:

  • Parity version: 1.7.0
  • Operating system: Windows
  • And installed: via installer

I just installed Parity onto a brand new computer with a fresh install of Windows 10 Pro. The computer has Hyper-V and Docker enabled, but is otherwise a stock Windows 10 Pro machine of reasonable power (Dell XPS 9560). It is connected via 802.11a/b/n wireless at 5 GHz to a nearby router and can easily max out my internet connection (100 Mb/s down, 12 Mb/s up).

Parity launched after install about 12 hours ago, and at first the warp sync moved quickly. However, fairly early into the initial sync the restore progress froze at 91.46% and the best block started counting up from 0. About 12 hours later it is only up to best block 3,594,000.

On previous versions of Parity, launching with no chain data and no extra parameters (other than the default ui) resulted in a full restore in about an hour, even on a less powerful computer.

Over the course of this time, it has written 2 TB to disk (presumably mostly overwrites, since I only have a 1 TB disk) and has read almost nothing from disk (42 MB). It has received ~7 GB over the network and sent only about 300 MB.

It seems there are several issues floating around about Parity's footprint (one of them even filed by me). I apologize if this should have been a comment on one of those, but none of them described the same symptoms, so I wasn't sure.

Some cumulative numbers across the lifetime of the process:
[screenshots: cumulative disk and network I/O statistics for the Parity process]

@MicahZoltu
Contributor Author

Upon further inspection, it appears that the initial launch of Parity post-install is `ui --warp --mode=passive` (this differs from the start menu shortcut that is created, which is just `ui`).
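
For reference, a minimal sketch of the two invocations being compared, assuming the 1.7-era CLI (verify flag names against `parity --help` on your build):

$ parity ui --warp --mode=passive   # what the post-install first launch appears to run
$ parity ui                         # what the start menu shortcut runs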

@MicahZoltu
Contributor Author

As an update, I just restarted my computer (for unrelated reasons) and Parity crashed on startup the first time; on re-launch it started the warp restore over again from 0%, while my best block still says 3,831,654 (where it was at the time of restart).

Perhaps this should be a separate issue (let me know if so), but it would seem that restarting in the middle of a warp restore causes problems.

@5chdn
Contributor

5chdn commented Aug 25, 2017

Please share some logs. It's hard to tell why the warp sync is stuck at a certain state / block.

@5chdn 5chdn added F7-footprint 🐾 An enhancement to provide a smaller (system load, memory, network or disk) footprint. M4-core ⛓ Core client code / Rust. Z0-unconfirmed 🤔 Issue might be valid, but it’s not yet known. labels Aug 25, 2017
@MicahZoltu
Contributor Author

It has recovered at this point. Is it possible to see a log history, or are logs pruned over time? If not, the repro steps are:

  1. Buy new computer.
  2. Re-install OS without manufacturer bloatware.
  3. Install Chrome, Docker, Parity.
  4. Let it sit warp syncing for a bit.
  5. --> Notice that it gets stuck.

If I end up resetting my chain and reproducing this myself, I'll try to enable log capture during the sync.
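
For anyone reproducing this first, a hedged sketch of a launch that captures the relevant traces; the logging targets match what a later comment in this thread records (sync, network, snapshot), but verify the `-l`/`--log-file` syntax against `parity --help`:

$ parity ui --warp -l sync=trace,network=trace,snapshot=trace --log-file parity-warp.log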

@5chdn 5chdn removed the Z0-unconfirmed 🤔 Issue might be valid, but it’s not yet known. label Aug 26, 2017
@5chdn
Contributor

5chdn commented Aug 26, 2017

Actually, I can reproduce this on different setups. Sometimes it feels random. https://gist.github.com/5chdn/683b905aa410de0232690fd9ddaf32fb

The current state grew to around 1.3 GB and we have 6.3 million unique addresses. https://etherscan.io/chart/address

Not sure what the future will bring for warp sync, but it's quite probable that end-users will switch to light mode by default on most machines.

@5chdn
Contributor

5chdn commented Aug 26, 2017

#6371 same issue, logs look similar to mine.

@luckymf

luckymf commented Aug 26, 2017

Yes, that's my issue #6371. Is there any way to restore a Parity account in another Ethereum-based wallet?

@luckymf

luckymf commented Aug 26, 2017

Just noticed that the operating mode is passive. Should it be that way?
[27.08.2017, 02:15:13] Syncing snapshot 2/555 #2423899 20/25 peers 7 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 8 req/s, 121 µs
[27.08.2017, 02:15:03] Syncing snapshot 2/555 #2423899 19/25 peers 6 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 7 req/s, 119 µs
[27.08.2017, 02:14:53] Syncing snapshot 2/555 #2423899 19/25 peers 5 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 6 req/s, 118 µs
[27.08.2017, 02:14:43] Syncing snapshot 2/555 #2423899 19/25 peers 6 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 4 req/s, 383 µs
[27.08.2017, 02:14:33] Syncing snapshot 1/555 #2423899 16/25 peers 7 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 2 req/s, 327 µs
[27.08.2017, 02:14:23] Syncing snapshot 1/555 #2423899 17/25 peers 5 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 3 req/s, 186 µs
[27.08.2017, 02:14:13] Syncing snapshot 0/555 #2423899 15/25 peers 5 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 3 req/s, 116 µs
[27.08.2017, 02:14:03] Syncing snapshot 0/555 #2423899 17/25 peers 4 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 3 req/s, 183 µs
[27.08.2017, 02:13:53] Syncing snapshot 0/555 #2423899 15/25 peers 7 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 3 req/s, 283 µs
[27.08.2017, 02:13:43] Syncing snapshot 0/555 #2423899 9/25 peers 4 MiB chain 103 MiB db 0 bytes queue 10 KiB sync RPC: 1 conn, 3 req/s, 6323 µs
[27.08.2017, 02:13:34] Public node URL: enode://df8e086c142326702294d9d7fc152155f16044ed1fcb1924d6b3066223747292dc240af4bb60dac08be32149921ef4f8641519ee77ea321ee3fa66e21cfd1ff5@127.0.0.1:30303
[27.08.2017, 02:13:29] Updated conversion rate to Ξ1 = US$303.92 (391707040 wei/gas)
[27.08.2017, 02:13:28] Configured for Foundation using Ethash engine
[27.08.2017, 02:13:28] Operating mode: passive
[27.08.2017, 02:13:28] State DB configuration: fast
[27.08.2017, 02:13:28] Path to dapps C:\Users\LUCKY\AppData\Roaming\Parity\Ethereum\dapps
[27.08.2017, 02:13:28] DB path C:\Users\LUCKY\AppData\Local\Parity\Ethereum\chains\ethereum\db\906a34e69aec8c0d
[27.08.2017, 02:13:28] Keys path C:\Users\LUCKY\AppData\Roaming\Parity\Ethereum\keys\Foundation
[27.08.2017, 02:13:28] Starting Parity/v1.7.0-beta-5f2

@5chdn 5chdn added P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. F3-annoyance 💩 The client behaves within expectations, however this “expected behaviour” itself is at issue. labels Aug 28, 2017
@5chdn 5chdn added P0-dropeverything 🌋 Everyone should address the issue now. and removed P2-asap 🌊 No need to stop dead in your tracks, however issue should be addressed as soon as possible. labels Sep 4, 2017
@5chdn
Contributor

5chdn commented Sep 20, 2017

@arkpar I was able to reproduce this on my office laptop today and logged the full trace of sync, network, and snapshot: https://5chdn.co/dump/g7xny97o/8i4h/warp-sync-fail.log.tar.bz2

It's around 65 minutes of logs; warp sync was stuck at 32% while fetching blocks in the background.

@arkpar
Collaborator

arkpar commented Sep 20, 2017

@5chdn could you repeat the test with the sync-fix branch?

@5chdn
Contributor

5chdn commented Sep 20, 2017

@arkpar compiled and running. So far it looks good. However, I had to reset my configuration this morning to unregister some tokens. I wasn't able to fix the warp sync until I removed everything in chains/ethereum/*; a `parity db kill` command wasn't sufficient. Maybe some old nodes.json? Or a messed-up user_defaults?

@5chdn
Contributor

5chdn commented Sep 20, 2017

Here is another user with this issue: https://www.screencast.com/t/bBcU5oYXKm

For some reason it tries to fetch 700+ snapshot chunks (and eventually fails), while a normal warp sync should only fetch ~360.

@arkpar
Collaborator

arkpar commented Sep 21, 2017

There are multiple problems here:

  1. When all peers serving the currently syncing snapshot disconnect, sync pauses the snapshot sync and falls back to full mode until a snapshot peer appears again. The log still reports snapshot sync as ongoing, though. This is fixed in Sync progress and error handling fixes #6560.

  2. Parity does not prefer peers with snapshots, which leads to cases where all slots are taken by peers without a snapshot in the middle of an initial sync. This is usually not a problem when syncing after a fresh install, because the node will likely be connected to bootnodes that provide snapshots. But it can happen when snapshot sync starts with an already populated node table, as seen in the log above.

@5chdn
Contributor

5chdn commented Sep 21, 2017

Re: 2) can't we force a default number of "snapshot peers"?
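
(A hedged aside: a `--snapshot-peers` flag already exists and shows up in a workaround later in this thread; per the CLI help it allows additional peer slots during a snapshot sync, e.g.:)

$ parity --warp --snapshot-peers 50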

@5chdn 5chdn added the Q7-involved 💪 Can be fixed by a team of developers and probably takes some time. label Sep 27, 2017
@5chdn 5chdn added this to the 1.8 milestone Oct 5, 2017
@kot-begemot

Did you delete my reply and "summarized" it in yours? O_o Seriously?

@5chdn
Contributor

5chdn commented Jan 10, 2018

Did you delete my reply and "summarized" it in yours? O_o Seriously?

Could you rephrase that question?

@kot-begemot

kot-begemot commented Jan 10, 2018

My apologies. I think I'm overworked; I commented in the wrong issue. Feel free to delete my latest posts in this thread.

@lloy0076

lloy0076 commented Jan 21, 2018

How is it that this is closed?

C:\Users\david_000>parity --version
Parity
  version Parity/v1.8.6-beta-2d051e4-20180109/x86_64-windows-msvc/rustc1.22.1
Copyright 2015, 2016, 2017 Parity Technologies (UK) Ltd
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

By Wood/Paronyan/Kotewicz/Drwięga/Volf
   Habermeier/Czaban/Greeff/Gotchac/Redmann

Sometimes it will sync snapshots, sometimes it will not.

It appears to be completely random, and Parity under my invocation(s) doesn't appear to give me enough information to figure out why it is doing what it is doing. Given sufficient time, I think I could base a cryptographically secure 2-headed coin toss on the question, "Will it try to sync via warp on this invocation after 5 minutes?"

How would I have Parity state things like:

  • Today I am syncing block by block because I cannot find a peer to warp sync with
  • Today I am warp syncing
  • I cannot warp sync today even though I know there are snapshots available because something happened yesterday and I can't, investigate the logs at...

The tool's complete lack of transparency as to why it can't sync is entirely frustrating.

@5chdn
Contributor

5chdn commented Jan 22, 2018

How is it that this is closed?

Warp-sync is hardly salvageable. I'm sorry.

The key issue is #7436, which prevents most nodes from generating snapshots. But even if we could fix that temporarily, it would only be an intermediate duct-tape solution until the state is too big again.

With 1.8.6, I can recommend resetting the DB and trying a full sync; for the long term, 1.10+ will stabilize the light client experience for different use-cases.
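
As commands, a hedged sketch of that recommendation (the db kill subcommand appears later in this thread; `--no-warp` disables snapshot sync; verify both against `parity --help`):

$ parity db kill --chain=foundation   # reset the Ethereum chain database
$ parity --no-warp                    # full sync from genesis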

@nodermann2

Ubuntu 16.04, Parity 1.8.11

Step 1 (important):
$ parity db kill --chain=foundation

Step 2:
$ parity --warp --no-ancient-blocks --no-serve-light --max-peers 250 --snapshot-peers 50 --min-peers 50 --mode active --tracing off --pruning fast --db-compaction ssd --cache-size 4096

Result: ~40 min to block 5,173,638

@ravensorb

So is this a "won't fix"? If so, does that mean that Parity is being retired? This is a show-stopper for anyone looking to set up a new node. I have been trying for almost 4 days and have now resorted to trying the "no warp" sync, which looks like it may take another 4 days. If so, that makes Parity pretty much useless in terms of bringing new nodes online.

@codewiz

codewiz commented Mar 26, 2018

With warp-sync no longer working and the light client not ready yet, Parity is completely unusable for me and, I guess, for anyone who can't afford to sync the full blockchain.

The suggested workaround of killing the db does the trick, but it has to be redone every time I open Parity, so it's also infeasible.

If warp mode can't be fixed, can any mention of it at least be removed from the end-user documentation, so people don't waste their time trying it?

@5chdn
Contributor

5chdn commented Mar 26, 2018

@ravensorb The issues have been identified and split into sub-tasks; you can find an overview here: https://wiki.parity.io/Known-Issues-Priorities

@codewiz Warp sync is not broken per se, and it works very well for any chain other than Ethereum. The current state size for Ethereum is a serious problem, and it is not a Parity issue in the first place.

@Tbaut
Contributor

Tbaut commented Mar 26, 2018

@ravensorb We will definitely attempt to fix it; this issue is a duplicate. See the list of known issues, where this is item 1.3.
Anyone looking to set up a new node in order to query the whole DB heavily implicitly needs a full node; these users will have to wait for a full sync, that much is clear. Anyone who just wants a wallet to sign a transaction now and then will be fine with a light node that syncs in a couple of seconds or minutes.

@codewiz Did you try the light client? If you can't afford to sync the chain, then you can't afford a full node, and --light is what you are looking for.
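
A minimal sketch of that suggestion, assuming a build where the light client is available (still experimental at this point in the thread):

$ parity --light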

@melnikaite

I tried the light client on 1.9.5-stable. Unfortunately it took 5 hours to get synced. The database size is just 2.5 MB, but the download speed is extremely slow.

@ravensorb

@5chdn @Tbaut Thank you for the update; it is much appreciated, and I am glad to see this is on the list of items to be fixed. While it is being worked on, are there any options for getting a full DB on a new node up and working?

@ravensorb

Also, if it helps: I think the warp sync issue also occurs when Parity maxes out the IO of the storage system (it's not just low memory).

@5chdn
Contributor

5chdn commented Mar 26, 2018

@melnikaite 5 hours is incredibly fast for verifying 5.3 million block headers :)

@ravensorb To get a full Parity DB synchronized, just leave your client running overnight, ideally on a machine with an SSD.

@melnikaite

@5chdn Is it possible to disable verifying?

@5chdn
Contributor

5chdn commented Mar 26, 2018

@melnikaite Yes, see #8075, coming in 1.11:

After this change, the light client synchronizes in 10 to 15 seconds even at the first launch on the main network.

@ravensorb

@5chdn I was afraid of that :) I am now going on 23 hours using the nightly Docker image (pulled just before I started the sync), and it is stuck at 90.90%.

I'll give it another few hours, and then if it doesn't make any progress I'll kill it, clear the db, and try again.

@5chdn
Contributor

5chdn commented Mar 26, 2018

No need to kill the DB; just leave it running. Otherwise it will start from scratch again.

@ravensorb

@5chdn I noticed :) It seems to take about 2 hours to get to block 4,880,040, and then it drops to syncing a block every 2 seconds. If my math is right, that means I am looking at almost 10 days to complete the last 446k blocks. Does that sound correct?

@5chdn
Contributor

5chdn commented Mar 27, 2018

Looks about right

@ravensorb

@5chdn All I can say is WOW. If you're curious, in the past 48 hours it has only progressed 1%. Are there any options to speed this up?

If it helps, here is the command I am using to launch the Docker container:

docker run -ti --name parity-nightly --restart unless-stopped --net=bridge -v /opt/parity/nightly:/opt/parity -p 8180:8180 -p 8545:8545 -p 8546:8546 -p 30303:30303 -p 30303:30303/udp parity/parity:nightly --ui-interface all --jsonrpc-interface all --jsonrpc-apis all --jsonrpc-hosts all --ui-hosts=all --ws-interface all --ui-no-validation --config /opt/parity/config.toml --no-ancient-blocks --no-serve-light --max-peers 250 --snapshot-peers 50 --min-peers 50 --mode active --pruning fast --cache-size 4096

@5chdn
Contributor

5chdn commented Mar 29, 2018

Increase the --cache-size as much as possible and get a decent SSD :)
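
For instance, a hedged sketch combining flags that appear elsewhere in this thread (tune the cache size to your available RAM):

$ parity --cache-size 12288 --db-compaction ssd --pruning fast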

@TomaszWaszczyk

@5chdn What --cache-size do you prefer?

@5chdn
Contributor

5chdn commented Apr 16, 2018

My favorite cache size is 12288 :)

@gotnull

gotnull commented May 25, 2018

My favorite cache size is 31337

@pro-respirator

This problem is clearly still occurring, but I've noticed that when it drops out of warp mode into normal syncing, I usually get the error:
"encountered error during state restoration. chunk size is too large"

@melnikaite

I hope this log will be useful for investigating this issue: https://pastebin.com/YN8sUs2t

@btaro001234567

Hey! I suffered an accident. Out of hospital now.
