This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Ethstore optimizations #6827

Merged
merged 1 commit into master from ethstore-optimize on Oct 20, 2017

Conversation

arkpar
Collaborator

@arkpar arkpar commented Oct 19, 2017

Closes #6456, #6787

@arkpar arkpar added A0-pleasereview 🤓 Pull request needs code review. B0-patch M4-core ⛓ Core client code / Rust. labels Oct 19, 2017
Collaborator

@tomusdrw tomusdrw left a comment


LGTM, minor nitpicks


fn last_modification_date(&self) -> Result<u64, Error> {
use std::time::UNIX_EPOCH;
let duration = fs::metadata(&self.path)?.modified()?.duration_since(UNIX_EPOCH).unwrap();
Collaborator


expect?
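
For reference, a minimal sketch of what the suggested change could look like, based on the excerpt above; the expect message and the final conversion to seconds are assumptions for illustration, not the PR's actual code:

fn last_modification_date(&self) -> Result<u64, Error> {
    use std::time::UNIX_EPOCH;
    // expect() documents why the panic is considered unreachable,
    // instead of a bare unwrap() with no context.
    let duration = fs::metadata(&self.path)?
        .modified()?
        .duration_since(UNIX_EPOCH)
        .expect("file modification time is after UNIX_EPOCH");
    Ok(duration.as_secs())
}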

return Ok(())
let mut last_timestamp = self.timestamp.lock();
let now = Instant::now();
if (now - last_timestamp.last_checked) > Duration::from_secs(REFRESH_TIME_SEC) {
Collaborator


I would rather store last_timestamp.valid_till to avoid subtraction (and possible overflow in case the time is adjusted by the system):

if last_timestamp.valid_till < now {

}

Collaborator Author


Instant is monotonic. It is not affected by clock changes and is guaranteed never to decrease.

Collaborator


Fair enough.
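
For illustration, a self-contained sketch of the two equivalent checks discussed here; the struct and field names are assumptions modelled on the excerpt, not the PR's actual types:

use std::time::{Duration, Instant};

const REFRESH_TIME_SEC: u64 = 5; // assumed value for the sketch

struct Timestamp {
    last_checked: Instant,
}

// The form used in the excerpt: subtract and compare. The subtraction cannot
// underflow in practice because Instant is monotonic, so `now` is never
// earlier than a previously captured `last_checked`.
fn needs_refresh(ts: &Timestamp, now: Instant) -> bool {
    (now - ts.last_checked) > Duration::from_secs(REFRESH_TIME_SEC)
}

struct Deadline {
    valid_till: Instant,
}

// The alternative suggested in review: store the deadline up front and
// compare directly, avoiding the subtraction altogether.
fn needs_refresh_deadline(d: &Deadline, now: Instant) -> bool {
    d.valid_till < now
}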

.cloned()
.ok_or(Error::InvalidAccount)
let cache = self.cache.read();
let mut r = cache.range((::std::collections::Bound::Included(*address), ::std::collections::Bound::Included(*address)));
Collaborator


use ::std::collections::Bound at the top of the function?
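
A sketch of the suggested tidy-up, with the import hoisted to the top of the function; the cache type and the surrounding signature are assumptions for illustration (the real code operates on the account cache):

fn first_cached<A, V>(cache: &std::collections::BTreeMap<A, V>, address: &A) -> Option<V>
where
    A: Ord + Copy,
    V: Clone,
{
    // Importing Bound once at the top keeps the range expression readable.
    use std::collections::Bound;
    let mut r = cache.range((Bound::Included(*address), Bound::Included(*address)));
    r.next().map(|(_, v)| v.clone())
}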

@5chdn
Contributor

5chdn commented Oct 19, 2017

Is this related to #6456 and #6787?

@arkpar
Collaborator Author

arkpar commented Oct 19, 2017

Right, closes #6456, #6787

@AdvancedStyle

This fixes issue #6787, and peers are now being found and kept.
However, Parity is using a massive amount of disk read IO, around 100% disk read utilization all the time (using the latest master branch plus this commit).

let mut last_timestamp = self.timestamp.lock();
let now = Instant::now();
if (now - last_timestamp.last_checked) > Duration::from_secs(REFRESH_TIME_SEC) {
let dir_hash = Some(self.dir.unique_repr()?);


This appears to be creating a disk IO read problem, because every 5 seconds you end up looping over and hashing all the files in the keystore. With a large number of files (in my case 150,000), it ends up causing a total server crash.

Is there a better way to detect changes in the keystore dir? Possibly only detect a change when a new key is added via personal_newAccount... calls, or use a simple file count?

Collaborator


unique_repr is changed in this PR to return the directory's last modification date, so there is no more hashing here.
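
A minimal sketch of the idea, assuming the keystore directory path is at hand; the PR's actual unique_repr implementation may differ in its details:

use std::fs;
use std::io;
use std::path::Path;
use std::time::UNIX_EPOCH;

// A single metadata call on the directory itself, so the periodic refresh
// no longer opens or hashes every key file in the keystore.
fn dir_modification_timestamp(path: &Path) -> io::Result<u64> {
    let modified = fs::metadata(path)?.modified()?;
    let secs = modified
        .duration_since(UNIX_EPOCH)
        .map(|d| d.as_secs())
        .unwrap_or(0); // a modification time before UNIX_EPOCH maps to 0
    Ok(secs)
}

On most filesystems a directory's mtime changes whenever an entry is added, removed, or renamed, which is exactly the event the refresh check needs to detect.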

Contributor


@AdvancedStyle are you saying your server crashes with the changes from this PR?


@AdvancedStyle AdvancedStyle Oct 20, 2017


Yes, doing more testing and will provide more information shortly.

Basically, once I compile and run latest master with this PR, it fixes the peer connections but then crashes the server (with a noticeably huge amount of disk reading).

But I'll do a bit more testing and see if I can gather some actual error messages from the server crash.


@AdvancedStyle AdvancedStyle Oct 20, 2017


OK, I've been using iostat / iotop / tracefile (https://gitlab.com/ole.tange/tangetools/tree/master/tracefile) to monitor disk IO, and here are the results:

Using latest master, the iostat utilization is quite low (less than 5%); however, the tracefile output shows that Parity is constantly accessing the keystore files. That is not causing high disk utilization or a server crash, only causing peers to drop/time out and syncing to stop.

After PR #6827, the iostat utilization goes through the roof (50-100% disk utilization all the time):

iostat output sample:

avg-cpu: %user %nice %system %iowait %steal %idle
23.14 0.00 20.48 13.56 0.27 42.55

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 0.00 5339.00 0.00 49176.00 0.00 18.42 0.75 0.14 0.14 0.00 0.09 49.60

avg-cpu: %user %nice %system %iowait %steal %idle
20.21 0.00 17.32 14.70 0.26 47.51

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 17.00 5192.00 3.00 46776.00 80.00 18.04 0.74 0.14 0.14 2.67 0.11 55.60

avg-cpu: %user %nice %system %iowait %steal %idle
25.07 0.00 15.04 10.55 0.53 48.81

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 28.00 4083.00 2.00 38940.00 120.00 19.12 0.60 0.15 0.15 2.00 0.10 42.40

avg-cpu: %user %nice %system %iowait %steal %idle
22.13 0.00 18.67 15.73 0.27 43.20

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 10.00 3744.00 4.00 32000.00 56.00 17.11 0.83 0.22 0.21 1.00 0.17 64.00

avg-cpu: %user %nice %system %iowait %steal %idle
24.42 0.00 15.32 32.21 0.26 27.79

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
vda 0.00 13.00 2987.00 4.00 28780.00 68.00 19.29 2.07 0.70 0.70 1.00 0.27 80.80

But the strange thing is that tracefile (the strace script) is not really seeing a lot of file access by the process (Parity is no longer accessing the keystore files individually); it only appears to be accessing the keystore dir. So why all the disk IO?

Sample of tracefile output:

/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484066.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484067.log
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484068.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484066.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484068.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484069.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484069.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484066.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484064.sst
/home/ethereum/.ethereum/keystore/ethereum
/home/ethereum/.ethereum/keystore/ethereum
/home/ethereum/.ethereum/keystore/ethereum
/home/ethereum/.ethereum/keystore/ethereum
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484070.log
/sys/devices/system/cpu/online
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484071.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484071.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484072.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484072.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484071.sst
/home/ethereum/.local/share/io.parity.ethereum/chains/ethereum/db/906a34e69aec8c0d/overlayrecent/db/484068.sst
/home/ethereum/.ethereum/keystore/ethereum
/home/ethereum/.ethereum/keystore/ethereum

The VM then hangs/crashes after a few minutes with hung_task_timeout_secs messages:

INFO: task kworker/[...] blocked for more than 120 seconds.
 Not tainted [...]
 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

And a bunch of similar hung tasks.

Conclusion:
For some reason, the modified-time check on the keystore dir might be creating a lot of read IO?


@AdvancedStyle AdvancedStyle Oct 20, 2017


I mean, at first I was testing with master (no IO read problem / no crash).

Then I compiled with this PR #6827 (git fetch origin pull/6827/head:sometest && git checkout sometest, then compiled).

The IO read problem only occurred after compiling with PR #6827.

Collaborator Author


The IO must be just the database IO. Does the machine have HDD or SSD? How exactly does it crash? Is there a backtrace printed?


@AdvancedStyle AdvancedStyle Oct 20, 2017


SSD. Server running on a KVM virtual machine (Debian host, Ubuntu 16 guest).

The entire virtual machine just locks up; SSH and the virtmanager console become non-responsive, then it dumps a series of error messages in the virtmanager console similar to:

INFO: task kworker/[...] blocked for more than 120 seconds.
Not tainted [...]
kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.

Note: I'm running Parity with these parameters:

screen -d -m parity --cache-size 4096 --allow-ips public --jsonrpc-apis web3,eth,net,parity,traces,rpc,personal --geth --keys-path /home/ethereum/.ethereum/keystore


I've decided to clone the VM and try to delete and resync the chain from 0 to see if that helps. So far it seems to be doing OK (no IO read problems so far); will post back shortly.


I have resynced the chain from 0 using PR #6827 and now disk IO is normal again and all is working.

So everything is fine now (fingers crossed).

The issue of slow key generation (#6839) is also fixed.

@svyatonik svyatonik added A8-looksgood 🦄 Pull request is reviewed well. and removed A0-pleasereview 🤓 Pull request needs code review. labels Oct 20, 2017
@arkpar arkpar merged commit 097815c into master Oct 20, 2017
@arkpar arkpar deleted the ethstore-optimize branch October 20, 2017 18:20
arkpar added a commit that referenced this pull request Oct 20, 2017
arkpar added a commit that referenced this pull request Oct 20, 2017
@AdvancedStyle AdvancedStyle mentioned this pull request Oct 21, 2017
lght pushed a commit to lght/parity that referenced this pull request Oct 25, 2017
@5chdn 5chdn added this to the 1.8 milestone Dec 8, 2017
tomusdrw pushed a commit that referenced this pull request Dec 20, 2017
5chdn pushed a commit that referenced this pull request Dec 27, 2017
Labels
A8-looksgood 🦄 Pull request is reviewed well. M4-core ⛓ Core client code / Rust.