Add parallel reads to GetFullProof #239
Conversation
Wow cool -- this is great!
This is potentially a nice upgrade but it needs to be kept in perspective. It would only assist when you have more than one likely winning proof at the exact same signage point. Even on large farms - that is rare.
@marcoabreu Great work. I actually implemented the exact same thing and was looking to submit it. Looks like you beat me to it. However, in my patch, I did the work to switch the reads to pread(), which I think is better than opening multiple handles. I also perform the entire read in ReadLinePoint in a single shot (you can determine the size of the buffer needed at the beginning of the call). @hoffmang9 --- I think this is actually quite critical on network media with high tail latencies or frequent multi-second latency spikes. In my interesting NAS setup, where latencies average 30-50ms with 2s spikes, my changes seem to make the tests complete in 1/3 of the time - testing on a k18 plot:
versus
(I added the timing). For reference, it takes 0.84s on my MacBook NVMe SSD, and adding async actually makes things slightly slower (1.1s). For me at least, it seemed to eliminate the spikes that were resulting in harvester warning messages.
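For illustration only, here is a minimal POSIX sketch of the single-handle pread() approach described above; `ReadAt` and its error handling are my own stand-ins, not code from the fastRead branch:

```cpp
// Minimal POSIX-only sketch of the single-handle idea: one shared file
// descriptor, with each read issued by pread() at an absolute offset, so
// no seek position is shared between threads. ReadAt is a hypothetical
// helper, not the actual chiapos code.
#include <fcntl.h>
#include <unistd.h>

#include <cstdint>
#include <stdexcept>
#include <vector>

std::vector<uint8_t> ReadAt(int fd, uint64_t offset, size_t size) {
    std::vector<uint8_t> buf(size);
    size_t done = 0;
    while (done < size) {  // pread may return short reads; loop until full
        ssize_t n = pread(fd, buf.data() + done, size - done,
                          static_cast<off_t>(offset + done));
        if (n <= 0) throw std::runtime_error("pread failed");
        done += static_cast<size_t>(n);
    }
    return buf;
}
```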
Hi @no2chem, that sounds even better! I have to admit that I'm quite new to C++, hence I tried to just use some naive methods :) In my tests, the time to execute a single full proof (qualities + 64 full proof values) went from 60 seconds on average down to 3 seconds - with my storage backend. I'll have a look to see if I can implement your suggestion, thanks a lot! @hoffmang9 this is not only useful in the scenario where you have multiple proofs (I agree, that's very unlikely). Rather, it is already useful for a single proof, since you reduce the waiting time to effectively 7xReadLatency as opposed to 64xReadLatency. That's already quite useful for environments where the average read latency is not that good, but it excels when you have inconsistent read latencies like I have in my setup.
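For the record, a rough latency model consistent with those 7x/64x figures (my own sketch; L denotes the average read latency):

```latex
% Sequential: all 64 leaf reads sit on the critical path.
T_{\mathrm{seq}} \approx 64 \cdot L
% Parallel: the proof recursion is a binary tree with 2^6 = 64 leaves;
% fetching both subtrees concurrently leaves only the tree depth
% (6 levels, plus the initial line-point lookup) on the critical path.
T_{\mathrm{par}} \approx (\log_2 64 + 1) \cdot L = 7 \cdot L
```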
@marcoabreu I've made a branch here https://github.com/no2chem/chiapos/tree/fastRead - feel free to take a look.
Gene: Is it that simple? In the case of high-latency/high-bandwidth storage solutions like NAS, archive-optimized (aka wide-stripe) RAID, ZFS, etc., or in my case, what turned out to be a flaky SAS expander that I was unaware of, there seem to be plenty of pitfalls for farms to fall into unbeknownst to them. With my intermittent hardware, for example, every IO latency went up dramatically, but transporting plots via netcat/tar (high queue depth) seemed entirely unaffected. Only when human eyes casually inspected the INFO-level logs with a keen eye (prior to the >5s warning) was the fully win-blocking issue observed, triggering my investigation.

There are countless everyday issues that can occur with storage and increase latency, so any improvement that can eliminate extraneous software/hardware issues should probably be implemented. A farmer should win if they 1) have put the effort into creating the plots, 2) have the best proof online at the time, and 3) return it to the timelord before the SP expiration. The "enterprise levelness" of their storage should not dictate their chances of winning. If there's any opportunity to level the playing field for all farmers, we should probably try to implement it. It just seems like the right thing to do 🤷♂️

Aside: my average eligible proofs is 3.4, and according to Bram in your recent AMA, that is a "small farmer" 😉
This is exactly what happened to me for over a week and a half, with estimated time-to-win at 18hrs, while nothing stood out as a problem... only after not seeing any wins for so long did I find out a SAS expander was going flaky. I suspect this improvement would go very far toward helping mitigate these types of real-world latency-spiking issues. Thanks for your work on this for the benefit of everyone!
All - don't worry - we're very likely to take it once we have had a chance to consider any cross-platform or other issues. I'm a touch shocked that such little reading over the network is this bad, and wish to point out that this is partially why we point at - and will be expanding support for - remote harvesters. Do seeks on the backplane and network on the network...
I appreciate the simplicity of this patch, and agree that keeping it simple for now is a good approach.
I agree that future improvements would be using fewer file handles (technically, a single handle would be sufficient, but that would require pread() on POSIX and overlapped I/O on Windows).

My understanding is that GCC and clang implement std::async by spawning a new thread for each invocation, whereas Windows uses its thread pool. I wouldn't expect spawning a thread to be expensive compared to the disk I/O (but it could be on very fast disks).
Which operating system and compiler/standard library did you test this with?
Do you have before and after timings?
```diff
-std::vector<Bits> left = GetInputs(disk_file, xy.second, depth - 1); // y
-std::vector<Bits> right = GetInputs(disk_file, xy.first, depth - 1); // x
+auto left_fut = std::async(std::launch::async, &DiskProver::GetInputs, this, (uint64_t)xy.second, (uint8_t)(depth - 1));
+auto right_fut = std::async(std::launch::async, &DiskProver::GetInputs, this, (uint64_t)xy.first, (uint8_t)(depth - 1));
```
Is there a good reason to cast the `depth` argument to a `uint8_t`?
The compiler is very, very picky about the data types during templating. If you don't cast, it will throw an error that it cannot find a matching template, as it considers the calculation an int - it does not do any implicit conversions, and I didn't want to change the argument types.
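To make the pattern concrete, here is a minimal, self-contained illustration; `DiskProver` and the `GetInputs` signature are simplified stand-ins for the real chiapos code, and the casts simply pin the argument types to the declared parameter types:

```cpp
#include <cstdint>
#include <future>
#include <utility>
#include <vector>

// Simplified stand-in for the real class; the actual chiapos signature
// may differ.
struct DiskProver {
    std::vector<uint64_t> GetInputs(uint64_t index, uint8_t depth) {
        return {index, depth};  // placeholder body
    }
};

void Example(DiskProver* prover, std::pair<uint64_t, uint64_t> xy,
             uint8_t depth) {
    // depth - 1 is promoted to int; the explicit casts make the argument
    // types match GetInputs' declared parameters exactly.
    auto fut = std::async(std::launch::async, &DiskProver::GetInputs, prover,
                          (uint64_t)xy.second, (uint8_t)(depth - 1));
    std::vector<uint64_t> result = fut.get();
    (void)result;
}
```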
Thanks everyone for your feedback! I have tested this under Ubuntu 20 as well as macOS. I will provide the exact versions and timings when I'm back at my computer.
I measured the impact as follows.

Before: ranged from 60 up to 120 seconds. Since these are 5 full proofs, that is 12 to 24 seconds per full proof.

After: ranged from 12 up to 30 seconds. Since these are 5 full proofs, that is 2.4 to 6 seconds per full proof.

I also added a benchmark to the live system with the following results:

Additionally, I tested on my Mac with:
At the risk of adding a distraction, the "right" way to do this is to use a cross-platform async library like libuv, which has the proper cross-platform and async abstractions.
We are biased against adding dependencies, as we then inherit their security risks - but that doesn't completely rule them out.
@hoffmang9 is there anything you'd like me to change or improve in the meantime, or are we fine as long as the tests don't bring up anything?
I think this patch looks good to land.
It highlights three minor issues in the code today:

1. `GetInputs()` is a member function, even though the only member it uses is `k`. It might as well take `k` instead of `this`.
2. `GetInputs()` is not `const` qualified (suggesting it's not thread safe with regard to `this`), but it does not mutate the object state, so it should be `const`.
3. `SafeRead()` and `SafeSeek()` are both static members, despite not needing access to any private members of the class. These should be made free functions.
I'll address these issues later (unless someone beats me to it)
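For reference, a hedged sketch of what those cleanups might look like; the signatures below are illustrative, not the actual chiapos declarations:

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// 1) + 2) GetInputs could take k explicitly and stop being a member, or
//         stay a member but const-qualified, e.g.:
//   std::vector<Bits> DiskProver::GetInputs(uint64_t pos, uint8_t depth) const;

// 3) SafeRead()/SafeSeek() touch no private state, so they can become
//    free functions (illustrative signatures):
void SafeSeek(std::FILE* file, uint64_t offset);
void SafeRead(std::FILE* file, uint8_t* buffer, uint64_t size);
```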
Sorry for the stupid question. Can we change the code in the config, or should we wait until an official update happens? Because I have the 30 seconds problem.
Is there an easy way to test this out with chia-blockchain? I'm able to compile it but am lacking the knowledge/documentation to figure out how to install it.
@NPutting in your python venv, run:
I have wanted to add this for a long time, to help NAS users and remote network users. It's awesome that you got it working with such small changes!
How can we simple mortals use it?
If you install the latest 1.1.6 version of chia-network then you can also clone this branch and do
Sorry to continue asking: if I'm using Windows, can I use it?
@hoffmang9 how do we want to move forward here?
This will likely get into 1.1.7.
aok
Does anyone know if it's safe to run chiapos==1.0.3 with chia-blockchain==1.1.7?
Devs are doing it on testnet according to their commits, but I personally couldn't get chiapos 1.0.3 to build on my system, so good luck!
When the harvester searches for a proof, it opens/closes the plot file many times for each eligibility case... for a cloud storage service this is very bad. Can we have a config option to open the file just once and close it at the end of the search?
This PR adds parallel reads to retrieve the 64 values required in `GetFullProof`.

Previously, 64 sequential reads were executed. In the best case (5400 rpm), this would take roughly 64 × 6 ms ≈ 384 ms just for the seeking alone. But as we know, you can expect a way higher access latency over 64 queue_depth=1 calls. Especially in cases where a NAS or similar non-local storage is used and the access latency is therefore higher, you could see read times over a minute.
This now ensures that in the rare case somebody is eligible for a reward, they are actually able to claim it, instead of the request silently timing out due to latency issues.
I'm aware that this approach might spawn quite a few threads and create a bunch of file handles - but they are quite short-lived. In the future, this could be optimized with an explicit thread and file-handle pool. For now, this should already improve things a lot.
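As a self-contained illustration of the fan-out/join shape this PR relies on (the patch itself parallelizes the recursion pairwise via std::async, as in the diff above; `ReadLeaf` here is a hypothetical stand-in for a plot lookup):

```cpp
#include <cstdint>
#include <future>
#include <vector>

// Hypothetical stand-in for one leaf read; in chiapos this would be a
// DiskProver lookup hitting the plot file.
std::vector<uint8_t> ReadLeaf(uint64_t position);

std::vector<std::vector<uint8_t>> ReadAllLeaves(
    const std::vector<uint64_t>& positions) {
    // Fan out: launch one async task per leaf read...
    std::vector<std::future<std::vector<uint8_t>>> futures;
    futures.reserve(positions.size());
    for (uint64_t pos : positions) {
        futures.push_back(std::async(std::launch::async, ReadLeaf, pos));
    }
    // ...then join; total wall time is bounded by the slowest single read,
    // not the sum of all 64 latencies.
    std::vector<std::vector<uint8_t>> results;
    results.reserve(futures.size());
    for (auto& f : futures) results.push_back(f.get());
    return results;
}
```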
Supported by: Chia-Network/chia-blockchain#5170