
Add parallel reads to GetFullProof #239

Merged: 1 commit merged into Chia-Network:main on May 24, 2021

Conversation

@marcoabreu (Contributor) commented May 15, 2021

This PR adds parallel reads to retrieve the 64 values required in GetFullProof.

Previously, 64 sequential reads were executed. In the best case (a 5400 rpm disk at roughly 6 ms per access), that alone is 64 * 6 ms ≈ 384 ms of seeking. In practice you can expect far higher access latency across 64 queue_depth=1 calls, and with a NAS or similar non-local storage, where the access latency is higher still, read times could exceed a minute.

This now ensures that in the rare case somebody is eligible for a reward, they are actually able to claim it instead of the request silently timing out due to latency.

I'm aware that this approach may spawn quite a few threads and create a number of file handles, but they are short-lived. In the future this could be optimized with an explicit thread and file-handle pool. For now, this should already be a significant improvement.

Supported by: Chia-Network/chia-blockchain#5170
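
For readers skimming the thread, here is a minimal sketch of the shape of the change. The helper names (read_u64_at, read_line_point) and the on-disk layout are made up for illustration; this is not the actual DiskProver code. The point is that the two subtrees of each lookup are launched with std::async on their own short-lived handles, so slow reads overlap instead of queueing:

    #include <cstdint>
    #include <fstream>
    #include <future>
    #include <string>
    #include <utility>
    #include <vector>

    // Hypothetical helpers standing in for the real plot lookups. Each opens its
    // own short-lived handle, so concurrent calls never share a file position.
    static uint64_t read_u64_at(const std::string& plot, uint64_t offset) {
        std::ifstream f(plot, std::ios::binary);
        f.seekg(static_cast<std::streamoff>(offset));
        uint64_t v = 0;
        f.read(reinterpret_cast<char*>(&v), sizeof(v));
        return v;
    }

    static std::pair<uint64_t, uint64_t> read_line_point(const std::string& plot, uint64_t pos) {
        return {read_u64_at(plot, pos), read_u64_at(plot, pos + sizeof(uint64_t))};
    }

    // Simplified shape of the change (not the real DiskProver::GetInputs): recurse
    // down the proof tree, launching both subtrees with std::async so their reads
    // overlap. Wall time drops from ~64 dependent reads to roughly one per tree level.
    static std::vector<uint64_t> get_inputs(const std::string& plot, uint64_t pos, int depth) {
        if (depth == 0) return {read_u64_at(plot, pos)};
        auto [left_pos, right_pos] = read_line_point(plot, pos);
        auto left_fut  = std::async(std::launch::async, get_inputs, plot, left_pos, depth - 1);
        auto right_fut = std::async(std::launch::async, get_inputs, plot, right_pos, depth - 1);
        std::vector<uint64_t> out = left_fut.get();
        std::vector<uint64_t> right = right_fut.get();
        out.insert(out.end(), right.begin(), right.end());
        return out;
    }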

@marcoabreu force-pushed the marcoabreu/parallel-proof branch from e756f9a to 6cd64c4 on May 15, 2021 04:55
@coding-horror

Wow cool -- this is great!

@hoffmang9 (Member)

This is potentially a nice upgrade but it needs to be kept in perspective. It would only assist when you have more than one likely winning proof at the exact same signage point. Even on large farms - that is rare.

@no2chem commented May 15, 2021

@marcoabreu Great work. I actually implemented the exact same thing and was looking to submit it. Looks like you beat me to it. However, in my patch I did the work to switch the reads to pread(), which I think is better than opening multiple handles. I also perform the entire ReadLinePoint read in a single shot (you can determine the required buffer size at the beginning of the call).
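
For illustration only (this is not the fastRead branch, and read_at is a hypothetical helper): a pread()-style positional read lets all of the lookups share a single descriptor, because the offset travels with each call instead of living in the file handle:

    #include <cerrno>
    #include <cstddef>
    #include <cstdint>
    #include <stdexcept>
    #include <unistd.h>   // pread (POSIX)

    // Hypothetical helper: read `len` bytes at `offset` from an already-open fd.
    // pread() takes its own offset, so concurrent callers never fight over the
    // shared file position the way seekg()+read() on one handle would.
    static void read_at(int fd, uint64_t offset, void* buf, size_t len) {
        size_t done = 0;
        while (done < len) {
            ssize_t n = pread(fd, static_cast<char*>(buf) + done, len - done,
                              static_cast<off_t>(offset + done));
            if (n < 0) {
                if (errno == EINTR) continue;   // retry if interrupted by a signal
                throw std::runtime_error("pread failed");
            }
            if (n == 0) throw std::runtime_error("unexpected EOF");
            done += static_cast<size_t>(n);
        }
    }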

@hoffmang9 --- I think this is actually quite critical on network media with high tail latencies or frequent multi-second latency spikes.

In my interesting NAS setup where latencies average 30-50ms with 2s spikes, my changes seem to make the tests complete in 1/3 of the time - testing on a k18 plot:

        PlotAndTestProofOfSpace("cpp-test-plot.dat", 100, 18, plot_id_1, 11, 95, 4000, 2);
   Success: 95/100 95% (verification took 96.7014s)

versus

   Success: 95/100 95% (verification took 293.075s)

(I added the timing). For reference, it takes 0.84s on my macbook NVMe ssd, and adding async actually makes things slightly slower (1.1s).

For me at least, it seemed to eliminate spikes resulting in harvester warning messages.
My guess is that if a random read got stuck in a latency spike, async at least allows traversing other parts of the tree in parallel so the whole search doesn't get stuck.

@marcoabreu (Contributor, Author) commented May 15, 2021

Hi @no2chem, that sounds even better! I have to admit that I'm quite new to C++, hence I tried to just use some naive methods :)

In my tests, the time to execute a single full proof (qualities + 64 full proof values) went from 60 seconds on average down to 3 seconds - with my storage backend.

I'll have a look to see if I can implement your suggestion, thanks a lot!

@hoffmang9 this is not only useful in the scenario where you have multiple proofs (I agree, that's very unlikely). It is already useful for a single proof, since the waiting time drops to effectively 7x the read latency instead of 64x: the recursive lookups form a binary tree roughly seven levels deep, so once each level's reads run in parallel only about seven reads are serially dependent. That's already quite useful in environments where the average read latency is not great, but it really shines when read latencies are inconsistent, as they are in my setup.

@no2chem commented May 15, 2021

@marcoabreu I've made a branch here: https://github.com/no2chem/chiapos/tree/fastRead - feel free to take a look.

@fiveangle commented May 15, 2021

This is potentially a nice upgrade but it needs to be kept in perspective. It would only assist when you have more than one likely winning proof at the exact same signage point. Even on large farms - that is rare.

Gene: Is it that simple? With high-latency/high-bandwidth storage solutions like NAS, archive-optimized (aka wide-stripe) RAID, ZFS, etc., or in my case what turned out to be a flaky SAS expander I was unaware of, there seem to be plenty of pitfalls for farms to fall into unbeknownst to them. With my intermittent hardware, for example, every I/O latency went up dramatically, but transporting plots via netcat/tar (high queue depth) seemed entirely unaffected. Only when human eyes casually inspected the INFO-level logs with a keen eye (prior to the >5s warning) was the fully win-blocking issue observed, triggering my investigation.

There are countless everyday issues with storage that can increase latency, so any improvement that can eliminate extraneous software/hardware issues should probably be implemented. A farmer should win if they 1) have put the effort into creating the plots, 2) have the best proof online at the time, and 3) return it to the timelord before the SP expiration. The "enterprise levelness" of their storage should not dictate their chances of winning. If there's any opportunity to level the playing field for all farmers, we should probably try to implement it. It just seems like the right thing to do 🤷‍♂️

Aside: my average number of eligible proofs is 3.4, and according to Bram in your recent AMA, that is a "small farmer" 😉

@fiveangle commented May 15, 2021

My guess is that if a random read got stuck in a latency spike, async at least allows traversing other parts of the tree in parallel so the whole search doesn't get stuck.

This is exactly what happened to me for over a week and a half, with an estimated time-to-win of 18 hrs, while nothing stood out as a problem... then, only after not seeing any wins for so long, I found out a SAS expander was going flaky. I suspect this improvement would go very far toward mitigating these types of real-world latency-spiking issues. Thanks for your work on this for the benefit of everyone!

@hoffmang9 (Member)

All - don't worry - we're very likely to take it once we have had a chance to consider any cross-platform or other issues. I'm a touch shocked that so little reading over the network is this bad, and want to point out that this is partially why we recommend, and will be expanding support for, remote harvesters. Do seeks on the backplane and networking on the network...

@arvidn (Contributor) left a comment

I appreciate the simplicity of this patch, and agree that keeping it simple for now is a good approach.

I agree that a future improvement would be to use fewer file handles (technically, a single handle would be sufficient, but that would require pread(), and overlapped I/O on Windows).

My understanding is that GCC and Clang implement std::async by spawning a new thread for each invocation, whereas MSVC uses the Windows thread pool. I wouldn't expect spawning a thread to be expensive compared to the disk I/O (but it could be on very fast disks).

Which operating system and compiler/standard library did you test this with?
Do you have before and after timings?

The inline review comments below refer to this hunk from the diff:

    - std::vector<Bits> left = GetInputs(disk_file, xy.second, depth - 1); // y
    - std::vector<Bits> right = GetInputs(disk_file, xy.first, depth - 1); // x
    + auto left_fut=std::async(std::launch::async, &DiskProver::GetInputs,this, (uint64_t)xy.second, (uint8_t)(depth - 1));
    + auto right_fut=std::async(std::launch::async, &DiskProver::GetInputs,this, (uint64_t)xy.first, (uint8_t)(depth - 1));
Contributor:

is there a good reason to cast the depth argument to a uint8_t?

Contributor Author (@marcoabreu):

The compiler is very, very picky about the data types during template deduction. If you don't cast it, it throws an error that it cannot find a matching template, because it considers the calculation an int - it does not do any implicit conversion there, and I didn't want to change the argument types.

@marcoabreu (Contributor, Author)

Thanks everyone for your feedback! I have tested this under Ubuntu 20.04 as well as macOS. I will provide the exact versions and timings when I'm back at my computer.

@marcoabreu (Contributor, Author) commented May 15, 2021

I measured the impact with chia plots check -n 5, running it on my target setup on Ubuntu 20.04, compiled with gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0. I tested my changes using the test_k_21 Python test.

Before:

2021-05-14T22:12:11.656  chia.plotting.check_plots        : INFO     Testing plot /....../....plot k=32
2021-05-14T22:12:11.658  chia.plotting.check_plots        : INFO     	Pool public key: x
2021-05-14T22:12:11.667  chia.plotting.check_plots        : INFO     	Farmer public key: x
2021-05-14T22:12:11.667  chia.plotting.check_plots        : INFO     	Local sk: <PrivateKey x>
2021-05-14T22:13:12.005  chia.plotting.check_plots        : INFO     	Proofs 6 / 5, 1.2

Ranged from 60 up to 120 seconds. Since these are 5 full proofs, it's 12 to 24 seconds per full proof.

After:

2021-05-15T04:15:01.172  chia.plotting.check_plots        : INFO     Testing plot /....../....plot k=32
2021-05-15T04:15:01.173  chia.plotting.check_plots        : INFO        Pool public key: xxx
2021-05-15T04:15:01.188  chia.plotting.check_plots        : INFO        Farmer public key: xxx
2021-05-15T04:15:01.188  chia.plotting.check_plots        : INFO        Local sk: <PrivateKey xxx>
2021-05-15T04:15:26.213  chia.plotting.check_plots        : INFO        Proofs 6 / 5, 1.2
2021-05-15T04:15:26.213  chia.plotting.check_plots        : INFO     Testing plot /....../....plot k=32
2021-05-15T04:15:26.213  chia.plotting.check_plots        : INFO        Pool public key: x
2021-05-15T04:15:26.223  chia.plotting.check_plots        : INFO        Farmer public key: x
2021-05-15T04:15:26.223  chia.plotting.check_plots        : INFO        Local sk: <PrivateKey x>
2021-05-15T04:15:41.566  chia.plotting.check_plots        : INFO        Proofs 4 / 5, 0.8

Ranged from 12 up to 30 seconds. Since these are 5 full proofs, it's 2.4 to 6 seconds per full proof.

I also added a benchmark to the live system, with the following results:

2021-05-16T02:57:17.521 harvester chia.harvester.harvester: INFO     Executing full proof benchmark to simulate whether your storage is able to handle a full proof in the appropriate time.
2021-05-16T02:57:19.657 harvester chia.harvester.harvester: INFO     Benchmarking a full proof took 2.1360013484954834 seconds.

2021-05-16T03:55:20.770 harvester chia.harvester.harvester: INFO     Executing full proof benchmark to simulate whether your storage is able to handle a full proof in the appropriate time.
2021-05-16T03:55:23.460 harvester chia.harvester.harvester: INFO     Benchmarking a full proof took 2.689760446548462 seconds.

2021-05-16T10:42:18.742 harvester chia.harvester.harvester: INFO     Executing full proof benchmark to simulate whether your storage is able to handle a full proof in the appropriate time.
2021-05-16T10:42:21.077 harvester chia.harvester.harvester: INFO     Benchmarking a full proof took 2.334613084793091 seconds.

2021-05-16T08:46:10.967 harvester chia.harvester.harvester: INFO     Executing full proof benchmark to simulate whether your storage is able to handle a full proof in the appropriate time.
2021-05-16T08:46:13.558 harvester chia.harvester.harvester: INFO     Benchmarking a full proof took 2.5903327465057373 seconds.

Additionally, I tested on my Mac with Apple clang version 12.0.5 (clang-1205.0.22.9). I did not test on Windows.

@marcoabreu (Contributor, Author)

As pread is platform-specific and I'd like to keep this change simple and portable, I'd prefer to go ahead with this change and merge it as-is. Later, we could add OS-specific optimizations such as pread for POSIX systems and an equivalent method for Windows. For now, I think this implementation already brings a benefit that clearly outweighs the small overhead it creates.
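
To make that follow-up concrete, here is a rough sketch of what such an OS-specific positional read could look like later (assumed names, not part of this PR; the Windows path assumes a handle opened without FILE_FLAG_OVERLAPPED, so the call completes synchronously):

    #include <cstdint>
    #include <stdexcept>

    #ifdef _WIN32
    #include <windows.h>

    // Windows: ReadFile with an OVERLAPPED structure carries the offset per call.
    static void read_at(HANDLE file, uint64_t offset, void* buf, uint32_t len) {
        OVERLAPPED ov = {};
        ov.Offset = static_cast<DWORD>(offset & 0xFFFFFFFFu);
        ov.OffsetHigh = static_cast<DWORD>(offset >> 32);
        DWORD read_bytes = 0;
        if (!ReadFile(file, buf, len, &read_bytes, &ov) || read_bytes != len)
            throw std::runtime_error("ReadFile failed");
    }
    #else
    #include <unistd.h>

    // POSIX: pread() reads at an explicit offset without touching the shared file position.
    static void read_at(int fd, uint64_t offset, void* buf, uint32_t len) {
        if (pread(fd, buf, len, static_cast<off_t>(offset)) != static_cast<ssize_t>(len))
            throw std::runtime_error("pread failed");
    }
    #endif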

@no2chem commented May 16, 2021

At the risk of adding a distraction, the "right" way to do this is to use a cross-platform async library like libuv, which has the proper cross-platform and async abstractions.
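
Not advocating it for this PR, but for reference, a minimal libuv sketch of a positional read (synchronous here because the callback is NULL; assumes libuv >= 1.16 for the UV_FS_O_* flags, and the path is a placeholder):

    #include <uv.h>
    #include <cstdio>

    int main() {
        uv_loop_t* loop = uv_default_loop();

        // Open the plot file (placeholder name). A NULL callback makes the call synchronous.
        uv_fs_t open_req;
        uv_fs_open(loop, &open_req, "plot.dat", UV_FS_O_RDONLY, 0, NULL);
        int fd = static_cast<int>(open_req.result);
        uv_fs_req_cleanup(&open_req);
        if (fd < 0) { std::fprintf(stderr, "open: %s\n", uv_strerror(fd)); return 1; }

        // Positional read: libuv routes this to pread() on POSIX and an
        // offset-based read on Windows, so no explicit seek is needed.
        char buf[4096];
        uv_buf_t iov = uv_buf_init(buf, sizeof(buf));
        uv_fs_t read_req;
        uv_fs_read(loop, &read_req, fd, &iov, 1, /* offset */ 0, NULL);
        std::printf("read %zd bytes\n", read_req.result);
        uv_fs_req_cleanup(&read_req);

        uv_fs_t close_req;
        uv_fs_close(loop, &close_req, fd, NULL);
        uv_fs_req_cleanup(&close_req);
        return 0;
    }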

@hoffmang9 (Member)

We are biased against adding dependencies, as we then inherit their security risks - but that doesn't completely rule them out.

@marcoabreu (Contributor, Author)

@hoffmang9 is there anything you'd like me to change or improve in the meantime or are we fine as long as the tests don't bring up anything?

@arvidn (Contributor) left a comment

I think this patch looks good to land.

It highlights three minor issues in the code today:

  1. GetInputs() is a member function, even though the only member it uses is k. It might as well take k instead of this.
  2. GetInputs() is not const qualified (suggesting it's not thread safe with regards to this), but it does not mutate the object state, so it should be const.
  3. SafeRead() and SafeSeek() are both static members, despite not needing access to any private members of the class. These should be made free functions.

I'll address these issues later (unless someone beats me to it)
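
A rough sketch of what those signatures might end up looking like (hypothetical parameter lists, not the actual follow-up patch):

    #include <cstdint>
    #include <fstream>
    #include <string>
    #include <vector>

    struct Bits {};  // stand-in for the real chiapos Bits type

    // 1./2. GetInputs as a free function that takes k explicitly; it needs no
    //       DiskProver state, which makes the thread-safety question go away.
    std::vector<Bits> GetInputs(const std::string& filename, uint8_t k,
                                uint64_t position, uint8_t depth);

    // 3. SafeRead()/SafeSeek() as free functions; they only touch the stream passed in.
    void SafeRead(std::ifstream& disk_file, uint8_t* target, uint64_t size);
    void SafeSeek(std::ifstream& disk_file, uint64_t position);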

@rodrigo081089

Sorry for the stupid question. Can we change the code on our own setup, or should we wait until an official update happens? Because I have the 30-second problem.

@NPutting

Is there an easy way to test this out with chia-blockchain? I'm able to compile it but am lacking the knowledge/documentation to figure out how to install it.

@arvidn (Contributor) commented May 18, 2021

@NPutting in your python venv, run:

pip3 install .

@mariano54 (Contributor)

I have wanted to add this for a long time, to help NAS users and remote network users. It's awesome that you got it working with such small changes!

@rodrigo081089

How can we simple mortals use it?

@NPutting

How can we simple mortals use it?

If you install the latest 1.1.6 version of chia-blockchain, then you can also clone this branch and run pip3 install . inside of it.

@rodrigo081089

Sorry to continue asking: if I'm using Windows, can I use it?

@marcoabreu (Contributor, Author)

@hoffmang9 how do we want to move forward here?

@hoffmang9 (Member)

This will likely get into 1.1.7.

@wjblanke (Contributor) left a comment

aok

@wjblanke merged commit 3feab20 into Chia-Network:main on May 24, 2021
gallexme added a commit to gallexme/chiapos that referenced this pull request May 30, 2021
@long-pham

Does anyone know if it's safe to run chiapos==1.0.3 with chia-blockchain==1.1.7?

@fiveangle

The devs are doing it on testnet according to their commits, but I personally couldn't get chiapos 1.0.3 to build on my system, so good luck!

@reb0rn21

When the harvester searches for a proof, it opens/closes the plot file many times for each eligibility check... for a cloud storage service this is very bad. Can we have a config option to mark plots as cloud-hosted, and an option to open the file just once and close it at the end of the search?
