Nothing tells you when you're timing out on 30 second harvester proof checks #3188

coding-horror · 2021-04-25T01:44:54Z

The problem

I thought I was farming, but I wasn't -- because something about my network caused the proof check to take more than the hard-coded 30 second limit.

I had an average time to win of 8 or 9 hours for more than 120 hours without a single win. This seemed statistically implausible, so I researched the logs, and cleared any errors or warnings in the logs (well done, all the warnings and errors in debug.log were indeed things I should fix!). Still no wins for a long time.

How to reproduce

Have a bunch of plots on slow storage media; when the proof check happens, verifying the proofs takes longer than the hard-coded 30 seconds allowed. You will never win a single Chia, but there's absolutely nothing in the GUI to inform you that this is happening. You can view the logs, but in the logs it is not even presented as a warning (!), but as an INFO message:

2021-04-24T16:03:29.039 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.74567 s. Total 3922 plots
2021-04-24T16:03:29.433 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.92976 s. Total 3922 plots
2021-04-24T16:03:29.635 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.01540 s. Total 3922 plots
2021-04-24T16:03:44.553 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.11716 s. Total 3922 plots
2021-04-24T16:04:10.882 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.51660 s. Total 3922 plots
2021-04-24T16:04:36.101 harvester chia.harvester.harvester: INFO     9 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 37.96922 s. Total 3922 plots
2021-04-24T16:05:13.959 harvester chia.harvester.harvester: INFO     11 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 63.07326 s. Total 3922 plots
2021-04-24T16:05:13.959 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 50.81979 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 54.92095 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 42.29693 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 29.67779 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 18.59361 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 6.10934 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     10 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.17564 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     13 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.82808 s. Total 3922 plots
2021-04-24T16:05:35.600 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.82815 s. Total 3922 plots
2021-04-24T16:05:35.600 harvester chia.harvester.harvester: INFO     12 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.56252 s. Total 3922 plots
2021-04-24T16:05:58.022 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.26593 s. Total 3922 plots
2021-04-24T16:06:11.887 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.42098 s. Total 3922 plots
2021-04-24T16:06:36.417 harvester chia.harvester.harvester: INFO     3 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.36354 s. Total 3922 plots

Of the above, the proofs that take longer than 30 seconds are not eligible to win, but this is not logged as an ERROR or WARNING or surfaced in the UI in any way.
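Until this is surfaced somewhere better, the INFO lines can be scanned by hand. Here is a minimal sketch, assuming only the line format shown in the excerpt above; the function name `slow_lookups` and the regular expression are made up for illustration and are not part of Chia.

```python
import re

# Matches the harvester INFO line format shown in the excerpt above.
LINE_RE = re.compile(
    r"^(?P<ts>\S+) harvester \S+: INFO\s+(?P<eligible>\d+) plots were eligible"
    r".*?Time: (?P<secs>\d+\.\d+) s\."
)


def slow_lookups(lines, limit=30.0):
    """Return (timestamp, seconds) for every lookup slower than `limit` seconds."""
    hits = []
    for line in lines:
        m = LINE_RE.search(line)
        if m and float(m.group("secs")) > limit:
            hits.append((m.group("ts"), float(m.group("secs"))))
    return hits


# Three lines copied from the excerpt above.
sample = [
    "2021-04-24T16:04:10.882 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.51660 s. Total 3922 plots",
    "2021-04-24T16:04:36.101 harvester chia.harvester.harvester: INFO     9 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 37.96922 s. Total 3922 plots",
    "2021-04-24T16:05:13.959 harvester chia.harvester.harvester: INFO     11 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 63.07326 s. Total 3922 plots",
]

for ts, secs in slow_lookups(sample):
    print(f"{ts}  {secs:.2f} s  (over the 30 s limit)")
```

To run it against a real log, pass an open file instead of `sample` (for example `slow_lookups(open(path_to_debug_log))`, where `path_to_debug_log` is wherever your install writes debug.log), with INFO level logging enabled.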

Expected behavior

The GUI will tell you "hey, your proof checks are too slow, there's absolutely no chance for you to win, even if you are farming infinity plots"

Desktop

OS: Windows 10
OS Version/Flavor: latest 19042.928
CPU: i7-1165G7

Additional context

I followed up on the #support channel in Keybase, where I got the important advice to enable INFO level logging and check for the 30 second proof limit.. and I wrote up a detailed account on the forum; if you need excruciating levels of detail, please check there 🙇‍♂️

Recommended solution

Allow the 30 second proof check threshold to be configurable to account for slower storage (or relax it a bit)
Change the log level to WARN or ERROR for harvester proof checks that take longer than 30 seconds
In the logs, provide details on which files are being checked for proofs, so you can identify which media or NAS is the problem one slowing everything down

maomingyang1314 · 2021-04-25T02:50:34Z

Also, how can your laptop mount nearly 400 TB of disks? Are your disks on a NAS?

maomingyang1314 · 2021-04-25T02:53:12Z

Your NAS caused your task to time out. It is recommended to abandon this method.

AZ663 · 2021-04-25T03:46:22Z

At what speed does this start to have an effect?

mariano54 · 2021-04-25T07:46:59Z

We can't change the consensus algorithm anymore, but we can change the log level, and show the files.

Please note that looking up qualities for plots passing the filter requires about 7 random reads in a plot, whereas actually looking up a proof requires 64 reads. It might not be feasible on a slow NAS, since these are sequential reads. Furthermore, you need to take into account network latency to propagate your proof and block to the network, so you should be under 5 seconds to reduce risk of losing rewards.

Actually, the proof of space library does them sequentially, but they could be done in parallel, since it's a tree, so you could do 1 read, then 2, then 4, .. etc, for a total of around 7 sequential phases (one for each table in the plot). We haven't got around to doing this yet.
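To get a feel for what that would buy, here is a rough back-of-the-envelope model. It is only a sketch: it takes the 64 reads and roughly 7 phases from the comment above, assumes every read costs one full seek, and assumes the storage can serve all reads of one phase concurrently in about one seek time. The latencies are hypothetical, and none of this is the chiapos implementation.

```python
def sequential_seconds(reads, seek_seconds):
    """Total time if every read waits for the previous one to finish."""
    return reads * seek_seconds


def phased_seconds(phases, seek_seconds):
    """Total time if all reads within one phase are issued at once."""
    return phases * seek_seconds


PROOF_READS = 64   # reads for a full proof, from the comment above
TREE_PHASES = 7    # "one for each table in the plot"

# Hypothetical per-read latencies in seconds.
for label, seek in [("local HDD", 0.010), ("slow NAS", 0.400)]:
    seq = sequential_seconds(PROOF_READS, seek)
    par = phased_seconds(TREE_PHASES, seek)
    print(f"{label}: sequential {seq:.2f} s, phased {par:.2f} s")
```

The phases cannot collapse into a single step, because each level of the tree tells you where to read in the next one.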

3 replies

coding-horror May 5, 2021
Author

Thank you so much for the 5 second WARN! Next, can you please have INFO level log which specific files it checked for proofs, so we can potentially identify problematic / slow media? As people's farms grow, this will become a larger and larger problem over time.

j1mmyfever May 7, 2021

@mariano54

That sequential harvesting worries me down the road. I already see the occasional plot take 5 seconds to return, and when I have 10x plots (around 2000), I'm worried I'll have issues.

Is it possible to scale out the harvesters on 1 box horizontally? In my particular case, I'm running 12 drives on a NAS, but each drive is an independent LUN over a shared 10GbE connection. There are zero other processes running on the storage other than Chia activities.

So in theory, I'd like to run 12 Harvesters each dedicated to a disk on one system. Fallback plan would be to spin up 12 VMs/Containers and pass the disks through I suppose. Thanks.

mxfh Jun 4, 2021

There seems to be no correlation between challenge time and whether plots sit on a reasonably fast NAS or on a local HDD. To me it's all about the number of matching plots per single HDD versus the random-access seek time bottleneck.
Even local plots are abysmally slow (5 minutes plus) if you happen to run check plots in parallel, which doesn't even warn about this. Ideally check plots should have the lowest IO priority by default, or even pause during challenge time windows.

mariano54 · 2021-04-25T08:50:18Z

Also another thing to point out is that the responses are returned to the full node as they come out from the drives, so the high time is probably only affecting the slow drive or the slow NAS.

1 reply

coding-horror May 7, 2021
Author

yes; I isolated part of it by running multiple farmers, you can see several days' worth of data here: (scroll down for more days)

https://chiaforum.com/t/troubleshooting-failure-to-farm-with-lots-of-plots-due-to-harvester-30s-timeout/413/123

coding-horror · 2021-04-25T17:42:41Z

Author

Yeah, this only started happening as I added multiple NAS devices to the network. With 1 or 2 NASes, it was all fine. Once you get to 5.. not so much, especially if the algorithm picks plots on 6 different devices. At the very least

definitely change the log level to ERROR (or WARN)
surface this in the UI for sure, otherwise you're silently "farming".. absolutely nothing. A silent killer.
tell us WHICH files were picked for proofs so we can look for "problem" devices

thanks @mariano54 !

immortalx · 2021-04-26T06:21:01Z

Is there any benefit to using larger plot files in this case? There would be fewer files per directory in his case.

1 reply

coding-horror May 7, 2021
Author

the problem was definitely worse when I had 90tb of plot files in a single folder. not recommended.

keliew · 2021-04-26T09:22:38Z

I think the problem is inherent in how Windows manages mounted network drives.

If you were to switch to a Linux-based farmer, it should work better. I know it's not an option for most people, but it'd be a good test to see if that's the cause.

1 reply

coding-horror May 7, 2021
Author

It's the NASes that were the problem, and they ARE linux based. But it's also a good idea to run the harvester on the NAS, which is possible in this case too, but beyond my skill level.

kimklai · 2021-04-26T10:33:06Z

I think lots of people need to know that their whole system is just too slow to provide a valid answer in time. So please mark the logs as WARNING when they take longer than a certain threshold.

1 reply

coding-horror May 5, 2021
Author

yep, fortunately the WARN is in, if the harvester proof check takes more than 5 seconds, as of 1.1.3 🙌

desek · 2021-04-26T12:44:08Z

This was quite a rough finding since I've been happily farming on my RPi on wifi to a remote storage with proof checking usually between 60-90 seconds.

tartley · 2021-04-26T16:49:51Z

Could some drives be powering down when idle?

1 reply

coding-horror May 5, 2021
Author

no; the NASes all default to leaving the drives on all the time out of the box, as a default ON setting.

bertybassett · 2021-04-27T14:53:04Z

yeah my 1 Synology NAS is doing the same if I hit it hard with other services such as docker, Plex, etc

My Synology can sleep the HDs, and I believe it is set to do that by default out of the box; no idea about yours.

Synology HDD hibernation
Synology NAS, expansion units, and USB HDDs enter different energy-saving modes after being idle. ... Phase 1: HDDs power down after a period of inactivity (idle time). You may go to DSM > Control Panel > Hardware & Power > HDD Hibernation to configure the length of inactivity for the HDDs to enter hibernation.

cccat6 · 2021-04-27T23:13:35Z

@mariano54
My disk array is not local but distributed across different places. I gather the disks into one huge disk and connect to them with NFS. The latency of this solution for each random seek is between 200 and 1000 ms. The average is about 400 ms.
I am wondering whether this is able to farm. 400 ms * (7 + 64) seeks is 28.4 s, just a little below the 30 s limit.
However, this solution supports parallel reads. If it seeks 64 different places in the plot at the same time, it will be able to respond to all of them within a single seek time, which is about 400 ms.
Is parallel seeking on the todo list, and would it work the way I am thinking? When will it be released?
Thanks a lot!

coding-horror · 2021-04-27T23:25:06Z

Author

Yeah this is a critical issue for the project IMO, since a LOT of people are probably "farming" absolutely nothing due to the 30s harvester timeout, and the logs aren't WARN-ing or ERROR-ing them.. the GUI isn't telling them.. the only way to know this is happening is to intentionally set log level to INFO and scan for 30s or longer in the INFO messages 😱

😭

1 reply

coding-horror May 7, 2021
Author

As of 1.1.3 and beyond any harvester effort over 5s is a WARN at least 🙌

dorkmo · 2021-04-27T23:33:43Z

dorkmo
Apr 27, 2021

I believe the small time window was designed to push tape drives out of the Chia ecosystem. Ideally there would be big flashing warnings if you're missing any rewards. Hopefully this can be addressed in future releases.

Sagittarius · 2021-04-29T06:11:53Z

You should try "farming on many machines"

2 replies

coding-horror May 7, 2021
Author

I ultimately did switch to 3 farmers, turned off upnp on the other 2. So this is also correct. The NAS times are still worse by 3-4x than the USB connected drives though, there's inherent overhead even with a "fast" NAS and RAID 0!

xiaoliwe May 27, 2021

Why "turned off UPnP on the other 2"? Does it affect winning blocks?

mariano54 · 2021-04-29T06:45:05Z

Your fast ones will still answer quickly, since it's all threaded. It will just display the time at which the slowest finished.
Most users can farm many plots in under a second, so this is not an issue for most people. But it's worth checking if you're not winning rewards for a while.

fiveangle · 2021-04-30T03:54:20Z

It seems near criminal that we are conditioned to enable "INFO" level logs and routinely told to ignore the countless "ERROR" level scary sounding messages spamming the logs, yet not a whimper about anything in the logs when a proof challenge request lookup is nearing or fully exhausting some non-presented timeout value that only tribal knowledge or code inspection is aware of.

I watched incredulously as my friends with far fewer plots reached, then far surpassed and doubled over my winnings with 1/2 the plots. Only once reaching out after the damage is done do we find out with an obtuse, "OF COURSE YOU WON'T WIN WITH LOOKUP DELAYS LIKE THIS" are we then made painfully aware of the worst of all actual "errors".

Like I said, that these messages are not flagged as ERROR level or at least WARNING is essentially criminal, especially in the face of the constant stream of non-critical "ERROR" level messages spamming the logs.

2 replies

bertybassett Apr 30, 2021

totally agree, no-one on the dev team had a NAS, Chia dropped the ball on this one (as confirmed in the decentral video by Bram).

coding-horror May 7, 2021
Author

As of 1.1.3 at least over 5s harvester is a WARN now. Further improvements needed, but it's a start!

Team4N6 · 2021-04-30T21:21:29Z

It's unfortunate that this didn't surface better. It seems that in almost any system it's difficult to balance the correct level of logging so it's useful. However, I'm mainly just jealous of your 400 TB of plots. I'm also scared that the total pool has added more than 600 PB and doubled in less than a week.

coding-horror · 2021-04-30T22:45:55Z

Author

Well, I hope the following happens, as it will benefit literally everyone in the world:

Make it log ERROR when harvester takes more than 30s to return. Right now it is logging this terrible event (I even had several proofs that had the audacity to return in 35 seconds instead of under 30) as INFO and that's.. deeply uncool. 😱
Surface this harvester time in the GUI, perhaps only if it's taking longer than 5s so people can be aware if they're slipping down that slippery slope?
Tell us which specific plots the harvester is picking so we can isolate problem devices.

It isn't about me getting mine. It's about fixing the problem for everyone. I believe very deeply in this concept; it is why I founded Stack Overflow and why I am currently working on Discourse.

(I'm also curious if this 30 second harvester limit is truly written in stone; I think relaxing it to something like 60 seconds would be quite helpful long term, but I'm not in charge of the project! But that's a different issue for a different topic.)

12 replies

justwjx May 3, 2021

but what's really annoying is that when delivering plots, there is plenty of extra latency, 20 or 30 seconds or more:
2021-05-03T18:41:28.014 harvester chia.harvester.harvester: WARNING Looking up qualities on \\172.168.10.9\chia6\plot-k32-2021-04-29-07-10-6eb3b688f8c401a6658c0a12d82d4cd054a29c89f70bb9f8dd7449960cb40712.plot took: 47.49155139923096. This should be below 5 seconds to minimize risk of losing rewards.
2021-05-03T18:42:07.091 harvester chia.harvester.harvester: WARNING Looking up qualities on \\172.168.10.9\chia6\plot-k32-2021-04-29-16-26-34c7193460c1e38b4747975bca8d674b1a62a704e19aed3d5a61b253acd62959.plot took: 11.479249954223633. This should be below 5 seconds to minimize risk of losing rewards.
2021-05-03T18:42:22.758 harvester chia.harvester.harvester: WARNING Looking up qualities on \\172.168.10.9\chia6\plot-k32-2021-04-30-09-14-bb075dadaba48f697fe2aa8957b15bc63c83e4854c77d5ad4cfbf78711ae366a.plot took: 19.582388162612915. This should be below 5 seconds to minimize risk of losing rewards.
2021-05-03T18:43:05.812 harvester chia.harvester.harvester: WARNING Looking up qualities on \\172.168.10.9\chia6\plot-k32-2021-04-30-08-48-f0c565fb9b8a102c595b43be3d6cdffa087e38667f493f379e38829b18cf599d.plot took: 6.216129302978516. This should be below 5 seconds to minimize risk of losing rewards.
2021-05-03T18:43:08.675 harvester chia.harvester.harvester: WARNING Looking up qualities on \\172.168.10.9\chia6\plot-k32-2021-05-01-03-34-212577c0b776191df7dfb2b1beb6171f0293b12903e08c36c0c950649dc01120.plot took: 44.13482999801636. This should be below 5 seconds to minimize risk of losing rewards.
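Since these WARNING lines name the plot file, they can be grouped by directory to look for a problem share or drive. A minimal sketch, assuming the message wording shown above; `slow_by_directory` and the regular expression are illustrative names, not anything from Chia itself.

```python
import re
from collections import defaultdict

# Matches the WARNING message wording shown above; works on fused or separate lines.
WARN_RE = re.compile(
    r"WARNING\s+Looking up qualities on (?P<path>.+?\.plot) took: (?P<secs>\d+(?:\.\d+)?)"
)


def slow_by_directory(log_text):
    """Map each plot directory to the list of slow lookup times (seconds) seen there."""
    stats = defaultdict(list)
    for m in WARN_RE.finditer(log_text):
        path = m.group("path")
        # Split on the last path separator, Windows or POSIX style.
        cut = max(path.rfind("\\"), path.rfind("/"))
        directory = path[:cut] if cut >= 0 else "."
        stats[directory].append(float(m.group("secs")))
    return dict(stats)


# Two of the lines above, copied as raw strings so the backslashes survive.
SAMPLE = "\n".join([
    r"2021-05-03T18:41:28.014 harvester chia.harvester.harvester: WARNING Looking up qualities on \\172.168.10.9\chia6\plot-k32-2021-04-29-07-10-6eb3b688f8c401a6658c0a12d82d4cd054a29c89f70bb9f8dd7449960cb40712.plot took: 47.49155139923096. This should be below 5 seconds to minimize risk of losing rewards.",
    r"2021-05-03T18:42:07.091 harvester chia.harvester.harvester: WARNING Looking up qualities on \\172.168.10.9\chia6\plot-k32-2021-04-29-16-26-34c7193460c1e38b4747975bca8d674b1a62a704e19aed3d5a61b253acd62959.plot took: 11.479249954223633. This should be below 5 seconds to minimize risk of losing rewards.",
])

for directory, times in slow_by_directory(SAMPLE).items():
    print(f"{directory}: {len(times)} slow lookups, worst {max(times):.1f} s")
```

With one share per drive, a directory that keeps showing up at the top of this list is a reasonable first suspect.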

Muffexx May 3, 2021

When advertising Chia as a "green" cryptocurrency they should consider increasing these time thresholds in order to allow drives to sleep - as this should significantly reduce overall power draw.

justwjx May 3, 2021

good point!

xydreen May 3, 2021

I wonder if throttling the copy process Mbps might be optimal here, even like an 80% throttle (assuming you're only going to be copying a single file at a time)...

justwjx May 4, 2021

Yes, that helped a little. I now avoid setting the destination folder directly to a NAS folder, since that causes heavy parallel writing to the NAS. Instead, I put the destination on a local SSD and manually move files to the NAS.
Anyway, I am still facing significant latency, even when no file is transferring, and I cannot identify the root cause...
So now I am studying how to deploy a harvester-only Docker container... but there is very little information I can find on the internet.

RichJac · 2021-05-04T20:32:23Z

I'm so glad I ran across this. I just checked my logs and most of my "x plots were eligible for farming" lines show 0. Every once in a blue moon I have 1. I've been running since about the 20th of April. :-(

I'm now moving my plots over to a direct attached USB drive I have.

Thank you!!

2 replies

fiveangle May 4, 2021

unrelated

AyeBraine May 5, 2021

The information about "0 eligible plots" is normal. This says how many of your plots have passed the filter. Mostly they shouldn't pass the filter, only once in a while they do, then they compete for proof. It's completely unrelated to this issue.

nomadengineer · 2021-05-05T19:21:37Z

I went through this thread!
So what's the final consensus on the max number of plots per directory? :)

1 reply

coding-horror May 5, 2021
Author

I would not do more than 165 personally, that's also how many fit on an 18tb hard drive, so the math works for me 😄
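The "165 per 18 TB drive" figure can be sanity-checked with a little arithmetic. This sketch assumes a k32 plot is roughly 101.4 GiB (a commonly quoted figure, not stated in this thread), treats "18 TB" as 18e12 bytes, and ignores filesystem overhead.

```python
import math

# Assumption: a k32 plot is about 101.4 GiB; this number is not from the thread.
K32_PLOT_BYTES = 101.4 * 2**30


def plots_per_drive(drive_bytes, plot_bytes=K32_PLOT_BYTES):
    """Whole plots that fit on a drive, ignoring filesystem overhead."""
    return math.floor(drive_bytes / plot_bytes)


print(plots_per_drive(18e12))  # 165
```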

s0ftice · 2021-05-06T02:36:38Z

Hi all,
The DEBUG log of my harvester shows well below 30 seconds whenever eligible plots are found. However a "chia plots check -n 5" takes around 3 minutes per plot file for my setup (network mount). This is not reflected in the logs anywhere, but I assume I won't be able to earn anything as checking for proofs silently times out.

Is my understanding correct?

13 replies

danielritchie May 14, 2021

you are, unfortunately, correct, I am very sorry to say :(

Hi @coding-horror - I also have this discrepancy. Can you clarify why this is the case, and there would be a big difference between the proof in the live logs and the duration of chia plots check?

I have 3 plots. Checking the logs it's consistently pretty fast
"... Found 0 proofs. Time: 0.16520 s. Total 3 plots"
FASTEST: 0.00067
SLOWEST: 0.22872

However, when I run chia plots check it's pretty slow. For the whole process I'm averaging close to 20 seconds per proof regardless of the number of challenges I use. I'm clearly under the ~2 second mark in the logs, but when I run the checks it seems like I'm dangerously close to the 30 second mark. Hoping to understand what's causing this so I can dig in and see if I have another issue.

coding-horror May 14, 2021
Author

No idea. I was told -n 30 was the closest analog to an actual proof check, but that there was no exact replica of the proof check in the code. You'll need to talk to someone on the project, I don't have the information you seek.

danielritchie May 15, 2021

I didn't realize you weren't involved but thanks for the info. That makes sense that the tests wouldn't be completely representative. I was worried that maybe there was something timing wise that needed to be looked into. I'll dig into it if it seems like a problem. thanks!

ppolewicz May 19, 2021

@coding-horror I've learned so much from you over the years of reading your stuff and using things you've built. Let me shine a little light on this for you. There are two operations involved:

quality check - devs say it takes 7 seeks, but testing shows actually it's up to 9 (sometimes it can be as low as 2 when you find a negative confirmation earlier). There is a trace of it in the logs nowadays that tells you there is a problem whenever you exceed 5s, but doesn't really say why "5" is the right number. That one is implemented fairly well - perhaps those 2 extra seeks could be avoided, hard to say.
full proof - devs say it takes 64 seeks, but actually it takes 66. You can trace it with a custom FUSE and chia plots check -n 5, then divide the amount of non-sequential reads by 5, though you need to run it on a plot that gets 5 proofs in such case (not all do, you can have 1 just as well - see the output of plot checker). First of all, from those 64, we've already done 7 (or 9) just a second ago, so we shouldn't be doing the exact same reads again now - but we do. This part of code is not optimal. The operating system buffer cache might save us here, they have caches and sometimes the second read will go from cache and will not touch the drive and move a disk head. This is why starting firefox for the first time takes some time, but if you shut it down and start it again, it will come up faster. Windows does the same thing btw.

Full proof code is actually in another repo, chia-pos or something like that, and it's in C or C++. It currently has only one mode of operation which is... simple. It kind of assumes we are working with a single local fast HDD per plot or a fast SSD. Anyone with a slow NAS or anything more ancient is going to have trouble. Reads are sequential, first the 9, then the 64, but I think due to a performance bug in the chia-blockchain Python repo, we first wait until the last quality check is done before we start running the full proofs. All of this is not optimized for latency the way we would like it to be. Some work to fix this is happening at Chia-Network/chiapos#239
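If that reading of the code is right, the cost of waiting for the last quality check is easy to show with a toy model. The sketch below is not Chia code: the plot names and timings are hypothetical, and it only compares when each plot's full proof would finish with and without such a barrier, assuming plots are otherwise processed in parallel.

```python
def finish_times_with_barrier(quality, proof):
    """Every full proof starts only after the slowest quality check is done."""
    barrier = max(quality.values())
    return {plot: barrier + proof[plot] for plot in proof}


def finish_times_pipelined(quality, proof):
    """Each full proof starts as soon as its own quality check is done."""
    return {plot: quality[plot] + proof[plot] for plot in proof}


# Hypothetical durations in seconds for two plots on different devices.
quality = {"fast_plot": 1.0, "slow_plot": 9.0}
proof = {"fast_plot": 2.0, "slow_plot": 2.0}

print(finish_times_with_barrier(quality, proof))
print(finish_times_pipelined(quality, proof))
```

In this model the plot on the fast device finishes at 11 s with the barrier and at 3 s without it, while the slow plot is unaffected either way.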

Now what is not clear for me is how much time exactly there is. Logs say 5s for a quality check is too much, someone made a PR which says 10s is too much for a full proof, but in other places I read it's 30s or even 60s.

coding-horror May 19, 2021
Author

Thank you for the kind words, and the analysis! I agree there's some optimization that can be done here which will help a lot of folks in the future.

oxygen · 2021-05-07T09:32:11Z

Is chia plots check -n 1 suitable for benchmarking the 30 seconds limit?

3 replies

s0ftice May 7, 2021

WARNING 1 challenges is too low, setting it to the minimum of 5 ... not possible.

oxygen May 7, 2021

Well, the purpose was benchmarking not necessarily testing. So if divided by 5? Is it a proper way to check?

s0ftice May 7, 2021

I think so, yes. For me that's still 40 seconds.
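The "divide by 5" estimate above can be scripted. This is a rough sketch with made-up helper names; it only times a command and averages the result. As noted elsewhere in this thread, chia plots check is not an exact replica of the live proof check, so treat the number as a ballpark.

```python
import subprocess
import time


def time_command(cmd):
    """Run a command to completion and return its wall-clock duration in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def seconds_per_challenge(total_seconds, plots, challenges=5):
    """Average the total run time over every (plot, challenge) pair."""
    return total_seconds / (plots * challenges)


# Worked example with hypothetical numbers: one plot, 5 challenges, 200 s in total.
print(seconds_per_challenge(200.0, plots=1))  # 40.0
```

On a real farm the total would come from something like `time_command(["chia", "plots", "check", "-n", "5"])`, divided by the number of plots that were actually checked.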

bertybassett · 2021-05-07T19:38:56Z

For me raid anything used for plot drives on my NAS was utter rubbish and expanding a raid while copying plots to it was the worst ever.

On my AMD-based Synology DS1821+ with 16 GB of RAM (so not underpowered) I tried everything, but it was all rubbish: a Windows VM running on the NAS as farmer (too slow), Docker running the farmer (weird missing farming errors), and a separate hardwired Windows laptop as farmer with the NAS on RAID/SMB shares, and it was pants.

In the end I took coding-horror's advice and moved it to Windows.

A) One powerful Windows 10 HP Microserver with 4 drives as individual drives (14Tb each) as the farmer.

b) cheap as chips (less than 60$) HP Microserver N34l with open media vault again 4 more drives individual drives (14Tb) with separate SMB shares for each drive.

c) repeated b as many time as I needed.

plot times are 99% of the times less than 5 seconds and I have 600 plots.

In my limited opinion it's the RAID overhead that f's things up, and Synology cannot do anything but RAID (the lowest setting is Basic, but even that is RAID, since you can expand it).

2 replies

bertybassett May 7, 2021

another key thing, NEVER and I mean NEVER upload plots to the NAS while it is being used for plot storage.

bertybassett May 7, 2021

I copy my plots to a USB drive on the plotter computer, shuck the USB HD, and add it to a Microserver.

Why HP Microservers, I hear you say? Desktop PSUs should not power on more than 4 HDs at once without a delay; the initial power surge is enough to fry the drives. Each HP Microserver was small, drew less than 30 watts, took 4 drives in special bays, was silent and, best of all, cheap: I paid about £50 for the N34l. Yes, the N34l was only 1.3 GHz, but that is more than enough to run OMV and do simple SMB shares.

High powered AMD CPU, passmark 3500 running windows £250

Lower powered AMD 1.3GHz passmark 700 £50 or less running OMV doing 4 x SMB shares

Example of the drive bays

j1mmyfever · 2021-05-07T20:42:04Z

Just wanted to drop a comment here. I never see anyone talking about Malware on-access scanning. Probably a good call in Windows to set some exclusions.

Process: chia.exe
Process: start_harvester.exe
Process: start_farmer.exe
FileType: .plot
Folder: Your Plot Folders
Folder: Your ChiaTemp Folders
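
If the on-access scanner is Microsoft Defender (an assumption; the comment above does not name a product), those exclusions map onto parameters of the `Add-MpPreference` PowerShell cmdlet. A minimal sketch that only builds the commands as text, so they can be reviewed before being pasted into an elevated PowerShell session. The function name and folder paths are placeholders, not anything from Chia:

```python
def defender_exclusion_commands(plot_folders, temp_folders):
    """Build (but do not run) PowerShell commands for the exclusions listed above.

    Assumes Microsoft Defender; other scanners have their own settings.
    """
    processes = ["chia.exe", "start_harvester.exe", "start_farmer.exe"]
    commands = [f'Add-MpPreference -ExclusionProcess "{p}"' for p in processes]
    commands.append('Add-MpPreference -ExclusionExtension ".plot"')
    for folder in list(plot_folders) + list(temp_folders):
        commands.append(f'Add-MpPreference -ExclusionPath "{folder}"')
    return commands


if __name__ == "__main__":
    # Placeholder paths; substitute your own plot and temp folders.
    for cmd in defender_exclusion_commands([r"D:\plots"], [r"E:\chiatemp"]):
        print(cmd)
```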

0 replies

grocheireland · 2021-05-13T15:01:21Z

grocheireland
May 13, 2021

I think if lookup times are too long, it needs to be at least flagged in the log as WARN or ERR.
I am seeing CPU usage go up a good bit if I set the log level to INFO and have to grep, as there is just too much logging.
I am on a Pi 4 with NVMe and no SD card dependency.

Updated!
Sorry, now I see the WARNING in the logs

3 replies

yanadhorn May 16, 2021

Have you ever won Chia with a Pi 4?

mauricecyril May 23, 2021

Can anyone confirm that Chia can be won using a Pi 4? If we're seeing delays of 5-10 seconds, is that still within the window for winning a reward?

ppolewicz May 23, 2021

Technically it should be possible, but you should use #5064 and find out yourself how well it works on your setup

marcoabreu · 2021-05-15T04:55:58Z

marcoabreu
May 15, 2021

I have created an idea which should reveal that issue here: #5064

I also submitted a PR which should alleviate the underlying issue of slow storage access: Chia-Network/chiapos#239

1 reply

cccat6 Jun 3, 2021

Thank you so much! I see your branch has been merged! It might save my ass! U R my hero!
Waiting for next deployment!

zorner · 2021-06-10T14:13:35Z

zorner
Jun 10, 2021

@stevepresley the warning log is very helpful. But I have had some that were ignorable.

I had 4 bad plots in my farm, one of them adding a log line every 2 minutes or so. I went about removing them. After removing the first two, I tried to run 'chia plots check -h' and found the Win10 box ignoring me. The only things running on the Win10 box were Chia, remote desktop, and my cmd. Win10 did recover a minute or so later. It looks like Windows fixed an NTFS error on drive H:\ after I told it to delete a plot on drive H:\. That event caused Chia to log 1 read error on drive H:\ and then 4 warnings about lookups taking too long on 4 other drives.

This one-time hiccup is no big deal. However, I see a need to have the harvester's IO priority raised, so it is less likely to be affected by other IO operations. Someone mentioned not copying plots onto an actively farmed NAS. At 10 external HDDs, disk/wire/space management is becoming an issue. I want to move to a RAID setup and remove the external HDDs. The RAID will grow over time. Since it can take 15 minutes to copy one k32 plot, I do not want the harvester to be unable to do lookups on plots because a plot is being copied to the RAID. The RAID would be directly attached to the harvester. NASes can have a separate OS running the NAS and simply share folders on the network.

Debug.log:
2021-06-10T02:33:12.442 harvester chia.plotting.plot_tools: WARNING Not farming plot L:\plot-k32-2021-06-01-18-08-3ffcc812176933dc74663afe01203ec0ad83612c83495605849dc850ae19116e.plot. Size is 90.99346400052309 GiB, but expected at least: 99.06 GiB. We assume the file is being copied.
2021-06-10T02:35:20.443 harvester chia.plotting.plot_tools: WARNING Not farming plot L:\plot-k32-2021-06-01-18-08-3ffcc812176933dc74663afe01203ec0ad83612c83495605849dc850ae19116e.plot. Size is 90.99346400052309 GiB, but expected at least: 99.06 GiB. We assume the file is being copied.
2021-06-10T02:37:27.944 harvester chia.plotting.plot_tools: WARNING Not farming plot L:\plot-k32-2021-06-01-18-08-3ffcc812176933dc74663afe01203ec0ad83612c83495605849dc850ae19116e.plot. Size is 90.99346400052309 GiB, but expected at least: 99.06 GiB. We assume the file is being copied.
2021-06-10T02:39:37.570 harvester chia.plotting.plot_tools: WARNING Error reading directory H:\ [WinError 1392] The file or directory is corrupted and unreadable: 'H:\\'
2021-06-10T02:41:46.164 harvester chia.harvester.harvester: WARNING Looking up qualities on F:\plot-k32-2021-05-15-00-05-4fa0899862091d16dd715727325ae18f63ff14cc02b781b70d3a21e3b4a8cb0d.plot took: 5.1093361377716064. This should be below 5 seconds to minimize risk of losing rewards.
2021-06-10T02:43:00.883 harvester chia.harvester.harvester: WARNING Looking up qualities on I:\plot-k32-2021-05-23-11-02-f58ae4dfb8b24cb6de95d01c5e176b48986ee0204e341c7795db4f222a972695.plot took: 21.703153610229492. This should be below 5 seconds to minimize risk of losing rewards.
2021-06-10T02:43:00.883 harvester chia.harvester.harvester: WARNING Looking up qualities on D:\plot-k32-2021-05-18-23-59-e48f62cb1eff3c3cb4fc23fef5e71b74da9eb902369bdc4142729659789741d3.plot took: 21.703153610229492. This should be below 5 seconds to minimize risk of losing rewards.
2021-06-10T02:43:00.883 harvester chia.harvester.harvester: WARNING Looking up qualities on G:\plot-k32-2021-05-14-22-31-499456d2d0b9805ab05509d3394e90cdded7e7bdae4aeb9797f2505bef118700.plot took: 33.64081692695618. This should be below 5 seconds to minimize risk of losing rewards.
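
Warnings like the ones above follow a fixed shape, so they are easy to tally per drive. A rough sketch, assuming the message format shown in this excerpt (it may differ between Chia versions) and Windows-style drive letters. The 5-second figure comes from the warning text and the 30-second figure from the thread title. The function names are illustrative and not part of Chia:

```python
import re
from collections import defaultdict

# Matches the harvester warning format shown in the excerpt above.
LOOKUP_RE = re.compile(
    r"WARNING\s+Looking up qualities on (?P<path>.+?\.plot) "
    r"took: (?P<secs>\d+(?:\.\d+)?)"
)


def slow_lookups_by_drive(lines):
    """Return {drive: [seconds, ...]} for every slow-lookup warning found."""
    result = defaultdict(list)
    for line in lines:
        match = LOOKUP_RE.search(line)
        if match:
            drive = match.group("path")[:2]  # e.g. "G:" for a Windows path
            result[drive].append(float(match.group("secs")))
    return dict(result)


def classify(seconds, warn_at=5.0, limit=30.0):
    """Label a lookup time against the 5 s warning and 30 s limit."""
    if seconds >= limit:
        return "missed"
    if seconds >= warn_at:
        return "at risk"
    return "ok"
```

To use it, pass the open log file (or any iterable of lines) to `slow_lookups_by_drive` and run `classify` over the collected times. A drive that keeps showing up is a better lead than a single burst in which every drive slows down at the same timestamp.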

2 replies

ppolewicz Jun 10, 2021

You should probably use https://linux.die.net/man/1/ionice for the copy process: I think you want to run the copy with a low priority, as opposed to running the harvester with a high priority.
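
As a rough illustration of that suggestion on Linux, the copy can be launched under `ionice` in the idle scheduling class (`-c 3`), so it only gets disk time when nothing else is asking for it. A sketch, assuming `ionice` and GNU `cp` are on the PATH. It does not apply to Windows, and how well it works depends on the I/O scheduler in use. The function names and paths are placeholders:

```python
import subprocess


def low_priority_copy_command(src, dst):
    """Command line that copies src to dst in ionice's idle class (-c 3)."""
    return ["ionice", "-c", "3", "cp", "--", src, dst]


def copy_plot_low_priority(src, dst):
    """Run the copy; raises CalledProcessError if ionice or cp fails."""
    subprocess.run(low_priority_copy_command(src, dst), check=True)
```

For example, `copy_plot_low_priority("/mnt/plotter/example.plot", "/mnt/farm1/")` would copy one file while leaving the harvester's reads at normal priority.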

zorner Jun 12, 2021

That is one way of doing it, but it means any IO operation that might affect the harvester has to be lowered. Lowering the IO priority every time is going to get old. Further, I just got 3 more log entries for lookups taking too long. Again they look like a one-off, but I have no log telling me what the hold-up was; I'm guessing Win10 was doing something. About 1-5 plots pass the filter each time, so the harvester is doing some lookups every cycle, and there are no warnings for a long period either side of these. This makes me feel the OS needs to know to service the harvester ahead of most other processes, as that is the primary goal of this box.

Log:
2021-06-11T03:46:24.253 harvester chia.harvester.harvester: WARNING Looking up qualities on M:\plot-k32-2021-05-30-19-11-53dc5c3c541aeba8570c84c1e8ce41c83b508c8c8012ab970d6df8c87c7591f0.plot took: 9.202936887741089. This should be below 5 seconds to minimize risk of losing rewards.
2021-06-11T03:46:24.253 harvester chia.harvester.harvester: WARNING Looking up qualities on D:\plot-k32-2021-05-18-19-55-ab783c8063722e0c5128069be4fb9dbc5cac8045c3b159a813c6599ac4c7a421.plot took: 9.202936887741089. This should be below 5 seconds to minimize risk of losing rewards.
2021-06-11T03:46:24.253 harvester chia.harvester.harvester: WARNING Looking up qualities on D:\plot-k32-2021-05-23-15-14-6de4be146914ca8511e1141e3e276b3f7648943ab9856753bcb98b4283b10f08.plot took: 9.202936887741089. This should be below 5 seconds to minimize risk of losing rewards.

long-pham · 2021-06-18T23:21:59Z

long-pham
Jun 18, 2021

@coding-horror, this fixed the looking up qualities issue for me:

plot_loading_frequency_seconds: 1264000

Wondering if this is a bug that could be fixed?
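
For scale, here is a quick conversion of that value. This sketch assumes the setting controls how often the harvester rescans its plot directories, and that the stock value is 120 seconds. The roughly two-minute spacing of the repeated "Not farming plot" warnings earlier in the thread is consistent with that, but both points are assumptions here:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def seconds_to_days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


assumed_default = 120   # assumption, not confirmed in this thread
suggested = 1_264_000   # value from the comment above

print(f"{seconds_to_days(suggested):.1f} days between rescans")
print(f"{suggested / assumed_default:.0f}x longer than the assumed default")
```

That works out to about 14.6 days between rescans, so under these assumptions the change effectively turns the periodic directory scan off, not just slows it down.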

3 replies

Replies: 33 comments · 57 replies
