This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →


Nothing tells you when you're timing out on 30 second harvester proof checks #2651

Closed
coding-horror opened this issue Apr 25, 2021 · 22 comments
Labels
bug Something isn't working

Comments

@coding-horror

coding-horror commented Apr 25, 2021

The problem

I thought I was farming, but I wasn't -- because something about my network caused the proof check to take more than the hard-coded 30 second limit.

I had an average time to win of 8 or 9 hours for more than 120 hours without a single win. This seemed statistically implausible, so I researched the logs, and cleared any errors or warnings in the logs (well done, all the warnings and errors in debug.log were indeed things I should fix!). Still no wins for a long time.
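For context on how implausible that dry spell is: if wins are modeled as a Poisson process (a standard modeling assumption, not anything from the Chia docs), the numbers above work out to odds of roughly one in a million:

```python
import math

# Sanity check on the "statistically implausible" claim: with an average
# time-to-win of 8.5 hours, the chance of going 120 hours without a
# single win is e^(-120/8.5) under a Poisson model.
expected_hours_to_win = 8.5
dry_spell_hours = 120

p_no_win = math.exp(-dry_spell_hours / expected_hours_to_win)
print(f"P(no win in {dry_spell_hours} h) = {p_no_win:.1e}")  # roughly 7e-07
```

In other words, a 120-hour dry spell at that estimated win rate is essentially impossible by chance; something had to be broken.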

How to reproduce

Have a bunch of plots on slow storage media; when the proof check happens, verifying the proofs takes longer than the hard-coded 30 seconds allowed. You will never win a single Chia, but there's absolutely nothing in the GUI to inform you that this is happening. You can view the logs, but in the logs it is not even presented as a warning (!), but as an INFO message:

2021-04-24T16:03:29.039 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.74567 s. Total 3922 plots
2021-04-24T16:03:29.433 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.92976 s. Total 3922 plots
2021-04-24T16:03:29.635 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.01540 s. Total 3922 plots
2021-04-24T16:03:44.553 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.11716 s. Total 3922 plots
2021-04-24T16:04:10.882 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.51660 s. Total 3922 plots
2021-04-24T16:04:36.101 harvester chia.harvester.harvester: INFO     9 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 37.96922 s. Total 3922 plots
2021-04-24T16:05:13.959 harvester chia.harvester.harvester: INFO     11 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 63.07326 s. Total 3922 plots
2021-04-24T16:05:13.959 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 50.81979 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     6 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 54.92095 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     7 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 42.29693 s. Total 3922 plots
2021-04-24T16:05:31.022 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 29.67779 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 18.59361 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     8 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 6.10934 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     10 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.17564 s. Total 3922 plots
2021-04-24T16:05:33.568 harvester chia.harvester.harvester: INFO     13 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.82808 s. Total 3922 plots
2021-04-24T16:05:35.600 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 2.82815 s. Total 3922 plots
2021-04-24T16:05:35.600 harvester chia.harvester.harvester: INFO     12 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 3.56252 s. Total 3922 plots
2021-04-24T16:05:58.022 harvester chia.harvester.harvester: INFO     5 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.26593 s. Total 3922 plots
2021-04-24T16:06:11.887 harvester chia.harvester.harvester: INFO     4 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 12.42098 s. Total 3922 plots
2021-04-24T16:06:36.417 harvester chia.harvester.harvester: INFO     3 plots were eligible for farming 1c75d8d21c... Found 0 proofs. Time: 24.36354 s. Total 3922 plots

Of the above, the proofs that take longer than 30 seconds are not eligible to win, but this is not logged as an ERROR or WARNING or surfaced in the UI in any way.
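Until the log level changes, a scan like the following can stand in for the missing warning (a sketch in Python, since the harvester itself is Python; the regex is based on the log lines above, and in practice you would feed it your own debug.log instead of the sample list):

```python
import re

# Matches the harvester lookup lines shown above. 30 s is the hard-coded
# limit past which a lookup can no longer win.
LOOKUP_RE = re.compile(r"eligible for farming.*Time: ([0-9.]+) s")
LIMIT_SECONDS = 30.0

def slow_lookups(lines, limit=LIMIT_SECONDS):
    """Yield (seconds, line) for every lookup that exceeded `limit`."""
    for line in lines:
        m = LOOKUP_RE.search(line)
        if m and float(m.group(1)) > limit:
            yield float(m.group(1)), line.rstrip()

# Two lines lifted from the excerpt above, for demonstration.
sample = [
    "INFO 8 plots were eligible for farming 1c75... Found 0 proofs. Time: 3.74567 s. Total 3922 plots",
    "INFO 11 plots were eligible for farming 1c75... Found 0 proofs. Time: 63.07326 s. Total 3922 plots",
]
for seconds, line in slow_lookups(sample):
    print(f"TOO SLOW ({seconds:.1f} s): {line}")
```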

Expected behavior

The GUI should tell you "hey, your proof checks are too slow, there's absolutely no chance for you to win, even if you are farming infinity plots"

Screenshots

Imgur

Desktop

  • OS: Windows 10
  • OS Version/Flavor: latest 19042.928
  • CPU: i7-1165G7

Additional context

I followed up on the #support channel in Keybase, where I got the important advice to enable INFO level logging and check for the 30 second proof limit.. and I wrote up a detailed account on the forum; if you need excruciating levels of detail, please check there 🙇‍♂️

Recommended solution

  • Allow the 30-second proof check threshold to be configurable to account for slower storage (or relax it a bit)
  • Change the log level for harvester proof checks that exceed 30 seconds to WARN or ERROR
  • In the logs, provide details on which files are being checked for proofs, so you can identify which device or NAS is potentially the problem one slowing everything down
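The first two bullets could look something like this in the harvester's logging path (a sketch only; the threshold name, function names, and logger wiring here are assumptions, not chia's actual code):

```python
import logging

# Hypothetical config knob; the consensus limit itself stays at 30 s.
PROOF_TIME_WARN_SECONDS = 30.0

def lookup_level(seconds, threshold=PROOF_TIME_WARN_SECONDS):
    """WARNING once a lookup is too slow to ever win, INFO otherwise."""
    return logging.WARNING if seconds > threshold else logging.INFO

def log_lookup(log, seconds, eligible, plot_path):
    # Naming the plot file covers the third bullet: it points at the
    # specific drive or NAS that is slowing everything down.
    log.log(
        lookup_level(seconds),
        "%d plots eligible, lookup took %.2f s (%s)",
        eligible, seconds, plot_path,
    )

log = logging.getLogger("chia.harvester.harvester")
log_lookup(log, 37.97, 9, "/mnt/nas3/example.plot")  # hypothetical path
```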
@coding-horror coding-horror added the bug Something isn't working label Apr 25, 2021
@maomingyang1314

Moreover, I would like to know how your laptop can mount nearly 400 TB of disks. Is your storage on a NAS?

@maomingyang1314

Your NAS caused your task to time out. It is recommended to abandon this method.

@AZ663

AZ663 commented Apr 25, 2021

At what speed does this start to become a problem?

@Chia-Network Chia-Network deleted a comment from maomingyang1314 Apr 25, 2021
@Chia-Network Chia-Network deleted a comment from maomingyang1314 Apr 25, 2021
@Chia-Network Chia-Network deleted a comment from maomingyang1314 Apr 25, 2021
@mariano54
Contributor

We can't change the consensus algorithm anymore, but we can change the log level, and show the files.

Please note that looking up qualities for plots passing the filter requires about 7 random reads in a plot, whereas actually looking up a proof requires 64 reads. It might not be feasible on a slow NAS, since these are sequential reads. Furthermore, you need to take into account network latency to propagate your proof and block to the network, so you should be under 5 seconds to reduce risk of losing rewards.

Actually, the proof of space library does them sequentially, but they could be done in parallel, since it's a tree, so you could do 1 read, then 2, then 4, .. etc, for a total of around 7 sequential phases (one for each table in the plot). We haven't got around to doing this yet.
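A back-of-envelope model of the difference (my own arithmetic, not chia code): with fully parallel reads per tree level, the 64 reads collapse into roughly 7 seek-bound batches.

```python
# Latency model for the lookup described above, in milliseconds so the
# arithmetic stays exact. Assumes each random read costs `seek_ms` and
# that reads within one tree level could run fully in parallel.

def sequential_lookup_ms(seek_ms, reads=64):
    # Today's behavior: 64 back-to-back random reads.
    return seek_ms * reads

def parallel_tree_lookup_ms(seek_ms, levels=7):
    # Hypothetical: one parallel batch per table/tree level (1, 2, 4, ...).
    return seek_ms * levels

print(sequential_lookup_ms(400))     # prints 25600 -- close to the 30 s limit
print(parallel_tree_lookup_ms(400))  # prints 2800
```

At a 400 ms seek time (the NFS figure cited later in this thread), the sequential lookup sits right at the edge of the limit, while the parallelized version would have an order of magnitude of headroom.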

@mariano54
Contributor

Also another thing to point out is that the responses are returned to the full node as they come out from the drives, so the high time is probably only affecting the slow drive or the slow NAS.

@coding-horror
Author

Yeah, this only started happening as I added multiple NAS devices to the network. With 1 or 2 NASes, it was all fine. Once you get to 5.. not so much, especially if the algorithm picks plots on 6 different devices. At the very least

  • definitely change the log level to ERROR (or WARN)
  • surface this in the UI for sure, otherwise you're silently "farming".. absolutely nothing. A silent killer.
  • tell us WHICH files were picked for proofs so we can look for "problem" devices

thanks @mariano54 !

@immortalx

Is there any benefit to using larger plot files in this case? It would mean fewer files per directory in his case.

@keliew

keliew commented Apr 26, 2021

I think the problem is inherent in how Windows manages mounted network drives.

If you were to switch to a Linux-based farmer, it should work better. I know it's not an option for most people, but it'd be a good test to see if that's the cause.

@kimklai

kimklai commented Apr 26, 2021

I think lots of people need to know that their whole system is just too slow to provide a valid answer in time. So please mark these log entries as WARNING when lookups take longer than a certain threshold.

@desek

desek commented Apr 26, 2021

This was quite a rough finding since I've been happily farming on my RPi on wifi to a remote storage with proof checking usually between 60-90 seconds.

@tartley

tartley commented Apr 26, 2021

Could some drives be powering down when idle?

@coding-horror
Author

coding-horror commented Apr 26, 2021 via email

@bertybassett

bertybassett commented Apr 27, 2021

yeah my 1 Synology NAS is doing the same if I hit it hard with other services such as docker, Plex, etc

My Synology can put the HDDs to sleep, and out of the box I believe that's set as the default; no idea about yours.

Synology HDD hibernation
Synology NAS, expansion units, and USB HDDs enter different energy-saving modes after being idle. ... Phase 1: HDDs power down after a period of inactivity (idle time). You may go to DSM > Control Panel > Hardware & Power > HDD Hibernation to configure the length of inactivity for the HDDs to enter hibernation.

@cccat6

cccat6 commented Apr 27, 2021

We can't change the consensus algorithm anymore, but we can change the log level, and show the files.

Please note that looking up qualities for plots passing the filter requires about 7 random reads in a plot, whereas actually looking up a proof requires 64 reads. It might not be feasible on a slow NAS, since these are sequential reads. Furthermore, you need to take into account network latency to propagate your proof and block to the network, so you should be under 5 seconds to reduce risk of losing rewards.

Actually, the proof of space library does them sequentially, but they could be done in parallel, since it's a tree, so you could do 1 read, then 2, then 4, .. etc, for a total of around 7 sequential phases (one for each table in the plot). We haven't got around to doing this yet.

@mariano54
My disk array is not local but distributed across different places. I gather the disks into one huge volume and connect to it with NFS. The latency of this setup for each random seek is between 200-1000 ms; the average and the median are about 400 ms.
I wonder whether this is able to farm. 400 ms × (7+64) seeks will be 28.4 s, just a little below the 30 s limitation.
However, this setup supports parallel reads. If it seeks 64 different places in the plot at the same time, it can respond to all of them within a single seek time, which is about 400 ms.
Is parallel seeking on the todo list, and does it work the way I am thinking? When will it be updated?
Thanks a lot!

@coding-horror
Author

Yeah this is a critical issue for the project IMO, since a LOT of people are probably "farming" absolutely nothing due to the 30s harvester timeout, and the logs aren't WARN-ing or ERROR-ing them.. the GUI isn't telling them.. the only way to know this is happening is to intentionally set log level to INFO and scan for 30s or longer in the INFO messages 😱

😭

@dorkmo

dorkmo commented Apr 27, 2021

i believe the small time window was designed to push tape drives out of the chia ecosystem. ideally there would be big flashing warnings if you're missing any rewards. hopefully it can be addressed in future releases.

@desek

desek commented Apr 28, 2021

I did a somewhat scalable workaround for this by running multiple harvesters in containers (on the RPi4), each with ONE plot directory of 100 plots. This gives around 2 seconds for proof checking per container (up to 100 plots).

@bertybassett

bertybassett commented Apr 28, 2021

Isn't the simple answer to make each NAS its own farmer until Chia fixes this? Building the RAID array in performance mode on a Synology also causes this issue.

@qfishpear

It's soooooo sad that by "grep eligible" I found nothing there!
(only 1.5TB in my farm, so.... it's just reasonable)

@Sagittarius

You should try "farming on many machines"

@mariano54
Contributor

Your fast ones will still answer quickly, since it's all threaded. It will just display the time at which the slowest finished.
Most users can farm many plots in under a second, so this is not an issue for most people. But it's worth checking if you haven't won rewards for a while.

@fiveangle
Contributor

fiveangle commented Apr 30, 2021

Your fast ones will still answer quickly, since it's all threaded. It will just display the time at which the slowest finished.
Most users can farm many plots in under a second, so this is not an issue for most people. But it's worth checking if you haven't won rewards for a while.

It seems near criminal that we are conditioned to enable "INFO" level logs and routinely told to ignore the countless "ERROR" level scary sounding messages spamming the logs, yet not a whimper about anything in the logs when a proof challenge request lookup is nearing or fully exhausting some non-presented timeout value that only tribal knowledge or code inspection is aware of.

I watched incredulously as my friends with far fewer plots reached, then far surpassed and doubled my winnings with half the plots. Only after reaching out, once the damage was done, did we learn, via an obtuse "OF COURSE YOU WON'T WIN WITH LOOKUP DELAYS LIKE THIS", what the worst of all the actual "errors" was.

Like I said, that these messages are not flagged as ERROR level or at least WARNING is essentially criminal, especially in the face of the constant stream of non-critical "ERROR" level messages spamming the logs.

@Chia-Network Chia-Network locked and limited conversation to collaborators Apr 30, 2021

