Nothing tells you when you're timing out on 30 second harvester proof checks #3188
Replies: 33 comments 57 replies
-
Also, I'd like to know how your laptop can mount nearly 400 TB of disks. Are your disks on a NAS? |
-
Your NAS caused your task to time out. It is recommended to abandon this method. |
-
At what speed does it start to matter? |
-
We can't change the consensus algorithm anymore, but we can change the log level and show the files. Please note that looking up qualities for plots passing the filter requires about 7 random reads in a plot, whereas actually looking up a proof requires 64 reads. That might not be feasible on a slow NAS, because the proof of space library currently does these reads sequentially. They could be done in parallel, since it's a tree: you could do 1 read, then 2, then 4, and so on, for a total of around 7 sequential phases (one for each table in the plot). We haven't got around to doing this yet. Furthermore, you need to take into account network latency to propagate your proof and block to the network, so you should be under 5 seconds to reduce the risk of losing rewards. |
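To put rough numbers on the read counts above, here is a back-of-the-envelope sketch. It assumes every read costs the same latency and that all reads within one phase overlap perfectly, which real storage will not do; `lookup_time_estimates` is a hypothetical helper, not part of Chia.

```python
def lookup_time_estimates(read_latency_s, reads=64, phases=7):
    """Estimate proof lookup time for a given per-read latency.

    reads=64 and phases=7 come from the comment above. The phased figure
    assumes (optimistically) that one phase costs one read latency.
    """
    sequential_s = reads * read_latency_s  # one read after another
    phased_s = phases * read_latency_s     # one round of parallel reads per table
    return sequential_s, phased_s


if __name__ == "__main__":
    for latency in (0.01, 0.1, 0.5):
        sequential_s, phased_s = lookup_time_estimates(latency)
        print(f"{latency:.2f} s/read: sequential {sequential_s:.1f} s, phased {phased_s:.1f} s")
```

Under these assumptions, at 0.5 s per read the 64 sequential reads take 32 s, which is over the 30 second limit, while 7 phases would take 3.5 s.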
-
Another thing to point out is that the responses are returned to the full node as they come out of the drives, so the high time is probably only affecting the slow drive or the slow NAS. |
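A toy model of that behaviour, for anyone who wants to see it: each "drive" is just a thread that sleeps, and answers are collected as they complete. This is only a sketch and not Chia's harvester code; the drive names and delays are made up.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def simulated_lookup(name, delay_s):
    """Stand-in for one drive's lookup; sleeps instead of reading a plot."""
    time.sleep(delay_s)
    return name


def lookup_all(delays):
    """Run one simulated lookup per drive, each in its own thread.

    Returns (completion_order, total_seconds), where completion_order is a
    list of (drive name, seconds until that drive's answer was available).
    """
    start = time.monotonic()
    order = []
    with ThreadPoolExecutor(max_workers=len(delays)) as pool:
        futures = [pool.submit(simulated_lookup, name, delay)
                   for name, delay in delays.items()]
        # as_completed yields each future as soon as it finishes,
        # so fast drives are not held back by slow ones.
        for future in as_completed(futures):
            order.append((future.result(), time.monotonic() - start))
    return order, time.monotonic() - start


if __name__ == "__main__":
    order, total = lookup_all({"slow_nas": 0.3, "local_disk": 0.02, "usb_disk": 0.1})
    for name, seconds in order:
        print(f"{name} answered after {seconds:.2f} s")
    print(f"overall time: {total:.2f} s")
```

The overall time tracks the slowest drive, but the fast drives' answers were available long before it.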
-
Yeah, this only started happening as I added multiple NAS devices to the network. With 1 or 2 NASes, it was all fine. Once you get to 5.. not so much, especially if the algorithm picks plots on 6 different devices. At the very least
thanks @mariano54 ! |
-
Is there any benefit to using larger plot files in this case? Fewer files per directory in his case. |
-
I think the problem is inherent in how Windows manages mounted network drives. If you were to switch to a Linux-based farmer, it should work better. I know it's not an option for most people, but it'd be a good test to see if that's the cause. |
-
I think lots of people need to know that their whole system is just too slow to provide a valid answer in time. So please mark the log entries as WARNING when a lookup takes longer than a certain threshold. |
-
This was quite a rough finding, since I've been happily farming on my RPi over wifi to remote storage, with proof checking usually taking between 60 and 90 seconds. |
-
Could some drives be powering down when idle? |
-
Yeah, my one Synology NAS does the same if I hit it hard with other services such as Docker, Plex, etc. My Synology can sleep the HDDs, and I believe that is the default out of the box; no idea about yours. Synology HDD hibernation |
-
@mariano54 |
-
Yeah this is a critical issue for the project IMO, since a LOT of people are probably "farming" absolutely nothing due to the 30s harvester timeout, and the logs aren't WARN-ing or ERROR-ing them.. the GUI isn't telling them.. the only way to know this is happening is to intentionally set log level to INFO and scan for 30s or longer in the INFO messages 😱 😭 |
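Until the logs flag this themselves, a small script can do the scanning. This is a minimal sketch that assumes the harvester's INFO lines contain text of the form `Time: <seconds> s`; check that against your own debug.log and adjust the pattern if it differs. `slow_lookups` and both thresholds are illustrative: 30 seconds is the limit discussed in this thread, and 5 seconds is the target mentioned above.

```python
import re

# Assumed shape of the timing text in a log line; adjust to match your log.
TIME_PATTERN = re.compile(r"Time: (\d+(?:\.\d+)?) s")


def slow_lookups(lines, warn_after_s=5.0, limit_s=30.0):
    """Yield (severity, seconds, line) for every lookup slower than warn_after_s."""
    for line in lines:
        match = TIME_PATTERN.search(line)
        if match is None:
            continue  # not a lookup timing line
        seconds = float(match.group(1))
        if seconds >= limit_s:
            yield "ERROR", seconds, line.rstrip()
        elif seconds >= warn_after_s:
            yield "WARNING", seconds, line.rstrip()


if __name__ == "__main__":
    import sys

    # Usage: python scan_lookups.py /path/to/your/debug.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for severity, seconds, line in slow_lookups(log_file):
            print(f"{severity} ({seconds:.1f} s): {line}")
```

Anything reported as ERROR took longer than the 30 second limit; anything reported as WARNING is slower than the 5 second target.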
-
i believe the small time window was designed to push tape drives out of the chia ecosystem. ideally there would be big flashing warnings if you're missing any rewards. hopefully this can be addressed in future releases. |
-
You should try "farming on many machines" |
-
Your fast ones will still answer quickly, since it's all threaded. It will just display the time at which the slowest one finished. |
-
It seems near criminal that we are conditioned to enable "INFO" level logs and routinely told to ignore the countless scary-sounding "ERROR" level messages spamming the logs, yet there is not a whimper in the logs when a proof challenge lookup is nearing or fully exhausting some unadvertised timeout value that only tribal knowledge or code inspection reveals. I watched incredulously as my friends with far fewer plots reached, then far surpassed and doubled my winnings with half the plots. Only after reaching out, once the damage is done, are we made painfully aware of the worst of all actual "errors", with an obtuse "OF COURSE YOU WON'T WIN WITH LOOKUP DELAYS LIKE THIS". Like I said, that these messages are not flagged as ERROR level, or at least WARNING, is essentially criminal, especially in the face of the constant stream of non-critical "ERROR" level messages spamming the logs. |
-
It's unfortunate that this didn't surface better. It seems that in almost any system it's difficult to balance the correct level of logging so it's useful. However, I'm mainly just jealous of your 400 TB of plots. I'm also scared that the total pool has added more than 600 PB and doubled in less than a week. |
-
Well, I hope the following happens, as it will benefit literally everyone in the world:
It isn't about me getting mine. It's about fixing the problem for everyone. I believe very deeply in this concept; it is why I founded Stack Overflow and why I am currently working on Discourse. (I'm also curious if this 30 second harvester limit is truly written in stone; I think relaxing it to something like 60 seconds would be quite helpful long term, but I'm not in charge of the project! But that's a different issue for a different topic.) |
-
I'm so glad I ran across this. I just checked my logs, and most of my "x plots were eligible for farming" entries are 0. Every once in a blue moon I have 1. I've been running since about the 20th of April. :-( I'm now moving my plots over to a direct-attached USB drive I have. Thank you!! |
-
I went through this thread! |
-
Hi all, Is my understanding correct? |
-
Is |
-
For me, RAID of any kind for plot drives on my NAS was utter rubbish, and expanding a RAID while copying plots to it was the worst ever. On my AMD-based Synology DS1821+ with 16 GB of RAM (so not underpowered) I tried everything, but it was all rubbish: a Windows VM running on the NAS as farmer (too slow), Docker running the farmer (weird missing farming errors), and a separate hardwired Windows laptop as farmer with the NAS on RAID/SMB shares (it was pants). In the end I took coding-horror's advice and moved it to Windows. A) One powerful Windows 10 HP Microserver with 4 individual drives (14 TB each) as the farmer. B) A cheap-as-chips (less than $60) HP Microserver N34l with OpenMediaVault, again with 4 individual drives (14 TB each) and a separate SMB share for each drive. C) Repeated B as many times as I needed. Plot lookup times are under 5 seconds 99% of the time, and I have 600 plots. In my limited opinion it's the RAID overhead that fs things up, and Synology cannot do anything but RAID (the lowest setting is Basic, but even that is RAID, as you can expand it). |
-
Just wanted to drop a comment here. I never see anyone talking about Malware on-access scanning. Probably a good call in Windows to set some exclusions.
|
-
I think if lookup times are too long it needs to be at least flagged in log as WARN or ERR ..... Updated! |
-
I have created an idea that should surface this issue: #5064. I also submitted a PR that should alleviate the underlying issue of slow storage access: Chia-Network/chiapos#239 |
-
@stevepresley the warning log is very helpful, but I have had some warnings that were ignorable. I had 4 bad plots in my farm, one of them adding a log line every 2 minutes or so, and I went about removing them. After removing the first two, I tried to run 'chia plots check -h' and found the Win10 box ignoring me. The only things running on the Win10 box were Chia, remote desktop, and my cmd. Win10 did recover a minute or so later. It looks like Windows fixed an NTFS error on drive H:\ after I told it to delete a plot on drive H:\. That event caused Chia to log 1 read error on drive H:\ and then 4 warnings about lookups taking too long on 4 other drives. This one-time hiccup is no big deal. However, I see a need to raise the harvester's I/O priority, so it is less likely to be affected by other I/O operations. Someone mentioned not copying plots onto an actively farmed NAS. At 10 external HDDs, disk/wire/space management is becoming an issue. I want to move to a RAID setup and remove the external HDDs. The RAID will grow over time. Since it can take 15 minutes to copy 1 k32 plot, I do not want the harvester to be unable to do lookups on plots because a plot is being copied to the RAID. The RAID would be directly attached to the harvester. NASes can have a separate OS running the NAS and simply sharing folders on the network. Debug.log: |
-
@coding-horror, this fixed the
Wondering if this is a bug that could be fixed? |
-
The problem
I thought I was farming, but I wasn't -- because something about my network caused the proof check to take more than the hard-coded 30 second limit.
I had an average time to win of 8 or 9 hours for more than 120 hours without a single win. This seemed statistically implausible, so I researched the logs, and cleared any errors or warnings in the logs (well done, all the warnings and errors in debug.log were indeed things I should fix!). Still no wins for a long time.
How to reproduce
Have a bunch of plots on slow storage media; when the proof check happens, verifying the proofs takes longer than the hard-coded 30 seconds allowed. You will never win a single Chia, but there's absolutely nothing in the GUI to inform you that this is happening. You can view the logs, but in the logs it is not even presented as a warning (!), but as an INFO message:
Of the above, the proofs that take longer than 30 seconds are not eligible to win, but this is not logged as an ERROR or WARNING or surfaced in the UI in any way.
Expected behavior
The GUI will tell you "hey, your proof checks are too slow, there's absolutely no chance for you to win, even if you are farming infinity plots"
Screenshots
Desktop
Additional context
I followed up on the #support channel in Keybase, where I got the important advice to enable INFO level logging and check for the 30 second proof limit.. and I wrote up a detailed account on the forum; if you need excruciating levels of detail, please check there 🙇‍♂️
Recommended solution