Add debug output to understand calculation of failure probability (FP) #16

Draft
wants to merge 1 commit into
base: master

saladpanda
Contributor

Please don't actually merge this. I just wanted to publish it here for others who run into the same confusion I did.

When running snapraid smart on my machine I was surprised to see a failure probability of 84% even though all 5 SMART values mentioned in the SnapRAID FAQ suggested a perfectly healthy drive.
To understand where this high value came from, I added some debug output along the code path of the calculation and saw that it was based on Load_Cycle_Count (193), which is mentioned neither in the FAQ nor in the Backblaze blog posts.

With this patch the output looks like this:

SnapRAID SMART report:

   Temp  Power   Error   FP Size
      C OnDays   Count        TB  Serial          Device    Disk
 -----------------------------------------------------------------------
     41    569       0
|value=0||value/step=0||value=0||result=0.031633| calculated AFR for SMART value 5: 0.031633 (3.113823%)
|value=0||value/step=0||value=0||result=0.047450|calculated AFR for SMART value 187: 0.047450 (4.634185%)
|value=235466||value/step=362||value=255||result=1.822567|calculated AFR for SMART value 193: 1.822567 (83.838958%)
|value=0||value/step=0||value=0||result=0.034067|calculated AFR for SMART value 197: 0.034067 (3.349293%)
|value=0||value/step=0||value=0||result=0.036500|calculated AFR for SMART value 198: 0.036500 (3.584191%)
                        84%  2.0  Z4Z4JCWX        /dev/sda  disk1
     32   2719       0
|value=0||value/step=0||value=0||result=0.031633| calculated AFR for SMART value 5: 0.031633 (3.113823%)
|value=5093425||value/step=7848||value=255||result=1.822567|calculated AFR for SMART value 193: 1.822567 (83.838958%)
|value=0||value/step=0||value=0||result=0.034067|calculated AFR for SMART value 197: 0.034067 (3.349293%)
|value=0||value/step=0||value=0||result=0.036500|calculated AFR for SMART value 198: 0.036500 (3.584191%)
                        84%  0.5  71UTC68RT       /dev/sdd  disk2
     44     39       0
|value=0||value/step=0||value=0||result=0.031633| calculated AFR for SMART value 5: 0.031633 (3.113823%)
|value=0||value/step=0||value=0||result=0.047450|calculated AFR for SMART value 187: 0.047450 (4.634185%)
|value=162||value/step=0||value=0||result=0.000000|calculated AFR for SMART value 193: 0.000000 (0.000000%)
|value=0||value/step=0||value=0||result=0.034067|calculated AFR for SMART value 197: 0.034067 (3.349293%)
|value=0||value/step=0||value=0||result=0.036500|calculated AFR for SMART value 198: 0.036500 (3.584191%)
                         5%  6.0  ZF200GE8        /dev/sdb  parity
      -      -       -  n/a    -  -               /dev/sdh  -
      -      -       -  n/a    -  -               /dev/sdg  -
      -      -       -  n/a    -  -               /dev/sde  -
      -      -       -  n/a    -  -               /dev/sdf  -
     30    606       0  SSD  0.2  Y5IB61BCKNSX    /dev/sdc  -
     31     36       0
|value=0||value/step=0||value=0||result=0.031633| calculated AFR for SMART value 5: 0.031633 (3.113823%)
|value=0||value/step=0||value=0||result=0.034067|calculated AFR for SMART value 197: 0.034067 (3.349293%)
|value=0||value/step=0||value=0||result=0.036500|calculated AFR for SMART value 198: 0.036500 (3.584191%)
                         4%  1.0  S2RXJ9FCB07612  /dev/sdi  -

The FP column is the estimated probability (in percentage) that the disk
is going to fail in the next year.

Probability that at least one disk is going to fail in the next year is 98%.
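The percentages in the debug lines can be reproduced from the printed AFR values. The sketch below assumes three things that are inferred from the output above rather than confirmed against the SnapRAID source: each percentage is the Poisson probability of at least one failure per year, `1 - exp(-AFR)`; the FP column is the largest per-attribute value for that disk; and the final line combines the disks as independent by summing their worst AFRs. All variable and function names are hypothetical.

```python
import math

# AFR values copied from the debug output for /dev/sda (disk1),
# keyed by SMART attribute ID.
afr_by_attribute = {
    5: 0.031633,
    187: 0.047450,
    193: 1.822567,
    197: 0.034067,
    198: 0.036500,
}


def failure_probability(afr):
    """Probability of at least one failure in a year for a Poisson rate `afr`."""
    return 1.0 - math.exp(-afr)


# Per-attribute probabilities, matching the percentages in the debug lines.
per_attribute = {attr: failure_probability(afr) for attr, afr in afr_by_attribute.items()}

# Assumption: the FP column shows the worst (largest) per-attribute value.
worst_attr = max(per_attribute, key=per_attribute.get)
disk_fp = per_attribute[worst_attr]

# Assumption: the array-wide figure treats disks as independent, so the
# worst AFR of each disk with an FP value is summed before converting.
worst_afr_per_disk = [1.822567, 1.822567, 0.047450, 0.036500]
array_fp = failure_probability(sum(worst_afr_per_disk))

for attr, prob in per_attribute.items():
    print(f"SMART {attr}: {prob * 100:.6f}%")
print(f"disk FP driven by attribute {worst_attr}: {disk_fp * 100:.0f}%")
print(f"at least one disk fails: {array_fp * 100:.0f}%")
```

Under these assumptions, attribute 193 alone accounts for the 84% shown for disk1, and the two disks with a high Load_Cycle_Count dominate the 98% array-wide figure.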

@saladpanda
Contributor Author

To compile on Ubuntu:

apt install build-essential autoconf
autoreconf -i
./configure
make

# run the binary:
./snapraid

@amadvance
Owner

Hi @saladpanda

If I read the output correctly, the Load Cycle Count of your disk is 235,466, which is indeed a high value. In the data I analyzed, this appears to be an indicator of potential failure. However, it's important to note that this doesn't necessarily mean your disk will fail, as each case is unique. It's a good idea to check your hard drive's specifications to see what it's rated for, just to be sure. Hard drives are typically rated for around 600,000 cycles.

See for example this discussion:

https://superuser.com/questions/840851/how-much-load-cycle-count-can-my-hard-drive-hypotethically-sustain
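To put the count in perspective, here is a rough back-of-the-envelope sketch using the numbers from this thread: 235,466 load cycles and 569 power-on days for /dev/sda, against the typical 600,000-cycle rating mentioned above. The rating is an assumption until checked against the drive's datasheet, and the projection assumes the drive keeps cycling at its past average rate; the variable names are hypothetical.

```python
# Values taken from the SMART report for /dev/sda (disk1).
load_cycles = 235_466
power_on_days = 569

# Assumption: typical rating quoted in the discussion; check the datasheet.
rated_cycles = 600_000

# Average number of load cycles per power-on day so far.
cycles_per_day = load_cycles / power_on_days

# Share of the rated load cycles already consumed.
fraction_used = load_cycles / rated_cycles

# Power-on days until the rating is reached at the same average rate.
days_remaining = (rated_cycles - load_cycles) / cycles_per_day

print(f"{cycles_per_day:.0f} cycles per power-on day")
print(f"{fraction_used:.0%} of the rated cycles used")
print(f"about {days_remaining:.0f} power-on days until the rating is reached")
```

A rate of several hundred cycles per day usually points to aggressive head parking, so the drive's power-management settings may be worth a look.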

@amadvance amadvance force-pushed the master branch 2 times, most recently from 9222213 to 79e8794 Compare January 10, 2024 13:17