Add SSD Health API and generic implementation #47

Merged: 2 commits, Sep 18, 2019

andriymoroz-mlnx
Contributor

Signed-off-by: Andriy Moroz [email protected]

Returns:
A string holding some vendor specific disk information
"""
return self.vendor_ssd_info

Besides the attributes you list, it would be better to add "capacity", "P/E cycle", "Bad block", and "Remaining time".

Contributor Author

@andriymoroz-mlnx andriymoroz-mlnx Aug 19, 2019

The attributes you suggest are probably specific to InnoDisk SSDs.
For example, StorFly disks do not have them but provide attribute #168 (NAND Endurance), whose initial value is 20000. Comparing that to the P/E value from InnoDisk, which is 3000, I think StorFly disks are not 6 times more reliable but rather use different units. That's why I would prefer to show such info with the "--vendor" option.
The "Bad block" value is also ambiguous. Depending on the SSD NAND type (SLC, TLC, MLC), the endurance of flash cells can differ. Of course the manufacturer knows this and compensates for worse endurance with a greater number of reserved cells. That's why the absolute number of bad (reallocated) cells does not represent the disk health state. Sometimes it is used to calculate disk health as ((<total number of reserved cells> - <number of reallocated cells>) / <total number of reserved cells>) * 100
"Remaining time" (the InnoDisk utility calls this parameter Lifespan) is also not provided by all vendors and is a very rough estimate. It is highly dependent on disk usage patterns.

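To make the reserved-cell calculation concrete, here is a minimal sketch in Python. The function name `health_percent` and its parameter names are hypothetical and are not part of this PR; how the two counts are obtained from a given disk is vendor specific and is not shown.

```python
def health_percent(reserved_total: int, reallocated: int) -> float:
    """Percentage of reserved cells that have not been reallocated yet.

    Hypothetical helper for illustration only: both counts are assumed
    to come from vendor-specific SMART attributes.
    """
    if reserved_total <= 0:
        raise ValueError("reserved_total must be positive")
    if not 0 <= reallocated <= reserved_total:
        raise ValueError("reallocated must be between 0 and reserved_total")
    # Multiply first so that typical integer inputs stay exact.
    return 100.0 * (reserved_total - reallocated) / reserved_total
```

With 1000 reserved cells and 50 of them reallocated, `health_percent(1000, 50)` gives 95.0. The same 50 reallocated cells on a disk with only 100 reserved cells would give 50.0, which is why the absolute count alone says little.
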
Someday we can add a daemon to pmon that periodically queries the current disk health and raises an alarm once it reaches some threshold.
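
A rough sketch of what such a polling loop could look like, under stated assumptions: `check_health`, `monitor`, and all their parameters are hypothetical names, `read_health` stands in for whatever call ends up returning the health percentage, and nothing here uses an actual pmon interface.

```python
import time
from typing import Callable, Optional


def check_health(health: float, threshold: float) -> Optional[str]:
    """Return an alarm message if health is at or below threshold, else None."""
    if health <= threshold:
        return f"SSD health {health:.1f}% is at or below threshold {threshold:.1f}%"
    return None


def monitor(
    read_health: Callable[[], float],    # hypothetical: returns health in percent
    threshold: float,
    interval_s: float,
    raise_alarm: Callable[[str], None],  # hypothetical: e.g. a syslog writer
    max_polls: Optional[int] = None,     # None means poll forever
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll read_health every interval_s seconds; return the number of alarms raised."""
    polls = 0
    alarms = 0
    while max_polls is None or polls < max_polls:
        message = check_health(read_health(), threshold)
        if message is not None:
            raise_alarm(message)
            alarms += 1
        polls += 1
        # Skip the final sleep when a poll limit has been reached.
        if max_polls is None or polls < max_polls:
            sleep(interval_s)
    return alarms
```

The `max_polls` and `sleep` parameters exist only so the loop can be exercised in a test without waiting; a real daemon would presumably run unbounded and report through the platform's usual logging path.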

@jleveque jleveque merged commit cc2dac5 into sonic-net:master Sep 18, 2019

4 participants