feat: collect SSD endurance information where available in smartctl #11391

Merged
13 commits merged on Jul 12, 2022

@bentasker (Contributor) commented Jun 24, 2022

Adjusts the `smart` input plugin to collect SSD endurance information in more situations (for example, this resolves #8701).

The `smart` plugin already collects endurance information when `attributes` is `true` and the NVMe log contains the `Percentage Used` attribute.

This PR extends the plugin to also collect endurance information when it is available in the following locations:

  • in the NVMe log, under the name `Percentage used endurance indicator`
  • for SATA SSDs, in the SMART attributes named `Percent_Lifetime_Remain`, `Wear_Leveling_Count` or `Media_Wearout_Indicator`
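As a rough illustration of the first location, a line of the form `Percentage used endurance indicator: 5%` can be recognised with a regular expression and its percentage captured. This is a minimal Go sketch, not the plugin's code: the helper `parseNVMePercentage`, the regex, and the exact `name: value%` line layout are assumptions. Matching both the existing `Percentage Used` wording and the new one with a single pattern is a choice made for this sketch only.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Hypothetical pattern: matches either "Percentage Used:" or
// "Percentage used endurance indicator:", allows any whitespace before
// the value, and captures the integer percentage.
var percentUsedRe = regexp.MustCompile(`^\s*Percentage [Uu]sed(?: endurance indicator)?:\s*(\d+)%`)

// parseNVMePercentage (hypothetical helper) returns the captured percentage,
// or false if the line is not a "percentage used" line.
func parseNVMePercentage(line string) (int, bool) {
	m := percentUsedRe.FindStringSubmatch(line)
	if m == nil {
		return 0, false
	}
	v, err := strconv.Atoi(m[1])
	if err != nil {
		return 0, false
	}
	return v, true
}

func main() {
	for _, line := range []string{
		"Percentage Used:                    3%",
		"Percentage used endurance indicator: 5%",
		"Temperature:                        38 Celsius",
	} {
		v, ok := parseNVMePercentage(line)
		fmt.Println(v, ok)
	}
}
```

Running it prints `3 true`, `5 true` and `0 false`; lines that do not match are simply skipped rather than treated as errors.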

Because SMART attribute IDs are vendor-specific, the ID cannot be relied on, so this PR introduces the ability to match rows by attribute name rather than by ID.

Additionally, for safety, the PR does not assume that any one of these values can or should overwrite another, so it creates three new output fields:

  • `endurance_remain_perc`
  • `endurance_wear_levelling`
  • `endurance_media_wearout`

For most vendors, these values start at 100 and count down. There will probably be exceptions, but this PR doesn't attempt to handle them; it seems better to let users deal with them as they see fit at query time.

@telegraf-tiger (Contributor)
Thanks so much for the pull request!
🤝 ✒️ Just a reminder that the CLA has not yet been signed, and we'll need it before merging. Please sign the CLA when you get a chance, then post a comment here saying !signed-cla

@telegraf-tiger bot added the feat label (Improvement on an existing feature such as adding a new setting/mode to an existing plugin) Jun 24, 2022
@bentasker (Contributor Author)

!signed-cla

@sspaink (Contributor) commented Jun 29, 2022

@bentasker for some reason CircleCI tests didn't get triggered, mind rebasing on master to see if it would trigger them? Thanks!

bentasker added 10 commits June 30, 2022 15:42
…where it's present (related to influxdata#8701)

There isn't a standardised attribute ID (or name) for SSD endurance information in SMART data.

Some vendors use ID 202, others use 177, and others use different IDs again.

The following two entries indicate the remaining lifetime of two different SSDs:

    202 Percent_Lifetime_Remain P---CK   094   094   000    -    6
    177 Wear_Leveling_Count     ------   100   100   050    -    432
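Each row above carries the attribute ID, its name, flags, the normalised VALUE/WORST/THRESH columns, a FAIL marker and the raw value, so the name is available independently of the ID. The following is a minimal Go sketch of splitting such a row into columns, assuming the whitespace-separated layout shown above; `smartRow` and `parseSmartRow` are hypothetical names, not the plugin's API.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// smartRow holds the columns of one attribute line in the layout shown
// above: ID, name, flags, value, worst, threshold, fail, raw value.
type smartRow struct {
	ID     int
	Name   string
	Flags  string
	Value  int
	Worst  int
	Thresh int
	Fail   string
	Raw    string
}

// parseSmartRow (hypothetical helper) splits a line on whitespace. The raw
// value is re-joined because on some drives it contains spaces itself.
func parseSmartRow(line string) (smartRow, error) {
	f := strings.Fields(line)
	if len(f) < 8 {
		return smartRow{}, fmt.Errorf("expected at least 8 columns, got %d", len(f))
	}
	// Columns 0, 3, 4 and 5 are numeric: ID, VALUE, WORST, THRESH.
	nums := make([]int, 4)
	for i, idx := range []int{0, 3, 4, 5} {
		n, err := strconv.Atoi(f[idx])
		if err != nil {
			return smartRow{}, fmt.Errorf("column %d: %w", idx, err)
		}
		nums[i] = n
	}
	return smartRow{
		ID:     nums[0],
		Name:   f[1],
		Flags:  f[2],
		Value:  nums[1],
		Worst:  nums[2],
		Thresh: nums[3],
		Fail:   f[6],
		Raw:    strings.Join(f[7:], " "),
	}, nil
}

func main() {
	for _, line := range []string{
		"202 Percent_Lifetime_Remain P---CK   094   094   000    -    6",
		"177 Wear_Leveling_Count     ------   100   100   050    -    432",
	} {
		r, err := parseSmartRow(line)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("id=%d name=%s value=%d raw=%s\n", r.ID, r.Name, r.Value, r.Raw)
	}
}
```

For the two rows above it prints `id=202 name=Percent_Lifetime_Remain value=94 raw=6` and `id=177 name=Wear_Leveling_Count value=100 raw=432`. Note that the normalised value and the raw value differ in meaning, which is one more reason the vendor-specific interpretation is left to the user.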

Because manufacturers that *don't* use `202` for endurance might use `202` for other information, we cannot safely rely on the ID to identify the attribute.

This commit introduces `deviceFieldNames`, a map keyed by attribute name (such as `Percent_Lifetime_Remain`).

When the SMART data is iterated over in `gatherDisk()`, each entry's name is checked against `deviceFieldNames`, and a field is created whenever a match is found.
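The name-based lookup can be sketched in Go as follows. The map name `deviceFieldNames` comes from the commit message, but its value type here (a plain output-field string) is a simplification; the `attr` type and `collectByName` helper are hypothetical, and pairing each attribute with one of the three `endurance_*` fields from the PR description is an assumption. What it illustrates is that the ID is never consulted, so a different attribute reusing ID 202 is ignored, and each matched name writes to its own field.

```go
package main

import "fmt"

// attr is a minimal stand-in for one parsed SMART attribute row.
type attr struct {
	ID    int
	Name  string
	Value int
}

// deviceFieldNames maps a SMART attribute *name* to the output field it
// feeds. The pairing below is an assumption for illustration; each name
// has its own field so no value overwrites another.
var deviceFieldNames = map[string]string{
	"Percent_Lifetime_Remain": "endurance_remain_perc",
	"Wear_Leveling_Count":     "endurance_wear_levelling",
	"Media_Wearout_Indicator": "endurance_media_wearout",
}

// collectByName checks each row's name (never its ID) against
// deviceFieldNames and records the row's Value under the mapped field.
func collectByName(rows []attr) map[string]int {
	fields := make(map[string]int)
	for _, r := range rows {
		if field, ok := deviceFieldNames[r.Name]; ok {
			fields[field] = r.Value
		}
	}
	return fields
}

func main() {
	rows := []attr{
		{ID: 202, Name: "Percent_Lifetime_Remain", Value: 94},
		{ID: 177, Name: "Wear_Leveling_Count", Value: 100},
		// Hypothetical vendor attribute reusing ID 202 for something else:
		// not matched, because only the name is checked.
		{ID: 202, Name: "Vendor_Specific_Attr", Value: 7},
	}
	fmt.Println(collectByName(rows))
}
```

It prints `map[endurance_remain_perc:94 endurance_wear_levelling:100]`; since Go 1.12, `fmt` prints map keys in sorted order, so the output is deterministic.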
…bute (see influxdata#8701)

This wording is used in the SMART attributes of WD HGST SSDs. It's functionally identical to the "Percentage Used" attribute seen on various other brands.
Although the different values are assumed to mean the same thing, their implementation is vendor-specific, so that assumption probably isn't safe.
…used:"

Adds a new SMART snippet (none of the existing examples contained it) and a test for it.
The earlier commits introduced new fields, so this updates the expected result counts to reflect them.
@bentasker force-pushed the 20220624_smartctl_ssd_endurance branch from b0d3144 to 41fee36 June 30, 2022 14:43
@bentasker (Contributor Author)

@sspaink sure, done

plugins/inputs/smart/smart.go (outdated, resolved)
@bentasker requested a review from sspaink June 30, 2022 18:42
@sspaink added the ready for final review label Jun 30, 2022
plugins/inputs/smart/smart.go (outdated, resolved)
Other fields appear to be emitted in lowercase, so this follows that convention and also keeps the processing of `deviceFieldNames` in line with the processing applied to the existing ID-matched fields.
@sspaink merged commit fa0c9c9 into influxdata:master Jul 12, 2022
Labels
feat: Improvement on an existing feature such as adding a new setting/mode to an existing plugin
ready for final review: This pull request has been reviewed and/or tested by multiple users and is ready for a final review.
Development

Successfully merging this pull request may close these issues.

Percentage used endurance indicator wasn't sent even smart attributes enabled
3 participants