feat: /proc/mdstat collection #9101

Merged
merged 57 commits into from
Aug 31, 2021
Changes from 22 commits
0b07d32
start adding mdadm check
Apr 5, 2021
646b663
Merge branch 'master' of github.com:influxdata/telegraf
Apr 7, 2021
ae44680
basic build of mdadm collector done
Apr 7, 2021
6711648
add mdadm to plugin registry
Apr 7, 2021
60b3b2c
fix some compilation errors
Apr 7, 2021
4e280ed
fix other syntax problems
Apr 7, 2021
ce1d030
update readme
Apr 7, 2021
67ea897
use better naming for plugin
Apr 7, 2021
4e5a7a6
add lincense stanza for initial work on this
Apr 7, 2021
d390810
formatting fixes
Apr 7, 2021
31f016a
fix function name
Apr 7, 2021
c4d39e6
remove un-used variable
Apr 7, 2021
9abfa98
fix formatting problems
Apr 7, 2021
e382f5e
correct assert for working array listing
Apr 7, 2021
46352ad
fix comparison for recoveryLine RE
Apr 7, 2021
ad1830b
actually follow docs for functions...
Apr 7, 2021
24b44c0
fix regex for finish time
Apr 7, 2021
ac39d5b
correct assert and clarify units for some stats
Apr 7, 2021
2c30dee
don't try to read into an empty slice
Apr 7, 2021
2fb25fb
don't restrain build
Apr 7, 2021
3e16660
fix constraints
Apr 7, 2021
57e04d1
remove one extra return function
Apr 7, 2021
b593b0c
Allow for using HOST_PROC with mdstat
Apr 8, 2021
17bfa94
fix test to use proc prefix
Apr 8, 2021
6d6a121
fix function
Apr 8, 2021
50e3639
try to fix tests again
Apr 8, 2021
3ab304a
try another temp file fix
Apr 8, 2021
a99875e
try another temp file fix
Apr 8, 2021
c063aa5
tweak test some more
Apr 8, 2021
232593d
fix tests more
Apr 8, 2021
dd6ba04
fix tests more again one more time
Apr 8, 2021
8ac314a
Merge branch 'master' of github.com:influxdata/telegraf
Apr 8, 2021
ccc053a
simpler tmp file creation
Apr 8, 2021
7da4aad
Merge branch 'master' of github.com:influxdata/telegraf
Apr 12, 2021
8dca456
correct testing (and allow for any file name)
Apr 12, 2021
a83d0c4
simpler if statement
Apr 12, 2021
68f7b7f
update readme
Apr 12, 2021
ccf98ca
Merge branch 'master' of github.com:influxdata/telegraf
Apr 15, 2021
b6a7dfc
add link in main readme
Apr 15, 2021
59893c3
Merge branch 'master' of github.com:influxdata/telegraf
Apr 26, 2021
f12fce6
return values as they are collected
Apr 26, 2021
03141b2
put total collection in the correct place
Apr 26, 2021
d54b2ae
Merge branch 'master' of github.com:influxdata/telegraf
May 18, 2021
5149235
fewer return statements
May 18, 2021
3c61683
fix variable assignment
May 18, 2021
784ee5d
recover trashed changes
May 18, 2021
d4d1f2e
actually set variable
May 18, 2021
19bdc77
actually set variable
May 18, 2021
fe25eb7
fix struct returns
May 18, 2021
1086055
spelling
May 18, 2021
733a6d7
more typos
May 18, 2021
06dba54
Merge branch 'master' of github.com:influxdata/telegraf
May 25, 2021
51ba709
follow conventions better, fix readme, and fix test cases for empty a…
May 25, 2021
5a4b805
follow conventions better, fix readme, and fix test cases for empty a…
May 25, 2021
692694e
exlpicitly define failed disks on return
May 25, 2021
b9720e9
check for 'down' disks correctly (they aren't the same as failed)
May 25, 2021
a5299e9
update example in readme
May 25, 2021
1 change: 1 addition & 0 deletions plugins/inputs/all/all.go
@@ -99,6 +99,7 @@ import (
_ "github.com/influxdata/telegraf/plugins/inputs/mailchimp"
_ "github.com/influxdata/telegraf/plugins/inputs/marklogic"
_ "github.com/influxdata/telegraf/plugins/inputs/mcrouter"
_ "github.com/influxdata/telegraf/plugins/inputs/mdstat"
_ "github.com/influxdata/telegraf/plugins/inputs/mem"
_ "github.com/influxdata/telegraf/plugins/inputs/memcached"
_ "github.com/influxdata/telegraf/plugins/inputs/mesos"
45 changes: 45 additions & 0 deletions plugins/inputs/mdstat/README.md
@@ -0,0 +1,45 @@
# mdstat Input Plugin

The mdstat plugin gathers statistics about any Linux MD RAID arrays configured on the host
by reading /proc/mdstat. For a full list of available fields see the
/proc/mdstat section of the [proc man page](http://man7.org/linux/man-pages/man5/proc.5.html).
For a better idea of what each field represents, see the
[mdstat page on the Linux RAID wiki](https://raid.wiki.kernel.org/index.php/Mdstat).


### Configuration:

```toml
# Get md array statistics from /proc/mdstat
[[inputs.mdstat]]
# no configuration
```

### Measurements & Fields:

- mdstat
  - BlocksSynced (if the array is rebuilding/checking, this is the count of blocks that have been scanned)
  - BlocksSyncedFinishTime (the expected finish time of the rebuild scan, listed in minutes remaining)
  - BlocksSyncedPct (the percentage of the rebuild scan completed)
  - BlocksSyncedSpeed (the current speed the rebuild is running at, listed in K/sec)
  - BlocksTotal (the total count of blocks in the array)
  - DisksActive (the number of disks that are currently considered healthy in the array)
  - DisksFailed (the current count of failed disks in the array)
  - DisksSpare (the current count of "spare" disks in the array)
  - DisksTotal (total count of disks in the array)
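
The `BlocksSynced*` fields are taken from the progress line that /proc/mdstat prints while an array is recovering, resyncing, or being checked. The following is a minimal standalone sketch of that extraction, not the plugin code itself: the regular expressions mirror the ones in the plugin source, while `syncProgress`, `capture`, `parseSyncLine`, and the sample line are hypothetical names and data chosen for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// These patterns mirror the recovery-line expressions in the plugin source.
var (
	blocksRE = regexp.MustCompile(`\((\d+)/\d+\)`)
	pctRE    = regexp.MustCompile(`= (.+)%`)
	finishRE = regexp.MustCompile(`finish=(.+)min`)
	speedRE  = regexp.MustCompile(`speed=(.+)[A-Z]`)
)

// syncProgress is a hypothetical holder for the BlocksSynced* fields.
type syncProgress struct {
	BlocksSynced int64   // blocks scanned so far
	Pct          float64 // percent complete
	FinishMin    float64 // estimated minutes remaining
	SpeedK       float64 // current speed in K/sec
}

// capture returns the single capture group of re found in line.
func capture(re *regexp.Regexp, line string) (string, error) {
	m := re.FindStringSubmatch(line)
	if len(m) != 2 {
		return "", fmt.Errorf("pattern %q not found in %q", re.String(), line)
	}
	return m[1], nil
}

// parseSyncLine extracts the progress values from one sync line.
func parseSyncLine(line string) (syncProgress, error) {
	var p syncProgress
	s, err := capture(blocksRE, line)
	if err != nil {
		return p, err
	}
	if p.BlocksSynced, err = strconv.ParseInt(s, 10, 64); err != nil {
		return p, err
	}
	// The three float fields share the same extract-then-parse step.
	floats := []struct {
		re  *regexp.Regexp
		dst *float64
	}{{pctRE, &p.Pct}, {finishRE, &p.FinishMin}, {speedRE, &p.SpeedK}}
	for _, f := range floats {
		if s, err = capture(f.re, line); err != nil {
			return p, err
		}
		if *f.dst, err = strconv.ParseFloat(s, 64); err != nil {
			return p, err
		}
	}
	return p, nil
}

func main() {
	// Made-up sample line in the format /proc/mdstat uses during a rebuild.
	line := "      [=>...................]  recovery = 12.6% (37043392/292945152) finish=127.5min speed=33440K/sec"
	p, err := parseSyncLine(line)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", p)
}
```

Lines such as `resync=PENDING` or `resync=DELAYED` carry no block counts, so a parser like this returns an error for them; the plugin checks for those markers before parsing.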

### Tags:

- mdstat
  - ActivityState (`active`, `inactive`, `recovering`, `checking`, or `resyncing`)
  - Devices (comma separated list of devices that make up the array)
  - Name (name of the array)
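
The `Devices` tag is built from the component entries on the array's device line, where each disk appears as `name[index]`. A small sketch of that idea follows, assuming a device line in the usual /proc/mdstat layout. `componentDevices` is a hypothetical helper, and unlike the plugin it scans every field and relies on the pattern alone to skip non-disk fields.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// componentRE mirrors the component-device pattern in the plugin source:
// a device name followed by its slot number in square brackets.
var componentRE = regexp.MustCompile(`(.*)\[\d+\]`)

// componentDevices returns the sorted, comma-separated device names found
// on a device line. Sorting keeps the tag value stable between collections.
func componentDevices(deviceLine string) string {
	devices := make([]string, 0)
	for _, field := range strings.Fields(deviceLine) {
		if m := componentRE.FindStringSubmatch(field); m != nil {
			devices = append(devices, m[1])
		}
	}
	sort.Strings(devices)
	return strings.Join(devices, ",")
}

func main() {
	fmt.Println(componentDevices("md1 : active raid1 sdn1[1] sdm1[0]"))
}
```

Suffixes such as `(F)` or `(S)` follow the bracketed index, so they do not end up in the extracted name.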

### Example Output:

```
$ telegraf --config ~/ws/telegraf.conf --input-filter mdstat --test
* Plugin: mdstat, Collection 1
> mdstat,ActivityState=active,Devices=sdm1\,sdn1,Name=md1 BlocksSynced=231299072i,BlocksSyncedFinishTime=0,BlocksSyncedPct=0,BlocksSyncedSpeed=0,BlocksTotal=231299072i,DisksActive=2i,DisksFailed=0i,DisksSpare=0i,DisksTotal=2i 1617814276000000000
> mdstat,ActivityState=active,Devices=sdm5\,sdn5,Name=md2 BlocksSynced=2996224i,BlocksSyncedFinishTime=0,BlocksSyncedPct=0,BlocksSyncedSpeed=0,BlocksTotal=2996224i,DisksActive=2i,DisksFailed=0i,DisksSpare=0i,DisksTotal=2i 1617814276000000000
```
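
The `BlocksTotal`, `DisksTotal`, and `DisksActive` values come from the status line that follows each array's device line, where `[n/m]` gives the total and active disk counts. The sketch below shows that mapping in isolation. The pattern mirrors the one in the plugin source; `arrayStatus`, `parseStatusLine`, and the sample line are hypothetical and only illustrate the idea.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// statusRE mirrors the status-line pattern in the plugin source:
// "<blocks> blocks ... [total/active] [UU_...]".
var statusRE = regexp.MustCompile(`(\d+) blocks .*\[(\d+)/(\d+)\] \[[U_]+\]`)

// arrayStatus is a hypothetical holder for the values on a status line.
type arrayStatus struct {
	BlocksTotal int64
	DisksTotal  int64
	DisksActive int64
}

// parseStatusLine extracts the block and disk counts from one status line.
func parseStatusLine(line string) (arrayStatus, error) {
	m := statusRE.FindStringSubmatch(line)
	if len(m) != 4 {
		return arrayStatus{}, fmt.Errorf("unexpected status line %q", line)
	}
	var vals [3]int64
	for i, s := range m[1:] {
		v, err := strconv.ParseInt(s, 10, 64)
		if err != nil {
			return arrayStatus{}, fmt.Errorf("unexpected status line %q: %w", line, err)
		}
		vals[i] = v
	}
	return arrayStatus{BlocksTotal: vals[0], DisksTotal: vals[1], DisksActive: vals[2]}, nil
}

func main() {
	// Sample status line for a healthy two-disk array.
	st, err := parseStatusLine("      231299072 blocks super 1.2 [2/2] [UU]")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", st)
}
```

A degraded array reports fewer active disks than total disks, for example `[2/1] [U_]`.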
262 changes: 262 additions & 0 deletions plugins/inputs/mdstat/mdstat.go
@@ -0,0 +1,262 @@
// +build linux

// Copyright 2018 The Prometheus Authors
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
//
// Code has been changed since initial import.

package mdstat

import (
"fmt"
"io/ioutil"
"os"
"regexp"
"sort"
"strconv"
"strings"

"github.com/influxdata/telegraf"
"github.com/influxdata/telegraf/plugins/inputs"
)

var (
statusLineRE = regexp.MustCompile(`(\d+) blocks .*\[(\d+)/(\d+)\] \[[U_]+\]`)
recoveryLineBlocksRE = regexp.MustCompile(`\((\d+)/\d+\)`)
recoveryLinePctRE = regexp.MustCompile(`= (.+)%`)
recoveryLineFinishRE = regexp.MustCompile(`finish=(.+)min`)
recoveryLineSpeedRE = regexp.MustCompile(`speed=(.+)[A-Z]`)
componentDeviceRE = regexp.MustCompile(`(.*)\[\d+\]`)
)

type MdstatConf struct {
statFile string
}

func (k *MdstatConf) Description() string {
return "Get md array statistics from /proc/mdstat"
}

var mdSampleConfig = `
## No configuration required for this collector
`

func (k *MdstatConf) SampleConfig() string {
return mdSampleConfig
}

func evalStatusLine(deviceLine, statusLine string) (active, total, size int64, err error) {
sizeFields := strings.Fields(statusLine)
if len(sizeFields) < 1 {
return 0, 0, 0, fmt.Errorf("statusLine empty? %q", statusLine)
}
sizeStr := sizeFields[0]
size, err = strconv.ParseInt(sizeStr, 10, 64)
if err != nil {
return 0, 0, 0, fmt.Errorf("unexpected statusLine %q: %w", statusLine, err)
}

if strings.Contains(deviceLine, "raid0") || strings.Contains(deviceLine, "linear") {
// In the deviceLine, only disks have a number associated with them in [].
total = int64(strings.Count(deviceLine, "["))
return total, total, size, nil
}

if strings.Contains(deviceLine, "inactive") {
return 0, 0, size, nil
}

matches := statusLineRE.FindStringSubmatch(statusLine)
if len(matches) != 4 {
return 0, 0, 0, fmt.Errorf("couldn't find all the substring matches: %s", statusLine)
}

total, err = strconv.ParseInt(matches[2], 10, 64)
if err != nil {
return 0, 0, 0, fmt.Errorf("unexpected statusLine %q: %w", statusLine, err)
}

active, err = strconv.ParseInt(matches[3], 10, 64)
if err != nil {
return 0, 0, 0, fmt.Errorf("unexpected statusLine %q: %w", statusLine, err)
}

return active, total, size, nil
}

func evalRecoveryLine(recoveryLine string) (syncedBlocks int64, pct float64, finish float64, speed float64, err error) {
// Get count of completed vs. total blocks
matches := recoveryLineBlocksRE.FindStringSubmatch(recoveryLine)
if len(matches) != 2 {
return 0, 0, 0, 0, fmt.Errorf("unexpected recoveryLine matching syncedBlocks: %s", recoveryLine)
}
syncedBlocks, err = strconv.ParseInt(matches[1], 10, 64)
if err != nil {
return 0, 0, 0, 0, fmt.Errorf("error parsing int from recoveryLine %q: %w", recoveryLine, err)
}

// Get percentage complete
matches = recoveryLinePctRE.FindStringSubmatch(recoveryLine)
if len(matches) != 2 {
return 0, 0, 0, 0, fmt.Errorf("unexpected recoveryLine matching percentage: %s", recoveryLine)
}
pct, err = strconv.ParseFloat(matches[1], 64)
if err != nil {
return 0, 0, 0, 0, fmt.Errorf("error parsing float from recoveryLine %q: %w", recoveryLine, err)
}

// Get time expected left to complete
matches = recoveryLineFinishRE.FindStringSubmatch(recoveryLine)
if len(matches) != 2 {
return 0, 0, 0, 0, fmt.Errorf("unexpected recoveryLine matching est. finish time: %s", recoveryLine)
}
finish, err = strconv.ParseFloat(matches[1], 64)
if err != nil {
return 0, 0, 0, 0, fmt.Errorf("error parsing float from recoveryLine %q: %w", recoveryLine, err)
}

// Get recovery speed
matches = recoveryLineSpeedRE.FindStringSubmatch(recoveryLine)
if len(matches) != 2 {
return 0, 0, 0, 0, fmt.Errorf("unexpected recoveryLine matching speed: %s", recoveryLine)
}
speed, err = strconv.ParseFloat(matches[1], 64)
if err != nil {
return 0, 0, 0, 0, fmt.Errorf("error parsing float from recoveryLine %q: %w", recoveryLine, err)
}

return syncedBlocks, pct, finish, speed, nil
}

func evalComponentDevices(deviceFields []string) string {
mdComponentDevices := make([]string, 0)
if len(deviceFields) > 3 {
for _, field := range deviceFields[4:] {
match := componentDeviceRE.FindStringSubmatch(field)
if match == nil {
continue
}
mdComponentDevices = append(mdComponentDevices, match[1])
}
}

// Ensure no churn on tag ordering change
sort.Strings(mdComponentDevices)
return strings.Join(mdComponentDevices, ",")
}

func (k *MdstatConf) Gather(acc telegraf.Accumulator) error {
data, err := k.getProcMdstat()
if err != nil {
return err
}
lines := strings.Split(string(data), "\n")
for i, line := range lines {
if strings.TrimSpace(line) == "" || line[0] == ' ' || strings.HasPrefix(line, "Personalities") || strings.HasPrefix(line, "unused") {
continue
}
deviceFields := strings.Fields(line)
if len(deviceFields) < 3 {
return fmt.Errorf("not enough fields in mdline (expected at least 3): %s", line)
}
if len(lines) <= i+3 {
return fmt.Errorf("not enough lines following mdline (expected at least 3): %s", line)
}
mdName := deviceFields[0] // mdx
state := deviceFields[2] // active or inactive

// Failed disks have the suffix (F) & Spare disks have the suffix (S).
fail := int64(strings.Count(line, "(F)"))
spare := int64(strings.Count(line, "(S)"))

active, total, size, err := evalStatusLine(lines[i], lines[i+1])
if err != nil {
return fmt.Errorf("error parsing md device lines: %w", err)
}

syncLineIdx := i + 2
if strings.Contains(lines[i+2], "bitmap") { // skip bitmap line
syncLineIdx++
}

// If device is syncing at the moment, get the number of currently
// synced blocks, otherwise that number equals the size of the device.
syncedBlocks := size
speed := float64(0)
finish := float64(0)
pct := float64(0)
recovering := strings.Contains(lines[syncLineIdx], "recovery")
resyncing := strings.Contains(lines[syncLineIdx], "resync")
checking := strings.Contains(lines[syncLineIdx], "check")

// Append recovery and resyncing state info.
if recovering || resyncing || checking {
if recovering {
state = "recovering"
} else if checking {
state = "checking"
} else {
state = "resyncing"
}

// Handle case when resync=PENDING or resync=DELAYED.
if strings.Contains(lines[syncLineIdx], "PENDING") || strings.Contains(lines[syncLineIdx], "DELAYED") {
syncedBlocks = 0
} else {
syncedBlocks, pct, finish, speed, err = evalRecoveryLine(lines[syncLineIdx])
if err != nil {
return fmt.Errorf("error parsing sync line in md device %q: %w", mdName, err)
}
}
}
fields := map[string]interface{}{
"DisksActive": active,
"DisksFailed": fail,
"DisksSpare": spare,
"DisksTotal": total,
"BlocksTotal": size,
"BlocksSynced": syncedBlocks,
"BlocksSyncedPct": pct,
"BlocksSyncedFinishTime": finish,
"BlocksSyncedSpeed": speed,
}
tags := map[string]string{
"Name": mdName,
"ActivityState": state,
"Devices": evalComponentDevices(deviceFields),
}
acc.AddFields("mdstat", fields, tags)
}

return nil
}

func (k *MdstatConf) getProcMdstat() ([]byte, error) {
if _, err := os.Stat(k.statFile); os.IsNotExist(err) {
return nil, fmt.Errorf("mdstat: %s does not exist", k.statFile)
} else if err != nil {
return nil, err
}

data, err := ioutil.ReadFile(k.statFile)
if err != nil {
return nil, err
}

return data, nil
}

func init() {
inputs.Add("mdstat", func() telegraf.Input {
return &MdstatConf{
statFile: "/proc/mdstat",
}
})
}
3 changes: 3 additions & 0 deletions plugins/inputs/mdstat/mdstat_notlinux.go
@@ -0,0 +1,3 @@
// +build !linux

package mdstat