Non standard device accessors such as -d cciss,N do not work #26
Comments
Hello! |
I just hit the same issue on some Dells. I need: I'm guessing the Go code passes the device directly through as one argument. It may need a separate flag for additional command-line switches to pass to smartctl. |
@jacqueslorentz - no, we already have a solution which covers all of these edge cases written by the mighty @daviswr as part of our Zenoss environment. All of the invocation semantics are in his ZenPack code if anyone enjoys Go enough to muck through adaptation - if it ever becomes enough of an issue to allocate time on our end, I'll rewrite the functionality in Rust 😄. |
@sempervictus, it seems this issue may be closed; anyway, the config file is now unsupported 🙂 |
It's still an issue. There is no command-line switch that makes this work. Please reopen. |
@kfox1111 can you try with the latest version, add the debug logs, etc.? |
I had a similar issue with
FROM docker.io/library/golang:bullseye as builder
ARG SMARTCTL_EXPORTER_VERSION=0.9.0
WORKDIR /tmp
RUN git clone --branch v${SMARTCTL_EXPORTER_VERSION} --depth 1 --single-branch https://github.com/prometheus-community/smartctl_exporter.git
COPY smartctl.patch /tmp/smartctl.patch
RUN apt-get -y update && apt-get -y install --no-install-recommends \
patch && \
rm -rf /var/lib/apt/lists/*
RUN cd /tmp/smartctl_exporter && \
patch readjson.go < /tmp/smartctl.patch && \
make build && \
mv /tmp/smartctl_exporter/smartctl_exporter /usr/local/bin/smartctl_exporter && \
chmod +x /usr/local/bin/smartctl_exporter
FROM docker.io/library/debian:bullseye-slim
RUN apt-get -y update && apt-get -y install --no-install-recommends \
smartmontools && \
rm -rf /var/lib/apt/lists/*
COPY --from=builder --chown=root:root /usr/local/bin/smartctl_exporter /usr/local/bin/smartctl_exporter
EXPOSE 9633
ENTRYPOINT [ "/usr/local/bin/smartctl_exporter" ]
CMD [ "--version" ]
--- org 2022-10-21 13:01:40.379579891 +0200
+++ fix 2022-10-21 13:02:27.198645801 +0200
@@ -63,7 +63,7 @@
// Get json from smartctl and parse it
func readSMARTctl(logger log.Logger, device string) (gjson.Result, bool) {
level.Debug(logger).Log("msg", "Collecting S.M.A.R.T. counters", "device", device)
- out, err := exec.Command(*smartctlPath, "--json", "--info", "--health", "--attributes", "--tolerance=verypermissive", "--nocheck=standby", "--format=brief", device).Output()
+ out, err := exec.Command(*smartctlPath, "--device=sat", "--json", "--info", "--health", "--attributes", "--tolerance=verypermissive", "--nocheck=standby", "--format=brief", device).Output()
if err != nil {
level.Warn(logger).Log("msg", "S.M.A.R.T. output reading", "err", err)
}
@@ -75,7 +75,7 @@
func readSMARTctlDevices(logger log.Logger) gjson.Result {
level.Debug(logger).Log("msg", "Scanning for devices")
- out, err := exec.Command(*smartctlPath, "--json", "--scan").Output()
+ out, err := exec.Command(*smartctlPath, "--device=sat", "--json", "--scan").Output()
if exiterr, ok := err.(*exec.ExitError); ok {
level.Debug(logger).Log("msg", "Exit Status", "exit_code", exiterr.ExitCode())
// The smartctl command returns 2 if devices are sleeping, ignore this error. |
Part of my refactoring for the flag controls was to allow for easier adjustment of flags to the various smartctl commands. |
Hello, any news? I need to monitor the disks behind a hardware RAID controller. What arguments would define the disks? In my case, I can see smartctl results with the following command.
Thank you, |
I heard nothing and gave up on this issue. I don't know why such an elementary issue is not being solved or even addressed.
Btw, GitHub issue #89 here relates to a similar problem... |
@josefzahner This is a community project, nobody gets paid to fix this. You're welcome to open a PR to improve things. |
@SuperQ I fully get this. But please understand my disappointment as well: this thing is called "smartctl exporter" and it doesn't support a very important smartctl use case (RAID controllers with the -d flag). There are multiple bug tickets about this topic here on GitHub, and it seems to have been ignored for more than a year. This project and the community can do whatever they want, and because of that we switched to a solution which works (telegraf). Sorry, but I'm not familiar with Go, so I can't support the project here; that's why I posted the telegraf config above. |
Why do you think this is important? For decades, many companies have run all their RAID on Linux MD (Multiple Devices), and that is supported by node_exporter. For this exporter it is much more important that raw devices such as SATA and NVMe are supported. |
You are referring to Linux md (aka software RAID), right? I'm referring to hardware RAID, which doesn't use MD at all. Please check my command above: we have e.g. HP RAID controllers, and the disks can't be read with smartctl exporter because the "-d" feature is not supported. The smartctl scan options don't show anything without the -d flag. |
(some?) Dell controllers, even when just passing the drive through, don't work without manually specifying the -d flag to smartctl. I can't get them to work with prometheus-smartctl-exporter as is. |
Dell controllers work flawlessly in HBA mode, for example:
Product Name = PERC H330 Mini
Serial Number = 8153784
SAS Address = 5588a5a0eb40f400
PCI Address = 00:03:00:00
System Time = 12/20/2022 02:08:59
Mfg. Date = 01/09/18
Controller Time = 12/19/2022 19:08:21
FW Package Build = 25.5.9.0001
BIOS Version = 6.33.01.0_4.19.08.00_0x06120304
FW Version = 4.300.01-8369
Driver Name = megaraid_sas
Driver Version = 07.719.03.00-rh1
Current Personality = HBA-Mode
Vendor Id = 0x1000
Device Id = 0x5F
SubVendor Id = 0x1028
SubDevice Id = 0x1F4B
Host Interface = PCI-E
Device Interface = SAS-12G
Bus Number = 3
Device Number = 0
Function Number = 0
Domain ID = 0
Security Protocol = None
JBOD Drives = 13
The drives:
[root@host]# lsscsi
[0:0:0:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sda
[0:0:1:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sdb
[0:0:2:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sdc
[0:0:3:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sdd
[0:0:4:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sde
[0:0:5:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sdf
[0:0:6:0] disk ATA TOSHIBA HDWR160 0603 /dev/sdg
[0:0:7:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sdh
[0:0:8:0] disk ATA TOSHIBA HDWR160 0603 /dev/sdi
[0:0:9:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sdj
[0:0:10:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sdk
[0:0:11:0] disk ATA TOSHIBA HDWE160 FS2A /dev/sdl
[0:0:13:0] disk ATA ADATA SP600 2.9 /dev/sdm
[0:0:32:0] enclosu DP BP13G+EXP 3.35 -
[N:0:0:1] disk INTEL SSDPEDMD400G4__1 /dev/nvme0n1
[N:1:35:1] disk HUSMR7632BHP301__1 /dev/nvme1n1
[N:2:0:1] disk INTEL SSDPEDMD400G4__1 /dev/nvme2n1
[N:3:35:1] disk HUSMR7632BHP301__1 /dev/nvme3n1
[N:4:35:1] disk HUSMR7632BHP301__1 /dev/nvme4n1
The exporter at work:
[root@host /]# curl -Ss http://localhost:9633/metrics | grep smartctl_device\{ -c
18 |
That's really nice that it works in your example, @k0ste. It tells us that some hardware RAIDs work without the "-d" flag and some do not. |
Btw, your output doesn't prove a fully working solution. We have 2 SSDs, and we get the following output:
just think about it. |
Some cards do work without the '-d' flag. Not all combinations of firmware, card, and kernel do, though. It seems like -d needs to be supported by smartctl-exporter for maximum compatibility. |
All for you 🙂
smartctl_device{ata_additional_product_id="unknown",ata_version="",device="nvme0",firmware_version="8DV101H0",form_factor="",interface="nvme",model_family="",model_name="INTEL SSDPEDMD400G4",protocol="NVMe",sata_version="",serial_number="CVFT7206004H400LGN"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="",device="nvme1",firmware_version="KNGNP100",form_factor="",interface="nvme",model_family="",model_name="HUSMR7632BHP301",protocol="NVMe",sata_version="",serial_number="SDM0000211CD"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="",device="nvme2",firmware_version="8DV101H0",form_factor="",interface="nvme",model_family="",model_name="INTEL SSDPEDMD400G4",protocol="NVMe",sata_version="",serial_number="CVFT4324002Q400BGN"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="",device="nvme3",firmware_version="KNGND100",form_factor="",interface="nvme",model_family="",model_name="HUSMR7632BHP301",protocol="NVMe",sata_version="",serial_number="SDM00003A532"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="",device="nvme4",firmware_version="KNGND100",form_factor="",interface="nvme",model_family="",model_name="HUSMR7632BHP301",protocol="NVMe",sata_version="",serial_number="SDM00003A513"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ACS-2 (minor revision not indicated)",device="sdm",firmware_version="2.9",form_factor="< 1.8 inches",interface="sat",model_family="JMicron based SSDs",model_name="ADATA SP600",protocol="ATA",sata_version="SATA 3.1",serial_number="7F1020004374"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ACS-3 T13/2161-D revision 5",device="sdg",firmware_version="0603",form_factor="3.5 inches",interface="sat",model_family="",model_name="TOSHIBA HDWR160",protocol="ATA",sata_version="SATA 3.3",serial_number="90C0A045FDKG"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ACS-3 T13/2161-D revision 5",device="sdi",firmware_version="0603",form_factor="3.5 inches",interface="sat",model_family="",model_name="TOSHIBA HDWR160",protocol="ATA",sata_version="SATA 3.3",serial_number="9090A0AUFBNG"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sda",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="773XK11MF56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sdb",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="Z79BK3HTF56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sdc",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="X7C3KCF3F56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sdd",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="X7F2KFTXF56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sde",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="774DK00EF56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sdf",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="778YK17DF56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sdh",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="7747Y06LF56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sdj",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="X7G2KFUJF56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sdk",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="X7HFK5AIF56D"} 1
smartctl_device{ata_additional_product_id="unknown",ata_version="ATA8-ACS (minor revision not indicated)",device="sdl",firmware_version="FS2A",form_factor="3.5 inches",interface="sat",model_family="Toshiba X300",model_name="TOSHIBA HDWE160",protocol="ATA",sata_version="SATA 3.0",serial_number="X7HAKD5ZF56D"} 1 |
Hm, your output lacks everything that is important to us: no temperature, no failure logs, no power-on time, nothing. Here is the output of smartctl (one disk)
and here the prometheus output:
I don't know what other people's requirements are, but we want to see as much of smartctl's output as possible. This doesn't seem to be the case in your example. |
args: pod won't stay running:
|
Ok, that explains it. The funniest thing is that smartctl itself tells you to try the "-d" option :-D. So I think you got the point. I showed above what we would expect; today we are far away from that with smartctl-exporter. In our case the exporter doesn't crash, it just shows nothing. |
So, what needs to happen to get the -d option supported? |
Someone needs to open a feature PR to extend the command line flags passed in various parts of the exporter. |
Allow passing a custom `-d` / `--device=` flag to smartctl. The default is the same (`auto`) as upstream smartctl. Fixes: #26 Signed-off-by: SuperQ <[email protected]>
I added some notes to the above PR. As it currently stands I do not think it completely addresses this issue.
I personally do favor software RAID over HBA/embedded RAID, but that doesn't change the fact that many of us are stuck with legacy HP, LSI, Adaptec, Areca, etc. HBA RAID volumes. Here's the comment I left on the above PR:
Followup to @darxriggs' notes above: It is entirely possible, and not that uncommon, for a given system to present both LSI HBA virtual devices and passed-through individual drives. E.g. the first two drives are mirrored with the HBA and need -d megaraid,0 and -d megaraid,1, but the balance of the drives are passed through and are accessed as /dev/sdXX. On at least some HBAs with some personality / mode settings, any physical drives that aren't part of a VD are passed through. While I personally am not a fan of HBA RAID or RoC HBAs in general, we're still stuck with legacies, and some organizations embrace HBA RAID.
Ideally the exporter would iterate smartctl invocations over the results from smartctl --scan, invoking as necessary for each entry. Having to hardcode devices and access flags in a config file or on the command line would be rather inconvenient and prone to errors -- that would require external scripting to discover devices and keep that mapping updated across inventory changes.
To the note about adding a type label for, let's call them, occulted devices: I get where the suggested scheme is coming from. I don't think that's the ideal approach, though. type as a label is IMHO a misnomer; this isn't a type but rather a subunit in the vein of a LUN. I'd rather see something like device=sda,0 or device=sda.0 or device=megaraid0.0. It would be awkward to have to correlate labels for some but not all metrics, or to have to devise complex relabeling rules at Prometheus ingest.
Notes:
|
The service fails on InnovaHosting servers due to HBA controller: prometheus-community/smartctl_exporter#26 Signed-off-by: Jakub Sokołowski <[email protected]>
When using HPSA controllers, smartctl needs to access devices via the -d cciss,<idx> parameter. However using the following:
results in
My Go-fu isn't great (Rust FTW! :-p), and there's a fair amount of indirection through which I need to dig to resolve that locally. If it's not too much trouble, would it be possible for the functionality to be implemented upstream?