CPU Temperature Displaying as 400M Degrees Centigrade for FreeNAS/FreeBSD #945

Closed
Maelos opened this issue May 14, 2018 · 17 comments

@Maelos

Maelos commented May 14, 2018

More info can be found here: https://groups.google.com/forum/#!topic/prometheus-users/MjA77maIz5o

Host operating system: output of uname -a

11.1 Stable FreeBSD

node_exporter version: output of node_exporter --version

The command does not really show a node_exporter version, which may be part of the problem, but the exporter does work otherwise.
The command does show Go version 1.10.1.

node_exporter command line flags

default

Are you running node_exporter in Docker?

no

What did you do that produced an error?

Ran node_exporter normally after building it using https://blog.yo61.com/installing-prometheus-node_exporter-on-freenas/

What did you expect to see?

A temperature within normal limits

What did you see instead?

A CPU temperature reported as 400,000,000+ degrees Centigrade.

Maelos changed the title from "CPU Temperature Displaying as 400M Degrees Centigrade" to "CPU Temperature Displaying as 400M Degrees Centigrade for FreeNAS/FreeBSD" on May 14, 2018
@SuperQ
Member

SuperQ commented May 14, 2018

Strange, this value comes from the sysctl dev.cpu.X.temperature. Do you get a correct result from sysctl -a | grep temperature?

@Maelos
Author

Maelos commented May 14, 2018

Thank you for the speedy reply. No, unfortunately. This is what I get from that grep:

image

@SuperQ
Member

SuperQ commented May 14, 2018

Can you add a debug log of the raw value to the exporter? Something like this after line 129 (where unix.SysctlUint32 is).

log.Infof("Raw temp value cpu %d value %d", cpu, temp)

This will let us see what the sysctl is returning. If it turns out to be something simple, we can special-case the returned result.

@Maelos
Author

Maelos commented May 14, 2018

I will see what I can do. I am getting back into the swing of developing, and getting everything ported over to FreeNAS properly has been a challenge. So far the only way I have gotten it to work is through the FreeBSD Ports -> make -> run. I will see if I can take the fresh code and build a proper binary (with your adjustments).

@SuperQ
Member

SuperQ commented May 14, 2018

Yes, the FreeBSD builds are difficult. We have been slowly working on an official build pipeline, but it's not ready yet.

@Maelos
Author

Maelos commented May 14, 2018

This may be a well-known problem, but I am getting a bunch of "case-insensitive import collisions" that are preventing the cross build. I have looked up a few similar issues and am trying to work through them now.

golang/dep#433
https://stackoverflow.com/questions/43618860/how-to-handle-case-sensitive-import-collisions

I fixed that by lowercasing "Prometheus" in the path /src/github/prometheus/... I can compile for Windows just fine, but when I switch GOOS to freebsd and build (go build -v node_exporter.go) I get

# github.com/prometheus/node_exporter/collector
collector\boot_time_bsd.go:23:41: undefined: bsdSysctl

which is...

type bootTimeCollector struct{ boottime bsdSysctl }

func init() {
	registerCollector("boottime", defaultEnabled, newBootTimeCollector)
}

// newBootTimeCollector returns a new Collector exposing system boot time on BSD systems.
func newBootTimeCollector() (Collector, error) {
	return &bootTimeCollector{
		boottime: bsdSysctl{
			name:        "boot_time_seconds",
			description: "Unix time of last boot, including microseconds.",
			mib:         "kern.boottime",
			dataType:    bsdSysctlTypeStructTimeval,
		},
	}, nil
}

I'm not quite sure where to go from there. I know that is not presenting a solution, but the deeper I dig the cloudier it seems to get. Any suggestions? Am I somehow getting old or incorrect code from go get pulls?

@brian-brazil
Contributor

-1 read as a uint32 would be about 4.29B (4,294,967,295), and our current code divides that by 10 (I presume the value is meant to be decikelvin), which would give roughly 429M, consistent with the 400M+ reading.

@Maelos
Author

Maelos commented May 14, 2018

Well then Prometheus is working, but sysctl is not. Odd. Hello again Brian - I just emailed you recently about this same problem.

@SuperQ
Member

SuperQ commented May 14, 2018

I guess we should be reading the temp sysctl with int32, not uint32 and checking for negative values.

EDIT: After looking into it, there doesn't seem to be an int32 function. :(

@Maelos
Author

Maelos commented May 15, 2018

I tried to work around this by using IPMI with a custom exporter. I know there is already a more complete IPMI exporter, but I want to build this out once I have the IPMI section working (next up is using smartctl for temps and other drive stats). How does this look as a solution for FreeBSD (albeit only applicable to those with IPMI access)? https://github.com/Maelos/freenascollector

@Maelos
Author

Maelos commented May 16, 2018

I neglected to mention that FreeNAS is running as a VM on ESXi 6.5. This is likely the source of the problem. I am, however, making progress with a custom collector or exporter that will use IPMI for the CPU temp.

@SuperQ
Member

SuperQ commented May 16, 2018

FYI, @beorn7 is working on an open source generic IPMI exporter. 👉 :shipit: 👈 😜

@Maelos
Author

Maelos commented May 16, 2018

While this is true, and his work seems far better than what I can do at this time, I am making this as a base script that I will flesh out to pull other FreeNAS data. FreeNAS is the only VM that can get access to both the board information (via IPMI) and the drives' information (the HBA is passed through directly to FreeNAS). I could use what already works for IPMI, but I would still need to get the drive part working. I think the drive part will be easier, or I hope it will, but I was going to start with the CPU because I was hoping to test the memory (MemTest) before loading in the drives and doing burn-in tests/monitoring with them.

For what it's worth, I logged into my FreeBSD VM and was unable to get a temperature reading with sysctl either, so I think the VM/FreeBSD may be part of the problem. Luckily, like @beorn7, I have access to IPMI, so I can go that route.

@derekmarcotte
Contributor

@SuperQ: I can take a peek. Off the cuff, I don't think it's a hard fix. Please feel free to tag me on any FreeBSD issues; I'm happy to lend a hand where I can.

@Maelos: There is the ipmitool exporter in the text collector examples. Dunno if it helps your situation?

@SuperQ
Member

SuperQ commented Jun 5, 2018

@derekmarcotte Thanks for taking a look

SoundCloud published their official IPMI exporter: https://github.com/soundcloud/ipmi_exporter

@derekmarcotte
Contributor

@Maelos, dunno if you want to try the patch for this. I don't have a machine reporting the value, but the patch makes sense to me.

You will not be able to cross-build, because of dependencies on the FreeBSD C runtime. If you run a FreeBSD machine (or virtual machine), you should be able to do a native build. You'll need a few packages installed there: https://github.com/derekmarcotte/kitefactory/blob/master/src/ansible/roles/freebsd/tasks/main.yml#L15 I might be able to link to a binary, but perhaps PM me for that, as I'm not sure of the etiquette.

derekmarcotte added a commit to derekmarcotte/node_exporter that referenced this issue Jun 5, 2018
Added a type conversion to cpu temperature sysctl.  Will still
collect/report -1 when the value is -1, this is because it should be up
to interpretation whether this is the correct value for the system or
not.

Some drivers will report -1 for cpu temperature.  Other sensors will
report "an input into the fan control algorithm", i.e. not the actual
temperature, but how much fan it wants.  Some people cool their machines
with liquid nitrogen.

Signed-off-by: Derek Marcotte <[email protected]>
@Maelos
Author

Maelos commented Jun 5, 2018

I ended up going a different route but I will take a look at this. Thank you.

SuperQ pushed a commit that referenced this issue Jun 7, 2018
* Fix for #945, cpu temperature is signed.

oblitorum pushed a commit to shatteredsilicon/node_exporter that referenced this issue Apr 9, 2024
* Fix for prometheus#945, cpu temperature is signed.
