Panic error when telegraf starts #1322

Closed
njordr opened this issue Jun 3, 2016 · 10 comments · Fixed by #1327
Labels
bug unexpected problem or unintended behavior

Comments

@njordr

njordr commented Jun 3, 2016

Bug report

System info:

telegraf version: 0.13.1
system: red hat 5.2

Steps to reproduce:

  1. Start telegraf

Expected behavior:

Actual behavior:

The following error:
2016/06/03 11:04:43 Starting Telegraf (version 0.13.1)
2016/06/03 11:04:43 Loaded outputs: influxdb
2016/06/03 11:04:43 Loaded inputs: disk net kernel mem processes swap system cpu diskio
2016/06/03 11:04:43 Tags enabled: host=cw-spmi-oas05
2016/06/03 11:04:43 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"cw-spmi-oas05", Flush Interval:10s
panic: runtime error: index out of range

goroutine 69 [running]:
panic(0x107ebc0, 0xc82000e080)
/usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/shirou/gopsutil/disk.IOCounters(0x0, 0x0, 0x0)
/home/ubuntu/telegraf-build/src/github.com/shirou/gopsutil/disk/disk_linux.go:299 +0x823
github.com/influxdata/telegraf/plugins/inputs/system.(*systemPS).DiskIO(0x1c78550, 0x0, 0x0, 0x0)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/system/ps.go:118 +0x28
github.com/influxdata/telegraf/plugins/inputs/system.(*DiskIOStats).Gather(0xc8200116e0, 0x2aaaaac8b638, 0xc820132330, 0x0, 0x0)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/system/disk.go:104 +0x67
github.com/influxdata/telegraf/agent.gatherWithTimeout.func1(0xc8203be480, 0xc820011740, 0xc820132330)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:174 +0x73
created by github.com/influxdata/telegraf/agent.gatherWithTimeout
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:175 +0xe0

@sparrc
Contributor

sparrc commented Jun 3, 2016

Could you provide your config file? It looks like you can work around this for the time being by disabling the diskio plugin.

sparrc added the bug label (unexpected problem or unintended behavior) Jun 3, 2016
@njordr
Author

njordr commented Jun 3, 2016

This is my config file. For now I've disabled the diskio plugin so the other metrics are still gathered. Do you think it is possible to fix it?

Thanks
telegraf.zip

@sparrc
Contributor

sparrc commented Jun 3, 2016

Should be able to fix the panic, but I'm not quite sure I understand how this is happening.

Can you run the following commands on the system you are seeing this on?

cat /proc/diskstats

and

uname -a

@njordr
Author

njordr commented Jun 3, 2016

diskstats:

[root@cw-spmi-oas05 collectd]# cat /proc/diskstats
1 0 ram0 0 0 0 0 0 0 0 0 0 0 0
1 1 ram1 0 0 0 0 0 0 0 0 0 0 0
1 2 ram2 0 0 0 0 0 0 0 0 0 0 0
1 3 ram3 0 0 0 0 0 0 0 0 0 0 0
1 4 ram4 0 0 0 0 0 0 0 0 0 0 0
1 5 ram5 0 0 0 0 0 0 0 0 0 0 0
1 6 ram6 0 0 0 0 0 0 0 0 0 0 0
1 7 ram7 0 0 0 0 0 0 0 0 0 0 0
1 8 ram8 0 0 0 0 0 0 0 0 0 0 0
1 9 ram9 0 0 0 0 0 0 0 0 0 0 0
1 10 ram10 0 0 0 0 0 0 0 0 0 0 0
1 11 ram11 0 0 0 0 0 0 0 0 0 0 0
1 12 ram12 0 0 0 0 0 0 0 0 0 0 0
1 13 ram13 0 0 0 0 0 0 0 0 0 0 0
1 14 ram14 0 0 0 0 0 0 0 0 0 0 0
1 15 ram15 0 0 0 0 0 0 0 0 0 0 0
104 0 cciss/c0d0 2737421 2213708 88016887 12784131 160086414 459980224 4960729244 2166799971 0 500048188 2179599478
104 1 cciss/c0d0p1 1059 2128 2 4
104 2 cciss/c0d0p2 4950321 88012127 620087068 665726208
253 0 dm-0 3168023 0 73756330 15505428 617718557 0 4941748456 1614437555 0 499197782 1629981132
253 1 dm-1 1782220 0 14257760 7890375 2372598 0 18980784 399180543 0 1330039 407111778
9 0 md0 0 0 0 0 0 0 0 0 0 0 0

uname

Linux cw-spmi-oas05 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Thanks

@sparrc
Contributor

sparrc commented Jun 3, 2016

FWIW, your OS was released over 8 years ago. Is there a reason you haven't updated to Red Hat 5.10? It's possible this is a bug in your kernel.

Specifically, do you know why these lines don't have 14 fields?

104 1 cciss/c0d0p1 1059 2128 2 4
104 2 cciss/c0d0p2 4950321 88012127 620087068 665726208
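
The field-count question above can be checked mechanically. The following is a minimal Go sketch, not code from telegraf or gopsutil: `shortLines`, `expectedFields`, and `sample` are hypothetical names introduced here, and the sample is a subset of the `/proc/diskstats` output posted earlier in this thread. It reports every line that has fewer than 14 whitespace-separated columns.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// expectedFields is the column count of a full /proc/diskstats line in this
// thread: major, minor, device name, plus 11 counters.
const expectedFields = 14

// sample is a subset of the /proc/diskstats output posted in this issue.
const sample = `104 0 cciss/c0d0 2737421 2213708 88016887 12784131 160086414 459980224 4960729244 2166799971 0 500048188 2179599478
104 1 cciss/c0d0p1 1059 2128 2 4
104 2 cciss/c0d0p2 4950321 88012127 620087068 665726208
253 0 dm-0 3168023 0 73756330 15505428 617718557 0 4941748456 1614437555 0 499197782 1629981132`

// shortLines maps the device name of every line with fewer than
// expectedFields columns to the number of columns it actually has.
// Hypothetical helper for illustration; it is not part of gopsutil.
func shortLines(diskstats string) map[string]int {
	short := make(map[string]int)
	scanner := bufio.NewScanner(strings.NewReader(diskstats))
	for scanner.Scan() {
		fields := strings.Fields(scanner.Text())
		// Require at least major, minor, and name so fields[2] is safe.
		if len(fields) >= 3 && len(fields) < expectedFields {
			short[fields[2]] = len(fields)
		}
	}
	return short
}

func main() {
	for name, n := range shortLines(sample) {
		fmt.Printf("%s: %d fields (expected %d)\n", name, n, expectedFields)
	}
}
```

On this sample, only the two partition lines (`cciss/c0d0p1` and `cciss/c0d0p2`) are flagged, each with 7 columns; the whole-disk and device-mapper lines have the full 14.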

@sparrc
Contributor

sparrc commented Jun 3, 2016

Looks like this is fixed in RHEL 5.3: https://bugzilla.redhat.com/show_bug.cgi?id=583285

I'm inclined to close this as "won't fix" unless you have a compelling reason not to upgrade.

@njordr
Author

njordr commented Jun 3, 2016

Sadly I cannot upgrade but I understand that I could be the only one to have it :(

Thanks

@sparrc
Contributor

sparrc commented Jun 3, 2016

I think it will actually be a fairly easy fix, but it requires a patch to a dependency of telegraf (gopsutil). I'll keep this issue open, but I'm not sure I'll be able to get it into the next release.

sparrc added a commit to sparrc/gopsutil that referenced this issue Jun 3, 2016
Old kernels have a bug in diskstats where lines can have less than 14
fields. This applies to the kernel present in RHEL 5.2 and earlier.

It's a bit of a niche but probably best to patch to be safe from future
bugs too.

RHEL bug case:
https://bugzilla.redhat.com/show_bug.cgi?id=583285

Encountered in Telegraf:
influxdata/telegraf#1322
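
The kind of guard the commit message describes can be sketched as follows. This is an illustration under stated assumptions, not the actual gopsutil patch, whose contents are not shown in this thread: `ioCounters` and `parseDiskstats` are hypothetical names, only two counters are parsed, and short lines are simply skipped rather than partially parsed.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ioCounters is a hypothetical, trimmed-down stand-in for a per-device
// counter struct; only two counters are kept for illustration.
type ioCounters struct {
	ReadCount  uint64
	WriteCount uint64
}

// sample is a subset of the /proc/diskstats output posted in this issue,
// including one of the short partition lines.
const sample = `104 0 cciss/c0d0 2737421 2213708 88016887 12784131 160086414 459980224 4960729244 2166799971 0 500048188 2179599478
104 1 cciss/c0d0p1 1059 2128 2 4
253 0 dm-0 3168023 0 73756330 15505428 617718557 0 4941748456 1614437555 0 499197782 1629981132`

// parseDiskstats parses /proc/diskstats content. Lines with fewer than 14
// fields are skipped, so no index past the end of the slice is ever read.
func parseDiskstats(content string) (map[string]ioCounters, error) {
	result := make(map[string]ioCounters)
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 14 {
			// Short or empty line, as seen on the RHEL 5.2 kernel: skip it.
			continue
		}
		name := fields[2]
		// fields[3] is reads completed, fields[7] is writes completed.
		reads, err := strconv.ParseUint(fields[3], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("device %s: reads: %w", name, err)
		}
		writes, err := strconv.ParseUint(fields[7], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("device %s: writes: %w", name, err)
		}
		result[name] = ioCounters{ReadCount: reads, WriteCount: writes}
	}
	return result, nil
}

func main() {
	stats, err := parseDiskstats(sample)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	for name, c := range stats {
		fmt.Printf("%s: reads=%d writes=%d\n", name, c.ReadCount, c.WriteCount)
	}
}
```

Skipping short lines trades completeness for safety: the affected partitions report no metrics, but the collector keeps running instead of panicking with an index out of range.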
@sparrc
Contributor

sparrc commented Jun 3, 2016

The fix was merged, so I will close this shortly.

@njordr
Author

njordr commented Jun 3, 2016

Great, thanks

sparrc added a commit that referenced this issue Jun 3, 2016