Panic error when telegraf starts #1322

njordr · 2016-06-03T09:22:58Z

Bug report

System info:

telegraf version: 0.13.1
system: red hat 5.2

Steps to reproduce:

Start telegraf

Expected behavior:

Actual behavior:

The following error:
2016/06/03 11:04:43 Starting Telegraf (version 0.13.1)
2016/06/03 11:04:43 Loaded outputs: influxdb
2016/06/03 11:04:43 Loaded inputs: disk net kernel mem processes swap system cpu diskio
2016/06/03 11:04:43 Tags enabled: host=cw-spmi-oas05
2016/06/03 11:04:43 Agent Config: Interval:10s, Debug:false, Quiet:false, Hostname:"cw-spmi-oas05", Flush Interval:10s
panic: runtime error: index out of range

goroutine 69 [running]:
panic(0x107ebc0, 0xc82000e080)
/usr/local/go/src/runtime/panic.go:481 +0x3e6
github.com/shirou/gopsutil/disk.IOCounters(0x0, 0x0, 0x0)
/home/ubuntu/telegraf-build/src/github.com/shirou/gopsutil/disk/disk_linux.go:299 +0x823
github.com/influxdata/telegraf/plugins/inputs/system.(*systemPS).DiskIO(0x1c78550, 0x0, 0x0, 0x0)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/system/ps.go:118 +0x28
github.com/influxdata/telegraf/plugins/inputs/system.(*DiskIOStats).Gather(0xc8200116e0, 0x2aaaaac8b638, 0xc820132330, 0x0, 0x0)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/plugins/inputs/system/disk.go:104 +0x67
github.com/influxdata/telegraf/agent.gatherWithTimeout.func1(0xc8203be480, 0xc820011740, 0xc820132330)
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:174 +0x73
created by github.com/influxdata/telegraf/agent.gatherWithTimeout
/home/ubuntu/telegraf-build/src/github.com/influxdata/telegraf/agent/agent.go:175 +0xe0

sparrc · 2016-06-03T09:28:47Z

Could you provide your config file? It looks like you can work around this for the time being by disabling the diskio plugin.

njordr · 2016-06-03T09:32:07Z

This is my config file. For now I've disabled the diskio plugin so that the other metrics are still gathered. Do you think it is possible to fix it?

Thanks
telegraf.zip

sparrc · 2016-06-03T09:55:10Z

Should be able to fix the panic, but I'm not quite sure I understand how this is happening.

Can you run the following commands on the system you are seeing this on?

cat /proc/diskstats

and

uname -a

njordr · 2016-06-03T09:59:08Z

diskstats:

[root@cw-spmi-oas05 collectd]# cat /proc/diskstats
1 0 ram0 0 0 0 0 0 0 0 0 0 0 0
1 1 ram1 0 0 0 0 0 0 0 0 0 0 0
1 2 ram2 0 0 0 0 0 0 0 0 0 0 0
1 3 ram3 0 0 0 0 0 0 0 0 0 0 0
1 4 ram4 0 0 0 0 0 0 0 0 0 0 0
1 5 ram5 0 0 0 0 0 0 0 0 0 0 0
1 6 ram6 0 0 0 0 0 0 0 0 0 0 0
1 7 ram7 0 0 0 0 0 0 0 0 0 0 0
1 8 ram8 0 0 0 0 0 0 0 0 0 0 0
1 9 ram9 0 0 0 0 0 0 0 0 0 0 0
1 10 ram10 0 0 0 0 0 0 0 0 0 0 0
1 11 ram11 0 0 0 0 0 0 0 0 0 0 0
1 12 ram12 0 0 0 0 0 0 0 0 0 0 0
1 13 ram13 0 0 0 0 0 0 0 0 0 0 0
1 14 ram14 0 0 0 0 0 0 0 0 0 0 0
1 15 ram15 0 0 0 0 0 0 0 0 0 0 0
104 0 cciss/c0d0 2737421 2213708 88016887 12784131 160086414 459980224 4960729244 2166799971 0 500048188 2179599478
104 1 cciss/c0d0p1 1059 2128 2 4
104 2 cciss/c0d0p2 4950321 88012127 620087068 665726208
253 0 dm-0 3168023 0 73756330 15505428 617718557 0 4941748456 1614437555 0 499197782 1629981132
253 1 dm-1 1782220 0 14257760 7890375 2372598 0 18980784 399180543 0 1330039 407111778
9 0 md0 0 0 0 0 0 0 0 0 0 0 0

uname

Linux cw-spmi-oas05 2.6.18-92.el5 #1 SMP Tue Apr 29 13:16:15 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux

Thanks

sparrc · 2016-06-03T10:01:21Z

FWIW, your OS was released over 8 years ago. Is there a reason you haven't updated to Red Hat 5.10? It's possible this is a bug in your kernel.

Specifically, do you know why these lines don't have 14 fields?

104 1 cciss/c0d0p1 1059 2128 2 4
104 2 cciss/c0d0p2 4950321 88012127 620087068 665726208

sparrc · 2016-06-03T10:05:13Z

Looks like this is fixed in RHEL 5.3: https://bugzilla.redhat.com/show_bug.cgi?id=583285

I'm inclined to close this as "won't fix" unless you have a compelling reason not to upgrade

njordr · 2016-06-03T10:06:32Z

Sadly I cannot upgrade but I understand that I could be the only one to have it :(

Thanks

sparrc · 2016-06-03T10:10:05Z

I think it will actually be fairly easy to fix, but it requires a patch to a dependency of telegraf (gopsutil). I'll keep this issue open, but I'm not sure I'll be able to get it into the next release.

Old kernels have a bug in diskstats where lines can have fewer than 14 fields. This applies to the kernel present in RHEL 5.2 and earlier. It's a bit of a niche case, but it's probably best to patch it to be safe from future bugs too.
RHEL bug case: https://bugzilla.redhat.com/show_bug.cgi?id=583285
Encountered in Telegraf: influxdata/telegraf#1322
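
For anyone following along, here is a minimal sketch of the kind of guard described above: check the field count before indexing into a diskstats line. This is not the actual gopsutil patch. The names `ioStat` and `parseDiskstats` are hypothetical, and only two of the counters are parsed, to keep the example short.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// ioStat holds a small subset of the per-device counters in /proc/diskstats.
// (Hypothetical type for illustration; gopsutil uses its own struct.)
type ioStat struct {
	ReadCount  uint64 // 4th column: reads completed
	WriteCount uint64 // 8th column: writes completed
}

// parseDiskstats parses the text of /proc/diskstats and returns counters
// keyed by device name. Lines with fewer than 14 fields, such as the
// partition lines emitted by old kernels, are skipped instead of indexed.
func parseDiskstats(content string) map[string]ioStat {
	stats := make(map[string]ioStat)
	for _, line := range strings.Split(content, "\n") {
		fields := strings.Fields(line)
		if len(fields) < 14 {
			// Short or empty line: indexing past the end would panic.
			continue
		}
		reads, err := strconv.ParseUint(fields[3], 10, 64)
		if err != nil {
			continue
		}
		writes, err := strconv.ParseUint(fields[7], 10, 64)
		if err != nil {
			continue
		}
		stats[fields[2]] = ioStat{ReadCount: reads, WriteCount: writes}
	}
	return stats
}

func main() {
	// Sample lines taken from the diskstats output posted in this issue.
	sample := "104 0 cciss/c0d0 2737421 2213708 88016887 12784131 160086414 459980224 4960729244 2166799971 0 500048188 2179599478\n" +
		"104 1 cciss/c0d0p1 1059 2128 2 4\n" +
		"253 0 dm-0 3168023 0 73756330 15505428 617718557 0 4941748456 1614437555 0 499197782 1629981132\n"
	for name, s := range parseDiskstats(sample) {
		fmt.Printf("%s reads=%d writes=%d\n", name, s.ReadCount, s.WriteCount)
	}
}
```

With this approach the short `cciss/c0d0p1` line is dropped and the two full-length lines are still reported. The trade-off is that partitions on affected kernels produce no metrics at all, which seems preferable to a panic.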

sparrc · 2016-06-03T13:03:27Z

The fix was merged, so I will close this shortly.


njordr · 2016-06-03T13:26:44Z

Great, thanks


sparrc added the bug unexpected problem or unintended behavior label Jun 3, 2016

njordr mentioned this issue Jun 3, 2016

Error on net stats #1323

Closed

sparrc mentioned this issue Jun 3, 2016

Fix potential panic in linux disk IO counters shirou/gopsutil#207

Merged

sparrc added a commit that referenced this issue Jun 3, 2016

Fix rare panic in RHEL 5.2 diskio plugin

068bb85

closes #1322

sparrc mentioned this issue Jun 3, 2016

Fix rare panic in RHEL 5.2 diskio plugin #1327

Merged

1 task

sparrc closed this as completed in #1327 Jun 3, 2016

sparrc added a commit that referenced this issue Jun 3, 2016

Fix rare panic in RHEL 5.2 diskio plugin (#1327)

8c3d7cd

closes #1322
