[1.2.0] panic: interface conversion: tsm1.Value is tsm1.IntegerValue, not tsm1.FloatValue #8085

Closed
ranjithruban opened this issue Mar 2, 2017 · 15 comments

@ranjithruban

ranjithruban commented Mar 2, 2017

Hello

Using InfluxDB 1.2.0, we occasionally hit the panic below.

Mar  2 11:32:52 localhost influxd[3030]: [I] 2017-03-02T10:32:52Z SELECT * FROM telegraf.autogen.consul LIMIT 1 service=query
Mar  2 11:32:52 localhost influxd[3030]: panic: interface conversion: tsm1.Value is tsm1.IntegerValue, not tsm1.FloatValue
Mar  2 11:32:52 localhost influxd[3030]: goroutine 4191486 [running]:
Mar  2 11:32:52 localhost influxd[3030]: panic(0xa081a0, 0xc428601a40)
Mar  2 11:32:52 localhost influxd[3030]: /usr/local/go/src/runtime/panic.go:500 +0x1a1
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).peekCache(0xc42db2c7e0, 0xc429e520c8, 0xc421c43b88)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:344 +0xab
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).nextFloat(0xc42db2c7e0, 0xc421c43bc8, 0xc429e520c8)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:372 +0x2f
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatAscendingCursor).next(0xc42db2c7e0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:368 +0x2b
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*bufCursor).next(0xc43a3f5cc0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:64 +0x3b
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*bufCursor).peek(0xc43a3f5cc0, 0x8000000000000000, 0x9cc480, 0xc429e520c8)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:75 +0x2f
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatIterator).Next(0xc4298ac3c0, 0xc4298ac328, 0x0, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:175 +0x8d
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatSortedMergeIterator).pop(0xc42a33f050, 0x456e30, 0xc425a765e8, 0xc425a765f0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:362 +0xfe
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatSortedMergeIterator).Next(0xc42a33f050, 0xc425a76760, 0x8a1e9d, 0xc428012120)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:351 +0x2b
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728d40, 0xb, 0x24, 0xc43ae56550)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/tsdb/engine/tsm1.(*floatLimitIterator).Next(0xc423571100, 0x0, 0x0, 0xc4309a69a0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/tsdb/engine/tsm1/iterator.gen.go:280 +0x54
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728d80, 0x0, 0x0, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728dc0, 0x0, 0xc424e36f00, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatInterruptIterator).Next(0xc431728e00, 0x0, 0x8000000000000001, 0x7ffffffffffffffe)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:777 +0x52
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatLimitIterator).Next(0xc424e91900, 0xc42b35e960, 0x180001, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:534 +0x37
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*bufFloatIterator).Next(0xc431728e40, 0x268, 0x180001, 0x0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:95 +0x3c
Mar  2 11:32:52 localhost influxd[3030]: github.com/influxdata/influxdb/influxql.(*floatAuxIterator).stream(0xc431728ec0)
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:874 +0x32
Mar  2 11:32:52 localhost influxd[3030]: created by github.com/influxdata/influxdb/influxql.(*floatAuxIterator).Start
Mar  2 11:32:52 localhost influxd[3030]: /root/go/src/github.com/influxdata/influxdb/influxql/iterator.gen.go:860 +0x3f

I could not find an open duplicate report for this. Please tell me whether this is fixed in 1.2.1 or is a new bug.

Regards
Ranjith

@jwilder
Contributor

jwilder commented Mar 2, 2017

@ranjithruban Does your version of telegraf include this change? influxdata/telegraf#2277

Can you show the output of SELECT * FROM telegraf.autogen.consul LIMIT 1?

@jwilder
Contributor

jwilder commented Mar 2, 2017

Also, can you attach the output of SHOW SHARDS and SHOW FIELD KEYS?

@ranjithruban
Author

ranjithruban commented Mar 2, 2017

@jwilder No, we are using Telegraf 1.2.0. This data comes from the Consul telemetry plugin sending to statsd, with some statsd templates added. I also have another panic with the same trace on a different metric, linked below.

http://pastebin.com/EWearfiB

I had to drop the shard and move the WAL to recover InfluxDB from this error. Once it panicked, InfluxDB could not recover properly and failed with the error below.

Mar 2 11:32:55 localhost influxd[19008]: [I] 2017-03-02T10:32:55Z Failed to open shard: 173: [shard 173] field type conflict service=store

I have the corrupted shard 173 saved if it helps in debugging.

show shards

132 telegraf autogen 132 2017-02-11T00:00:00Z 2017-02-12T00:00:00Z 2019-02-12T00:00:00Z
134 telegraf autogen 134 2017-02-12T00:00:00Z 2017-02-13T00:00:00Z 2019-02-13T00:00:00Z
136 telegraf autogen 136 2017-02-13T00:00:00Z 2017-02-14T00:00:00Z 2019-02-14T00:00:00Z
138 telegraf autogen 138 2017-02-14T00:00:00Z 2017-02-15T00:00:00Z 2019-02-15T00:00:00Z
140 telegraf autogen 140 2017-02-15T00:00:00Z 2017-02-16T00:00:00Z 2019-02-16T00:00:00Z
142 telegraf autogen 142 2017-02-16T00:00:00Z 2017-02-17T00:00:00Z 2019-02-17T00:00:00Z
144 telegraf autogen 144 2017-02-17T00:00:00Z 2017-02-18T00:00:00Z 2019-02-18T00:00:00Z
146 telegraf autogen 146 2017-02-18T00:00:00Z 2017-02-19T00:00:00Z 2019-02-19T00:00:00Z
148 telegraf autogen 148 2017-02-19T00:00:00Z 2017-02-20T00:00:00Z 2019-02-20T00:00:00Z
150 telegraf autogen 150 2017-02-20T00:00:00Z 2017-02-21T00:00:00Z 2019-02-21T00:00:00Z
152 telegraf autogen 152 2017-02-21T00:00:00Z 2017-02-22T00:00:00Z 2019-02-22T00:00:00Z
154 telegraf autogen 154 2017-02-22T00:00:00Z 2017-02-23T00:00:00Z 2019-02-23T00:00:00Z
159 telegraf autogen 159 2017-02-23T00:00:00Z 2017-02-24T00:00:00Z 2019-02-24T00:00:00Z
161 telegraf autogen 161 2017-02-24T00:00:00Z 2017-02-25T00:00:00Z 2019-02-25T00:00:00Z
163 telegraf autogen 163 2017-02-25T00:00:00Z 2017-02-26T00:00:00Z 2019-02-26T00:00:00Z
165 telegraf autogen 165 2017-02-26T00:00:00Z 2017-02-27T00:00:00Z 2019-02-27T00:00:00Z
167 telegraf autogen 167 2017-02-27T00:00:00Z 2017-02-28T00:00:00Z 2019-02-28T00:00:00Z
169 telegraf autogen 169 2017-02-28T00:00:00Z 2017-03-01T00:00:00Z 2019-03-01T00:00:00Z
171 telegraf autogen 171 2017-03-01T00:00:00Z 2017-03-02T00:00:00Z 2019-03-02T00:00:00Z
175 telegraf autogen 175 2017-03-02T00:00:00Z 2017-03-03T00:00:00Z 2019-03-03T00:00:00Z

name: test
id database retention_policy shard_group start_time end_time expiry_time owners


@jwilder
Contributor

jwilder commented Mar 2, 2017

Can you attach shard 173?

@ranjithruban
Author

I have added the TSM files. The archive is 47 MB. Please see if you can download it.
https://www.dropbox.com/s/r7x1xmpfda4z1nu/173.tar.gz?dl=0

@jwilder
Contributor

jwilder commented Mar 2, 2017

@ranjithruban Got it. Thanks.

@jwilder
Contributor

jwilder commented Mar 2, 2017

@ranjithruban It looks like the problem is the consul measurement and its value field. You have some series with it stored as a float64 and one as an int64. They are different series, so there is likely a race in the code that ensures the type is consistent within a shard. You should get a field type conflict during the write and the point would be dropped, but it looks like the writes are being allowed, which causes the panic at query time and the shard to fail to load at startup.
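To illustrate why mixed types surface as a panic only at query time, here is a minimal Go sketch. The `Value`, `FloatValue`, and `IntegerValue` types are simplified stand-ins for the tsm1 types named in the panic message, not the real definitions, and `asFloat` and `uncheckedPanics` are hypothetical helpers. It shows that a single-value type assertion panics on a mismatched value, while the comma-ok form reports the mismatch instead.

```go
package main

import "fmt"

// Value, FloatValue and IntegerValue are simplified stand-ins for the
// tsm1 types named in the panic message; they are NOT the real definitions.
type Value interface{ UnixNano() int64 }

type FloatValue struct {
	unixnano int64
	value    float64
}

type IntegerValue struct {
	unixnano int64
	value    int64
}

func (v FloatValue) UnixNano() int64   { return v.unixnano }
func (v IntegerValue) UnixNano() int64 { return v.unixnano }

// asFloat (hypothetical helper) uses the comma-ok assertion, so a
// mismatched value yields ok == false instead of a runtime panic.
func asFloat(v Value) (FloatValue, bool) {
	f, ok := v.(FloatValue)
	return f, ok
}

// uncheckedPanics (hypothetical helper) reports whether a single-value
// assertion to FloatValue panics for v.
func uncheckedPanics(v Value) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	_ = v.(FloatValue) // panics with "interface conversion" on a mismatch
	return false
}

func main() {
	// A float cursor that meets an integer value is the situation in the trace.
	mixed := []Value{FloatValue{1, 0.5}, IntegerValue{2, 7}}
	for _, v := range mixed {
		_, ok := asFloat(v)
		fmt.Printf("%T: checked ok=%v, unchecked panics=%v\n", v, ok, uncheckedPanics(v))
	}
}
```

The checked form only avoids the crash. It does not make the stored data consistent, which is why the write path still needs to reject the conflicting point.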

@ranjithruban
Author

ranjithruban commented Mar 2, 2017

@jwilder Thanks. I have seen "Field type conflict, dropping conflicted points: dropping" in Telegraf for some of the measurements we use, but not for consul or for the custom application measurement in the second panic. I am not sure why the write was allowed in some cases.

@jwilder
Contributor

jwilder commented Mar 2, 2017

@ranjithruban I would also check your client to ensure that whatever writes to the consul measurement and value field always uses the correct formatting for types. A float64 should have a decimal point and an int64 needs a trailing i.

I can attach the problem series keys if that would help.
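As a rough illustration of the formatting rule above, the following Go sketch renders a field value so that floats always carry a decimal point and integers always carry a trailing `i`. The `formatFieldValue` helper is hypothetical and is not part of Telegraf or any InfluxDB client. It only covers `float64` and `int64`.

```go
package main

import (
	"fmt"
	"math"
	"strconv"
	"strings"
)

// formatFieldValue is a hypothetical helper (not a Telegraf or InfluxDB API).
// It renders float64 with an explicit decimal point and int64 with a
// trailing "i", matching the formatting rule described in the thread.
func formatFieldValue(v interface{}) (string, error) {
	switch x := v.(type) {
	case float64:
		if math.IsNaN(x) || math.IsInf(x, 0) {
			return "", fmt.Errorf("unsupported float value %v", x)
		}
		s := strconv.FormatFloat(x, 'f', -1, 64)
		if !strings.Contains(s, ".") {
			s += ".0" // keep an explicit decimal point so the value reads as a float
		}
		return s, nil
	case int64:
		return strconv.FormatInt(x, 10) + "i", nil
	default:
		return "", fmt.Errorf("unsupported field type %T", v)
	}
}

func main() {
	for _, v := range []interface{}{float64(3), 2.5, int64(3)} {
		s, err := formatFieldValue(v)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("consul value=%s\n", s)
	}
}
```

Picking one Go type per field in the client and formatting it through a single function like this keeps every series of a measurement on the same field type.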

@ranjithruban
Author

ranjithruban commented Mar 2, 2017

Yes, I will check that. Please attach them if you can. To be clear, can this bug be fixed so that such writes are rejected even if a client sends them? In our case, multiple measurements send such values, and some are Java Spring metrics.

@jwilder
Contributor

jwilder commented Mar 2, 2017

@ranjithruban There is a bug in the database in that it allowed two different field types for the same measurement. We'll need to fix that to prevent the panic and the shard failing to load. Regardless, writing data with different field types is not valid. Once this is fixed, you will end up with data being dropped or write errors, as the database cannot support different field types for the same measurement. You'll need to use different field names, different measurements, or ensure they all write the same field type.
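The kind of check described here can be sketched as follows. This is only an illustration under assumptions: `fieldRegistry`, `fieldType`, and `check` are hypothetical names, and this is not InfluxDB's implementation or the actual fix. The sketch records the first type seen for a measurement and field, holds a mutex for the whole lookup-and-store so two concurrent writers cannot both register different types, and returns an error for any later conflicting write.

```go
package main

import (
	"fmt"
	"sync"
)

// fieldType is a hypothetical enum for this sketch.
type fieldType int

const (
	typeFloat fieldType = iota
	typeInteger
)

func (t fieldType) String() string {
	if t == typeFloat {
		return "float"
	}
	return "integer"
}

// fieldRegistry is a hypothetical per-shard registry of field types.
// It is NOT InfluxDB's implementation.
type fieldRegistry struct {
	mu    sync.Mutex
	types map[string]fieldType // key: measurement + "\x00" + field
}

func newFieldRegistry() *fieldRegistry {
	return &fieldRegistry{types: make(map[string]fieldType)}
}

// check records the type on the first write and rejects later writes
// whose type differs. The lock covers both the lookup and the store.
func (r *fieldRegistry) check(measurement, field string, t fieldType) error {
	key := measurement + "\x00" + field
	r.mu.Lock()
	defer r.mu.Unlock()
	if existing, ok := r.types[key]; ok {
		if existing != t {
			return fmt.Errorf("field type conflict: %s.%s is %s, write has %s",
				measurement, field, existing, t)
		}
		return nil
	}
	r.types[key] = t
	return nil
}

func main() {
	r := newFieldRegistry()
	fmt.Println(r.check("consul", "value", typeFloat))   // first write sets the type
	fmt.Println(r.check("consul", "value", typeInteger)) // conflicting write is rejected
}
```

With a check like this in the write path, the conflicting point is refused up front, so a float cursor never encounters an integer value at query time.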

@ranjithruban
Author

@jwilder Great, thank you. 👍

@jwilder
Contributor

jwilder commented Mar 3, 2017

@ranjithruban Would you be able to test out #8092 to see if it prevents your shards from getting into an inconsistent state? We haven't been able to reproduce the issue yet.

@ranjithruban
Author

@jwilder Yes, I will test it and report the results.

timhallinflux added this to the 1.2.1 milestone Mar 7, 2017

@jwilder
Contributor

jwilder commented Mar 7, 2017

Fixed via #8092 #8104 #8085
