
index out of range [-1] #2645

Closed
maulikjs opened this issue May 22, 2020 · 17 comments · Fixed by #2728

Comments

@maulikjs

maulikjs commented May 22, 2020

Thanos Version used: v0.12.2

Object Storage Provider: Ceph S3

What happened: When I run queries against historic dates, for certain dates we get the response:
unexpected error: runtime error: index out of range [-1]

{'field': 'subscribed_cores', 'date': datetime.date(2019, 12, 27), 'runtime': 1.5100421905517578, 'text': '{"status":"error","errorType":"execution","error":"unexpected error: runtime error: index out of range [-1]"}\n', 'value': None, 'status': 'error', 'error': 'unexpected error: runtime error: index out of range [-1]'}

Other than that, we also sometimes get an empty response instead of the error, but never the actual value.

{'field': 'subscribed_cores', 'date': datetime.date(2019, 12, 27), 'runtime': 1.227376699447632, 'text': '{"status":"success","data":{"resultType":"vector","result":[]}}\n', 'value': None, 'status': 'success', 'error': None}

What you expected to happen: Get the value for that date.

CC: @andmatt
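
For reference, these responses come from the Thanos /api/v1/query HTTP endpoint. A minimal sketch (not the reporter's actual script; the endpoint URL, query expression, and timestamp are placeholders) of the kind of request that produces them:

package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

func main() {
	// Placeholder endpoint and query; 1577404800 is 2019-12-27T00:00:00Z.
	params := url.Values{}
	params.Set("query", "sum(up)")
	params.Set("time", "1577404800")

	resp, err := http.Get("http://thanos-query.example.com/api/v1/query?" + params.Encode())
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	body, _ := io.ReadAll(resp.Body)
	// On the affected dates the body carries "status":"error" with
	// "index out of range [-1]" instead of a vector result.
	fmt.Println(resp.StatusCode, string(body))
}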

@GiedriusS
Member

GiedriusS commented May 22, 2020

Is there some kind of stack trace printed with the error, perhaps in the Thanos Query logs or somewhere else? Where is this printed in general? These don't look like our log format. Do you have a way of reproducing it? 🤔

It's probably impossible to tell where the (possible) error is without a stack trace.

@maulikjs
Author

Hey @GiedriusS,
I was running queries against the API when I got these responses.
I also went to the UI manually and ran the query for that date, and got the same output in an error box.
Let me see if I can replicate it and get a screenshot and the pod logs.

@maulikjs
Author

[Screenshot: the Thanos Query UI showing the same "unexpected error: runtime error: index out of range [-1]" message]

@maulikjs
Author

level=info ts=2020-05-22T19:13:37.505197904Z caller=frontend.go:166 msg="slow query" org_id=fake url="http://telemeter-lts.datahub.redhat.com/api/v1/query?query=round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29&time=1587945600.0 " time_taken=6.143809981s
level=debug ts=2020-05-22T19:13:37.505330807Z caller=logging.go:44 traceID=18f9107c5c008580 msg="GET /api/v1/query?query=round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29&time=1587945600.0 (422) 6.143992196s"
level=debug ts=2020-05-22T19:13:39.081287152Z caller=logging.go:44 traceID=1ff8e57bf643c559 msg="GET /api/v1/query?query=round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29&time=1588118400.0 (422) 4.621643791s"
level=debug ts=2020-05-22T19:13:44.465299862Z caller=logging.go:44 traceID=43ff1398f57ec32c msg="GET /api/v1/query?query=round%28avg_over_time%28sum%28sum+by+%28_id%29+%28max+without+%28prometheus%2Creceive%2Cinstance%29+%28+cluster%3Acapacity_cpu_cores%3Asum%29%29+%2B+on+%28_id%29+group_left%28_blah%29+%280+%2A+%28id_version_ebs_account_internal%3Acluster_subscribed%7B%7D+%3D%3D+1%29%29%29%5B1d%3A30m%5D%29%29&time=1588636800.0 (422) 2.583028696s"

These are the logs from the cortex-query-frontend for those queries.
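
For readability, the URL-encoded query in these log lines decodes to roughly the following PromQL (decoded by hand, with line breaks added):

round(
  avg_over_time(
    sum(
      sum by (_id) (max without (prometheus, receive, instance) (cluster:capacity_cpu_cores:sum))
      + on (_id) group_left(_blah) (0 * (id_version_ebs_account_internal:cluster_subscribed{} == 1))
    )[1d:30m]
  )
)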

@maulikjs
Author

And this is the log from the associated thanos-query pod:


level=error ts=2020-05-22T19:48:11.719475257Z caller=engine.go:806 msg="runtime panic in parser" err="runtime error: index out of range [-1]" stacktrace="goroutine 2719879 [running]:\ngithub.com/prometheus/prometheus/promql.(*evaluator).recover(0xc0098f2da0, 0xc0098f2b98)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:804 +0xd8\npanic(0x1d695e0, 0xc0096dc360)\n\t/usr/local/go/src/runtime/panic.go:679 +0x1b2\ngithub.com/thanos-io/thanos/pkg/query.(*dedupSeriesSet).peekLset(0xc00584e060, 0x22ce0e0, 0xc0044dc1e0, 0x33b8480)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/iter.go:372 +0x171\ngithub.com/thanos-io/thanos/pkg/query.(*dedupSeriesSet).next(0xc00584e060, 0xc009926400)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/iter.go:388 +0x8a\ngithub.com/thanos-io/thanos/pkg/query.(*dedupSeriesSet).Next(0xc00584e060, 0x22d48e0)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/iter.go:359 +0xaf\ngithub.com/prometheus/prometheus/promql.expandSeriesSet(0x2306760, 0xc0098e8d20, 0x22f1f20, 0xc00584e060, 0x16ef655, 0xc0098e8d20, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:758 +0x5b\ngithub.com/prometheus/prometheus/promql.checkForSeriesSetExpansion(0x2306760, 0xc0098e8d20, 0x22f1a20, 0xc0023056e0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:747 +0x159\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f1588, 0x22f1a20, 0xc0023056e0, 0xc0098e8ed0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1262 +0x1d7b\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f1588, 0xc0098ee420, 0xc0098ee538, 0x2, 0x2, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f1588, 0x22f17e0, 0xc000d68a50, 0xc0098e8e70, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1029 +0x3396\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f1588, 0xc0098ef170, 0xc0098ef288, 0x2, 0x2, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f1588, 0x22f17e0, 0xc000d68a00, 0xc0098e8e10, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1029 +0x3396\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f1588, 0xc0098efe60, 0xc0098eff18, 0x2, 0x2, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f1588, 0x22f1820, 0xc000e64200, 0xc0098e8db0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1240 +0x26d8\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f1588, 0xc0098f0c10, 0xc0098f0d28, 0x2, 0x2, 0x0, 0x445ebe, 0xc000ac8480)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0002f3588, 0x22f17e0, 0xc000d68aa0, 0x15, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1029 +0x3396\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f2da0, 0x22f19a0, 0xc000acd440, 0xc000ac8580, 0xc000ac8501)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1323 +0x1cf8\ngithub.com/prometheus/prometheus/promql.(*evaluator).evalSubquery(0xc0002f4da0, 0xc000acd440, 
0x1ee2a73)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1001 +0x59\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0098f2da0, 0x22f1860, 0xc001d6a820, 0xc0098ea6a0, 0xc004da81c0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1064 +0x1969\ngithub.com/prometheus/prometheus/promql.(*evaluator).rangeEval(0xc0098f2da0, 0xc0098f2730, 0xc000b24560, 0x1, 0x1, 0x0, 0xc000d68230, 0xc0002f45b8)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:888 +0x1408\ngithub.com/prometheus/prometheus/promql.(*evaluator).eval(0xc0002f4da0, 0x22f1860, 0xc001d6a860, 0xf, 0xc000d68640)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:1070 +0x18cc\ngithub.com/prometheus/prometheus/promql.(*evaluator).Eval(0xc0002f4da0, 0x22f1860, 0xc001d6a860, 0x0, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:815 +0x99\ngithub.com/prometheus/prometheus/promql.(*Engine).execEvalStmt(0xc0003625a0, 0x2306760, 0xc000acd5c0, 0xc0023057a0, 0xc000d68af0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:531 +0x43d\ngithub.com/prometheus/prometheus/promql.(*Engine).exec(0xc0003625a0, 0x2306760, 0xc000acd5c0, 0xc0023057a0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:485 +0x669\ngithub.com/prometheus/prometheus/promql.(*query).Exec(0xc0023057a0, 0x2306760, 0xc000acd350, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/promql/engine.go:199 +0xb4\ngithub.com/thanos-io/thanos/pkg/query/api.(*API).query(0xc0003632c0, 0xc00126e800, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/api/v1.go:304 +0x4f1\ngithub.com/thanos-io/thanos/pkg/query/api.(*API).Register.func1.1(0x22fe820, 0xc0003d72d0, 0xc00126e800)\n\t/go/src/github.com/thanos-io/thanos/pkg/query/api/v1.go:145 +0x55\nnet/http.HandlerFunc.ServeHTTP(0xc000b858a0, 0x22fe820, 0xc0003d72d0, 0xc00126e800)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/NYTimes/gziphandler.GzipHandlerWithOpts.func1.1(0x7fd6b3f406c0, 0xc000d68960, 0xc00126e800)\n\t/go/pkg/mod/github.com/!n!y!times/[email protected]/gzip.go:336 +0x23f\nnet/http.HandlerFunc.ServeHTTP(0xc000a78270, 0x7fd6b3f406c0, 0xc000d68960, 0xc00126e800)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/thanos-io/thanos/pkg/tracing.HTTPMiddleware.func1(0x7fd6b3f406c0, 0xc000d68960, 0xc00126e700)\n\t/go/src/github.com/thanos-io/thanos/pkg/tracing/http.go:46 +0x549\nnet/http.HandlerFunc.ServeHTTP(0xc0003f7d10, 0x7fd6b3f406c0, 0xc000d68960, 0xc00126e700)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerResponseSize.func1(0x7fd6b3f406c0, 0xc000d68910, 0xc00126e700)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:196 +0xe9\nnet/http.HandlerFunc.ServeHTTP(0xc000a785a0, 0x7fd6b3f406c0, 0xc000d68910, 0xc00126e700)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerCounter.func1(0x7fd6b3f406c0, 0xc000d688c0, 0xc00126e700)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:100 +0xda\nnet/http.HandlerFunc.ServeHTTP(0xc000a78780, 0x7fd6b3f406c0, 0xc000d688c0, 0xc00126e700)\n\t/usr/local/go/src/net/http/server.go:2007 
+0x44\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerRequestSize.func1(0x7fd6b3f406c0, 0xc000d68870, 0xc00126e700)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:163 +0xe9\nnet/http.HandlerFunc.ServeHTTP(0xc000a789f0, 0x7fd6b3f406c0, 0xc000d68870, 0xc00126e700)\n\t/usr/local/go/src/net/http/server.go:2007 +0x44\ngithub.com/prometheus/client_golang/prometheus/promhttp.InstrumentHandlerDuration.func1(0x22fe420, 0xc004f2a000, 0xc00126e700)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/prometheus/promhttp/instrument_server.go:68 +0x11c\ngithub.com/prometheus/common/route.(*Router).handle.func1(0x22fe420, 0xc004f2a000, 0xc00126e600, 0x0, 0x0, 0x0)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:83 +0x296\ngithub.com/julienschmidt/httprouter.(*Router).ServeHTTP(0xc000362660, 0x22fe420, 0xc004f2a000, 0xc00126e600)\n\t/go/pkg/mod/github.com/julienschmidt/[email protected]/router.go:387 +0xa3d\ngithub.com/prometheus/common/route.(*Router).ServeHTTP(0xc000c28960, 0x22fe420, 0xc004f2a000, 0xc00126e600)\n\t/go/pkg/mod/github.com/prometheus/[email protected]/route/route.go:121 +0x4c\nnet/http.(*ServeMux).ServeHTTP(0xc000236940, 0x22fe420, 0xc004f2a000, 0xc00126e600)\n\t/usr/local/go/src/net/http/server.go:2387 +0x1bd\nnet/http.serverHandler.ServeHTTP(0xc0001d40e0, 0x22fe420, 0xc004f2a000, 0xc00126e600)\n\t/usr/local/go/src/net/http/server.go:2802 +0xa4\nnet/http.(*conn).serve(0xc00016c3c0, 0x23066a0, 0xc000e640c0)\n\t/usr/local/go/src/net/http/server.go:1890 +0x875\ncreated by net/http.(*Server).Serve\n\t/usr/local/go/src/net/http/server.go:2927 +0x38e\n"
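
The trace points at (*dedupSeriesSet).peekLset in pkg/query/iter.go, reached from the PromQL evaluator. In Go, an "index out of range [-1]" panic is the classic symptom of indexing with len(s)-1 on an empty slice; a minimal standalone illustration of how that message arises (not the actual Thanos code):

package main

import "fmt"

func main() {
	// Empty slice standing in for "no series left in the current dedup group".
	var replicas []string

	defer func() {
		// Recover only to print the message; the real code path panicked inside
		// the PromQL engine, which logged it as "runtime panic in parser".
		if r := recover(); r != nil {
			fmt.Println("recovered:", r)
		}
	}()

	// len(replicas)-1 == -1 here, so this indexes replicas[-1] and panics with
	// "runtime error: index out of range [-1]" - the same message seen above.
	fmt.Println(replicas[len(replicas)-1])
}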

@maulikjs
Author

I couldn't find any error logs on the thanos-store pods.
Hope this helps :)

@GiedriusS
Member

That's definitely very useful! Would it be possible for you to check this with v0.13.0-rc.0 to see if it is still reproducible?

@ipyker

ipyker commented May 23, 2020

I also encountered the index out of range [-1] error; I tried working around it by reducing the number of --query.replica-label parameters on thanos-query.

@sevagh
Contributor

sevagh commented May 31, 2020

I did the same as ipyker: I added a new --query.replica-label=foo, restarted Thanos Query, and this started occurring.

Luckily, it goes away when I remove --query.replica-label=foo. Is it a problem if we pick a replica label that does not exist on all our metrics? The label foo is only on some metrics, not all.
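
For context, replica labels are configured as repeated --query.replica-label flags on the querier. A configuration of roughly this shape (the store address and the first two label names are placeholders; foo is the extra label mentioned above) is what the setup looked like when the panic started:

thanos query \
  --http-address=0.0.0.0:10902 \
  --store=thanos-store.example.svc:10901 \
  --query.replica-label=prometheus_replica \
  --query.replica-label=receive_replica \
  --query.replica-label=foo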

@bwplotka
Member

bwplotka commented May 31, 2020 via email

@maulikjs
Author

maulikjs commented Jun 3, 2020

Hey @bwplotka,
Do we know what's going on yet?

Would setting up a query pod without any --query.replica-label flags potentially solve our problem?

@bwplotka
Member

bwplotka commented Jun 5, 2020

On it now.

@bwplotka
Member

bwplotka commented Jun 5, 2020

Ok, got it. The problem is that the Querier has problems with more than 2 replica labels. Fixing.

Thanks for spotting this and quickly describing the issue ❤️
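
For readers following along: conceptually, the querier strips the configured replica labels from each series' label set and merges series that then compare equal; that is what the dedup iterator in pkg/query/iter.go implements. A rough standalone sketch of the idea (not the actual Thanos implementation):

package main

import (
	"fmt"
	"sort"
	"strings"
)

// dedupKey drops the configured replica labels and renders the remaining
// labels as a stable key, so series differing only by replica collapse into one.
func dedupKey(lset map[string]string, replicaLabels map[string]struct{}) string {
	keys := make([]string, 0, len(lset))
	for k := range lset {
		if _, isReplica := replicaLabels[k]; isReplica {
			continue
		}
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+lset[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	replicaLabels := map[string]struct{}{"prometheus_replica": {}, "receive_replica": {}, "foo": {}}
	a := map[string]string{"__name__": "up", "job": "node", "prometheus_replica": "r0"}
	b := map[string]string{"__name__": "up", "job": "node", "prometheus_replica": "r1"}
	fmt.Println(dedupKey(a, replicaLabels) == dedupKey(b, replicaLabels)) // true: the two series deduplicate
}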

@sevagh
Contributor

sevagh commented Jun 5, 2020

That makes sense - I have 2 replica labels, and encountered this when adding a 3rd. Thanks for the quick turnaround.

@bwplotka
Member

bwplotka commented Jun 5, 2020

Silly missed edge case fixed here: #2728

bwplotka added a commit that referenced this issue Jun 5, 2020
…d panic seen when using larger number of replicas and small series.

Fixes #2645


Signed-off-by: Bartlomiej Plotka <[email protected]>
bwplotka added a commit that referenced this issue Jun 10, 2020
…d panic seen when using larger number of replicas and small series. (#2728)

Fixes #2645


Signed-off-by: Bartlomiej Plotka <[email protected]>
@sevagh
Contributor

sevagh commented Jun 10, 2020

Will this make it into 0.13?

@bwplotka
Member

bwplotka commented Jun 10, 2020 via email
