query error after run es_indices_clean.sh #622

Closed
meilihao opened this issue Dec 26, 2017 · 16 comments · Fixed by #1627

Comments

@meilihao

jaeger version: cba413e
storage: elasticsearch 6.1.1, Build: bd92e7f/2017-12-17T20:23:25.338Z, JVM: 9.0.1

It worked before the cleanup.

My operation process:

⋊> ~/opt ./es_indices_clean.sh 0 localhost:9200                                                                                                                                                             
Installing python dependencies required for curator...
Requirement already satisfied: elasticsearch in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied: elasticsearch-curator in /usr/local/lib/python2.7/dist-packages
Requirement already satisfied: urllib3<1.23,>=1.21.1 in /usr/local/lib/python2.7/dist-packages (from elasticsearch)
Requirement already satisfied: click>=6.7 in /usr/local/lib/python2.7/dist-packages (from elasticsearch-curator)
Requirement already satisfied: pyyaml>=3.10 in /usr/local/lib/python2.7/dist-packages (from elasticsearch-curator)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python2.7/dist-packages (from elasticsearch-curator)
Requirement already satisfied: voluptuous>=0.9.3 in /usr/local/lib/python2.7/dist-packages (from elasticsearch-curator)

Removing jaeger-service-2017-12-26
Removing jaeger-span-2017-12-26
⋊> ~/opt curl -XGET 'http://localhost:9200/_cat/indices'          # generate new data                                                                                                                                          
yellow open jaeger-service-2017-12-26 sMNJrlcYQGqMDp4AUrLlWA 5 1  2 0   9.9kb   9.9kb
yellow open jaeger-span-2017-12-26    wBc1qghHQ4uo0XobS7tKgA 5 1 27 0 263.7kb 263.7kb

Then I visited http://local:16686/search and got this error:

There was an error querying for traces:
HTTP Error: Search service failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]

Status | 500
Status text | Internal Server Error
URL | /api/services
Response body | {   "data": null,   "total": 0,   "limit": 0,   "offset": 0,   "errors": [     {       "code": 500,       "msg": "Search service failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]"     }   ] }
@meilihao
Author

meilihao commented Mar 1, 2018

Today I tried release 1.2.0 and got the same error.

@yurishkuro
Member

@black-adder could it be a timezone issue? The ticket was opened on Dec 26, 2017, so I assume the script was run on the same date, which makes it strange that it removed these indices:

Removing jaeger-service-2017-12-26
Removing jaeger-span-2017-12-26

@black-adder
Contributor

I'll try to reproduce.

@ti-mo

ti-mo commented May 18, 2018

@meilihao You're deleting the currently-active index. This removes the index's mapping, which is created by the collectors on start-up. This issue explains it in a bit more detail: #374

This would be due to Elasticsearch trying to figure out the mappings on its own, since we aren't checking whether the index is there after startup. Since we can't aggregate on text fields without enabling fielddata, this would break the query searches.

The only way to avoid this is to not delete an index that's still in use. Since you posted on Dec 26th and deleted two indices marked 2017-12-26, I assume that's the case. (I.e., never run ./es_indices_clean.sh with a parameter of 0; perhaps the script should warn about this.)
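
The kind of guard suggested above could look roughly like the following. This is only a sketch: `indices_safe_to_delete` is a hypothetical helper, not part of es_indices_clean.sh or esCleaner.py, and it assumes the daily index names follow the `jaeger-span-YYYY-MM-DD` / `jaeger-service-YYYY-MM-DD` pattern seen in this thread and that the date in the name is a UTC date.

```python
import re
from datetime import date, datetime, timedelta, timezone

# Hypothetical helper, not part of the Jaeger cleaner scripts.
# Assumes daily index names like jaeger-span-2017-12-26.
INDEX_DATE = re.compile(r"^jaeger-(?:span|service)-(\d{4})-(\d{2})-(\d{2})$")


def indices_safe_to_delete(index_names, days_to_keep, today=None):
    """Return the Jaeger daily indices older than the cutoff, never today's."""
    if today is None:
        # Assumption: the date in the index name is a UTC date.
        today = datetime.now(timezone.utc).date()
    # Clamp so that days_to_keep=0 still spares the index being written to.
    keep = max(days_to_keep, 1)
    cutoff = today - timedelta(days=keep)
    doomed = []
    for name in index_names:
        match = INDEX_DATE.match(name)
        if match is None:
            continue  # not a Jaeger daily index, leave it alone
        try:
            index_day = date(*(int(part) for part in match.groups()))
        except ValueError:
            continue  # name looks like a date but is not a valid one
        if index_day <= cutoff:
            doomed.append(name)
    return doomed
```

With this clamp, a parameter of 0 behaves like 1: only today's indices are kept, so the index the collectors are currently writing to (and its mapping) survives.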

@oiooj
Contributor

oiooj commented Jul 9, 2018

I have the same issue:

/api/services
 service failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]"

I just updated ES from 6.2 to 6.3 and cleaned out the old data.


@sta-szek

sta-szek commented Oct 5, 2018

Hi, any progress here? I have the same issue.
What I did (running Jaeger 1.4.1 from the helm chart https://hub.kubeapps.com/charts/incubator/jaeger/0.7.0):

  1. Deleted the jaeger-* indices from ES.
  2. Saw that Jaeger was not working (the error mentioned above).
  3. Ran helm delete --purge to remove Jaeger from k8s entirely.
  4. Deployed again.
  5. Saw that Jaeger was still not working (same error).

There are no errors in the logs:
pojo@local:~$ kubectl logs -f pod/jaeger-query-665984d7b9-m9rcj -n monitoring
{"level":"info","ts":1538725711.235911,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":16687,"status":"unavailable"}
{"level":"info","ts":1538725711.3213375,"caller":"query/main.go:180","msg":"Archive storage not created","reason":"Archive storage not supported"}
{"level":"info","ts":1538725711.321707,"caller":"query/main.go:127","msg":"Registering metrics handler with HTTP server","route":"/metrics"}
{"level":"info","ts":1538725711.321767,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1538725711.3218033,"caller":"query/main.go:136","msg":"Starting jaeger-query HTTP server","port":16686}
^C
pojo@local:~$ kubectl logs -f pod/jaeger-collector-7889bd4c49-v2ng9 -n monitoring
{"level":"info","ts":1538725707.0003345,"caller":"healthcheck/handler.go:99","msg":"Health Check server started","http-port":14269,"status":"unavailable"}
{"level":"info","ts":1538725707.1209114,"caller":"static/strategy_store.go:76","msg":"No sampling strategies provided, using defaults"}
{"level":"info","ts":1538725707.121135,"caller":"collector/main.go:142","msg":"Registering metrics handler with HTTP server","route":"/metrics"}
{"level":"info","ts":1538725707.121202,"caller":"collector/main.go:150","msg":"Starting Jaeger Collector HTTP server","http-port":14268}
{"level":"info","ts":1538725707.1212277,"caller":"healthcheck/handler.go:133","msg":"Health Check state change","status":"ready"}
{"level":"info","ts":1538725707.1555748,"caller":"collector/main.go:207","msg":"Listening for Zipkin HTTP traffic","zipkin.http-port":9411}

indices in ES:

health status index                           uuid                   pri rep docs.count docs.deleted store.size pri.store.size
green  open   jaeger-span-2018-10-05          C4zjEF33Sn6i3bXH5QkTow   5   1     259767            0     35.4mb         17.7mb
green  open   jaeger-service-2018-10-05       7tQAzJT7S2GoxZoc-1I_QA   5   1         53           14    241.7kb        119.7kb
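
Index health does not say anything about the mapping, so a green index can still be one that Elasticsearch mapped dynamically. One way to check, following the explanation earlier in this thread, is to fetch the mapping (GET /<index>/_mapping) and look for string fields that ended up as `text`. The sketch below assumes the ES 6.x response shape with a mapping type level (`{index: {"mappings": {type: {"properties": ...}}}}`); `find_text_fields` is a hypothetical helper, and the field names in the usage example are illustrative only.

```python
def find_text_fields(mapping_response):
    """Yield (index, dotted.field.path) for every field mapped as `text`.

    Assumes the ES 6.x shape of a GET /<index>/_mapping response, where
    "mappings" contains one entry per mapping type.
    """
    def walk(properties, prefix):
        for field, spec in properties.items():
            path = prefix + field
            if spec.get("type") == "text":
                yield path
            # Object and nested fields carry their own "properties".
            if "properties" in spec:
                yield from walk(spec["properties"], path + ".")

    for index, body in mapping_response.items():
        for type_mapping in body.get("mappings", {}).values():
            for path in walk(type_mapping.get("properties", {}), ""):
                yield index, path


# Usage with a hand-written response (field names are illustrative):
response = {
    "jaeger-service-2018-10-05": {
        "mappings": {
            "service": {
                "properties": {
                    "serviceName": {
                        "type": "text",
                        "fields": {"keyword": {"type": "keyword"}},
                    },
                    "operationName": {"type": "keyword"},
                }
            }
        }
    }
}
for index, field in find_text_fields(response):
    print(index, field)
```

If fields that the queries aggregate on show up in this list, the index was most likely created without the Jaeger mapping.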

@sta-szek

sta-szek commented Oct 8, 2018

It was not fixed after date change as suggested by @dmitrygusev here: #374 (comment)

@mikelduke

I have seen the same problem with bad mappings. You will need to either:

A. Shut down all Jaeger collectors, delete the bad indices, then restart the collectors (this loses the data), or
B. Create new Elasticsearch index templates, reindex into a new index, and delete the bad indices once the reindex completes. The mappings for the template can be extracted from a working index.
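
For option B, turning an exported mapping into a template body could be sketched as follows. `template_from_mapping` is a hypothetical helper; it assumes the ES 6.x template format, where the pattern goes under `index_patterns` (ES 5.x templates use a single `template` string instead), and the index name and field in the usage example are made up for illustration.

```python
def template_from_mapping(mapping_response, source_index, index_pattern):
    """Build an index-template body from a GET /<index>/_mapping response."""
    mappings = mapping_response[source_index]["mappings"]
    return {
        # ES 6.x key; ES 5.x templates use "template": "<pattern>" instead.
        "index_patterns": [index_pattern],
        "mappings": mappings,
    }


# Usage with a hand-written mapping response (names are illustrative):
exported = {
    "jaeger-span-2018-10-04": {
        "mappings": {
            "span": {"properties": {"traceID": {"type": "keyword"}}}
        }
    }
}
body = template_from_mapping(exported, "jaeger-span-2018-10-04", "jaeger-span-*")
```

The resulting body can be serialized to JSON and sent with a PUT to the `_template` endpoint. Index settings such as shard and replica counts are not part of the mapping response, so they would have to be added separately if the defaults are not wanted.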

@jpkrohling
Contributor

@mikelduke, @sta-szek, @oiooj are you able to provide a reproducer?

@pavolloffay
Member

This issue is related to #374.

Steps to reproduce:

Start Elasticsearch and Jaeger:

docker run -it --rm -e "ES_JAVA_OPTS=-Xms2g -Xmx2g" -p 9200:9200 -p 9300:9300 -e "discovery.type=single-node" -e "xpack.security.enabled=false" --name=elasticsearch  docker.elastic.co/elasticsearch/elasticsearch:5.6.10
SPAN_STORAGE_TYPE=elasticsearch go run -tags ui ./cmd/all-in-one/main.go

Then:

  1. Generate spans via the Jaeger UI or hotrod.
  2. Remove all indices: python plugin/storage/es/esCleaner.py 0 localhost:9200
  3. Refresh the Jaeger UI. It should show HTTP Error: Search service failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception].

@pavolloffay
Member

I have fixed this by creating index templates:

curl -ivX PUT -H "Content-Type: application/json" localhost:9200/_template/span  -d @./plugin/storage/es/mappings/jaeger-span.json
curl -ivX PUT -H "Content-Type: application/json" localhost:9200/_template/service  -d @./plugin/storage/es/mappings/jaeger-service.json

I think jaeger-collector should create the template on startup. Creating an index would then just create the index, and its mapping would be derived from the template stored in ES. Maybe we could even skip creating the index explicitly, since it would be created automatically once data is inserted.

@mikelduke

I fixed it for my use by creating a template from an exported copy of a working index.

It would be great to see Jaeger either create the template on startup or allow exporting the template from the binary via the command line. The latter would allow the template to be created by a different user or by a separate CI process.

@pavolloffay
Member

I think I will submit a PR where the collector creates a template at startup.

A command for generating the template is an interesting idea. The rollover script could also make use of it. We could talk about it in a separate issue.

@yurishkuro
Member

Is creating a template idempotent? There are many collectors starting at once.

@trogo

trogo commented Jan 27, 2020

@pavolloffay
I'm running into this issue. I am on 1.14 or later, but I'm using jaeger-ingester to write to ES.
Does the fix also need to go in there?

@philicious

Thanks @mikelduke! Your post helped me solve this weird error, which, at least for me, was likely caused by ES being recreated while the collectors weren't reset or restarted. I spent a few hours debugging until I stumbled over your precious hint ❤️ and noticed the index templates were missing!
I restored them based on https://github.com/jaegertracing/jaeger/tree/master/plugin/storage/es/mappings
