Jaeger Query throws Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception] #2718

Closed
aug70 opened this issue Jan 8, 2021 · 13 comments
@aug70

aug70 commented Jan 8, 2021

Describe the bug
Jaeger UI throws Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception] while all indices are GREEN. The collectors keep adding records, but the UI doesn't work and cannot show any traces.

To Reproduce
Steps to reproduce the behavior:
1.
2.
3.

Expected behavior
Jaeger UI should work as expected.

Screenshots
If applicable, add screenshots to help explain your problem.

Version (please complete the following information):

  • OS: Linux
  • Jaeger version: 1.21.0
  • Deployment: Kubernetes

What troubleshooting steps did you try?
Try to follow https://www.jaegertracing.io/docs/latest/troubleshooting/ and describe how far you were able to progress and/or which steps did not work.

Additional context
Indices are all green, ES queries work just fine. However Jaeger UI doesn't render any results.

@aug70
Author

aug70 commented Jan 9, 2021

Collectors log

2021-01-09T02:08:20.952331746Z {"level":"debug","ts":1610158100.9485595,"caller":"app/span_processor.go:149","msg":"Span written to the storage by the collector","trace-id":"245aec81734c311e","span-id":"2b1dc897b7e5c9e4"}

which suggests the trace data is written successfully. However, the write fails in ES with:

{"type": "server", "timestamp": "2021-01-09T02:40:31,248Z", "level": "INFO", "component": "o.e.a.b.TransportShardBulkAction", "cluster.name": "jaeger-db", "node.name": "jaeger-db-es-master-0", "message": "[jaeger-jaeger-span-2021-01-09][3] mapping update rejected by primary", "cluster.uuid": "4PexpvDoQ5iYz8pUxLR6Ng", "node.id": "7Vj65I0aTkKIBriEnFCHJw" ,
"stacktrace": ["java.lang.IllegalArgumentException: mapper [flags] cannot be changed from type [integer] to [long]",

I believe the flags field at https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/mappings/jaeger-span-7.json#L58 must be of type long instead of integer. The same applies to https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/mappings/jaeger-span.json#L65
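
For anyone hitting the same "cannot be changed from type [integer] to [long]" rejection, one way to see which indices disagree is to compare the type of flags across the span indices. The following is only a minimal sketch: it assumes the parsed JSON body of GET /<index-pattern>/_mapping on Elasticsearch 7.x (typeless mappings), and the helper name field_types_by_index and the sample response are hypothetical, not part of Jaeger.

```python
from collections import defaultdict


def field_types_by_index(mapping_response, field):
    """Group index names by the type they declare for a top-level field.

    mapping_response is assumed to be the parsed JSON body of
    GET /<index-pattern>/_mapping on Elasticsearch 7.x.
    """
    types = defaultdict(list)
    for index, body in mapping_response.items():
        props = body.get("mappings", {}).get("properties", {})
        if field in props:
            # Object fields carry no explicit "type" key in a mapping.
            types[props[field].get("type", "object")].append(index)
    return {t: sorted(names) for t, names in types.items()}


# Hypothetical response, trimmed to the one field of interest.
sample = {
    "jaeger-jaeger-span-2021-01-08": {
        "mappings": {"properties": {"flags": {"type": "long"}}}
    },
    "jaeger-jaeger-span-2021-01-09": {
        "mappings": {"properties": {"flags": {"type": "integer"}}}
    },
}

print(field_types_by_index(sample, "flags"))
```

If the result has more than one key, the indices were created from different mappings, which would be consistent with the rejection in the log above.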

@aug70
Author

aug70 commented Jan 11, 2021

Query doesn't work when I create templates for service and span:

{
  "index_patterns": "*jaeger-service-*", 
  "settings":{
    "index.number_of_shards": 5,
    "index.number_of_replicas": 1,
    "index.mapping.nested_fields.limit":50,
    "index.requests.cache.enable":true
  },
  "mappings":{
    "dynamic_templates":[
      {
        "span_tags_map":{
          "mapping":{
            "type":"keyword",
            "ignore_above":256
          },
          "path_match":"tag.*"
        }
      },
      {
        "process_tags_map":{
          "mapping":{
            "type":"keyword",
            "ignore_above":256
          },
          "path_match":"process.tag.*"
        }
      }
    ],
    "properties":{
      "traceID":{
        "type":"keyword",
        "ignore_above":256
      },
      "parentSpanID":{
        "type":"keyword",
        "ignore_above":256
      },
      "spanID":{
        "type":"keyword",
        "ignore_above":256
      },
      "operationName":{
        "type":"keyword",
        "ignore_above":256
      },
      "startTime":{
        "type":"long"
      },
      "startTimeMillis":{
        "type":"date",
        "format":"epoch_millis"
      },
      "duration":{
        "type":"long"
      },
      "flags":{
        "type":"long"
      },
      "logs":{
        "type":"nested",
        "dynamic":false,
        "properties":{
          "timestamp":{
            "type":"long"
          },
          "fields":{
            "type":"nested",
            "dynamic":false,
            "properties":{
              "key":{
                "type":"keyword",
                "ignore_above":256
              },
              "value":{
                "type":"keyword",
                "ignore_above":256
              },
              "tagType":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          }
        }
      },
      "process":{
        "properties":{
          "serviceName":{
            "type":"keyword",
            "ignore_above":256
          },
          "tag":{
            "type":"object"
          },
          "tags":{
            "type":"nested",
            "dynamic":false,
            "properties":{
              "key":{
                "type":"keyword",
                "ignore_above":256
              },
              "value":{
                "type":"keyword",
                "ignore_above":256
              },
              "tagType":{
                "type":"keyword",
                "ignore_above":256
              }
            }
          }
        }
      },
      "references":{
        "type":"nested",
        "dynamic":false,
        "properties":{
          "refType":{
            "type":"keyword",
            "ignore_above":256
          },
          "traceID":{
            "type":"keyword",
            "ignore_above":256
          },
          "spanID":{
            "type":"keyword",
            "ignore_above":256
          }
        }
      },
      "tag":{
        "type":"object"
      },
      "tags":{
        "type":"nested",
        "dynamic":false,
        "properties":{
          "key":{
            "type":"keyword",
            "ignore_above":256
          },
          "value":{
            "type":"keyword",
            "ignore_above":256
          },
          "tagType":{
            "type":"keyword",
            "ignore_above":256
          }
        }
      }
    }
  }
}
{
        "order": 0,
        "index_patterns": [
            "*jaeger-span-*"
        ],
        "settings": {
            "index": {
                "mapping": {
                    "nested_fields": {
                        "limit": "50"
                    }
                },
                "requests": {
                    "cache": {
                        "enable": "true"
                    }
                },
                "number_of_shards": "5",
                "number_of_replicas": "1"
            }
        },
        "mappings": {
            "dynamic_templates": [
                {
                    "span_tags_map": {
                        "path_match": "tag.*",
                        "mapping": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    }
                },
                {
                    "process_tags_map": {
                        "path_match": "process.tag.*",
                        "mapping": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    }
                }
            ],
            "properties": {
                "traceID": {
                    "ignore_above": 256,
                    "type": "keyword"
                },
                "process": {
                    "properties": {
                        "tag": {
                            "type": "object"
                        },
                        "serviceName": {
                            "ignore_above": 256,
                            "type": "keyword"
                        },
                        "tags": {
                            "dynamic": false,
                            "type": "nested",
                            "properties": {
                                "tagType": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                },
                                "value": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                },
                                "key": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            }
                        }
                    }
                },
                "startTimeMillis": {
                    "format": "epoch_millis",
                    "type": "date"
                },
                "references": {
                    "dynamic": false,
                    "type": "nested",
                    "properties": {
                        "traceID": {
                            "ignore_above": 256,
                            "type": "keyword"
                        },
                        "spanID": {
                            "ignore_above": 256,
                            "type": "keyword"
                        },
                        "refType": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    }
                },
                "flags": {
                    "type": "long"
                },
                "operationName": {
                    "ignore_above": 256,
                    "type": "keyword"
                },
                "parentSpanID": {
                    "ignore_above": 256,
                    "type": "keyword"
                },
                "tags": {
                    "dynamic": false,
                    "type": "nested",
                    "properties": {
                        "tagType": {
                            "ignore_above": 256,
                            "type": "keyword"
                        },
                        "value": {
                            "ignore_above": 256,
                            "type": "keyword"
                        },
                        "key": {
                            "ignore_above": 256,
                            "type": "keyword"
                        }
                    }
                },
                "spanID": {
                    "ignore_above": 256,
                    "type": "keyword"
                },
                "duration": {
                    "type": "long"
                },
                "startTime": {
                    "type": "long"
                },
                "tag": {
                    "type": "object"
                },
                "logs": {
                    "dynamic": false,
                    "type": "nested",
                    "properties": {
                        "fields": {
                            "dynamic": false,
                            "type": "nested",
                            "properties": {
                                "tagType": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                },
                                "value": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                },
                                "key": {
                                    "ignore_above": 256,
                                    "type": "keyword"
                                }
                            }
                        },
                        "timestamp": {
                            "type": "long"
                        }
                    }
                }
            }
        },
        "aliases": {}
    }

@albertteoh
Contributor

albertteoh commented Jan 14, 2021

@aug70 please provide the steps to reproduce this locally. FWIW, the integer type flags mapping has been around for 17 months so it's not obvious to me how this error came about.

@jpkrohling jpkrohling added needs-info The ticket does not contain sufficient information and removed needs-triage labels Jan 14, 2021
@flf2ko

flf2ko commented Jan 14, 2021

Hi @aug70,
I ran into the same issue recently.
The cause was that I had two different versions of the Jaeger collector running at the same time, and I suspect the index mapping was created by the older version.
In my case, I upgraded all Jaeger collectors and queries, then removed all previous indices, and it worked.

The mapping then changed to:

{
      "duration": {
          "type": "long"
        },
        "flags": {
          "type": "integer"
        },
        ...
}

@jpkrohling
Contributor

That sounds odd. Would it happen during a regular upgrade from the older version to the new one as well? Would you need to delete indexes to use the newer collector? It should not be the case.

@aug70
Author

aug70 commented Feb 8, 2021

@flf2ko that might be the case, because I had HotRod and my app creating indexes at the same time. I suspect that's what caused this error. Unfortunately I have no access to the collectors, so I had to ask somebody to stop them so that I could delete the indexes and restart the collectors. The problem was solved after starting fresh, but it was an inconvenience.

@aug70 aug70 closed this as completed Feb 8, 2021
@aug70
Author

aug70 commented Mar 29, 2021

Running into this again! I just upgraded to Jaeger Helm chart version 0.43.3 and Jaeger 1.22.0.

{"level":"error","ts":1617029142.5404692,"caller":"app/http_handler.go:435","msg":"HTTP handler, Internal Server Error","error":"search operations failed: elastic: Error 400 (Bad Request): all shards failed [type=search_phase_execution_exception]","stacktrace":"github.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).handleError\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:435\ngithub.com/jaegertracing/jaeger/cmd/query/app.(*APIHandler).getOperationsLegacy\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/http_handler.go:167\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/opentracing-contrib/go-stdlib/nethttp.MiddlewareFunc.func5\n\tgithub.com/opentracing-contrib/[email protected]/nethttp/server.go:140\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/gorilla/mux.(*Router).ServeHTTP\n\tgithub.com/gorilla/[email protected]/mux.go:210\ngithub.com/jaegertracing/jaeger/cmd/query/app.additionalHeadersHandler.func1\n\tgithub.com/jaegertracing/jaeger/cmd/query/app/additional_headers_handler.go:28\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/gorilla/handlers.CompressHandlerLevel.func1\n\tgithub.com/gorilla/[email protected]/compress.go:141\nnet/http.HandlerFunc.ServeHTTP\n\tnet/http/server.go:2042\ngithub.com/gorilla/handlers.recoveryHandler.ServeHTTP\n\tgithub.com/gorilla/[email protected]/recovery.go:78\nnet/http.serverHandler.ServeHTTP\n\tnet/http/server.go:2843\nnet/http.(*conn).serve\n\tnet/http/server.go:1925"}

@aug70 aug70 reopened this Mar 29, 2021
@aug70
Author

aug70 commented Mar 29, 2021

@jpkrohling I don't know what other info I can provide. Please remove the needs-info tag, because there isn't anything else I can provide here...

@jpkrohling jpkrohling removed the needs-info The ticket does not contain sufficient information label Mar 30, 2021
@jpkrohling
Contributor

@aug70 are you able to consistently reproduce this? If I just install Jaeger 1.22.0 using the current Helm charts on a minikube instance, will I face this issue?

@flf2ko

flf2ko commented Apr 12, 2021

Hmm, I found that this error happens when a (wildcard) query runs on a text type field instead of a keyword type field.
My Elasticsearch was using a mapping created by dynamic mapping, so it was not the same as
https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/mappings/jaeger-span-7.json

With dynamic mapping, the mapping looks something like this:

"operationName": {
  "type": "text",
  "fields": {
    "keyword": {
      "type": "keyword",
      "ignore_above": 256
    }
  }
}

In this case, the wildcard query would have to target operationName.keyword instead of operationName.
It worked normally after I changed to the new mapping.

Not sure if this is the same case as yours, @aug70, since I don't see any text type in the mapping you provided above.
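
A quick way to check for this case is to walk the live mapping and flag any field that the thread's templates declare as keyword but that ended up as text. This is a rough sketch only: text_mapped_fields is a hypothetical helper, the set of expected keyword fields is copied from the templates posted earlier in this thread, and the sample properties are made up from the dynamic mapping shown above.

```python
def text_mapped_fields(properties, expected_keyword, prefix=""):
    """Return dotted field paths in expected_keyword that are mapped as text.

    properties is the "properties" object of an index mapping.
    """
    found = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "text" and path in expected_keyword:
            found.append(path)
        # Descend into object and nested fields.
        if "properties" in spec:
            found.extend(
                text_mapped_fields(spec["properties"], expected_keyword, path + ".")
            )
    return sorted(found)


# Fields the span template in this thread maps as keyword (partial list).
expected = {"traceID", "spanID", "operationName", "process.serviceName"}

# Hypothetical live mapping where operationName came from dynamic mapping.
sample_properties = {
    "operationName": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}},
    },
    "process": {
        "properties": {"serviceName": {"type": "keyword", "ignore_above": 256}}
    },
}

print(text_mapped_fields(sample_properties, expected))
```

A non-empty result would suggest the index was created without the Jaeger template applied, so the keyword-oriented queries no longer match the mapping.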

@aug70
Author

aug70 commented Apr 12, 2021

I think a Jaeger version mismatch between the collectors and the rest of the system caused this error. I put up a PR to upgrade the collectors and the problem is now solved.

@raz08

raz08 commented May 4, 2021

Is it solved after upgrading to the latest Jaeger version? Which version do I need to upgrade to?

@skundu1234

I also faced the same issue. I uninstalled Jaeger, deleted the Jaeger indices and index template, and redeployed Jaeger, and it started working. It mostly happens when you change the index name or the number of shards, so a very important step is to delete the index template; otherwise this error will come back.
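
The cleanup described above could be sketched as a list of Elasticsearch REST calls. Treat this as an assumption-heavy outline rather than a verified procedure: cleanup_requests is a hypothetical helper, the template names jaeger-span and jaeger-service (plus an optional index prefix) are assumed and should be confirmed with GET /_template first, and the paths use the legacy _template API available in Elasticsearch 7.x. The calls delete data, so the collectors should be stopped beforehand, as mentioned earlier in the thread.

```python
def cleanup_requests(prefix=""):
    """Build (method, path) pairs for removing Jaeger templates and indices.

    prefix is the Jaeger index prefix, if one is configured. The template
    names are assumptions; verify them against GET /_template.
    """
    p = f"{prefix}-" if prefix else ""
    return [
        # Remove the templates first so new indices do not inherit them.
        ("DELETE", f"/_template/{p}jaeger-span"),
        ("DELETE", f"/_template/{p}jaeger-service"),
        # Then remove the daily indices created from those templates.
        ("DELETE", f"/{p}jaeger-span-*"),
        ("DELETE", f"/{p}jaeger-service-*"),
    ]


# The indices in this thread are named jaeger-jaeger-span-<date>,
# which suggests a prefix of "jaeger".
for method, path in cleanup_requests("jaeger"):
    print(method, path)
```

The function only builds the request list; sending the requests is left to whatever client is already in use, so nothing is deleted by running the sketch itself.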
