Log queries before execution #9172
Comments
@clintongormley: Do you have any concerns regarding the implementation of this request? What factors should somebody take into account if they were to implement it? Thanks.
From an operational perspective, it would be great to see two lines per query: one when the query starts executing, and one when it finishes (with the execution time).
The lines would need to have some kind of common uuid logged (or something) so they can be easily paired up.
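A minimal sketch of the two-line scheme with a shared uuid (the function name, field names, and JSON log shape are invented for illustration; in Elasticsearch itself this would live in the search code path, not in client-side Python):

```python
import json
import time
import uuid


def log_query_lifecycle(query, run_query, emit=print):
    """Emit one log line before execution and one after, joined by a shared id.

    `run_query` is a hypothetical callable standing in for the real search;
    `emit` is the log sink (stdout here, a real logger in practice).
    """
    query_id = str(uuid.uuid4())  # common id so the two lines can be paired up
    emit(json.dumps({"id": query_id, "event": "start", "query": query}))
    start = time.monotonic()
    try:
        result = run_query(query)
        status = "ok"
    except Exception:
        status = "error"
        raise
    finally:
        # Emitted even if the query fails, so failed queries still get an end line.
        took_ms = (time.monotonic() - start) * 1000
        emit(json.dumps({"id": query_id, "event": "end",
                         "status": status, "took_ms": round(took_ms, 1)}))
    return result
```

Because the start line is written before execution, a query that hangs forever still leaves a record; an end line with the same uuid never arriving is itself a useful signal.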
This is something that we'd dearly love to see as well. In fact, we would be satisfied with just a regular full query log. We almost took a crack at implementing it and sending a PR, but decided to update and use the Jetty plugin instead. We were a little apprehensive about using the 0ms slow query log: we have pretty high query volume (8k requests/sec), and we weren't sure we could trust the slow query logger not to bottleneck performance (or to fill the disk). The other big problem with the slow query log is that it logs fetch and query separately at the shard level, which I agree is useful for analyzing slow queries, but it's not what you want if you're trying to measure the full query execution time. The Jetty plugin works pretty well for query logging, but it has drawbacks of its own.
+1, @clintongormley, is this an "adoptme" one? We need this feature too and are open to working on it.
It looks like this has been marked as discuss for a while, but hasn't actually been discussed yet :) Just some thoughts to get the discussion going:
I'm a fan of this request. In a previous life we did this logging on the client side and used it to find a few bugs.
But it'll still be visible in the logs. The logs might not be the right tool for identifying what is causing them but this could be helpful.
It's more black-box-ish than a log at start and stop. Logs around the query starting and stopping are more bulletproof, I think.
This is probably worth fixing.
Probably all of them, and make turning it on and off an index-level dynamic setting. I'd settle for just doing it on the coordinating node à la SearchSlowLog as a first cut. That is only slightly better than what clients can do.
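As a sketch of what an index-level dynamic toggle could look like, mirroring how the SearchSlowLog thresholds are updated today (the setting name `index.search.querylog.enabled` is hypothetical and does not exist; only the settings-update mechanism shown is real):

```
PUT /my-index/_settings
{
  "index.search.querylog.enabled": true
}
```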
+1
This should probably xref Etsy's es-restlog (https://github.com/etsy/es-restlog#overview). The points they bring up are very good.
I'm also a little unclear on the state of this discussion. It kinda seems like this got de-prioritized in favor of #12187, which then maybe got rolled into #15117 (I'm actually not sure I follow how #15117 addresses this issue or #12187 exactly, unless searches, including failed searches, are tasks too... or maybe it's just Friday afternoon and I'm overlooking something very obvious?). The original discussion seemed to focus on troubleshooting GC, for example. For me, with ES 2.x, my experience has been exactly what @clintongormley suspected it would be, e.g.,
That being said, I still think this idea has a lot of merit. One of the things the MySQL community got very right was the work they did on the Percona query digest (https://www.percona.com/doc/percona-toolkit/2.1/pt-query-digest.html). A DBA with the output of pt-query-digest can usually zoom in on a handful of bad query "fingerprints" that are causing an inordinate amount of work on the database and clean them up to great effect. This kind of analysis -- figuring out which queries caused the database to do the most work -- fundamentally requires the ability to review and analyze the queries the cluster is responding to, on a per-query (as opposed to per-shard) level. That's not really feasible with the slow query log.

And, specifically where that overlaps with what this ticket is talking about, it's also not possible to identify query fingerprints that have a tendency to time out or fail and might need some help. In some cases, being unable to identify these fingerprints might be devastating. Suppose you have a query on a cron job that's set to alert you if a certain event threshold is exceeded (or suppose you use Watcher or Elastalert and set up the same thing there). What if your data grows or your keys get skewed, and a query that started out just fine develops a tendency to time out every time you run it? Will this cron job / Watcher / Elastalert alert you to the fact that the query it's running keeps failing? The cron job might... and Elastalert and Watcher will certainly complain in their log files (maybe there's a way to push alerts on query failure but, if there is, I haven't noticed it). But will the developer / administrator of the cluster see the failures in the logs? It's not difficult to imagine that they wouldn't. Personally, I think logging queries before execution is a great idea.
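The fingerprinting idea can be sketched for JSON query bodies: keep the query's structure and field names, strip out the literal values, and hash the result, so that structurally identical queries group together under one fingerprint. This is a rough, hypothetical analogue of what pt-query-digest does for SQL, not anything Elasticsearch provides:

```python
import hashlib
import json


def fingerprint(query):
    """Collapse literal values so structurally identical queries share a digest."""
    def normalize(node):
        if isinstance(node, dict):
            return {k: normalize(v) for k, v in node.items()}
        if isinstance(node, list):
            return [normalize(v) for v in node]
        return "?"  # drop the literal value, keep the shape

    # Canonical serialization so key order doesn't change the digest.
    canonical = json.dumps(normalize(query), sort_keys=True)
    return hashlib.sha1(canonical.encode()).hexdigest()[:12]
```

With a full query log in place, counting failures or total time per fingerprint would give the "which query shapes cost the most" view described above.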
Further, to the point @avleen made, I think it would also be a great idea to log queries after they execute, and to log them at the query (as opposed to shard) level. Logging slow queries at the shard level makes great sense. But when you're trying to figure out which queries are failing most often and/or which queries your cluster spends the most time answering, I don't think the shard level is the right place to keep track of that information. As a datastore administrator, it is considerably easier for me to help developers understand which of their queries are problematic and/or failing when they're working with MySQL. With Elasticsearch... it can be done, but honestly it's a lot harder than it probably should be, because I don't have a cluster-level record of the queries being run.
Logging queries at the start of execution means that you need a second log line for the execution time, which complicates log parsing. You can now check the task manager for long-running queries, and optionally kill them.
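For reference, the task management API calls this refers to look roughly like the following (`<task_id>` is a placeholder, and availability depends on the Elasticsearch version):

```
GET _tasks?actions=*search&detailed=true

POST _tasks/<task_id>/_cancel
```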
Currently the best way to emulate a full query log is to set the threshold on the slow query log to 0s.
The downside to this is that queries aren't logged until they finish running.
If a query takes a really long time (we've seen queries that can overload a box, cause GC problems, and basically never end), you never find out what the query was.
Can we have a separate log which records all queries before they are run?
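The 0s slow-log workaround described above looks roughly like this (`my-index` is a placeholder, and exact setting names can vary between Elasticsearch versions):

```
PUT /my-index/_settings
{
  "index.search.slowlog.threshold.query.warn": "0s",
  "index.search.slowlog.threshold.fetch.warn": "0s"
}
```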