size:1 on a query applied to _all indexes only returns from the newest index, size:2 returns from all. #6614
Hi @weberr13 Your request is nested incorrectly:
Please could you fix the nesting and see if that resolves the issue? If it doesn't, add a comment with the correct query and I'll take another look. |
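For illustration, a minimal sketch of the nesting being asked for. It assumes the Elasticsearch 1.x search body layout, in which `from`, `size`, `sort` and `fields` sit beside `query` rather than inside it. The helper name `build_search_body` is hypothetical, the field names are taken from the reported query, and `_cache` is left out because the thread does not settle where it belongs.

```python
import json


def build_search_body(size):
    """Build a search body with paging and sort keys beside "query", not inside it."""
    return {
        "query": {
            "filtered": {
                "filter": {
                    "bool": {
                        "must": [
                            {"term": {"Written": True}},
                            {"term": {"LatestUpdate": True}},
                            {"range": {"TimeUpdated": {"gte": "2013/05/29 20:51:00"}}},
                        ]
                    }
                }
            }
        },
        # Top-level keys: siblings of "query", not children of it.
        "sort": [{"TimeUpdated": {"order": "asc", "ignore_unmapped": True}}],
        "from": 0,
        "size": size,
        "fields": ["Session", "TimeUpdated", "TimeStart"],
    }


body = build_search_body(1)
print(json.dumps(body, indent=2))
```

The printed JSON could be passed to `curl -d` in place of the hand-written body, which also avoids unquoted keys.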
I don't understand the scope problem. If the size is outside the "query" it is ignored; if the size is inside the "filtered" it doesn't function. The size applies to the query and is scoped as such. If the "size" was at the wrong level, wouldn't it be ignored rather than returning the wrong single result? And wouldn't the query for 2 behave the same way (returning only from the latest index, instead of returning one hit from the oldest index and one from the latest)? All logic seems to point to an "off by one" error in how the size parameter mixes with sorted results...
It is quite possible, especially on older versions, that Elasticsearch hits an unknown key and just stops parsing rather than throwing an error.
When I move size up to Query, it is ignored, as I said. The only way to get 1 result is to write the query as above. It isn't about unknown keys either. All the results are valid; it is the sorting that is off. Basically this is what is occurring:
Index 2 days ago: 2 records (id a, b)
When I query and limit the size to 10 (default) I get a, b, c, d, e, f
When I query and limit the size to 5 I get a, b, c, d, e
When I limit to 4 I get a, b, c, e
3: a, b, e
2: a, e
1: e
Do you see the issue?
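The expectation behind this report can be stated as a check: with a fixed sort, the hits for a smaller `size` should be a prefix of the hits for a larger one. A minimal sketch, using the placeholder ids from the comment above; the function name `prefix_violations` is hypothetical and nothing here talks to Elasticsearch.

```python
def prefix_violations(results_by_size):
    """Return the sizes whose hits are not a prefix of the largest result set."""
    largest = results_by_size[max(results_by_size)]
    return sorted(
        size
        for size, hits in results_by_size.items()
        if hits != largest[:size]
    )


# Results as described in the comment, keyed by the requested size.
reported = {
    10: ["a", "b", "c", "d", "e", "f"],
    5: ["a", "b", "c", "d", "e"],
    4: ["a", "b", "c", "e"],
    3: ["a", "b", "e"],
    2: ["a", "e"],
    1: ["e"],
}

print(prefix_violations(reported))  # [1, 2, 3, 4]
```

Sizes 1 through 4 fail the check, which matches the complaint that small sizes skip records that should sort first.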
@weberr13 Yes I see the issue. What I don't see is a query that actually runs. The query that you sent me is invalid and throws an error on 1.2.1: You can see from the documentation that Also, that So either you sent me a different query than the one you posted originally, or you are running a broken query. I'm not saying there isn't an issue. All I'm saying is that you haven't given me the information that I need to figure it out. What would be awesome is if you could give me a short recreation of the problem, with
With that in hand, it becomes a lot easier to figure out if there is a real problem or not. |
I'm running on 1.1.2 (as I said in the previous post). If you want to test on your version, go ahead and pull the size up to the same level as the query, but for my install that doesn't work (it ignores the size parameter altogether). As for re-creation, any data that can be sorted ASC on date would work.
@weberr13 the Also, I've just looked at your second query which has another error: your bool query has a Please send me an actual working recreation. For instance, from the data you've provided, I've written the following, which works exactly as it is supposed to work:
So I am currently unable to recreate your issue. If you are able to recreate it and can send me that recreation, that would be very useful. |
I literally cut and pasted from my terminal when I submitted the bug. I'm sorry that you cannot believe me. |
@weberr13 It's not that I don't believe what you're seeing, but I'm telling you that what you copied and pasted is invalid. So before we can decide if there is a bug or not, we need a valid recreation. I demonstrated a recreation above that shows that (for this case) things work as expected. Of course there may be some other issue that does indeed have a bug, but unless you give me that information I am unable to figure out what causes it. Have you tried my recreation? Does it work for you? Do you not believe me? |
Here is what happens: #1 the query as I originally posted it: [weberr@probe Upgrade]$ curl -XGET http://localhost:9200/_all/meta/_search -d '{ The query as you describe (where I took _cache, sort and from outside of the query block): curl -XGET http://localhost:9200/_all/meta/_search -d '{ I get: {"took":50, Where do you suggest I put the size: parameter in order for the query to work? Even though it returns the wrong entry, it at least works. Even more, by moving it, the Fields block suddenly stopped working as well. I believe it doesn't work for you, but I need something from you that I can run.
Try this:
The query:
What does this give you? |
The above reproduces the issue: {"took":8512,"timed_out":false,"_shards":{"total":65,"successful":2,"failed":0},"hits":{"total":0,"max_score":null,"hits":[]}}[weberr@probe Upgrade]$ sh foo with size 2: |
OK great - now we're getting somewhere. Please could you run the same query a few times in a row and make sure that you get back the same results every time. I'm wondering if your primary and replica shards have diverged. |
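The repeat-and-compare check suggested here can be sketched as follows. This is only an illustration: `run_query` is a hypothetical callable that returns a parsed search response, and the two fake responders stand in for a real cluster.

```python
import itertools


def hit_ids(response):
    """Extract (index, id) pairs from a parsed search response, in order."""
    return [(hit["_index"], hit["_id"]) for hit in response["hits"]["hits"]]


def results_are_stable(run_query, attempts=5):
    """Run the same search several times; True if every run returns the same hits."""
    seen = {tuple(hit_ids(run_query())) for _ in range(attempts)}
    return len(seen) == 1


# Fake responders: one always answers the same, one alternates between two answers.
def stable():
    return {"hits": {"hits": [{"_index": "i1", "_id": "a"}]}}


_alternating = itertools.cycle([
    {"hits": {"hits": [{"_index": "i1", "_id": "a"}]}},
    {"hits": {"hits": [{"_index": "i2", "_id": "e"}]}},
])


def flapping():
    return next(_alternating)


print(results_are_stable(stable))    # True
print(results_are_stable(flapping))  # False
```

Results that change between runs would be consistent with diverged primary and replica shards; stable results, as reported in the next comment, point elsewhere.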
I re-ran it several times with the same results. We've actually seen this on no fewer than 8 test machines over the last month or more (I'm not sure when our tester first saw it). |
hi @weberr13 We've tried to replicate this on 1.1.2 with lots of randomized testing, and so far have been unable to do so. Are you able to recreate a small test case which (when run from scratch) replicates the problem? Alternatively, would you be able to upload all of your data somewhere so that we can replicate the problem locally? You can send a link to the data to me personally if you'd prefer: clinton.gormley at elasticsearch.com |
You can see the test we're running here: 79309dc |
I'm confused, does that test successfully recreate or do you still need data from me? |
This test passes every time. I ran it over 10k iterations with a random number of shards, nodes, etc., so it does NOT recreate the issue. |
Hi @weberr13 Have you managed to put together a test case for this yet? We are unable to investigate further until we can reproduce the problem. If the only way you can do this is to send me your indices, I'd be happy to take a look at them too. |
I am still figuring out a way to do this. I'd like to send you our indexes but they contain Deep Packet Inspection metadata for our engineering department and are not suitable for public consumption. If there was a way to produce an index that only had the "fields": [ data I could send that. |
@weberr13 Understood - reindexing would be the only way, but that may change something else which "fixes" the issue. I'm happy to receive your indexes privately - just email me a link (but understand if you can't share them with me).. |
@weberr13 Any chance of that recreation? |
At this point I guess we cannot go further. We have a workaround (we don't query for 1 anymore; we always query for 2 and discard one of them) and no way to share our indexes with you. If I had the free cycles I would do something, but I don't.
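A minimal sketch of that workaround, under one assumption the thread does not spell out: the hit to keep is the one with the smallest sort value. The helper name `oldest_hit` is hypothetical, and the sample response is a trimmed copy of the size 2 result from the original report.

```python
def oldest_hit(response):
    """Return the hit with the smallest first sort value, or None if there are no hits."""
    hits = response["hits"]["hits"]
    if not hits:
        return None
    return min(hits, key=lambda hit: hit["sort"][0])


# Trimmed from the size:2 response in the original report.
size_two_response = {
    "hits": {
        "hits": [
            {
                "_index": "network_2014_05_25",
                "_id": "b97c12c0-b863-49d2-9355-483aee55de16_1",
                "sort": [1400980444000],
            },
            {
                "_index": "network_2014_06_25",
                "_id": "bdc9d4b3-4e60-4667-b66a-db831cf2f59b_2",
                "sort": [1403666749000],
            },
        ]
    }
}

print(oldest_hit(size_two_response)["_index"])  # network_2014_05_25
```

Picking by sort value rather than by position means the result does not depend on the order in which the two hits come back.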
FYI - I think you are hitting #8226 which is now fixed in the |
Our system searches for the oldest record that matches a set of criteria. We attempted to accomplish this with a size-limited range query, sorted ascending, with bool filters.
On ES 0.90 and 1.1.2 the following queries were run:
curl -XGET http://localhost:9200/_all/meta/_search -d '{ "sort" : [ { "TimeUpdated" : { "order": "asc", "ignore_unmapped" : true } } ], query : {filtered : { filter : { bool : { must : [ { term : {"Written" : true }} , { term : { "LatestUpdate" : true } } , { range : { "TimeUpdated" : { gte : "2013/05/29 20:51:00" } } } ] } } } , _cache : false , from : 0,size : 1 , "fields" : [ "Session", "TimeUpdated", "TimeStart" ] } }'
{"took":616,"timed_out":false,"_shards":{"total":65,"successful":65,"failed":0},"hits":{"total":42959661,"max_score":null,"hits":[{"_index":"network_2014_06_25","_type":"meta","_id":"f3bb5501-ce17-4f13-a262-48b89e175201_2","_score":null,"fields":{"TimeStart":["2014/06/25 03:03:49"],"TimeUpdated":["2014/06/25 03:25:49"],"Session":["f3bb5501-ce17-4f13-a262-48b89e175201"]},"sort":[1403666749000]}]}}
The indices are built by date, and this one, "network_2014_06_25", is the newest. However, adding one to the size gives a radically different result:
curl -XGET http://localhost:9200/_all/meta/_search -d '{ "sort" : [ { "TimeUpdated" : { "order": "asc", "ignore_unmapped" : true } } ], query : {filtered : { filter : { bool : { t : [ { term : {"Written" : true }} , { term : { "LatestUpdate" : true } } , { range : { "TimeUpdated" : { gte : "2013/05/29 20:51:00" } } } ] } } } , _cache : false , from : 0,size : 2 , "fields" : [ "Session", "TimeUpdated", "TimeStart" ] } }'
{"took":770,"timed_out":false,"_shards":{"total":65,"successful":65,"failed":0},"hits":{"total":42973532,"max_score":null,"hits":[{"_index":"network_2014_05_25","_type":"meta","_id":"b97c12c0-b863-49d2-9355-483aee55de16_1","_score":null,"fields":{"TimeStart":["2014/05/25 01:13:05"],"TimeUpdated":["2014/05/25 01:14:04"],"Session":["b97c12c0-b863-49d2-9355-483aee55de16"]},"sort":[1400980444000]},{"_index":"network_2014_06_25","_type":"meta","_id":"bdc9d4b3-4e60-4667-b66a-db831cf2f59b_2","_score":null,"fields":{"TimeStart":["2014/06/25 03:03:49"],"TimeUpdated":["2014/06/25 03:25:49"],"Session":["bdc9d4b3-4e60-4667-b66a-db831cf2f59b"]},"sort":[1403666749000]}]}}
Further increasing the size keeps hitting the correct "b97c12c0-b863-49d2-9355-483aee55de16_1" record rather than the "bdc9d4b3-4e60-4667-b66a-db831cf2f59b_2" record, which is certainly not the oldest.