I couldn't find this in the docs or in the issue tracker, so I'm following on from a discussion I found via Google here:
http://elasticsearch-users.115913.n3.nabble.com/Terms-API-for-Spellchecker-td1691838.html
It appears others would like a Term Count API as well (it apparently used to be in ES, if I read that correctly).
I understand that with sharding it's not as simple as it sounds: in the pathological case where one shard holds a lot of terms and another doesn't, there's no easy way to get an accurate term count without fetching the distinct term list from each shard and de-duplicating across them.
A simpler method may just be to expose a result from each shard, something like:
This is derived from knowing that the absolute minimum number of distinct terms has to be the largest count from any individual shard (the case where shard37 holds all the unique terms and the other shards hold only subsets). The absolute maximum number of distinct terms can only be the sum of the shard counts (the pathological case where each shard stores terms that no other shard has).
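The bounds described above can be sketched in a few lines. This is only an illustration of the arithmetic, not an existing ES API: the function name `distinct_term_bounds` and the shard names and counts in the usage note are hypothetical.

```python
from typing import Mapping, Tuple


def distinct_term_bounds(shard_counts: Mapping[str, int]) -> Tuple[int, int]:
    """Return (lower, upper) bounds on the index-wide distinct term count.

    shard_counts maps a shard name to the number of distinct terms that
    shard reports for a field (hypothetical input, not a real ES response).
    """
    counts = list(shard_counts.values())
    if not counts:
        return (0, 0)
    if any(c < 0 for c in counts):
        raise ValueError("per-shard term counts must be non-negative")
    # Lower bound: one shard already holds every term, the rest hold subsets.
    lower = max(counts)
    # Upper bound: no term appears on more than one shard.
    upper = sum(counts)
    return (lower, upper)
```

For example, with made-up counts `{"shard0": 1200, "shard1": 800, "shard37": 5000}` this returns `(5000, 7000)`. The true distinct count lies somewhere in that range, and the range narrows as the shards overlap more.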
This should be very fast to compute and still useful, though it may not satisfy all cases. The only alternative is to get a unique term stream from each shard, merge the streams into a distinct list, and count it. For very large numbers of terms that could prove a memory hog.
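A rough sketch of the exact alternative, under one stated assumption: each shard can emit its unique terms in sorted order. If that holds, a k-way merge only needs to keep one pending term per shard in memory rather than a full distinct list, although every term still has to be streamed to the coordinating node. The function name `count_distinct_terms` and the list-based streams are hypothetical stand-ins, not ES calls.

```python
import heapq
from typing import Iterable


def count_distinct_terms(shard_streams: Iterable[Iterable[str]]) -> int:
    """Count distinct terms across shards.

    Assumes every stream yields its terms in ascending sorted order;
    the result is wrong if any stream is unsorted.
    """
    count = 0
    previous = object()  # sentinel that never equals a real term
    # heapq.merge lazily merges sorted inputs, holding one item per stream.
    for term in heapq.merge(*shard_streams):
        if term != previous:
            # Equal terms arrive adjacently in the merged order,
            # so comparing with the previous term is enough to de-duplicate.
            count += 1
            previous = term
    return count
```

With three toy streams `["a", "b", "d"]`, `["b", "c"]`, and `[]`, the merged order is `a, b, b, c, d` and the function returns 4.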
If I knew where to start, I'd have a crack at this. With some pointers in the right direction, I can start to attempt it.