Term Count API #640

tallpsmith · 2011-01-20T04:23:33Z

I couldn't find this in the docs or in the issue tracker. I'm following on from a discussion I found via Google:

http://elasticsearch-users.115913.n3.nabble.com/Terms-API-for-Spellchecker-td1691838.html

It appears others would like a Term Count API as well (it apparently used to be in ES, if I read that correctly).

I understand that sharding makes this less simple than it sounds. In a pathological case where one shard holds a lot of terms and another does not, there is no easy way to get an accurate term count without fetching the distinct term list from each shard and de-duplicating across them.

A simpler method may just be to expose a result from each shard, something like:

{
    "shards": {
        "shard1": 5,
        "shard2": 12,
        "shard37": 450
    },
    "range": {
        "min": 450,
        "max": 467
    }
}

This is produced from knowing that the absolute minimum number of distinct terms has to be the maximum from an individual shard (when shard37 holds all the unique terms, and the other shards just hold a subset). The absolute maximum number of distinct terms can only be the sum of the shard counts (in the pathological case where each shard is storing terms no other shard has).
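The bounds described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not anything that exists in ES: the function name `distinct_term_bounds` and the dict of per-shard counts are hypothetical.

```python
def distinct_term_bounds(shard_counts):
    """Return (min, max) bounds on the number of distinct terms across shards.

    shard_counts maps a (hypothetical) shard name to the number of
    distinct terms that shard reports on its own.
    """
    counts = list(shard_counts.values())
    if not counts:
        return (0, 0)
    # Lower bound: the largest shard already contains every term.
    # Upper bound: no two shards share a single term.
    return (max(counts), sum(counts))


shard_counts = {"shard1": 5, "shard2": 12, "shard37": 450}
low, high = distinct_term_bounds(shard_counts)
print({"shards": shard_counts, "range": {"min": low, "max": high}})
```

With the counts from the example response, this gives a minimum of 450 and a maximum of 467.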

This should be very fast to compute and still useful, but may not satisfy all cases. The only alternative is to get a unique term stream from each shard, merge the streams into a distinct list, and count it. For very large numbers of terms that could prove a memory hog.
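As a rough sketch of the exact alternative: if each shard could emit its unique terms in sorted order (an assumption here, not something confirmed in this thread), the streams could be merged k-way and counted without building the full distinct list in memory. The function name `count_distinct_sorted` and the sample shard lists are made up for illustration.

```python
import heapq


def count_distinct_sorted(streams):
    """Count distinct terms across iterables that each yield sorted terms.

    Assumes every stream is already sorted; only one pending term per
    stream is held in memory at a time.
    """
    sentinel = object()
    previous = sentinel
    count = 0
    # heapq.merge lazily yields all terms in globally sorted order,
    # so duplicates from different shards arrive next to each other.
    for term in heapq.merge(*streams):
        if previous is sentinel or term != previous:
            count += 1
            previous = term
    return count


# Hypothetical per-shard term streams, each sorted and locally unique.
shard_a = ["apple", "banana", "cherry"]
shard_b = ["banana", "date"]
shard_c = ["apple", "date", "elder"]

print(count_distinct_sorted([shard_a, shard_b, shard_c]))
```

This keeps memory proportional to the number of shards rather than the number of terms, but every term still has to be shipped to the node doing the merge, so it trades memory for network and time.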

If I knew where to start, I'd have a crack at this. With some pointers in the right direction, I can start to attempt it.


jpountz · 2014-03-13T18:32:41Z

#5426 has just been resolved and makes it possible to compute unique counts.

javanna mentioned this issue Oct 18, 2013

Term Count on Search Results #3920

Closed

jpountz closed this as completed Mar 13, 2014
