Term Count API #640

Closed
tallpsmith opened this issue Jan 20, 2011 · 1 comment

Comments

@tallpsmith

I couldn't find this in the docs or in the issue tracker, so I'm following up on a discussion I found via Google:

http://elasticsearch-users.115913.n3.nabble.com/Terms-API-for-Spellchecker-td1691838.html

It appears others would like a Term Count API as well (it apparently used to be in ES, if I read that correctly).

I understand that sharding makes this less simple than it sounds. In a pathological case where one shard holds a lot of terms and another holds few, there is no easy way to get an accurate term count without fetching the distinct term list from each shard and de-duplicating across them.

A simpler method may just be to expose a result from each shard, something like:

{
    "shards": {
        "shard1": 5,
        "shard2": 12,
        "shard37": 450
    },
    "range": {
        "min": 450,
        "max": 467
    }
}

This is produced from knowing that the absolute minimum number of distinct terms has to be the maximum from an individual shard (when shard37 holds all the unique terms, and the other shards just hold a subset). The absolute maximum number of distinct terms can only be the sum of the shard counts (in the pathological case where each shard is storing terms no other shard has).
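
To make the arithmetic concrete, here is a minimal sketch of how a coordinating node might derive the range from the per-shard counts. The function name `distinct_term_bounds` and the dictionary layout are hypothetical, chosen only to mirror the example response above; this is not an existing Elasticsearch API.

```python
def distinct_term_bounds(shard_counts):
    """Return lower and upper bounds on the global distinct term count.

    shard_counts maps a (hypothetical) shard name to the number of
    distinct terms that shard reports on its own.
    """
    if not shard_counts:
        return {"min": 0, "max": 0}
    counts = list(shard_counts.values())
    return {
        # Lower bound: the largest shard could already contain every term.
        "min": max(counts),
        # Upper bound: no term is shared between any two shards.
        "max": sum(counts),
    }


bounds = distinct_term_bounds({"shard1": 5, "shard2": 12, "shard37": 450})
print(bounds)
```

With the counts from the example, the lower bound is 450 and the upper bound is 5 + 12 + 450 = 467.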

This should be very fast to compute and still useful, though it may not satisfy all cases. The only alternative is to get a unique term stream from each shard, merge the streams into a distinct list, and count it. For very large numbers of terms, that could prove a memory hog.
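
For comparison, the exact alternative could be sketched as below. It assumes each shard can return its unique terms in sorted order (an assumption on my part, not something confirmed in this thread); under that assumption a k-way merge only needs to remember the previous term, so the coordinating side does not have to hold the full distinct list in memory. The name `count_distinct_terms` is hypothetical.

```python
import heapq


def count_distinct_terms(shard_streams):
    """Count distinct terms across shards.

    shard_streams is an iterable of iterables; each inner iterable is
    assumed to yield one shard's unique terms in ascending order.
    """
    count = 0
    previous = None
    # heapq.merge lazily yields the sorted union of the sorted inputs,
    # so equal terms from different shards arrive next to each other.
    for term in heapq.merge(*shard_streams):
        if count == 0 or term != previous:
            count += 1
            previous = term
    return count


streams = [["a", "b", "d"], ["b", "c"], ["a", "d", "e"]]
print(count_distinct_terms(streams))
```

The cost moves from memory to network and time: every term on every shard still has to be streamed to the node doing the merge.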

If I knew where to start, I'd have a crack at this. Given some pointers in the right direction, I can start to attempt it.

@jpountz
Contributor

jpountz commented Mar 13, 2014

#5426 has just been resolved and makes it possible to compute unique counts.

@jpountz jpountz closed this as completed Mar 13, 2014