GitHub - joshwalters/edcount: Estimate distinct count of values on the command line

edcount

Estimate distinct count of values from standard input. Provides a very fast way to perform unique count estimates on the command line.

A value is considered terminated with a newline.

Why use it?

If you want to quickly process a large number of values and find out roughly how many of them are unique.

The program is well suited to estimating unique counts of billions of unique values.

How it works

The edcount program implements HyperLogLog, with some minor modifications, as detailed by by Flajolet et. al. in the paper "HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm".

Additionally, the memory footprint of the program is constant, at a few megabytes. This memory use is constant regardless of the number of records counted, and does not degrade in accuracy.

Benchmark

Counting 10 million unique records with sort and wc:

$ seq 1 10000000 | shuf | sort -u | wc -l
10000000
real  1m16.213s
user  1m13.516s
sys   0m1.516s

Counting 10 million unique records with edcount:

$ seq 1 10000000 | shuf | edcount
9988659
real  0m1.110s
user  0m1.172s
sys   0m0.180s

This is a 0.11% difference from the actual unique value.

Fast Install Instructions

Required software: make,gcc

curl -L -o edcount-1.2.0.tar.gz https://github.com/joshwalters/edcount/releases/download/v1.2.0/edcount-1.2.0.tar.gz
tar -xzvf edcount-1.2.0.tar.gz
cd edcount-1.2.0
./configure
make
make install

You might have to use sudo make install instead of make install.

For more detail, see the INSTALL file.

Contributing / Development

See the CONTRIBUTING file.

License

GPLv3+, see the COPYING file.

Name		Name	Last commit message	Last commit date
Latest commit History 2 Commits
demo		demo
doc		doc
m4		m4
src		src
test		test
.gitignore		.gitignore
AUTHORS		AUTHORS
CONTRIBUTING		CONTRIBUTING
COPYING		COPYING
ChangeLog		ChangeLog
INSTALL		INSTALL
Makefile.am		Makefile.am
NEWS		NEWS
README		README
README.md		README.md
TODO		TODO
_config.yml		_config.yml
bootstrap		bootstrap
configure.ac		configure.ac

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

edcount

Why use it?

How it works

Benchmark

Fast Install Instructions

Contributing / Development

License

About

Releases 1

Packages

Languages

License

joshwalters/edcount

Folders and files

Latest commit

History

Repository files navigation

edcount

Why use it?

How it works

Benchmark

Fast Install Instructions

Contributing / Development

License

About

Resources

License

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages