unic

Works like UNIX `sort | uniq` to provide globally unique lines, except that you don't have to sort first.

Works by using Cuckoo Filters (see: https://github.com/seiflotfy/cuckoofilter).

Advantages over `sort | uniq`

Quicker output, lower memory footprint

`sort` by definition needs to buffer the entire input before it can begin outputting anything. This can use a lot of memory, and nothing is output until all of the input has been read.

unic uses probabilistic filters (Cuckoo Filters) to determine whether a line has been seen before, so it can begin outputting as soon as the first line of input has been read.

Original item order is kept

Given the list 3 1 2 1 2 3, compare the output of `sort | uniq`:

$ printf '3\n1\n2\n1\n2\n3\n' | sort | uniq
1
2
3

with the output of unic:

$ printf '3\n1\n2\n1\n2\n3\n' | unic
3
1
2

Disadvantages

Probabilistic Filtering

Because unic uses Cuckoo Filters, there is a very small probability that a line will be wrongly treated as a duplicate (a false positive) and left out of the output. Due to the nature of the filter, a line will never be wrongly treated as unique (no false negatives).

In cases where a false positive cannot ever be tolerated, unic should not be used.

Not compatible with all of `uniq`'s flags

Because unic does not buffer its input, some of `uniq`'s flags cannot be implemented.

If you need one of those flags, use `sort | uniq` instead.

Installing

Binaries

See the project's releases page.

From Source

$ go install github.com/donatj/unic/cmd/unic@latest
