High-performance MinHash implementation in Rust with Python bindings for efficient similarity estimation and deduplication of large datasets
Updated Aug 9, 2024 (Python)
SetSketch: Filling the Gap between MinHash and HyperLogLog
A collection of fast MinHash distance algorithms
A simple MinHash implementation based on the explanation in Stanford's Mining of Massive Datasets course
DataHipsters is a service implementing MinHash similarity on a Key-Value Database (Google AppEngine/GCloud), including an API for k-nearest neighbors (k-nn) used in Online Recommender Systems.
An interactive program for finding similar users and movies
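The projects listed above all build on the same core idea: a MinHash signature keeps, for each of several hash functions, the minimum hash value over a set, and the fraction of positions where two signatures agree estimates the Jaccard similarity of the sets. The following is a minimal sketch of that idea and is not taken from any of the listed repositories. The names `minhash_signature` and `estimate_jaccard`, the default `num_perm`, and the use of salted BLAKE2b as the hash family are all assumptions made for illustration.

```python
import hashlib


def minhash_signature(items, num_perm=128):
    """Return a MinHash signature for a non-empty collection of strings.

    Hypothetical helper: each of the num_perm "hash functions" is BLAKE2b
    with a different salt, which is an illustrative choice, not a standard.
    """
    items = list(items)
    if not items:
        raise ValueError("cannot build a signature for an empty set")
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # distinct salt per hash function
        signature.append(
            min(
                int.from_bytes(
                    hashlib.blake2b(
                        item.encode("utf-8"), digest_size=8, salt=salt
                    ).digest(),
                    "little",
                )
                for item in items
            )
        )
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Estimate Jaccard similarity as the fraction of matching positions."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have the same length")
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


# Example: two sets sharing 50 of 150 distinct items (true Jaccard = 1/3).
set_a = {f"w{i}" for i in range(0, 100)}
set_b = {f"w{i}" for i in range(50, 150)}
estimate = estimate_jaccard(
    minhash_signature(set_a, num_perm=256),
    minhash_signature(set_b, num_perm=256),
)
print(f"estimated Jaccard similarity: {estimate:.3f}")
```

The estimate is noisy: its standard error shrinks roughly with the square root of `num_perm`, so larger signatures trade memory and hashing time for accuracy. Production libraries usually replace the per-item cryptographic hash with much cheaper hash families.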