
Commit

Correct errors in min_hash filter documentation
Related to #39671
mayya-sharipova committed Mar 8, 2019
1 parent 1095e10 commit aad9397
Showing 1 changed file with 2 additions and 2 deletions.
@@ -30,7 +30,7 @@ occurring in a document is low. At the same time, as
internally each shingle is hashed into to 128-bit hash, you should choose
`k` small enough so that all possible
different k-words shingles can be hashed to 128-bit hash with
-minimal collision. 5-word shingles typically work well.
+minimal collision.

* choosing the right settings for `hash_count`, `bucket_count` and
`hash_set_size` needs some experimentation.
@@ -39,7 +39,7 @@ minimal collision. 5-word shingles typically work well.
will provide a higher guarantee that different tokens are
indexed to different buckets.
** to improve the recall,
-you should increase `hash_token` parameter. For example,
+you should increase `hash_count` parameter. For example,
setting `hash_count=2`, will make each token to be hashed in
two different ways, thus increasing the number of potential
candidates for search.
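
The effect of `hash_count` described in this hunk can be illustrated with a toy sketch. This is not the filter's actual implementation: the helper names `shingles` and `min_hash_tokens` are hypothetical, MD5 stands in for the 128-bit hash, and assigning buckets by modulo is an assumption made only to keep the example short.

```python
import hashlib


def shingles(text, k):
    """Return the k-word shingles of a text (hypothetical helper)."""
    words = text.split()
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def min_hash_tokens(shingle_list, hash_count=1, bucket_count=512, hash_set_size=1):
    """Toy bucketed MinHash; returns a set of (hash_index, bucket, value) tokens."""
    tokens = set()
    for i in range(hash_count):
        buckets = {}
        for shingle in shingle_list:
            # Salting with the hash index simulates hash_count different hash functions.
            digest = hashlib.md5(f"{i}:{shingle}".encode("utf-8")).digest()
            value = int.from_bytes(digest, "big")  # 128-bit integer
            # Assumption: buckets are assigned by modulo for simplicity.
            buckets.setdefault(value % bucket_count, set()).add(value)
        for bucket, values in buckets.items():
            # Keep only the hash_set_size smallest hashes in each bucket.
            for value in sorted(values)[:hash_set_size]:
                tokens.add((i, bucket, value))
    return tokens


doc = shingles("the quick brown fox jumps over the lazy dog", 5)
one_way = min_hash_tokens(doc, hash_count=1)
two_ways = min_hash_tokens(doc, hash_count=2)
# Every token produced with hash_count=1 is also produced with hash_count=2,
# plus the extra tokens contributed by the second hash function.
print(len(one_way), len(two_ways), one_way < two_ways)
```

In this sketch a larger `hash_count` only ever adds signature tokens, which is why it gives a search more potential candidates to match and can improve recall.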
