From 671a209ed923511e3b3f29eecb277aa705d8570f Mon Sep 17 00:00:00 2001
From: Mayya Sharipova
Date: Fri, 8 Mar 2019 16:16:03 -0500
Subject: [PATCH] Correct errors in min_hash filter documentation

Related to #39671
---
 .../analysis/tokenfilters/minhash-tokenfilter.asciidoc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/docs/reference/analysis/tokenfilters/minhash-tokenfilter.asciidoc b/docs/reference/analysis/tokenfilters/minhash-tokenfilter.asciidoc
index 21c7387e0f7f5..75bcf53b6d9a4 100644
--- a/docs/reference/analysis/tokenfilters/minhash-tokenfilter.asciidoc
+++ b/docs/reference/analysis/tokenfilters/minhash-tokenfilter.asciidoc
@@ -30,7 +30,7 @@ occurring in a document is low. At the same time, as
 internally each shingle is hashed into to 128-bit hash, you should choose
 `k` small enough so that all possible
 different k-words shingles can be hashed to 128-bit hash with
-minimal collision. 5-word shingles typically work well.
+minimal collision.
 
 * choosing the right settings for `hash_count`, `bucket_count` and
 `hash_set_size` needs some experimentation.
@@ -39,7 +39,7 @@ minimal collision. 5-word shingles typically work well.
 will provide a higher guarantee that different tokens are
 indexed to different buckets.
 ** to improve the recall,
-you should increase `hash_token` parameter. For example,
+you should increase `hash_count` parameter. For example,
 setting `hash_count=2`, will make each token to be hashed in
 two different ways, thus increasing the number of potential
 candidates for search.