To keep the search index small and clean and to improve search results, it would be good to be able to remove common words (stop words) from the index. Examples would be words like "this", "that", "a" and "and", or "ein", "eine", "weil", "dass", "der", "die", "das" for German-language indices.
For most sites this would shrink the index and improve search results. For the search term "and", we would no longer return "and he goes..." or "and Peter...", but results like "Android" or "Andreas".
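A minimal sketch of the idea, assuming a simple in-memory inverted index with prefix matching (this is not how any particular search tool stores its index). The stop-word set and the function names `tokenize`, `build_index` and `prefix_search` are hypothetical. Stop words are dropped at indexing time, so a prefix query for "and" can only reach longer words such as "android".

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; a real site would supply its own per language.
STOP_WORDS = {"this", "that", "a", "and", "ein", "eine", "weil", "dass", "der", "die", "das"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens (Unicode-aware)."""
    return re.findall(r"\w+", text.lower())


def build_index(pages: dict[str, str], stop_words: set[str]) -> dict[str, set[str]]:
    """Map each non-stop-word term to the set of page ids that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for page_id, text in pages.items():
        for term in tokenize(text):
            if term not in stop_words:  # stop words never enter the index
                index[term].add(page_id)
    return dict(index)


def prefix_search(index: dict[str, set[str]], query: str) -> list[str]:
    """Return the indexed terms that start with the query, sorted."""
    q = query.lower()
    return sorted(term for term in index if term.startswith(q))


# Example: "and" itself is not indexed, so only longer words match the prefix.
pages = {"p1": "And he goes to Andreas", "p2": "And Peter installs Android"}
index = build_index(pages, STOP_WORDS)
print(prefix_search(index, "and"))  # ['andreas', 'android']
```

Filtering at indexing time (rather than only at query time) is what makes the index smaller; the trade-off is that an exact search for a stop word can no longer match anything.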
The benefit is not limited to words that are common in the language as a whole; with a well-chosen list, it applies to other words as well.
For example, on a site about coffee the word "coffee" will appear on nearly every page, so it could make sense to remove it from the index: searching for it would simply return the entire site.
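Site-specific words like "coffee" can be spotted by their document frequency, i.e. the share of pages a word appears on. The sketch below is one possible heuristic, not a feature of any existing tool: the function name `frequent_terms` and the 90 % cutoff (`max_doc_ratio`) are assumptions, and the result is meant as a list of candidates to review by hand rather than words to drop automatically.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def frequent_terms(pages: dict[str, str], max_doc_ratio: float = 0.9) -> set[str]:
    """Return terms that occur on more than max_doc_ratio of all pages.

    Such terms match almost every page, so they are candidates for a
    site-specific stop-word list.
    """
    doc_freq: Counter[str] = Counter()
    for text in pages.values():
        doc_freq.update(set(tokenize(text)))  # count each term once per page
    threshold = max_doc_ratio * len(pages)
    return {term for term, df in doc_freq.items() if df > threshold}


pages = {
    "a": "Coffee beans from Brazil",
    "b": "Brewing coffee at home",
    "c": "Coffee grinders compared",
}
print(frequent_terms(pages))  # {'coffee'}
```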
To keep things simple, I think the first step should be to just allow adding a list. Including the NLTK list would be nice, but being able to supply the whole list yourself is the feature you always end up needing. So I would start with that and perhaps add convenience features later.
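A user-supplied list could be as simple as a plain-text file with one word per line. The format below (blank lines and `#` comments ignored, words lower-cased) and the function name `load_stop_words` are hypothetical, shown only to illustrate how little is needed for a first version.

```python
import io
from typing import Iterable


def load_stop_words(lines: Iterable[str]) -> set[str]:
    """Parse a stop-word list: one word per line; blank lines and
    lines starting with '#' are ignored; words are lower-cased."""
    words: set[str] = set()
    for line in lines:
        word = line.strip().lower()
        if word and not word.startswith("#"):
            words.add(word)
    return words


# Any iterable of lines works, e.g. an open file or an in-memory buffer.
german = load_stop_words(io.StringIO("# German\nder\ndie\nDas\n\n"))
print(sorted(german))  # ['das', 'der', 'die']
```

A bundled list (such as NLTK's) could later be merged in with a set union, which fits the idea of adding convenience features on top of a plain list.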