Wildcard search not available for content field #379
Comments
At first glance, it seems wildcard queries (especially with a leading wildcard) are not recommended because of potentially slow search performance. The ngram/edge_ngram tokenizer seems to be preferred for this case. To keep things simple, I first looked into how to get the content field added to the search query and how to be able to define the wildcards yourself. Changing the tokenizer (which would require re-indexing and also an adapted search query) should imho be a long-term goal.
1. Add content field to search query (as wildcard)
The wildcard fields seem to be added here. Adding the content field to this function seems to do the trick:
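As a rough sketch of the idea (not the actual patch to the app, which is written in PHP), the following Python snippet builds one Elasticsearch wildcard clause per field and includes content next to the fields named in this issue. The function name wildcard_clauses, the default field list, and the lowercasing step are assumptions made for illustration.

```python
def wildcard_clauses(term, fields=("title", "share_names.user", "content")):
    """Build one Elasticsearch wildcard clause per field (illustrative helper)."""
    # Assumption: the indexed tokens are lowercased by the analyzer,
    # so the pattern is lowercased as well.
    pattern = f"*{term.lower()}*"
    return [{"wildcard": {field: {"value": pattern}}} for field in fields]


# Combine the clauses so that a hit in any of the fields is enough.
query = {
    "query": {
        "bool": {
            "should": wildcard_clauses("Freunde"),
            "minimum_should_match": 1,
        }
    }
}
```

With content in the field list, the pattern *freunde* is also applied to the file content instead of only to the title and share names.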
2. Respect wildcards entered in search field
Predefined wildcards seem to be added here. The following change checks for existing wildcards and avoids adding additional wildcards in this case.
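The check described here could look roughly like the following Python sketch. It is only an illustration of the logic, not the change made in the app; the function name to_wildcard_pattern is hypothetical.

```python
def to_wildcard_pattern(term):
    """Return the pattern to use for a wildcard query (illustrative helper)."""
    # If the user already typed a wildcard, use the term exactly as entered.
    if "*" in term or "?" in term:
        return term
    # Otherwise fall back to the fixed leading/trailing wildcard.
    return f"*{term}*"
```

A plain term such as freunde still becomes *freunde*, while barbar* or fr?unde is passed through unchanged, so the user decides where the wildcards go.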
After applying those changes, files are found as expected (the issue mentioned in the original post is solved). I tried this on three instances I'm running and wasn't able to notice any practical performance impact (of course that's not representative in any way 😉... especially as I didn't mention any details about the size of the indexes involved). I'm quite sure that wildcard search was working for the content field a couple of years ago (at least I created some personal documentation with wildcard search examples which stopped working at some point). @R0Wi Do you have any insights in this regard? Any suggestions or alternative approaches to solve this issue?
Hey @XueSheng-GIT, thanks for the comprehensive insights - really impressive 👍 Unfortunately, I don't have too much historical knowledge about the From my point of view you did pretty good research and the technical solution looks good to me. Maybe we could think about making the wildcard search in |
@R0Wi thanks for your quick reply! 1. Search term:
Show JSON body
2. Search term:
Show JSON body
3. Search term:
Show JSON body
4. Search term:
Show JSON body
@ArtificialOwl Do you have any insights, why |
Description
When searching, only the fields title and share_names.user are considered for wildcard search. It's not possible to use wildcard search for the content of files. Especially for languages like German, it's hard to find something because many words are joined into a single compound word (in my example I used the word "Barbarenfreunde" and I'm searching for "Freunde").
In addition, the current wildcard search only uses a fixed leading/trailing * (wildcard search only looks for *freunde* in title and share_names). It's not possible to define the available Elasticsearch wildcards * and ? yourself.
Steps to reproduce:
Search query is shown below (at the bottom of this issue).
Expected behaviour
Search result should show the above created file.
Actual behaviour
Search result does not show the above created file.
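The mismatch between expected and actual behaviour can be illustrated without Elasticsearch. The Python sketch below is a simplified simulation under stated assumptions: tokenize is a rough stand-in for a standard analyzer (lowercase, split on non-word characters), the sample sentence is made up, and fnmatchcase only approximates Elasticsearch wildcard semantics for * and ?.

```python
import re
from fnmatch import fnmatchcase


def tokenize(text):
    """Rough stand-in for a standard analyzer: lowercase, split on non-word characters."""
    return re.findall(r"\w+", text.lower())


# Made-up sample content containing the compound word from this issue.
tokens = tokenize("Barbarenfreunde sind willkommen")


def exact_match(term):
    """True if the term equals a whole token (no wildcard involved)."""
    return term.lower() in tokens


def wildcard_match(pattern):
    """True if any token matches the pattern; * and ? act as wildcards."""
    return any(fnmatchcase(token, pattern.lower()) for token in tokens)
```

In this simulation exact_match("Freunde") is false because the only related token is the whole compound barbarenfreunde, while wildcard_match("*freunde*") is true. That is the gap the content field falls into when it is left out of the wildcard search.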
System details
OS: Ubuntu 22.04 LTS
Nextcloud: 29.0.3
Elasticsearch: 8.14.2
Fulltextsearch: 29.0.0
Fulltextsearch_Elasticsearch: 29.0.1
Files_Fulltextsearch: 29.0.0
Search query created by nextcloud: