"Owned by a self-identified native" search criterion throws an error for English #2166
Comments
I can only reproduce this issue on tatoeba.org, but not on dev.tatoeba.org or locally. The search daemon returns the following error:
The "owned by a self-identified native" criterion works by building a list of the native speakers of the searched language and filtering on all of those user ids. I think the number of native English speakers on Tatoeba has crossed Manticore's limit on filter values (4096).
Is it possible to limit this list to only those who actually own sentences? Or is that something that is already being done?
Yes.
No. Another, maybe more scalable, solution is to split the list of users into slices of 4096 and send a batch of search queries to the daemon, each with a different slice, and then group the results. If I remember correctly, the Manticore API makes it easy to do things like that. Or we can perform the native check during indexing (instead of at query time) and add the result as a new attribute (which means we have to live-update this attribute too).
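The slicing idea above can be sketched roughly as follows. This is only an illustration, not Tatoeba's code and not the Manticore API: `run_query` is a hypothetical callable standing in for one search request restricted to a slice of owner ids, and the 4096 limit is taken from this thread.

```python
from typing import Callable, Iterator, List, Sequence

# Limit on filter values, as reported in this thread (assumption).
MAX_FILTER_VALUES = 4096


def slices(values: Sequence[int], size: int = MAX_FILTER_VALUES) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most `size` values."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def search_in_slices(
    user_ids: Sequence[int],
    run_query: Callable[[Sequence[int]], List[int]],
) -> List[int]:
    """Run one query per slice and merge the sentence ids client-side.

    `run_query` is hypothetical: it takes a slice of owner ids and
    returns the ids of the matching sentences.
    """
    merged: List[int] = []
    for chunk in slices(user_ids):
        merged.extend(run_query(chunk))
    # Merging happens on the client, so ordering and paging across
    # slices would also have to be handled here.
    return sorted(set(merged))
```

Because the merge happens on the client, sorting and pagination across slices would need extra work, which is one cost of this approach.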
The last time I checked, there were only 5,893 identified native speakers who owned sentences. http://tatoeba.ueuo.com/stats-200118.html My numbers may differ from what's actually on the website for the following reasons.
This fixes the error message when #2166 occurs: it should be "search error" instead of "syntax error".
Closes #2166. Work around Manticore filter values limitation. When the number of natives is greater than 4096, Manticore throws an error. To avoid this, we filter by excluding non-natives instead. This is possible because filters are combined with a boolean AND operation, so we can create multiple filters with 4096 values each. At the moment, on Tatoeba, there are 4128 English natives and 5129 non-natives.
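A minimal sketch of the exclusion idea described in this commit message, under stated assumptions: the `Filter` class, the `user_id` attribute name and the `matches` helper are hypothetical and only model the logic in plain Python; they are not the Manticore client API or the actual Tatoeba code.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

# Limit on values per filter, as reported in this thread (assumption).
MAX_FILTER_VALUES = 4096


@dataclass(frozen=True)
class Filter:
    """Hypothetical model of one search filter."""
    attribute: str
    values: Tuple[int, ...]
    exclude: bool


def exclusion_filters(
    attribute: str,
    excluded_ids: Iterable[int],
    size: int = MAX_FILTER_VALUES,
) -> List[Filter]:
    """Split the ids to exclude into filters of at most `size` values."""
    ordered = sorted(excluded_ids)
    return [
        Filter(attribute, tuple(ordered[i:i + size]), exclude=True)
        for i in range(0, len(ordered), size)
    ]


def matches(owner_id: int, filters: List[Filter]) -> bool:
    """Filters are combined with AND: every filter must accept the owner."""
    return all((owner_id in f.values) != f.exclude for f in filters)


# 5129 non-natives (the figure quoted above) need two exclusion filters.
non_natives = range(1, 5130)
filters = exclusion_filters("user_id", non_natives)
print([len(f.values) for f in filters])  # [4096, 1033]
```

The same split does not work for inclusion filters: two inclusion filters with disjoint value lists, combined with AND, would match nothing, which is why the workaround switches to excluding non-natives.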
It turns out that it doesn’t look like it’s possible to group the results of a batch of queries. You just get one result set per query.
The Problem
At least for some advanced searches, "Owned by a self-identified native" gets an error message.
Experiments: Things you can try
Here are several searches with only minor differences, results sorted oldest first.
Only the first one results in an error message.
Query: liquor|brandy|ale|absinthe|daiquiri|margarita|sangria|wine|tea|soda|smoothie|milkshake|milk|lemonade|juice|coffee|espresso|cappuccino|cocoa|grog|cola|beer|whiskey|bourbon|tequila|rum|cocktail|cider|martini|vodka|gin|"white russian"|"bloody mary"|"tom collins"|"hot chocolate"|"piña colada"|"soft drink"|"soda water"|"black cow"|"mint julep"|"egg nog"|"tonic water"|"mineral water"
Limited to native speakers ** This one gets an error **
Search error
Invalid query. Please refer to the search documentation for more details.
https://tatoeba.org/eng/sentences/search?query=liquor%7Cbrandy%7Cale%7Cabsinthe%7Cdaiquiri%7Cmargarita%7Csangria%7Cwine%7Ctea%7Csoda%7Csmoothie%7Cmilkshake%7Cmilk%7Clemonade%7Cjuice%7Ccoffee%7Cespresso%7Ccappuccino%7Ccocoa%7Cgrog%7Ccola%7Cbeer%7Cwhiskey%7Cbourbon%7Ctequila%7Crum%7Ccocktail%7Ccider%7Cmartini%7Cvodka%7Cgin%7C%22white+russian%22%7C%22bloody+mary%22%7C%22tom+collins%22%7C%22hot+chocolate%22%7C%22pi%C3%B1a+colada%22%7C%22soft+drink%22%7C%22soda+water%22%7C%22black+cow%22%7C%22mint+julep%22%7C%22egg+nog%22%7C%22tonic+water%22%7C%22mineral+water%22&from=eng&to=und&user=&orphans=no&unapproved=no&has_audio=&tags=&list=&native=yes&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=created&sort_reverse=yes
Limited to List 907 (1,000 results out of 4,499 occurrences)
https://tatoeba.org/eng/sentences/search?query=liquor%7Cbrandy%7Cale%7Cabsinthe%7Cdaiquiri%7Cmargarita%7Csangria%7Cwine%7Ctea%7Csoda%7Csmoothie%7Cmilkshake%7Cmilk%7Clemonade%7Cjuice%7Ccoffee%7Cespresso%7Ccappuccino%7Ccocoa%7Cgrog%7Ccola%7Cbeer%7Cwhiskey%7Cbourbon%7Ctequila%7Crum%7Ccocktail%7Ccider%7Cmartini%7Cvodka%7Cgin%7C%22white+russian%22%7C%22bloody+mary%22%7C%22tom+collins%22%7C%22hot+chocolate%22%7C%22pi%C3%B1a+colada%22%7C%22soft+drink%22%7C%22soda+water%22%7C%22black+cow%22%7C%22mint+julep%22%7C%22egg+nog%22%7C%22tonic+water%22%7C%22mineral+water%22&from=eng&to=und&user=&orphans=no&unapproved=no&has_audio=&tags=&list=&native=yes&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=created&sort_reverse=yes
Has Audio (1,000 results out of 3,069 occurrences)
https://tatoeba.org/eng/sentences/search?query=liquor%7Cbrandy%7Cale%7Cabsinthe%7Cdaiquiri%7Cmargarita%7Csangria%7Cwine%7Ctea%7Csoda%7Csmoothie%7Cmilkshake%7Cmilk%7Clemonade%7Cjuice%7Ccoffee%7Cespresso%7Ccappuccino%7Ccocoa%7Cgrog%7Ccola%7Cbeer%7Cwhiskey%7Cbourbon%7Ctequila%7Crum%7Ccocktail%7Ccider%7Cmartini%7Cvodka%7Cgin%7C%22white+russian%22%7C%22bloody+mary%22%7C%22tom+collins%22%7C%22hot+chocolate%22%7C%22pi%C3%B1a+colada%22%7C%22soft+drink%22%7C%22soda+water%22%7C%22black+cow%22%7C%22mint+julep%22%7C%22egg+nog%22%7C%22tonic+water%22%7C%22mineral+water%22&from=eng&to=und&user=&orphans=no&unapproved=no&has_audio=yes&tags=&list=&native=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=created&sort_reverse=yes
No limits (1,000 results out of 7,870 occurrences)
https://tatoeba.org/eng/sentences/search?query=liquor%7Cbrandy%7Cale%7Cabsinthe%7Cdaiquiri%7Cmargarita%7Csangria%7Cwine%7Ctea%7Csoda%7Csmoothie%7Cmilkshake%7Cmilk%7Clemonade%7Cjuice%7Ccoffee%7Cespresso%7Ccappuccino%7Ccocoa%7Cgrog%7Ccola%7Cbeer%7Cwhiskey%7Cbourbon%7Ctequila%7Crum%7Ccocktail%7Ccider%7Cmartini%7Cvodka%7Cgin%7C%22white+russian%22%7C%22bloody+mary%22%7C%22tom+collins%22%7C%22hot+chocolate%22%7C%22pi%C3%B1a+colada%22%7C%22soft+drink%22%7C%22soda+water%22%7C%22black+cow%22%7C%22mint+julep%22%7C%22egg+nog%22%7C%22tonic+water%22%7C%22mineral+water%22&from=eng&to=und&user=&orphans=no&unapproved=no&has_audio=&tags=&list=&native=&trans_filter=limit&trans_to=und&trans_link=&trans_user=&trans_orphan=&trans_unapproved=&trans_has_audio=&sort=created&sort_reverse=yes
I purposely avoided the "relevance" option because of #1895