Language filtering causes wrong results #71

Closed
3a77 opened this issue Sep 9, 2022 · 1 comment
Labels
bug Something isn't working

Comments


3a77 commented Sep 9, 2022

Hi, I think the language filtering that runs before the n-grams are checked is too aggressive. I've observed that a single non-German character is enough for Lingua to dismiss German as a possible language. Here are a few examples:

Vandalismus in Rotenburg: Bürger unterstützen Cafébesitzer
Barça-Fans feiern fünften Saisonsieg
Führung der César-Akademie zieht sich zurück
Ein gut gekühlter Roséwein
Flüchtlingsreferendum in Ungarn: Eigentor für Orbán
Charité-Beschäftigte streikten schon mehrfach
DFB: Fünf Clásico-Erkenntnisse für Bundestrainer Joachim Löw
Der Eröffnungstag des Sónar-Festivals für elektronische Musik gehörte den Instrumentalkünstlern
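
To make the reported behavior concrete, here is a minimal, self-contained sketch of the difference between a strict character filter and a more tolerant one. It is only an illustration of the idea and not Lingua's actual rule engine: the names `GERMAN_ALPHABET`, `foreign_letters`, `strict_filter_keeps_german`, and `tolerant_filter_keeps_german`, as well as the 10% threshold, are assumptions made up for this example.

```python
import unicodedata

# Hypothetical alphabet for illustration only; not taken from Lingua.
GERMAN_ALPHABET = set("abcdefghijklmnopqrstuvwxyzäöüß")


def foreign_letters(text):
    """Return the letters in `text` that are outside GERMAN_ALPHABET."""
    # Normalize to NFC so that "e" + combining accent is treated as one letter.
    normalized = unicodedata.normalize("NFC", text).lower()
    return [ch for ch in normalized if ch.isalpha() and ch not in GERMAN_ALPHABET]


def strict_filter_keeps_german(text):
    """Strict rule: a single foreign letter is enough to rule German out."""
    return not foreign_letters(text)


def tolerant_filter_keeps_german(text, max_foreign_ratio=0.1):
    """Tolerant rule (assumed threshold): keep German while foreign letters are rare."""
    normalized = unicodedata.normalize("NFC", text)
    letters = [ch for ch in normalized if ch.isalpha()]
    if not letters:
        return True
    return len(foreign_letters(text)) / len(letters) <= max_foreign_ratio


examples = [
    "Vandalismus in Rotenburg: Bürger unterstützen Cafébesitzer",
    "Barça-Fans feiern fünften Saisonsieg",
    "Führung der César-Akademie zieht sich zurück",
    "Ein gut gekühlter Roséwein",
    "Flüchtlingsreferendum in Ungarn: Eigentor für Orbán",
    "Charité-Beschäftigte streikten schon mehrfach",
    "DFB: Fünf Clásico-Erkenntnisse für Bundestrainer Joachim Löw",
    "Der Eröffnungstag des Sónar-Festivals für elektronische Musik "
    "gehörte den Instrumentalkünstlern",
]

if __name__ == "__main__":
    for sentence in examples:
        print(
            foreign_letters(sentence),
            strict_filter_keeps_german(sentence),
            tolerant_filter_keeps_german(sentence),
        )
```

With the strict rule, every sentence above loses German as a candidate because of one loanword character, while a ratio-based rule would still pass them on to the n-gram stage. The threshold is arbitrary here; a real fix might instead weight characters or skip the filter for short inputs.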

@pemistahl
Owner

Hi @3a77, thank you for opening this issue. Indeed, the rule engine does not work optimally yet. I will use your examples to improve the algorithm.

@pemistahl pemistahl added the bug Something isn't working label Sep 20, 2022
@pemistahl pemistahl added this to the Lingua 1.1.3 milestone Sep 20, 2022