One of two languages with same confidence value is erroneously removed #72

Closed
mmedek opened this issue Sep 23, 2020 · 1 comment
Labels
bug Something isn't working

mmedek commented Sep 23, 2020

There is a potential bug in the computeLanguageConfidenceValues method of the LanguageDetector class when two languages receive the same confidence value.

The returned SortedMap<Language, Double> uses a comparator that orders entries by probability value. As a result, when any two languages have the same probability, the map treats them as the same key and only one of them remains. For example, when the original distribution is cs=0.4, de=0.4, en=0.2, the method returns only de=0.4, en=0.2.

fun computeLanguageConfidenceValues(text: String): SortedMap<Language, Double> {
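
The effect described above can be reproduced outside of Lingua. The following is a minimal sketch, not the library's actual code: the three-entry Language enum and the sortByValueOnly helper are hypothetical stand-ins, and the comparator is only assumed to resemble the one used in computeLanguageConfidenceValues. It shows that a TreeMap whose comparator looks at values alone treats two keys with equal values as one key.

```kotlin
import java.util.SortedMap
import java.util.TreeMap

// Hypothetical stand-in for Lingua's Language enum, reduced to three entries.
enum class Language { CS, DE, EN }

// Hypothetical helper: sorts by descending confidence only, with no tie-break.
// TreeMap considers two keys equal whenever the comparator returns 0,
// so keys with the same confidence collapse into a single entry.
fun sortByValueOnly(values: Map<Language, Double>): SortedMap<Language, Double> {
    val byValueDesc = Comparator<Language> { a, b ->
        values.getValue(b).compareTo(values.getValue(a))
    }
    val sorted = TreeMap<Language, Double>(byValueDesc)
    sorted.putAll(values)
    return sorted
}

fun main() {
    val confidences = mapOf(Language.CS to 0.4, Language.DE to 0.4, Language.EN to 0.2)
    val sorted = sortByValueOnly(confidences)
    println(sorted.size) // 2, although three languages were put in
}
```

Which of the two tied languages survives depends on insertion order and on how the map is built, so the sketch only checks the size.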

pemistahl added the bug label Sep 23, 2020
pemistahl added this to the Lingua 1.1.0 milestone Sep 23, 2020
@pemistahl
Owner

Good catch @mmedek, thank you. I wasn't aware of that. I will write a custom comparator that fixes the issue.
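
One way such a custom comparator could look is sketched below. This is an assumption for illustration, not the fix that was actually committed to Lingua: the Language enum and the sortByValueThenLanguage helper are hypothetical, and breaking ties by the enum's natural order is just one possible choice. The point is that the comparator returns 0 only for identical keys, so no entry is dropped.

```kotlin
import java.util.SortedMap
import java.util.TreeMap

// Hypothetical stand-in for Lingua's Language enum, reduced to three entries.
enum class Language { CS, DE, EN }

// Hypothetical helper: sorts by descending confidence, then breaks ties
// by the enum's natural (declaration) order so distinct languages never compare as equal.
fun sortByValueThenLanguage(values: Map<Language, Double>): SortedMap<Language, Double> {
    val comparator = compareByDescending<Language> { values.getValue(it) }
        .thenBy { it }
    val sorted = TreeMap<Language, Double>(comparator)
    sorted.putAll(values)
    return sorted
}

fun main() {
    val confidences = mapOf(Language.CS to 0.4, Language.DE to 0.4, Language.EN to 0.2)
    val sorted = sortByValueThenLanguage(confidences)
    println(sorted.keys.toList()) // [CS, DE, EN]
}
```

With the tie-break in place, the cs=0.4, de=0.4, en=0.2 distribution from the report keeps all three entries.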

pemistahl changed the title from "Bug in computeLanguageConfidenceValues method when languages have same confidence" to "One of two languages with same confidence value is erroneously removed" Sep 23, 2020
pemistahl modified the milestones: Lingua 1.1.0, Lingua 1.0.3 Oct 9, 2020
2 participants