Commit

Update release notes
pemistahl committed Apr 3, 2024
1 parent e9cefee commit ef28e8b
Showing 19 changed files with 67 additions and 41 deletions.
16 changes: 8 additions & 8 deletions README.md
@@ -790,7 +790,7 @@ including mean, median and standard deviation.
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 67</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 99</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 96</td>
-<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 96</td>
+<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 97</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 99</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 98</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 99</td>
@@ -3252,8 +3252,8 @@ including mean, median and standard deviation.
</tr>
<tr>
<td>Ukrainian</td>
-<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 92</td>
-<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 86</td>
+<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 94</td>
+<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 88</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 83</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 91</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 95</td>
@@ -3262,7 +3262,7 @@ including mean, median and standard deviation.
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 81</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 77</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 78</td>
-<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 84</td>
+<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 85</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 75</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 66</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 78</td>
@@ -3272,7 +3272,7 @@ including mean, median and standard deviation.
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 62</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/yellow.png"> 46</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 62</td>
-<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 97</td>
+<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 98</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 92</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 85</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 94</td>
@@ -3282,9 +3282,9 @@ including mean, median and standard deviation.
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 83</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 88</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/lightgreen.png"> 75</td>
-<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 95</td>
-<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 93</td>
-<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 97</td>
+<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 99</td>
+<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 96</td>
+<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 98</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 100</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 100</td>
<td><img src="https://raw.githubusercontent.com/pemistahl/lingua-py/pure-python-impl/images/green.png"> 100</td>
13 changes: 12 additions & 1 deletion RELEASE_NOTES.md
@@ -1,4 +1,4 @@
-## Lingua 1.3.5 (released on 08 Dec 2023)
+## Lingua 1.3.5 (released on 03 Apr 2024)

### Improvements

@@ -8,6 +8,17 @@
performance was much too slow with the former approach, this change makes
sense because adding more memory is quite cheap.

+- The language model files are now compressed with the Brotli algorithm which
+reduces the file size by 15 %, on average.
+
+- The characters `Щщ` are now correctly identified as possible indicators for
+the Ukrainian language, leading to slightly higher accuracy when identifying
+Ukrainian texts.
+
+### Miscellaneous
+
+- All dependencies have been updated to their latest versions.
+
## Lingua 1.3.4 (released on 07 Nov 2023)

### Miscellaneous
4 changes: 2 additions & 2 deletions accuracy-reports/aggregated-accuracy-values.csv
@@ -9,7 +9,7 @@ Belarusian,NaN,NaN,NaN,NaN,76,42,87,99,84,67,86,100,85,69,87,99,85,69,88,98,92,8
Bengali,NaN,NaN,NaN,NaN,63,19,69,99,99,98,99,99,92,92,88,97,98,94,99,100,98,94,99,100,98,94,99,100,100,100,100,100,100,100,100,100,100,100,100,100
Bokmal,50,15,45,90,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,13,3,12,23,NaN,NaN,NaN,NaN,69,53,70,85,75,55,77,91,NaN,NaN,NaN,NaN,50,27,47,75,58,39,59,77
Bosnian,NaN,NaN,NaN,NaN,19,4,15,36,33,19,28,52,5,2,4,8,9,9,10,8,54,54,64,44,65,54,76,64,NaN,NaN,NaN,NaN,29,23,29,36,35,29,35,41
-Bulgarian,68,44,67,91,66,32,72,93,70,45,66,98,67,46,62,93,78,56,81,99,89,80,88,98,92,83,95,99,72,50,68,96,78,56,81,96,87,70,91,99
+Bulgarian,68,44,67,91,66,32,72,93,70,45,66,98,67,46,62,93,78,56,81,99,89,80,88,98,92,83,95,99,72,50,68,97,78,56,81,96,87,70,91,99
Catalan,59,32,62,81,38,4,30,79,48,19,42,84,38,5,29,81,57,33,57,83,63,42,63,85,66,44,67,88,55,26,52,87,58,33,60,82,70,51,74,87
Chinese,NaN,NaN,NaN,NaN,33,NaN,2,98,92,92,83,100,96,90,97,100,71,46,68,100,71,46,68,100,71,46,68,100,64,39,55,97,100,100,100,100,100,100,100,100
Croatian,NaN,NaN,NaN,NaN,51,34,47,73,42,26,42,58,48,16,38,90,47,28,42,72,72,62,79,76,81,64,87,93,73,50,71,98,60,36,57,86,73,53,74,90
@@ -67,7 +67,7 @@ Thai,NaN,NaN,NaN,NaN,100,100,100,100,99,100,100,98,100,100,100,100,100,100,100,1
Tsonga,NaN,NaN,NaN,NaN,61,19,68,97,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,72,46,73,97,84,66,89,98
Tswana,NaN,NaN,NaN,NaN,56,17,57,94,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,NaN,71,44,73,96,84,65,88,99
Turkish,76,55,78,96,66,30,71,97,69,41,70,97,67,50,67,84,86,70,88,100,86,70,88,100,86,70,88,100,82,62,84,100,87,71,91,99,94,84,98,100
-Ukrainian,78,62,75,97,77,46,88,99,81,62,83,98,76,54,77,96,91,78,94,100,95,90,95,100,98,94,98,100,83,66,85,97,86,75,92,93,92,84,97,95
+Ukrainian,78,62,75,97,77,46,88,99,81,62,83,98,76,54,77,96,91,78,94,100,95,90,95,100,98,94,98,100,83,66,85,98,88,75,92,96,94,85,98,99
Urdu,NaN,NaN,NaN,NaN,61,8,75,99,61,39,53,92,58,30,46,99,63,40,50,99,75,59,68,99,80,68,74,99,83,67,84,97,79,65,78,94,90,80,94,96
Vietnamese,NaN,NaN,NaN,NaN,63,NaN,90,100,66,26,74,99,86,65,93,100,89,71,97,100,89,71,97,100,89,71,97,100,93,81,97,100,87,75,87,98,91,79,94,99
Welsh,69,58,60,90,72,34,85,98,69,43,66,98,49,11,39,95,64,35,61,96,69,41,71,96,72,46,74,97,85,69,88,99,82,61,87,99,91,78,96,99
6 changes: 3 additions & 3 deletions accuracy-reports/lingua-high-accuracy/Bulgarian.txt
@@ -1,10 +1,10 @@
##### Bulgarian #####

->>> Accuracy on average: 86.80%
+>>> Accuracy on average: 86.70%

>> Detection of 1000 single words (average length: 8 chars)
-Accuracy: 70.20%
-Erroneously classified as Macedonian: 12.80%, Russian: 9.40%, Serbian: 4.10%, Ukrainian: 1.60%, Kazakh: 0.80%, Belarusian: 0.60%, Mongolian: 0.50%
+Accuracy: 69.90%
+Erroneously classified as Macedonian: 12.80%, Russian: 9.40%, Serbian: 4.10%, Ukrainian: 1.90%, Kazakh: 0.80%, Belarusian: 0.60%, Mongolian: 0.50%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 91.20%
6 changes: 3 additions & 3 deletions accuracy-reports/lingua-high-accuracy/Russian.txt
@@ -1,10 +1,10 @@
##### Russian #####

->>> Accuracy on average: 89.70%
+>>> Accuracy on average: 89.67%

>> Detection of 1000 single words (average length: 8 chars)
-Accuracy: 76.50%
-Erroneously classified as Ukrainian: 6.30%, Bulgarian: 5.50%, Serbian: 3.40%, Belarusian: 3.30%, Macedonian: 3.00%, Mongolian: 1.10%, Kazakh: 0.90%
+Accuracy: 76.40%
+Erroneously classified as Ukrainian: 6.40%, Bulgarian: 5.50%, Serbian: 3.40%, Belarusian: 3.30%, Macedonian: 3.00%, Mongolian: 1.10%, Kazakh: 0.90%

>> Detection of 1000 word pairs (average length: 16 chars)
Accuracy: 94.80%
14 changes: 7 additions & 7 deletions accuracy-reports/lingua-high-accuracy/Ukrainian.txt
@@ -1,16 +1,16 @@
##### Ukrainian #####

->>> Accuracy on average: 92.23%
+>>> Accuracy on average: 93.77%

>> Detection of 1000 single words (average length: 8 chars)
-Accuracy: 84.40%
-Erroneously classified as Russian: 4.90%, Serbian: 3.40%, Bulgarian: 2.60%, Belarusian: 2.20%, Macedonian: 1.40%, Mongolian: 0.90%, Kazakh: 0.20%
+Accuracy: 85.00%
+Erroneously classified as Russian: 4.50%, Serbian: 3.40%, Bulgarian: 2.40%, Belarusian: 2.20%, Macedonian: 1.40%, Mongolian: 0.90%, Kazakh: 0.20%

>> Detection of 1000 word pairs (average length: 17 chars)
-Accuracy: 97.30%
-Erroneously classified as Russian: 1.00%, Bulgarian: 0.50%, Serbian: 0.50%, Macedonian: 0.40%, Belarusian: 0.30%
+Accuracy: 97.50%
+Erroneously classified as Russian: 0.80%, Bulgarian: 0.50%, Serbian: 0.50%, Macedonian: 0.40%, Belarusian: 0.30%

>> Detection of 1000 sentences (average length: 108 chars)
-Accuracy: 95.00%
-Erroneously classified as Kazakh: 4.10%, Belarusian: 0.30%, Macedonian: 0.30%, Russian: 0.30%
+Accuracy: 98.80%
+Erroneously classified as Belarusian: 0.30%, Kazakh: 0.30%, Macedonian: 0.30%, Russian: 0.30%
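
The "Accuracy on average" figure in these reports appears to be the arithmetic mean of the three per-category accuracies (single words, word pairs, sentences). A minimal sketch that checks this against the Ukrainian high-accuracy values before and after the change; `mean_accuracy` is a hypothetical helper written for this check, not part of Lingua:

```python
def mean_accuracy(single_words: float, word_pairs: float, sentences: float) -> float:
    """Arithmetic mean of the three per-category accuracies, rounded to two decimals."""
    return round((single_words + word_pairs + sentences) / 3, 2)


# Values copied from accuracy-reports/lingua-high-accuracy/Ukrainian.txt
before = mean_accuracy(84.40, 97.30, 95.00)
after = mean_accuracy(85.00, 97.50, 98.80)

print(before)  # 92.23, matching the old report header
print(after)   # 93.77, matching the new report header
```

Most of the gain comes from the sentence category, where the Kazakh misclassification rate drops from 4.10% to 0.30%.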

6 changes: 3 additions & 3 deletions accuracy-reports/lingua-low-accuracy/Bulgarian.txt
@@ -1,10 +1,10 @@
##### Bulgarian #####

->>> Accuracy on average: 77.83%
+>>> Accuracy on average: 77.77%

>> Detection of 1000 single words (average length: 8 chars)
-Accuracy: 56.40%
-Erroneously classified as Macedonian: 13.20%, Russian: 12.50%, Serbian: 6.20%, Kazakh: 3.60%, Ukrainian: 3.50%, Belarusian: 2.40%, Mongolian: 2.20%
+Accuracy: 56.20%
+Erroneously classified as Macedonian: 13.20%, Russian: 12.50%, Serbian: 6.20%, Ukrainian: 3.80%, Kazakh: 3.60%, Belarusian: 2.40%, Mongolian: 2.10%

>> Detection of 1000 word pairs (average length: 17 chars)
Accuracy: 80.60%
10 changes: 5 additions & 5 deletions accuracy-reports/lingua-low-accuracy/Russian.txt
@@ -1,14 +1,14 @@
##### Russian #####

->>> Accuracy on average: 78.47%
+>>> Accuracy on average: 78.37%

>> Detection of 1000 single words (average length: 8 chars)
-Accuracy: 59.20%
-Erroneously classified as Ukrainian: 8.40%, Macedonian: 8.20%, Bulgarian: 6.60%, Serbian: 5.30%, Belarusian: 4.80%, Mongolian: 4.00%, Kazakh: 3.50%
+Accuracy: 59.10%
+Erroneously classified as Ukrainian: 8.60%, Macedonian: 8.20%, Bulgarian: 6.60%, Serbian: 5.30%, Belarusian: 4.80%, Mongolian: 4.00%, Kazakh: 3.40%

>> Detection of 1000 word pairs (average length: 16 chars)
-Accuracy: 83.90%
-Erroneously classified as Macedonian: 4.80%, Ukrainian: 4.30%, Bulgarian: 2.60%, Serbian: 1.80%, Mongolian: 1.10%, Belarusian: 1.00%, Kazakh: 0.50%
+Accuracy: 83.70%
+Erroneously classified as Macedonian: 4.80%, Ukrainian: 4.50%, Bulgarian: 2.60%, Serbian: 1.80%, Mongolian: 1.10%, Belarusian: 1.00%, Kazakh: 0.50%

>> Detection of 1000 sentences (average length: 65 chars)
Accuracy: 92.30%
14 changes: 7 additions & 7 deletions accuracy-reports/lingua-low-accuracy/Ukrainian.txt
@@ -1,16 +1,16 @@
##### Ukrainian #####

->>> Accuracy on average: 86.33%
+>>> Accuracy on average: 87.87%

>> Detection of 1000 single words (average length: 8 chars)
-Accuracy: 74.70%
-Erroneously classified as Russian: 6.50%, Serbian: 5.40%, Belarusian: 3.90%, Macedonian: 3.80%, Bulgarian: 2.10%, Kazakh: 1.90%, Mongolian: 1.70%
+Accuracy: 75.30%
+Erroneously classified as Russian: 6.20%, Serbian: 5.40%, Belarusian: 3.90%, Macedonian: 3.80%, Kazakh: 1.90%, Bulgarian: 1.80%, Mongolian: 1.70%

>> Detection of 1000 word pairs (average length: 17 chars)
-Accuracy: 91.60%
-Erroneously classified as Russian: 3.20%, Serbian: 1.90%, Belarusian: 1.00%, Bulgarian: 0.70%, Macedonian: 0.70%, Mongolian: 0.70%, Kazakh: 0.20%
+Accuracy: 91.80%
+Erroneously classified as Russian: 3.00%, Serbian: 1.90%, Belarusian: 1.00%, Bulgarian: 0.70%, Macedonian: 0.70%, Mongolian: 0.70%, Kazakh: 0.20%

>> Detection of 1000 sentences (average length: 108 chars)
-Accuracy: 92.70%
-Erroneously classified as Kazakh: 4.10%, Macedonian: 2.10%, Russian: 0.40%, Belarusian: 0.30%, Bulgarian: 0.20%, Serbian: 0.20%
+Accuracy: 96.50%
+Erroneously classified as Macedonian: 2.10%, Russian: 0.40%, Belarusian: 0.30%, Kazakh: 0.30%, Bulgarian: 0.20%, Serbian: 0.20%

Binary file modified images/plots/barplot-average.png
Binary file modified images/plots/barplot-sentences.png
Binary file modified images/plots/barplot-single-words.png
Binary file modified images/plots/barplot-word-pairs.png
Binary file modified images/plots/boxplot-average.png
Binary file modified images/plots/boxplot-sentences.png
Binary file modified images/plots/boxplot-single-words.png
Binary file modified images/plots/boxplot-word-pairs.png
11 changes: 10 additions & 1 deletion lingua/_constant.py
@@ -80,7 +80,16 @@
"ЁёЫыЭэ": frozenset(
[Language.BELARUSIAN, Language.KAZAKH, Language.MONGOLIAN, Language.RUSSIAN]
),
-"ЩщЪъ": frozenset(
+"Щщ": frozenset(
+[
+Language.BULGARIAN,
+Language.KAZAKH,
+Language.MONGOLIAN,
+Language.RUSSIAN,
+Language.UKRAINIAN,
+]
+),
+"Ъъ": frozenset(
[Language.BULGARIAN, Language.KAZAKH, Language.MONGOLIAN, Language.RUSSIAN]
),
"Òò": frozenset(
8 changes: 7 additions & 1 deletion tests/test_detector.py
@@ -463,7 +463,13 @@ def test_language_detection_with_rules(word, expected_language):
),
pytest.param(
"плаваща",
-[Language.BULGARIAN, Language.KAZAKH, Language.MONGOLIAN, Language.RUSSIAN],
+[
+Language.BULGARIAN,
+Language.KAZAKH,
+Language.MONGOLIAN,
+Language.RUSSIAN,
+Language.UKRAINIAN,
+],
),
pytest.param(
"довършат",
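
The `lingua/_constant.py` change splits the former `ЩщЪъ` entry so that Ukrainian is a candidate for `Щщ` but not for `Ъъ`. The following is a minimal sketch of how a table of this shape could narrow the candidate languages for a word. It is not Lingua's actual detection logic: plain strings stand in for the `Language` enum, and `CHARS_TO_LANGUAGES`, `candidate_languages`, and the intersection strategy are assumptions made for illustration.

```python
# Hypothetical stand-in for the two entries changed in lingua/_constant.py;
# plain strings replace Lingua's Language enum to keep the sketch self-contained.
CHARS_TO_LANGUAGES = {
    "Щщ": frozenset(["BULGARIAN", "KAZAKH", "MONGOLIAN", "RUSSIAN", "UKRAINIAN"]),
    "Ъъ": frozenset(["BULGARIAN", "KAZAKH", "MONGOLIAN", "RUSSIAN"]),
}


def candidate_languages(word: str) -> frozenset:
    """Intersect the language sets of every indicator group found in the word.

    Returns an empty frozenset when the word contains no indicator character.
    """
    candidates = None
    for chars, languages in CHARS_TO_LANGUAGES.items():
        if any(ch in chars for ch in word):
            candidates = languages if candidates is None else candidates & languages
    return candidates if candidates is not None else frozenset()


# "плаваща" contains щ but not ъ, so Ukrainian stays in the candidate set.
print(sorted(candidate_languages("плаваща")))
# "ъгъл" contains ъ, which the table does not associate with Ukrainian.
print(sorted(candidate_languages("ъгъл")))
```

With the old combined `ЩщЪъ` entry, a word containing `щ` would have excluded Ukrainian from the candidates, which is consistent with the updated expectation for `плаваща` in `tests/test_detector.py`.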
