letter-frequency

This is a python script which tries to identify the language of a text by the frequency of its letter.

Run

Run with python letter_frequency.py, here the five top languages for each text snippet. The script will also generate a graph if matplotlib is installed (optional).

Scores for nl_columbus_short.txt

In de prachtige zeestad Genua, de trotsche bijgenaamd, werd omstreeks het jaar 1435 een knaapje geboren, dat nu in alle landen als Christophorus Columbus bekend is. (...)

Dutch: MSE 0.0016 | 639.8 points
German: MSE 0.006 | 166.3 points
Danish: MSE 0.0076 | 131.6 points
French: MSE 0.0127 | 78.6 points
Swedish: MSE 0.0137 | 72.9 points

Scores for it_boccaccio_short.txt

Solone, il cui petto un umano tempio di divina sapienzia fu reputato, e le cui sacratissime leggi sono ancora alli presenti uomini chiara testimonianza dell'antica giustizia, era, (...)

Italian: MSE 0.0007 | 1503.6 points
Spanish: MSE 0.0042 | 237.0 points
French: MSE 0.0057 | 175.9 points
Esperanto: MSE 0.0068 | 147.4 points
Portuguese: MSE 0.0072 | 139.4 points

Scores for fi_kivi_short.txt

Jukolan talo, eteläisessä Hämeessä, seisoo erään mäen pohjaisella rinteellä, liki Toukolan kylää. Sen läheisin ympäristö on kivinen tanner, mutta alempana alkaa pellot, joissa, (...)

Finnish: MSE 0.002 | 491.5 points
Esperanto: MSE 0.008 | 124.4 points
Czech: MSE 0.0144 | 69.6 points
Swedish: MSE 0.0154 | 65.0 points
Icelandic: MSE 0.0172 | 58.0 points

The three snippets provided are from Project Gutenberg, each one contains only the first 200 lines of the original book.

Languages

Here's a list of languages in the list, which are checked against an example text:

French
German
Spanish
Portuguese
Esperanto
Italian
Turkish
Swedish
Polish
Dutch
Danish
Icelandic
Finnish
Czech

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
snippets		snippets
columbus.png		columbus.png
language_identifier.py		language_identifier.py
letter_frequency.csv		letter_frequency.csv
readme.md		readme.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

letter-frequency

Run

Scores for nl_columbus_short.txt

Scores for it_boccaccio_short.txt

Scores for fi_kivi_short.txt

Languages

About

Releases

Packages

Languages

Elycoo/letter-frequency

Folders and files

Latest commit

History

Repository files navigation

letter-frequency

Run

Scores for nl_columbus_short.txt

Scores for it_boccaccio_short.txt

Scores for fi_kivi_short.txt

Languages

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages