We are publishing pre-trained word vectors for 294 languages, trained on Wikipedia using fastText. These 300-dimensional vectors were obtained using the skip-gram model described in Bojanowski et al. (2016) with default parameters.
The word vectors are provided in fastText's default binary and text formats. In the text format, each line contains a word followed by its embedding, with values separated by spaces. Words are ordered by descending frequency.
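As an illustration of the text format, here is a minimal Python sketch of a reader. The function name `read_vectors` and its `limit` parameter are hypothetical, not part of fastText. The sketch also assumes that a file may begin with an optional header line holding two integers (word count and dimension) and skips such a line if present; that header is an assumption, not something stated above.

```python
import io


def read_vectors(lines, limit=None):
    """Parse text-format word vectors from an iterable of lines.

    Returns a dict mapping each word to a list of floats.
    If `limit` is given, stop after that many words.
    """
    vectors = {}
    for i, line in enumerate(lines):
        line = line.rstrip()
        if not line:
            continue  # ignore blank lines
        parts = line.split(" ")
        # Assumption: an optional first line holds "<word count> <dimension>".
        if i == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
            continue
        word, values = parts[0], parts[1:]
        vectors[word] = [float(v) for v in values]
        if limit is not None and len(vectors) >= limit:
            break
    return vectors


# Tiny in-memory sample standing in for a real vector file (3 dimensions, not 300).
sample = io.StringIO("2 3\nthe 0.1 -0.2 0.3\nof 0.4 0.5 -0.6\n")
vectors = read_vectors(sample)
```

Because words are ordered by descending frequency, passing a `limit` keeps only the most frequent words, which can be useful when a full file is too large to hold in memory. To read a real file, pass an open file object, for example `open(path, encoding="utf-8")`, where `path` is wherever you saved the download.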
The pre-trained word vectors are distributed under the Creative Commons Attribution-Share-Alike License 3.0.
If you use these word embeddings, please cite the following paper:
P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
@article{bojanowski2016enriching,
title={Enriching Word Vectors with Subword Information},
author={Bojanowski, Piotr and Grave, Edouard and Joulin, Armand and Mikolov, Tomas},
journal={arXiv preprint arXiv:1607.04606},
year={2016}
}
The models can be downloaded from:
Abkhazian: bin+text, text | Acehnese: bin+text, text | Adyghe: bin+text, text |
Afar: bin+text, text | Afrikaans: bin+text, text | Akan: bin+text, text |
Albanian: bin+text, text | Alemannic: bin+text, text | Amharic: bin+text, text |
Anglo_Saxon: bin+text, text | Arabic: bin+text, text | Aragonese: bin+text, text |
Aramaic: bin+text, text | Armenian: bin+text, text | Aromanian: bin+text, text |
Assamese: bin+text, text | Asturian: bin+text, text | Avar: bin+text, text |
Aymara: bin+text, text | Azerbaijani: bin+text, text | Bambara: bin+text, text |
Banjar: bin+text, text | Banyumasan: bin+text, text | Bashkir: bin+text, text |
Basque: bin+text, text | Bavarian: bin+text, text | Belarusian: bin+text, text |
Bengali: bin+text, text | Bihari: bin+text, text | Bishnupriya Manipuri: bin+text, text |
Bislama: bin+text, text | Bosnian: bin+text, text | Breton: bin+text, text |
Buginese: bin+text, text | Bulgarian: bin+text, text | Burmese: bin+text, text |
Buryat: bin+text, text | Cantonese: bin+text, text | Catalan: bin+text, text |
Cebuano: bin+text, text | Central Bicolano: bin+text, text | Chamorro: bin+text, text |
Chavacano: bin+text, text | Chechen: bin+text, text | Cherokee: bin+text, text |
Cheyenne: bin+text, text | Chichewa: bin+text, text | Chinese: bin+text, text |
Choctaw: bin+text, text | Chuvash: bin+text, text | Classical Chinese: bin+text, text |
Cornish: bin+text, text | Corsican: bin+text, text | Cree: bin+text, text |
Crimean Tatar: bin+text, text | Croatian: bin+text, text | Czech: bin+text, text |
Danish: bin+text, text | Divehi: bin+text, text | Dutch: bin+text, text |
Dutch Low Saxon: bin+text, text | Dzongkha: bin+text, text | Eastern Punjabi: bin+text, text |
Egyptian Arabic: bin+text, text | Emilian_Romagnol: bin+text, text | English: bin+text, text |
Erzya: bin+text, text | Esperanto: bin+text, text | Estonian: bin+text, text |
Ewe: bin+text, text | Extremaduran: bin+text, text | Faroese: bin+text, text |
Fiji Hindi: bin+text, text | Fijian: bin+text, text | Finnish: bin+text, text |
Franco_Provençal: bin+text, text | French: bin+text, text | Friulian: bin+text, text |
Fula: bin+text, text | Gagauz: bin+text, text | Galician: bin+text, text |
Gan: bin+text, text | Georgian: bin+text, text | German: bin+text, text |
Gilaki: bin+text, text | Goan Konkani: bin+text, text | Gothic: bin+text, text |
Greek: bin+text, text | Greenlandic: bin+text, text | Guarani: bin+text, text |
Gujarati: bin+text, text | Haitian: bin+text, text | Hakka: bin+text, text |
Hausa: bin+text, text | Hawaiian: bin+text, text | Hebrew: bin+text, text |
Herero: bin+text, text | Hill Mari: bin+text, text | Hindi: bin+text, text |
Hiri Motu: bin+text, text | Hungarian: bin+text, text | Icelandic: bin+text, text |
Ido: bin+text, text | Igbo: bin+text, text | Ilokano: bin+text, text |
Indonesian: bin+text, text | Interlingua: bin+text, text | Interlingue: bin+text, text |
Inuktitut: bin+text, text | Inupiak: bin+text, text | Irish: bin+text, text |
Italian: bin+text, text | Jamaican Patois: bin+text, text | Japanese: bin+text, text |
Javanese: bin+text, text | Kabardian: bin+text, text | Kabyle: bin+text, text |
Kalmyk: bin+text, text | Kannada: bin+text, text | Kanuri: bin+text, text |
Kapampangan: bin+text, text | Karachay_Balkar: bin+text, text | Karakalpak: bin+text, text |
Kashmiri: bin+text, text | Kashubian: bin+text, text | Kazakh: bin+text, text |
Khmer: bin+text, text | Kikuyu: bin+text, text | Kinyarwanda: bin+text, text |
Kirghiz: bin+text, text | Kirundi: bin+text, text | Komi: bin+text, text |
Komi_Permyak: bin+text, text | Kongo: bin+text, text | Korean: bin+text, text |
Kuanyama: bin+text, text | Kurdish (Kurmanji): bin+text, text | Kurdish (Sorani): bin+text, text |
Ladino: bin+text, text | Lak: bin+text, text | Lao: bin+text, text |
Latgalian: bin+text, text | Latin: bin+text, text | Latvian: bin+text, text |
Lezgian: bin+text, text | Ligurian: bin+text, text | Limburgish: bin+text, text |
Lingala: bin+text, text | Lithuanian: bin+text, text | Livvi_Karelian: bin+text, text |
Lojban: bin+text, text | Lombard: bin+text, text | Low Saxon: bin+text, text |
Lower Sorbian: bin+text, text | Luganda: bin+text, text | Luxembourgish: bin+text, text |
Macedonian: bin+text, text | Maithili: bin+text, text | Malagasy: bin+text, text |
Malay: bin+text, text | Malayalam: bin+text, text | Maltese: bin+text, text |
Manx: bin+text, text | Maori: bin+text, text | Marathi: bin+text, text |
Marshallese: bin+text, text | Mazandarani: bin+text, text | Meadow Mari: bin+text, text |
Min Dong: bin+text, text | Min Nan: bin+text, text | Minangkabau: bin+text, text |
Mingrelian: bin+text, text | Mirandese: bin+text, text | Moksha: bin+text, text |
Moldovan: bin+text, text | Mongolian: bin+text, text | Muscogee: bin+text, text |
Nahuatl: bin+text, text | Nauruan: bin+text, text | Navajo: bin+text, text |
Ndonga: bin+text, text | Neapolitan: bin+text, text | Nepali: bin+text, text |
Newar: bin+text, text | Norfolk: bin+text, text | Norman: bin+text, text |
North Frisian: bin+text, text | Northern Luri: bin+text, text | Northern Sami: bin+text, text |
Northern Sotho: bin+text, text | Norwegian (Bokmål): bin+text, text | Norwegian (Nynorsk): bin+text, text |
Novial: bin+text, text | Nuosu: bin+text, text | Occitan: bin+text, text |
Old Church Slavonic: bin+text, text | Oriya: bin+text, text | Oromo: bin+text, text |
Ossetian: bin+text, text | Palatinate German: bin+text, text | Pali: bin+text, text |
Pangasinan: bin+text, text | Papiamentu: bin+text, text | Pashto: bin+text, text |
Pennsylvania German: bin+text, text | Persian: bin+text, text | Picard: bin+text, text |
Piedmontese: bin+text, text | Polish: bin+text, text | Pontic: bin+text, text |
Portuguese: bin+text, text | Quechua: bin+text, text | Ripuarian: bin+text, text |
Romani: bin+text, text | Romanian: bin+text, text | Romansh: bin+text, text |
Russian: bin+text, text | Rusyn: bin+text, text | Sakha: bin+text, text |
Samoan: bin+text, text | Samogitian: bin+text, text | Sango: bin+text, text |
Sanskrit: bin+text, text | Sardinian: bin+text, text | Saterland Frisian: bin+text, text |
Scots: bin+text, text | Scottish Gaelic: bin+text, text | Serbian: bin+text, text |
Serbo_Croatian: bin+text, text | Sesotho: bin+text, text | Shona: bin+text, text |
Sicilian: bin+text, text | Silesian: bin+text, text | Simple English: bin+text, text |
Sindhi: bin+text, text | Sinhalese: bin+text, text | Slovak: bin+text, text |
Slovenian: bin+text, text | Somali: bin+text, text | Southern Azerbaijani: bin+text, text |
Spanish: bin+text, text | Sranan: bin+text, text | Sundanese: bin+text, text |
Swahili: bin+text, text | Swati: bin+text, text | Swedish: bin+text, text |
Tagalog: bin+text, text | Tahitian: bin+text, text | Tajik: bin+text, text |
Tamil: bin+text, text | Tarantino: bin+text, text | Tatar: bin+text, text |
Telugu: bin+text, text | Tetum: bin+text, text | Thai: bin+text, text |
Tibetan: bin+text, text | Tigrinya: bin+text, text | Tok Pisin: bin+text, text |
Tongan: bin+text, text | Tsonga: bin+text, text | Tswana: bin+text, text |
Tulu: bin+text, text | Tumbuka: bin+text, text | Turkish: bin+text, text |
Turkmen: bin+text, text | Tuvan: bin+text, text | Twi: bin+text, text |
Udmurt: bin+text, text | Ukrainian: bin+text, text | Upper Sorbian: bin+text, text |
Urdu: bin+text, text | Uyghur: bin+text, text | Uzbek: bin+text, text |
Venda: bin+text, text | Venetian: bin+text, text | Vepsian: bin+text, text |
Vietnamese: bin+text, text | Volapük: bin+text, text | Võro: bin+text, text |
Walloon: bin+text, text | Waray: bin+text, text | Welsh: bin+text, text |
West Flemish: bin+text, text | West Frisian: bin+text, text | Western Punjabi: bin+text, text |
Wolof: bin+text, text | Wu: bin+text, text | Xhosa: bin+text, text |
Yiddish: bin+text, text | Yoruba: bin+text, text | Zazaki: bin+text, text |
Zeelandic: bin+text, text | Zhuang: bin+text, text | Zulu: bin+text, text |