Rename languages to match ISO 639-3 names #1670

sabretou · 2018-09-14T13:41:55Z

As we are introducing the new language selector, perhaps we should rename languages to match their ISO 639-3 names. Until now, some languages have used parentheses or alternate names to make them easier to find.

Here are my suggestions:

Language Code -> Current Name -> Proposed Name

cmn -> Chinese (Mandarin) -> Mandarin Chinese
nob -> Norwegian (Bokmål) -> Norwegian Bokmål
nno -> Norwegian (Nynorsk) -> Norwegian Nynorsk
nst -> Naga (Tangshang) -> Tase Naga
pan -> Punjabi (Eastern) -> Punjabi (Punjabi is by far the more popular spelling variant, so I recommend going with that. Alternatively, we could add 'Panjabi' in parentheses).
zsm -> Malay -> Standard Malay
mww -> Hmong Daw (White) -> Hmong Daw
afb -> Arabic (Gulf) -> Gulf Arabic
pnb -> Punjabi (Western) -> Western Punjabi (I propose 'Punjabi' over 'Panjabi' for the same reason as above)
aln -> Albanian (Gheg) -> Gheg Albanian
jdt -> Juhuri (Judeo-Tat) -> Judeo-Tat
cjy -> Chinese (Jin) -> Jinyu Chinese
hnj -> Hmong Njua (Green) -> Hmong Njua
bcl -> Bikol (Central) -> Central Bikol
pfl -> Palatine German -> Pfaelzisch
orv -> Old East Slavic -> Old Russian
prg -> Old Prussian -> Prussian
cmo -> Mnong, Central -> Central Mnong
acm -> Iraqi Arabic -> Mesopotamian Arabic
jam -> Jamaican Patois -> Jamaican Creole English
mhr -> Meadow Mari -> Eastern Mari
mrj -> Hill Mari -> Western Mari
dtp -> Central Dusun -> Kadazan Dusun
wuu -> Shanghainese -> Wu Chinese
yue -> Cantonese -> Yue Chinese
pes -> Persian -> Iranian Persian
ell -> Greek -> Modern Greek
pms -> Piedmontese -> Piemontese
tpw -> Old Tupi -> Tupí

I propose zlm -> Malay (Vernacular) stay as it is. In ISO 639-3, it is listed as "Malay (individual language)", which could be confusing.

Similarly, I think kek -> Kekchi (Q'eqchi') should remain as-is for visibility.

ori -> Odia (Oriya) is another special case that I think should stay.

jiru · 2018-09-14T17:43:27Z

I wrote a script that compares CLDR’s language names against Tatoeba’s and prints the differences. Note that CLDR has alternative names in addition to the "normal" name.

ISO3 ISO1       Tatoeba's name               CLDR's name (alternative naming)
-----------------------------------------------------------------------------
 abk   ab               Abkhaz                 Abkhazian
 aln  aln      Albanian (Gheg)             Gheg Albanian
 aze   az          Azerbaijani                     Azeri (short)
 ben   bn              Bengali                    Bangla
 bua  bua               Buryat                    Buriat
 crh  crh        Crimean Tatar           Crimean Turkish
 crs  crs   Seychellois Creole     Seselwa Creole French
 fry   fy              Frisian           Western Frisian
 ilo  ilo              Ilocano                     Iloko
 jam  jam      Jamaican Patois   Jamaican Creole English
 kaa  kaa           Karakalpak               Kara-Kalpak
 kal   kl          Greenlandic               Kalaallisut
 ksh  ksh               Kölsch                 Colognian
 kir   ky               Kyrgyz                   Kirghiz (variant)
 lug   lg              Luganda                     Ganda
 mrj  mrj            Hill Mari              Western Mari
 mya   my              Burmese          Myanmar Language (variant)
 nau   na              Nauruan                     Nauru
 nob   nb   Norwegian (Bokmål)          Norwegian Bokmål
 nds  nds            Low Saxon                Low German
 nya   ny            Chinyanja                    Nyanja
 oji   oj               Ojibwe                    Ojibwa
 ori   or         Odia (Oriya)                      Odia
 oss   os             Ossetian                   Ossetic
 pan   pa    Punjabi (Eastern)                   Punjabi
 pam  pam          Kapampangan                  Pampanga
 prg  prg         Old Prussian                  Prussian
 pus   ps               Pashto                    Pushto (variant)
 quc  quc              K'iche'                   Kʼicheʼ
 rif  rif              Tarifit                   Riffian
 rom  rom               Romani                    Romany
 sah  sah                Yakut                     Sakha
 ssw   ss                Swazi                     Swati
 tet  tet                Tetun                     Tetum
 tkl  tkl            Tokelauan                   Tokelau
 tsn   tn             Setswana                    Tswana
 tvl  tvl             Tuvaluan                    Tuvalu
 uig   ug               Uyghur                    Uighur (variant)
 wuu  wuu         Shanghainese                Wu Chinese
 cmn   zh   Chinese (Mandarin)                   Chinese
 cmn   zh   Chinese (Mandarin)          Mandarin Chinese (long)

Hope this helps.
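
For readers who want to reproduce this kind of comparison, here is a minimal sketch in Python. It is not the script used above: `diff_names` is a hypothetical helper, and the two dictionaries are a few entries hand-copied from the table, plus one matching entry (`fra`) added purely for illustration. Real input would have to be loaded from Tatoeba's language list and from CLDR's data.

```python
def diff_names(tatoeba, cldr):
    """Return (code, tatoeba_name, cldr_name) for codes whose names differ.

    Only codes present in both mappings are compared.
    """
    rows = []
    for code in sorted(tatoeba.keys() & cldr.keys()):
        if tatoeba[code] != cldr[code]:
            rows.append((code, tatoeba[code], cldr[code]))
    return rows


# Sample data copied from the table above (assumption: not a complete list).
tatoeba_names = {
    "abk": "Abkhaz",
    "ben": "Bengali",
    "fra": "French",  # identical in both sources, so it is not reported
    "wuu": "Shanghainese",
}
cldr_names = {
    "abk": "Abkhazian",
    "ben": "Bangla",
    "fra": "French",
    "wuu": "Wu Chinese",
}

for code, ours, theirs in diff_names(tatoeba_names, cldr_names):
    print(f"{code:>4} {ours:>20} {theirs:>25}")
```

Comparing only the codes present in both sources keeps the output focused on naming differences; codes missing from one side would need a separate report.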

trang · 2018-09-22T12:23:14Z

@cueyayotl I'll let you check and confirm the suggested renamings. I can imagine we won't have a clear answer for all the languages, so it would be nice to at least start with a list of languages we're confident about renaming. We don't have to rename everything at once. For the more problematic ones, we can go step by step.

RyckRichards · 2019-02-15T02:45:06Z

@cueyayotl any workflow you'd suggest?

RyckRichards · 2019-03-26T10:16:20Z

@sabretou You're now the one in charge of validating language requests on Tatoeba. Perhaps you might want to have a look at this :)

sabretou · 2019-03-30T14:41:02Z

Let's go ahead with the first batch of renamings. I have cleared the following for renaming.

cmn -> Chinese (Mandarin) -> Mandarin Chinese
nob -> Norwegian (Bokmål) -> Norwegian Bokmål
nno -> Norwegian (Nynorsk) -> Norwegian Nynorsk
afb -> Arabic (Gulf) -> Gulf Arabic
aln -> Albanian (Gheg) -> Gheg Albanian
bcl -> Bikol (Central) -> Central Bikol
cmo -> Mnong, Central -> Central Mnong
dtp -> Central Dusun -> Kadazan Dusun
wuu -> Shanghainese -> Wu Chinese
yue -> Cantonese -> Yue Chinese

RyckRichards · 2019-03-31T04:53:32Z

Sure

trang · 2019-04-07T21:43:51Z

@sabretou I'm wondering whether renaming Cantonese to Yue Chinese might confuse our users. Looking at some of the comments by nickyeow, our main contributor in yue, he refers to the language as Cantonese. I myself am not very familiar with the name "Yue Chinese", while I'm more familiar with Cantonese. If I were not involved in Tatoeba, with this name change I would actually think for a moment that Cantonese had been removed from the supported languages.

I have similar concerns for Shanghainese and, to a certain extent, Central Dusun, as we are introducing new names as replacements for the original ones.

Actually for Shanghainese, I know there is a comment in our code saying:

// TODO to change when shanghainese will not be the only wu dialect

Meaning that we used wuu as a code for Shanghainese knowing that wuu encapsulates more than just Shanghainese. But I think that changing the name today may not be that easy, because it's been there since 2009.

I suggest trying to contact members of Tatoeba who are contributing in dtp, wuu and yue, or just active members who have those languages listed in their profile, to get their opinion about the name change.

jiru · 2019-04-18T10:56:33Z

According to Wikipedia, the "yue" ISO code stands for Yue Chinese, which encompasses Cantonese as well as other varieties:

While the term Cantonese specifically refers to the prestige variety, it is often used in a broader sense for the entire Yue subgroup of Chinese, including related but largely mutually unintelligible languages and dialects such as Taishanese.

It’s a complex matter. If we want to make it easy to understand, we should use the word "Cantonese", but then we won’t ever have contributors of other Yue dialects such as Taishanese. If we want to follow the ISO standard, we should use "Yue Chinese" and include other dialects under that code, like Taishanese. However, these dialects are mutually unintelligible, so it makes little sense for contributors to group them under the same language on Tatoeba.

Note that since we’ve been using the name Cantonese on Tatoeba, it’s likely that we only have contributors of Cantonese, and not other Yue dialects.

jiru · 2019-12-24T15:26:51Z

Quoting Wikipedia about Wu Chinese:

Shanghainese (simplified Chinese: 上海话/上海闲话; traditional Chinese: 上海話/上海閒話; pinyin: Shànghǎihuà/Shànghǎi xiánhuà): is also a very common name, used because Shanghai is the most well-known city in the Wu-speaking region, and most people are unfamiliar with the term Wu Chinese. The use of the term Shanghainese for referring to the family is more typically used outside of China and in simplified introductions to the areas where it is spoken or to other similar topics, for example one might encounter sentences like "They speak a kind of Shanghainese in Ningbo." The term Shanghainese is never used by serious linguists to refer to anything but the variety used in Shanghai.

However, looking at the Shanghainese article:

Shanghainese belongs to the Taihu Wu subgroup, and contains vocabulary and expressions from the entire Taihu Wu area of southern Jiangsu and northern Zhejiang. With nearly 14 million speakers, Shanghainese is also the largest single form of Wu Chinese. It serves as the lingua franca of the entire Yangtze River Delta region.

So we should figure out whether the sentences currently belonging to our Shanghainese corpus are all in the Shanghainese dialect of Taihu Wu, or also include other Wu languages.

It is worth noting that this year, there has been a proposal about splitting Wu Chinese, which is still under review by the SIL. If that proposal is accepted, it would result in the creation of Taihu Wu Chinese (among others). That would certainly help sort out our wuu corpus and solve the naming issue.

jiru · 2019-12-24T15:42:13Z

As for Central Dusun, that name was changed by the SIL to Kadazan Dusun in 2016 as part of a merge. According to the proposal, the new name better matches what the speakers call their own language, and it encompasses more dialects, so it’s probably safe to rename.

jiru · 2020-02-01T00:59:56Z

Quoting my earlier comment: "It is worth noting that this year, there has been a proposal about splitting Wu Chinese"

The proposal has been rejected.

RyckRichards · 2021-05-20T09:41:48Z

Should I work on Phase 2?
Related: #936

LBeaudoux · 2024-08-18T15:13:19Z

Here is an updated list of Tatoeba language names that differ from their standard ISO 639-3 names.

ISO 639-3	Tatoeba language name	ISO 639-3 language name
abk	Abkhaz	Abkhazian
acm	Iraqi Arabic	Mesopotamian Arabic
ain	Ainu	Ainu (Japan)
ang	Old English	Old English (ca. 450-1100)
apc	North Levantine Arabic	Levantine Arabic
arn	Mapuche	Mapudungun
ava	Avar	Avaric
brx	Bodo	Bodo (India)
bua	Buryat	Buriat
chn	Chinook Jargon	Chinook jargon
cjy	Jin Chinese	Jinyu Chinese
ckb	Central Kurdish (Soranî)	Central Kurdish
ckt	Chukchi	Chukot
crs	Seychellois Creole	Seselwa Creole French
diq	Southern Zaza (Dimli)	Dimli (individual language)
dtp	Central Dusun	Kadazan Dusun
ell	Greek	Modern Greek (1453-)
enm	Middle English	Middle English (1100-1500)
frm	Middle French	Middle French (ca. 1400-1600)
fro	Old French	Old French (842-ca. 1400)
frr	North Frisian	Northern Frisian
fry	Frisian	Western Frisian
gom	Konkani (Goan)	Goan Konkani
grc	Ancient Greek	Ancient Greek (to 1453)
hat	Haitian Creole	Haitian
hnj	Hmong Njua (Green)	Hmong Njua
hye	Eastern Armenian	Armenian
iii	Nuosu	Sichuan Yi
ike	Inuktitut	Eastern Canadian Inuktitut
ilo	Ilocano	Iloko
ina	Interlingua	Interlingua (International Auxiliary Language Association)
jam	Jamaican Patois	Jamaican Creole English
jdt	Juhuri (Judeo-Tat)	Judeo-Tat
kaa	Karakalpak	Kara-Kalpak
kal	Greenlandic	Kalaallisut
kam	Kamba	Kamba (Kenya)
kek	Kekchi (Q'eqchi')	Kekchí
kir	Kyrgyz	Kirghiz
kiu	Northern Zaza (Kirmanjki)	Kirmanjki (individual language)
kmr	Northern Kurdish (Kurmancî)	Northern Kurdish
lez	Lezgi	Lezghian
lim	Limburgish	Limburgan
liv	Livonian	Liv
lug	Luganda	Ganda
lvs	Latvian	Standard Latvian
mfa	Kelantan-Pattani Malay	Pattani Malay
mhr	Meadow Mari	Eastern Mari
mik	Hitchiti	Mikasuki
mni	Meitei	Manipuri
mrj	Hill Mari	Western Mari
mus	Muskogee (Creek)	Creek
mww	Hmong Daw (White)	Hmong Daw
nau	Nauruan	Nauru
nds	Low German (Low Saxon)	Low German
ngt	Ngeq	Kriang
npi	Nepali	Nepali (individual language)
nst	Naga (Tangshang)	Tase Naga
nya	Chinyanja	Nyanja
oar	Old Aramaic	Old Aramaic (up to 700 BCE)
oci	Occitan	Occitan (post 1500)
oji	Ojibwe	Ojibwa
ood	O'odham	Tohono O'odham
ori	Odia (Oriya)	Oriya (macrolanguage)
orv	Old East Slavic	Old Russian
ota	Ottoman Turkish	Ottoman Turkish (1500-1928)
pal	Middle Persian (Pahlavi)	Pahlavi
pam	Kapampangan	Pampanga
pan	Punjabi (Eastern)	Panjabi
pes	Persian	Iranian Persian
pfl	Palatine German	Pfaelzisch
pms	Piedmontese	Piemontese
pnb	Punjabi (Western)	Western Panjabi
prg	Old Prussian	Prussian
pus	Pashto	Pushto
qxq	Qashqai	Qashqa'i
rap	Rapa Nui	Rapanui
rom	Romani	Romany
run	Kirundi	Rundi
ryu	Okinawan	Central Okinawan
shi	Tashelhit	Tachelhit
ssw	Swazi	Swati
stq	Saterland Frisian	Saterfriesisch
swh	Swahili	Swahili (individual language)
syc	Syriac	Classical Syriac
tet	Tetun	Tetum
tkl	Tokelauan	Tokelau
tmr	Jewish Babylonian Aramaic	Jewish Babylonian Aramaic (ca. 200-1200 CE)
toi	Tonga (Zambezi)	Tonga (Zambia)
ton	Tongan	Tonga (Tonga Islands)
tsn	Setswana	Tswana
tts	Isan	Northeastern Thai
tvl	Tuvaluan	Tuvalu
uig	Uyghur	Uighur
war	Waray	Waray (Philippines)
wuu	Shanghainese	Wu Chinese
yua	Yucatec Maya	Yucateco
yue	Cantonese	Yue Chinese
zea	Zeelandic	Zeeuws
zlm	Malay (Vernacular)	Malay (individual language)
zsm	Malay	Standard Malay
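
Many rows in a list like this differ only by a parenthetical qualifier or by letter case, while others are genuinely different names. The following Python sketch shows one way the rows could be sorted into such groups. The `classify` function and its category labels are illustrative assumptions, not part of Tatoeba's code, and the sample rows are copied from the table above.

```python
import re

# Matches one trailing parenthetical such as " (Japan)" or " (1453-)".
PAREN = re.compile(r"\s*\([^)]*\)\s*$")


def classify(tatoeba_name, iso_name):
    """Label how a Tatoeba name differs from the ISO 639-3 name."""
    if tatoeba_name == iso_name:
        return "same"
    if tatoeba_name.lower() == iso_name.lower():
        return "case only"
    if PAREN.sub("", iso_name) == tatoeba_name:
        return "iso adds qualifier"
    if PAREN.sub("", tatoeba_name) == iso_name:
        return "tatoeba adds qualifier"
    return "different name"


# A few tab-separated rows copied from the table above.
rows = """\
ain\tAinu\tAinu (Japan)
chn\tChinook Jargon\tChinook jargon
hnj\tHmong Njua (Green)\tHmong Njua
wuu\tShanghainese\tWu Chinese
"""

for line in rows.splitlines():
    code, ours, iso = line.split("\t")
    print(code, classify(ours, iso))
```

Rows in the "different name" group, such as wuu, are the ones that seem to need discussion with contributors; the qualifier-only rows are arguably cosmetic.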

jiru added the ui-strings label (Issue that requires changing the text of the UI.) Sep 14, 2018

trang mentioned this issue Sep 22, 2018

Reviewing names of languages #936

Closed

trang added the lang-request label (Request to add a new language, change the icon or change the name.) Sep 22, 2018

trang assigned cueyayotl Sep 22, 2018

trang mentioned this issue Oct 8, 2018

Finnish (Kven) [fkv] #1691

Closed

RyckRichards mentioned this issue Mar 31, 2019

Renamed name of languages - phase 1 #1850

Merged

trang pushed a commit that referenced this issue Apr 17, 2019

Rename some languages (#1850)

652663b

Refs #1670

trang unassigned cueyayotl Feb 12, 2020
