Auto-transcribe sentences in Lingua Franca Nova's Latin orthography to Cyrillic, and vice versa #1958

Open
conlangbecca opened this issue Sep 24, 2019 · 4 comments
Labels
enhancement Issue that describes a problem that requires a change in the current functionalities of Tatoeba.

Comments

@conlangbecca

Lingua Franca Nova has two orthographies, one Latin and one Cyrillic, with a one-to-one correspondence between letters in each orthography. A table can be found here: http://www.elefen.org/vici/gramatica/en/spelling_and_pronunciation

The practice up to now has been to contribute in both orthographies wherever possible, but I feel this is inefficient and ignores that there are already systems in place for auto-transcribing between Chinese Simplified and Chinese Traditional, among other orthographies. Could such a system be added for LFN? Thank you.

@jiru
Member

jiru commented Sep 26, 2019

Hello conlangbecca, thank you for your request. From the information you gave, an autotranscription system for LFN looks quite feasible in Tatoeba. To get this done, we will need your help as described in the wiki. Please follow the instructions there and get back to us with the relevant data about Latin/Cyrillic transcription in LFN.

trang added the enhancement label Oct 11, 2019
@jiru
Member

jiru commented Apr 26, 2020

Following the link you provided, I was able to write a simple conversion algorithm and check it against a few LFN sentence pairs on Tatoeba that were contributed in Latin and Cyrillic. Among these, I found a few sentences where my algorithm gets a different result than what was contributed on Tatoeba:

| Sentence | Transcription on Tatoeba | Transcription by algorithm |
| --- | --- | --- |
| 5454279 | Chelsea ес ентре ла дистритос ла плу модоса де Manhattan, е суа барес е ресторантес ес комун фолида а финис де семана. | Кхелсеа ес ентре ла дистритос ла плу модоса де Манхаттан, е суа барес е ресторантес ес комун фолида а финис де семана. |
| 5459786 | «Ме ес тристе,» ел иа дисе. «Ло ес ун фарса гранде, ун менти гранде, ун пиротекникал гранде. Ло куал авени но депреса ме, ма симпле ло мотива ме а апаре е парла плу.» | "Ме ес тристе," ел иа дисе. "Ло ес ун фарса гранде, ун менти гранде, ун пиротекникал гранде. Ло куал авени но депреса ме, ма симпле ло мотива ме а апаре е парла плу." |
| 5441679 | Ла аутор де ла либро "Ла еволуи – имажес де носа жовениа", Емиле Де Кооман, меа фрате, ес гравор е пинтор. | Ла аутор де ла либро "Ла еволуи – имажес де носа жовениа", Емиле де Кооман, меа фрате, ес гравор е пинтор. |
| 8657973 | Лаила ес ун традуор. | Лаyла ес ун традуор. |
| 5623833 | Christoph Schlütermann, ун лаборор пер ла Крус Рожа, иа дескриве ел комо "тан диференте де ла отрас – мулте нонкапас де ата син аида". | Кхристопх Скхлüтерманн, ун лаборор пер ла Крус Рожа, иа дескриве ел комо "тан диференте де ла отрас – мулте нонкапас де ата син аида". |
| 5500486 | Есперанто ес ун лингуа куал он дебе апренде, уса, рекорда, парла, дифуса, асета, скрибе, деже, трансмете. | Есперанто ес ун лингуа куал он дебе апренде, уса, рекорда, парла, дифуса, асета, скрибе, леже, трансмете. |
| 5543582 | "Нос иа аве но темпо," Кеllnеr иа есплика, "донке ме иа коре пос ел пер киса 15 метрес о симил. Ун де меа амис ес анке ун полисиор, донке нос иа саиси ла ом. Ел иа атента еваде, донке нoс иа тени плу форте ел." | "Нос иа аве но темпо," Kеллнер иа есплика, "донке ме иа коре пос ел пер киса 15 метрес о симил. Ун де меа амис ес анке ун полисиор, донке нос иа саиси ла ом. Ел иа атента еваде, донке нос иа тени плу форте ел." |
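
For readers following along, a letter-for-letter conversion of this kind might look like the minimal Python sketch below. It is not the code from the commit referenced in this issue. The letter pairs are my reading of the spelling table linked above and should be checked against it, and the names `to_cyrillic` and `to_latin` are hypothetical.

```python
# Letter pairs as read from the LFN spelling table linked in the issue.
# This is an assumption to verify against that table, not Tatoeba's data.
LATIN = "abcdefghijlmnoprstuvxz"
CYRILLIC = "абкдефгхижлмнопрстувшз"

# Build both directions, covering lower and upper case.
_TO_CYRILLIC = str.maketrans(LATIN + LATIN.upper(), CYRILLIC + CYRILLIC.upper())
_TO_LATIN = str.maketrans(CYRILLIC + CYRILLIC.upper(), LATIN + LATIN.upper())


def to_cyrillic(text: str) -> str:
    """Replace each LFN Latin letter with its Cyrillic counterpart."""
    return text.translate(_TO_CYRILLIC)


def to_latin(text: str) -> str:
    """Replace each LFN Cyrillic letter with its Latin counterpart."""
    return text.translate(_TO_LATIN)


if __name__ == "__main__":
    print(to_cyrillic("Chelsea es entre la distritos"))
    print(to_latin(to_cyrillic("Esperanto es un lingua")))
```

Letters outside the 22-letter table (such as k, y or ü in foreign names) pass through unchanged, which is consistent with the "Лаyла" and "Скхлüтерманн" results in the algorithm column.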

jiru added a commit that referenced this issue Apr 28, 2020
Based on http://www.elefen.org/vici/gramatica/en/spelling_and_pronunciation

Included tests based on existing LFN sentence pairs on Tatoeba, picked
at random. The failing tests show edge cases that require refining
the algorithm or rethinking the conversion rules.

Refs #1958.
@conlangbecca
Author

Cyrillic transcriptions might not always be one-for-one with proper names, because sometimes people opt for a phonetic transcription into actual LFN phonology rather than a strict letter-for-letter transcription. The automatic transcription, though, should do the letter-for-letter transcription, which is completely acceptable.
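
One way to surface the proper names where the two approaches diverge is to look for Latin letters left over in a Cyrillic string. The sketch below is only an illustration under that assumption; `latin_residue` is a hypothetical helper, not part of Tatoeba.

```python
import unicodedata


def latin_residue(text: str) -> list[str]:
    """Return the distinct Latin-script characters found in text, sorted.

    A character counts as Latin when its Unicode name starts with "LATIN",
    which also catches accented letters such as ü.
    """
    return sorted(
        {ch for ch in text if unicodedata.name(ch, "").startswith("LATIN")}
    )


if __name__ == "__main__":
    # A Cyrillic string with one Latin letter left in the middle.
    print(latin_residue("\u041b\u0430y\u043b\u0430"))
```

A non-empty result on a letter-for-letter transcription suggests a name containing letters outside the LFN alphabet, where a contributor may have preferred a phonetic spelling.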

@jiru
Member

jiru commented Oct 6, 2020

Thanks for clarifying, @conlangbecca.

On a side note, I am mentioning #770 (and #76) because we have a number of LFN sentences that will become duplicates once this issue is solved, as they differ only in script.
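
As a rough illustration of how such script-only duplicates could be found, the sketch below maps every sentence to the Latin orthography and groups the ones that become identical. The letter pairs are my reading of the spelling table linked in the issue, the function name and the sentence IDs are hypothetical, and this is not how #770 or #76 propose to handle duplicates.

```python
from collections import defaultdict

# Assumed LFN letter pairs; verify against the linked spelling table.
LATIN = "abcdefghijlmnoprstuvxz"
CYRILLIC = "абкдефгхижлмнопрстувшз"
_TO_LATIN = str.maketrans(CYRILLIC + CYRILLIC.upper(), LATIN + LATIN.upper())


def script_only_duplicates(sentences: dict[int, str]) -> list[list[int]]:
    """Group sentence IDs whose text is identical once mapped to Latin."""
    groups: dict[str, list[int]] = defaultdict(list)
    for sentence_id, text in sentences.items():
        groups[text.translate(_TO_LATIN)].append(sentence_id)
    # Only groups with more than one member are duplicates.
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    sample = {  # hypothetical IDs, not real Tatoeba sentence numbers
        1: "Layla es un traduor.",
        2: "Лаyла ес ун традуор.",
        3: "Me es triste.",
    }
    print(script_only_duplicates(sample))
```

Pairs that also differ in punctuation or capitalisation, like sentences 5459786 and 5441679 in the table above, would not be grouped by this comparison.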
