A web app to help with the pronunciation of Turkish words and phrases
Website: irun.fyi
- Install dependencies :
yarn install
- Lint source code :
yarn lint
- Preprocess data :
yarn preprocess
- Start development server :
yarn dev
- Build and generate the app page to the /out directory :
yarn export
- Serve the generated page in the /out directory :
yarn serve
- Words that do not exist in a standard English dictionary are filtered out of CMUdict.
generate-filtered-dict.js
- From the filtered CMUdict entries, a reverse mapping (from one pronunciation to possibly multiple words) is generated.
generate-reverse-multimap.js
- The raw English word frequency data file is parsed.
generate-frequency-map.js
- For each pronunciation, only the most frequently used word is kept; words with the same pronunciation but a lower usage frequency are eliminated from the reverse mapping.
generate-reverse-map.js
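The reverse-mapping steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual scripts: the function names, the in-memory Map shapes, and the sample frequency counts are all assumptions made for the example.

```javascript
// Hypothetical sketch of the reverse-map preprocessing.
// Function names and data shapes are assumptions, not the project's code.

// Group words by pronunciation, e.g. "B EH1 R" -> ["bare", "bear"].
function buildReverseMultimap(entries) {
  const multimap = new Map();
  for (const [word, pronunciation] of entries) {
    if (!multimap.has(pronunciation)) multimap.set(pronunciation, []);
    multimap.get(pronunciation).push(word);
  }
  return multimap;
}

// Keep only the most frequent word for each pronunciation.
// Words missing from the frequency map are treated as frequency 0.
function buildReverseMap(multimap, frequencies) {
  const reverseMap = new Map();
  for (const [pronunciation, words] of multimap) {
    let best = words[0];
    for (const word of words) {
      if ((frequencies.get(word) ?? 0) > (frequencies.get(best) ?? 0)) best = word;
    }
    reverseMap.set(pronunciation, best);
  }
  return reverseMap;
}

// Illustrative input only; the frequency counts are made up.
const sampleEntries = [
  ['bare', 'B EH1 R'],
  ['bear', 'B EH1 R'],
  ['odd', 'AA1 D'],
];
const sampleFrequencies = new Map([['bare', 100], ['bear', 200], ['odd', 50]]);

const sampleReverseMap = buildReverseMap(
  buildReverseMultimap(sampleEntries),
  sampleFrequencies
);
console.log(Object.fromEntries(sampleReverseMap));
```

With this sample input, the pronunciation B EH1 R resolves to "bear" because it has the higher (made-up) frequency count, and "bare" is dropped.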
- All possible syllable combinations are generated from the input Turkish word.
hyphenate-all.js
- The letters in each syllable are transcribed using their alternatives in the CMUdict phonetic alphabet.
phonetic-map.json
- Each resulting phoneme sequence is looked up in the reverse mapping file.
reverse-map.json
- If no match is found for a syllable, a simple letter-by-letter translation is applied to it instead.
letter-pronunciation-map.json
- The results are sorted, prioritizing:
- the ones with the most English word matches
- the ones that fit natural Turkish hyphenation
- The first 10 of the best results are returned.
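The sorting and truncation steps could look something like the sketch below. The candidate shape (text, matchCount, isNaturalHyphenation) and the function name are hypothetical, and treating the second criterion as a tie-breaker for the first is an assumption; the placeholder texts are not real results.

```javascript
// Hypothetical sketch of the ranking step; not the project's actual code.
// Assumed candidate shape: { text, matchCount, isNaturalHyphenation }.
function rankCandidates(candidates, limit = 10) {
  return [...candidates] // copy so the caller's array is not mutated
    .sort(
      (a, b) =>
        // 1. More English word matches first.
        b.matchCount - a.matchCount ||
        // 2. On a tie, prefer the natural Turkish hyphenation.
        Number(b.isNaturalHyphenation) - Number(a.isNaturalHyphenation)
    )
    .slice(0, limit);
}

// Placeholder candidates for illustration only.
const sampleCandidates = [
  { text: 'x-y', matchCount: 1, isNaturalHyphenation: true },
  { text: 'p-q', matchCount: 2, isNaturalHyphenation: false },
  { text: 'r-s', matchCount: 2, isNaturalHyphenation: true },
];
console.log(rankCandidates(sampleCandidates).map((c) => c.text));
// r-s comes first, then p-q, then x-y
```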
- (1).
['bah', 'ad', 'ır'], ['ba', 'had', 'ır'], ['bah', 'a', 'dır'], ['ba', 'ha', 'dır']
- (2).
[[['B', 'AA', 'HH'], ['AA', 'D'], ['AH0', 'R']], ... (all combinations) ... ]
- (3, 4).
['baah-odd-er', 'bah-hud-er', 'baah-uh-derr', 'bah-huh-derr']
- (5).
['bah-hud-er', 'bah-huh-derr', 'baah-odd-er', 'baah-uh-derr']
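The syllable splits shown in (1) can be reproduced with a short sketch. It assumes that every syllable must contain exactly one vowel, so the consonants between two vowels may be cut at any position; this rule, the function name, and the lowercase-only input are assumptions and may differ from what hyphenate-all.js actually does. The input word "bahadır" is inferred by joining the syllables in (1).

```javascript
// Hypothetical sketch of generating all syllable splits; not the project's
// hyphenate-all.js. Assumes lowercase input and one vowel per syllable.
const TURKISH_VOWELS = new Set([...'aeıioöuü']);

function hyphenateAll(word) {
  const letters = [...word];
  const vowelIndexes = letters.flatMap((ch, i) =>
    TURKISH_VOWELS.has(ch) ? [i] : []
  );
  // Without at least two vowels there is nothing to split.
  if (vowelIndexes.length < 2) return [[word]];

  // Each entry is a list of cut positions; a cut at c falls before letters[c].
  let cutLists = [[]];
  for (let k = 0; k < vowelIndexes.length - 1; k++) {
    const left = vowelIndexes[k];
    const right = vowelIndexes[k + 1];
    const next = [];
    // The cut between two neighboring vowels may fall anywhere after the
    // left vowel and no later than the right vowel.
    for (let c = left + 1; c <= right; c++) {
      for (const cuts of cutLists) next.push([...cuts, c]);
    }
    cutLists = next;
  }

  // Turn each list of cut positions into a list of syllables.
  return cutLists.map((cuts) => {
    const bounds = [0, ...cuts, letters.length];
    return bounds
      .slice(0, -1)
      .map((start, i) => letters.slice(start, bounds[i + 1]).join(''));
  });
}

console.log(hyphenateAll('bahadır'));
```

For "bahadır" this yields the same four splits as in (1), though not necessarily in the same order.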
- The app consists of a single statically generated Next.js page with no back end.
- The reverse mapping file is loaded into the client app, so the algorithm runs in the browser.
- Pronunciation dictionary data source: Carnegie Mellon Pronouncing Dictionary
- Word frequency data source: English Word Frequency dataset on Kaggle
- Text-to-speech API: Voice RSS
- Icons: Freepik on Flaticon
- NPM packages: Next.js, React, Blueprint