- Add corpus in a folder named "tashkeela-raw" in the root folder
- Add corpus in a folder named "tashkeela-raw" in the root folder
- Add corpus in a folder named "kasct-raw" in the root folder
- Open both files and in VS Code
- Reopen with encoding "Arabic Windows(1256)"
- Save with encoding "utf-8"
- python .\process_corpus.py --corpus nawar
- python .\process_corpus.py --corpus kasct
- python .\process_corpus.py --corpus tashkeela
- python .\concat_corpora.py
- Use ara pronounciaiton tool to produce the dict file using
python corpus2cmudict.py -i output_with_last_haraka.txt -p haraka