Code-switching between Egyptian Arabic and English has become increasingly prevalent. This repository presents our work on machine translation (MT) and automatic speech recognition (ASR) systems designed specifically to handle this linguistic phenomenon.
Check out our demo to see ArzEn-LLM in action!
demo.mp4
Our primary objective is to translate code-switched Egyptian Arabic-English into either English or Egyptian Arabic, using large language models (LLMs) such as Llama and Gemma.
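As a rough illustration of the MT side, the sketch below prompts an instruction-tuned LLM through the Hugging Face `transformers` pipeline. The model ID (`google/gemma-2b-it`), the example sentence, and the prompt wording are illustrative assumptions rather than our exact setup; our fine-tuned checkpoints are linked below.

```python
# Minimal sketch, not our exact setup: translating a code-switched sentence
# with an off-the-shelf instruction-tuned LLM via Hugging Face transformers.
from transformers import pipeline

# Placeholder model -- substitute a fine-tuned checkpoint from our collection.
mt = pipeline("text-generation", model="google/gemma-2b-it")

src = "النهاردة عندنا meeting مهم"  # hypothetical code-switched input
prompt = (
    "Translate the following code-switched Egyptian Arabic-English "
    f"sentence into English:\n{src}\n"
)
out = mt(prompt, max_new_tokens=64, return_full_text=False)
print(out[0]["generated_text"])
```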
For ASR, we leverage the Whisper model for code-switched Egyptian Arabic recognition (a usage sketch follows the list). Our experiments cover:
- Data preprocessing techniques
- Advanced training methodologies
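A minimal transcription sketch, assuming a Whisper checkpoint loadable through the same `transformers` pipeline; the stock `openai/whisper-large-v2` model and the file name stand in for our fine-tuned checkpoint and your audio:

```python
# Minimal sketch: long-form transcription with a Whisper checkpoint.
from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v2",  # stand-in for our fine-tuned checkpoint
    chunk_length_s=30,  # chunk long recordings into 30-second windows
)

print(asr("recording.wav")["text"])  # hypothetical audio file
```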
We've implemented a consecutive speech-to-text translation system that chains ASR with MT, addressing the challenges posed by limited resources and the distinctive characteristics of the Egyptian Arabic dialect.
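A minimal sketch of this consecutive pipeline, chaining the two components above; the model IDs, file name, and prompt template remain illustrative assumptions:

```python
# Minimal sketch of the consecutive (ASR -> MT) pipeline.
from transformers import pipeline

asr = pipeline("automatic-speech-recognition", model="openai/whisper-large-v2")
mt = pipeline("text-generation", model="google/gemma-2b-it")

# Step 1: transcribe the code-switched Egyptian Arabic-English speech.
transcript = asr("recording.wav")["text"]

# Step 2: translate the transcript with the LLM.
prompt = (
    "Translate the following code-switched Egyptian Arabic-English "
    f"sentence into English:\n{transcript}\n"
)
translation = mt(prompt, max_new_tokens=128, return_full_text=False)
print(translation[0]["generated_text"])
```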
Our evaluation against standard metrics demonstrates promising results:
- English Translation: X% improvement over the state of the art
- Arabic Translation: Y% improvement over the state of the art
Code-switching is pervasive in spoken language, so it is crucial that ASR systems handle it effectively. This capability enables seamless interaction across various domains, including:
- Business negotiations
- Cultural exchanges
- Academic discourse
We're committed to advancing research in this field. Our models and code are available as open-source resources:
- 🤗 Models: Hugging Face Collection
- 🗣️ Speech Dataset: ARZEN-LLM Speech Dataset
- 🔤 Translation Dataset: ARZEN-LLM Translation Dataset
- 📄 Research Paper: arXiv:2406.18120
Feel free to explore, contribute, and build upon our work!
@article{heakl2024arzen,
  title={ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs},
  author={Heakl, Ahmed and Zaghloul, Youssef and Ali, Mennatullah and Hossam, Rania and Gomaa, Walid},
  journal={arXiv preprint arXiv:2406.18120},
  year={2024}
}