ArmSpeechTT

ArmSpeechTT is an AI model designed for speech-to-text conversion specifically tailored for the Armenian language. Leveraging the power of fine-tuning, this model, named whisper-small-hy-AM, is based on openai/whisper-small and trained on the common_voice_16_1 dataset.

Model Performance

The model demonstrates robust performance, as indicated by the following metrics on the evaluation set:

Loss: 0.2853
Word Error Rate (WER): 38.1160

Training Data and Future Enhancements

The training data consists of Mozilla Common Voice version 16.1. Plans for future improvements include continuing the training process and integrating an additional 10 hours of data from datasets such as google/fleurs and possibly google/xtreme_s. Despite its current performance, efforts are underway to further reduce the WER.

Training Details

The model was trained on Google Colab with the following hyperparameters:

Learning Rate: 1e-05
Train Batch Size: 16
Evaluation Batch Size: 8
Seed: 42
Optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
Learning Rate Scheduler Type: Linear
Learning Rate Scheduler Warmup Steps: 500
Training Steps: 4000
Mixed Precision Training: Native AMP

Training Results

Training Loss	Epoch	Step	Validation Loss	WER
0.0989	2.48	1000	0.1948	41.5758
0.03	4.95	2000	0.2165	39.1251
0.0016	7.43	3000	0.2659	38.4089
0.0005	9.9	4000	0.2853	38.1160

Framework Versions

Transformers 4.37.2
Pytorch 2.1.0+cu121
Datasets 2.16.1
Tokenizers 0.15.1

Running the Webcam Demo

To run the webcam demo, follow these steps:

Ensure you have Python installed on your system.
Clone the Repository: Clone the ArmSpeechTT repository to your local machine using the following command and then cd into ArmSpeechTT:
```
git clone https://github.com/Moses2917/ArmSpeechTT.git
cd ArmSpeechTT
```
Install the required dependencies listed in reqs.txt using the command:
```
pip install -r reqs.txt
```
Run the webcamcaption.py script by executing the following command:
```
python webcamcaption.py
```
The script will automatically download the model and then the webcam demo will start, capturing audio input and transcribing it into text in real-time and displaying the captions on the camera.

Feel free to explore and experiment with the webcam demo to experience the capabilities of the ArmSpeechTT model firsthand!

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
LICENSE		LICENSE
README.md		README.md
reqs.txt		reqs.txt
webcamcaption.py		webcamcaption.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ArmSpeechTT

Model Performance

Training Data and Future Enhancements

Training Details

Training Results

Framework Versions

Running the Webcam Demo

About

Releases

Packages

Languages

License

Moses2917/ArmSpeechTT

Folders and files

Latest commit

History

Repository files navigation

ArmSpeechTT

Model Performance

Training Data and Future Enhancements

Training Details

Training Results

Framework Versions

Running the Webcam Demo

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages