This repository contains code for fine-tuning the Whisper speech-to-text model. It utilizes Weights & Biases (wandb) for logging metrics and storing models. Key features include:
- Timestamp training
- Prompt training
- Stochastic depth implementation for improved model generalization
- Correct implementation of SpecAugment for robust audio data augmentation (see the sketch after this list)
- Checkpointing functionality to save and resume training progress, crucial for handling long-running experiments and potential interruptions
- Integration with Weights & Biases (wandb) for experiment tracking and model versioning
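
To illustrate the SpecAugment-style masking listed above, here is a minimal, self-contained sketch. It is not the repository's implementation; the function name and default mask sizes are illustrative placeholders only:

```python
# Illustrative sketch of SpecAugment-style masking (not the repository's code).
# Applies frequency and time masks to a log-mel spectrogram of shape (n_mels, n_frames).
import torch

def spec_augment(mel: torch.Tensor, freq_mask: int = 27, time_mask: int = 100,
                 n_freq_masks: int = 2, n_time_masks: int = 2) -> torch.Tensor:
    mel = mel.clone()
    n_mels, n_frames = mel.shape
    for _ in range(n_freq_masks):
        f = torch.randint(0, freq_mask + 1, (1,)).item()
        f0 = torch.randint(0, max(1, n_mels - f), (1,)).item()
        mel[f0:f0 + f, :] = 0.0  # zero out a band of mel channels
    for _ in range(n_time_masks):
        t = torch.randint(0, time_mask + 1, (1,)).item()
        t0 = torch.randint(0, max(1, n_frames - t), (1,)).item()
        mel[:, t0:t0 + t] = 0.0  # zero out a span of time frames
    return mel
```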
- Clone the repository: `git clone https://github.com/i4ds/whisper-finetune.git`, then `cd whisper-finetune`
- Create and activate a virtual environment (strongly recommended) with Python 3.9.* and a Rust compiler available.
- Install the package in editable mode: `pip install -e .`
To prepare your data, please have a look at https://github.com/i4Ds/whisper-prep. The data is passed to the fine-tuning script as a 🤗 Dataset.
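
As a rough sketch (assuming the prepared dataset was saved to disk with 🤗 Datasets; the path and column names below are placeholders), loading and inspecting it might look like:

```python
# Hypothetical example: load a dataset prepared with whisper-prep and saved to disk.
from datasets import load_from_disk

dataset = load_from_disk("data/prepared_dataset")  # placeholder path
print(dataset)  # inspect splits/columns (e.g. audio, text) before fine-tuning
```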
- Create a configuration file (see examples in `configs/*.yaml`)
- Run the fine-tuning script: `python src/whisper_finetune/scripts/finetune.py --config configs/large-cv-srg-sg-corpus.yaml`
For inference, we suggest using faster-whisper. To convert your fine-tuned model, you can use the script located at `src/whisper_finetune/scripts/convert_c2t.py`. Further quality improvements can be achieved by serving requests with whisperx.
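
A minimal usage sketch with faster-whisper after conversion (the model path, device settings, and audio file are placeholders, not values defined by this repository):

```python
# Hypothetical example: transcribe with a converted model using faster-whisper.
from faster_whisper import WhisperModel

model = WhisperModel("path/to/converted-model", device="cuda", compute_type="float16")
segments, info = model.transcribe("audio.wav")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```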
Modify the YAML files in the `configs/` directory to customize your fine-tuning process. Refer to the existing configuration files for examples of available options.
The starting point of this repository was the excellent repository by Jumon at https://github.com/jumon/whisper-finetuning.
We welcome contributions! Please feel free to submit a Pull Request.
If you encounter any problems, please file an issue along with a detailed description.
- Vincenzo Timmel ([email protected])
- Claudio Paonessa ([email protected])
This project is licensed under the MIT License - see the LICENSE file for details.