All notable changes to this project will be documented in this file.
- Merged feature from @Eidenz to add translation in addition to transcription
Since there was some appetite for this, I've rewritten the project to make it a tad cleaner, with a few additional features based on issues raised and personal preferences.
- Ability to download entire YouTube playlists and upload multiple files at once
- Ability to browse, filter, and search through saved audio files (for now, this is done with a simple SQLite database and SQLAlchemy ORM)
- Auto-export of transcriptions in multiple formats (was a feature request)
- Simple substring-based search for transcript segments. This is done with a simple `LIKE` query on the SQLite database.
- Fully reworked UI with a cleaner layout and more intuitive navigation.
- Ability to save Whisper configurations and reuse them, so the same parameters don't have to be re-entered every time.
- Removed the ability to crop audio after download to simplify the codebase. Also, temporarily removed summarization until GPT-3 integration is complete.
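The substring search above amounts to a parameterized `LIKE` query. Below is a minimal sketch of that idea, assuming a hypothetical `segments` table with `start_s`, `end_s`, and `text` columns. The project itself goes through SQLAlchemy models, which are not shown here; this uses Python's built-in `sqlite3` module instead, and `create_demo_db`, `escape_like`, and `search_segments` are illustrative names only.

```python
import sqlite3

# Hypothetical schema for illustration only; the real project uses SQLAlchemy ORM models.
def create_demo_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE segments (id INTEGER PRIMARY KEY, start_s REAL, end_s REAL, text TEXT)"
    )
    conn.executemany(
        "INSERT INTO segments (start_s, end_s, text) VALUES (?, ?, ?)",
        [
            (0.0, 2.5, "Welcome to the podcast"),
            (2.5, 6.0, "Today we talk about Whisper"),
            (6.0, 9.0, "and 100% accurate transcripts"),
        ],
    )
    return conn


def escape_like(term: str, escape: str = "\\") -> str:
    # Escape the escape character first, then the LIKE wildcards % and _,
    # so user input is matched literally instead of as a pattern.
    return (
        term.replace(escape, escape * 2)
        .replace("%", escape + "%")
        .replace("_", escape + "_")
    )


def search_segments(conn: sqlite3.Connection, query: str) -> list:
    # Wrap the escaped term in % wildcards for a "contains" (substring) match.
    pattern = f"%{escape_like(query)}%"
    cur = conn.execute(
        "SELECT start_s, end_s, text FROM segments "
        "WHERE text LIKE ? ESCAPE '\\' ORDER BY start_s",
        (pattern,),
    )
    return cur.fetchall()


if __name__ == "__main__":
    conn = create_demo_db()
    print(search_segments(conn, "whisper"))
```

Escaping `%` and `_` keeps user input from acting as wildcards. Note that SQLite's `LIKE` is case-insensitive only for ASCII characters by default, and a pattern with a leading `%` cannot use an index, so every row is scanned. That is fine for a small local library but is one reason semantic search is listed as planned.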
Initial release for demand testing (PR #1).
Features:
- Ability to process media from YouTube and local files
- Whisper transcription
- Basic Hugging Face integration for summarization
[Planned]
- Live transcription with Whisper - will use the streamlit-webrtc library. This enables live transcription of audio from a microphone and can be used to take voice notes.
- CLIP embeddings of transcribed text segments + a Faiss index for semantic search
- GPT-3 integration - one approach is to simply allow an instruct prompt to be entered for a transcript and save the results. Will await feedback before implementing.
- ...