A minimalistic application to generate transcriptions for audio built using Python
v.0.0.1
v.0.0.2 (Transcribing a Youtube Video Explaining Whisper)
v.0.0.2 (Transcribing an English Song - Thinkin About It)
v.0.0.3 (Transcribing a clip from Lex Fridman's podcast)
v.0.0.4 (Transcribing another clip from Lex Fridman's podcast)
flowchart LR
U([Client])
I{Choose\n Input Mode}
U -----> I
I1[YouTube Video URL]
I2[Upload Video File]
I3[Upload Audio File]
I ---> I1 & I2 & I3
YTC{"Check if\n Audio is available?"}
YTA("Download audio\n from YouTube")
YTV("Download video\n from YouTube")
I1 ---> YTC
YTC --yes---> YTA
YTC --no---> YTV
VTA["Convert Video to Audio"]
YTV ---> VTA
I2 ---> VTA
LA["Load Audio File"]
YTA & VTA & I3---> LA
M{"Choose\n Model Type"}
U -----> M
M1[(Ramanujan)]
M2[(Bose)]
M3[(Raman)]
M4[(Kalam)]
M ---> M1 & M2 & M3 & M4
LM[Load Relevant Whisper Model]
M1 & M2 & M3 & M4 --> LM
GT("Generate Transcripts")
LA & LM ---> GT
O1(["Detected \n Language"])
O2(["Complete \nSubtitle Text"])
O3(["Subtitles \nwith Timestamps"])
GT ---> O1 & O2 & O3
OF(["Original\n Audio or Video"])
D{{"Display to Client"}}
I ---> OF
O1 & O2 & OF ---> D
DO{"Choose\n Output Option"}
D1["SRT\n File"]
D2["VTT\n File"]
D3["Text\n File"]
DP["Process Subtitle Object"]
DN{{"Download Button"}}
O3 ---> DP
U ---> DO
DO ---> D1 & D2 & D3 ---> DP ---> DN
subgraph Result
D
DN
end
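The "Process Subtitle Object" step in the flowchart can be illustrated with a minimal sketch. It assumes the transcription result exposes a list of segments, each a dict with `start` and `end` times in seconds and a `text` string; the function names here are hypothetical and are not taken from the application's code.

```python
def format_timestamp(seconds, decimal_marker=","):
    """Format seconds as HH:MM:SS,mmm (SRT) or HH:MM:SS.mmm (VTT)."""
    total_ms = round(seconds * 1000)
    hours, total_ms = divmod(total_ms, 3_600_000)
    minutes, total_ms = divmod(total_ms, 60_000)
    secs, millis = divmod(total_ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{decimal_marker}{millis:03d}"


def segments_to_srt(segments):
    """Build SRT text: numbered cues separated by blank lines."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)


def segments_to_vtt(segments):
    """Build WebVTT text: a WEBVTT header followed by unnumbered cues."""
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"], ".")
        end = format_timestamp(seg["end"], ".")
        lines.append(f"{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(lines)
```

The plain text option would simply join the `text` field of each segment, so only the SRT and VTT variants need timestamp handling.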
(Preferred Pipeline Using GitHub Actions for Docker Image)
-
Open your terminal / command prompt.
-
Clone the repository
git clone https://github.com/smaranjitghose/AIAudioTranscriber.git
-
Change the directory to the cloned project
cd AIAudioTranscriber
-
Ensure you have a version of Python below 3.10 installed on your system and that the virtualenv package is installed
which python
pip install virtualenv
-
Create a new virtual environment
python -m venv env
-
Activate the virtual environment
- On Mac/Linux
source env/bin/activate
- On Windows
env/Scripts/Activate.ps1
-
Install ffmpeg on your local system
- On Windows using Chocolatey
choco install ffmpeg
- On MacOS using Homebrew
brew install ffmpeg
- On Debian/Ubuntu
sudo apt update && sudo apt install ffmpeg
- On Arch Linux
sudo pacman -S ffmpeg
-
Install the dependencies
pip install -r requirements.txt
-
Download the model weights (this will take a few minutes since the total size of the models is in gigabytes)
python get_model_weights.py
-
Run the Web application
streamlit run Home.py
Note:
- If the app does not load by itself in your default browser, open a browser of your choice and navigate to
http://localhost:8501
- To stop the application, press
CTRL + C
in your terminal
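Before running the app, the two prerequisites from the steps above (a Python version below 3.10 and ffmpeg on the PATH) can be checked with a small script. This is a minimal sketch; the `preflight` function is hypothetical and not part of the repository.

```python
import shutil
import sys


def preflight(version_info=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    if version_info is None:
        version_info = sys.version_info
    problems = []
    # The setup steps ask for a Python version below 3.10.
    if tuple(version_info[:2]) >= (3, 10):
        problems.append("Python 3.10 or newer detected; a version below 3.10 is expected.")
    # ffmpeg must be discoverable on PATH.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH.")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print(problem)
```

The `version_info` and `which` parameters exist only so the check can be exercised without changing the real environment.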
- Make sure you have Docker installed on your system. Refer to the Docker documentation if you need assistance setting it up.
- Build a docker image
docker build -t aitranscriber:v0.0.4 .
Note:
- You may give any name instead of aitranscriber and any tag instead of v0.0.4
- Depending on your system it takes a few minutes to successfully build the image
- Once complete, check the docker image
docker images
- Create and run a Docker Container for the image
docker run -p 8501:8501 aitranscriber:v0.0.4
Note:
docker run -p <host_port>:8501 <image_name>:<tag_name>
- In the above command, you can play around with which port of your host system you wish to map to the 8501 port of the container
- If you used a different docker image name and/or different tag, make sure to update it in the command
- Open your preferred Web Browser and navigate to
http://localhost:8501
Note:
- If you used a different host port in the above command then navigate to that one,
http://localhost:<host_port>
- To stop the container, check the container name in the terminal:
docker ps --all
- Now use container name with the command:
docker stop <container_name>
-
Google Cloud Run
- Install Google Cloud CLI
- Create an Account on Google Cloud
- Create a New Project
- Build and Push Docker Image to Google Container Registry
gcloud builds submit --tag gcr.io/<ProjectName>/<AppName> --project=<ProjectName>
- Deploy the Docker Container
gcloud run deploy --image gcr.io/<ProjectName>/<AppName> --platform managed --project=<ProjectName> --allow-unauthenticated
-
Amazon EC2 Instance
-
Azure App
(Using Google Colab/Kaggle as temporary MVP server)
-
-
Step 1: Install pyngrok in Google Colab
! pip install pyngrok
-
Step 2: Sign-up in ngrok and get Authentication Token
-
Step 3: Authenticate
from pyngrok import ngrok
ngrok.set_auth_token("xxx")
-
Step 4: Load the Streamlit App at port 8051, create a tunnel for it and reveal the public URL for the tunnel
!nohup streamlit run app.py --server.port 8051 &
url = ngrok.connect(8051).public_url
print(url)
-
Step 5: Share URL with client
-
-
-
Step 1: Install localtunnel
npm install -g localtunnel
-
Step 2
streamlit run Home.py & npx localtunnel --port 8501
-
Step 3: Share URL with client
-
(Using local server as temporary MVP server)
- NGINX + Cloudflare/ngrok
-
Download and use audio from Youtube Video
-
Download and use online audio file
-
Use Session States and Caching for Better UX
-
Display the detected language properly (without using the shortcode)
-
Generate Dedicated SRT,VTT files for transcripts (in addition to txt)
-
Update Model options to honour the name of prominent Indian Scientists
-
Option to limit/increase input model file size
-
Functionality to check the validity of the URL provided for a YouTube video
-
Add Custom Favicon File
-
Add Scrollable Text Area for Generated Transcripts
-
Containerize the Application with Docker
-
Troubleshoot Docker Container locally
-
Create Basic Workflow on GitHub Actions for Docker Image Build
-
Create Comprehensive Workflow on GitHub Actions for Docker Image Build
-
Resolve bug: a YouTube video with multiple audio tracks should download the default audio.
- Example: This clip from Huberman Lab is in English, yet the script fetches the Spanish audio track from YouTube
-
Test the application by spinning up its container on Google Cloud Run
- Push to a particular Docker Image Registry
- Set TTL
- Play around with system resources
- Test with custom domain
-
Add Google Cloud's CI/CD to repo on push/pull requests
- Use cloudbuild.yaml file
- Update build time to 2 hours
-
Optimize Docker Image Size
-
Better CI/CD
-
Kubernetes Upgrade
-
Better GitHub Actions
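For the YouTube URL validity check listed above, a purely syntactic check could look like the sketch below. It assumes that video IDs are 11 characters drawn from letters, digits, `_` and `-`, and it only covers `watch` and `youtu.be` style links; it does not confirm that the video exists or can be downloaded. The function name is hypothetical.

```python
import re
from urllib.parse import parse_qs, urlparse

YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com"}
# Assumption: video IDs are 11 URL-safe characters.
VIDEO_ID = re.compile(r"[A-Za-z0-9_-]{11}")


def extract_video_id(url):
    """Return the video ID if the URL looks like a YouTube link, else None."""
    parsed = urlparse(url.strip())
    if parsed.scheme not in ("http", "https"):
        return None
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path.
        candidate = parsed.path.lstrip("/")
    elif host in YOUTUBE_HOSTS and parsed.path == "/watch":
        # Regular links carry the ID in the "v" query parameter.
        candidate = parse_qs(parsed.query).get("v", [""])[0]
    else:
        return None
    return candidate if VIDEO_ID.fullmatch(candidate) else None
```

A `None` result can be used to show an error message before any download is attempted.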
More Features:
-
Burn transcripts to user-uploaded video
```python
import os

# input_video and subtitle hold the paths of the uploaded video and the generated subtitle file
output_video = "final.mp4"
os.system(f"ffmpeg -i {input_video} -vf subtitles={subtitle} {output_video}")
```
-
Summarize subtitles
-
Sentiment analysis on video summary
-
Batch transcript generation + summary + sentiment analysis
-
Dashboard for video review(s)
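For the subtitle-burning item above, one possible variant passes the ffmpeg arguments as a list through `subprocess` instead of building a shell string, which avoids shell quoting problems with file names. This is a sketch, not the project's implementation: the helper names are hypothetical, and it assumes the subtitle path contains no characters that need escaping inside the ffmpeg filter expression.

```python
import subprocess


def build_burn_command(input_video, subtitle_file, output_video="final.mp4"):
    """Return the ffmpeg argument list that burns subtitles into a video."""
    return [
        "ffmpeg",
        "-i", input_video,
        "-vf", f"subtitles={subtitle_file}",
        output_video,
    ]


def burn_subtitles(input_video, subtitle_file, output_video="final.mp4"):
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    subprocess.run(
        build_burn_command(input_video, subtitle_file, output_video),
        check=True,
    )
    return output_video
```

Keeping the command construction separate from its execution makes it possible to inspect the arguments without ffmpeg being installed.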
Speaker Diarization: Only if Community requires
- Incorporate Speaker Diarization for Podcast/Vlog/Conversational Clips
- Test it with burning transcripts to user uploaded video
- Test it with transcript summarization
More Aligned Subtitles: Only if Community requires
-
Word Level Timestamps for transcripts + Generate ASS Transcript File
-
Test it with burning transcripts to user uploaded video
-
Test it with previous speaker diarization
-
Test it with transcript summarization
-
Improve UI Natively in Streamlit
API Development: Only if Community requires
- Build API for model inference in FastAPI to handle requests asynchronously (on a different branch perhaps)
- Containerize the API with Docker
- Troubleshoot Docker Container for API
- Host the API on Google/AWS/Linode/Heroku
- Perform basic CI/CD for API
- Rehost Streamlit Application on a different service (Reduce it to client side for most operations)
- Play around with pyScript
Front End Development: Only if Community requires
- Build Basic React Front end
- Connect React Front End to FastAPI
- Add Loader Animation
- Add Animations for model inference times
- Handling Errors in Front End/API
- Upload File Component
- Download Button(s)
- Feedback Form
- Contact Page
- About Page
- Home Page
- Stripe Integration
- Improve Navbar UI
- 404 Page
- Footer UI
- Scrollbar UI
- SEO
CI/CD Pipeline (GitHub Actions)
- SAST (Optional)
- Kubernetes Smoke Test (Optional)
- Using Super Linter for Linting (Optional)
- Unit Tests (Optional)
- Integration Test (Optional)
-
To view the generated transcript file(s) in VS Code IDE install Subtitles Editor extension
-
To extensively edit/manipulate the generated transcript file(s) use the open source tool Subtitle Edit
-
For Streamlit Sharing, mentioning versions of the modules in requirements throws error at times
-
The Large v2 model outperforms all other versions of Whisper, especially in multilingual transcription. However, it takes about 10 times more VRAM than the base model and has a longer inference time
-
To quickly record audio from system microphone use this Python Script:
-
Prerequisites (the wave module ships with the Python standard library, so only pyaudio needs installing):
pip install pyaudio
-
-
Whisper is unable to read an audio file from disk if the python-ffmpeg or ffmpeg Python packages are installed. It only works when the ffmpeg-python package is installed and the other two are not.
# Remove all ffmpeg related python packages
pip uninstall python-ffmpeg ffmpeg ffmpeg-python
# Install the appropriate package for ffmpeg
pip install ffmpeg-python
-
Pixabay has a great collection of copyright free, no royalty songs that one can use for testing the application
-
Poor performance for Kannada or Telugu songs with the base model (often language recognition itself fails). Example: Kantara movie's Varaha Roopam Song
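The ffmpeg package conflict noted above (python-ffmpeg or ffmpeg installed alongside or instead of ffmpeg-python) can be detected from inside Python. The sketch below uses `importlib.metadata` from the standard library; the function names are hypothetical, and the package names come from the note itself.

```python
from importlib import metadata

CONFLICTING = {"python-ffmpeg", "ffmpeg"}
REQUIRED = "ffmpeg-python"


def check_ffmpeg_packages(installed_names):
    """Report conflicting ffmpeg wrappers and whether ffmpeg-python is present."""
    # Normalize case and underscores so "FFmpeg_Python" matches "ffmpeg-python".
    names = {name.lower().replace("_", "-") for name in installed_names}
    return {
        "conflicts": sorted(names & CONFLICTING),
        "has_required": REQUIRED in names,
    }


def installed_distribution_names():
    """List the names of all distributions visible to this interpreter."""
    return [
        dist.metadata["Name"]
        for dist in metadata.distributions()
        if dist.metadata["Name"]
    ]


if __name__ == "__main__":
    print(check_ffmpeg_packages(installed_distribution_names()))
```

An empty `conflicts` list together with `has_required` set to `True` corresponds to the working setup described in the note.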
-
Exclude as many irrelevant files as possible with .dockerignore, such as README.MD, LICENSE, snapshots, notebooks, input, output, logs, etc.
-
Minimize the number of layers (Created by RUN, COPY and ADD)
-
Always combine RUN apt-get update with apt-get install in the same RUN statement. Using apt-get update alone in a RUN statement causes caching issues, and subsequent apt-get install instructions fail.
-
Using RUN apt-get update && apt-get install -y ensures your Dockerfile installs the latest package versions with no further coding or manual intervention. This technique is known as "cache busting".
-
In addition, cleaning up the apt cache by removing /var/lib/apt/lists reduces the image size, since the apt cache is not stored in a layer.
-
Python Docker Image Info:
- Images tagged with stretch/buster/jessie/bullseye are named after the codenames of different Debian production releases, bullseye being version 11, buster being version 10, and so on. (2022) bookworm, trixie and forky are work-in-progress releases which may not be stable yet.
- -slim only installs the minimal packages needed to run the particular tool.
-
Base image with Python <= 3.9 raises an issue with the module backports.zoneinfo and pip fails
-
To build and test multi-architecture docker images locally,
- Create a new buildx instance
docker buildx create --use
- Build a new docker image for multi-architecture support
docker buildx build --platform linux/arm64,linux/amd64 -t aitranscriber:multi-architecture -f Dockerfile .
-
Checking the Docker image build for multi-architecture support is too time consuming for the current application and has been disabled
This project is licensed under the GNU Affero General Public License v3.0 - see the LICENSE file for details.
- General Purpose Speech Recognition Model: OpenAI Whisper
- Animations: LottieFiles
- Favicon: PNG Repo
- Sample Test Clip 1: "Thinkin About It" by Niklas Setzkorn from Pixabay