uttertype (demo)
Installing portaudio on macOS can be tricky, especially on Apple silicon (M1 and later) chips. In general, conda seems to be the safest way to install it:
conda install portaudio
python -m pip install pyaudio
On Debian/Ubuntu Linux, PyAudio can be installed with apt:
sudo apt-get install python3-pyaudio
On macOS, the hotkey defaults to the globe key (🌐, bottom-left key). On Windows and Linux, you can configure the hotkey by setting the UTTERTYPE_RECORD_HOTKEYS environment variable in .env:
UTTERTYPE_RECORD_HOTKEYS="<ctrl>+<alt>+v"
For more context, view the pynput documentation for using HotKeys (HoldHotKey is extended from this class).
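As a rough illustration of the hotkey format above, the sketch below reads UTTERTYPE_RECORD_HOTKEYS and splits it into individual key tokens. It uses only the standard library and is not pynput's parser or uttertype's own code; the function names and the fallback value are hypothetical, chosen for this example only.

```python
import os

# Hypothetical fallback for this sketch only (the example value shown above).
FALLBACK_HOTKEY = "<ctrl>+<alt>+v"


def split_hotkey(spec: str) -> list[str]:
    """Split a pynput-style hotkey string into lowercase key tokens.

    Simplification: keys are assumed to be joined by "+", so a literal
    "+" key cannot be expressed here.
    """
    tokens = [part.strip().lower() for part in spec.split("+")]
    if not all(tokens):
        raise ValueError(f"empty key in hotkey spec: {spec!r}")
    return tokens


def configured_hotkey(env=os.environ) -> list[str]:
    """Read the hotkey from the environment, using the fallback if unset."""
    return split_hotkey(env.get("UTTERTYPE_RECORD_HOTKEYS", FALLBACK_HOTKEY))


if __name__ == "__main__":
    print(configured_hotkey())
```

Passing a plain dict as `env` makes the function easy to try without touching the real environment, for example `configured_hotkey({"UTTERTYPE_RECORD_HOTKEYS": "<cmd>+r"})`.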
Choose one of the following methods to install the required dependencies:
python -m pip install -r requirements.txt
First, install pipenv if you haven't already:
pip install pipenv
Then, install dependencies using pipenv:
pipenv install
This will create a virtual environment and install all dependencies from the Pipfile. To activate the environment:
pipenv shell
If, during or after installation on Linux, you see an error similar to:
ImportError: /home/soul/anaconda3/lib/libstdc++.so.6: version `GLIBCXX_3.4.32' not found (required by /lib/x86_64-linux-gnu/libjack.so.0)
Check out Stack Overflow and Berkeley for possible fixes.
You can configure uttertype to work with either OpenAI's official API or a local Whisper server. There are two ways to set this up:
Create a .env file in the project directory with these settings:
# 1. Required: Your API key
OPENAI_API_KEY="sk-your-key-here"
# 2. Optional: Choose your API endpoint
# For OpenAI's official API (default):
OPENAI_BASE_URL="https://api.openai.com/v1"
# OR for a local [Faster Whisper server](https://github.com/fedirz/faster-whisper-server)
# (uncomment this line and remove the one above; keep only one OPENAI_BASE_URL):
# OPENAI_BASE_URL="http://localhost:7000/v1"
# 3. Optional: Select your preferred model
# For OpenAI's official API:
OPENAI_MODEL_NAME="whisper-1"
# OR for a local Whisper server, some options include
# (uncomment exactly one and remove the line above):
# OPENAI_MODEL_NAME="Systran/faster-whisper-small"
# OPENAI_MODEL_NAME="Systran/faster-distil-whisper-large-v3"
# OPENAI_MODEL_NAME="deepdml/faster-whisper-large-v3-turbo-ct2"
You can also set these values directly in your terminal:
For Linux/macOS:
export OPENAI_API_KEY="sk-your-key-here"
export OPENAI_BASE_URL="https://api.openai.com/v1" # optional
export OPENAI_MODEL_NAME="whisper-1" # optional
For Windows (PowerShell):
$env:OPENAI_API_KEY = "sk-your-key-here"
$env:OPENAI_BASE_URL = "https://api.openai.com/v1" # optional
$env:OPENAI_MODEL_NAME = "whisper-1" # optional
See .sample_env in the repository for example configurations.
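To make the relationship between the three settings concrete, here is a minimal sketch of how a program could resolve them from the environment. It is not uttertype's actual loading code: the `TranscriptionSettings` class and `load_settings` function are hypothetical, and the default model name is an assumption based on the values shown above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionSettings:
    api_key: str
    base_url: str
    model_name: str


def load_settings(env=os.environ) -> TranscriptionSettings:
    """Resolve settings: the API key is required, the other two are optional."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return TranscriptionSettings(
        api_key=api_key,
        # Default endpoint: OpenAI's official API, as described above.
        base_url=env.get("OPENAI_BASE_URL", "https://api.openai.com/v1").rstrip("/"),
        # Assumed default model for the official API.
        model_name=env.get("OPENAI_MODEL_NAME", "whisper-1"),
    )
```

With only OPENAI_API_KEY set, this resolves to the official endpoint; setting OPENAI_BASE_URL and OPENAI_MODEL_NAME switches both to the local server values.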
For faster and cheaper transcription, you can set up a local faster-whisper-server. When using a local server:
- Set OPENAI_BASE_URL to your server's address (e.g., http://localhost:7000/v1)
- Choose from supported local models like:
  - Systran/faster-whisper-small (fastest)
  - Systran/faster-distil-whisper-large-v3 (most accurate)
  - deepdml/faster-whisper-large-v3-turbo-ct2 (almost as good, but faster)
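Before pointing uttertype at a local server, it can help to confirm that something is listening at the configured address. The sketch below is one possible check using only the standard library; the helper names are hypothetical, and it tests only that the TCP port accepts connections, not that the server speaks the expected API.

```python
import socket
from urllib.parse import urlsplit


def host_and_port(base_url: str) -> tuple[str, int]:
    """Extract host and port from a base URL such as http://localhost:7000/v1."""
    parts = urlsplit(base_url)
    if not parts.hostname:
        raise ValueError(f"not a full URL: {base_url!r}")
    # Fall back to the scheme's standard port when none is given.
    port = parts.port or (443 if parts.scheme == "https" else 80)
    return parts.hostname, port


def server_is_reachable(base_url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the server can be opened."""
    host, port = host_and_port(base_url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `server_is_reachable("http://localhost:7000/v1")` returns False if the local server has not been started yet.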
Finally, run main.py:
python main.py
OR
./start_uttertype.sh # requires an installed and configured pipenv environment
When the program first runs, you will likely need to grant it several permissions. On macOS, this includes adding your terminal under Privacy and Security > Accessibility, allowing it to monitor the keyboard, and allowing it to record from the microphone.
To start recording, press and hold the registered hotkey. To stop recording, release it. On macOS, the registered hotkey is the globe key by default. On other operating systems, it has to be configured manually by setting UTTERTYPE_RECORD_HOTKEYS, as described earlier.
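The press-and-hold behavior described above can be pictured as a small state machine: recording starts once every key in the combination is down and stops as soon as any of them is released. The sketch below illustrates that idea with plain strings as keys. It is not uttertype's HoldHotKey class or pynput code; `HoldTracker` and its callbacks are hypothetical names for this example.

```python
class HoldTracker:
    """Tracks whether every key of a hotkey combination is currently held."""

    def __init__(self, hotkey, on_start, on_stop):
        self._hotkey = frozenset(hotkey)
        if not self._hotkey:
            raise ValueError("hotkey must contain at least one key")
        self._down = set()
        self._recording = False
        self._on_start = on_start
        self._on_stop = on_stop

    def press(self, key):
        if key in self._hotkey:
            self._down.add(key)
        # Start once, even if the OS sends repeated press events while held.
        if not self._recording and self._down == self._hotkey:
            self._recording = True
            self._on_start()

    def release(self, key):
        self._down.discard(key)
        # Releasing any key of the combination ends the recording.
        if self._recording and self._down != self._hotkey:
            self._recording = False
            self._on_stop()


if __name__ == "__main__":
    tracker = HoldTracker(
        ["<ctrl>", "<alt>", "v"],
        on_start=lambda: print("recording started"),
        on_stop=lambda: print("recording stopped"),
    )
    for key in ("<ctrl>", "<alt>", "v"):
        tracker.press(key)
    tracker.release("v")
```

In a real program, `press` and `release` would be driven by a keyboard listener, and the callbacks would start and stop audio capture.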