🎭 Radio Drama Generator

This Radio Drama Generator is a proof of concept for using open-source models & tools to convert input story context into a radio drama featuring multiple speakers. It is designed to work on most local setups, meaning no external API calls or GPU access is required. This makes it more accessible and privacy-friendly by keeping everything local.

Built with

Python 3.10+ (use Python 3.12 for Apple M1/2/3 chips)
Llama-cpp (text-to-text, i.e script generation)
OuteAI / Parler_tts (text-to-speech, i.e audio generation)
Streamlit (UI demo)

Quick-start

Get started with Radio Drama Generator:

Local Installation**

Clone the Repository Inside the Codespaces terminal, run:

git clone https://github.com/stefanfrench/radio-drama-generator.git
cd radio-drama-generator

Install Dependencies Inside the terminal, run:
```
pip install -e .
```
Run the Demo Inside the terminal, start the Streamlit demo by running:
```
python -m streamlit run demo/app.py
```

NOTE: The first time you run the demo app it might take a while to generate the script or the audio because it will download the models to the machine which are a few GBs in size.

How it Works

Document Upload Start by uploading a document in a supported format (e.g., PDF, .txt, or .docx).
Document Pre-Processing The uploaded document is processed to extract and clean the text. This involves:
- Extracting readable text from the document.
- Removing noise such as URLs, email addresses, and special characters to ensure the text is clean and structured.
Script Generation The cleaned text is passed to a language model to generate a radio drama in the form of a conversation between multiple speakers
- Model Loading: The system selects and loads a pre-trained LLM optimized for running locally, using the llama_cpp library. This enables the model to run efficiently on CPUs, making them more accessible and suitable for local setups.
- Customizable Prompt: A user-defined "system prompt" guides the LLM in shaping the conversation, specifying tone, content, speaker interaction, and format.
- Output Transcript: The model generates a radio drama script in structured format, with each speaker's dialogue clearly labeled. Example output:
```
{"Speaker 1": "Bah, humbug! Why would I care for Christmas?",
 "Speaker 2": "If I may, sir, Christmas is about kindness, something we could all use more of.",
 "Speaker 3": "Uncle Scrooge, Christmas is a time for joy and goodwill!",
    ...
}
```
This step ensures that the radio drama script is engaging, relevant, and ready for audio conversion.
Audio Generation

The generated transcript is converted into audio using a Text-to-Speech (TTS) model.
Each speaker is assigned a distinct voice.
- The final output is saved as an audio file in formats like MP3 or WAV.

Models

The architecture of this codebase focuses on modularity and adaptability, meaning it shouldn't be too difficult to swap frameworks to use your own suite of models. We have selected fully open source models that are very memory efficient and can run on a laptop CPU with less than 10GB RAM requirements.

text-to-text

We are using the llama.cpp library, which supports open source models optimized for local inference and minimal hardware requirements. The default text-to-text model in this repo is the open source OLMoE-7B-Instruct from AllenAI.

For the complete list of models supported out-of-the-box, visit this link.

text-to-speech

We support models from the OuteAI and Parler_tts packages. The default text-to-speech model in this repo is OuteTTS-0.1-350M-GGUF. Note that the 0.1-350M version has a CC-By-4.0 (permissive) license, whereas the newer / better 0.2-500M version has a CC-By-NC-4.0 (non-commercial) license. For a complete list of models visit Oute HF (only the GGUF versions) and Parler HF.

Important note: In order to keep the package dependencies as lightweight as possible, only the Oute interface is installed by default. If you want to use the parler models, please also run:

pip install -e '.[parler]'

Pre-requisites

System requirements:
- OS: Windows, macOS, or Linux
- Python 3.10>, <3.12
- Minimum RAM: 10 GB
- Disk space: 32 GB minimum
Dependencies:
- Dependencies listed in pyproject.toml

Troubleshooting

During the installation of the package, it fails with ERROR: Failed building wheel for llama-cpp-python

You are probably missing the GNU Make package. A quick way to solve it is run on your terminal sudo apt install build-essential

License

This project is licensed under the Apache 2.0 License. See the LICENSE file for details.

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
.devcontainer		.devcontainer
.github		.github
demo		demo
docs		docs
example_data		example_data
images		images
src/document_to_podcast		src/document_to_podcast
tests		tests
.gitattributes		.gitattributes
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
mkdocs.yml		mkdocs.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🎭 Radio Drama Generator

Built with

Quick-start

Local Installation**

How it Works

Models

text-to-text

text-to-speech

Pre-requisites

Troubleshooting

License

About

Releases

Packages

Languages

License

stefanfrench/radio-drama-generator

Folders and files

Latest commit

History

Repository files navigation

🎭 Radio Drama Generator

Built with

Quick-start

Local Installation**

How it Works

Models

text-to-text

text-to-speech

Pre-requisites

Troubleshooting

License

About

Resources

License

Code of conduct

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages