Incredibly fun and powerful to string together a series of simple tools: parsing a podcast XML feed, downloading and re-encoding the audio files with ffmpeg, transcribing them with whisper-cpp, writing simple HTML by concatenating strings in Python, and scp'ing the result to a nearly-free speech host.
Browse at cristobal.nfshost.com/entitled-opinions
Download the Entitled Opinions podcast RSS feed and save it as opinions.xml:
curl https://entitled-opinions.com/feed/podcast > opinions.xml
1_parse_xml.py will extract the key information from the XML using the Python built-in xml.dom.minidom and save it as opinions.json.
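A minimal sketch of what this step might look like; the field names and file names here are assumptions, not necessarily the script's actual output format:

```python
# Hypothetical sketch of 1_parse_xml.py: pull each <item> out of the RSS feed
# and save the bits we care about (title, publication date, audio URL) as JSON.
import json
from xml.dom.minidom import parse

def text(item, tag):
    """Return the text content of the first child element with this tag, or ''."""
    nodes = item.getElementsByTagName(tag)
    return nodes[0].firstChild.nodeValue.strip() if nodes and nodes[0].firstChild else ""

dom = parse("opinions.xml")
episodes = []
for item in dom.getElementsByTagName("item"):
    enclosure = item.getElementsByTagName("enclosure")
    episodes.append({
        "title": text(item, "title"),
        "date": text(item, "pubDate"),
        "url": enclosure[0].getAttribute("url") if enclosure else "",
    })

with open("opinions.json", "w") as f:
    json.dump(episodes, f, indent=2)
```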
2_download_and_transcribe.py will read the saved JSON and download the audio file for each episode. It will then convert the .mp3 files into 16 kHz .wav files to be processed with whisper-cpp.
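A hedged sketch of the download-and-convert half, reusing the field names assumed above and assuming the episodes/&lt;unix timestamp&gt;/ layout shown further below:

```python
# Hypothetical fragment of 2_download_and_transcribe.py: fetch each episode
# and re-encode it as 16 kHz mono WAV, the format whisper-cpp expects.
import json
import pathlib
import subprocess
import urllib.request
from email.utils import parsedate_to_datetime

with open("opinions.json") as f:
    episodes = json.load(f)

for ep in episodes:
    # Use the recording's Unix timestamp as the episode's directory name.
    stamp = int(parsedate_to_datetime(ep["date"]).timestamp())
    folder = pathlib.Path("episodes") / str(stamp)
    folder.mkdir(parents=True, exist_ok=True)

    mp3 = folder / "audio.mp3"
    wav = folder / "audio.wav"
    if not mp3.exists():
        urllib.request.urlretrieve(ep["url"], str(mp3))
    if not wav.exists():
        # -ar 16000: 16 kHz sample rate, -ac 1: mono
        subprocess.run(
            ["ffmpeg", "-i", str(mp3), "-ar", "16000", "-ac", "1", str(wav)],
            check=True)
```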
whisper-cpp must be installed somewhere on the system; the path to the binary is hard-coded in the Python file above. The fastest and lowest-quality model was used, though higher quality can be obtained by changing the model size and being patient.
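The transcription call might look like the following; the binary name, its location, and the model path are all assumptions to adapt to your install:

```python
# Hypothetical transcription step: call the whisper-cpp binary directly.
# The paths below are assumptions; point them at wherever the binary and
# the ggml model actually live on your system.
import subprocess

WHISPER_BIN = "/usr/local/bin/whisper-cli"          # assumed install location
WHISPER_MODEL = "/usr/local/share/ggml-tiny.bin"    # tiny = fastest, lowest quality

def transcribe(wav_path, out_base):
    # -ovtt writes a WebVTT transcript; -of sets the output basename,
    # so this produces f"{out_base}.vtt" (e.g. transcript-tiny.vtt).
    subprocess.run([WHISPER_BIN,
                    "-m", WHISPER_MODEL,
                    "-f", str(wav_path),
                    "-ovtt",
                    "-of", str(out_base)],
                   check=True)

# e.g. transcribe(folder / "audio.wav", folder / "transcript-tiny")
```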
The result is a local file structure with the episode transcripts. Each recording's Unix timestamp serves as the episode's unique identifier.
├── 1_parse_xml.py
├── 2_download_and_transcribe.py
├── 3_generate_html.py
│
├── entitled-opinions.xml
├── entitled-opinions.json
│
└── episodes
    ├── 1126670400
    │   ├── audio.mp3
    │   ├── audio.wav
    │   └── transcript-tiny.vtt
    ├── 1126929600
    │   ├── audio.mp3
3_generate_html.py then reads this file structure and creates an index.html file for each episode, as well as a homepage (index.html) and a reference index (index2.html). For this reference index, we ignore common words present in 10000-most-common-words.txt.
├── index.html
├── index2.html
└── episodes
    ├── 1126670400
    │   ├── index.html
    │   ├── audio.mp3
    │   ├── audio.wav
    │   └── transcript-tiny.vtt
    ├── 1126929600
    │   ├── index.html
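A minimal sketch of the string-concatenation approach to this step; the markup and titles here are placeholders, and the reference index is left out:

```python
# Hypothetical sketch of the per-episode page generation in 3_generate_html.py:
# plain string concatenation, no templating library.
import pathlib

def episode_page(title, vtt_text):
    # Build the page as one big string; the real markup is surely richer.
    html = "<!DOCTYPE html>\n<html><head><meta charset='utf-8'>"
    html += f"<title>{title}</title></head><body>\n"
    html += f"<h1>{title}</h1>\n"
    html += "<audio controls src='audio.mp3'></audio>\n"
    html += f"<pre>{vtt_text}</pre>\n"
    html += "</body></html>\n"
    return html

links = []
for folder in sorted(pathlib.Path("episodes").iterdir()):
    vtt = (folder / "transcript-tiny.vtt").read_text()
    title = folder.name  # the real script presumably looks the title up in the JSON
    (folder / "index.html").write_text(episode_page(title, vtt))
    links.append(f"<li><a href='episodes/{folder.name}/index.html'>{title}</a></li>")

# Homepage: one link per episode.
pathlib.Path("index.html").write_text(
    "<html><body><ul>\n" + "\n".join(links) + "\n</ul></body></html>")
```

The reference index (index2.html) would presumably walk the same transcripts, collecting every word not found in 10000-most-common-words.txt and linking it back to the episodes where it appears.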
The output can be previewed locally by running a web server, e.g.
python3 -m http.server
Alternatively, it can be hosted on a web provider, e.g. Nearly Free Speech.