This is an open-source proof-of-concept project: given an audio file, the program returns a video that plays the audio alongside related images, using AI for:
- Understanding speech (Speech to Text - PT-BR)
- Understanding sentences and entities (Google Natural Language)
- Searching for images (Google Custom Search)
It is designed with podcasts in mind and uses only Google Cloud Platform tools.
This POC was motivated by my addiction to podcasts and videos. I know that some podcasts have a curated video slideshow, but not all of them can give content delivery this much attention, since producing and editing a podcast is already quite time-demanding. Also, a friend told me she could not consume podcasts because they lack visual stimulation.
Install the dependencies (using Homebrew):
brew install imagemagick
brew install graphicsmagick
brew install ffmpeg
brew install yarn
Clone the repository:
git clone https://github.com/maricatovictor/podream.git
Then run yarn in the project root directory.
To run it, you need to provide your Google Cloud Platform credentials (api_key, search_engine_id, etc.) so the program can use the GCP tools.
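For reference, api_key and search_engine_id correspond to the key and cx query parameters of the Custom Search JSON API. The sketch below only builds the request URL for an image search that returns three results; buildImageSearchUrl is a hypothetical name, not a function from this repository.

```javascript
// Hypothetical helper (not from the podream codebase): builds a Custom Search JSON API
// request URL that asks for image results only.
function buildImageSearchUrl(apiKey, searchEngineId, searchTerm, count = 3) {
  const url = new URL('https://www.googleapis.com/customsearch/v1');
  url.searchParams.set('key', apiKey);          // the api_key credential
  url.searchParams.set('cx', searchEngineId);   // the search_engine_id credential
  url.searchParams.set('q', searchTerm);        // search term derived from a sentence
  url.searchParams.set('searchType', 'image');  // restrict results to images
  url.searchParams.set('num', String(count));   // the API accepts 1 to 10 results per request
  return url.toString();
}
```

In the JSON response, each entry of the items array carries the image address in its link field.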
The audio must be in .wav format and placed in the content/audios folder. When the program prompts you, type the file name, e.g. "nerdcast.wav".
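A minimal sketch of how the typed name could be checked and resolved to a file under content/audios, assuming the program is started from the project root; resolveAudioPath is a hypothetical name, not a function from the repository.

```javascript
const path = require('path');

// Hypothetical helper (not from the podream codebase): turns the name typed at the prompt,
// e.g. "nerdcast.wav", into a path under content/audios and rejects non-.wav files.
function resolveAudioPath(fileName, projectRoot = process.cwd()) {
  const name = fileName.trim();
  if (path.extname(name).toLowerCase() !== '.wav') {
    throw new Error(`Expected a .wav file, got "${name}"`);
  }
  // basename() drops any directory part, so the file is always looked up in content/audios
  return path.join(projectRoot, 'content', 'audios', path.basename(name));
}
```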
To run it, open a terminal in the project root directory and execute:
node index.js
The program then performs the following steps:
- Import the audio
- Upload it to Google Cloud Storage
- Request a transcription
- Analyze the transcript and extract its sentences
- Extract entities from the sentences
- For each sentence, determine the time range (start and end timestamps) in which it is spoken
- For each sentence, define a search term (which may be the sentence itself)
- Search for images using these search terms and save 3 image links per sentence
- Download the images. Each image is saved with a name in the following format: {sentenceStartTimestamp}-{sentenceEndTimestamp}-(n)
- Convert the images (with GM, i.e. GraphicsMagick)
- Generate a video that combines the original audio with the images at their timestamps (extracted from the image names), using videoshow, which relies on FFmpeg
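The timestamp bookkeeping in the steps above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: it assumes the transcription was requested with word time offsets enabled (enableWordTimeOffsets in the Speech-to-Text config) and that those offsets were already converted to seconds, that sentence tokens line up with the transcribed words in order, and that (n) in the file name is the image's index within its sentence. All helper names are hypothetical.

```javascript
// Sketch only: the helper names are hypothetical and not taken from the podream codebase.

// Assumes `words` is the word-level transcription output already converted to seconds
// ({ word, start, end }) and that `sentences` come from the same transcript, so their
// whitespace-separated tokens line up with `words` in order.
function sentenceTimestamps(words, sentences) {
  let cursor = 0;
  return sentences.map((text) => {
    const count = text.trim().split(/\s+/).length;
    const slice = words.slice(cursor, cursor + count);
    if (slice.length === 0) {
      throw new Error(`No word timings left for sentence: "${text}"`);
    }
    cursor += count;
    return { text, start: slice[0].start, end: slice[slice.length - 1].end };
  });
}

// Builds a name following {sentenceStartTimestamp}-{sentenceEndTimestamp}-(n),
// reading (n) as the image's index within its sentence (an assumption).
function imageFileName(start, end, n, extension = 'jpg') {
  return `${start}-${end}-${n}.${extension}`;
}

// Recovers the timestamps from a file name. A regex is used instead of path.extname
// because decimal timestamps such as "12.5" also contain dots.
function parseImageFileName(fileName) {
  const match = /^(\d+(?:\.\d+)?)-(\d+(?:\.\d+)?)-(\d+)(?:\.\w+)?$/.exec(fileName);
  if (!match) {
    throw new Error(`Unexpected image file name: "${fileName}"`);
  }
  return { start: Number(match[1]), end: Number(match[2]), n: Number(match[3]) };
}
```

Once a name is parsed, end minus start gives how long that sentence's images should stay on screen when the slideshow is assembled; splitting that window among the sentence's images is left out of this sketch.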
Author: Victor Maricato. 2019