Let's talk about these three levels:
- Data Structure: What kind of data?
- Data Pipeline: How is the data processed?
- Frontend / Backend: How is the data served?
For a higher-level overview, check out the Follow The Data section.
The main data models in this application are series, episodes, and segments.
%%{
init: {
"theme": "base",
"themeVariables": {
"primaryColor": "#9333eb",
"primaryTextColor": "#fff",
"primaryBorderColor": "#fff",
"secondaryColor": "#66e3e3",
"lineColor": "#ae7edc"
}
}
}%%
stateDiagram-v2
%% This is a hobby project. So why not digress a little here.
%% A classDiagram or erDiagram might be more relevant here
%% however I prefer the visual style of the stateDiagram.
%% The classDiagram leaves empty rows if attributes and
%% methods are left blank. erDiagram has weird arrows.
%% Sometimes I love to comment with subsequent lines
%% slightly shorter than the previous. Some people
%% think that the visual is important for the DX.
%% Nonetheless, architecture is more important.
%% OK. Now back to the current topic at hand.
%% Ok, maybe not yet. Did anyone read this?
%% If so, let me know on Twitter that you
%% found this snippet in code comments.
direction RL
Segments --> Episodes
Episodes --> Series
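To make the hierarchy in the diagram concrete, here is a minimal Python sketch of the three models as dataclasses. The field names (`video_id`, `start_seconds`, and so on) are hypothetical and only illustrate the Segments --> Episodes --> Series containment; they are not the project's actual schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """A generated piece of an episode, a few seconds to a few minutes long."""
    text: str
    start_seconds: float
    end_seconds: float


@dataclass
class Episode:
    """A single published piece of media belonging to one series."""
    title: str
    video_id: str  # hypothetical: e.g. the YouTube video the transcript came from
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Series:
    """A collection of episodes; currently always a podcast."""
    name: str
    episodes: List[Episode] = field(default_factory=list)


# Usage: build the Segments --> Episodes --> Series hierarchy from the diagram.
series = Series(name="Lex Fridman Podcast")
episode = Episode(title="Comedy, Robots, Suffering, Love & Burning Man", video_id="example-id")
episode.segments.append(Segment(text="...", start_seconds=0.0, end_seconds=30.0))
series.episodes.append(episode)
```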
A series can contain multiple episodes.
Currently, only podcast series are included; however, in the future this might expand to more media types.
Examples of series include the Lex Fridman podcast, the Huberman Lab podcast, and the How I Built This with Guy Raz podcast.
An episode is a single published piece of media from a particular series. Episodes usually range from a few minutes to a few hours in length.
For example, an episode could be the "Comedy, Robots, Suffering, Love & Burning Man" episode with guest Duncan Trussell on the "Lex Fridman" podcast series.
A segment is a generated piece of an episode. These pieces generally range from a few seconds to a few minutes long. Segments are generated either by rules or by a language model.
During indexing, each episode is split into segments by chunking its words.
Subsequently, during the reader stage of the pipeline, new segments are created from the answers generated by the language model.
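A minimal sketch of the rule-based segmentation described above, assuming the transcript has already been flattened into `(word, start_seconds, end_seconds)` tuples. The function name `chunk_words` and the 100-word default are hypothetical; the project's real chunking rules may differ.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    text: str
    start_seconds: float
    end_seconds: float


def chunk_words(words: List[Tuple[str, float, float]], chunk_size: int = 100) -> List[Segment]:
    """Split a timed transcript into consecutive segments of `chunk_size` words.

    Each segment starts at its first word's start time and ends at its
    last word's end time. The default of 100 words is a hypothetical value.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    segments = []
    for i in range(0, len(words), chunk_size):
        chunk = words[i : i + chunk_size]
        segments.append(
            Segment(
                text=" ".join(word for word, _, _ in chunk),
                start_seconds=chunk[0][1],
                end_seconds=chunk[-1][2],
            )
        )
    return segments


# Usage: five timed words split into chunks of two words each.
transcript = [("what", 0.0, 0.4), ("is", 0.4, 0.6), ("the", 0.6, 0.7),
              ("meaning", 0.7, 1.2), ("of", 1.2, 1.3)]
for seg in chunk_words(transcript, chunk_size=2):
    print(seg)
```

The last chunk may be shorter than `chunk_size`; here it contains only the word "of".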
Note: The following Data Pipeline information is the same as the main README's How It Works.
Teation uses a transformer-based retriever/reader pipeline to provide semantic search built on an underlying question-answering capability. When someone asks a question on the Teation website, the following events take place:
- Retriever: Podcast transcriptions are searched using BM25, and the top results are forwarded to the reader step. (In the future, this will be replaced with an embedding-based vector search.)
- Reader: The top results are then submitted, together with the original question, to a language model for inference. The language model returns answers to the question, each with a score and its transcript source. The current language model is based on RoBERTa fine-tuned on SQuAD 2.0 for question answering.
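In production, BM25 scoring happens inside Elasticsearch. The pure-Python sketch below only illustrates what the retriever step computes: each segment is scored against the question with Okapi BM25 (using Elasticsearch's default parameters `k1=1.2` and `b=0.75` and a Lucene-style IDF), and the top `k` are kept for the reader. The naive whitespace tokenizer and the helper names are assumptions; the reader step itself needs the RoBERTa model and is not sketched here.

```python
import math
from collections import Counter
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    # Naive lowercase/whitespace tokenizer; Elasticsearch analyzers do much more.
    return text.lower().split()


def bm25_top_k(query: str, docs: List[str], k: int = 3,
               k1: float = 1.2, b: float = 0.75) -> List[Tuple[float, int]]:
    """Score docs against query with Okapi BM25; return the top k (score, index) pairs."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of docs containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            # Lucene-style IDF, which is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation (k1) and length normalization (b).
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((score, idx))
    # Stable sort by score, highest first.
    scores.sort(key=lambda pair: pair[0], reverse=True)
    return scores[:k]


segments = ["the meaning of life is helping others",
            "robots and comedy",
            "life advice from a podcast"]
print(bm25_top_k("meaning of life", segments, k=2))
```

The first segment matches all three query terms and ranks first; the third matches only "life" and ranks second, while the second segment scores zero.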
Top results from the reader undergo normalization:
- Alignment: Results are processed with a few adjustments required by this early proof of concept, including timing tuning and adding or removing segments.
- Response: The resulting answers are sorted by score and grouped by video source, and the segments are returned for frontend consumption.
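A sketch of the normalization steps, under stated assumptions: the `Answer` fields are hypothetical, and "timing tuning" is modeled as a fixed `lead_in_seconds` offset (also hypothetical) that starts playback slightly before the answer. Sorting before grouping means the groups come out ordered by their best answer, since Python dicts preserve insertion order.

```python
from collections import defaultdict
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass
class Answer:
    text: str
    score: float
    video_id: str          # hypothetical: the transcript's video source
    start_seconds: float   # hypothetical: where the answer starts in the video


def normalize(answers: List[Answer], lead_in_seconds: float = 2.0) -> Dict[str, List[Answer]]:
    """Sort answers by score, shift start times earlier, and group by video source."""
    grouped: Dict[str, List[Answer]] = defaultdict(list)
    for ans in sorted(answers, key=lambda a: a.score, reverse=True):
        # Hypothetical timing tweak: start a little early, never before 0.
        adjusted = replace(ans, start_seconds=max(0.0, ans.start_seconds - lead_in_seconds))
        grouped[ans.video_id].append(adjusted)
    return dict(grouped)


answers = [Answer("a", 0.2, "vid1", 10.0),
           Answer("b", 0.9, "vid2", 1.0),
           Answer("c", 0.5, "vid1", 40.0)]
result = normalize(answers)
print(list(result))  # groups ordered by best answer: ['vid2', 'vid1']
```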
The frontend is developed with SvelteKit and Tailwind CSS.
Caddy proxies requests to SvelteKit, and Cloudflare serves as the CDN.
To better understand the system, let's follow the flow of information:
- Transcriptions are captured from YouTube using this function.
- Each episode is then converted into segments using this function.
- Finally segments are converted into an indexable format using this function.
- Haystack is used to manage all the "smart" AI stuff.
  - Running it requires a simple configuration file.
- Segments are indexed in Elasticsearch using this function.
- When a question comes in from the frontend, a couple of things happen:
  - Elasticsearch is queried (using a retriever).
  - Top results are passed into a neural network that returns actual answers to the question (using a reader).
    - For example, asking "What is the meaning of life?" for the episode with Andrew Ng on the Lex Fridman podcast would return the answer "helping others achieve whatever are their dreams".
  - The answers are then normalized, enriched with video details, and returned to the frontend.
- SvelteKit is used to create the client/frontend of the teation.com website using Svelte. It also serves the pages and acts as the liaison between the website and the AI part.
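One plausible shape for the "indexable format" step, offered as an assumption rather than the project's actual function: Haystack 1.x `Document` objects carry their text in `content` and arbitrary metadata in `meta`, so each segment becomes a dict with those two keys. The function name and metadata fields are hypothetical; they are the kind of details the response step would need to enrich answers with video information.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Segment:
    text: str
    start_seconds: float
    end_seconds: float


def to_indexable(segments: List[Segment], episode_title: str, video_id: str) -> List[Dict[str, Any]]:
    """Convert segments into dicts with the text in `content` and details in `meta`."""
    return [
        {
            "content": seg.text,
            "meta": {
                # Hypothetical metadata used later to link answers back to the video.
                "episode": episode_title,
                "video_id": video_id,
                "start_seconds": seg.start_seconds,
                "end_seconds": seg.end_seconds,
            },
        }
        for seg in segments
    ]


docs = to_indexable([Segment("helping others", 12.0, 15.5)], "Example episode", "example-id")
print(docs[0]["meta"]["start_seconds"])
```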
Questions? Feel free to message Patrick Nomad on Twitter.