Doc build output
chrisbrickhouse committed Apr 9, 2024
1 parent 846da71 commit b2f0cf0
Showing 4 changed files with 33 additions and 13 deletions.
4 changes: 2 additions & 2 deletions doc_src/_site/index.html
@@ -261,7 +261,7 @@ <h2 class="anchored" data-anchor-id="not-another-transcription-service">Not anot
<section id="example" class="level3">
<h3 class="anchored" data-anchor-id="example">Example</h3>
<p>As an example, we’ll transcribe an audio interview of Snoop Dogg by the 85 South Media podcast and output it as a TextGrid.</p>
<div id="1de49e04" class="cell" data-execution_count="1">
<div id="f21cb3ee" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> fave_asr</span>
<span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a>data <span class="op">=</span> fave_asr.transcribe_and_diarize(</span>
@@ -273,7 +273,7 @@ <h3 class="anchored" data-anchor-id="example">Example</h3>
<span id="cb2-9"><a href="#cb2-9" aria-hidden="true" tabindex="-1"></a>tg <span class="op">=</span> fave_asr.to_TextGrid(data)</span>
<span id="cb2-10"><a href="#cb2-10" aria-hidden="true" tabindex="-1"></a>tg.write(<span class="st">'SnoopDogg_85SouthMedia.TextGrid'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<div id="bdcfaca3" class="cell" data-execution_count="2">
<div id="ffc61573" class="cell" data-execution_count="2">
<div class="cell-output cell-output-stdout">
<pre><code>File type = "ooTextFile"
Object class = "TextGrid"
1 change: 1 addition & 0 deletions doc_src/_site/robots.txt
@@ -0,0 +1 @@
Sitemap: https://Forced-Alignment-and-Vowel-Extraction.github.io/fave-asr/sitemap.xml
19 changes: 19 additions & 0 deletions doc_src/_site/sitemap.xml
@@ -0,0 +1,19 @@
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://Forced-Alignment-and-Vowel-Extraction.github.io/fave-asr/reference/fave_asr.html</loc>
<lastmod>2024-04-09T01:20:33.002Z</lastmod>
</url>
<url>
<loc>https://Forced-Alignment-and-Vowel-Extraction.github.io/fave-asr/index.html</loc>
<lastmod>2024-04-09T01:20:33.002Z</lastmod>
</url>
<url>
<loc>https://Forced-Alignment-and-Vowel-Extraction.github.io/fave-asr/usage/index.html</loc>
<lastmod>2024-04-09T01:20:33.002Z</lastmod>
</url>
<url>
<loc>https://Forced-Alignment-and-Vowel-Extraction.github.io/fave-asr/reference/index.html</loc>
<lastmod>2024-04-09T01:20:33.002Z</lastmod>
</url>
</urlset>
22 changes: 11 additions & 11 deletions doc_src/_site/usage/index.html
@@ -241,7 +241,7 @@ <h2 class="anchored" data-anchor-id="pipeline-walkthrough">Pipeline walkthrough<
<p>The <code>fave-asr</code> pipeline automates several steps that can be run separately depending on your needs. For example, if you only need a transcript and don’t care about <em>who</em> said the words, you can run the transcription step alone.</p>
<section id="raw-transcription" class="level3">
<h3 class="anchored" data-anchor-id="raw-transcription">Raw transcription</h3>
<div id="f1cd8520" class="cell" data-execution_count="1">
<div id="dd7e67a3" class="cell" data-execution_count="1">
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> fave_asr</span>
<span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a></span>
<span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a>transcription <span class="op">=</span> fave_asr.transcribe(</span>
@@ -251,14 +251,14 @@ <h3 class="anchored" data-anchor-id="raw-transcription">Raw transcription</h3>
<span id="cb1-7"><a href="#cb1-7" aria-hidden="true" tabindex="-1"></a> )</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>The output in <code>transcription</code> is a dictionary with the keys <code>segments</code> and <code>language_code</code>. <code>segments</code> is a list of dictionaries, each holding data on the speech in that segment.</p>
<div id="6804ab0b" class="cell" data-execution_count="2">
<div id="0ac64c56" class="cell" data-execution_count="2">
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a>transcription[<span class="st">'segments'</span>][<span class="dv">0</span>].keys()</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="2">
<pre><code>dict_keys(['id', 'seek', 'start', 'end', 'text', 'tokens', 'temperature', 'avg_logprob', 'compression_ratio', 'no_speech_prob', 'confidence', 'words'])</code></pre>
</div>
</div>
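<p>The <code>start</code>, <code>end</code>, and <code>text</code> keys listed above are enough to build a simple timed summary of a transcript. The sketch below is illustrative only: the dictionary is a hand-written stand-in for real <code>fave_asr.transcribe</code> output (its values, including the <code>"en"</code> language code, are invented), and <code>segment_summary</code> is a hypothetical helper, not part of <code>fave-asr</code>.</p>

```python
# Hypothetical stand-in for the dictionary returned by fave_asr.transcribe.
# Only the key names come from the documentation; every value is invented.
transcription = {
    "language_code": "en",
    "segments": [
        {"id": 0, "start": 0.0, "end": 2.5, "text": " Hello there."},
        {"id": 1, "start": 2.5, "end": 6.0, "text": " How are you doing?"},
    ],
}


def segment_summary(transcription):
    """Return a (start, end, duration, text) tuple for every segment."""
    rows = []
    for segment in transcription["segments"]:
        duration = segment["end"] - segment["start"]
        # Strip the surrounding whitespace that segment text may carry.
        rows.append((segment["start"], segment["end"], duration, segment["text"].strip()))
    return rows


for start, end, duration, text in segment_summary(transcription):
    print(f"{start:6.2f}-{end:6.2f} ({duration:.2f}s) {text}")
```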
<p>If you want a text transcript of the entire file, you can iterate through <code>segments</code> and get the <code>text</code> field of each one.</p>
<div id="5f22f21a" class="cell" data-execution_count="3">
<div id="c178f2c5" class="cell" data-execution_count="3">
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>text_list <span class="op">=</span> []</span>
<span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a><span class="cf">for</span> segment <span class="kw">in</span> transcription[<span class="st">'segments'</span>]:</span>
<span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a> text_list.append(segment[<span class="st">'text'</span>])</span>
@@ -284,7 +284,7 @@ <h3 class="anchored" data-anchor-id="raw-transcription">Raw transcription</h3>
<h3 class="anchored" data-anchor-id="diarization">Diarization</h3>
<p>Some audio files have more than one speaker, and a raw transcript may not be useful if we don’t know who said what. The process of assigning speech to a speaker in an audio file is <em>diarization</em>. <code>fave-asr</code> uses machine learning models which are <em>gated</em>, meaning that the creators might require you to agree to particular terms before using them. You can learn more and agree to the terms at the <a href="https://huggingface.co/pyannote/speaker-diarization-3.1">page for the diarization model</a>.</p>
<p>Diarization works best if you have <a href="https://huggingface.co/settings/tokens">a Hugging Face access token</a>, which makes it easy to download the most up-to-date models. You can also leave the token blank, as in the example here, but the results are not guaranteed.</p>
<div id="d0f5974d" class="cell" data-execution_count="4">
<div id="50e86024" class="cell" data-execution_count="4">
<div class="sourceCode cell-code" id="cb6"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb6-1"><a href="#cb6-1" aria-hidden="true" tabindex="-1"></a>diarization <span class="op">=</span> fave_asr.diarize(</span>
<span id="cb6-2"><a href="#cb6-2" aria-hidden="true" tabindex="-1"></a> audio_file <span class="op">=</span> <span class="st">'resources/SnoopDogg_85SouthMedia.wav'</span>,</span>
<span id="cb6-3"><a href="#cb6-3" aria-hidden="true" tabindex="-1"></a> hf_token<span class="op">=</span><span class="st">''</span></span>
@@ -325,18 +325,18 @@ <h3 class="anchored" data-anchor-id="diarization">Diarization</h3>
</div>
<p>The diarization output is a Pandas DataFrame with various columns. The most important are <code>speaker</code>, <code>start</code>, and <code>end</code>, which give the speaker label, start time, and end time of each segment.</p>
<p>For example, you can get the unique speaker labels using Python’s <code>set</code> function.</p>
<div id="ff10f40d" class="cell" data-execution_count="5">
<div id="da2f125d" class="cell" data-execution_count="5">
<div class="sourceCode cell-code" id="cb8"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb8-1"><a href="#cb8-1" aria-hidden="true" tabindex="-1"></a>speakers <span class="op">=</span> <span class="bu">set</span>(diarization[<span class="st">'speaker'</span>])</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>You can then use the <code>len</code> function to get the number of speakers.</p>
<div id="fa518428" class="cell" data-execution_count="6">
<div id="3bac7417" class="cell" data-execution_count="6">
<div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a><span class="bu">len</span>(speakers)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="6">
<pre><code>4</code></pre>
</div>
</div>
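<p>Because the diarization output is a DataFrame, the same count is also available through pandas itself, along with per-speaker totals. The sketch below uses a small invented table: only the column names <code>speaker</code>, <code>start</code>, and <code>end</code> come from the description above, and the rows are made up for illustration rather than taken from the interview.</p>

```python
import pandas as pd

# Hypothetical diarization table. The column names match the ones described
# in the text; the rows are invented for illustration.
diarization = pd.DataFrame(
    {
        "speaker": ["SPEAKER_00", "SPEAKER_01", "SPEAKER_00", "SPEAKER_02"],
        "start": [0.0, 4.0, 9.5, 12.0],
        "end": [4.0, 9.5, 12.0, 15.0],
    }
)

# Number of distinct speaker labels, equivalent to len(set(...)).
n_speakers = diarization["speaker"].nunique()

# Total speaking time per speaker: sum the turn durations within each label.
durations = diarization["end"] - diarization["start"]
talk_time = durations.groupby(diarization["speaker"]).sum()

print(n_speakers)
print(talk_time)
```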
<p>You can also filter the diarization output by selecting only segments with a particular speaker using Pandas’ <code>DataFrame.loc</code> indexer.</p>
<div id="a75f182a" class="cell" data-execution_count="7">
<div id="aaf370f7" class="cell" data-execution_count="7">
<div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a>snoop_dogg <span class="op">=</span> diarization.loc[diarization[<span class="st">'speaker'</span>] <span class="op">==</span> <span class="st">'SPEAKER_00'</span>]</span>
<span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(snoop_dogg)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-stdout">
@@ -370,11 +370,11 @@ <h3 class="anchored" data-anchor-id="diarization">Diarization</h3>
<section id="diarized-transcription" class="level3">
<h3 class="anchored" data-anchor-id="diarized-transcription">Diarized transcription</h3>
<p>The last stage of the pipeline is combining the diarization and the transcription by assigning speakers to segments.</p>
<div id="4ed6c0ac" class="cell" data-execution_count="8">
<div id="78c716ef" class="cell" data-execution_count="8">
<div class="sourceCode cell-code" id="cb13"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb13-1"><a href="#cb13-1" aria-hidden="true" tabindex="-1"></a>diarized_transcript <span class="op">=</span> fave_asr.assign_speakers(diarization,transcription)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
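<p>To give an intuition for what assigning speakers to segments involves, here is a minimal sketch of one common approach: label each transcript segment with the speaker whose diarization turns overlap it the longest. This is an assumption made for illustration and not a description of how <code>fave_asr.assign_speakers</code> is implemented; <code>overlap</code>, <code>assign_by_overlap</code>, and the sample data are all hypothetical.</p>

```python
def overlap(a_start, a_end, b_start, b_end):
    """Length in seconds shared by two intervals (0.0 if they do not overlap)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_by_overlap(segments, turns):
    """Label each segment with the speaker whose turns overlap it the most.

    Hypothetical helper for illustration; segments with no overlapping
    turn get the speaker None.
    """
    labelled = []
    for segment in segments:
        totals = {}
        for turn in turns:
            shared = overlap(segment["start"], segment["end"], turn["start"], turn["end"])
            if shared > 0:
                totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + shared
        speaker = max(totals, key=totals.get) if totals else None
        labelled.append({**segment, "speaker": speaker})
    return labelled


# Invented sample data in the shape described in this walkthrough.
segments = [
    {"start": 0.0, "end": 3.0, "text": "first"},
    {"start": 3.0, "end": 7.0, "text": "second"},
    {"start": 20.0, "end": 21.0, "text": "third"},
]
turns = [
    {"speaker": "SPEAKER_00", "start": 0.0, "end": 3.5},
    {"speaker": "SPEAKER_01", "start": 3.5, "end": 8.0},
]

for segment in assign_by_overlap(segments, turns):
    print(segment["speaker"], segment["text"])
```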
<p>The structure of <code>diarized_transcript</code> is very similar to the structure of <code>transcription</code> but the segments and words now have a <code>speaker</code> field.</p>
<div id="de83480e" class="cell" data-execution_count="9">
<div id="be1dc200" class="cell" data-execution_count="9">
<div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>diarized_transcript[<span class="st">'segments'</span>][<span class="dv">0</span>][<span class="st">'speaker'</span>]</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<div class="cell-output cell-output-display" data-execution_count="9">
<pre><code>'SPEAKER_00'</code></pre>
@@ -387,11 +387,11 @@ <h2 class="anchored" data-anchor-id="output">Output</h2>
<section id="textgrid" class="level3">
<h3 class="anchored" data-anchor-id="textgrid">TextGrid</h3>
<p>A <a href="#Diarization">diarized transcript</a> can be converted to a <a href="https://github.com/kylebgorman/textgrid/tree/master">textgrid</a> object and navigated using that library.</p>
<div id="4e9bb76f" class="cell" data-execution_count="10">
<div id="2d233a45" class="cell" data-execution_count="10">
<div class="sourceCode cell-code" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>tg <span class="op">=</span> fave_asr.to_TextGrid(diarized_transcript)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
<p>You can write the output to a file using the <code>TextGrid.write</code> method, passing a file name for the output TextGrid.</p>
<div id="6263c2bd" class="cell" data-execution_count="11">
<div id="ae06e4a5" class="cell" data-execution_count="11">
<div class="sourceCode cell-code" id="cb17"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb17-1"><a href="#cb17-1" aria-hidden="true" tabindex="-1"></a>tg.write(<span class="st">'SnoopDogg_Interview.TextGrid'</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
