feat: `SpeechToDocument` #2676

ZanSara · 2022-06-17T13:46:10Z

Proposed changes:

Introduces SpeechToDocument, a node that takes audio files as input and outputs a Document.
- What works:
  - Can handle arbitrarily long audio files by splitting them internally into fragments
  - Transcription runs about 2x faster than the audio's playback time, so in theory it could be used with streaming inputs (in practice this is very hard right now because of how the rest of Haystack works 😄).
  - The audio is aligned with its transcription at the word level using aeneas (a "traditional" forced-alignment method, so very fast)
- Still to do, probably in the next PR:
  - Support input files other than .wav 😅 (easy)
  - Denoising the input audio (easy to remove most of it, very hard to make it perfect)
  - Adding punctuation, or testing models that predict punctuation too (to investigate)
  - Fragmenting the input on voice pauses (currently it is broken into chunks of arbitrary length) (to investigate)
  - Silence detection (should be relatively easy)
  - Run a spellchecker on the output to improve the transcription quality (should be relatively easy)
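
The internal fragmenting listed under "What works" could look roughly like the sketch below. It is not the code in this PR: `fragment_bounds` and its parameters are hypothetical names, and it assumes fixed-length fragments measured in samples.

```python
from typing import Iterator, Tuple


def fragment_bounds(
    total_samples: int, sample_rate: int, fragment_seconds: float
) -> Iterator[Tuple[int, int]]:
    """Yield (start, end) sample offsets of consecutive fixed-length fragments.

    Hypothetical helper: the last fragment is shorter when the audio length
    is not a multiple of the fragment length.
    """
    step = int(sample_rate * fragment_seconds)
    if step <= 0:
        raise ValueError("sample_rate * fragment_seconds must be at least 1 sample")
    for start in range(0, total_samples, step):
        yield start, min(start + step, total_samples)


# 25 seconds of 16 kHz audio, split into 10-second fragments
bounds = list(fragment_bounds(total_samples=16000 * 25, sample_rate=16000, fragment_seconds=10))
```

Each `(start, end)` pair can then be used to slice the waveform and transcribe one fragment at a time, which keeps memory use bounded regardless of the file length.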
Modifies AnswerToSpeech to check whether the source document contains alignment data: if it does, AnswerToSpeech extracts the audio from the original source instead of generating it.
Introduces a new primitive, AudioAlignment, that contains alignment data.
Modifies the SpeechDocument/SpeechAnswer primitives to accommodate optional alignment data.
Modifies Span by overriding its `in` operator. Now `assert 10 in Span(5, 15)` passes 😁
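
As a minimal sketch of the Span change, Python's `in` operator is customized by defining `__contains__`. The class below is a stand-in, not the Haystack `Span`; treating the interval as half-open (start included, end excluded) is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Stand-in for the Span primitive: an interval of character offsets."""

    start: int
    end: int

    def __contains__(self, value: int) -> bool:
        # Assumption: half-open interval, so `start` is inside and `end` is not.
        return self.start <= value < self.end


inside = 10 in Span(5, 15)
outside = 20 in Span(5, 15)
```

With this definition `inside` is `True` and `outside` is `False`.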

To Dos:

- Tutorial
- Tests


ZanSara · 2022-10-21T10:29:59Z

These nodes will be added to Haystack as external nodes. Closing.

MichelBartels and others added 20 commits May 31, 2022 12:56

fix join nodes

cd3c8ad

Update Documentation & Code Style

9b370f2

fix unused import

66d5a2e

change arg order

7b582b6

fix merge conflict

bc51bee

Update Documentation & Code Style

333dc94

fix kwargs check

9d0d383

Merge branch 'fix-join-nodes' of https://github.com/deepset-ai/haystack…

8c9abbd

… into fix-join-nodes

add warning when there is only one input node

c47cce6

Update Documentation & Code Style

521b8cb

fix type hint

e060458

Merge branch 'fix-join-nodes' of https://github.com/deepset-ai/haystack…

1f3bf06

… into fix-join-nodes

fix wrong import order

c1c31b0

Update Documentation & Code Style

6411cc1

undo kwargs

d535136

fix merge conflict

ecacfb2

add accidentally deleted newline#

9beb3c5

fix type hint

73bef1d

fix type hint

248ddf3

First version of SpeechToDocument and accessories

279198f

ZanSara added type:feature New feature or request topic:audio labels Jun 17, 2022

ZanSara requested review from masci and julian-risch June 17, 2022 13:46

ZanSara added 6 commits June 17, 2022 16:13

working on tutorial 17

70450eb

Merge branch 'fix-join-nodes' into speech2text

ea60dfb

Fix bugs in tutorial 17

c99956b

Merge branch 'master' into speech2text

2cf63a0

lower a bit the fragment length to fit in my ram

4bda6ae

Add lenght to tqdm in speech_to_document

f15350e

name meta must be string

2892a2d

ZanSara removed request for masci and julian-risch June 20, 2022 09:34

ZanSara mentioned this pull request Jul 21, 2022

Add support for images #2418

Closed

8 tasks

Merge branch 'master' into speech2text

382a953

ZanSara mentioned this pull request Jul 21, 2022

Generalize primitives #2867

Closed

ZanSara changed the title ~~SpeechToDocument~~ feat: SpeechToDocument Aug 11, 2022

Merge branch 'master' into speech2text

d68236d

ZanSara mentioned this pull request Oct 21, 2022

Audio nodes (text2speech, speech2text) deepset-ai/haystack-core-integrations#1

Merged

10 tasks

ZanSara closed this Oct 21, 2022

masci deleted the speech2text branch September 13, 2023 08:55
