Remove non-speech descriptions from output #3

tannisroot · 2024-08-23T14:15:35Z

wyoming-whisper-cpp does something similar, but I've encountered far more non-speech tokens than just [BLANK AUDIO] so this change instead just removes square and round brackets and their contents altogether.
https://github.com/rhasspy/wyoming-whisper-cpp/blob/476b0e631392034a94196eb578b3d0a60164af53/wyoming_whisper_cpp/handler.py#L92

StrandmonYellow · 2024-09-29T17:14:46Z

Is this already merged?

tannisroot · 2024-09-29T18:18:22Z

> Is this already merged?

The status of the PR is open, so no.

ser · 2024-10-21T03:20:40Z

Hello, would you please do it in one regexp and add comments to the code explaining what it does? It might also be worth making this an option.

text = re.sub(r'\[.*?\]|\(.*?\)', '', text).strip()
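For reference, a minimal sketch of how the single-regexp version with comments and an on/off option might look. The names `NON_SPEECH`, `strip_non_speech`, and `enabled` are hypothetical and not taken from the PR or from wyoming-whisper-cpp; collapsing leftover whitespace is an extra step beyond the `.strip()` in the line above.

```python
import re

# One pattern for both bracket styles: "[...]" or "(...)".
# ".*?" is non-greedy, so each match stops at the first closing bracket
# and separate annotations on the same line are removed individually.
NON_SPEECH = re.compile(r"\[.*?\]|\(.*?\)")


def strip_non_speech(text: str, enabled: bool = True) -> str:
    """Remove non-speech descriptions such as [BLANK AUDIO] or (music).

    `enabled` is a hypothetical switch standing in for a user-facing option.
    """
    if not enabled:
        return text.strip()
    cleaned = NON_SPEECH.sub("", text)
    # Assumption: collapse the whitespace left behind by removed spans.
    return " ".join(cleaned.split())
```

Note that the non-greedy pattern does not handle nested brackets, and it would also remove any parenthesized text that was actually spoken content, which is one argument for keeping the behaviour optional.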

Remove non-speech descriptions from output

f39f2a8

tannisroot force-pushed the blank_audio_remove branch from 109a14d to f39f2a8 on September 3, 2024 13:11

tannisroot changed the title ~~Remove [BLANK AUDIO] from output~~ Remove non-speech descriptions from output Sep 3, 2024
