Add Text-to-Speech Implementations & CLI App #57

hello-amal · 2024-07-29T22:57:11Z

Description

This PR adds a generic text-to-speech (TTS) abstract class, TextToSpeechEngine, as well as two implementations of that abstract class, one using gTTS (preferred) and the other using pyttsx3 (worse voice quality, but can be used offline). It also adds test cases for each of the engines, using ground-truth saved files. Finally, it adds a command-line interface (CLI) to allow users to easily use text-to-speech (with convenient features like storing history, loading pre-saved utterances, and tab completion).
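For readers unfamiliar with the pattern, here is a minimal sketch of an abstract engine with one concrete subclass. Only the name TextToSpeechEngine comes from this PR; the method names (`say`, `save_to_file`) and the `InMemoryEngine` stand-in are assumptions made for illustration and may not match the actual code.

```python
from abc import ABC, abstractmethod
from typing import List


class TextToSpeechEngine(ABC):
    """Hypothetical minimal interface; the real class in this PR may differ."""

    @abstractmethod
    def say(self, text: str) -> None:
        """Speak the given text."""

    @abstractmethod
    def save_to_file(self, text: str, filepath: str) -> None:
        """Write a rendering of the given text to a file."""


class InMemoryEngine(TextToSpeechEngine):
    """Illustrative stand-in that records utterances instead of producing audio."""

    def __init__(self) -> None:
        self.spoken: List[str] = []

    def say(self, text: str) -> None:
        # A real engine (e.g. one backed by gTTS or pyttsx3) would play audio here.
        self.spoken.append(text)

    def save_to_file(self, text: str, filepath: str) -> None:
        # A real engine would write audio bytes; this sketch writes the text itself.
        with open(filepath, "w", encoding="utf-8") as f:
            f.write(text)
```

Callers can then depend on the abstract type only, which is what lets an online engine and an offline engine be swapped without touching the calling code.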

Testing

Checklist

I have performed a self-review of my code
If it is a core feature, I have added thorough tests
I have added documentation for the changes
[N/A] I have updated the README file if necessary
I have run on hardware if necessary

Additional context

This is a copy of stretchpy#61, so that stretch_ai also has TTS capabilities. Eventually, we should store this code in only one place.

hello-cpaxton · 2024-07-29T23:43:39Z

src/stretch/audio/text_to_speech.py

+DEFAULT_LOGGER = logging.getLogger(__name__)
+
+
+class TextToSpeechEngineType(Enum):


I would usually prefer enums in a separate file, and I would split these two engines as well. That makes it easier to make them optional dependencies.

I removed this enum and put pyttsx3 and gtts in separate files. I don't agree that enums should be in separate files as a rule; for example, in text_to_speech/executor.py I feel it's appropriate to keep the enum TextToSpeechOverrideBehavior in the same file as TextToSpeechExecutor. LMK if you feel otherwise (for that specific case).
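One way the per-file split can support optional dependencies is to check which backends are importable before loading them. The sketch below is an assumption about how that could look, not the code in this PR; `ENGINE_MODULES`, `is_installed`, and `available_engines` are hypothetical names.

```python
import importlib.util
from typing import Dict, List

# Hypothetical mapping from an engine name to the third-party module it needs.
ENGINE_MODULES: Dict[str, str] = {"gtts": "gtts", "pyttsx3": "pyttsx3"}


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


def available_engines() -> List[str]:
    """List the engine names whose backing module is installed."""
    return [name for name, module in ENGINE_MODULES.items() if is_installed(module)]
```

With each engine in its own file, only the file for an installed backend needs to be imported, so a missing package does not break the others.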

src/stretch/audio/text_to_speech.py

src/stretch/audio/text_to_speech_cli.py

hello-cpaxton · 2024-07-29T23:44:44Z

src/test/audio/test_text_to_speech.py

+
+# Adapted from https://github.com/markstent/audio-similarity/blob/main/audio_similarity/audio_similarity.py
+# Note that that script has other audio similarity metrics as well
+def spectral_contrast_similarity(ground_truth_filepath, comparison_filepath, sample_rate=16000):


I could see this being really useful for, e.g., wake words. Would it make sense in audio/utils or utils/audio or something?

Maybe that would not use filepaths, though.

Done. Filepaths were the most convenient way to load into librosa, and imo we can generalize it if/when we find another use for the function.
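If the function is generalized later, one option is to separate loading from comparison, so the comparison step takes already-computed feature matrices rather than filepaths. The sketch below shows only that comparison step in plain Python. The normalization used here (one minus the mean absolute difference divided by the largest magnitude) is one plausible choice and is not necessarily what the PR or the adapted script does; `matrix_similarity` is a hypothetical name.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def matrix_similarity(reference: Matrix, comparison: Matrix) -> float:
    """Compare two feature matrices (rows = bands, columns = frames).

    Returns 1.0 for identical inputs and lower values as they diverge.
    """
    if not reference or not comparison:
        raise ValueError("both matrices must have at least one row")

    # Only compare the rows and columns that both matrices share.
    n_rows = min(len(reference), len(comparison))
    n_cols = min(
        min(len(row) for row in reference),
        min(len(row) for row in comparison),
    )

    diffs = []
    peak = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            a, b = reference[i][j], comparison[i][j]
            diffs.append(abs(a - b))
            peak = max(peak, abs(a), abs(b))

    # Empty overlap or all-zero inputs: treat as identical.
    if not diffs or peak == 0.0:
        return 1.0
    return 1.0 - (sum(diffs) / len(diffs)) / peak
```

A filepath-based wrapper could then load audio, compute features, and delegate to this function, while in-memory callers such as a wake-word detector could skip the file step.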

hello-atharva

Should we add an application for TTS inside stretch.apps? Currently there is no main entry point inside <path/to/stretch_ai>/src/stretch/audio/text_to_speech_cli.py
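For illustration, a main entry point for a CLI module could look roughly like the sketch below. The flag name `--engine`, the positional `text` argument, and the print-based body are assumptions; the actual CLI in this PR (with history, saved utterances, and tab completion) is more involved.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a small argument parser (illustrative options only)."""
    parser = argparse.ArgumentParser(description="Text-to-speech CLI (sketch)")
    parser.add_argument(
        "--engine",
        choices=["gtts", "pyttsx3"],
        default="gtts",
        help="which TTS backend to use",
    )
    parser.add_argument("text", nargs="*", help="words to speak")
    return parser


def main(argv: Optional[List[str]] = None) -> int:
    """Parse arguments and hand the utterance to an engine."""
    args = build_parser().parse_args(argv)
    utterance = " ".join(args.text)
    # A real app would pass the utterance to a TTS engine here.
    print(f"[{args.engine}] {utterance}")
    return 0


if __name__ == "__main__":
    main()
```

Keeping the logic in `main(argv)` means the same function can be called from a module under stretch.apps or from a test, without going through the shell.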

hello-cpaxton · 2024-08-02T22:17:40Z

Can you add the mp3 files to .gitattributes and make sure they are under git-lfs @hello-amal ?

hello-cpaxton · 2024-08-02T22:17:53Z

We want to make sure large files are never added to git history!

hello-amal · 2024-08-06T18:20:47Z

@hello-cpaxton @hello-atharva Done with all suggested changes from this PR and stretchpy#61, except for:

@hello-cpaxton suggested putting tests/audio/manual_test_text_to_speech.py into examples/demos. I agree that an "example" better describes what that file serves as, but the examples folder seems to be gone in stretch_ai. Any suggested location to put it?

I did put the mp3 files in LFS, but FWIW the largest of them was just 60 kB, which is far lower than the default 500 kB limit in the check-added-large-files pre-commit hook. Anyway, for good measure I went ahead and added that pre-commit hook.
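To make the size comparison concrete, the sketch below lists files above a threshold in a directory tree. It is a rough stand-in for the idea behind check-added-large-files, not that hook's implementation; `find_large_files` is a hypothetical helper, and the 500 kB default simply mirrors the figure quoted above.

```python
from pathlib import Path
from typing import List


def find_large_files(root: str, max_kb: int = 500) -> List[Path]:
    """Return files under root whose size exceeds max_kb (1 kB = 1024 bytes here)."""
    limit_bytes = max_kb * 1024
    return sorted(
        path
        for path in Path(root).rglob("*")
        if path.is_file() and path.stat().st_size > limit_bytes
    )
```

Under this check a 60 kB mp3 is not flagged at the default threshold, which matches the observation that the test fixtures were well under the limit.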

hello-amal · 2024-08-06T18:30:43Z

@hello-cpaxton could you add espeak to the docker image? It is required for pyttsx3, and is therefore needed for the automated tests; hence, CI is failing.
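As a stopgap while the image is updated, tests that need espeak could be skipped when the binary is missing. This is only a sketch of that idea and is not something this PR does; `espeak_available` and the test class are hypothetical names.

```python
import shutil
import unittest


def espeak_available(binary: str = "espeak") -> bool:
    """Return True if the given executable is on PATH."""
    return shutil.which(binary) is not None


class Pyttsx3PrerequisiteTest(unittest.TestCase):
    @unittest.skipUnless(espeak_available(), "espeak is not installed")
    def test_espeak_found(self) -> None:
        # Runs only on machines where espeak is present.
        self.assertTrue(espeak_available())
```

Skipping hides the missing dependency rather than fixing it, so installing espeak in the Docker image remains the more robust option for CI.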

.github/workflows/python-package.yml

hello-cpaxton · 2024-08-06T18:40:09Z

src/setup.py

        "openai-whisper",
+        "overrides",  # better inheritance of docstrings


This one may have caused issues during installation. Did you try it out?

Yup, I uninstalled stretchpy and overrides, and then re-installed from src and it worked on my machine.

hello-cpaxton

LGTM pending minor comments

hello-atharva

LGTM

* Added TTS
* Update setup.py
* Install `libasound2-dev` in workflow
* Github workflows require `sudo apt-get` installs
* Add portaudio to the `apt-get` installs
* Fixes from pre-commits
* Add espeak to github actions installation
* Remove mp3s
* Configure LFS to track MP3s
* Added a check for large files in the pre-commit
* Changes from PR review
* Update github actions dep to fake audio capabilities
* Update the apt install
* updates to docker
* workflow updates
* Add espeak to README audio deps
* Add ffmpeg
* [WIP] list audioread backends in github actions
* Refactor available formats
* Implemented GoogleCloudTTS
* [WIP] list audioread backends in github actions
* [WIP] add verbose logs to failing test case
* Remove GoogleCloudTTS on GithubActions
* [WIP] verify the named temp file has size > 0
* [WIP] check if FFMPeg gets a decoder error on the mp3s
* Pull LFS files in Github Action
* Add Git LFS to the action workflow
* Mark the git directory as safe before pulling LFS files
* Move git-lfs from action workflow to docker file
* Re-trigger github actions

---------

Co-authored-by: Amal Nanavati <[email protected]>
Co-authored-by: Chris Paxton <[email protected]>
Co-authored-by: Chris Paxton <[email protected]>

hello-amal requested a review from hello-cpaxton July 29, 2024 22:57

hello-cpaxton reviewed Jul 29, 2024

src/stretch/audio/text_to_speech.py (outdated)

hello-cpaxton reviewed Jul 29, 2024

src/stretch/audio/text_to_speech_cli.py (outdated)

hello-cpaxton reviewed Jul 29, 2024

hello-amal mentioned this pull request Aug 1, 2024

Changes from CLV Study hello-robot/stretch_web_teleop#91

Draft

2 tasks

hello-atharva reviewed Aug 2, 2024

amalnanavati and others added 11 commits August 6, 2024 09:53

Added TTS

e7ca590

Update setup.py

620082b

Install libasound2-dev in workflow

550dca2

Github workflows require sudo apt-get installs

aa4cc74

Add portaudio to the apt-get installs

ef10f7d

Fixes from pre-commits

86b2157

Add espeak to github actions installation

dcc8180

Remove mp3s

68ead98

Configure LFS to track MP3s

b98af1f

Added a check for large files in the pre-commit

35cb574

Changes from PR review

9460b4d

hello-amal force-pushed the amaln/tts branch from e19ef3e to 9460b4d Compare August 6, 2024 18:17

amalnanavati added 2 commits August 6, 2024 11:24

Update github actions dep to fake audio capabilities

cf8b1e4

Update the apt install

d92ecc1

hello-amal requested a review from hello-cpaxton August 6, 2024 18:30

hello-cpaxton reviewed Aug 6, 2024

.github/workflows/python-package.yml (outdated)

hello-cpaxton reviewed Aug 6, 2024

hello-cpaxton approved these changes Aug 6, 2024

updates to docker

bab8f75

hello-cpaxton and others added 18 commits August 6, 2024 15:33

workflow updates

b449c51

Add espeak to README audio deps

8f859bc

Add ffmpeg

8b159a3

Merge branch 'main' into amaln/tts

bcdadf2

[WIP] list audioread backends in github actions

d320817

Merge branch 'amaln/tts' of https://github.com/hello-robot/stretch_ai into amaln/tts

711b0bb

Refactor available formats

80a4a1b

Merge branch 'main' into amaln/tts

654af4f

Implemented GoogleCloudTTS

1d3b4f7

[WIP] list audioread backends in github actions

27af958

[WIP] add verbose logs to failing test case

6d2dbe7

Remove GoogleCloudTTS on GithubActions

e4ba202

[WIP] verify the named temp file has size > 0

7c06565

[WIP] check if FFMPeg gets a decoder error on the mp3s

b01ca58

Pull LFS files in Github Action

1d3397a

Add Git LFS to the action workflow

f8d660d

Mark the git directory as safe before pulling LFS files

c80fdf7

Move git-lfs from action workflow to docker file

3e727ab

hello-amal changed the title from "Added Text-to-Speech" to "Add Text-to-Speech Implementations & CLI App" Aug 7, 2024

Re-trigger github actions

1e04629

hello-atharva approved these changes Aug 7, 2024

hello-amal merged commit ac4323c into main Aug 7, 2024
1 check passed

hello-amal deleted the amaln/tts branch August 7, 2024 14:42
