
refactor: Migrate RemoteWhisperTranscriber to OpenAI SDK. #6149

Merged

Conversation

@awinml (Contributor) commented Oct 22, 2023

Related Issues

Proposed Changes:

Migrate RemoteWhisperTranscriber to use the OpenAI SDK.

How did you test it?

Added unit and integration tests.

Notes for the reviewer

  • The Whisper transcription endpoint supports an optional response_format parameter.
  • This parameter controls the format of the transcribed output and supports the following options: json, text, srt, verbose_json, or vtt. By default, the API returns a JSON object with the transcribed text under the "text" key.
  • The implementation in this PR only supports the "json" option for the response_format parameter. Support for the other options can be added in a future PR if required.
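A minimal sketch of the default behaviour described above; the response dict here is illustrative (an assumption, not the SDK's actual return type), showing only where the transcribed text lives in the "json" format:

```python
# Illustrative payload shape for response_format="json": the transcribed
# text sits under the "text" key (the real SDK response may carry more fields).
response = {"text": "Hello from Whisper."}
transcript = response["text"]
print(transcript)  # -> Hello from Whisper.
```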

Checklist

@ZanSara (Contributor) left a comment

Looking quite good! I left a few suggestions, overall I believe it's soon ready for merge 🚀

@vblagoje (Member) left a comment

Confirming comments from @ZanSara. Looking forward to these small corrections, @awinml.

@ZanSara (Contributor) left a comment

LGTM! Thank you for your contribution 🤗

@vblagoje (Member)

@awinml looks ok to me. Before I approve, aside from unit tests, have you played with the component, e.g. in a small demo?

github-actions bot added the 2.x (Related to Haystack v2.0) label Oct 26, 2023
@awinml (Contributor, Author) commented Oct 26, 2023

@ZanSara and @vblagoje, thank you for your feedback!

I did some digging on how the OpenAI SDK handles the file sent to the audio.transcribe() API.

These were some of the findings:

  • The API only accepts a file stream as input; it does not accept raw bytes.
  • The API uses the .name attribute of the input to determine the file name. If it cannot find the name, it raises an incorrect-format error.

For reference, please have a look at: https://github.com/openai/openai-python/blob/main/openai/api_resources/audio.py#L61

I reworked the implementation to create an io.BytesIO object from the bytes received from the ByteStream instance. I set the .name attribute manually using the file_name received from the metadata. The file is skipped if file_name is not present in the metadata.
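The rework described above can be sketched roughly like this; the stream objects are mocked for illustration (an assumption — the real code operates on ByteStream instances):

```python
import io
from types import SimpleNamespace

# Mock ByteStream-like objects (illustrative only, not Haystack's class).
streams = [
    SimpleNamespace(data=b"fake-audio-bytes", metadata={"file_path": "speech.wav"}),
    SimpleNamespace(data=b"more-bytes", metadata={}),  # no file name -> skipped
]

files = []
for stream in streams:
    file_name = stream.metadata.get("file_path")
    if file_name is None:
        continue  # skip: the SDK reads .name to infer the file format
    file = io.BytesIO(stream.data)  # the API wants a file stream, not raw bytes
    file.name = file_name
    files.append(file)
```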

I created a simple demo highlighting these findings and showcasing an example using this component.
This is the link to it: https://colab.research.google.com/drive/1FlxKkw3Q83k7Z-BfXPYat5UDt_iA5rIO?usp=sharing

@vblagoje (Member)

@awinml thanks for these elaborate explanations. They are excellent! I'm wondering: can't we try to resolve file_path from the metadata, and if it is not there, just use a placeholder name to satisfy the requirement that the value not be None?

@vblagoje (Member)

@ZanSara perhaps we should store file_path in the metadata of ByteStream when we construct it via from_file_path anyway?

@awinml (Contributor, Author) commented Oct 26, 2023

@vblagoje We can do that.

As far as I can tell, the API does extract the file format from the name. We could just set it to something like sample_audio.wav for all the ByteStreams without a file_path.

I added an example to the notebook above, testing the API with a random name that has a different file extension than the input file. It worked perfectly fine.

for stream in streams:
    try:
        file = io.BytesIO(stream.data)
        file.name = stream.metadata["file_path"]
Member

@awinml yes, let's do a check here if stream.metadata["file_path"] is present. If it is, use it. If not, just use a random name, e.g. audio_input , perhaps an extension is not even needed. Please check.

@awinml (Contributor, Author) commented Oct 26, 2023

I tried the API without an extension; it does not allow that. I updated the example notebook with a test.

I think we can use audio_input.wav as the name if stream.metadata["file_path"] is not present. Should we also log a warning if we do this?

Something like this:

for stream in streams:
    file = io.BytesIO(stream.data)
    try:
        file.name = stream.metadata["file_path"]
    except KeyError:
        file.name = "audio_input.wav"
        logger.warning("Did not find 'file_path' in metadata; setting the file name to 'audio_input.wav'.")

Member

I wouldn't. That file name is not important at all, as far as I can tell, and we would needlessly scare users. WDYT @ZanSara @awinml?

Contributor (Author)

Makes sense, I will push the changes without the warning then.

@vblagoje (Member)

@awinml yes, let's do that small change and we can later decide to add file_path to ByteStream when it is constructed from the file.
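The agreed change can be sketched as a small helper (the helper name is hypothetical, not from the PR): use metadata["file_path"] when present, otherwise fall back to a fixed placeholder name with an extension, and emit no warning:

```python
import io

def to_named_stream(data: bytes, metadata: dict) -> io.BytesIO:
    # Hypothetical helper sketching the agreed fallback: wrap raw bytes in a
    # stream and give it a name, defaulting to a placeholder with an extension
    # (the API rejects extension-less names, per the discussion above).
    file = io.BytesIO(data)
    file.name = metadata.get("file_path", "audio_input.wav")
    return file
```

For example, to_named_stream(b"...", {}) yields a stream named audio_input.wav, while a stream whose metadata contains "file_path" keeps that name.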

@vblagoje vblagoje self-requested a review October 26, 2023 13:55
@vblagoje (Member) left a comment

Let's 🚢 Thanks for this awesome contribution @awinml

@vblagoje vblagoje merged commit 5f35e7d into deepset-ai:main Oct 26, 2023
21 checks passed
@bilgeyucel (Contributor)

Hi @awinml, thank you for your contribution to Haystack! Now that you resolved an issue labeled with 'hacktoberfest', you have a chance to receive an exclusive swag package from Haystack. 🎁

Fill in this form, and let us know if you have any questions! https://forms.gle/226vqWoN6NRAaqJ69

@awinml awinml deleted the migrate_whisper_transcriber_openai_sdk branch November 29, 2023 17:16
Labels: 2.x (Related to Haystack v2.0), topic:tests, type:documentation (Improvements on the docs)

Successfully merging this pull request may close these issues: Migrate Whisper transcriber (v2.0) to OpenAI SDK

4 participants