
refactor: Migrate RemoteWhisperTranscriber to OpenAI SDK. #6149

Merged

Conversation

@awinml (Contributor) commented Oct 22, 2023

Related Issues

Proposed Changes:

Migrate RemoteWhisperTranscriber to use the OpenAI SDK.

How did you test it?

Added unit and integration tests.

Notes for the reviewer

  • The Whisper transcription endpoint supports an optional response_format parameter.
  • This parameter controls the format of the transcribed output and supports the following options: json, text, srt, verbose_json, or vtt. By default, the API returns a JSON object with the transcribed text under the "text" key.
  • The implementation in this PR only supports the "json" option for the response_format parameter. Support for the other options can be added in a future PR if required.
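A minimal sketch of the default behaviour described above; the response dict here is illustrative (an assumption, not the SDK's actual return type), showing only where the transcribed text lives in the "json" format:

```python
# Illustrative payload shape for response_format="json": the transcribed
# text sits under the "text" key (the real SDK response may carry more fields).
response = {"text": "Hello from Whisper."}
transcript = response["text"]
print(transcript)  # -> Hello from Whisper.
```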

Checklist

@ZanSara (Contributor) left a comment

Looking quite good! I left a few suggestions, overall I believe it's soon ready for merge 🚀

@vblagoje (Member) left a comment

Confirming comments from @ZanSara. Looking forward to these small corrections, @awinml.

@ZanSara (Contributor) left a comment

LGTM! Thank you for your contribution 🤗

@vblagoje (Member)

@awinml looks ok to me. Before I approve, aside from unit tests, have you played with the component, e.g. in a small demo?

github-actions bot added the 2.x (Related to Haystack v2.0) label Oct 26, 2023
@awinml (Contributor, Author) commented Oct 26, 2023

@ZanSara and @vblagoje, thank you for your feedback!

I did some digging on how the OpenAI SDK handles the file sent to the audio.transcribe() API.

These were some of the findings:

  • The API only accepts a file stream as input; it does not accept raw bytes.
  • The API uses the .name attribute of the input to determine the file name. If it cannot find the name, it raises an incorrect-format error.

For reference, please have a look at: https://github.com/openai/openai-python/blob/main/openai/api_resources/audio.py#L61

I reworked the implementation to create an io.BytesIO object from the bytes received from the ByteStream instance. I set the .name attribute manually using the file_name received from the metadata. The file is skipped if file_name is not present in the metadata.
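The rework described above can be sketched roughly like this; the stream objects are mocked for illustration (an assumption — the real code operates on ByteStream instances):

```python
import io
from types import SimpleNamespace

# Mock ByteStream-like objects (illustrative only, not Haystack's class).
streams = [
    SimpleNamespace(data=b"fake-audio-bytes", metadata={"file_path": "speech.wav"}),
    SimpleNamespace(data=b"more-bytes", metadata={}),  # no file name -> skipped
]

files = []
for stream in streams:
    file_name = stream.metadata.get("file_path")
    if file_name is None:
        continue  # skip: the SDK reads .name to infer the file format
    file = io.BytesIO(stream.data)  # the API wants a file stream, not raw bytes
    file.name = file_name
    files.append(file)
```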

I created a simple demo highlighting these findings and showcasing an example using this component.
This is the link to it: https://colab.research.google.com/drive/1FlxKkw3Q83k7Z-BfXPYat5UDt_iA5rIO?usp=sharing

@vblagoje (Member)

@awinml thanks for these elaborate explanations. They are excellent! I'm wondering: can't we try to resolve file_path from the metadata, and if it is not there, just use a placeholder name to satisfy the requirement that the value not be None?

@vblagoje (Member)

@ZanSara perhaps we should store file_path in the metadata of ByteStream when we construct it via from_file_path anyway?

@awinml (Contributor, Author) commented Oct 26, 2023

@vblagoje We can do that.

As far as I can tell, the API does extract the file format from the name. We could just set it to something like sample_audio.wav for all the ByteStreams without a file_path.

I added an example to the notebook above, testing the API with a random name that has a different file extension than the input file. It worked perfectly fine.

for stream in streams:
    try:
        file = io.BytesIO(stream.data)
        file.name = stream.metadata["file_path"]
Member

@awinml yes, let's do a check here if stream.metadata["file_path"] is present. If it is, use it. If not, just use a random name, e.g. audio_input , perhaps an extension is not even needed. Please check.

@awinml (Contributor, Author) commented Oct 26, 2023

I tried the API without an extension; it does not allow that. I updated the example notebook with a test.

I think we can use audio_input.wav as the name if stream.metadata["file_path"] is not present. Should we also log a warning if we do this?

Something like this:

for stream in streams:
    file = io.BytesIO(stream.data)
    try:
        file.name = stream.metadata["file_path"]
    except KeyError:
        file.name = "audio_input.wav"
        logger.warning("Did not find 'file_path' in metadata; setting the file name to 'audio_input.wav'.")

Member

I wouldn't. That file name is not important at all, as far as I can tell, and we would needlessly scare users. WDYT @ZanSara @awinml?

Contributor (Author)

Makes sense, I will push the changes without the warning then.

@vblagoje (Member)

@awinml yes, let's do that small change and we can later decide to add file_path to ByteStream when it is constructed from the file.
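The agreed change can be sketched as a small helper (the helper name is hypothetical, not from the PR): use metadata["file_path"] when present, otherwise fall back to a fixed placeholder name with an extension, and emit no warning:

```python
import io

def to_named_stream(data: bytes, metadata: dict) -> io.BytesIO:
    # Hypothetical helper sketching the agreed fallback: wrap raw bytes in a
    # stream and give it a name, defaulting to a placeholder with an extension
    # (the API rejects extension-less names, per the discussion above).
    file = io.BytesIO(data)
    file.name = metadata.get("file_path", "audio_input.wav")
    return file
```

For example, to_named_stream(b"...", {}) yields a stream named audio_input.wav, while a stream whose metadata contains "file_path" keeps that name.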

@vblagoje vblagoje self-requested a review October 26, 2023 13:55
@vblagoje (Member) left a comment

Let's 🚢 Thanks for this awesome contribution @awinml

@vblagoje vblagoje merged commit 5f35e7d into deepset-ai:main Oct 26, 2023
21 checks passed
@bilgeyucel (Contributor)

Hi @awinml, thank you for your contribution to Haystack! Now that you resolved an issue labeled with 'hacktoberfest', you have a chance to receive an exclusive swag package from Haystack. 🎁

Fill in this form, and let us know if you have any questions! https://forms.gle/226vqWoN6NRAaqJ69

@awinml awinml deleted the migrate_whisper_transcriber_openai_sdk branch November 29, 2023 17:16
Labels: 2.x (Related to Haystack v2.0), topic:tests, type:documentation (Improvements on the docs)

Successfully merging this pull request may close these issues: Migrate Whisper transcriber (v2.0) to OpenAI SDK

4 participants