Use larger Whisper Model #11

Open
MichaelMcCulloch opened this issue Mar 1, 2023 · 3 comments

Comments

@MichaelMcCulloch
Owner

MichaelMcCulloch commented Mar 1, 2023

Before we can do this, we need to run inference on faster hardware. The larger models take significantly longer to run on the CPU, and I have yet to measure the performance of the larger models on specialized hardware, i.e., via NNAPI.

@nyadla-sys

Using separate encoder and decoder TFLite models would be a better option for multilanguage support. For Android app development, you can refer to the code available at https://github.com/ipsilondev/whisper-cordova/blob/36bf04192c05bbb9aa24d3dfb553039b4aeaa755/android/cpp/native-lib.cpp.

To exercise this feature using Python, you can use the colab notebook available at https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb.
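For reference, the notebook above drives the two models through TensorFlow's standard `tf.lite.Interpreter` API. A minimal, self-contained sketch of that flow is below; it converts a toy model in memory instead of loading the actual Whisper encoder/decoder `.tflite` files, but the `allocate_tensors` / `set_tensor` / `invoke` / `get_tensor` sequence is the same one you would use with them:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in model, converted to TFLite in memory purely to demonstrate
# the interpreter pattern; the real encoder/decoder files would be loaded
# with tf.lite.Interpreter(model_path=...) instead.
class Doubler(tf.Module):
  @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
  def __call__(self, x):
    return x * 2.0

module = Doubler()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [module.__call__.get_concrete_function()], module)
tflite_bytes = converter.convert()

# Standard TFLite inference sequence.
interp = tf.lite.Interpreter(model_content=tflite_bytes)
interp.allocate_tensors()
inp = interp.get_input_details()[0]
out = interp.get_output_details()[0]
interp.set_tensor(inp["index"], np.ones((1, 4), np.float32))
interp.invoke()
result = interp.get_tensor(out["index"])
print(result)  # [[2. 2. 2. 2.]]
```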

Based on my understanding, it is still acceptable to run up to the small Whisper model on mobile phones.

@MichaelMcCulloch
Owner Author

In 849517e you can see my attempt to use the separated encoder/decoder model. It's very slow, so I assume I'm doing something wrong, or maybe that's just the way it is.

@nyadla-sys I have another problem that I think you have the domain knowledge to help me with, if you also have the time.

import tensorflow as tf


class GenerateModel(tf.Module):
  def __init__(self, model, prefix):
    super(GenerateModel, self).__init__()
    self.model = model
    # Forced decoder ids, stored as a plain Python attribute.
    self.prefix = prefix

  @tf.function(
    input_signature=[
      tf.TensorSpec(shape=(1, 80, 3000), dtype=tf.float32, name="input_features"),
    ],
  )
  def inference(self, input_features):
    outputs = self.model.generate(
      input_ids=input_features,
      max_new_tokens=384,
      return_dict_in_generate=True,
      forced_decoder_ids=self.prefix,
    )
    return {"sequences": outputs["sequences"]}

  @tf.function(
    input_signature=[
      tf.TensorSpec(shape=(4, 2), dtype=tf.int32, name="decoder_prefix"),
    ],
  )
  def set_prefix(self, decoder_prefix):
    # Intended to update the prefix that inference() reads above.
    self.prefix = decoder_prefix
    return {"discard": decoder_prefix}

I hope it's clear from this code that

  1. I'm no TensorFlow, or even Python, expert
  2. I'm trying to make it possible to use the generation-enabled model while allowing the decoder inputs to be set via a tensor.
  3. It's not working

With respect to the second point, the goal is to instruct it to translate/transcribe languages and configure timestamps. This is not working and I don't know why, but I suspect the value of forced_decoder_ids=self.prefix ends up effectively hard-coded, because calls to set_prefix have no effect on the decoder output. I think this is a product of the way the graph is built, but I don't know enough to fix it.
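The "hard-coded" suspicion matches a known tf.function behavior: a plain Python attribute read inside a traced function is baked into the graph as a constant at trace time, whereas a tf.Variable is read at execution time and so sees later assignments. A minimal sketch of the difference, using a toy module rather than the actual Whisper model (names here are illustrative, not from the issue's code):

```python
import tensorflow as tf

class PrefixHolder(tf.Module):
  def __init__(self):
    # Plain Python attribute: its value is frozen into the graph as a
    # constant the first time the tf.function below is traced.
    self.py_prefix = 1
    # tf.Variable: the traced graph reads it at call time, so later
    # assignments are visible.
    self.var_prefix = tf.Variable(1)

  @tf.function(input_signature=[])
  def read_python_attr(self):
    return tf.constant(self.py_prefix)

  @tf.function(input_signature=[])
  def read_variable(self):
    return self.var_prefix.read_value()

holder = PrefixHolder()
before = (int(holder.read_python_attr()), int(holder.read_variable()))
holder.py_prefix = 2          # ignored by the already-traced graph
holder.var_prefix.assign(2)   # picked up by the traced graph
after = (int(holder.read_python_attr()), int(holder.read_variable()))
print(before, after)  # (1, 1) (1, 2)
```

If this is indeed the cause, reworking set_prefix to assign into a tf.Variable (rather than rebinding self.prefix) would be the direction to try, assuming generate's forced_decoder_ids can accept a tensor-backed value in the exported graph.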

@nyadla-sys

Currently, the translation and transcription of audio works only with encoder and decoder TFLite models. You can find a Python implementation of this process in the following Colab notebook: https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb.

Furthermore, there is an ongoing effort to add this feature to an Android app using C++ through the following GitHub link: https://github.com/ipsilondev/whisper-cordova/blob/main/android/cpp/native-lib.cpp.

It should be noted that, as of now, none of the TFLite models support timestamps. However, this feature may be added at a later time.

