Use larger Whisper Model #11

Open
MichaelMcCulloch opened this issue Mar 1, 2023 · 3 comments

Comments

@MichaelMcCulloch
Owner

MichaelMcCulloch commented Mar 1, 2023

Before we can do this, we need to run inference on faster hardware. The larger models take significantly longer to run on the CPU, and I have yet to measure the performance of the larger models on specialized hardware, i.e., via NNAPI.

@nyadla-sys

Using separate encoder and decoder TFLite models would be a better option for multilanguage support. For Android app development, you can refer to the code available at https://github.com/ipsilondev/whisper-cordova/blob/36bf04192c05bbb9aa24d3dfb553039b4aeaa755/android/cpp/native-lib.cpp.

To exercise this feature using Python, you can use the colab notebook available at https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb.
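For reference, the notebook above drives the two models through TensorFlow's standard `tf.lite.Interpreter` API. A minimal, self-contained sketch of that flow is below; it converts a toy model in memory instead of loading the actual Whisper encoder/decoder `.tflite` files, but the `allocate_tensors` / `set_tensor` / `invoke` / `get_tensor` sequence is the same one you would use with them:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in model, converted to TFLite in memory purely to demonstrate
# the interpreter pattern; the real encoder/decoder files would be loaded
# with tf.lite.Interpreter(model_path=...) instead.
class Doubler(tf.Module):
  @tf.function(input_signature=[tf.TensorSpec([1, 4], tf.float32)])
  def __call__(self, x):
    return x * 2.0

module = Doubler()
converter = tf.lite.TFLiteConverter.from_concrete_functions(
    [module.__call__.get_concrete_function()], module)
tflite_bytes = converter.convert()

# Standard TFLite inference sequence.
interp = tf.lite.Interpreter(model_content=tflite_bytes)
interp.allocate_tensors()
inp = interp.get_input_details()[0]
out = interp.get_output_details()[0]
interp.set_tensor(inp["index"], np.ones((1, 4), np.float32))
interp.invoke()
result = interp.get_tensor(out["index"])
print(result)  # [[2. 2. 2. 2.]]
```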

Based on my understanding, it is still acceptable to run up to the small Whisper model on mobile phones.

@MichaelMcCulloch
Owner Author

In 849517e you can see my attempt to use the separated encoder/decoder model. It's very slow, so I assume I'm doing something wrong, or maybe that's just the way it is.

@nyadla-sys I have another problem that I think you have the domain knowledge to help me with, if you also have the time.

import tensorflow as tf


class GenerateModel(tf.Module):
  def __init__(self, model, prefix):
    super(GenerateModel, self).__init__()
    self.model = model
    # Forced decoder ids, stored as a plain Python attribute.
    self.prefix = prefix

  @tf.function(
    input_signature=[
      tf.TensorSpec(shape=(1, 80, 3000), dtype=tf.float32, name="input_features"),
    ],
  )
  def inference(self, input_features):
    outputs = self.model.generate(
      input_ids=input_features,
      max_new_tokens=384,
      return_dict_in_generate=True,
      forced_decoder_ids=self.prefix,
    )
    return {"sequences": outputs["sequences"]}

  @tf.function(
    input_signature=[
      tf.TensorSpec(shape=(4, 2), dtype=tf.int32, name="decoder_prefix"),
    ],
  )
  def set_prefix(self, decoder_prefix):
    # Intended to update the prefix that inference() reads above.
    self.prefix = decoder_prefix
    return {"discard": decoder_prefix}

I hope it's clear from this code that

  1. I'm no TensorFlow, or even Python, expert
  2. I'm trying to make it possible to use the generation-enabled model while allowing the decoder inputs to be set via a tensor.
  3. It's not working

With respect to the second point, the goal is to instruct it to translate/transcribe languages and configure timestamps. This is not working and I don't know why, but I suspect the value of forced_decoder_ids=self.prefix ends up effectively hard-coded, because calls to set_prefix have no effect on the decoder output. I think this is a product of the way the graph is built, but I don't know enough to fix it.
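The "hard-coded" suspicion matches a known tf.function behavior: a plain Python attribute read inside a traced function is baked into the graph as a constant at trace time, whereas a tf.Variable is read at execution time and so sees later assignments. A minimal sketch of the difference, using a toy module rather than the actual Whisper model (names here are illustrative, not from the issue's code):

```python
import tensorflow as tf

class PrefixHolder(tf.Module):
  def __init__(self):
    # Plain Python attribute: its value is frozen into the graph as a
    # constant the first time the tf.function below is traced.
    self.py_prefix = 1
    # tf.Variable: the traced graph reads it at call time, so later
    # assignments are visible.
    self.var_prefix = tf.Variable(1)

  @tf.function(input_signature=[])
  def read_python_attr(self):
    return tf.constant(self.py_prefix)

  @tf.function(input_signature=[])
  def read_variable(self):
    return self.var_prefix.read_value()

holder = PrefixHolder()
before = (int(holder.read_python_attr()), int(holder.read_variable()))
holder.py_prefix = 2          # ignored by the already-traced graph
holder.var_prefix.assign(2)   # picked up by the traced graph
after = (int(holder.read_python_attr()), int(holder.read_variable()))
print(before, after)  # (1, 1) (1, 2)
```

If this is indeed the cause, reworking set_prefix to assign into a tf.Variable (rather than rebinding self.prefix) would be the direction to try, assuming generate's forced_decoder_ids can accept a tensor-backed value in the exported graph.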

@nyadla-sys

Currently, the translation and transcription of audio works only with encoder and decoder TFLite models. You can find a Python implementation of this process in the following Colab notebook: https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb.

Furthermore, there is an ongoing effort to add this feature to an Android app using C++ through the following GitHub link: https://github.com/ipsilondev/whisper-cordova/blob/main/android/cpp/native-lib.cpp.

It should be noted that, as of now, none of the TFLite models support timestamps. However, this feature may be added at a later time.

