Use larger Whisper Model #11
Comments
Using separate encoder and decoder TFLite models would be a better option for multilingual support. For Android app development, you can refer to the code available at https://github.com/ipsilondev/whisper-cordova/blob/36bf04192c05bbb9aa24d3dfb553039b4aeaa755/android/cpp/native-lib.cpp. To try this in Python, you can use the Colab notebook available at https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb. As far as I understand, Whisper models up to the small size are still acceptable to run on mobile phones.
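For readers unfamiliar with the split-model approach, here is a minimal sketch of the decoding loop it implies: the encoder runs once per audio window, and the decoder is called repeatedly to produce one token at a time. The names `greedy_decode`, `encode`, and `decode_step` are hypothetical; the two callables stand in for whatever wraps the encoder and decoder TFLite interpreters, and the linked notebook may structure this differently.

```python
from typing import Callable, List, Sequence


def argmax(values: Sequence[float]) -> int:
    """Return the index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


def greedy_decode(
    encode: Callable,            # hypothetical: features -> encoder output
    decode_step: Callable,       # hypothetical: (encoder output, tokens) -> next-token logits
    features,
    prompt: Sequence[int],
    eot_token: int,
    max_new_tokens: int = 224,
) -> List[int]:
    """Greedy decoding with a separate encoder and decoder."""
    # The encoder is the expensive part and only needs to run once.
    encoder_output = encode(features)
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Ask the decoder for the logits of the next token only.
        logits = decode_step(encoder_output, tokens)
        next_token = argmax(logits)
        tokens.append(next_token)
        # Stop as soon as the end-of-text token is produced.
        if next_token == eot_token:
            break
    return tokens
```

Passing the models in as callables keeps the loop independent of the runtime, so the same logic could be tried with the Python interpreter first and ported to C++ later.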
In 849517e You can see my attempt to use the separated encoder decoder model. Very slow so I assume I'm doing something wrong, or maybe that's just the way it is.
@nyadla-sys I have another problem I think you have the domain knowledge to help me with, if you also have the time.
class GenerateModel(tf.Module):
    def __init__(self, model, prefix):
        super(GenerateModel, self).__init__()
        self.model = model
        self.prefix = prefix

    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=(1, 80, 3000), dtype=tf.float32, name="input_features"),
        ],
    )
    def inference(self, input_features):
        outputs = self.model.generate(
            input_ids=input_features,
            max_new_tokens=384,
            return_dict_in_generate=True,
            forced_decoder_ids=self.prefix,
        )
        return {"sequences": outputs["sequences"]}

    @tf.function(
        input_signature=[
            tf.TensorSpec(shape=(4, 2), dtype=tf.int32, name="decoder_prefix"),
        ],
    )
    def set_prefix(self, decoder_prefix):
        self.prefix = decoder_prefix
        return {"discard": decoder_prefix}
I hope it's clear from this code that
With regard to the second point, the goal is to instruct it to translate/transcribe languages and configure timestamps. This is not working, I don't know why, but I suspect the value of
Currently, the translation and transcription of audio works only with encoder and decoder TFLite models. You can find a Python implementation of this process in the following Colab notebook: https://colab.research.google.com/github/usefulsensors/openai-whisper/blob/main/notebooks/whisper_encoder_decoder_tflite.ipynb. Furthermore, there is an ongoing effort to add this feature to an Android app using C++ through the following GitHub link: https://github.com/ipsilondev/whisper-cordova/blob/main/android/cpp/native-lib.cpp. It should be noted that, as of now, none of the TFLite models support timestamps. However, this feature may be added at a later time.
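With a separate decoder, the language and task are selected by the tokens the decoder is primed with. A rough sketch of building that prompt follows. The helper `build_decoder_prompt` is hypothetical, and the token ids are assumed to be those of the multilingual Whisper vocabulary; they should be checked against the tokenizer that matches your model before relying on them. The no-timestamps token is always appended, since the TFLite models do not support timestamps.

```python
from typing import List

# Assumption: special-token ids of the multilingual Whisper vocabulary.
# Verify them against your own tokenizer before use.
START_OF_TRANSCRIPT = 50258
LANGUAGE_TOKENS = {"en": 50259, "zh": 50260, "de": 50261, "es": 50262}
TASK_TOKENS = {"translate": 50358, "transcribe": 50359}
NO_TIMESTAMPS = 50363


def build_decoder_prompt(language: str, task: str = "transcribe") -> List[int]:
    """Return the initial decoder tokens for a given language and task."""
    if language not in LANGUAGE_TOKENS:
        raise ValueError(f"unknown language code: {language!r}")
    if task not in TASK_TOKENS:
        raise ValueError(f"unknown task: {task!r}")
    return [
        START_OF_TRANSCRIPT,
        LANGUAGE_TOKENS[language],
        TASK_TOKENS[task],
        NO_TIMESTAMPS,  # timestamps are not supported by the TFLite models
    ]
```

For example, `build_decoder_prompt("de", task="translate")` would prime the decoder to translate German speech into English text.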
Before we can do this, we need to run inference on faster hardware. The larger models take significantly longer to run on the CPU, and I have not yet been able to measure their performance on specialized hardware, i.e. via NNAPI.
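One way to compare model sizes or backends is a small timing harness like the sketch below. The `benchmark` helper and the `run_inference` callable are hypothetical; in practice the callable would wrap a single inference call (for instance an interpreter's `invoke()`), once for the CPU and once for the accelerated path.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run_inference: Callable[[], object], warmup: int = 2, runs: int = 10) -> Dict[str, float]:
    """Time repeated calls to run_inference and report latency in milliseconds."""
    # Warm-up calls are excluded so one-off setup costs do not skew the numbers.
    for _ in range(warmup):
        run_inference()

    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    return {
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "min_ms": min(timings_ms),
        "max_ms": max(timings_ms),
    }
```

Reporting the median alongside the mean helps on phones, where thermal throttling and background work can produce occasional slow runs.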