Conversation
Left a few comments on the model definitions.
Will follow up on the core implementation once we synchronize the branches and align with the current multimodal implementation upstream.
self.scaling = self.head_dim**-0.5

self.k_proj = ColumnParallelLinear(self.d_model,
Can we use QKVColumnLinear for this?
Or no because of the bias?
Maybe we should extend QKVParallelLinear for this case?
Could also do this in a follow up PR
@robertgshaw2-neuralmagic we could use QKVParallelLinear in two scenarios:
- encoder attention
- decoder self-attention

We cannot use it for decoder cross-attention, as the queries come from a different source than the keys and values.
I decided not to introduce the new module, since our current implementation (which is very close to T5's) is still failing and we are not sure yet why. I'll start refactoring once we fix the current issues, as adding more moving parts may obscure the path to the solution.
@robertgshaw2-neuralmagic but nothing prevents us from adding QKVParallelLinear
support for the T5 model (once again, only for encoder attention and decoder self-attention)!
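To illustrate the point above: a fused QKV projection performs one matmul over the concatenated weights of a single input, so it only fits attention variants where queries, keys, and values share the same source. A minimal pure-Python sketch (toy matrices, not vLLM's actual API):

```python
# Toy dense matmul over lists of rows (illustrative only).
def matmul(x, w):
    return [[sum(xi * w[k][j] for k, xi in enumerate(row))
             for j in range(len(w[0]))] for row in x]

def fused_qkv(x, w_q, w_k, w_v):
    # One GEMM over the concatenated weights, then split the output.
    w_fused = [wq + wk + wv for wq, wk, wv in zip(w_q, w_k, w_v)]
    out = matmul(x, w_fused)
    d = len(w_q[0])
    q = [row[:d] for row in out]
    k = [row[d:2 * d] for row in out]
    v = [row[2 * d:] for row in out]
    return q, k, v

# Self-attention / encoder attention: q, k, v all derive from x -> fusable.
x = [[1.0, 2.0], [3.0, 4.0]]
w_q = [[1.0, 0.0], [0.0, 1.0]]
w_k = [[2.0, 0.0], [0.0, 2.0]]
w_v = [[0.0, 1.0], [1.0, 0.0]]
q, k, v = fused_qkv(x, w_q, w_k, w_v)
assert q == matmul(x, w_q) and k == matmul(x, w_k) and v == matmul(x, w_v)

# Decoder cross-attention: q comes from the decoder states, while k and v
# come from the ENCODER output -- two different inputs, so a single fused
# projection over one input cannot produce all three.
```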
stride=2,
padding=1)

self.embed_positions = nn.Embedding(self.max_source_positions,
Could this be VocabParallelEmbedding?
Yes, this part is definitely parallelizable; my omission, I had introduced VocabParallelEmbedding here before 🥴
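For context, a vocab-parallel embedding shards the embedding table by vocabulary rows across tensor-parallel ranks: each rank looks up only the ids it owns, out-of-range ids contribute zeros, and an all-reduce restores the full lookup. A toy sketch of that idea (illustrative pure Python, not vLLM's VocabParallelEmbedding):

```python
def shard_lookup(table, token_ids, rank, world_size):
    # Each rank owns a contiguous slice of vocabulary rows.
    vocab = len(table)
    per_rank = vocab // world_size
    lo, hi = rank * per_rank, (rank + 1) * per_rank
    dim = len(table[0])
    out = []
    for t in token_ids:
        if lo <= t < hi:
            out.append(table[t])     # this rank owns the row
        else:
            out.append([0.0] * dim)  # zeros; fixed up by the reduce
    return out

table = [[float(i), float(i) + 0.5] for i in range(8)]  # vocab=8, dim=2
ids = [0, 3, 7]
world_size = 2
partials = [shard_lookup(table, ids, r, world_size)
            for r in range(world_size)]
# "All-reduce": element-wise sum across ranks recovers the full lookup.
full = [[sum(p[i][j] for p in partials) for j in range(2)]
        for i in range(len(ids))]
assert full == [table[t] for t in ids]
```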
super().__init__()
self.d_model = config.d_model

self.embed_tokens = nn.Embedding(config.vocab_size, self.d_model)
Can these be VocabParallelEmbedding?
Perhaps, let me check
# TODO: For now we are not implementing the sampling method
return hidden_states

def load_weights(
Can you match the style of llama in this function? They have been very consistent with this logic across functions
https://github.com/vllm-project/vllm/blob/main/vllm/model_executor/models/llama.py#L363
Looking good, seems like we could adhere to their convention
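The convention being referenced maps per-projection checkpoint names (q_proj/k_proj/v_proj) onto a single fused parameter plus a shard id, via a stacked_params_mapping table. A hedged sketch of just that name-routing step (parameter names below are illustrative, not the actual Whisper ones):

```python
# (param_name_in_model, weight_name_in_checkpoint, shard_id)
stacked_params_mapping = [
    ("qkv_proj", "q_proj", "q"),
    ("qkv_proj", "k_proj", "k"),
    ("qkv_proj", "v_proj", "v"),
]

def route_weight(ckpt_name):
    """Return (model_param_name, shard_id) for a checkpoint tensor name."""
    for param_name, weight_name, shard_id in stacked_params_mapping:
        if weight_name in ckpt_name:
            # Rewrite the checkpoint name onto the fused parameter; the
            # shard id tells the weight loader which slice to fill.
            return ckpt_name.replace(weight_name, param_name), shard_id
    return ckpt_name, None  # unfused parameter, loaded as-is

assert route_weight("decoder.layers.0.self_attn.k_proj.weight") == (
    "decoder.layers.0.self_attn.qkv_proj.weight", "k")
assert route_weight("decoder.layers.0.fc1.weight") == (
    "decoder.layers.0.fc1.weight", None)
```

In the real load_weights loop, the routed name is looked up in the model's params dict and the tensor handed to that parameter's weight_loader together with the shard id.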
if kv_caches[0][0] is None:
    hidden_states = None
else:
    hidden_states = self.decoder(input_ids=decoder_input_ids,
Is the decoder_input_ids here a mistake? When we go into the else branch, there is no decoder_input_ids variable defined.
return hidden_states


class WhisperDecoderBlock(nn.Module):
Why not just keep the same name, WhisperDecoderLayer, as in the transformers library?
self.prefix = prefix
self.multi_modal_data = multi_modal_data
self.multi_modal_data = multi_modal_data