An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)
Reproduction
I'm trying to run the phi3 model on an edge device via ExecuTorch, where I can only use StaticCache. However, the current phi3 model does not work with StaticCache.
To reproduce this issue, run the following script. It fails with the error shown below:
/opt/anaconda3/envs/executorch/bin/python /Users/lunwenh/executorch/examples/models/phi-3-mini/test.py
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.03it/s]
Generating tokens:You are not running the flash-attention implementation, expect numerical differences.
Traceback (most recent call last):
File "/Users/lunwenh/executorch/examples/models/phi-3-mini/test.py", line 92, in <module>
main(
File "/Users/lunwenh/executorch/examples/models/phi-3-mini/test.py", line 77, in main
generated_tokens = _generate_token_with_kv_cache(seq_len, model, tokens)
File "/Users/lunwenh/executorch/examples/models/phi-3-mini/test.py", line 38, in _generate_token_with_kv_cache
result = model.forward(
File "/Users/lunwenh/executorch/examples/models/phi-3-mini/test.py", line 24, in forward
return self.model.forward(
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py", line 1207, in forward
outputs = self.model(
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py", line 1002, in forward
layer_outputs = decoder_layer(
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py", line 739, in forward
attn_outputs, self_attn_weights, present_key_value = self.self_attn(
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/opt/anaconda3/envs/executorch/lib/python3.10/site-packages/transformers/models/phi3/modeling_phi3.py", line 405, in forward
raise ValueError(
ValueError: Attention weights should be of size (1, 32, 1, 1), but is torch.Size([1, 32, 1, 132])
Process finished with exit code 1
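This happens because the current StaticCache implementation does not slice k_out and v_out on update; it returns the whole cache, up to max_cache_len. The attention weights therefore span the full cache length (132 in the trace above) instead of the size the check expects.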
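The shape mismatch can be illustrated with a small sketch. ToyStaticCache below is a hypothetical stand-in written for this example, not the transformers class, and the sizes (32 heads, head dimension 96, cache length 132) are assumptions chosen to match the error message. It only shows that returning the full pre-allocated buffers from update leads to attention weights whose last dimension is the cache length.

```python
import torch


class ToyStaticCache:
    """Hypothetical fixed-size KV cache (illustration only)."""

    def __init__(self, batch, heads, max_cache_len, head_dim):
        # Buffers are allocated once at their maximum size.
        self.k = torch.zeros(batch, heads, max_cache_len, head_dim)
        self.v = torch.zeros(batch, heads, max_cache_len, head_dim)

    def update(self, key_states, value_states, cache_position):
        # Write the new entries in place at the given positions.
        self.k[:, :, cache_position] = key_states
        self.v[:, :, cache_position] = value_states
        # Return the full buffers, not a slice up to the current length.
        return self.k, self.v


# Assumed sizes, chosen to match the shapes in the error message.
batch, heads, head_dim, max_cache_len = 1, 32, 96, 132

cache = ToyStaticCache(batch, heads, max_cache_len, head_dim)
query = torch.randn(batch, heads, 1, head_dim)
k_new = torch.randn(batch, heads, 1, head_dim)
v_new = torch.randn(batch, heads, 1, head_dim)

k_out, v_out = cache.update(k_new, v_new, torch.tensor([0]))
attn_weights = torch.matmul(query, k_out.transpose(2, 3))

print(attn_weights.shape)  # torch.Size([1, 32, 1, 132])
```

A check that compares this shape against (1, 32, 1, 1) would fail here even though only one cache slot has been written.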
In the long term, #31421 and #30862 should resolve this problem by supporting StaticCache and dynamic length.
For now, removing this size check should make phi3 work with StaticCache.
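As a rough sanity check on why dropping the shape assertion can be harmless, the sketch below compares attention over a full fixed-size cache with attention over only the filled part. It assumes an additive attention mask that sets the unused cache slots to a large negative value; whether the real model builds such a mask for StaticCache is not confirmed here. All sizes and tensor names are made up for the illustration.

```python
import torch

torch.manual_seed(0)

# Made-up sizes: 5 of 16 cache slots are filled.
heads, head_dim, max_cache_len, filled = 4, 8, 16, 5

k_cache = torch.zeros(1, heads, max_cache_len, head_dim)
v_cache = torch.zeros(1, heads, max_cache_len, head_dim)
k_cache[:, :, :filled] = torch.randn(1, heads, filled, head_dim)
v_cache[:, :, :filled] = torch.randn(1, heads, filled, head_dim)
query = torch.randn(1, heads, 1, head_dim)

# Additive mask (assumption): 0 for filled slots, very negative for unused ones.
mask = torch.zeros(1, 1, 1, max_cache_len)
mask[..., filled:] = torch.finfo(torch.float32).min

# Attention over the whole cache, relying on the mask to ignore unused slots.
scores_full = query @ k_cache.transpose(2, 3) / head_dim**0.5 + mask
out_full = torch.softmax(scores_full, dim=-1) @ v_cache

# Attention over only the filled slice, as a dynamic cache would provide.
scores_sliced = query @ k_cache[:, :, :filled].transpose(2, 3) / head_dim**0.5
out_sliced = torch.softmax(scores_sliced, dim=-1) @ v_cache[:, :, :filled]

same = torch.allclose(out_full, out_sliced, atol=1e-6)
print(same)
```

Under that masking assumption the two outputs agree, so the only thing that differs is the intermediate shape that the size check inspects.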
Expected behavior
After removing the size check, the above-mentioned script works as expected.
System Info
Name: torch
Version: 2.5.0.dev20240716
Name: transformers
Version: 4.44.0.dev0
Who can help?
@ArthurZucker
@zucchini-nlp