I was trying to use the generate API for Llama 2 using the same code from this example:
https://awsdocs-neuron.readthedocs-hosted.com/en/latest/libraries/transformers-neuronx/transformers-neuronx-developer-guide.html#features

My code:
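(The snippet itself was lost in formatting; below is a minimal sketch of the generate-API pattern from the linked developer guide, not the exact code from this report. The checkpoint name and the batch_size, tp_degree, and amp values are illustrative placeholders.)

```python
import torch
from transformers import AutoConfig, AutoTokenizer
from transformers_neuronx.llama.model import LlamaForSampling
from transformers_neuronx.generation_utils import HuggingFaceGenerationModelAdapter

checkpoint = "meta-llama/Llama-2-7b-hf"  # placeholder; the actual checkpoint path isn't shown above

# Compile the model for Neuron (parallelism/precision settings are illustrative)
model_neuron = LlamaForSampling.from_pretrained(checkpoint, batch_size=1, tp_degree=2, amp="f16")
model_neuron.to_neuron()

# Wrap the compiled model so it exposes the Hugging Face generate API.
# Constructing this adapter is the step that raises the ValueError below.
config = AutoConfig.from_pretrained(checkpoint)
model = HuggingFaceGenerationModelAdapter(config, model_neuron)

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
encoded = tokenizer("Hello, I'm a language model,", return_tensors="pt")

# Run inference through the standard generate API
with torch.inference_mode():
    output = model.generate(
        input_ids=encoded.input_ids,
        attention_mask=encoded.attention_mask,
        do_sample=True,
        max_length=128,
    )
print(tokenizer.batch_decode(output))
```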
Error:

Traceback (most recent call last):
  File "modular.py", line 107, in <module>
    chatbot = MiniGPT4LLama2Chatbot(cfg_path, gpu_id)
  File "modular.py", line 62, in __init__
    self.model = model_cls.from_config(model_config)
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 173, in from_config
    model = cls(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt4.py", line 45, in __init__
    super().__init__(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/minigpt_base.py", line 43, in __init__
    self.llama_model, self.llama_tokenizer = self.init_llm(
  File "/home/ubuntu/MiniGPT-4/minigpt4/models/base_model.py", line 202, in init_llm
    llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers_neuronx/generation_utils.py", line 18, in __init__
    super().__init__(config)
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1190, in __init__
    config = self._autoset_attn_implementation(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1311, in _autoset_attn_implementation
    config = cls._check_and_enable_sdpa(
  File "/opt/aws_neuron_venv_pytorch/lib/python3.8/site-packages/transformers/modeling_utils.py", line 1464, in _check_and_enable_sdpa
    raise ValueError(
ValueError: HuggingFaceGenerationModelAdapter does not support an attention implementation through torch.nn.functional.scaled_dot_product_attention yet. Please open an issue on GitHub to request support for this architecture: https://github.com/huggingface/transformers/issues/new
Is there a workaround for this? Or is supporting this attention implementation the only way? I simply want to use the generate API with a neuron-compiled model.
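For context on why this is raised: recent transformers releases auto-select an attention implementation in PreTrainedModel.__init__, and the config loaded from the CPU checkpoint appears to request SDPA, which the adapter class does not declare support for. A possible workaround (untested here, and it relies on a private transformers attribute that may change between versions) is to request eager attention on the config before constructing the adapter:

```python
# Untested sketch: request eager attention so transformers never hard-checks
# SDPA support on the adapter class. _attn_implementation is a private
# transformers attribute and may change between releases.
llama_model_cpu.config._attn_implementation = "eager"
llama_model = HuggingFaceGenerationModelAdapter(llama_model_cpu.config, llama_model_neuron)
```

Pinning transformers to a release that predates this attention auto-selection may also avoid the check, though that is likewise unverified here.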
This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.
Please note that issues that do not follow the contributing guidelines are likely to be ignored.