
Loading compiled fails: model_type=bert -> transformers being used in compiled config. #102

Open
michaelfeil opened this issue Dec 2, 2024 · 1 comment

Comments


michaelfeil commented Dec 2, 2024

I am running the following code inside the following container (built by the huggingface-optimum team):

763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-inference-neuronx:2.1.2-transformers4.43.2-neuronx-py310-sdk2.20.0-ubuntu20.04
import torch
from optimum.neuron import NeuronModelForFeatureExtraction  # type: ignore
from transformers import AutoConfig, AutoTokenizer  # type: ignore[import-untyped]

model_id = "TaylorAI/bge-micro-v2"  # BERT, small
config = AutoConfig.from_pretrained(model_id)

# get_nc_count() is a helper from the calling code (not shown) that
# returns the number of available NeuronCores.
compiler_args = {"num_cores": get_nc_count(), "auto_cast_type": "fp16"}
input_shapes = {
    "batch_size": 4,
    "sequence_length": (
        config.max_position_embeddings
        if hasattr(config, "max_position_embeddings")
        else 512
    ),
}
model = NeuronModelForFeatureExtraction.from_pretrained(
    model_id=model_id,
    revision=None,
    trust_remote_code=True,
    export=True,
    **compiler_args,
    **input_shapes,
)

Leads to the following error:

INFO     2024-12-02 08:21:07,125 sentence_transformers.SentenceTransformer INFO: Load pretrained SentenceTransformer: TaylorAI/bge-micro-v2                                                                                                                     SentenceTransformer.py:218
***** Compiling bge-micro-v2 *****
.
Compiler status PASS
[Compilation Time] 24.19 seconds.
[Total compilation Time] 24.19 seconds.
2024-12-02 08:21:34.000152:  620  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
2024-12-02 08:21:34.000154:  620  INFO ||NEURON_CACHE||: Compile cache path: /var/tmp/neuron-compile-cache
Model cached in: /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4aeca57e8a4997651e84.
ERROR:    Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 693, in lifespan
    async with self.lifespan_context(app) as maybe_state:
  File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
    return await anext(self.gen)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/infinity_server.py", line 96, in lifespan
    app.engine_array = AsyncEngineArray.from_args(engine_args_list)  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 291, in from_args
    return cls(engines=tuple(engines))
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 70, in from_args
    engine = cls(**engine_args.to_dict(), _show_deprecation_warning=False)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/engine.py", line 55, in __init__
    self._model_replicas, self._min_inference_t, self._max_inference_t = select_model(
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/inference/select_model.py", line 81, in select_model
    loaded_engine = unloaded_engine.value(engine_args=engine_args_copy)
  File "/usr/local/lib/python3.10/site-packages/infinity_emb/transformer/embedder/neuron.py", line 109, in __init__
    self.model = NeuronModelForFeatureExtraction.from_pretrained(
  File "/usr/local/lib/python3.10/site-packages/optimum/modeling_base.py", line 402, in from_pretrained
    return from_pretrained_method(
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 242, in _from_transformers
    return cls._export(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 370, in _export
    return cls._from_pretrained(save_dir_path, config, model_save_dir=save_dir)
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 201, in _from_pretrained
    neuron_config = cls._neuron_config_init(config) if neuron_config is None else neuron_config
  File "/usr/local/lib/python3.10/site-packages/optimum/neuron/modeling_traced.py", line 468, in _neuron_config_init
    neuron_config_constructor = TasksManager.get_exporter_config_constructor(
  File "/usr/local/lib/python3.10/site-packages/optimum/exporters/tasks.py", line 2033, in get_exporter_config_constructor
    model_tasks = TasksManager.get_supported_tasks_for_model_type(
  File "/usr/local/lib/python3.10/site-packages/optimum/exporters/tasks.py", line 1245, in get_supported_tasks_for_model_type
    raise KeyError(
KeyError: "transformer is not supported yet for transformers. Only ['audio-spectrogram-transformer', 'albert', 'bart', 'beit', 'bert', 'blenderbot', 'blenderbot-small', 'bloom', 'camembert', 'clip', 'codegen', 'convbert', 'convnext', 'convnextv2', 'cvt', 'data2vec-text', 'data2vec-vision', 'data2vec-audio', 'deberta', 'deberta-v2', 'deit', 'detr', 'distilbert', 'donut', 'donut-swin', 'dpt', 'electra', 'encoder-decoder', 'esm', 'falcon', 'flaubert', 'gemma', 'glpn', 'gpt2', 'gpt-bigcode', 'gptj', 'gpt-neo', 'gpt-neox', 'groupvit', 'hubert', 'ibert', 'imagegpt', 'layoutlm', 'layoutlmv3', 'lilt', 'levit', 'longt5', 'marian', 'markuplm', 'mbart', 'mistral', 'mobilebert', 'mobilevit', 'mobilenet-v1', 'mobilenet-v2', 'mpnet', 'mpt', 'mt5', 'musicgen', 'm2m-100', 'nystromformer', 'owlv2', 'owlvit', 'opt', 'qwen2', 'llama', 'pegasus', 'perceiver', 'phi', 'phi3', 'pix2struct', 'poolformer', 'regnet', 'resnet', 'roberta', 'roformer', 'sam', 'segformer', 'sew', 'sew-d', 'speech-to-text', 'speecht5', 'splinter', 'squeezebert', 'swin', 'swin2sr', 't5', 'table-transformer', 'trocr', 'unispeech', 'unispeech-sat', 'vision-encoder-decoder', 'vit', 'vits', 'wavlm', 'wav2vec2', 'wav2vec2-conformer', 'whisper', 'xlm', 'xlm-roberta', 'yolos', 't5-encoder', 't5-decoder', 'mixtral'] are supported for the library transformers. If you want to support transformer please propose a PR or open up an issue."

Analysis:

  • Compilation itself succeeded ("Compiler status PASS").
  • The model was cached at /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_4aeca57e8a4997651e84/config.json
  • Inside that config.json, the nested neuron.model_type is "transformer", but it should be "bert". Reloading then fails because TasksManager does not recognize "transformer" as a supported model type.

Reproduction:
docker run -it --device /dev/neuron0 michaelf34/aws-neuron-base-img:inf-repro

root@c2fd099ea82b:/app# nano /var/tmp/neuron-compile-cache/neuronxcc-2.14.227.0+2d4f85be/MODULE_79d2cd5b82fe880e7bef/
config.json              model.neuron             special_tokens_map.json  tokenizer.json           tokenizer_config.json    vocab.txt  
# config.json
{
  "_name_or_path": "michaelfeil/bge-small-en-v1.5",
  "architectures": [
    "BertModel"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "export_model_type": "transformer",
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 384,
  "id2label": {
    "0": "LABEL_0"
  },
  "initializer_range": 0.02,
  "intermediate_size": 1536,
  "label2id": {
    "LABEL_0": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "neuron": {
    "auto_cast": null,
    "auto_cast_type": null,
    "compiler_type": "neuronx-cc",
    "compiler_version": "2.14.227.0+2d4f85be",
    "disable_fast_relayout": false,
    "dynamic_batch_size": false,
    "inline_weights_to_neff": true,
    "input_names": [
      "input_ids",
      "attention_mask"
    ],
    "model_type": "transformer",
    "optlevel": "2",
    "output_attentions": false,
    "output_hidden_states": false,
    "output_names": [
      "token_embeddings",
      "sentence_embedding"
    ],
    "static_batch_size": 4,
    "static_sequence_length": 512
  },
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "task": "feature-extraction",
  "torch_dtype": "float32",
  "torchscript": true,
  "transformers_version": "4.41.1"
}

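A possible stopgap, assuming the analysis above is correct, is to copy the top-level `model_type` over the nested `neuron.model_type` in the cached config.json before reloading. This is a hypothetical fix-up sketch, not an official optimum-neuron API; the helper name and the in-place rewrite are my own:

```python
import json
from pathlib import Path


def patch_neuron_model_type(config_path: Path) -> bool:
    """Copy the top-level model_type (e.g. "bert") over the nested
    neuron.model_type ("transformer") so the exporter-config lookup
    can resolve it again. Returns True if the file was changed."""
    config = json.loads(config_path.read_text())
    neuron = config.get("neuron", {})
    if neuron.get("model_type") == config.get("model_type"):
        return False  # nothing to fix
    neuron["model_type"] = config["model_type"]
    config["neuron"] = neuron
    config_path.write_text(json.dumps(config, indent=2))
    return True
```

For example, run it against the cached /var/tmp/neuron-compile-cache/…/config.json before calling NeuronModelForFeatureExtraction.from_pretrained on the cached artifacts.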
The same command also fails with:

accelerate-0.23.0 optimum-1.18.1 optimum-neuron-0.0.22 tokenizers-0.15.2 transformers-4.36.2

and with:

optimum-1.23.* + optimum-neuron-0.0.26

It does not fail with:

optimum-1.17.1 + optimum-neuron-0.0.20
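Until the root cause is fixed, pinning to the last combination reported to work above is one possible workaround (versions taken from the reporter's observations; compatibility with the rest of your stack is not verified):

```
# requirements.txt -- last known-good combination from this report
optimum==1.17.1
optimum-neuron==0.0.20
```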
@michaelfeil michaelfeil changed the title export=True leads to transformers beeing used as config. export=True leads to model_type=bert -> transformers being used in compiled config. Dec 2, 2024
@michaelfeil michaelfeil changed the title export=True leads to model_type=bert -> transformers being used in compiled config. Loading compiled fails: model_type=bert -> transformers being used in compiled config. Dec 2, 2024

michaelfeil commented Dec 2, 2024

Possibly a better location for this issue: huggingface/optimum-neuron#744
