HuggingFaceM4/idefics2: TGI would crash when I set do_image_splitting to False #2029

Closed

newsbreakDuadua9 opened this issue Jun 6, 2024 · 0 comments · Fixed by #2080

System Info

When I try to start a service with HuggingFaceM4/idefics2-8b-base, I get the following error:
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.10/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 90, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 253, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 636, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 603, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 1909, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.10/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "/opt/conda/lib/python3.10/site-packages/grpc_interceptor/server.py", line 165, in invoke_intercept_method
    return await self.intercept(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/interceptor.py", line 21, in intercept
    return await response
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 82, in _unary_interceptor
    raise error
  File "/opt/conda/lib/python3.10/site-packages/opentelemetry/instrumentation/grpc/_aio_server.py", line 73, in _unary_interceptor
    return await behavior(request_or_iterator, context)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 114, in Warmup
    max_supported_total_tokens = self.model.warmup(batch)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 776, in warmup
    _, batch, _ = self.generate_token(batch)
  File "/opt/conda/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 966, in generate_token
    raise e
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/flash_causal_lm.py", line 963, in generate_token
    out, speculative_logits = self.forward(batch)
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/vlm_causal_lm.py", line 326, in forward
    logits, speculative_logits = self.model.forward(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/idefics2.py", line 810, in forward
    inputs_embeds = self._merge_input_ids_with_image_features(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/custom_modeling/idefics2.py", line 725, in _merge_input_ids_with_image_features
    inputs_embeds[mask] = image_features.view(-1, image_features.shape[-1])
RuntimeError: shape mismatch: value tensor of shape [64, 4096] cannot be broadcast to indexing result of shape [320, 4096]

Since 320 / 64 = 5, I suspect the crash is caused by HuggingFaceM4/idefics2-8b-base setting do_image_splitting to False, while HuggingFaceM4/idefics2-8b has do_image_splitting = True; I think that is why TGI works with HuggingFaceM4/idefics2-8b. I also have a fine-tuned model based on HuggingFaceM4/idefics2-8b for which I set the processor's do_image_splitting to true when saving the processor, and I hit the same error. The failing assignment is easy to reproduce in isolation, as the sketch below shows.
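A minimal standalone sketch (plain PyTorch, not TGI code) of the failing assignment in _merge_input_ids_with_image_features, with illustrative sizes taken from the traceback: the prompt reserves 320 image-token slots (5 sub-images × 64 tokens, the do_image_splitting=True layout), while the processor produced features for a single unsplit image (64 tokens):

```python
import torch

hidden_size = 4096
num_reserved = 320   # image-token slots in the prompt: 5 sub-images x 64 tokens
num_features = 64    # feature vectors actually produced for one unsplit image

inputs_embeds = torch.zeros(num_reserved, hidden_size)
mask = torch.ones(num_reserved, dtype=torch.bool)  # every reserved slot expects a feature
image_features = torch.zeros(1, num_features, hidden_size)

# RuntimeError: shape mismatch: value tensor of shape [64, 4096] cannot be
# broadcast to indexing result of shape [320, 4096]
inputs_embeds[mask] = image_features.view(-1, image_features.shape[-1])
```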

However, do_image_splitting consumes a huge amount of VRAM, which is unnecessary most of the time.
Please help!
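For context, this is roughly how the flag ends up in a saved processor, using the standard transformers processor API (the output path here is hypothetical):

```python
from transformers import AutoProcessor

processor = AutoProcessor.from_pretrained("HuggingFaceM4/idefics2-8b")
# Disable the 4-crops-plus-downscale splitting so each image costs 64 tokens
# instead of 320.
processor.image_processor.do_image_splitting = False
processor.save_pretrained("./idefics2-processor-no-split")  # hypothetical path
```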

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

CUDA_VISIBLE_DEVICES=0 docker run \
    --gpus "device=0" \
    --shm-size 10g -p 8000:80 ghcr.io/huggingface/text-generation-inference:2.0.3 \
    --model-id HuggingFaceM4/idefics2-8b-base \
    --dtype bfloat16 \
    --max-total-tokens 32768 \
    --max-input-tokens 32767 \
    --sharded false

Expected behavior

TGI should work with do_image_splitting = False.

danieldk added a commit that referenced this issue Jun 27, 2024
Before this change, the number of reserved image tokens was not the
same as the number of images. Fixes #2029.

While at it, also remove all the image token handling duplication
in `prepare_input`.
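The commit message points at the underlying accounting bug: the number of reserved image-token slots must match the number of image feature tokens the processor will actually produce. A minimal sketch of that invariant (illustrative names and constants, not the actual TGI patch):

```python
# Tokens emitted by the idefics2 vision stack per (sub-)image.
IMAGE_SEQ_LEN = 64

def reserved_image_tokens(num_images: int, do_image_splitting: bool) -> int:
    """Illustrative helper: image-token slots to reserve in the prompt."""
    # With splitting, each image becomes 4 crops + 1 downscaled copy = 5 sub-images.
    sub_images = 5 if do_image_splitting else 1
    return num_images * sub_images * IMAGE_SEQ_LEN

assert reserved_image_tokens(1, True) == 320   # the mask size in the traceback
assert reserved_image_tokens(1, False) == 64   # the feature count actually produced
```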
@Narsil Narsil closed this as completed in dd2d91b Jun 27, 2024
glegendre01 pushed a commit that referenced this issue Jul 2, 2024
Before this change, the number of reserved image tokens was not the
same as the number of images. Fixes #2029.

While at it, also remove all the image token handling duplication
in `prepare_input`.
yuanwu2017 pushed a commit to yuanwu2017/tgi-gaudi that referenced this issue Sep 26, 2024
Before this change, the number of reserved image tokens was not the
same as the number of images. Fixes huggingface#2029.

While at it, also remove all the image token handling duplication
in `prepare_input`.