
Attn implementation for composite models #32238

Merged
74 commits merged into huggingface:main on Oct 22, 2024

Conversation

zucchini-nlp
Member

@zucchini-nlp zucchini-nlp commented Jul 26, 2024

What does this PR do?

Related to #32221 and fixes #30565. As described in the issue, multimodal models currently set the attn implementation only in the LM backbone, and not in the vision backbone. While working on it, I found another bug: using a property to set the SDPA flag on the VLM class doesn't work, because we don't know what self.LM is before the VLM is initialized. I tried a class property, but that would throw an error that self has no attribute LM.

This PR makes a few modifications to how attn impl is set in modeling to fix the above issues.

More precisely:

  • If the model is composite, i.e. has 2 or more PretrainedConfigs as part of the main config (text/vision config), then we set the requested attention implementation for each of the sub-configs as a dict. For example, in the LLaVa case:
model.from_pretrained(id, attn_implementation="sdpa")

# internally we do the following, where the second line dispatches SDPA on the LM backbone
# if we can't dispatch, an error is raised, as it should be when loading an LLM with unsupported attention (i.e. as if llama didn't support SDPA)
self.config._attn_implementation = {"text_config": "sdpa", "vision_config": "sdpa"}
AutoModelForCausalLM._from_config(config.text_config, attn_implementation=config._attn_implementation["text_config"])
  • If the user just loads the composite model without any explicit attention implementation, then we dispatch SDPA in each of the sub-models whenever possible. Wherever a vision/text sub-model doesn't support SDPA, we silently fall back to eager for it. No warnings, no errors, which is in line with non-composite models
# suppose the LM backbone doesn't support SDPA here, then we'll use eager silently
# but dispatch SDPA on the vision backbone
model.from_pretrained(id, attn_implementation=None)
  • Users are now also free to set different attention implementations for different backbones, e.g. eager on the vision model and sdpa on the LM. That can be done by passing attn_implementation as a dict, whose keys have to be identical to the model config's attribute names for the sub-configs. In other words, if the config has config.text_config, then the dict key must be "text_config" to dispatch correctly (see the sketch after this list):
model.from_pretrained(id, attn_implementation={"text_config": "sdpa", "vision_config": "eager"})
  • For non-composite models nothing changes and everything works as before
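Putting the three options together, a minimal usage sketch (the checkpoint id is only illustrative, and reading `_attn_implementation` off the sub-configs assumes the propagation described above):

```python
from transformers import LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"  # example checkpoint, any LLaVa-style model works

# 1) A single string is dispatched to every sub-model
#    (an error is raised if one of them doesn't support it).
model = LlavaForConditionalGeneration.from_pretrained(model_id, attn_implementation="sdpa")

# 2) No explicit implementation: SDPA is tried per sub-model,
#    silently falling back to eager wherever it isn't supported.
model = LlavaForConditionalGeneration.from_pretrained(model_id)

# 3) A dict keyed by the sub-config attribute names sets a different
#    implementation per backbone.
model = LlavaForConditionalGeneration.from_pretrained(
    model_id,
    attn_implementation={"text_config": "sdpa", "vision_config": "eager"},
)

# The requested implementation ends up on the corresponding sub-configs:
print(model.config.text_config._attn_implementation)    # expected: "sdpa"
print(model.config.vision_config._attn_implementation)  # expected: "eager"
```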

NOTE:
I had a few rare cases where the model was not really composite but had several sub-configs, for example DBRX, and I decided to get rid of the sub-configs there. I am not sure what the right way to deprecate that is, so any comments on that are welcome.
Also, some models have nested configs, where a vision config is part of the text config (Qwen2-VL). Only three models are like that, mostly because I didn't review properly when the model was added and didn't insist on changing things. We don't consider those models composite and just leave them working as they were.

@zucchini-nlp zucchini-nlp changed the title from "Vlm sdpa flag" to "VLMs: dispatch sdpa to each sub model" on Jul 26, 2024
@zucchini-nlp zucchini-nlp requested a review from qubvel July 26, 2024 07:01
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Member

@qubvel qubvel left a comment


Thanks for working on this! Not sure I fully understand the logic at this moment, so I left a few comments below

@ydshieh
Collaborator

ydshieh commented Jul 26, 2024

Regarding

text_model_cls = MODEL_MAPPING.get(type(config.text_config), None)
if text_model_cls is not None:

it could probably reuse

def _get_model_class(config, model_mapping):

But both will be somewhat slow, as you mentioned, and sometimes give unexpected errors (because they load a modeling module file). But we might have some way to avoid that.

@zucchini-nlp
Member Author

zucchini-nlp commented Jul 29, 2024

@ydshieh @qubvel I made some changes after our last discussion, where we agreed that changing the class attr is not good. So now we don't change the attr per se, but we change the config's attn_impl manually while setting SDPA. I checked that it also works this way. The last question is whether to set it to SDPA when both sub-configs support it or when at least one supports it.

I will add more tests if we agree on the design

And also, regarding

But both will be somewhat slow, as you mentioned, and sometimes give unexpected errors (because they load a modeling module file). But we might have some way to avoid that.

unfortunately we can't do that, because it raises an error if the key is not in the dict, and we have models that are not auto-mappable. So we rather try to get the class and, if it's not there, simply continue.
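For context, a rough sketch of the two lookup styles being debated (simplified, not the actual modeling_utils.py code; the import paths are from memory and only illustrative):

```python
from transformers.models.auto.modeling_auto import MODEL_MAPPING


def lookup_with_get(sub_config):
    # Returns None for sub-configs that have no auto-mapping, so the caller can just continue.
    return MODEL_MAPPING.get(type(sub_config), None)


def lookup_with_helper(sub_config):
    # Reusing the internal helper instead; it raises for unmapped config types,
    # so it would need a try/except around it.
    from transformers.models.auto.auto_factory import _get_model_class

    try:
        return _get_model_class(sub_config, MODEL_MAPPING)
    except (KeyError, ValueError, AttributeError):
        return None
```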

@ydshieh
Collaborator

ydshieh commented Jul 29, 2024

unfortunately we can't do that, because it raises an error if the key is not in the dict, and we have models that are not auto-mappable. So we rather try to get the class and, if it's not there, simply continue.

but why not put a try/except around the usage of _get_model_class? I don't feel very strongly here, but it is something well designed for getting the model class, so it would be good if we could reuse it.

@zucchini-nlp
Member Author

Hmm, not sure how good it is to use a try/except; could it be another source of hidden bugs even if we use except AttributeError? I'm okay with either way, as long as it's consistent with the library and doesn't cause bugs

@zucchini-nlp
Member Author

As we discussed internally, I made the loading logic model-agnostic. So now we check if the config contains any PretrainedConfig inside and set the attn implementation on each PretrainedConfig. This solves the vision-encoder-decoder problem we had in Slack and makes us more versatile for possible future multimodal models.

I added some tests for vision encoder-decoder models where only one sub-config supports SDPA and where both do (see the sketch after the TODO list below). Also added tests in Idefics (which were skipped earlier) to verify that the LM gets SDPA dispatched even though the other modules do not.

TODO in the next few hours or tomorrow:

  • Add similar tests to encoder-decoder models, not only vision enc-dec
  • Accept from users a config that already has separate attn_implementation values set, e.g. attn_implementation={"vision_config": "sdpa", "text_config": "eager"}
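A hedged sketch of what such a check could look like for a vision encoder-decoder model (the checkpoint and the exact attributes inspected are assumptions for illustration, not the PR's actual test code):

```python
from transformers import VisionEncoderDecoderModel

# Load without requesting an implementation: SDPA should be dispatched per
# sub-model where supported, with a silent fallback to eager elsewhere.
model = VisionEncoderDecoderModel.from_pretrained("nlpconnect/vit-gpt2-image-captioning")

# Each backbone carries its own resolved implementation on its config.
print(model.encoder.config._attn_implementation)  # e.g. "sdpa"
print(model.decoder.config._attn_implementation)  # e.g. "sdpa" or "eager"
```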

@ArthurZucker
Collaborator

Related to #33953 as well, will have a look, but having to change modeling code means we are doing something wrong as it should be a layer of abstraction!

@zucchini-nlp
Member Author

Cool, thanks! The modeling is mostly changed to get the attn implementation from the dict, and for IDEFICS (1 and 2), which fall outside the standard and enforce hard requirements on SDPA. Hope we can get this merged soon, as it becomes more of a problem as new quirky models are added

Collaborator

@ArthurZucker ArthurZucker left a comment


Thanks, it is indeed important work!

Collaborator

have seen these modified elsewhere, let's revert as unrelated

Comment on lines 1452 to 1454
self.vision_model = Blip2VisionModel._from_config(
config.vision_config, attn_implementation=config._attn_implementation["vision_config"]
)
Collaborator

my main question is why is this info not stored in the config.vision_config directly?
Initializing the composite config should propagate the attn_implementation attribute to the sub configs, instead of extracting it in modeling
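A hedged illustration of the suggestion (LlavaConfig is only used here as an example composite config; where exactly the propagation happens is the open question):

```python
from transformers import LlavaConfig

config = LlavaConfig()  # builds default vision_config / text_config sub-configs
config._attn_implementation = "sdpa"

# The idea: building/configuring the composite config would also push the value down...
config.vision_config._attn_implementation = "sdpa"
config.text_config._attn_implementation = "sdpa"

# ...so that modeling code can stay a plain call like
#     Blip2VisionModel._from_config(config.vision_config)
# without indexing a dict stored on the parent config.
```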

Collaborator

That means updating from_config to support extracting the attn_implementation from the config 😉

Member Author

oh right! The initial plan was similar, but it was supposed to recursively call autoset_attn_impl from within autoset_attn_impl for each sub-module class. But that was too much extra code and checks

But I think I got what you mean here: instead of building a dict on the general config, we can simply propagate the user-requested attn to each sub-config. Then we hope it will get picked up in the later AutoModel.from_config calls. Good idea and a lot cleaner, let me see how it goes. Hopefully that way we don't need special treatment for nested configs

Collaborator

🤗 good luck!

Comment on lines +1587 to +1589
# Composite models consisting of several PretrainedModels have to specify attention impl as a dict
# where keys are sub-config names. But most people will specify one `str` which means that should dispatch it
# for all sub-models.
Collaborator

It makes sense, sometimes you want to output hidden states of one module but not the other, so I guess it's minimal flexibility.

Collaborator

but IMO we should propagate here

@zucchini-nlp
Member Author

@ArthurZucker I made changes as you suggested, but I want to warn that it still needed some extra logic. Making this PR revealed so many issues. So we now have two options to indicate attn for composite models:

Attn implementation as a dict:
The previous solution, where we simply copy the user-requested attn into all sub-configs and form a dict, so that the general config.attn = {"text_config": None, "vision_config": None}

  • no need for a flag indicating whether the attn implementation is set or not, because when we have a dict we always use the originally requested attn for all sub-configs. We just need to save it as a dict once and then go back to that dict to get()
  • we don't need to unset the attn implementation when saving, so one can set a different attn when loading any model back.
  • BUT we change a lot of modeling code to pass it in explicitly (attn_implementation=config._attn_implementation["text_config"]), and we need to account for cases like Chameleon or DBRX where the sub-configs don't even need attn

Attn implementation directly propagated to the corresponding configs:
The current solution, where we simply propagate the attn to all sub-configs we can find and don't touch the general config

  • no need to change any modeling code, except for general clean up
  • BUT we need a flag attn_implementation_autoset so we don't dispatch attn on the same config twice. Also, we would need to unset the flag when saving. See below for the reason

Imagine we have a composite or nested config. Currently we set attn in three places: from_pretrained, from_config and init. When we don't request any attn (None) and load with model.from_pretrained(id), internally we call autoset_attn, set SDPA on the general config, and propagate None to the sub-configs. But then the same autoset_attn method is called a second time when we init the class, and this time we try to propagate SDPA to all sub-configs (because the general config already has SDPA set). If any sub-module has no SDPA support, an error is raised as if SDPA had been requested directly. From the user's perspective they did nothing wrong, yet they see an error.
So the decision is to stop auto-setting attn for a config once it has been set, which is what happens now. In the previous version I had the attn implementations saved as a dict, which basically did the same: when we got a dict attn, we had nothing to set and simply returned
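A schematic sketch of the guard described above (plain Python pseudologic of the flow, not the actual modeling_utils.py implementation; the flag name follows the attn_implementation_autoset idea mentioned earlier):

```python
from types import SimpleNamespace


def autoset_attn_implementation(config, requested=None, sdpa_supported=True):
    """Only auto-dispatch once per config, so the second call made during class init
    doesn't re-read an auto-chosen "sdpa" as a hard user request and error out on
    sub-models without SDPA support."""
    if getattr(config, "_attn_implementation_autoset", False):
        return config
    if requested is not None:
        config._attn_implementation = requested  # explicit request: error elsewhere if unsupported
    else:
        config._attn_implementation = "sdpa" if sdpa_supported else "eager"  # silent fallback
    config._attn_implementation_autoset = True
    return config


# The from_pretrained-time call picks SDPA; the later init-time call is a no-op:
cfg = SimpleNamespace()
autoset_attn_implementation(cfg)   # sets "sdpa" and marks the config as autoset
autoset_attn_implementation(cfg)   # returns early, nothing is re-dispatched
print(cfg._attn_implementation)    # "sdpa"
```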

Collaborator

@ArthurZucker ArthurZucker left a comment


Modeling changes are just perfect 😉 before we have a big refactor this is super super welcome

Comment on lines +1598 to +1606
for key in config:
    if isinstance(getattr(config, key), PretrainedConfig):
        sub_config = getattr(config, key)
        curr_attn_implementation = (
            requested_attn_implementation
            if not isinstance(requested_attn_implementation, dict)
            else requested_attn_implementation.get(key, None)
        )
        sub_config._attn_implementation_internal = curr_attn_implementation
Collaborator

If this is done to set the attn_implementation of the sub-configs, I am wondering if we can't have a way to use is_composition to simply say (see the sketch below):

  • if we are a composition, then we call the _autoset_attn_implementation function on our composite members
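A hedged sketch of that recursive idea (is_composition is the existing PretrainedConfig class attribute; the helper itself and how it walks sub-configs are assumptions for illustration, not library code):

```python
from transformers import PretrainedConfig


def autoset_recursively(config: PretrainedConfig, requested=None) -> None:
    # A composition dispatches on each of its composite members first,
    # instead of the modeling code extracting implementations out of a dict.
    if getattr(config, "is_composition", False):
        for value in vars(config).values():
            if isinstance(value, PretrainedConfig):
                autoset_recursively(value, requested)
    config._attn_implementation = requested
```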

Today we have a problem with is_composition: we have this

    @classmethod
    def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
        cls._set_token_in_kwargs(kwargs)

        config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)

        if config_dict.get("model_type") == "mllama":
            config_dict = config_dict["vision_config"]

        if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
            logger.warning(
                f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
                f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
            )

        return cls.from_dict(config_dict, **kwargs)

which IMO we should never have to do, and we copy-paste this everywhere.

This means there is something wrong with how we handle composition!

@ArthurZucker
Collaborator

If we find a clean way to auto-set the attn implementation, it would be better in the future!

@zucchini-nlp
Member Author

zucchini-nlp commented Oct 21, 2024

merging tomorrow, the tests pass locally in most cases, except for cases not touched by this PR. For example, some FA2/SDPA equivalence tests are also failing for me on the main branch

@zucchini-nlp zucchini-nlp merged commit 21d5025 into huggingface:main Oct 22, 2024
23 of 25 checks passed
BernardZach pushed a commit to BernardZach/transformers that referenced this pull request Dec 5, 2024
* first try

* codestyle

* idefics2 is happy

* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma

* fix-copies

* [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo

* blip-2 needs to init vision from config

* when was this removed O_o

* minor fix

* tests

* this way?

* tests

* model-agnostic code

* codestyle

* add tests for idefics

* modify general test for VLMs

* no generation test for vlm yet!

* no generation test here also

* warn in ViT-SDPA if output attn

* add more tests

* user can pass dict as attn impl

* repo consistency

* update

* musicgen

* no prints

* forgot speech enc-dec and clip

* how many composite models we have?

* musicgen melody is same as musicgen

* +siglip

* fix tests + add some more

* remove idefics custom overridden code

* make idefics2 automappable

* nits

* skip tests

* doctests

* Update src/transformers/models/idefics2/configuration_idefics2.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/clip/test_modeling_clip.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics2/test_modeling_idefics2.py

Co-authored-by: amyeroberts <[email protected]>

* Update tests/models/idefics2/test_modeling_idefics2.py

Co-authored-by: amyeroberts <[email protected]>

* Update src/transformers/configuration_utils.py

Co-authored-by: amyeroberts <[email protected]>

* major update, no need for automap

* clean up

* add FA2 test

* more tests

* style

* skip tests

* why did these started failing now?

* no attributes for FA2 needed

* one tiny test

* address comment about FA2 false warning

* style

* add new models and resolve conflicts

* fix copies

* let it be this way for now, come back tomorrow to review

* some more fixes

* update

* more updates

* update

* fix copies

* style and tests

* another big update

* fix tests

* fix tests

* update

* another update

* fix tests

* fix copies

* fix tests

---------

Co-authored-by: amyeroberts <[email protected]>

Successfully merging this pull request may close these issues.

Correct check for SDPA in Vision Language Models
7 participants