Attn implementation for composite models #32238
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Thanks for working on this! Not sure I fully understand the logic at this moment, so I left a few comments below
Regarding
it could probably reuse
But both will be somewhat slow as you mentioned, and sometimes give unexpected errors (because of loading a modeling module file). But we might have some way to avoid that.
@ydshieh @qubvel I made some changes after our last discussion about how changing the class attr is not good. So now we don't change the attr per se, but we change the config's. I will add more tests if we agree on the design. And also, regarding
unfortunately we can't do that, because it raises an error if the key is not in the dict and we have models that are not auto-mappable. So we rather try to get the key and, if it's not there, simply continue.
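For context, a minimal sketch of the difference being discussed (the key names are only illustrative):

# Hypothetical per-sub-model mapping, used only for illustration.
attn_map = {"text_config": "sdpa"}

# Direct indexing raises KeyError for sub-configs without an entry:
# attn_map["vision_config"]  -> KeyError

# .get() falls back to None, so those sub-models simply keep their default:
vision_attn = attn_map.get("vision_config", None)  # None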
but why not try/except around the usage of
Hmm, not sure how good it is to use a
As we discussed internally, I made the loading logic model-agnostic. So now we check if the config contains any
I added some tests in vision enc-dec models where only one sub-config supports SDPA and where both support it. Also added tests in Idefics (which were skipped earlier) to verify that the LM has its SDPA dispatched even though other modules do not. TODO in the next few hours or tomorrow:
Related to #33953 as well, will have a look, but having to change modeling code means we are doing something wrong, as it should be a layer of abstraction!
Cool, thanks! The modeling is mostly changed to get the attn implementation from the dict, and for IDEFICS (1 and 2), which fall outside the standards and enforce hard requirements on SDPA. Hope we can get this merged soon, as it is becoming more of a problem as new quirky models are added.
Thanks, it is indeed important work!
have seen these modified elsewhere, let's revert as unrelated
self.vision_model = Blip2VisionModel._from_config(
    config.vision_config, attn_implementation=config._attn_implementation["vision_config"]
)
My main question is: why is this info not stored in `config.vision_config` directly? Initializing the composite config should propagate the `attn_implementation` attribute to the sub-configs, instead of extracting it in modeling.
That means updating `from_config` to support extracting the `attn_implementation` from the config 😉
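A rough sketch of that idea, written as a hypothetical standalone helper rather than the real `_from_config` signature:

from typing import Optional

from transformers import PretrainedConfig


def resolve_attn_implementation(config: PretrainedConfig, requested: Optional[str] = None) -> Optional[str]:
    # Hypothetical helper: prefer an explicitly requested implementation,
    # otherwise fall back to whatever was already set on the (sub-)config.
    if requested is not None:
        return requested
    return getattr(config, "_attn_implementation", None)

With something like this, modeling code could call `Blip2VisionModel._from_config(config.vision_config)` without threading the dict through every call site.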
Oh right! The initial plan was similar, but it was supposed to recursively call `autoset_attn_impl` from within `autoset_attn_impl` for each sub-module class. But that was too much extra code and checks.
But I think I get what you mean here: instead of keeping a dict on the general config, we can simply propagate the user-requested attn to each sub-config, and then hope it gets picked up in later `AutoModel.from_config` calls. Good idea and a lot cleaner, let me see how it goes. Hopefully that way we don't need specific treatment for nested configs.
🤗 good luck!
# Composite models consisting of several PretrainedModels have to specify attention impl as a dict
# where keys are sub-config names. But most people will specify one `str` which means that should dispatch it
# for all sub-models.
It makes sense: sometimes you want the output hidden states of one module but not the other, so I guess it's minimal flexibility.
but IMO we should propagate here
@ArthurZucker I made changes as you suggested, but I want to warn that it still needed some extra logic. Making this PR revealed so many issues. So we have two options now to indicate attn for composite models:
1. Attn implementation as a dict:
2. Attn implementation directly propagated in the corresponding configs (see the sketch after this comment):
Imagine we have a composite or nested config. Currently we set attn in three places:
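For the second option, a hedged sketch of what the user-facing flow could look like; the checkpoint name is only a placeholder, and it assumes the composite config exposes `vision_config` and `text_config` attributes:

from transformers import AutoConfig, AutoModel

# Placeholder checkpoint name, used only for illustration.
config = AutoConfig.from_pretrained("org/some-composite-vlm")

# Set the implementation on each sub-config directly; the sub-models then
# pick it up when they are built from their own configs.
config.vision_config._attn_implementation_internal = "eager"
config.text_config._attn_implementation_internal = "sdpa"

model = AutoModel.from_pretrained("org/some-composite-vlm", config=config)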
Modeling changes are just perfect 😉 before we have a big refactor this is super super welcome
for key in config:
    if isinstance(getattr(config, key), PretrainedConfig):
        sub_config = getattr(config, key)
        curr_attn_implementation = (
            requested_attn_implementation
            if not isinstance(requested_attn_implementation, dict)
            else requested_attn_implementation.get(key, None)
        )
        sub_config._attn_implementation_internal = curr_attn_implementation
If this is done to set the `attn_implementation` of the sub-configs, I am wondering if we can't have a way to use `is_composition` to simply say:
- if we are a composition, then we call the `_autoset_attn_implementation` function on our composite members

Today we have a problem with `is_composition`: we have this
@classmethod
def from_pretrained(cls, pretrained_model_name_or_path: Union[str, os.PathLike], **kwargs) -> "PretrainedConfig":
    cls._set_token_in_kwargs(kwargs)

    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)

    if config_dict.get("model_type") == "mllama":
        config_dict = config_dict["vision_config"]

    if "model_type" in config_dict and hasattr(cls, "model_type") and config_dict["model_type"] != cls.model_type:
        logger.warning(
            f"You are using a model of type {config_dict['model_type']} to instantiate a model of type "
            f"{cls.model_type}. This is not supported for all configurations of models and can yield errors."
        )

    return cls.from_dict(config_dict, **kwargs)
which IMO we should never have to do, and we copy-paste this everywhere.
This means there is something wrong with how we handle composition!
If we find a clean way to auto-set the attn implementation it would be better in the future!
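One way to read that suggestion, as a toy sketch: the walk over sub-configs mirrors the loop quoted above, but it recurses so nested compositions need no model-specific code. The function name and the use of vars() to enumerate attributes are simplifications, not the library's API:

from transformers import PretrainedConfig


def autoset_attn_implementation(config: PretrainedConfig, requested=None):
    # Toy recursion: visit every sub-config held by this config and set its
    # implementation, then recurse into it for nested compositions.
    for key, value in vars(config).items():
        if isinstance(value, PretrainedConfig):
            sub_requested = requested.get(key, None) if isinstance(requested, dict) else requested
            value._attn_implementation_internal = sub_requested
            autoset_attn_implementation(value, sub_requested)
    return config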
Merging tomorrow; the tests pass locally in most cases except for the cases not touched by this PR. For example, some FA2/SDPA equivalence tests are failing for me in
* first try * codestyle * idefics2 is happy * [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo, paligemma * fix-copies * [run-slow] llava, llava_next, video_llava, vipllava, llava_next_video, idefics, idefics2, kosmos2, fuyu, blip, blip_2, instructblip, instructblipvideo * blip-2 needs to init vision from config * when was this removed O_o * minor fix * tests * this way? * tests * model-agnostic code * codestyle * add tests for idefics * modify general test for VLMs * no generation test for vlm yet! * no generation test here also * wanr in VIT-SDPA if output attn * add more tests * user can pass dict as attn impl * repo consistency * update * muicgen * no prints * forgot speech enc-dec and clip * how many composite models we have? * musicgen meelody is same as mudicgen * +siglip * fix tests + add some more * remove idefics custom overriden code * make idefics2 automappable * nits * skip tests * doctests * Update src/transformers/models/idefics2/configuration_idefics2.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/clip/test_modeling_clip.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics2/test_modeling_idefics2.py Co-authored-by: amyeroberts <[email protected]> * Update tests/models/idefics2/test_modeling_idefics2.py Co-authored-by: amyeroberts <[email protected]> * Update src/transformers/configuration_utils.py Co-authored-by: amyeroberts <[email protected]> * major update, no need for automap * clean up * add FA2 test * more tests * style * skip tests * why did these started failing now? * no attributes for FA2 needed * one tiny test * address comment about FA2 false warning * style * add new models and resolve conflicts * fix copies * let it be this way for now, come back tomorrow to review * some more fixes * update * more updates * update * fix copies * style and tests * another big update * fix tests * fix tests * update * another update * fix tests * fix copies * fix tests --------- Co-authored-by: amyeroberts <[email protected]>
What does this PR do?
Related to #32221 and fixes #30565. As described in the issue, currently multimodal models set the attn implementation only in the LM backbone, and not in the vision backbone. While working on it, I found another bug. Specifically, using a property to set the SDPA flag on the VLM class doesn't work because we don't know what `self.LM` is before the VLM is initialized. I tried a class property, but that would throw an error that `self` has no attribute `LM`.
This PR makes a few modifications to how the attn impl is set in modeling to fix the above issues.
More precisely:
`attn_implementation` can now be passed as a dict, and the dict keys have to be identical to the model config's attribute names for the sub-configs. In other words, if the config has `config.text_config`, then the dict key must be "text_config" to correctly dispatch.
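For example, something along these lines (a sketch; the checkpoint is just a well-known LLaVA checkpoint used for illustration):

from transformers import LlavaForConditionalGeneration

# Keys mirror the sub-config attribute names on the composite config,
# e.g. config.text_config -> "text_config", config.vision_config -> "vision_config".
model = LlavaForConditionalGeneration.from_pretrained(
    "llava-hf/llava-1.5-7b-hf",
    attn_implementation={"text_config": "sdpa", "vision_config": "eager"},
)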
NOTE: I had a few rare cases where the model was not really composite but had several sub-configs, for example DBRX, and I decided to get rid of the sub-configs there. I am not sure what is the right way to deprecate that, so any comments on that are welcome.
Also, some models have nested configs, where a vision config is part of the text config (Qwen2-VL). Only three models are like that, mostly because I didn't review properly when the model was added and didn't insist on changing stuff. We don't consider those models composite and just leave them working as they were.