
_is_peft_model update to recognise peft submodules, allowing training quantised models with peft submodules #30884

Closed
wants to merge 9 commits

Conversation

ambroser53

What does this PR do?

Users don't necessarily have PEFT models as the top-level wrapper for their models, especially when working with custom-built multi-modal models. For example:

# Library imports shown for context
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model = AutoModelForVision2Seq.from_pretrained(
  args.pretrained_ckpt,
  torch_dtype=compute_dtype,
  quantization_config=BitsAndBytesConfig(
      load_in_4bit=bits == 4,
      load_in_8bit=bits == 8,
      llm_int8_threshold=6.0,
      int8_quant_skip_modules=int8_quant_skip_modules,
      llm_int8_has_fp16_weight=False,
      bnb_4bit_compute_dtype=compute_dtype,
      bnb_4bit_use_double_quant=True,
      bnb_4bit_quant_type='nf4'  # {'fp4', 'nf4'}
  ) if bits < 16 else None,
  attn_implementation=args.attn_implementation,
)

if (args.use_lora and not resume_from_checkpoint and not ft_checkpoint_dir):
  target_modules = get_target_modules(model.model.text_model, args, bits)
  peft_config = LoraConfig(
      target_modules=target_modules,
      inference_mode=args.inference_mode,
      r=args.lora_r,
      lora_alpha=args.lora_alpha,
      lora_dropout=args.lora_dropout,
      use_dora=args.use_dora
  )
  model.model.text_model = get_peft_model(model.model.text_model, peft_config)

  if args.vit_train:
      target_modules = get_target_modules(model.model.vision_model, args, args.vit_bits, vit=True)
      peft_config = LoraConfig(
          target_modules=target_modules,
          inference_mode=args.inference_mode,
          r=args.vit_lora_r,
          lora_alpha=args.vit_lora_alpha,
          lora_dropout=args.lora_dropout,
          use_dora=args.use_dora_vit
      )
      model.model.vision_model = get_peft_model(model.model.vision_model, peft_config)

  if args.lora_abstractor:
      target_modules = get_target_modules(model.model.connector, args, args.bits)
      peft_config = LoraConfig(
          target_modules=target_modules,
          inference_mode=args.inference_mode,
          r=args.lora_r,
          lora_alpha=args.lora_alpha,
          lora_dropout=args.lora_dropout,
          use_dora=args.use_dora
      )
      model.model.connector = get_peft_model(model.model.connector, peft_config)

This allows the HF Trainer to recognise such models as still being PEFT models and thereby permits quantised (QLoRA) training.
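
For illustration, with the snippet above the top-level wrapper stays a plain transformers model while one or more of its submodules are PeftModel instances, which is exactly the situation _is_peft_model needs to recognise. A quick check of that structure (a hedged sketch reusing the variable names from the snippet; it assumes peft is installed):

from peft import PeftModel

# The top-level wrapper is a plain transformers model, not a PeftModel...
print(isinstance(model, PeftModel))                   # False
# ...while the LoRA-wrapped text submodule is a PeftModel.
print(isinstance(model.model.text_model, PeftModel))  # True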

Fixes #30878

Who can review?

@younesbelkada

younesbelkada (Contributor) left a comment

Thanks for adding submodule support for PEFT + Trainer! I left one suggestion - what do you think?

[Review thread on src/transformers/trainer.py - outdated, resolved]
younesbelkada requested a review from amyeroberts on May 20, 2024
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

amyeroberts (Collaborator) left a comment

Thanks for adding! Could you add a test with a dummy model that has PEFT submodules, which behaves correctly with this change but fails on main?

younesbelkada (Contributor) left a comment

Thanks @ambroser53!
For the styling checks, can you try running pip install -U ".[quality]" and then re-running make fixup?

ambroser53 (Author) commented

@younesbelkada I've just done as you asked; it ran successfully, but I have no working-tree changes to commit.

younesbelkada (Contributor) commented

Thanks! Hmm, I think something is off with our CI currently; let's wait for #30932 to be merged first.

amyeroberts (Collaborator) left a comment

Thanks for iterating on this.

I'm a bit confused about the intended behaviour here from the tests.

[Review thread on tests/trainer/test_trainer.py - outdated, resolved]
Comment on lines 1023 to 1024:

with self.assertRaises(ValueError):
    _ = Trainer(tiny_model, args, train_dataset=train_dataset)  # noqa
Collaborator commented

Wait - I'm confused 😅

My understanding of this PR was that it's meant to allow training on quantized models which have a PEFT submodule.

This test doesn't obviously quantize (I might be missing something here). Is it meant to fail if the model isn't quantized?

Author commented

It does quantise the model. See load_in_4bit=True on line 1002.
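
For context, the scenario being exercised looks roughly like the sketch below (the model name, LoRA settings and dataset are illustrative placeholders, not the exact values in tests/trainer/test_trainer.py; it assumes a GPU and bitsandbytes are available):

from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, Trainer, TrainingArguments

# Quantised base model (cf. load_in_4bit=True on line 1002).
tiny_model = AutoModelForCausalLM.from_pretrained(
    "hf-internal-testing/tiny-random-LlamaForCausalLM",  # placeholder tiny checkpoint
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
)

# Attach LoRA to a submodule instead of wrapping the whole model in a PeftModel.
lora_config = LoraConfig(r=4, lora_alpha=8, target_modules=["q_proj", "v_proj"])
tiny_model.model = get_peft_model(tiny_model.model, lora_config)

train_dataset = Dataset.from_dict({"input_ids": [[1, 2, 3]], "labels": [[1, 2, 3]]})
args = TrainingArguments(output_dir="./tmp_peft_submodule_test", report_to="none")

# On main the Trainer rejects this quantised model because the top-level object is
# not a PeftModel; with this PR it should be accepted as trainable.
trainer = Trainer(model=tiny_model, args=args, train_dataset=train_dataset)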

Collaborator commented

Ah, sorry, I missed that. In this case, why is this error being thrown? Shouldn't this model now be trainable (it's quantized and has a PEFT submodule)?

huggingface deleted a comment from the github-actions bot on Jun 17, 2024
ambroser53 (Author) commented

Is there any update on this? I'm still running the local version for training my model that uses this. I still don't understand why it couldn't be merged in the first place.

amyeroberts (Collaborator) commented

@ambroser53 It's still open as there are two outstanding comments/questions on the tests which were marked as resolved but were not. In the first case, the comment needs to be updated; in the second, could you explain the intended behaviour?

ambroser53 and others added 2 commits on June 25, 2024:
…l architectures that vary their multi-modal set-up. This should work with a wider range of models out the box.

- Updated test to have sample subclass with save_pretrained support.
ambroser53 (Author) commented

@amyeroberts I see where the confusion was here. I was under the assumption that the code was fine and that we were waiting on a fix for the workflow checks (i.e. styling). I have changed the comments in the test to hopefully make them clearer, expanded the test to run trainer.train() instead of just initialising the Trainer object, and included a simple custom subclass with save_pretrained so that it works in that case and shows the full use-case with training.

Intended behaviour: the model is trainable when it is quantised and has a PEFT submodule.

Workflow tests are still failing, so if there is actually anything I need to do there, let me know.

amyeroberts (Collaborator) commented

@ambroser53 Great. For some of the failing tests regarding passing trust_remote_code for datasets, there have been upstream fixes. Could you rebase on main to include these? I think this should also fix a lot of the quality checks.

amyeroberts (Collaborator) commented

@ambroser53 Apologies for the continued delay. Could you try rebasing again? This should resolve the timeout errors we're having on the CI.

amyeroberts (Collaborator) left a comment

Thanks for iterating!

As mentioned in my PR - needing to override save_pretrained in the test is an indication that we should update the standard save_pretrained to account for models with PEFT submodules.

Comment on lines +1005 to +1007:
# Due to the way the Trainer is implemented we must be able to save the model with 'save_pretrained'
# Therefore to use peft submodules you must specify your own 'save_pretrained' method
# This example subclass allows for the saving of any and all submodules that are of type PeftModel
Collaborator commented

This isn't right, as it means we're no longer testing the transformers functionality. Instead, this PR should update save_pretrained to enable saving of models with PEFT submodules.

Not sure who's best to tag here: @BenjaminBossan @SunMarc

Member commented

I would suggest splitting that issue off into a separate PR. IIUC, this PR is originally about avoiding a false-positive ValueError claiming that the model is not trainable. I think we should focus on that here.

Now this test touches on another issue, namely that save_pretrained of a PEFT model only saves the trainable adapter part, but if the PEFT model itself is just a submodule, save_pretrained acts like it would on a normal transformers model and saves the whole checkpoint (unless I misunderstand the intent).

IMO, first of all, this should not be handled here in the test. Instead, I'd just add a comment that says something along the lines of: "Note that save_pretrained saves the whole model here, not just the PEFT adapter". The test should still pass, right?

Regarding the question of what the right thing to do is if the model has PEFT submodule(s): IMO this is not quite clear, and changing save_pretrained for all transformers users to only save the PEFT submodules could break existing code. Probably 90% of users in this situation would only want to save the PEFT adapters, but they would already require special handling to load these submodules correctly (right?), so maybe it's fine that they also need special handling to save them? IMHO, if we want to make it possible to only save the PEFT adapters of the submodules, it should be configurable, with the default being the status quo.
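
For concreteness, the special handling being discussed could look roughly like the helper below (a sketch under the assumption that each PeftModel submodule's adapter should be saved into its own subdirectory; the subclass in the actual test may differ):

import os

from peft import PeftModel

def save_peft_submodules(model, save_directory, **kwargs):
    """Save only the adapter weights of every PeftModel submodule."""
    for name, module in model.named_modules():
        if isinstance(module, PeftModel):
            # PeftModel.save_pretrained writes just the adapter, not the base weights.
            module.save_pretrained(os.path.join(save_directory, name), **kwargs)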

if _is_peft_model(model):
    return True
elif is_peft_available():
    classes_to_check = (PeftModel,) if is_peft_available() else ()
Member commented

Suggested change:

- classes_to_check = (PeftModel,) if is_peft_available() else ()
+ classes_to_check = (PeftModel,)

is_peft_available() is already checked in the line above, so no need to check again.

Also, how about changing this slightly to:

    if not is_peft_available():
        return False
    ...

That way, the early returns are all handled in one place and we also save one level of indentation for the rest of the function body.

Comment on lines +271 to +274:

for submodule in model.modules():
    if isinstance(submodule, classes_to_check):
        return True
return False
Member commented

Could be condensed to:

return any(isinstance(submodule, classes_to_check) for submodule in model.modules())
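
Putting the two suggestions together, the check could read roughly as follows (a hedged sketch of the combined shape; the wrapping function name is assumed rather than taken from the diff):

from transformers.trainer import _is_peft_model
from transformers.utils import is_peft_available

def _is_peft_model_or_has_peft_submodule(model) -> bool:  # assumed name
    # Early return when PEFT is not installed, as suggested above.
    if not is_peft_available():
        return False
    from peft import PeftModel

    if _is_peft_model(model):
        return True
    # Condensed submodule scan, as suggested above.
    return any(isinstance(submodule, PeftModel) for submodule in model.modules())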



This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.

ambroser53 closed this on Aug 23, 2024
Successfully merging this pull request may close these issues.

Have _is_peft_model check if there's any peft submodule/Allow quantised training