Patch for Cambricon MLUs test #1747

Merged 14 commits into huggingface:main on Jun 6, 2024

Conversation

huismiling (Contributor):

Cambricon MLUs have been supported by peft since huggingface/peft/pull/1687, but pytest still has some failing cases, mostly related to numerical tolerances. This patch fixes those tolerance-related failures.
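For illustration only, a minimal sketch of the kind of device-aware tolerance check the description refers to; the helper name, device check, and tolerance values are assumptions for this sketch, not the PR's actual diff.

import torch

def assert_close(actual: torch.Tensor, expected: torch.Tensor, device_type: str) -> None:
    # Hypothetical helper: MLU kernels may accumulate slightly different rounding
    # error than CPU/CUDA, so a looser tolerance is applied on that device.
    atol, rtol = (1e-4, 1e-4) if device_type == "mlu" else (1e-6, 1e-6)
    assert torch.allclose(actual, expected, atol=atol, rtol=rtol)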

@@ -464,13 +464,13 @@ def _test_merge_layers_fp16(self, model_id, config_cls, config_kwargs):
        if ("gpt2" in model_id.lower()) and (config_cls != LoraConfig):
            self.skipTest("Merging GPT2 adapters not supported for IA³ (yet)")

-       model = self.transformers_class.from_pretrained(model_id, torch_dtype=torch.float16)
+       model = self.transformers_class.from_pretrained(model_id)
        config = config_cls(
huismiling (Contributor, Author):

torch_dtype=torch.float16 leads to an error.
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

BenjaminBossan (Member):

Hmm, I cannot replicate this, whether with or without GPU. The idea of this test is exactly to check that this error does not occur with fp16, so not using this dtype is counter-productive. Is this only occurring with MLU devices?

huismiling (Contributor, Author):

The reproduction code is as follows.
The issue can be reproduced using PyTorch 2.1, but it executes normally with PyTorch 2.3.

import torch

# On PyTorch 2.1 this raises: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
a = torch.rand(4, 4).to(torch.float16)
b = torch.rand(4, 4).to(torch.float16)
a @ b

BenjaminBossan (Member):

Okay, so instead of changing the dtype, how about skipping the test when an old PyTorch version is detected?
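A self-contained sketch of that idea; the 2.1 cutoff mirrors the condition this PR eventually adds, and the test body is only a stand-in for the real merge test.

import pytest
import torch
from packaging import version

def test_merge_layers_fp16_sketch():
    # Skip on PyTorch versions where fp16 matmul is not implemented on CPU.
    if version.parse(torch.__version__) <= version.parse("2.1"):
        pytest.skip("fp16 matmul on CPU is not supported by this PyTorch version")
    a = torch.rand(4, 4, dtype=torch.float16)
    b = torch.rand(4, 4, dtype=torch.float16)
    assert (a @ b).dtype == torch.float16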

huismiling (Contributor, Author):

Hmm, maybe we can use fp16 with PyTorch >= 2.3 and fp32 with PyTorch < 2.3?
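A tiny sketch of that alternative (not the approach ultimately taken):

import torch
from packaging import version

# Use fp16 only when the installed PyTorch is new enough to support it on CPU.
dtype = (
    torch.float16
    if version.parse(torch.__version__) >= version.parse("2.3")
    else torch.float32
)
print(f"the merge test would run in {dtype}")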

BenjaminBossan (Member):

We really don't need to test merging with fp32 here, as it's tested extensively in other tests. This test is very specifically for merging with fp16, so if we don't use fp16, we can skip it.

huismiling (Contributor, Author):

Ha, got it! I will fix it.

huismiling (Contributor, Author):

Hmm, I found that it is the "cpu" device that leads to the error.
When the device is changed to self.torch_device, as in my fix, the MLU tests pass with torch.float16.
@BenjaminBossan Will this test use the "cpu" device? If not, there is no need to skip the test for PyTorch 2.1.

model = model.to(device="cpu", dtype=torch.float16)
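For context, a runnable sketch of the variant described above, where the model is moved to an auto-detected device instead of a hard-coded "cpu"; the detection logic is a simplified assumption, not necessarily how the test suite resolves self.torch_device.

import torch

# Simplified device detection (assumption): prefer an accelerator if one is present.
if torch.cuda.is_available():
    torch_device = "cuda"
elif hasattr(torch, "mlu") and torch.mlu.is_available():
    torch_device = "mlu"
else:
    torch_device = "cpu"

model = torch.nn.Linear(4, 4)
# The change under discussion: target the detected device rather than "cpu".
model = model.to(device=torch_device, dtype=torch.float16)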

BenjaminBossan (Member):

So IIRC, there is an error when using CPU + float16 + old PyTorch. If we change either of those variables, there is no error. On CI, we have a new PyTorch version, so it passes, despite using CPU.

If we switch to self.torch_device, it depends, because the device is auto-inferred based on what is available. On our CI, this would still be CPU. On yours, it might not be, but then we would no longer test the original intent, namely that float16 works on CPU.

I assume this fails on your CI because it uses an older PyTorch version, which is why I suggested simply skipping the test on older PyTorch versions. If you want, you could add a specific test for merging float16 on MLU, which would be skipped if the device is not available.
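If such a dedicated test were added, it might look roughly like the sketch below; the availability check assumes torch_mlu exposes a torch.mlu namespace, and the body is a stand-in for the real merge logic.

import pytest
import torch

def is_mlu_available() -> bool:
    # Assumption: torch_mlu registers a torch.mlu namespace when installed.
    return hasattr(torch, "mlu") and torch.mlu.is_available()

@pytest.mark.skipif(not is_mlu_available(), reason="test requires a Cambricon MLU")
def test_merge_layers_fp16_on_mlu():
    a = torch.rand(4, 4, dtype=torch.float16, device="mlu")
    b = torch.rand(4, 4, dtype=torch.float16, device="mlu")
    assert (a @ b).dtype == torch.float16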

huismiling (Contributor, Author), Jun 5, 2024:

@BenjaminBossan Got it! Skipping test_merge_layers_fp16 on PyTorch 2.1 when the device is CPU should be OK.

huismiling (Contributor, Author):

@BenjaminBossan Hi, could you help review this PR? Thanks.

HuggingFaceDocBuilderDev (bot):

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

BenjaminBossan (Member):

Thanks @huismiling, could you please run make style on this PR?

huismiling (Contributor, Author):

make style has been done. Thanks, @BenjaminBossan.

BenjaminBossan (Member) left a review:

Thanks for these adjustments. Are the results from the tests on MLU visible somewhere?

Overall, the changes LGTM; I just have one concern about the float16 test. Please check my comment.

huismiling (Contributor, Author):

The results from the tests on MLU are as follows; more details are in the attached mlu_pytest.txt.

== 4368 passed, 1483 skipped, 1 xfailed, 2981 warnings in 5991.03s (1:39:51) ===

BenjaminBossan (Member) left a review:

Thanks for adding the skipping logic. There's just a small typo there, the rest should be good.

@@ -464,13 +465,16 @@ def _test_merge_layers_fp16(self, model_id, config_cls, config_kwargs):
        if ("gpt2" in model_id.lower()) and (config_cls != LoraConfig):
            self.skipTest("Merging GPT2 adapters not supported for IA³ (yet)")

+       if (self.torch_device in ["cpu"]) and (version.parse(torch.__version__) <= version.parse(2.1)):
BenjaminBossan (Member):

Suggested change:
-       if (self.torch_device in ["cpu"]) and (version.parse(torch.__version__) <= version.parse(2.1)):
+       if (self.torch_device in ["cpu"]) and (version.parse(torch.__version__) <= version.parse("2.1")):

huismiling (Contributor, Author):

Sorry for that. Fixed!

BenjaminBossan (Member) left a review:

Fantastic, thanks for the PR. Keep us in the loop if tests should start failing for Cambricon MLUs.

BenjaminBossan merged commit 63a536b into huggingface:main on Jun 6, 2024. 14 checks passed.