Fix Cohere CI #31263

ydshieh · 2024-06-05T15:06:41Z

What does this PR do?

Currently some tests are failing, see

https://github.com/huggingface/transformers/actions/runs/9360271371/job/25765482477

This PR skips 2 of them and makes 4 pass:

FAILED tests/models/cohere/test_modeling_cohere.py::CohereModelTest::test_eager_matches_sdpa_generate - ValueError: Some modules are dispatched on the CPU or the disk. ..
FAILED tests/models/cohere/test_modeling_cohere.py::CohereIntegrationTest::test_batched_4bit - torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 88.00 MiB. GPU
FAILED tests/models/cohere/test_modeling_cohere.py::CohereIntegrationTest::test_batched_small_model_logits - RuntimeError: Half did not match Float
FAILED tests/models/cohere/test_tokenization_cohere.py::CohereTokenizationTest::test_torch_encode_plus_sent_to_model - RuntimeError: scaled_dot_product_attention_flash_attention does not support FP16 on the platforms without amx_fp16 support

ydshieh · 2024-06-05T15:07:40Z

tests/models/cohere/test_modeling_cohere.py

+    @unittest.skip("foo")
+    def test_initialization(self):
+        super().test_initialization()
+
+    @unittest.skip("foo")
+    def test_fast_init_context_manager(self):
+        super().test_fast_init_context_manager()


I will open an issue to keep track of this. As we discussed earlier, given the limited time we have, we sometimes need to skip some tests.

ydshieh · 2024-06-05T15:08:35Z

tests/models/cohere/test_modeling_cohere.py

+            "Hi there, here we are again with another great collection of free fonts for your next project. This time we have gathered 10 free fonts that you can download and use in your designs. These fonts are perfect for any kind",
        ]

-        model = CohereForCausalLM.from_pretrained(model_id)
+        model = CohereForCausalLM.from_pretrained(model_id, device_map="auto")


This previously failed with a GPU OOM. After switching to device_map="auto", I also had to update the expected output value.

HuggingFaceDocBuilderDev · 2024-06-05T15:35:46Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

ydshieh · 2024-06-06T14:26:05Z

tests/models/cohere/test_tokenization_cohere.py

+    # This gives CPU OOM on a single-gpu runner (~60G RAM). On multi-gpu runner, it has ~180G RAM which is enough.
+    @require_torch_multi_gpu
+    def test_torch_encode_plus_sent_to_model(self):
+        super().test_torch_encode_plus_sent_to_model()
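
For readers unfamiliar with this kind of gating, here is a minimal sketch of the idea, assuming nothing about how `require_torch_multi_gpu` is actually implemented. The helper `require_min_gpus` and the explicit `gpu_count` argument are hypothetical and exist only so the example runs without torch. Note that in this PR the multi-GPU requirement is used as a proxy for the larger host RAM on that runner, not because the test needs several GPUs.

```python
import unittest


def require_min_gpus(minimum, gpu_count):
    """Skip the decorated test unless at least `minimum` GPUs are reported.

    Hypothetical helper: `gpu_count` is passed in explicitly so the sketch
    runs without torch. A real helper would query the runtime instead.
    """
    return unittest.skipUnless(
        gpu_count >= minimum, f"test requires at least {minimum} GPUs"
    )


def build_case(gpu_count):
    """Build a test case as it would look on a runner with `gpu_count` GPUs."""

    class TokenizerToModelTest(unittest.TestCase):
        @require_min_gpus(2, gpu_count)
        def test_torch_encode_plus_sent_to_model(self):
            # Placeholder body standing in for the memory-hungry test.
            self.assertTrue(True)

    return TokenizerToModelTest


def run_with(gpu_count):
    """Run the case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(build_case(gpu_count))
    result = unittest.TestResult()
    suite.run(result)
    return result
```

With one simulated GPU the test is reported as skipped rather than failed; with two or more it runs normally.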


ydshieh · 2024-06-06T14:27:23Z

tests/models/cohere/test_modeling_cohere.py

+    @unittest.skip("Failing.")
+    def test_initialization(self):
+        super().test_initialization()
+
+    @unittest.skip("Failing.")
+    def test_fast_init_context_manager(self):
+        super().test_fast_init_context_manager()
+


I will open an issue to track this, but as we discussed earlier, let's not spend too much time on some tests and focus on other priorities instead.
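
As a rough illustration of why the override is written this way, the sketch below shows an inherited test being skipped for one subclass only. The class names `CommonModelTests` and `ExampleModelTest` are hypothetical stand-ins, not the real transformers test classes.

```python
import unittest


class CommonModelTests:
    """Hypothetical stand-in for a shared mixin that defines tests for every model."""

    def test_initialization(self):
        # Placeholder failure standing in for the real failing check.
        raise AssertionError("initialization check failed")

    def test_forward(self):
        self.assertEqual(1 + 1, 2)


class ExampleModelTest(CommonModelTests, unittest.TestCase):
    # Overriding the inherited method applies the skip to this model only;
    # other subclasses of the mixin still run the test.
    @unittest.skip("Failing. Tracked in a follow-up issue.")
    def test_initialization(self):
        super().test_initialization()


def run_example():
    """Run the example case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ExampleModelTest)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The skipped test shows up in the report with its reason string, which is why a descriptive reason is more useful than a placeholder.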

ydshieh · 2024-06-06T14:29:46Z

.github/workflows/self-pr-slow-ci.yml

+        run: |
+          export CUDA_VISIBLE_DEVICES="$(python3 utils/set_cuda_devices_for_ci.py --test_folder ${{ matrix.folders }})"
+          echo $CUDA_VISIBLE_DEVICES
+          python3 -m pytest -v -rsfE --make-reports=${{ matrix.machine_type }}_run_models_gpu_${{ matrix.folders }}_test_reports tests/${{ matrix.folders }}


So far in the workflow we have set CUDA_VISIBLE_DEVICES=0,1, because running with 4 GPUs otherwise causes additional strange issues.

However, the cohere test test_eager_matches_sdpa_generate requires more GPU memory, so I tried allowing CUDA_VISIBLE_DEVICES=0,1,2,3 for this job.

This requires a new script (see the end of the diff).
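
For context, a script of this kind could look roughly like the sketch below. This is not the script from the PR, only an illustration under stated assumptions: that the folder is passed as `models/cohere`, that only this folder needs all four GPUs, and that other folders fall back to the existing environment value or to `0,1`. The names `FOLDERS_NEEDING_ALL_GPUS` and `select_cuda_devices` are hypothetical.

```python
import argparse
import os

# Assumption for illustration: only this folder needs all four GPUs.
FOLDERS_NEEDING_ALL_GPUS = {"models/cohere"}


def select_cuda_devices(test_folder, environ):
    """Return the value to export as CUDA_VISIBLE_DEVICES for one test folder."""
    if test_folder in FOLDERS_NEEDING_ALL_GPUS:
        return "0,1,2,3"
    # Otherwise keep whatever is already set, defaulting to two GPUs.
    return environ.get("CUDA_VISIBLE_DEVICES", "0,1")


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--test_folder", type=str, default=None)
    args = parser.parse_args(argv)
    # The workflow captures stdout via $(...), so print only the device list.
    print(select_cuda_devices(args.test_folder, os.environ))


if __name__ == "__main__":
    main()
```

Printing a single line and nothing else matters here, since the workflow step assigns the script's standard output directly to the environment variable.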

LysandreJik

Ok, thanks Yih-Dar!

* [run-slow] cohere
* [run-slow] cohere
* [run-slow] cohere

---------

Co-authored-by: ydshieh <[email protected]>

ydshieh added the run-slow label Jun 5, 2024

ydshieh force-pushed the fix_cohere branch 2 times, most recently from 761b75a to 06c81eb Compare June 6, 2024 13:44

ydshieh requested a review from LysandreJik June 6, 2024 14:30

LysandreJik approved these changes Jun 10, 2024

ydshieh added 2 commits June 10, 2024 14:45

[run-slow] cohere

1056f3e

[run-slow] cohere

53433ac

ydshieh mentioned this pull request Jun 10, 2024

[CI] 2 Cohere tests are failing and skipped for now #31351

Closed

[run-slow] cohere

e4dac44

ydshieh force-pushed the fix_cohere branch from fd33bed to e4dac44 Compare June 10, 2024 12:50

ydshieh merged commit 8fff07d into main Jun 10, 2024
22 of 24 checks passed

ydshieh deleted the fix_cohere branch June 10, 2024 13:16

zucchini-nlp pushed a commit to zucchini-nlp/transformers that referenced this pull request Jun 14, 2024

Fix Cohere CI (huggingface#31263)

30cc826

itazap pushed a commit that referenced this pull request Jun 17, 2024

Fix Cohere CI (#31263)

eea5af1

itazap pushed a commit that referenced this pull request Jun 17, 2024

Fix Cohere CI (#31263)

347ea80

itazap pushed a commit that referenced this pull request Jun 17, 2024

Fix Cohere CI (#31263)

5c308a4

itazap pushed a commit that referenced this pull request Jun 18, 2024

Fix Cohere CI (#31263)

6898fcd

ydshieh mentioned this pull request Jun 20, 2024

[run-slow] cohere #31517

Merged

itazap pushed a commit that referenced this pull request Jun 20, 2024

Fix Cohere CI (#31263)

9d9a17b

* [run-slow] cohere * [run-slow] cohere * [run-slow] cohere --------- Co-authored-by: ydshieh <[email protected]>
