LLM: Add XPU Memory Optimizations for Pipeline Parallel #11567

Merged
merged 2 commits into intel-analytics:main on Jul 16, 2024

Conversation

Contributor

@xiangyuT xiangyuT commented Jul 12, 2024

Description

1. Summary of the change

Add a model forward conversion for Llama and ChatGLM3/4 when IPEX_LLM_LOW_MEM is set, to reduce XPU memory usage during pipeline-parallel inference.

2. How to test?
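
A minimal smoke-test sketch, assuming ipex-llm's transformers-style loading API: the checkpoint path is a placeholder, and the pipeline_parallel_stages argument follows the repo's pipeline-parallel examples rather than this PR's diff. Run it twice, with and without IPEX_LLM_LOW_MEM=1, and compare the reported peak memory.

import os
# IPEX_LLM_LOW_MEM must be set before the model is loaded and converted.
os.environ["IPEX_LLM_LOW_MEM"] = "1"

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

MODEL_PATH = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint

# Split the model into two pipeline stages across two XPUs; the
# pipeline_parallel_stages argument is taken from the repo's
# pipeline-parallel examples, not from this PR's diff.
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH,
                                             load_in_4bit=True,
                                             optimize_model=True,
                                             pipeline_parallel_stages=2)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
inputs = tokenizer("What is pipeline parallelism?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
# Compare this number between runs with and without IPEX_LLM_LOW_MEM=1:
print(f"peak XPU memory: {torch.xpu.max_memory_allocated() / 1024**3:.2f} GiB")

In the repo's pipeline-parallel examples such scripts are launched across multiple ranks (e.g. via torchrun or mpirun); a single process is shown here only for brevity.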

@xiangyuT xiangyuT marked this pull request as ready for review July 12, 2024 01:35
@xiangyuT xiangyuT changed the title [WIP] LLM: Add XPU Memory Optimizations for Pipeline Parallel LLM: Add XPU Memory Optimizations for Pipeline Parallel Jul 12, 2024
@glorysdj glorysdj requested review from plusbang and hkvision July 12, 2024 02:18
@hkvision hkvision requested a review from lalalapotter July 15, 2024 02:49
import os

# Enable the low-memory path only when IPEX_LLM_LOW_MEM is set to "1".
_enable_lowmem = os.getenv('IPEX_LLM_LOW_MEM')
_enable_lowmem = (_enable_lowmem is not None) and (_enable_lowmem.lower() == "1")
if _enable_lowmem:
    model = low_mem_convert(model)
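
For context, low_mem_convert follows the forward-replacement pattern ipex-llm uses for model patches: rebind a memory-lighter forward onto the matching model classes. A generic sketch of that pattern, with illustrative names (convert_forward_sketch and low_mem_llama_forward are not the PR's actual code):

import types
import torch.nn as nn

def convert_forward_sketch(model: nn.Module, target_cls: type, new_forward):
    # Walk the module tree and rebind `forward` on every instance of
    # `target_cls`; `new_forward` becomes a bound method of that module.
    for module in model.modules():
        if isinstance(module, target_cls):
            module.forward = types.MethodType(new_forward, module)
    return model

# Usage (illustrative): patch the Llama causal-LM class with a
# low-memory forward.
# model = convert_forward_sketch(model, LlamaForCausalLM, low_mem_llama_forward)

Binding via types.MethodType keeps the patched forward a regular bound method, so existing call sites need no changes.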
Contributor

Could this memory optimization also be applied in the single-GPU inference case? The rest LGTM.

Contributor

It could, but for single-GPU inference the existing optimizations are usually sufficient, so we haven't overridden the CausalLM forward there.

@xiangyuT xiangyuT merged commit 79c742d into intel-analytics:main Jul 16, 2024
1 check passed
RyuKosei pushed a commit to RyuKosei/ipex-llm that referenced this pull request Jul 19, 2024