LLM: Add XPU Memory Optimizations for Pipeline Parallel #11567

Merged
merged 2 commits into intel-analytics:main on Jul 16, 2024

Conversation

Contributor

@xiangyuT xiangyuT commented Jul 12, 2024

Description

1. Summary of the change

Add a model forward conversion for Llama and ChatGLM3/4 when IPEX_LLM_LOW_MEM is set, to reduce XPU memory usage during pipeline-parallel inference.

2. How to test?
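
A minimal smoke-test sketch, assuming ipex-llm's transformers-style loading API: the checkpoint path is a placeholder, and the pipeline_parallel_stages argument follows the repo's pipeline-parallel examples rather than this PR's diff. Run it twice, with and without IPEX_LLM_LOW_MEM=1, and compare the reported peak memory.

import os
# IPEX_LLM_LOW_MEM must be set before the model is loaded and converted.
os.environ["IPEX_LLM_LOW_MEM"] = "1"

import torch
from transformers import AutoTokenizer
from ipex_llm.transformers import AutoModelForCausalLM

MODEL_PATH = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint

# Split the model into two pipeline stages across two XPUs; the
# pipeline_parallel_stages argument is taken from the repo's
# pipeline-parallel examples, not from this PR's diff.
model = AutoModelForCausalLM.from_pretrained(MODEL_PATH,
                                             load_in_4bit=True,
                                             optimize_model=True,
                                             pipeline_parallel_stages=2)

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH)
inputs = tokenizer("What is pipeline parallelism?", return_tensors="pt").to("xpu")

with torch.inference_mode():
    output = model.generate(**inputs, max_new_tokens=32)

print(tokenizer.decode(output[0], skip_special_tokens=True))
# Compare this number between runs with and without IPEX_LLM_LOW_MEM=1:
print(f"peak XPU memory: {torch.xpu.max_memory_allocated() / 1024**3:.2f} GiB")

In the repo's pipeline-parallel examples such scripts are launched across multiple ranks (e.g. via torchrun or mpirun); a single process is shown here only for brevity.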

@xiangyuT xiangyuT marked this pull request as ready for review July 12, 2024 01:35
@xiangyuT xiangyuT changed the title [WIP] LLM: Add XPU Memory Optimizations for Pipeline Parallel LLM: Add XPU Memory Optimizations for Pipeline Parallel Jul 12, 2024
@glorysdj glorysdj requested review from plusbang and hkvision July 12, 2024 02:18
@hkvision hkvision requested a review from lalalapotter July 15, 2024 02:49
import os

# Enable the low-memory path only when IPEX_LLM_LOW_MEM is set to "1".
_enable_lowmem = os.getenv('IPEX_LLM_LOW_MEM')
_enable_lowmem = (_enable_lowmem is not None) and (_enable_lowmem.lower() == "1")
if _enable_lowmem:
    model = low_mem_convert(model)
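
For context, low_mem_convert follows the forward-replacement pattern ipex-llm uses for model patches: rebind a memory-lighter forward onto the matching model classes. A generic sketch of that pattern, with illustrative names (convert_forward_sketch and low_mem_llama_forward are not the PR's actual code):

import types
import torch.nn as nn

def convert_forward_sketch(model: nn.Module, target_cls: type, new_forward):
    # Walk the module tree and rebind `forward` on every instance of
    # `target_cls`; `new_forward` becomes a bound method of that module.
    for module in model.modules():
        if isinstance(module, target_cls):
            module.forward = types.MethodType(new_forward, module)
    return model

# Usage (illustrative): patch the Llama causal-LM class with a
# low-memory forward.
# model = convert_forward_sketch(model, LlamaForCausalLM, low_mem_llama_forward)

Binding via types.MethodType keeps the patched forward a regular bound method, so existing call sites need no changes.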
Contributor

Could this memory optimization also be applied in the single-GPU inference case? The rest LGTM.

Contributor

It could, but for single-GPU inference the existing optimizations are usually sufficient, so we haven't overridden the CausalLM forward there.

@xiangyuT xiangyuT merged commit 79c742d into intel-analytics:main Jul 16, 2024
1 check passed
RyuKosei pushed a commit to RyuKosei/ipex-llm that referenced this pull request Jul 19, 2024