
Support for gradient_checkpointing #9

Open
Richar-Du opened this issue Jul 13, 2023 · 3 comments
@Richar-Du

Thanks for your awesome work! There is a small problem: when I fine-tune long_llama with gradient_checkpointing, it raises an error:
[screenshot of the error traceback]
Could you please update the code in transformers so that long_llama supports gradient_checkpointing? I think it would be useful for the community.
@CStanKonrad

@CStanKonrad
Owner

Hi, thanks for the request. In a recent commit, I have added initial support for gradient checkpointing (it simply skips the memory layers). As of writing, it is not yet present in the Hugging Face repository, so to use it you can download the code from the src directory of this repository and write something like this:

import torch
from transformers import LlamaTokenizer
# modeling_longllama.py downloaded from the src directory of this repository
# and placed next to this script (plain import instead of a relative one)
from modeling_longllama import LongLlamaForCausalLM

MODEL_PATH = "syzymon/long_llama_3b"

tokenizer = LlamaTokenizer.from_pretrained(MODEL_PATH)
model = LongLlamaForCausalLM.from_pretrained(MODEL_PATH, torch_dtype=torch.float32)
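(Not part of the original reply: a minimal sketch of how gradient checkpointing could then be enabled on the loaded model, assuming the standard Hugging Face gradient_checkpointing_enable() API is wired up by the commit mentioned above.)

# Sketch only: standard Hugging Face calls, assumed to be supported
# by the LongLlama code after the recent commit.
model.gradient_checkpointing_enable()   # recompute activations during backward
model.config.use_cache = False          # KV caching conflicts with checkpointing

inputs = tokenizer("Hello, LongLLaMA!", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()                 # memory savings apply to this backward pass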

@Richar-Du
Author

Richar-Du commented Jul 14, 2023

Thanks for your commit!

Now I would like to fine-tune longllama, but the sequences are too long and training runs into CUDA OOM (4x 80 GB). I wonder if I could fine-tune longllama with a regular framework that has no support for long contexts (e.g. the training framework of Alpaca or Vicuna). If not, could you please release the fine-tuning code for longllama?

@CStanKonrad
Owner

I apologize for the late response. We have recently published code that allows fine-tuning the model on a single A100 80GB GPU. We use a total context size of 2048, with last_context_length set to 1024. For shorter inputs, we randomly decide how much of the data will be present in memory; we achieve this by randomly padding the input.
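(Illustration only, not the released training code: a rough sketch of the random-padding idea described above. The constants, the helper name randomly_pad, and the choice of left padding are assumptions.)

import random
import torch

TOTAL_CONTEXT = 2048        # total context size used in fine-tuning
LAST_CONTEXT_LENGTH = 1024  # tokens handled by the local (non-memory) context

def randomly_pad(input_ids: torch.Tensor, pad_token_id: int) -> torch.Tensor:
    # Illustrative only: left-pad a short example by a random amount so that
    # a varying number of real tokens ends up in the memory layers.
    seq_len = input_ids.shape[-1]
    max_pad = max(TOTAL_CONTEXT - seq_len, 0)
    pad_len = random.randint(0, max_pad)
    padding = torch.full((pad_len,), pad_token_id, dtype=input_ids.dtype)
    return torch.cat([padding, input_ids])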

You can try the instruction+chat fine-tuned model in the Colab.

For the Colab model, we provide the fine-tuning config and a log of the training loss.
