add llama3's prompt template to conversation.py #1443

Open · wants to merge 1 commit into base: main
Conversation

KazutoshiShinoda

No description provided.

@KazutoshiShinoda (Author)

#1426
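
For reference, a minimal sketch of what the added template could look like, assuming LLaVA's existing Conversation dataclass, SeparatorStyle enum, and conv_templates registry in llava/conversation.py; the template name and field values below are illustrative, not this PR's actual diff. The MPT separator style renders each turn as role + message + sep, which lines up with LLaMA-3's <|start_header_id|>/<|eot_id|> chat format:

    # Sketch only: assumes Conversation, SeparatorStyle, and conv_templates
    # from llava/conversation.py; values are illustrative, not the PR's diff.
    conv_llama_3 = Conversation(
        system="<|start_header_id|>system<|end_header_id|>\n\n"
               "You are a helpful language and vision assistant.",
        roles=("<|start_header_id|>user<|end_header_id|>\n\n",
               "<|start_header_id|>assistant<|end_header_id|>\n\n"),
        version="llama_3",
        messages=(),
        offset=0,
        sep_style=SeparatorStyle.MPT,  # MPT style: role + message + sep per turn
        sep="<|eot_id|>",
    )

    conv_templates["llama_3"] = conv_llama_3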

@awzhgw commented Apr 23, 2024

@KazutoshiShinoda can you add the preprocess_llama_3 function code? I will test it in the prepare stage.

@Jayantverma2

What about the preprocess step during LazySupervisedDataset?

@mmaaz60 commented Apr 26, 2024

Hi @KazutoshiShinoda, @awzhgw, @Jayantverma2,

I hope you are doing well. We have just released our project LLaVA++: Extending Visual Capabilities with LLaMA-3 and Phi-3, which features LLaMA-3 and Phi-3-Mini based LLaVA models. Please have a look at LLaVA++.

  • We have released the code required to support both LLaMA-3 and Phi-3-Mini models in the LLaVA framework. The chat formats and corresponding preprocess methods are available in our GitHub repo.
  • We have released all the checkpoints on Hugging Face.
  • On our GitHub repository we provide the .py files that need to be replaced in, or added to, the official LLaVA repository to train and run inference with LLaMA-3 and Phi-3-Mini based models.

I hope this is helpful. Please let me know if you have any questions. Thanks!

@pluswcm commented Jun 5, 2024

@mmaaz60 In your implementation, I can see the following preprocessing logic, but I don't quite understand why round_len -= 1 when i > 0. Could you explain that a little bit?

    # Note: `conv`, `sep`, `tokenizer`, `has_image`, `IGNORE_INDEX`, and
    # `tokenizer_image_token` are defined earlier in the preprocess function;
    # `sep` marks the start of the assistant's answer within a round.
    for conversation, target in zip(conversations, targets):
        total_len = int(target.ne(tokenizer.pad_token_id).sum())

        # Split into rounds; the first entry keeps the system prompt together
        # with the first user/assistant exchange.
        rounds = conversation.split(conv.sep)
        re_rounds = [conv.sep.join(rounds[:3])]
        for conv_idx in range(3, len(rounds), 2):
            re_rounds.append(conv.sep.join(rounds[conv_idx:conv_idx + 2]))
        cur_len = 0
        target[:cur_len] = IGNORE_INDEX
        for i, rou in enumerate(re_rounds):
            if rou == "":
                break

            # parts[0] is the instruction (user turn), parts[1] the answer.
            parts = rou.split(sep)
            if len(parts) != 2:
                break
            parts[0] += sep

            # +1 restores the separator token dropped by the split above.
            if has_image:
                round_len = len(tokenizer_image_token(rou, tokenizer)) + 1
                instruction_len = len(tokenizer_image_token(parts[0], tokenizer))
            else:
                round_len = len(tokenizer(rou).input_ids) + 1
                instruction_len = len(tokenizer(parts[0]).input_ids)

            if i > 0:
                round_len -= 1
                instruction_len -= 1

            # Mask everything except the assistant's answer tokens.
            target[cur_len: cur_len + instruction_len] = IGNORE_INDEX

            cur_len += round_len
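
For context, the subtraction this question asks about is usually tied to the BOS token: LLaMA-family tokenizers prepend a BOS token (<|begin_of_text|> for LLaMA-3) on every call, so a round re-tokenized in isolation counts one token that, in the full tokenized conversation, exists only at position 0. A minimal sketch of that effect; the model name and example string are illustrative:

    from transformers import AutoTokenizer

    # Any tokenizer that prepends a BOS token by default shows the same effect.
    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

    rou = "how are you?<|eot_id|>fine"  # stands in for a re-tokenized later round

    with_bos = len(tokenizer(rou).input_ids)  # BOS prepended automatically
    without_bos = len(tokenizer(rou, add_special_tokens=False).input_ids)

    # The full conversation carries a single BOS at position 0, so every round
    # after the first over-counts by one token when re-tokenized on its own;
    # `round_len -= 1` / `instruction_len -= 1` for i > 0 would compensate.
    assert with_bos == without_bos + 1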
