Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

如何训练自己的数据集呢? #15

Open
cripsgreen opened this issue Oct 16, 2024 · 2 comments
Open

如何训练自己的数据集呢? #15

cripsgreen opened this issue Oct 16, 2024 · 2 comments

Comments

@cripsgreen
Copy link

我想请问我该怎样训练自己的数据集呢,是直接在预训练好的权重文件上微调呢,还是从预训练开始呢,另外LLaVA-Pretrain/chat-translated.json和LLaVA-Instruct/llava_instruct_230k.json这两个json文件是怎么生成的,是用大模型还是一些其他的脚本生成的呢?

@jingyaogong
Copy link
Owner

  1. 取决于数量级,推荐微调
  2. 来自LLaVA

@cripsgreen
Copy link
Author

那微调是不是需要将图片预处理,用llava生成对话描述,然后整合成llava_instruct_230k.json的格式?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants