[Question] LLaVA Pretraining with Mixtral 8×7B #1417
Comments
I have not faced this issue. Can you give me the command to reproduce it?
@martinakaduc Thank you very much for your prompt reply! Below is my pretraining script. The --model_name_or_path is the model I downloaded from HF, mistralai/Mixtral-8x7B-v0.1. Despite the warnings, running this script produces a mm_projector.bin file. During pretraining, the loss decreases from ~15 to ~6 and then stops decreasing. Can you figure out the problem?
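One quick sanity check on the pretraining output is to open the saved projector checkpoint and confirm it actually contains the expected projector weights. This is a minimal sketch, assuming a standard PyTorch state dict; the path is a placeholder, not the poster's actual output directory:

```python
import torch

# Load the projector checkpoint produced by the pretraining stage.
# The path below is a placeholder for the actual output directory.
state_dict = torch.load(
    "checkpoints/llava-mixtral-pretrain/mm_projector.bin",
    map_location="cpu",
)

# Print each tensor name, shape, and dtype to confirm the projector weights
# were written (e.g. an mlp2x_gelu projector has two linear weight/bias pairs).
for name, tensor in state_dict.items():
    print(f"{name}: shape={tuple(tensor.shape)}, dtype={tensor.dtype}")
```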
Have you merged my pull request about adding Mixtral? If not, you can use my modified repo here: https://github.com/martinakaduc/LLaVA. My pretraining script and fine-tuning script:
@martinakaduc Thank you! I will try it now and track down the problem!
Interesting, would pretraining on Mixtral 8x22B also be possible?
I think it is possible. However, I have not tested it yet.
@martinakaduc Hi, I'm using your pretrained MixSUraV model downloaded from HF to fine-tune on my own dataset. The script I use is shown in Figure 1; is it correct? If so, I found it infeasible on 8 x 3090 GPUs (24 GB) even with 4-bit quantization (setting --bits 4, as in the red rectangle in the figure). The code for model loading is shown in Figure 2. However, when I use the code shown in Figure 3, only 3 GPUs are more than enough (maybe even 1 is OK). Is there any difference between these two pieces of code? And could you please share your fine-tuning script and the computational resources needed, if you have done this? Thank you so much!
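The figures are not reproduced here, so this is only a guess, but differences like this usually come down to whether the checkpoint is actually loaded in 4-bit. As a hypothetical illustration (generic transformers/bitsandbytes code, not the code from Figures 2 and 3; MODEL_PATH is a placeholder), the two loading paths behave very differently memory-wise:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

MODEL_PATH = "path/to/MixSUraV"  # placeholder, not the real HF repo id


def load_fp16():
    # Plain fp16 loading: every expert of the 8x7B MoE stays in 16-bit,
    # so the weights alone are on the order of ~90 GB and easily exceed
    # 8 x 24 GB once activations and optimizer state are added.
    return AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        torch_dtype=torch.float16,
        device_map="auto",
    )


def load_4bit():
    # 4-bit NF4 quantization via bitsandbytes: the same weights shrink to
    # roughly a quarter of the fp16 footprint, which is why a few 24 GB
    # cards (or even one large card) can hold the model.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.float16,
    )
    return AutoModelForCausalLM.from_pretrained(
        MODEL_PATH,
        quantization_config=bnb_config,
        device_map="auto",
    )
```

If the script in Figure 1 passes --bits 4 but the loading code in Figure 2 ignores it and loads in fp16, that alone would explain the difference in GPU count.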
Hi, how do you know the training was effective? Did you use the default training settings? I ran LoRA fine-tuning with the default parameters and saw basically no improvement.
@fisher75 I have LoRA fine-tuned LLaVA-v1.5 on my own dataset, and the qualitative results are better than the original LLaVA-v1.5.
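Whether LoRA helps often hinges on what "default parameters" means in practice. The sketch below is a hypothetical peft configuration for illustration only, not the settings used by either poster; the "strong" variant is roughly in line with LLaVA's finetune_lora.sh (lora_r 128, lora_alpha 256, all linear projections), while a minimal rank-8 adapter on attention only is much weaker:

```python
from peft import LoraConfig

# A conservative configuration: low rank, attention projections only.
# Placeholder values, shown only to illustrate the knobs that matter.
weak_lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)

# A heavier configuration: higher rank and alpha, adapting all linear
# projections in the language model, which gives the adapter far more
# capacity to move the outputs on a custom dataset.
strong_lora = LoraConfig(
    r=128,
    lora_alpha=256,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",
)
```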
Hi @ShawnAn-WHU, thanks for your reply. I am also working on this; may I ask whether the improvement is very obvious? May I see the training and inference scripts (mostly I am curious about the parameter settings)? By the way, if possible, may I add you on WeChat? It could be very helpful to share some details.
@fisher75 Sure, e-mailing me your WeChat ID is fine.
Question
Has anyone carried out pretraining with Mixtral 8×7B? When I run the pretraining script, a problem occurs, as shown in the figure below. I just added a llava_mixtral.py to llava/model/language_model along with some necessary supplementary code.
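For context, new language backbones are usually added to LLaVA by mirroring the existing llava_llama.py / llava_mistral.py wrappers. The following is a hypothetical, abbreviated outline of such a llava_mixtral.py, not the poster's actual file or martinakaduc's implementation; it only sketches the wrapper pattern under those assumptions:

```python
import torch.nn as nn
from transformers import (AutoConfig, AutoModelForCausalLM,
                          MixtralConfig, MixtralModel, MixtralForCausalLM)

from llava.model.llava_arch import LlavaMetaModel, LlavaMetaForCausalLM


class LlavaMixtralConfig(MixtralConfig):
    model_type = "llava_mixtral"


class LlavaMixtralModel(LlavaMetaModel, MixtralModel):
    config_class = LlavaMixtralConfig

    def __init__(self, config):
        super().__init__(config)


class LlavaMixtralForCausalLM(MixtralForCausalLM, LlavaMetaForCausalLM):
    config_class = LlavaMixtralConfig

    def __init__(self, config):
        super().__init__(config)
        self.model = LlavaMixtralModel(config)
        self.lm_head = nn.Linear(config.hidden_size, config.vocab_size, bias=False)
        self.post_init()

    def get_model(self):
        return self.model

    def forward(self, input_ids=None, attention_mask=None, position_ids=None,
                past_key_values=None, inputs_embeds=None, labels=None,
                use_cache=None, output_attentions=None, output_hidden_states=None,
                images=None, return_dict=None):
        # Splice vision-tower features (passed through the mm projector)
        # into the token embeddings before running the Mixtral backbone.
        if inputs_embeds is None:
            (input_ids, position_ids, attention_mask, past_key_values,
             inputs_embeds, labels) = self.prepare_inputs_labels_for_multimodal(
                input_ids, position_ids, attention_mask, past_key_values,
                labels, images)
        return super().forward(
            input_ids=input_ids, attention_mask=attention_mask,
            position_ids=position_ids, past_key_values=past_key_values,
            inputs_embeds=inputs_embeds, labels=labels, use_cache=use_cache,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states, return_dict=return_dict)


# Register the new config/model so AutoConfig / AutoModelForCausalLM
# can resolve the "llava_mixtral" model_type.
AutoConfig.register("llava_mixtral", LlavaMixtralConfig)
AutoModelForCausalLM.register(LlavaMixtralConfig, LlavaMixtralForCausalLM)
```

The "supplementary code" the poster mentions would typically also include exporting the new classes from llava/model/__init__.py and making the training/builder code pick the Mixtral wrapper for the corresponding model_name_or_path.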