Replication of finetuning code #6
I suggest giving up on the reproduction, my friend. Code tastes bitter and Truth goes opaque.
Any other questions are welcome.
@jacklanda I am not sure what you mean by that?
I'm guessing poetry generation 🍷
@akutuzov Thank you for your response. The parameters I have been testing with have been: I have been running it on four 32GB V100 GPUs at the Puhti supercomputer, on a single node.
@VilhelmHovland I believe the root of your troubles is this line: you are trying to use the WordNet dataset directly as it is on HF. We didn't try that, and I doubt the fine-tuning script deals with this well. As mentioned before, we fine-tune on tab-separated files with two columns:
(see the example here). Note that the examples should already be augmented with the instruction prompt ("What is the definition of TARGET_WORD?" or whatever prompt you are using).
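For illustration, a minimal sketch of writing such a two-column TSV in Python, assuming you already have (word, usage example, definition) triples; the prompt wording, file name, and column contents below are just an example, not the repository's canonical format:

```python
import csv

# Hypothetical (word, usage example, definition) triples; replace with your own data.
entries = [
    ("frugal", "She leads a frugal life, spending money only on essentials.",
     "economical in use or expenditure; not wasteful"),
    ("meander", "The river meanders through the valley.",
     "to follow a winding course"),
]

prompt = "What is the definition of {word}?"  # or whatever prompt you fine-tune with

with open("train.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f, delimiter="\t")
    for word, example, definition in entries:
        # Column 1: the usage example, already augmented with the instruction prompt.
        # Column 2: the gold definition the model should learn to generate.
        source = f"{example} {prompt.format(word=word)}"
        writer.writerow([source, definition])
```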
@akutuzov I see, thank you. Is the exact data you used available anywhere, or do I need to process the CoDWoE and naacl data?
"naacl data" means datasets from Ishivatari et al 2019, right? We did not publish our converted versions, since we felt it would be not polite to re-distribute datasets created by others (simply saved in another format). Again, it should be trivial convert these datasets to |
Hello again, I have now changed my data, but I am still getting the same error. I am using the same parameters, except with direct data files. I formatted them like this, in .tsv files; does it look correct? What else could be causing issues?
@VilhelmHovland did you try to fine-tune a smaller model?
@VilhelmHovland I've just tried to fine-tune the model myself on a toy dataset. Fine-tuning with batch size 4 for 2 epochs completed without any issues. I used one A100 GPU with 40GB of RAM. Here is the exact command:
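Since the command itself is not shown above, here is a rough, hypothetical sketch of such a run, assuming a plain Hugging Face Seq2SeqTrainer setup rather than the repository's own fine-tuning script; the model checkpoint, file names, and sequence lengths are all placeholders:

```python
from datasets import load_dataset
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_name = "google/flan-t5-small"  # assumption: any seq2seq checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Two tab-separated columns: prompt-augmented usage example, gold definition.
data = load_dataset("csv", data_files={"train": "train.tsv"},
                    delimiter="\t", column_names=["source", "target"])

def preprocess(batch):
    enc = tokenizer(batch["source"], truncation=True, max_length=256)
    enc["labels"] = tokenizer(text_target=batch["target"],
                              truncation=True, max_length=128)["input_ids"]
    return enc

tokenized = data["train"].map(preprocess, batched=True,
                              remove_columns=["source", "target"])

args = Seq2SeqTrainingArguments(
    output_dir="finetuned_definitions",
    per_device_train_batch_size=4,   # batch size 4, as in the comment above
    num_train_epochs=2,              # 2 epochs, as in the comment above
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```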
Okay, I tried that as well, and it does work now, thank you. What would be the bottleneck for fine-tuning the larger models then? Is there any way I could get it to work for those as well?
Well, the usual procedure: set the per-device batch size to 1, and then increase it until you hit the out-of-memory error again. This will be your ceiling in terms of GPU memory. Often, you can increase the effective batch size even further by using gradient accumulation (at the cost of slower training).
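As an illustration of how gradient accumulation raises the effective batch size without raising per-step memory use, here is a sketch using the standard Hugging Face training arguments (these field names are the library's, not necessarily the flags of the repository's own script):

```python
from transformers import Seq2SeqTrainingArguments

# Effective batch size = per_device_train_batch_size
#                        * gradient_accumulation_steps
#                        * number of GPUs.
# With 4 GPUs (as in the V100 setup mentioned earlier):
# 1 * 16 * 4 = 64 examples per optimizer step, while each
# forward/backward pass only holds 1 example per GPU in memory.
args = Seq2SeqTrainingArguments(
    output_dir="finetuned_definitions",
    per_device_train_batch_size=1,   # whatever fits in GPU memory
    gradient_accumulation_steps=16,  # accumulate gradients over 16 steps
    num_train_epochs=2,
)
```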
@akutuzov Hello, thank you very much for the help earlier. I was hoping you could give me some more advice, as I am still working with this model. The model does not seem to be learning: from what I can see in the logging, the loss starts and stays at 0.0 (though the logging seems very limited and only shows a single epoch), so I suspect the issue is still with the training data. Attached are the batch script and a data sample (in .tsv format).
Hi @VilhelmHovland. Otherwise, your SLURM script and data look good (of course, I hope that in reality you train on more examples, since in the attached file I see only 2, which won't work with the batch size of 4 specified in the SLURM script).
Yes, I am fine-tuning the already fine-tuned model, using definitions from the Historical Thesaurus of English from the Oxford English Dictionary; I was expecting very high loss. And yes, that was just sample data to show the structure; the dataset I have is fairly large.
Try to remove
Hello, I want to try fine-tuning your model with my own data, but I have two questions:
Thank you for any assistance here.