About mask generating and adaptation #2

Open · HeeseongEom opened this issue Aug 16, 2024 · 2 comments

@HeeseongEom

Hello, Ashwinee Panda

I was very impressed with your work and wanted to thank you for the excellent contribution. I am currently following the tutorial using the openbookqa task, with the eventual goal of testing how well the model adapts to finetuning on my medical domain. However, I seem to be running into some difficulties, possibly due to a misunderstanding on my part, so I wanted to leave a comment to ask for clarification.

As I understand it, I should first finetune through rlaif and then run save_mask afterwards. However, the process does not seem to work properly unless a mask_path is specified during the finetuning step, which has raised some questions for me.

I would like to ask for a more detailed guide on how to generate and apply a 0.2-sparsity mask for the openbookqa dataset.

The code I previously ran is as follows:
```bash
python -u train_single_gpu.py \
    do_first_eval=False \
    loss=sft \
    model=llama7b \
    model.archive="null" \
    datasets=[openbookqa] \
    exp_name=test1 \
    eval_batch_size=16 \
    sample_during_eval=false \
    lr=1e-7 \
    trainer=BasicTrainer \
    activation_checkpointing=True \
    data_fraction=1.0 \
    save_every=epoch_2 \
    eval_every=5000 \
    n_epochs=100000 \
    batch_size=8 \
    gradient_accumulation_steps=1 \
    model.fsdp_policy_mp=bfloat16 \
    optimizer=RMSprop \
    grad_norm_strategy=even \
    max_grad_norm=10
```

One thing I am curious about is how the command should be set up to generate the mask. I would greatly appreciate it if you could provide a simple guide on this process.

Thank you,
Heeseong Eom

@kiddyboots216 (Owner)

Hi, to generate the mask you can use this script: https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/make_diffs_and_masks.sh. An example is given in https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/continual_learning.sh#L66.

Note: you should not set n_epochs=100000, because an epoch is a full pass over your entire dataset. Also, you should always set `trainer=FSDPTrainer`. I would recommend following this script, https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/train_single_gpu.sh, and just editing the top of the file with the paths and arguments that you want. So in your case, just change the model to llama7b and the dataset to openbookqa, and that should be sufficient.
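For reference, a sketch of the command above with those two changes applied might look like the following; n_epochs=2 is just an illustrative placeholder for a small number of passes (not a recommendation from the repo), and every other flag is carried over unchanged from the original command:

```bash
# Same invocation as above, with trainer and n_epochs changed per the note.
# n_epochs=2 is a placeholder; choose however many passes you actually want.
python -u train_single_gpu.py \
    do_first_eval=False \
    loss=sft \
    model=llama7b \
    model.archive="null" \
    datasets=[openbookqa] \
    exp_name=test1 \
    eval_batch_size=16 \
    sample_during_eval=false \
    lr=1e-7 \
    trainer=FSDPTrainer \
    activation_checkpointing=True \
    data_fraction=1.0 \
    save_every=epoch_2 \
    eval_every=5000 \
    n_epochs=2 \
    batch_size=8 \
    gradient_accumulation_steps=1 \
    model.fsdp_policy_mp=bfloat16 \
    optimizer=RMSprop \
    grad_norm_strategy=even \
    max_grad_norm=10
```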

So basically: first you train on your dataset for however long you want, which gives you a task vector. Then you run the script that makes a diff (make sure you install mergekit from the appropriate folder in this repo first), and it will make a mask with some level of sparsity. Then you can run the same command, but this time pass the mask path that you just created and the level of sparsity. A rough sketch of the sequence is shown below.
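Sketched end to end, with the caveat that the exact arguments to make_diffs_and_masks.sh should be read off the script itself and that the paths below are assumptions:

```bash
# Step 1: finetune with no mask to obtain the task vector
# (the corrected training command shown above).

# Step 2: install mergekit from this repo's copy, then build the weight
# diff and the mask; check the script header for its actual arguments.
pip install -e ./mergekit        # path within the repo is an assumption
bash rlaif/scripts/make_diffs_and_masks.sh

# Step 3: rerun the same training command, now passing the mask you just
# created. mask_path is the key referenced earlier in this thread; the
# sparsity level is passed alongside it (for the exact flag name, see the
# repo's train_single_gpu.sh). Abbreviated here for readability:
python -u train_single_gpu.py loss=sft model=llama7b datasets=[openbookqa] \
    trainer=FSDPTrainer n_epochs=2 batch_size=8 \
    mask_path=/path/to/your_mask.pt
```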

Hope this helps!

@kiddyboots216 (Owner)

Also, the sparsity level is the fraction of coordinates that are not updated; a sparsity of 0.90 means 90% of the parameters stay frozen and only the remaining 10% are updated. I'd say by default go with 0.90 when you actually create the mask.
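If you want to double-check that a generated mask actually has that sparsity, a minimal sketch along these lines could work, assuming the mask is saved as a dict of 0/1 torch tensors in which 1 marks coordinates that will be updated (the filename and that layout are assumptions, not the repo's documented format):

```bash
# Hypothetical sanity check: prints the updated fraction and the frozen
# fraction, which should be roughly 0.10 and 0.90 for a 0.90-sparsity mask.
python - <<'EOF'
import torch

mask = torch.load("your_mask.pt")  # hypothetical path to the saved mask
total = sum(m.numel() for m in mask.values())
updated = sum(int(m.sum().item()) for m in mask.values())
print(f"updated: {updated / total:.4f}, frozen (sparsity): {1 - updated / total:.4f}")
EOF
```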
