About mask generating and adaptation #2

Open · HeeseongEom opened this issue Aug 16, 2024 · 2 comments

@HeeseongEom

Hello, Ashwinee Panda

I was very impressed with your work and wanted to thank you for the excellent contribution. I am currently following the tutorial using the openbookqa task, with the eventual goal of testing how well the model adapts to finetuning on my medical domain. However, I seem to be running into some difficulties, possibly due to a misunderstanding on my part, so I wanted to leave a comment to ask for clarification.

As I understand it, I should first finetune through rlaif and then run save_mask afterwards. However, the process does not seem to work properly unless a mask_path is specified during the finetuning step, which has raised some questions for me.

I would like to ask for a more detailed guide on how to generate and apply a 0.2-sparsity mask for the openbookqa dataset.

The code I previously ran is as follows:
```bash
python -u train_single_gpu.py \
    do_first_eval=False \
    loss=sft \
    model=llama7b \
    model.archive="null" \
    datasets=[openbookqa] \
    exp_name=test1 \
    eval_batch_size=16 \
    sample_during_eval=false \
    lr=1e-7 \
    trainer=BasicTrainer \
    activation_checkpointing=True \
    data_fraction=1.0 \
    save_every=epoch_2 \
    eval_every=5000 \
    n_epochs=100000 \
    batch_size=8 \
    gradient_accumulation_steps=1 \
    model.fsdp_policy_mp=bfloat16 \
    optimizer=RMSprop \
    grad_norm_strategy=even \
    max_grad_norm=10
```

One thing I am curious about is how the command should be set up to generate the mask. I would greatly appreciate it if you could provide a simple guide on this process.

Thank you,
Heeseong Eom

@kiddyboots216 (Owner)

Hi, to generate the mask you can use this script: https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/make_diffs_and_masks.sh. An example is given in https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/continual_learning.sh#L66.

Note: you should not set n_epochs=100000, because an epoch is a full pass over your entire dataset. Also, you should always set `trainer=FSDPTrainer`. I would recommend following this script, https://github.com/kiddyboots216/lottery-ticket-adaptation/blob/main/rlaif/scripts/train_single_gpu.sh, and just editing the top of the file with the paths and arguments that you want. So in your case, just change the model to llama7b and the dataset to openbookqa, and that should be sufficient.
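For reference, a sketch of the command above with those two changes applied might look like the following; n_epochs=2 is just an illustrative placeholder for a small number of passes (not a recommendation from the repo), and every other flag is carried over unchanged from the original command:

```bash
# Same invocation as above, with trainer and n_epochs changed per the note.
# n_epochs=2 is a placeholder; choose however many passes you actually want.
python -u train_single_gpu.py \
    do_first_eval=False \
    loss=sft \
    model=llama7b \
    model.archive="null" \
    datasets=[openbookqa] \
    exp_name=test1 \
    eval_batch_size=16 \
    sample_during_eval=false \
    lr=1e-7 \
    trainer=FSDPTrainer \
    activation_checkpointing=True \
    data_fraction=1.0 \
    save_every=epoch_2 \
    eval_every=5000 \
    n_epochs=2 \
    batch_size=8 \
    gradient_accumulation_steps=1 \
    model.fsdp_policy_mp=bfloat16 \
    optimizer=RMSprop \
    grad_norm_strategy=even \
    max_grad_norm=10
```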

So basically: first you train on your dataset for however long you want, which gives you a task vector. Then you run the script that makes a diff (make sure you install mergekit from the appropriate folder in this repo first), and it will make a mask with some level of sparsity. Then you can run the same command, but this time pass the mask path that you just created and the level of sparsity. A rough sketch of the sequence is shown below.
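Sketched end to end, with the caveat that the exact arguments to make_diffs_and_masks.sh should be read off the script itself and that the paths below are assumptions:

```bash
# Step 1: finetune with no mask to obtain the task vector
# (the corrected training command shown above).

# Step 2: install mergekit from this repo's copy, then build the weight
# diff and the mask; check the script header for its actual arguments.
pip install -e ./mergekit        # path within the repo is an assumption
bash rlaif/scripts/make_diffs_and_masks.sh

# Step 3: rerun the same training command, now passing the mask you just
# created. mask_path is the key referenced earlier in this thread; the
# sparsity level is passed alongside it (for the exact flag name, see the
# repo's train_single_gpu.sh). Abbreviated here for readability:
python -u train_single_gpu.py loss=sft model=llama7b datasets=[openbookqa] \
    trainer=FSDPTrainer n_epochs=2 batch_size=8 \
    mask_path=/path/to/your_mask.pt
```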

Hope this helps!

@kiddyboots216 (Owner)

Also, the sparsity level is the fraction of coordinates that are not updated; a sparsity of 0.90 means 90% of the parameters stay frozen and only the remaining 10% are updated. I'd say by default go with 0.90 when you actually create the mask.
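If you want to double-check that a generated mask actually has that sparsity, a minimal sketch along these lines could work, assuming the mask is saved as a dict of 0/1 torch tensors in which 1 marks coordinates that will be updated (the filename and that layout are assumptions, not the repo's documented format):

```bash
# Hypothetical sanity check: prints the updated fraction and the frozen
# fraction, which should be roughly 0.10 and 0.90 for a 0.90-sparsity mask.
python - <<'EOF'
import torch

mask = torch.load("your_mask.pt")  # hypothetical path to the saved mask
total = sum(m.numel() for m in mask.values())
updated = sum(int(m.sum().item()) for m in mask.values())
print(f"updated: {updated / total:.4f}, frozen (sparsity): {1 - updated / total:.4f}")
EOF
```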
