fix docs of eva02 and evaclip
nemonameless committed Oct 11, 2023
1 parent b520814 commit 8946e24
Showing 5 changed files with 65 additions and 66 deletions.
7 changes: 0 additions & 7 deletions paddlemix/examples/clip/README.md
@@ -40,13 +40,6 @@ python setup.py install --prefix=$INSTALL_DIR
export $PATH=$PATH:$INSTALL_DIR
```

4) Install paddlemix

```
git clone git@github.com:PaddlePaddle/PaddleMIX.git
cd PaddleMix
python setup.py install
```

## 3. Data preparation

8 changes: 0 additions & 8 deletions paddlemix/examples/coca/README.md
@@ -42,14 +42,6 @@ python setup.py install --prefix=$INSTALL_DIR
export $PATH=$PATH:$INSTALL_DIR
```

4) Install paddlemix

```
git clone git@github.com:PaddlePaddle/PaddleMIX.git
cd PaddleMix
python setup.py install
```

## 3. Data preparation

1) COCO data
100 changes: 57 additions & 43 deletions paddlemix/examples/eva02/README.md
@@ -99,33 +99,29 @@ python setup.py install --prefix=$INSTALL_DIR
export $PATH=$PATH:$INSTALL_DIR
```

4) Install paddlemix

```
git clone git@github.com:PaddlePaddle/PaddleMIX.git
cd PaddleMix
python setup.py install
```

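## 2. Dataset and pretrained weights

1) ImageNet 1k data
1) ImageNet-1k data

We use the standard ImageNet-1K dataset (ILSVRC 2012, 1.2 million images in 1000 classes), downloaded from http://image-net.org, and then use this [shell script](https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh) to move and extract the training and validation images into labeled subfolders.
We use the standard ImageNet-1K dataset (ILSVRC 2012, 1.2 million images in 1000 classes), downloaded from http://image-net.org, and then use this [shell script](https://github.com/pytorch/examples/blob/main/imagenet/extract_ILSVRC.sh) to move and extract the training and validation images into labeled subfolders. Note that both the train and val folders must contain 1000 subfolders, one per class.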
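
The 1000-subfolder requirement on `train` and `val` can be checked before launching a run. The following is a minimal sketch, not part of PaddleMIX: the helper names `count_class_dirs` and `check_imagenet_layout` are hypothetical, and it assumes the dataset root holds `train/` and `val/` directly, as the `DATA_PATH` examples in this README suggest.

```python
from pathlib import Path


def count_class_dirs(split_dir):
    """Count the immediate subdirectories of one split folder (one per class)."""
    return sum(1 for entry in Path(split_dir).iterdir() if entry.is_dir())


def check_imagenet_layout(root, expected_classes=1000):
    """Return {split: class_count}; raise if train/ or val/ has the wrong count."""
    # Assumption: the splits are named "train" and "val" directly under root.
    counts = {split: count_class_dirs(Path(root) / split) for split in ("train", "val")}
    wrong = {split: n for split, n in counts.items() if n != expected_classes}
    if wrong:
        raise ValueError(f"expected {expected_classes} class folders per split, got {wrong}")
    return counts
```

Calling `check_imagenet_layout("./dataset/ILSVRC2012")` would raise early if the extraction step left the validation images unsorted, which is a common cause of a mismatched class count.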


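## 4. Usage

### 4.1 Pretraining

Use `paddlemix/examples/eva02/run_eva02_pretrain_dist.py`
Use `paddlemix/examples/eva02/run_eva02_pretrain_dist.py`

Example training command and parameter configuration
Note:

This example runs on a single machine with 8 GPUs:
1. If a distributed strategy is used, the parallel degrees must satisfy `nnodes * nproc_per_node == tensor_parallel_degree * sharding_parallel_degree * dp_parallel_degree`. `dp_parallel_degree` is computed from the other values, so make sure that `nnodes * nproc_per_node >= tensor_parallel_degree * sharding_parallel_degree`
2. `model_name` can be used on its own to create the model. To switch the teacher, edit the teacher_config field in config.json and model_config.json under `paddlemix/EVA/EVA02/eva02_Ti_for_pretrain` yourself, for example changing the default `paddlemix/EVA/EVA01-CLIP-g-14` to "paddlemix/EVA/EVA02-CLIP-bigE-14". student_config is a dict, and the student model itself is trained from scratch;
3. If model_name=None, the model can instead be created from teacher_name and student_name, but each of them must have its own config.json and model_state.pdparams. The model_name=None form is generally used for eval or for debugging with the full weights loaded;
4. `TEA_PRETRAIN_CKPT` is normally set to None, because the teacher's pretrained weights from `teacher_name` are already loaded before training. However, **if MP_DEGREE > 1**, you must set the `TEA_PRETRAIN_CKPT` path again to load them. An absolute path is generally used; you can also download the corresponding `model_state.pdparams` separately from its download link and place it there;

Note that if a distributed strategy is used, the parallel degrees must satisfy `nnodes * nproc_per_node == tensor_parallel_degree * sharding_parallel_degree * dp_parallel_degree`. `dp_parallel_degree` is computed from the other values, so make sure that `nnodes * nproc_per_node >= tensor_parallel_degree * sharding_parallel_degree`.

Example training command and parameter configuration, here for a single machine with 8 GPUs: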
```shell
export FLAGS_embedding_deterministic=1
export FLAGS_cudnn_deterministic=1
@@ -149,24 +145,23 @@ TRAINING_MODEL_RESUME="None"
TRAINER_INSTANCES='127.0.0.1'
MASTER='127.0.0.1:8080'

TRAINERS_NUM=1
TRAINERS_NUM=1 # machine num
TRAINING_GPUS_PER_NODE=8
DP_DEGREE=8
MP_DEGREE=1
SHARDING_DEGREE=1
DP_DEGREE=8 # dp_parallel_degree
MP_DEGREE=1 # tensor_parallel_degree
SHARDING_DEGREE=1 # sharding_parallel_degree

model_name="paddlemix/EVA/EVA02/eva02_Ti_for_pretrain"
# model_name=None # if set None, will use teacher_name and student_name from_pretrained, both should have config and pdparams
teacher_name="paddlemix/EVA/EVA01-CLIP-g-14"
#teacher_name="paddlemix/EVA/EVA02-CLIP-bigE-14"
student_name="paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14"

TEA_PRETRAIN_CKPT=https://bj.bcebos.com/v1/paddlenlp/models/community/paddlemix/EVA/EVA01-CLIP-g-14/model_state.pdparams # must add if MP is used
TEA_PRETRAIN_CKPT=None # /root/.paddlenlp/models/paddlemix/EVA/EVA01-CLIP-g-14/model_state.pdparams # must add if MP_DEGREE > 1
STU_PRETRAIN_CKPT=None

OUTPUT_DIR=./output/pretrain_eva02_ti
OUTPUT_DIR=./output/eva02_Ti_pt_in21k_p14

DATA_PATH=./dataset/ILSVRC2012
DATA_PATH=./dataset/ILSVRC2012 # put your ImageNet-1k val data path
input_size=224
num_mask_patches=105 ### 224*224/14/14 * 0.4
batch_size=10 # 100(bsz_per_gpu)*8(#gpus_per_node)*5(#nodes)*1(update_freq)=4000(total_bsz)
@@ -223,23 +218,38 @@ ${TRAINING_PYTHON} paddlemix/examples/eva02/run_eva02_pretrain_dist.py \


The default teacher is `paddlemix/EVA/EVA01-CLIP-g-14`. To switch the teacher, change the settings to something like the following:

```
model_name=None
model_name="paddlemix/EVA/EVA02/eva02_Ti_for_pretrain" # should modify teacher_config in config.json and model_config.json
# model_name=None # if set None, will use teacher_name and student_name from_pretrained, both should have config and pdparams
teacher_name="paddlemix/EVA/EVA02-CLIP-bigE-14"
student_name="paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14"
TEA_PRETRAIN_CKPT=paddlemix/EVA/EVA02-CLIP-bigE-14/model_state.pdparams
TEA_PRETRAIN_CKPT=None # /root/.paddlenlp/models/paddlemix/EVA/EVA01-CLIP-bigE-14/model_state.pdparams # must add if MP_DEGREE > 1
STU_PRETRAIN_CKPT=None
```

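Note that `model_name` can be used on its own to create the model; the default teacher_config is `paddlemix/EVA/EVA01-CLIP-g-14`, while student_config is a dict and the student model itself is trained from scratch;
If model_name=None, the model can instead be created from teacher_name and student_name, but each of them must have its own config.json and model_state.pdparams. The model_name=None form is generally used for eval or for debugging with the full weights loaded;
Note:
1. `model_name` can be used on its own to create the model. To switch the teacher, edit the teacher_config field in config.json and model_config.json under `paddlemix/EVA/EVA02/eva02_Ti_for_pretrain` yourself, for example changing the default `paddlemix/EVA/EVA01-CLIP-g-14` to "paddlemix/EVA/EVA02-CLIP-bigE-14". student_config is a dict, and the student model itself is trained from scratch;
2. If model_name=None, the model can instead be created from teacher_name and student_name, but each of them must have its own config.json and model_state.pdparams. The model_name=None form is generally used for eval or for debugging with the full weights loaded;
3. `TEA_PRETRAIN_CKPT` is normally set to None, because the teacher's pretrained weights from `teacher_name` are already loaded before training. However, **if MP_DEGREE > 1**, you must set the `TEA_PRETRAIN_CKPT` path again to load them. An absolute path is generally used; you can also download the corresponding `model_state.pdparams` separately from its download link and place it there;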
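
Switching the teacher by hand means editing the same field in two JSON files. The sketch below shows one way to script that edit. It is an illustration under assumptions, not PaddleMIX code: the function name `set_teacher_config` is hypothetical, and it assumes `teacher_config` is a top-level key whose value is the teacher model name, which the note above suggests but does not spell out.

```python
import json
from pathlib import Path


def set_teacher_config(config_path, new_teacher):
    """Rewrite the `teacher_config` field of a JSON config in place.

    Returns the previous value so the caller can log or restore it.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: `teacher_config` is a top-level key holding the teacher name.
    previous = config.get("teacher_config")
    config["teacher_config"] = new_teacher
    path.write_text(json.dumps(config, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return previous
```

It would be run once for config.json and once for model_config.json in a local copy of `paddlemix/EVA/EVA02/eva02_Ti_for_pretrain`, passing "paddlemix/EVA/EVA02-CLIP-bigE-14" as the new teacher. All other keys, including `student_config`, are written back unchanged.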



### 4.2 Finetune training

Use `paddlemix/examples/eva02/run_eva02_finetune_dist.py`

Note:

1. If a distributed strategy is used, the parallel degrees must satisfy `nnodes * nproc_per_node == tensor_parallel_degree * sharding_parallel_degree * dp_parallel_degree`. `dp_parallel_degree` is computed from the other values, so make sure that `nnodes * nproc_per_node >= tensor_parallel_degree * sharding_parallel_degree`

### 4.2 Finetuning
2. To train `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14`, you must load **its corresponding pretrained weights** `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14`: set the absolute path of the pretrained `model_state.pdparams`, or download it separately from [this link](https://bj.bcebos.com/v1/paddlenlp/models/community/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14/model_state.pdparams) and place it there.

Use `paddlemix/examples/eva02/run_eva02_finetune_dist.py`
3. tiny/s are trained at 336 resolution and B/L at 448 resolution, while their pretrained weights were all obtained by training at 224 resolution.
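
The parallel-degree relation in note 1 and the total batch size arithmetic used in the script comments can be worked through in a few lines. This is a sketch with hypothetical helper names (`derive_dp_degree`, `total_batch_size`), not the launcher's actual logic. The divisibility check is an inference from the equality above, since `dp_parallel_degree` has to be a whole number.

```python
def derive_dp_degree(nnodes, nproc_per_node, tensor_parallel_degree, sharding_parallel_degree):
    """Derive dp_parallel_degree from the other degrees, as note 1 describes."""
    world_size = nnodes * nproc_per_node
    model_parallel = tensor_parallel_degree * sharding_parallel_degree
    # The README requires world_size >= model_parallel; an integer dp degree
    # additionally needs world_size to be a multiple of model_parallel.
    if world_size < model_parallel or world_size % model_parallel != 0:
        raise ValueError(
            f"{world_size} processes cannot be split into groups of {model_parallel}"
        )
    return world_size // model_parallel


def total_batch_size(bsz_per_gpu, gpus_per_node, nnodes, update_freq):
    """Effective batch size: per-GPU batch * GPUs per node * nodes * update_freq."""
    return bsz_per_gpu * gpus_per_node * nnodes * update_freq
```

With the finetune settings in this README (1 node, 8 GPUs, `MP_DEGREE=1`, `SHARDING_DEGREE=1`), `derive_dp_degree(1, 8, 1, 1)` gives 8, matching `DP_DEGREE=8`, and `total_batch_size(128, 8, 1, 1)` gives the 1024 stated in the `batch_size` comment.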


Example training command and parameter configuration, here for a single machine with 8 GPUs:
```shell
export FLAGS_embedding_deterministic=1
export FLAGS_cudnn_deterministic=1
@@ -257,30 +267,31 @@ CLIP_GRAD=0.0
num_train_epochs=100
save_epochs=2 # save every 2 epochs

warmup_epochs=5 # set 0 will fast convergence in 0 epoch
warmup_epochs=5 # set 0 will fast convergence in 1 epoch
warmup_steps=0
drop_path=0.1

TRAINING_MODEL_RESUME="None"
TRAINER_INSTANCES='127.0.0.1'
MASTER='127.0.0.1:8080'

TRAINERS_NUM=1
TRAINERS_NUM=1 # machine num
TRAINING_GPUS_PER_NODE=8
DP_DEGREE=8
MP_DEGREE=1
SHARDING_DEGREE=1
DP_DEGREE=8 # dp_parallel_degree
MP_DEGREE=1 # tensor_parallel_degree
SHARDING_DEGREE=1 # sharding_parallel_degree

MODEL_NAME="paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14"
PRETRAIN_CKPT=https://bj.bcebos.com/v1/paddlenlp/models/community/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14/model_state.pdparams
PRETRAIN_CKPT=/root/.paddlenlp/models/paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_p14/model_state.pdparams # pretrained model, input_size is 224

OUTPUT_DIR=./output/eva02_Ti_pt_in21k_ft_in1k_p14

OUTPUT_DIR=./output/finetune_eva02_ti
DATA_PATH=./dataset/ILSVRC2012 # put your ImageNet-1k val data path

DATA_PATH=./dataset/ILSVRC2012
input_size=336
batch_size=128 # 128(bsz_per_gpu)*8(#gpus_per_node)*1(#nodes)*1(update_freq)=1024(total_bsz)
num_workers=10
accum_freq=2 # update_freq
accum_freq=1 # update_freq
logging_steps=10 # print_freq
seed=0

@@ -298,8 +309,6 @@ ${TRAINING_PYTHON} paddlemix/examples/eva02/run_eva02_finetune_dist.py \
--input_size ${input_size} \
--layer_decay ${layer_decay} \
--drop_path ${drop_path} \
--smoothing ${smoothing} \
--do_train \
--optim ${optim} \
--learning_rate ${lr} \
--weight_decay ${weight_decay} \
@@ -332,26 +341,31 @@ ${TRAINING_PYTHON} paddlemix/examples/eva02/run_eva02_finetune_dist.py \
--fp16 ${USE_AMP} \
```

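Note that tiny/s are trained at 336 resolution and B/L at 448 resolution, while their pretrained weights are all at 224 resolution.

### 4.3 Evaluation

Use `paddlemix/examples/eva02/run_eva02_finetune_eval.py`

### 4.3 Evaluation
Note:

1. By default the trained weights in the downloaded `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14` are loaded, so PRETRAIN_CKPT=None. To evaluate weights newly trained locally, set PRETRAIN_CKPT to their actual path to load and evaluate them;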
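
The rule for when a local checkpoint replaces the downloaded weights can be isolated as a small predicate. The sketch below mirrors the condition this commit adds to `run_eva02_finetune_eval.py`; the function name `should_load_checkpoint` is hypothetical and exists only for illustration. The string comparison is needed because the shell script passes the literal text `None` on the command line.

```python
def should_load_checkpoint(pretrained_model_path, resume_from_checkpoint=None):
    """Return True when an explicit checkpoint path should override the default weights."""
    return (
        bool(pretrained_model_path)              # unset or empty means no override
        and pretrained_model_path != "None"      # PRETRAIN_CKPT=None arrives as a string
        and resume_from_checkpoint is None       # a resume checkpoint takes precedence
    )
```

Under this rule `PRETRAIN_CKPT=None` keeps the weights loaded by `from_pretrained`, while a path such as the `checkpoint-best/model_state.pdparams` example in the script comment triggers an explicit load.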

Use `paddlemix/examples/eva02/run_eva02_finetune_eval.py`

```shell
MODEL_NAME="paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14"
DATA_PATH=./datasets/ILSVRC2012
DATA_PATH=./dataset/ILSVRC2012 # put your ImageNet-1k val data path
OUTPUT_DIR=./outputs

input_size=336
batch_size=128
num_workers=10

PRETRAIN_CKPT=None # output/eva02_Ti_pt_in21k_ft_in1k_p14/checkpoint-best/model_state.pdparams

CUDA_VISIBLE_DEVICES=0 python paddlemix/examples/eva02/run_eva02_finetune_eval.py \
--do_eval \
--model ${MODEL_NAME} \
--pretrained_model_path ${PRETRAIN_CKPT} \
--eval_data_path ${DATA_PATH}/val \
--input_size ${input_size} \
--per_device_eval_batch_size ${batch_size} \
@@ -364,7 +378,7 @@ CUDA_VISIBLE_DEVICES=0 python paddlemix/examples/eva02/run_eva02_finetune_eval.p
```
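# Parameter description
--model #The model actually used. The example is `EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14`; note that the name must start with `EVA/EVA02/`, and the model after that prefix can be replaced as needed
--model #The model actually used. The example is `paddlemix/EVA/EVA02/eva02_Ti_pt_in21k_ft_in1k_p14`, which is downloaded automatically; a path on the local machine also works, and the model can be replaced as needed
--eval_data_path #Path to the evaluation data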
8 changes: 8 additions & 0 deletions paddlemix/examples/eva02/run_eva02_finetune_eval.py
@@ -23,6 +23,7 @@
import paddle
from paddlenlp.trainer import PdArgumentParser

from paddlemix.checkpoint import load_model
from paddlemix.datasets.dataset import ImageFolder
from paddlemix.examples.eva02.run_eva02_finetune_dist import (
    Collator,
@@ -59,6 +60,13 @@ def main_worker(training_args, model_args, data_args):
    model = EVA02VisionTransformer.from_pretrained(model_args.model, ignore_mismatched_sizes=False)
    model.eval()

    if (
        training_args.pretrained_model_path
        and training_args.pretrained_model_path != "None"
        and training_args.resume_from_checkpoint is None
    ):
        load_model(training_args, model, ckpt_dir=training_args.pretrained_model_path)

    eval_dataset = ImageFolder(root=f"{data_args.eval_data_path}")
    image_processor = EVA02FinetuneImageProcessor.from_pretrained(os.path.join(model_args.model, "processor", "eval"))
    processor = EVA02Processor(image_processor)
8 changes: 0 additions & 8 deletions paddlemix/examples/evaclip/README.md
@@ -67,14 +67,6 @@ python setup.py install --prefix=$INSTALL_DIR
export $PATH=$PATH:$INSTALL_DIR
```

4) Install paddlemix

```
git clone git@github.com:PaddlePaddle/PaddleMIX.git
cd PaddleMix
python setup.py install
```

## 3. Data preparation

1) COCO data