diff --git a/image_classification/MAE/README.md b/image_classification/MAE/README.md
deleted file mode 100644
index 8db9f25b..00000000
--- a/image_classification/MAE/README.md
+++ /dev/null
@@ -1,174 +0,0 @@
-# TODO: This README should be modified
-# An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale, [arxiv](https://arxiv.org/abs/2010.11929)
-
-PaddlePaddle training/validation code and pretrained models for **ViT**.
-
-The official TF implementation is [here](https://github.com/google-research/vision_transformer).
-
-This implementation is developed by [PaddleViT](https://github.com/BR-IDL/PaddleViT.git).
-
-
-
-
-
-ViT Model Overview
-
-
-
-### Update
-- Update (2021-09-27): More weights are uploaded.
-- Update (2021-08-11): Code is released and ported weights are uploaded.
-
-## Models Zoo
-| Model | Acc@1 | Acc@5 | #Params | FLOPs | Image Size | Crop_pct | Interpolation | Link |
-|-------------------------------|-------|-------|---------|--------|------------|----------|---------------|--------------|
-| vit_base_patch32_224 | 80.68 | 95.61 | 88.2M | 4.4G | 224 | 0.875 | bicubic | [google](https://drive.google.com/file/d/1DPEhEuu9sDdcmOPukQbR7ZcHq2bxx9cr/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1ppOLj5SWlJmA-NjoLCoYIw)(ubyr) |
-| vit_base_patch32_384 | 83.35 | 96.84 | 88.2M | 12.7G | 384 | 1.0 | bicubic | [google](https://drive.google.com/file/d/1nCOSwrDiFBFmTkLEThYwjL9SfyzkKoaf/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1jxnL00ocpmdiPM4fOu4lpg)(3c2f) |
-| vit_base_patch16_224 | 84.58 | 97.30 | 86.4M | 17.0G | 224 | 0.875 | bicubic | [google](https://drive.google.com/file/d/13D9FqU4ISsGxWXURgKW9eLOBV-pYPr-L/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1ms3o2fHMQpIoVqnEHitRtA)(qv4n) |
-| vit_base_patch16_384 | 85.99 | 98.00 | 86.4M | 49.8G | 384 | 1.0 | bicubic | [google](https://drive.google.com/file/d/1kWKaAgneDx0QsECxtf7EnUdUZej6vSFT/view?usp=sharing)/[baidu](https://pan.baidu.com/s/15ggLdiL98RPcz__SXorrXA)(wsum) |
-| vit_large_patch16_224 | 85.81 | 97.82 | 304.1M | 59.9G | 224 | 0.875 | bicubic | [google](https://drive.google.com/file/d/1jgwtmtp_cDWEhZE-FuWhs7lCdpqhAMft/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1HRxUJAwEiKgrWnJSjHyU0A)(1bgk) |
-| vit_large_patch16_384 | 87.08 | 98.30 | 304.1M | 175.9G | 384 | 1.0 | bicubic | [google](https://drive.google.com/file/d/1zfw5mdiIm-mPxxQddBFxt0xX-IR-PF2U/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1KvxfIpMeitgXAUZGr5HV8A)(5t91) |
-| vit_large_patch32_384 | 81.51 | 96.09 | 306.5M | 44.4G | 384 | 1.0 | bicubic | [google](https://drive.google.com/file/d/1Py1EX3E35jL7DComW-29Usg9788BB26j/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1W8sUs0pObOGpohP4vsT05w)(ieg3) |
-| | | | | | | | | |
-
-> *The results are evaluated on the ImageNet2012 validation set.*
-
-## Notebooks
-We provide a few notebooks on AI Studio to help you get started:
-
-**\*(coming soon)\***
-
-
-## Requirements
-- Python>=3.6
-- yaml>=0.2.5
-- [PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html)>=2.1.0
-- [yacs](https://github.com/rbgirshick/yacs)>=0.1.8
-
-## Data
-The ImageNet2012 dataset is used, organized in the following folder structure:
-```
-│imagenet/
-├──train/
-│ ├── n01440764
-│ │ ├── n01440764_10026.JPEG
-│ │ ├── n01440764_10027.JPEG
-│ │ ├── ......
-│ ├── ......
-├──val/
-│ ├── n01440764
-│ │ ├── ILSVRC2012_val_00000293.JPEG
-│ │ ├── ILSVRC2012_val_00002138.JPEG
-│ │ ├── ......
-│ ├── ......
-```
-
-## Usage
-To use the model with pretrained weights, download the `.pdparams` weight file and update the related file paths in the following Python scripts. The model config files are located in `./configs/`.
-
-For example, assuming the downloaded weight file is stored in `./vit_base_patch16_224.pdparams`, to use the `vit_base_patch16_224` model in Python:
-```python
-import paddle
-from config import get_config
-from transformer import build_vit as build_model
-# config files in ./configs/
-config = get_config('./configs/vit_base_patch16_224.yaml')
-# build model
-model = build_model(config)
-# load pretrained weights from the downloaded .pdparams file
-model_state_dict = paddle.load('./vit_base_patch16_224.pdparams')
-model.set_dict(model_state_dict)
-```
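-
-To verify that the weights are loaded correctly, you can run a quick forward pass on a random batch. This is a minimal sketch continuing the snippet above; the 224x224 input size and the 1000-class output are assumptions based on the ImageNet config:
-```python
-import paddle
-# dummy batch of 4 RGB images at the configured resolution
-images = paddle.randn([4, 3, 224, 224])
-model.eval()
-with paddle.no_grad():
-    logits = model(images)
-print(logits.shape)  # expected: [4, 1000] for ImageNet-1k classification
-```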
-
-## Evaluation
-To evaluate ViT model performance on ImageNet2012 with a single GPU, run the following script from the command line:
-```shell
-sh run_eval.sh
-```
-or
-```shell
-CUDA_VISIBLE_DEVICES=0 \
-python main_single_gpu.py \
- -cfg='./configs/vit_base_patch16_224.yaml' \
- -dataset='imagenet2012' \
- -batch_size=16 \
- -data_path='/dataset/imagenet' \
- -eval \
- -pretrained='./vit_base_patch16_224.pdparams'
-```
-
-
-
-
-Run evaluation using multiple GPUs:
-
-
-
-```shell
-sh run_eval_multi.sh
-```
-or
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python main_multi_gpu.py \
- -cfg='./configs/vit_base_patch16_224.yaml' \
- -dataset='imagenet2012' \
- -batch_size=16 \
- -data_path='/dataset/imagenet' \
- -eval \
- -pretrained='./vit_base_patch16_224.pdparams'
-```
-
-
-
-
-## Training
-To train the ViT model on ImageNet2012 with a single GPU, run the following script from the command line:
-```shell
-sh run_train.sh
-```
-or
-```shell
-CUDA_VISIBLE_DEVICES=0 \
-python main_single_gpu.py \
- -cfg='./configs/vit_base_patch16_224.yaml' \
- -dataset='imagenet2012' \
- -batch_size=32 \
-  -data_path='/dataset/imagenet'
-```
-
-
-
-
-
-Run training using multiple GPUs:
-
-
-
-```shell
-sh run_train_multi.sh
-```
-or
-```shell
-CUDA_VISIBLE_DEVICES=0,1,2,3 \
-python main_multi_gpu.py \
- -cfg='./configs/vit_base_patch16_224.yaml' \
- -dataset='imagenet2012' \
- -batch_size=16 \
-  -data_path='/dataset/imagenet'
-```
-
-
-
-
-
-## Attention Map Visualization
-**(coming soon)**
-
-## Reference
-```
-@article{dosovitskiy2020image,
- title={An image is worth 16x16 words: Transformers for image recognition at scale},
- author={Dosovitskiy, Alexey and Beyer, Lucas and Kolesnikov, Alexander and Weissenborn, Dirk and Zhai, Xiaohua and Unterthiner, Thomas and Dehghani, Mostafa and Minderer, Matthias and Heigold, Georg and Gelly, Sylvain and others},
- journal={arXiv preprint arXiv:2010.11929},
- year={2020}
-}
-```
diff --git a/image_classification/MAE/augment.py b/image_classification/MAE/augment.py
deleted file mode 100644
index 7a7f081c..00000000
--- a/image_classification/MAE/augment.py
+++ /dev/null
@@ -1,285 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""Augmentation"""
-""" Rand Augmentation """
-# reference: RandAugment: Practical automated data augmentation with a reduced search space
-# https://arxiv.org/abs/1909.13719
-
-""" Auto Augmentation """
-# reference: AutoAugment: Learning Augmentation Policies from Data
-# https://arxiv.org/abs/1805.09501
-
-import random
-import numpy as np
-from PIL import Image, ImageEnhance, ImageOps
-
-
-def auto_augment_policy_original():
- """25 types of augment policies in original paper"""
- policy = [
- [('Posterize', 0.4, 8), ('Rotate', 0.6, 9)],
- [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
- [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
- [('Posterize', 0.6, 7), ('Posterize', 0.6, 6)],
- [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
- [('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
- [('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
- [('Posterize', 0.8, 5), ('Equalize', 1.0, 2)],
- [('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
- [('Equalize', 0.6, 8), ('Posterize', 0.4, 6)],
- [('Rotate', 0.8, 8), ('Color', 0.4, 0)],
- [('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
- [('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
- [('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
- [('Color', 0.6, 4), ('Contrast', 1.0, 8)],
- [('Rotate', 0.8, 8), ('Color', 1.0, 2)],
- [('Color', 0.8, 8), ('Solarize', 0.8, 7)],
- [('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
- [('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
- [('Color', 0.4, 0), ('Equalize', 0.6, 3)],
- [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
- [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
- [('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
- [('Color', 0.6, 4), ('Contrast', 1.0, 8)],
- [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
- ]
- policy = [[SubPolicy(*args) for args in subpolicy] for subpolicy in policy]
- return policy
-
-
-def rand_augment_policy_original(magnitude_idx=9):
- """
- 14 types of augment policies in original paper
- Args:
- magnitude_idx: M
- """
- policy = [
- ('Posterize', 1, magnitude_idx), ('Rotate', 1, magnitude_idx),
- ('Solarize', 1, magnitude_idx), ('AutoContrast', 1, magnitude_idx),
- ('Equalize', 1, magnitude_idx), ('Contrast', 1, magnitude_idx),
- ('Color', 1, magnitude_idx), ('Invert', 1, magnitude_idx),
- ('Sharpness', 1, magnitude_idx), ('Brightness', 1, magnitude_idx),
- ('ShearX', 1, magnitude_idx), ('ShearY', 1, magnitude_idx),
- ('TranslateX', 1, magnitude_idx), ('TranslateY', 1, magnitude_idx),
- ]
- policy = [SubPolicy(*args) for args in policy]
- return policy
-
-
-class AutoAugment():
- """Auto Augment
-    Randomly choose a tuple of augment ops from a list of policies,
-    then apply the chosen ops to the input image
-
- Examples:
- policy = auto_augment_policy_original()
- augment = AutoAugment(policy)
- transformed_image = augment(image)
- """
-
- def __init__(self, policy):
- self.policy = policy
-
- def __call__(self, image, policy_idx=None):
- if policy_idx is None:
- policy_idx = random.randint(0, len(self.policy) - 1)
-
- sub_policy = self.policy[policy_idx]
- for op in sub_policy:
- image = op(image)
- return image
-
-
-class RandAugment():
- """Rand Augment
-    Randomly choose N augment ops from a list of K policies,
-    then apply the N ops to the input image
-
- Examples:
- policy = rand_augment_policy_original(magnitude_idx)
- augment = RandAugment(policy)
- transformed_image = augment(image)
- """
-
- def __init__(self, policy, num_layers=2):
- """
- Args:
- policy: list of SubPolicy
- num_layers: int
- """
- self.policy = policy
- self.num_layers = num_layers
-
- def __call__(self, image):
- selected_idx = np.random.choice(len(self.policy), self.num_layers)
-
- for policy_idx in selected_idx:
- sub_policy = self.policy[policy_idx]
- image = sub_policy(image)
- return image
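-
-# A minimal usage sketch (the transform pipeline below is an assumption, not part
-# of this module): RandAugment operates on PIL images, so it is typically placed
-# before ToTensor in a paddle.vision.transforms pipeline, e.g.
-#
-#   from paddle.vision import transforms
-#   train_transforms = transforms.Compose([
-#       transforms.RandomResizedCrop(224),
-#       RandAugment(rand_augment_policy_original(magnitude_idx=9), num_layers=2),
-#       transforms.ToTensor(),
-#       transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
-#   ])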
-
-
-class SubPolicy:
- """Subpolicy
- Read augment name and magnitude, apply augment with probability
- Args:
- op_name: str, augment operation name
-        prob: float, probability of applying the augment (applied when prob > a random number in [0, 1))
- magnitude_idx: int, index of magnitude in preset magnitude ranges
- """
-
- def __init__(self, op_name, prob, magnitude_idx):
- # ranges of operations' magnitude
- ranges = {
- 'ShearX': np.linspace(0, 0.3, 10), # [-0.3, 0.3] (by random negative)
- 'ShearY': np.linspace(0, 0.3, 10), # [-0.3, 0.3] (by random negative)
- 'TranslateX': np.linspace(0, 150 / 331, 10), # [-0.45, 0.45] (by random negative)
- 'TranslateY': np.linspace(0, 150 / 331, 10), # [-0.45, 0.45] (by random negative)
- 'Rotate': np.linspace(0, 30, 10), # [-30, 30] (by random negative)
- 'Color': np.linspace(0, 0.9, 10), # [-0.9, 0.9] (by random negative)
-            'Posterize': np.round(np.linspace(8, 4, 10), 0).astype(int),  # [4, 8]
- 'Solarize': np.linspace(256, 0, 10), # [0, 256]
- 'Contrast': np.linspace(0, 0.9, 10), # [-0.9, 0.9] (by random negative)
- 'Sharpness': np.linspace(0, 0.9, 10), # [-0.9, 0.9] (by random negative)
- 'Brightness': np.linspace(0, 0.9, 10), # [-0.9, 0.9] (by random negative)
- 'AutoContrast': [0] * 10, # no range
- 'Equalize': [0] * 10, # no range
- 'Invert': [0] * 10, # no range
- }
-
- # augmentation operations
- # Lambda is not pickleable for DDP
- # image_ops = {
- # 'ShearX': lambda image, magnitude: shear_x(image, magnitude),
- # 'ShearY': lambda image, magnitude: shear_y(image, magnitude),
- # 'TranslateX': lambda image, magnitude: translate_x(image, magnitude),
- # 'TranslateY': lambda image, magnitude: translate_y(image, magnitude),
- # 'Rotate': lambda image, magnitude: rotate(image, magnitude),
- # 'AutoContrast': lambda image, magnitude: auto_contrast(image, magnitude),
- # 'Invert': lambda image, magnitude: invert(image, magnitude),
- # 'Equalize': lambda image, magnitude: equalize(image, magnitude),
- # 'Solarize': lambda image, magnitude: solarize(image, magnitude),
- # 'Posterize': lambda image, magnitude: posterize(image, magnitude),
- # 'Contrast': lambda image, magnitude: contrast(image, magnitude),
- # 'Color': lambda image, magnitude: color(image, magnitude),
- # 'Brightness': lambda image, magnitude: brightness(image, magnitude),
- # 'Sharpness': lambda image, magnitude: sharpness(image, magnitude),
- # }
- image_ops = {
- 'ShearX': shear_x,
- 'ShearY': shear_y,
- 'TranslateX': translate_x_relative,
- 'TranslateY': translate_y_relative,
- 'Rotate': rotate,
- 'AutoContrast': auto_contrast,
- 'Invert': invert,
- 'Equalize': equalize,
- 'Solarize': solarize,
- 'Posterize': posterize,
- 'Contrast': contrast,
- 'Color': color,
- 'Brightness': brightness,
- 'Sharpness': sharpness,
- }
-
- self.prob = prob
- self.magnitude = ranges[op_name][magnitude_idx]
- self.op = image_ops[op_name]
-
- def __call__(self, image):
- if self.prob > random.random():
- image = self.op(image, self.magnitude)
- return image
-
-
-# PIL Image transforms
-# https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.transform
-def shear_x(image, magnitude, fillcolor=(128, 128, 128)):
- factor = magnitude * random.choice([-1, 1]) # random negative
- return image.transform(image.size, Image.AFFINE, (1, factor, 0, 0, 1, 0), fillcolor=fillcolor)
-
-
-def shear_y(image, magnitude, fillcolor=(128, 128, 128)):
- factor = magnitude * random.choice([-1, 1]) # random negative
- return image.transform(image.size, Image.AFFINE, (1, 0, 0, factor, 1, 0), fillcolor=fillcolor)
-
-
-def translate_x_relative(image, magnitude, fillcolor=(128, 128, 128)):
- pixels = magnitude * image.size[0]
- pixels = pixels * random.choice([-1, 1]) # random negative
- return image.transform(image.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0), fillcolor=fillcolor)
-
-
-def translate_y_relative(image, magnitude, fillcolor=(128, 128, 128)):
-    pixels = magnitude * image.size[1]  # use image height for vertical translation
- pixels = pixels * random.choice([-1, 1]) # random negative
- return image.transform(image.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels), fillcolor=fillcolor)
-
-
-def translate_x_absolute(image, magnitude, fillcolor=(128, 128, 128)):
- magnitude = magnitude * random.choice([-1, 1]) # random negative
- return image.transform(image.size, Image.AFFINE, (1, 0, magnitude, 0, 1, 0), fillcolor=fillcolor)
-
-
-def translate_y_absolute(image, magnitude, fillcolor=(128, 128, 128)):
- magnitude = magnitude * random.choice([-1, 1]) # random negative
- return image.transform(image.size, Image.AFFINE, (1, 0, 0, 0, 1, magnitude), fillcolor=fillcolor)
-
-
-def rotate(image, magnitude):
- rot = image.convert("RGBA").rotate(magnitude)
- return Image.composite(rot,
- Image.new('RGBA', rot.size, (128,) * 4),
- rot).convert(image.mode)
-
-
-def auto_contrast(image, magnitude=None):
- return ImageOps.autocontrast(image)
-
-
-def invert(image, magnitude=None):
- return ImageOps.invert(image)
-
-
-def equalize(image, magnitude=None):
- return ImageOps.equalize(image)
-
-
-def solarize(image, magnitude):
- return ImageOps.solarize(image, magnitude)
-
-
-def posterize(image, magnitude):
- return ImageOps.posterize(image, magnitude)
-
-
-def contrast(image, magnitude):
- magnitude = magnitude * random.choice([-1, 1]) # random negative
- return ImageEnhance.Contrast(image).enhance(1 + magnitude)
-
-
-def color(image, magnitude):
- magnitude = magnitude * random.choice([-1, 1]) # random negative
- return ImageEnhance.Color(image).enhance(1 + magnitude)
-
-
-def brightness(image, magnitude):
- magnitude = magnitude * random.choice([-1, 1]) # random negative
- return ImageEnhance.Brightness(image).enhance(1 + magnitude)
-
-
-def sharpness(image, magnitude):
- magnitude = magnitude * random.choice([-1, 1]) # random negative
- return ImageEnhance.Sharpness(image).enhance(1 + magnitude)
-
diff --git a/image_classification/MAE/configs/vit_base_patch16_224_finetune.yaml b/image_classification/MAE/configs/vit_base_patch16_224_finetune.yaml
deleted file mode 100644
index 9cee1446..00000000
--- a/image_classification/MAE/configs/vit_base_patch16_224_finetune.yaml
+++ /dev/null
@@ -1,42 +0,0 @@
-DATA:
- IMAGE_SIZE: 224
- CROP_PCT: 0.875
-MODEL:
- TYPE: MAE
- NAME: vit_base_patch16_224
- DROPPATH: 0.1
- TRANS:
- PATCH_SIZE: 16
- MLP_RATIO: 4.0
- QKV_BIAS: true
- MASK_RATIO: 0.75
- ENCODER:
- EMBED_DIM: 768
- DEPTH: 12
- NUM_HEADS: 12
-
-TRAIN:
- NUM_EPOCHS: 100
- WARMUP_EPOCHS: 5
- WEIGHT_DECAY: 0.05
- BASE_LR: 1e-3
- WARMUP_START_LR: 1e-6
- ACCUM_ITER: 2 # the total batch size should be 1024
-
- LR_SCHEDULER:
- NAME: 'warmupcosine'
-
- OPTIMIZER:
- NAME: 'AdamW'
- BETAS: (0.9, 0.999)
-
- SMOOTHING: 0.1
- RAND_AUGMENT: True
- RAND_AUGMENT_LAYERS: 9
- RAND_AUGMENT_MAGNITUDE: 5
- MIXUP_ALPHA: 0.8
- MIXUP_PROB: 1.0
- MIXUP_SWITCH_PROB: 0.5
- MIXUP_MODE: 'batch'
- CUTMIX_ALPHA: 1.0
- CUTMIX_MINMAX: None
\ No newline at end of file
diff --git a/image_classification/MAE/configs/vit_base_patch16_224_pretrain.yaml b/image_classification/MAE/configs/vit_base_patch16_224_pretrain.yaml
deleted file mode 100644
index 5eb52f39..00000000
--- a/image_classification/MAE/configs/vit_base_patch16_224_pretrain.yaml
+++ /dev/null
@@ -1,36 +0,0 @@
-DATA:
- IMAGE_SIZE: 224
- CROP_PCT: 0.875
-MODEL:
- TYPE: MAE
- NAME: vit_base_patch16_224
- DROPPATH: 0.0
- MAE_PRETRAIN: True
- TRANS:
- PATCH_SIZE: 16
- MLP_RATIO: 4.0
- QKV_BIAS: true
- MASK_RATIO: 0.75
- ENCODER:
- EMBED_DIM: 768
- DEPTH: 12
- NUM_HEADS: 12
- DECODER:
- EMBED_DIM: 512
- DEPTH: 8
- NUM_HEADS: 8
-TRAIN:
- NUM_EPOCHS: 800
- WARMUP_EPOCHS: 40
- WEIGHT_DECAY: 0.05
- BASE_LR: 1.5e-4
- WARMUP_START_LR: 1e-6
- GRAD_CLIP: None
- ACCUM_ITER: 2 # the total batch size should be 4096
-
- LR_SCHEDULER:
- NAME: 'warmupcosine'
-
- OPTIMIZER:
- NAME: 'AdamW'
- BETAS: (0.9, 0.95)
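-
-# A sketch of how this pre-training config might be launched (batch size and paths
-# are assumptions; the flag names follow main_multi_gpu_pretrain.py in this folder):
-#   python main_multi_gpu_pretrain.py \
-#     -cfg='./configs/vit_base_patch16_224_pretrain.yaml' \
-#     -dataset='imagenet2012' \
-#     -batch_size=256 \
-#     -data_path='/dataset/imagenet' \
-#     -amp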
diff --git a/image_classification/MAE/configs/vit_base_patch16_224_pretrain_dec1.yaml b/image_classification/MAE/configs/vit_base_patch16_224_pretrain_dec1.yaml
deleted file mode 100644
index c4284444..00000000
--- a/image_classification/MAE/configs/vit_base_patch16_224_pretrain_dec1.yaml
+++ /dev/null
@@ -1,37 +0,0 @@
-DATA:
- IMAGE_SIZE: 224
- CROP_PCT: 0.875
-MODEL:
- TYPE: MAE
- NAME: vit_base_patch16_224_dec1
- DROPPATH: 0.0
- MAE_PRETRAIN: True
- TRANS:
- PATCH_SIZE: 16
- MLP_RATIO: 4.0
- QKV_BIAS: true
- MASK_RATIO: 0.75
- ENCODER:
- EMBED_DIM: 768
- DEPTH: 12
- NUM_HEADS: 12
- DECODER:
- EMBED_DIM: 512
- DEPTH: 1
- NUM_HEADS: 8
-TRAIN:
- NUM_EPOCHS: 800
- WARMUP_EPOCHS: 40
- WEIGHT_DECAY: 0.05
- BASE_LR: 1.5e-4
- WARMUP_START_LR: 1e-6
- GRAD_CLIP: None
-  ACCUM_ITER: 2 # with 8 GPUs the actual batch size is only 2048; the total batch size should be 4096
- LINEAR_SCALED_LR: None
-
- LR_SCHEDULER:
- NAME: 'warmupcosine'
-
- OPTIMIZER:
- NAME: 'AdamW'
- BETAS: (0.9, 0.95)
diff --git a/image_classification/MAE/configs/vit_large_patch16_224_finetune.yaml b/image_classification/MAE/configs/vit_large_patch16_224_finetune.yaml
deleted file mode 100644
index 11136830..00000000
--- a/image_classification/MAE/configs/vit_large_patch16_224_finetune.yaml
+++ /dev/null
@@ -1,42 +0,0 @@
-DATA:
- IMAGE_SIZE: 224
- CROP_PCT: 0.875
-MODEL:
- TYPE: MAE
- NAME: vit_large_patch16_224
- DROPPATH: 0.1
- TRANS:
- PATCH_SIZE: 16
- MLP_RATIO: 4.0
- QKV_BIAS: true
- MASK_RATIO: 0.75
- ENCODER:
- EMBED_DIM: 768
- DEPTH: 12
- NUM_HEADS: 12
-
-TRAIN:
- NUM_EPOCHS: 50
- WARMUP_EPOCHS: 5
- WEIGHT_DECAY: 0.05
- BASE_LR: 1e-3
- WARMUP_START_LR: 1e-6
- ACCUM_ITER: 2 # the total batch size should be 1024
-
- LR_SCHEDULER:
- NAME: 'warmupcosine'
-
- OPTIMIZER:
- NAME: 'AdamW'
- BETAS: (0.9, 0.999)
-
- SMOOTHING: 0.1
- RAND_AUGMENT: True
- RAND_AUGMENT_LAYERS: 9
- RAND_AUGMENT_MAGNITUDE: 5
- MIXUP_ALPHA: 0.8
- MIXUP_PROB: 1.0
- MIXUP_SWITCH_PROB: 0.5
- MIXUP_MODE: 'batch'
- CUTMIX_ALPHA: 1.0
- CUTMIX_MINMAX: None
\ No newline at end of file
diff --git a/image_classification/MAE/configs/vit_large_patch16_224_pretrain.yaml b/image_classification/MAE/configs/vit_large_patch16_224_pretrain.yaml
deleted file mode 100644
index 04b5e086..00000000
--- a/image_classification/MAE/configs/vit_large_patch16_224_pretrain.yaml
+++ /dev/null
@@ -1,36 +0,0 @@
-DATA:
- IMAGE_SIZE: 224
- CROP_PCT: 0.875
-MODEL:
- TYPE: MAE
- NAME: vit_large_patch16_224
- DROPPATH: 0.0
- MAE_PRETRAIN: True
- TRANS:
- PATCH_SIZE: 16
- MLP_RATIO: 4.0
- QKV_BIAS: true
- MASK_RATIO: 0.75
- ENCODER:
- EMBED_DIM: 768
- DEPTH: 12
- NUM_HEADS: 12
- DECODER:
- EMBED_DIM: 512
- DEPTH: 8
- NUM_HEADS: 8
-TRAIN:
- NUM_EPOCHS: 800
- WARMUP_EPOCHS: 40
- WEIGHT_DECAY: 0.05
- BASE_LR: 1.5e-4
- WARMUP_START_LR: 1e-6
- GRAD_CLIP: None
- ACCUM_ITER: 2 # the total batch size should be 4096
-
- LR_SCHEDULER:
- NAME: 'warmupcosine'
-
- OPTIMIZER:
- NAME: 'AdamW'
- BETAS: (0.9, 0.95)
\ No newline at end of file
diff --git a/image_classification/MAE/main_multi_gpu_finetune.py b/image_classification/MAE/main_multi_gpu_finetune.py
deleted file mode 100644
index a6ace004..00000000
--- a/image_classification/MAE/main_multi_gpu_finetune.py
+++ /dev/null
@@ -1,580 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""MAE finetuning/validation using multiple GPU """
-
-import sys
-import os
-import time
-import logging
-import argparse
-import random
-import numpy as np
-import paddle
-import paddle.nn as nn
-import paddle.nn.functional as F
-import paddle.distributed as dist
-from datasets import get_dataloader
-from datasets import get_dataset
-from transformer import build_mae_finetune as build_model
-from losses import LabelSmoothingCrossEntropyLoss
-from losses import SoftTargetCrossEntropyLoss
-from utils import AverageMeter
-from utils import WarmupCosineScheduler
-from utils import get_exclude_from_weight_decay_fn
-from config import get_config
-from config import update_config
-from mixup import Mixup
-
-
-def get_arguments():
- """return argumeents, this will overwrite the config after loading yaml file"""
- parser = argparse.ArgumentParser('ViT')
- parser.add_argument('-cfg', type=str, default=None)
- parser.add_argument('-dataset', type=str, default=None)
- parser.add_argument('-batch_size', type=int, default=None)
- parser.add_argument('-image_size', type=int, default=None)
- parser.add_argument('-data_path', type=str, default=None)
- parser.add_argument('-output', type=str, default=None)
- parser.add_argument('-ngpus', type=int, default=None)
- parser.add_argument('-pretrained', type=str, default=None)
- parser.add_argument('-resume', type=str, default=None)
- parser.add_argument('-last_epoch', type=int, default=None)
- parser.add_argument('-eval', action='store_true')
- parser.add_argument('-mae_pretrain', action='store_true')
- parser.add_argument('-amp', action='store_true')
- arguments = parser.parse_args()
- return arguments
-
-
-def get_logger(filename, logger_name=None):
- """set logging file and format
- Args:
- filename: str, full path of the logger file to write
- logger_name: str, the logger name, e.g., 'master_logger', 'local_logger'
- Return:
- logger: python logger
- """
- log_format = "%(asctime)s %(message)s"
- logging.basicConfig(stream=sys.stdout, level=logging.INFO,
- format=log_format, datefmt="%m%d %I:%M:%S %p")
- # different name is needed when creating multiple logger in one process
- logger = logging.getLogger(logger_name)
- fh = logging.FileHandler(os.path.join(filename))
- fh.setFormatter(logging.Formatter(log_format))
- logger.addHandler(fh)
- return logger
-
-
-def train(dataloader,
- model,
- criterion,
- optimizer,
- epoch,
- total_epochs,
- total_batch,
- debug_steps=100,
- accum_iter=1,
- mixup_fn=None,
- amp=False,
- local_logger=None,
- master_logger=None):
- """Training for one epoch
- Args:
- dataloader: paddle.io.DataLoader, dataloader instance
- model: nn.Layer, a ViT model
- criterion: nn.criterion
- epoch: int, current epoch
- total_epochs: int, total num of epochs
- total_batch: int, total num of batches for one epoch
- debug_steps: int, num of iters to log info, default: 100
- accum_iter: int, num of iters for accumulating gradients, default: 1
- mixup_fn: Mixup, mixup instance, default: None
- amp: bool, if True, use mix precision training, default: False
- local_logger: logger for local process/gpu, default: None
- master_logger: logger for main process, default: None
- Returns:
- train_loss_meter.avg: float, average loss on current process/gpu
- train_acc_meter.avg: float, average top1 accuracy on current process/gpu
- master_train_loss_meter.avg: float, average loss on all processes/gpus
- master_train_acc_meter.avg: float, average top1 accuracy on all processes/gpus
- train_time: float, training time
- """
- model.train()
- train_loss_meter = AverageMeter()
- train_acc_meter = AverageMeter()
- master_train_loss_meter = AverageMeter()
- master_train_acc_meter = AverageMeter()
-
- if amp is True:
- scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
- time_st = time.time()
-
- for batch_id, data in enumerate(dataloader):
- image = data[0]
- label = data[1]
- label_orig = label.clone()
-
- if mixup_fn is not None:
- image, label = mixup_fn(image, label_orig)
-
- if amp is True: # mixed precision training
- with paddle.amp.auto_cast():
- output = model(image)
- loss = criterion(output, label)
- scaled = scaler.scale(loss)
- scaled.backward()
-            if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
- scaler.minimize(optimizer, scaled)
- optimizer.clear_grad()
- else: # full precision training
- output = model(image)
- loss = criterion(output, label)
- # NOTE: division may be needed depending on the loss function
- # Here no division is needed:
- # default 'reduction' param in nn.CrossEntropyLoss is set to 'mean'
- # loss = loss / accum_iter
- loss.backward()
-
- if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
- optimizer.step()
- optimizer.clear_grad()
-
- pred = F.softmax(output)
- if mixup_fn:
- acc = paddle.metric.accuracy(pred, label_orig)
- else:
- acc = paddle.metric.accuracy(pred, label_orig.unsqueeze(1))
-
- batch_size = paddle.to_tensor(image.shape[0])
-
- # sync from other gpus for overall loss and acc
- master_loss = loss.clone()
- master_acc = acc.clone()
- master_batch_size = batch_size.clone()
- dist.all_reduce(master_loss)
- dist.all_reduce(master_acc)
- dist.all_reduce(master_batch_size)
- master_loss = master_loss / dist.get_world_size()
- master_acc = master_acc / dist.get_world_size()
- master_train_loss_meter.update(master_loss.numpy()[0], master_batch_size.numpy()[0])
- master_train_acc_meter.update(master_acc.numpy()[0], master_batch_size.numpy()[0])
-
- train_loss_meter.update(loss.numpy()[0], batch_size.numpy()[0])
- train_acc_meter.update(acc.numpy()[0], batch_size.numpy()[0])
-
- if batch_id % debug_steps == 0:
- if local_logger:
- local_logger.info(
- f"Epoch[{epoch:03d}/{total_epochs:03d}], " +
- f"Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {train_loss_meter.avg:.4f}, " +
- f"Avg Acc: {train_acc_meter.avg:.4f}")
- if master_logger and dist.get_rank() == 0:
- master_logger.info(
- f"Epoch[{epoch:03d}/{total_epochs:03d}], " +
- f"Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {master_train_loss_meter.avg:.4f}, " +
- f"Avg Acc: {master_train_acc_meter.avg:.4f}")
-
- train_time = time.time() - time_st
- return (train_loss_meter.avg,
- train_acc_meter.avg,
- master_train_loss_meter.avg,
- master_train_acc_meter.avg,
- train_time)
-
-
-def validate(dataloader,
- model,
- criterion,
- total_batch,
- debug_steps=100,
- local_logger=None,
- master_logger=None):
- """Validation for whole dataset
- Args:
- dataloader: paddle.io.DataLoader, dataloader instance
- model: nn.Layer, a ViT model
- criterion: nn.criterion
-        total_batch: int, total num of batches for the whole dataset
- debug_steps: int, num of iters to log info, default: 100
- local_logger: logger for local process/gpu, default: None
- master_logger: logger for main process, default: None
- Returns:
- val_loss_meter.avg: float, average loss on current process/gpu
- val_acc1_meter.avg: float, average top1 accuracy on current process/gpu
- val_acc5_meter.avg: float, average top5 accuracy on current process/gpu
- master_val_loss_meter.avg: float, average loss on all processes/gpus
- master_val_acc1_meter.avg: float, average top1 accuracy on all processes/gpus
- master_val_acc5_meter.avg: float, average top5 accuracy on all processes/gpus
- val_time: float, validation time
- """
- model.eval()
- val_loss_meter = AverageMeter()
- val_acc1_meter = AverageMeter()
- val_acc5_meter = AverageMeter()
- master_val_loss_meter = AverageMeter()
- master_val_acc1_meter = AverageMeter()
- master_val_acc5_meter = AverageMeter()
- time_st = time.time()
-
- with paddle.no_grad():
- for batch_id, data in enumerate(dataloader):
- image = data[0]
- label = data[1]
-
- output = model(image)
- loss = criterion(output, label)
-
- pred = F.softmax(output)
- acc1 = paddle.metric.accuracy(pred, label.unsqueeze(1))
- acc5 = paddle.metric.accuracy(pred, label.unsqueeze(1), k=5)
-
- batch_size = paddle.to_tensor(image.shape[0])
-
- master_loss = loss.clone()
- master_acc1 = acc1.clone()
- master_acc5 = acc5.clone()
- master_batch_size = batch_size.clone()
-
- dist.all_reduce(master_loss)
- dist.all_reduce(master_acc1)
- dist.all_reduce(master_acc5)
- dist.all_reduce(master_batch_size)
- master_loss = master_loss / dist.get_world_size()
- master_acc1 = master_acc1 / dist.get_world_size()
- master_acc5 = master_acc5 / dist.get_world_size()
-
- master_val_loss_meter.update(master_loss.numpy()[0], master_batch_size.numpy()[0])
- master_val_acc1_meter.update(master_acc1.numpy()[0], master_batch_size.numpy()[0])
- master_val_acc5_meter.update(master_acc5.numpy()[0], master_batch_size.numpy()[0])
-
- val_loss_meter.update(loss.numpy()[0], batch_size.numpy()[0])
- val_acc1_meter.update(acc1.numpy()[0], batch_size.numpy()[0])
- val_acc5_meter.update(acc5.numpy()[0], batch_size.numpy()[0])
-
- if batch_id % debug_steps == 0:
- if local_logger:
- local_logger.info(
- f"Val Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {val_loss_meter.avg:.4f}, " +
- f"Avg Acc@1: {val_acc1_meter.avg:.4f}, " +
- f"Avg Acc@5: {val_acc5_meter.avg:.4f}")
- if master_logger and dist.get_rank() == 0:
- master_logger.info(
- f"Val Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {master_val_loss_meter.avg:.4f}, " +
- f"Avg Acc@1: {master_val_acc1_meter.avg:.4f}, " +
- f"Avg Acc@5: {master_val_acc5_meter.avg:.4f}")
- val_time = time.time() - time_st
- return (val_loss_meter.avg,
- val_acc1_meter.avg,
- val_acc5_meter.avg,
- master_val_loss_meter.avg,
- master_val_acc1_meter.avg,
- master_val_acc5_meter.avg,
- val_time)
-
-
-def main_worker(*args):
- # STEP 0: Preparation
- config = args[0]
- dist.init_parallel_env()
- last_epoch = config.TRAIN.LAST_EPOCH
- world_size = dist.get_world_size()
- local_rank = dist.get_rank()
- seed = config.SEED + local_rank
- paddle.seed(seed)
- np.random.seed(seed)
- random.seed(seed)
- # logger for each process/gpu
- local_logger = get_logger(
- filename=os.path.join(config.SAVE, 'log_{}.txt'.format(local_rank)),
- logger_name='local_logger')
- # overall logger
- if local_rank == 0:
- master_logger = get_logger(
- filename=os.path.join(config.SAVE, 'log.txt'),
- logger_name='master_logger')
- master_logger.info(f'\n{config}')
- else:
- master_logger = None
- local_logger.info(f'----- world_size = {world_size}, local_rank = {local_rank}')
- if local_rank == 0:
- master_logger.info(f'----- world_size = {world_size}, local_rank = {local_rank}')
-
- # STEP 1: Create model
- model = build_model(config)
- model = paddle.DataParallel(model)
-
- # STEP 2: Create train and val dataloader
- dataset_train, dataset_val = args[1], args[2]
- dataloader_train = get_dataloader(config, dataset_train, 'train', True)
- dataloader_val = get_dataloader(config, dataset_val, 'test', True)
- total_batch_train = len(dataloader_train)
- total_batch_val = len(dataloader_val)
- local_logger.info(f'----- Total # of train batch (single gpu): {total_batch_train}')
- local_logger.info(f'----- Total # of val batch (single gpu): {total_batch_val}')
- if local_rank == 0:
- master_logger.info(f'----- Total # of train batch (single gpu): {total_batch_train}')
- master_logger.info(f'----- Total # of val batch (single gpu): {total_batch_val}')
-
- # STEP 3: Define Mixup function
- mixup_fn = None
- if config.TRAIN.MIXUP_PROB > 0 or config.TRAIN.CUTMIX_ALPHA > 0 or config.TRAIN.CUTMIX_MINMAX is not None:
- mixup_fn = Mixup(mixup_alpha=config.TRAIN.MIXUP_ALPHA,
- cutmix_alpha=config.TRAIN.CUTMIX_ALPHA,
- cutmix_minmax=config.TRAIN.CUTMIX_MINMAX,
- prob=config.TRAIN.MIXUP_PROB,
- switch_prob=config.TRAIN.MIXUP_SWITCH_PROB,
- mode=config.TRAIN.MIXUP_MODE,
- label_smoothing=config.TRAIN.SMOOTHING)
-
- # STEP 4: Define criterion
- if config.TRAIN.MIXUP_PROB > 0.:
- criterion = SoftTargetCrossEntropyLoss()
- elif config.TRAIN.SMOOTHING:
- criterion = LabelSmoothingCrossEntropyLoss()
- else:
- criterion = nn.CrossEntropyLoss()
- # only use cross entropy for val
- criterion_val = nn.CrossEntropyLoss()
-
- # STEP 5: Define optimizer and lr_scheduler
-    # set lr according to batch size and world size (adapted from the official Swin Transformer code)
- if config.TRAIN.LINEAR_SCALED_LR is not None:
- linear_scaled_lr = (
- config.TRAIN.BASE_LR * config.DATA.BATCH_SIZE * world_size) / config.TRAIN.LINEAR_SCALED_LR
- linear_scaled_warmup_start_lr = (
- config.TRAIN.WARMUP_START_LR * config.DATA.BATCH_SIZE * world_size) / config.TRAIN.LINEAR_SCALED_LR
- linear_scaled_end_lr = (
- config.TRAIN.END_LR * config.DATA.BATCH_SIZE * world_size) / config.TRAIN.LINEAR_SCALED_LR
-
- if config.TRAIN.ACCUM_ITER > 1:
- linear_scaled_lr = linear_scaled_lr * config.TRAIN.ACCUM_ITER
- linear_scaled_warmup_start_lr = linear_scaled_warmup_start_lr * config.TRAIN.ACCUM_ITER
- linear_scaled_end_lr = linear_scaled_end_lr * config.TRAIN.ACCUM_ITER
-
- config.TRAIN.BASE_LR = linear_scaled_lr
- config.TRAIN.WARMUP_START_LR = linear_scaled_warmup_start_lr
- config.TRAIN.END_LR = linear_scaled_end_lr
-
- scheduler = None
- if config.TRAIN.LR_SCHEDULER.NAME == "warmupcosine":
- scheduler = WarmupCosineScheduler(learning_rate=config.TRAIN.BASE_LR,
- warmup_start_lr=config.TRAIN.WARMUP_START_LR,
- start_lr=config.TRAIN.BASE_LR,
- end_lr=config.TRAIN.END_LR,
- warmup_epochs=config.TRAIN.WARMUP_EPOCHS,
- total_epochs=config.TRAIN.NUM_EPOCHS,
- last_epoch=config.TRAIN.LAST_EPOCH,
- )
- elif config.TRAIN.LR_SCHEDULER.NAME == "cosine":
- scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=config.TRAIN.BASE_LR,
- T_max=config.TRAIN.NUM_EPOCHS,
- last_epoch=last_epoch)
-    elif config.TRAIN.LR_SCHEDULER.NAME == "multi-step":
- milestones = [int(v.strip())
- for v in config.TRAIN.LR_SCHEDULER.MILESTONES.split(",")]
- scheduler = paddle.optimizer.lr.MultiStepDecay(learning_rate=config.TRAIN.BASE_LR,
- milestones=milestones,
- gamma=config.TRAIN.LR_SCHEDULER.DECAY_RATE,
- last_epoch=last_epoch)
- else:
- local_logger.fatal(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
- if local_rank == 0:
- master_logger.fatal(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
- raise NotImplementedError(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
-
- if config.TRAIN.OPTIMIZER.NAME == "SGD":
- if config.TRAIN.GRAD_CLIP:
- clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
- else:
- clip = None
- optimizer = paddle.optimizer.Momentum(
- parameters=model.parameters(),
- learning_rate=scheduler if scheduler is not None else config.TRAIN.BASE_LR,
- weight_decay=config.TRAIN.WEIGHT_DECAY,
- momentum=config.TRAIN.OPTIMIZER.MOMENTUM,
- grad_clip=clip)
- elif config.TRAIN.OPTIMIZER.NAME == "AdamW":
- if config.TRAIN.GRAD_CLIP:
- clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
- else:
- clip = None
- optimizer = paddle.optimizer.AdamW(
- parameters=model.parameters(),
- learning_rate=scheduler if scheduler is not None else config.TRAIN.BASE_LR,
- beta1=config.TRAIN.OPTIMIZER.BETAS[0],
- beta2=config.TRAIN.OPTIMIZER.BETAS[1],
- weight_decay=config.TRAIN.WEIGHT_DECAY,
- epsilon=config.TRAIN.OPTIMIZER.EPS,
- grad_clip=clip,
- #apply_decay_param_fun=get_exclude_from_weight_decay_fn(['pos_embed', 'cls_token']),
- )
- else:
- local_logger.fatal(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
- if local_rank == 0:
- master_logger.fatal(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
- raise NotImplementedError(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
-
-    # STEP 6: Load pretrained model / load resumed model and optimizer states
- if config.MODEL.PRETRAINED:
- if (config.MODEL.PRETRAINED).endswith('.pdparams'):
- raise ValueError(
- f'{config.MODEL.PRETRAINED} should not contain .pdparams')
- assert os.path.isfile(config.MODEL.PRETRAINED + '.pdparams') is True
- model_state = paddle.load(config.MODEL.PRETRAINED+'.pdparams')
- model.set_dict(model_state)
- local_logger.info(f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}")
- if local_rank == 0:
- master_logger.info(
- f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}")
-
- if config.MODEL.RESUME:
- assert os.path.isfile(config.MODEL.RESUME + '.pdparams') is True
- assert os.path.isfile(config.MODEL.RESUME + '.pdopt') is True
- model_state = paddle.load(config.MODEL.RESUME + '.pdparams')
- model.set_dict(model_state)
- opt_state = paddle.load(config.MODEL.RESUME+'.pdopt')
- optimizer.set_state_dict(opt_state)
- local_logger.info(
- f"----- Resume Training: Load model and optmizer from {config.MODEL.RESUME}")
- if local_rank == 0:
- master_logger.info(
- f"----- Resume Training: Load model and optmizer from {config.MODEL.RESUME}")
-
- # STEP 7: Validation (eval mode)
- if config.EVAL:
- local_logger.info('----- Start Validating')
- if local_rank == 0:
- master_logger.info('----- Start Validating')
- val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
- dataloader=dataloader_val,
- model=model,
- criterion=criterion_val,
- total_batch=total_batch_val,
- debug_steps=config.REPORT_FREQ,
- local_logger=local_logger,
- master_logger=master_logger)
- local_logger.info(f"Validation Loss: {val_loss:.4f}, " +
- f"Validation Acc@1: {val_acc1:.4f}, " +
- f"Validation Acc@5: {val_acc5:.4f}, " +
- f"time: {val_time:.2f}")
- if local_rank == 0:
- master_logger.info(f"Validation Loss: {avg_loss:.4f}, " +
- f"Validation Acc@1: {avg_acc1:.4f}, " +
- f"Validation Acc@5: {avg_acc5:.4f}, " +
- f"time: {val_time:.2f}")
- return
-
- # STEP 8: Start training and validation (train mode)
- local_logger.info(f"Start training from epoch {last_epoch+1}.")
- if local_rank == 0:
- master_logger.info(f"Start training from epoch {last_epoch+1}.")
- for epoch in range(last_epoch+1, config.TRAIN.NUM_EPOCHS+1):
- # train
- local_logger.info(f"Now training epoch {epoch}. LR={optimizer.get_lr():.6f}")
- if local_rank == 0:
- master_logger.info(f"Now training epoch {epoch}. LR={optimizer.get_lr():.6f}")
- train_loss, train_acc, avg_loss, avg_acc, train_time = train(
- dataloader=dataloader_train,
- model=model,
- criterion=criterion,
- optimizer=optimizer,
- epoch=epoch,
- total_epochs=config.TRAIN.NUM_EPOCHS,
- total_batch=total_batch_train,
- debug_steps=config.REPORT_FREQ,
- accum_iter=config.TRAIN.ACCUM_ITER,
- mixup_fn=mixup_fn,
- amp=config.AMP,
- local_logger=local_logger,
- master_logger=master_logger)
-
- scheduler.step()
-
- local_logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Train Loss: {train_loss:.4f}, " +
- f"Train Acc: {train_acc:.4f}, " +
- f"time: {train_time:.2f}")
- if local_rank == 0:
- master_logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Train Loss: {avg_loss:.4f}, " +
- f"Train Acc: {avg_acc:.4f}, " +
- f"time: {train_time:.2f}")
-
- # validation
- if epoch % config.VALIDATE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
- local_logger.info(f'----- Validation after Epoch: {epoch}')
- if local_rank == 0:
- master_logger.info(f'----- Validation after Epoch: {epoch}')
- val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
- dataloader=dataloader_val,
- model=model,
- criterion=criterion_val,
- total_batch=total_batch_val,
- debug_steps=config.REPORT_FREQ,
- local_logger=local_logger,
- master_logger=master_logger)
- local_logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Validation Loss: {val_loss:.4f}, " +
- f"Validation Acc@1: {val_acc1:.4f}, " +
- f"Validation Acc@5: {val_acc5:.4f}, " +
- f"time: {val_time:.2f}")
- if local_rank == 0:
- master_logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Validation Loss: {avg_loss:.4f}, " +
- f"Validation Acc@1: {avg_acc1:.4f}, " +
- f"Validation Acc@5: {avg_acc5:.4f}, " +
- f"time: {val_time:.2f}")
- # model save
- if local_rank == 0:
- if epoch % config.SAVE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
- model_path = os.path.join(
- config.SAVE, f"{config.MODEL.TYPE}-Epoch-{epoch}-Loss-{train_loss}")
- paddle.save(model.state_dict(), model_path + '.pdparams')
- paddle.save(optimizer.state_dict(), model_path + '.pdopt')
- local_logger.info(f"----- Save model: {model_path}.pdparams")
- local_logger.info(f"----- Save optim: {model_path}.pdopt")
- if local_rank == 0:
- master_logger.info(f"----- Save model: {model_path}.pdparams")
- master_logger.info(f"----- Save optim: {model_path}.pdopt")
-
-
-def main():
- # config is updated by: (1) config.py, (2) yaml file, (3) arguments
- arguments = get_arguments()
- config = get_config()
- config = update_config(config, arguments)
-
- # set output folder
- if not config.EVAL:
- config.SAVE = '{}/train-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M-%S'))
- else:
- config.SAVE = '{}/eval-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M-%S'))
-
- if not os.path.exists(config.SAVE):
- os.makedirs(config.SAVE, exist_ok=True)
-
- # get dataset and start DDP
- dataset_train = get_dataset(config, mode='train')
- dataset_val = get_dataset(config, mode='val')
- config.NGPUS = len(paddle.static.cuda_places()) if config.NGPUS == -1 else config.NGPUS
- dist.spawn(main_worker, args=(config, dataset_train, dataset_val, ), nprocs=config.NGPUS)
-
-
-if __name__ == "__main__":
- main()
diff --git a/image_classification/MAE/main_multi_gpu_pretrain.py b/image_classification/MAE/main_multi_gpu_pretrain.py
deleted file mode 100644
index d1789ddf..00000000
--- a/image_classification/MAE/main_multi_gpu_pretrain.py
+++ /dev/null
@@ -1,417 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""MEA pre-training using multiple GPU """
-
-import sys
-import os
-import time
-import logging
-import argparse
-import random
-import numpy as np
-import paddle
-import paddle.nn as nn
-import paddle.nn.functional as F
-import paddle.distributed as dist
-from datasets import get_dataloader
-from datasets import get_dataset
-from transformer import build_mae_pretrain as build_model
-from utils import AverageMeter
-from utils import WarmupCosineScheduler
-from utils import get_exclude_from_weight_decay_fn
-from config import get_config
-from config import update_config
-
-
-def get_arguments():
- """return argumeents, this will overwrite the config after loading yaml file"""
- parser = argparse.ArgumentParser('MAE')
- parser.add_argument('-cfg', type=str, default=None)
- parser.add_argument('-dataset', type=str, default=None)
- parser.add_argument('-batch_size', type=int, default=None)
- parser.add_argument('-image_size', type=int, default=None)
- parser.add_argument('-data_path', type=str, default=None)
- parser.add_argument('-output', type=str, default=None)
- parser.add_argument('-ngpus', type=int, default=None)
- parser.add_argument('-pretrained', type=str, default=None)
- parser.add_argument('-resume', type=str, default=None)
- parser.add_argument('-last_epoch', type=int, default=None)
- parser.add_argument('-eval', action='store_true')
- parser.add_argument('-mae_pretrain', action='store_true')
- parser.add_argument('-amp', action='store_true')
- arguments = parser.parse_args()
- return arguments
-
-
-def get_logger(filename, logger_name=None):
- """set logging file and format
- Args:
- filename: str, full path of the logger file to write
- logger_name: str, the logger name, e.g., 'master_logger', 'local_logger'
- Return:
- logger: python logger
- """
- log_format = "%(asctime)s %(message)s"
- logging.basicConfig(stream=sys.stdout, level=logging.INFO,
- format=log_format, datefmt="%m%d %I:%M:%S %p")
- # different name is needed when creating multiple logger in one process
- logger = logging.getLogger(logger_name)
- fh = logging.FileHandler(os.path.join(filename))
- fh.setFormatter(logging.Formatter(log_format))
- logger.addHandler(fh)
- return logger
-
-
-def train(dataloader,
- patch_size,
- model,
- criterion,
- optimizer,
- epoch,
- total_epochs,
- total_batch,
- normalize_target=True,
- debug_steps=100,
- accum_iter=1,
- amp=False,
- local_logger=None,
- master_logger=None):
- """Training for one epoch
- Args:
- dataloader: paddle.io.DataLoader, dataloader instance
- patch_size: int/tuple, image patch size
- model: nn.Layer, a ViT model
- criterion: nn.criterion
- epoch: int, current epoch
- total_epochs: int, total num of epochs
- normalize_target: bool, if True, tokens are normalized by itself, default: True
- total_batch: int, total num of batches for one epoch
- debug_steps: int, num of iters to log info, default: 100
- accum_iter: int, num of iters for accumulating gradients, default: 1
- mixup_fn: Mixup, mixup instance, default: None
- amp: bool, if True, use mix precision training, default: False
- local_logger: logger for local process/gpu, default: None
- master_logger: logger for main process, default: None
- Returns:
- train_loss_meter.avg: float, average loss on current process/gpu
- train_acc_meter.avg: float, average top1 accuracy on current process/gpu
- master_train_loss_meter.avg: float, average loss on all processes/gpus
- master_train_acc_meter.avg: float, average top1 accuracy on all processes/gpus
- train_time: float, training time
- """
- model.train()
- train_loss_meter = AverageMeter()
- master_train_loss_meter = AverageMeter()
-
- if amp is True:
- scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
- time_st = time.time()
-
- for batch_id, data in enumerate(dataloader):
- images = data[0]
- masks = paddle.to_tensor(data[1], dtype='bool')
-
- with paddle.no_grad():
- mean = paddle.to_tensor([0.485, 0.456, 0.406]).reshape([1, 3, 1, 1])
- std = paddle.to_tensor([0.229, 0.224, 0.225]).reshape([1, 3, 1, 1])
- unnorm_images = images * std + mean
- B, C, H, W = images.shape
- if normalize_target:
- images_patch = unnorm_images.reshape([B, C, H//patch_size, patch_size, W//patch_size, patch_size])
- images_patch = images_patch.transpose([0, 2, 4, 3, 5, 1])
-                images_patch = images_patch.reshape([B, -1, patch_size * patch_size, C])
- images_patch = (images_patch - images_patch.mean(axis=-2, keepdim=True)) / (
- images_patch.var(axis=-2, keepdim=True).sqrt() + 1e-6)
- images_patch = images_patch.flatten(-2)
- else:
- images_patch = unnorm_images.reshape([B, C, H//patch_size, patch_size, W//patch_size, patch_size])
- images_patch = images_patch.transpose([0, 2, 4, 3, 5, 1])
-                images_patch = images_patch.reshape([B, -1, patch_size * patch_size, C])
- images_patch = images_patch.flatten(-2)
-
- B, _, C = images_patch.shape
- labels = images_patch[masks[:, 1:]].reshape([B, -1, C])
-
- if amp is True:
- with paddle.amp.auto_cast():
- reconstructed_patches = model(images, masks)
- loss = criterion(reconstructed_patches, labels)
- scaled = scaler.scale(loss)
- scaled.backward()
-
- if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
- scaler.minimize(optimizer, scaled)
- optimizer.clear_grad()
- else:
- reconstructed_patches = model(images, masks)
- loss = criterion(reconstructed_patches, labels)
- # NOTE: division may be needed depending on the loss function
- # Here no division is needed:
-                # default 'reduction' param in nn.MSELoss is set to 'mean'
- # loss = loss / accum_iter
- loss.backward()
-
- if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
- optimizer.step()
- optimizer.clear_grad()
-
- batch_size = paddle.to_tensor(images.shape[0])
-
- # sync from other gpus for overall loss and acc
- master_loss = loss.clone()
- master_batch_size = batch_size.clone()
- dist.all_reduce(master_loss)
- dist.all_reduce(master_batch_size)
- master_loss = master_loss / dist.get_world_size()
- master_train_loss_meter.update(master_loss.numpy()[0], master_batch_size.numpy()[0])
-
- train_loss_meter.update(loss.numpy()[0], batch_size.numpy()[0])
-
- if batch_id % debug_steps == 0:
- if local_logger:
- local_logger.info(
- f"Epoch[{epoch:03d}/{total_epochs:03d}], " +
- f"Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {train_loss_meter.avg:.4f}")
- if master_logger and dist.get_rank() == 0:
- master_logger.info(
- f"Epoch[{epoch:03d}/{total_epochs:03d}], " +
- f"Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {master_train_loss_meter.avg:.4f}")
-
- train_time = time.time() - time_st
- return (train_loss_meter.avg,
- master_train_loss_meter.avg,
- train_time)
-
-
-def main_worker(*args):
- # STEP 0: Preparation
- config = args[0]
- dist.init_parallel_env()
- last_epoch = config.TRAIN.LAST_EPOCH
- world_size = dist.get_world_size()
- local_rank = dist.get_rank()
- seed = config.SEED + local_rank
- paddle.seed(seed)
- np.random.seed(seed)
- random.seed(seed)
- # logger for each process/gpu
- local_logger = get_logger(
- filename=os.path.join(config.SAVE, 'log_{}.txt'.format(local_rank)),
- logger_name='local_logger')
- # overall logger
- if local_rank == 0:
- master_logger = get_logger(
- filename=os.path.join(config.SAVE, 'log.txt'),
- logger_name='master_logger')
- master_logger.info(f'\n{config}')
- else:
- master_logger = None
- local_logger.info(f'----- world_size = {world_size}, local_rank = {local_rank}')
- if local_rank == 0:
- master_logger.info(f'----- world_size = {world_size}, local_rank = {local_rank}')
-
- # STEP 1: Create model
- model = build_model(config)
- model = paddle.DataParallel(model)
-
- # STEP 2: Create train and val dataloader
- dataset_train = args[1]
- dataloader_train = get_dataloader(config, dataset_train, 'train', True)
- total_batch_train = len(dataloader_train)
- local_logger.info(f'----- Total # of train batch (single gpu): {total_batch_train}')
- if local_rank == 0:
- master_logger.info(f'----- Total # of train batch (single gpu): {total_batch_train}')
-
- # STEP 3: Define criterion
- criterion = nn.MSELoss()
-
- # STEP 4: Define optimizer and lr_scheduler
-    # set lr according to batch size and world size (adapted from the official Swin Transformer code)
- if config.TRAIN.LINEAR_SCALED_LR is not None:
- linear_scaled_lr = (
- config.TRAIN.BASE_LR * config.DATA.BATCH_SIZE * world_size) / config.TRAIN.LINEAR_SCALED_LR
- linear_scaled_warmup_start_lr = (
- config.TRAIN.WARMUP_START_LR * config.DATA.BATCH_SIZE * world_size) / config.TRAIN.LINEAR_SCALED_LR
- linear_scaled_end_lr = (
- config.TRAIN.END_LR * config.DATA.BATCH_SIZE * world_size) / config.TRAIN.LINEAR_SCALED_LR
-
- if config.TRAIN.ACCUM_ITER > 1:
- linear_scaled_lr = linear_scaled_lr * config.TRAIN.ACCUM_ITER
- linear_scaled_warmup_start_lr = linear_scaled_warmup_start_lr * config.TRAIN.ACCUM_ITER
- linear_scaled_end_lr = linear_scaled_end_lr * config.TRAIN.ACCUM_ITER
-
- config.TRAIN.BASE_LR = linear_scaled_lr
- config.TRAIN.WARMUP_START_LR = linear_scaled_warmup_start_lr
- config.TRAIN.END_LR = linear_scaled_end_lr
-
- scheduler = None
- if config.TRAIN.LR_SCHEDULER.NAME == "warmupcosine":
- scheduler = WarmupCosineScheduler(learning_rate=config.TRAIN.BASE_LR,
- warmup_start_lr=config.TRAIN.WARMUP_START_LR,
- start_lr=config.TRAIN.BASE_LR,
- end_lr=config.TRAIN.END_LR,
- warmup_epochs=config.TRAIN.WARMUP_EPOCHS,
- total_epochs=config.TRAIN.NUM_EPOCHS,
- last_epoch=config.TRAIN.LAST_EPOCH,
- )
- elif config.TRAIN.LR_SCHEDULER.NAME == "cosine":
- scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=config.TRAIN.BASE_LR,
- T_max=config.TRAIN.NUM_EPOCHS,
- last_epoch=last_epoch)
-    elif config.TRAIN.LR_SCHEDULER.NAME == "multi-step":
- milestones = [int(v.strip())
- for v in config.TRAIN.LR_SCHEDULER.MILESTONES.split(",")]
- scheduler = paddle.optimizer.lr.MultiStepDecay(learning_rate=config.TRAIN.BASE_LR,
- milestones=milestones,
- gamma=config.TRAIN.LR_SCHEDULER.DECAY_RATE,
- last_epoch=last_epoch)
- else:
- local_logger.fatal(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
- if local_rank == 0:
- master_logger.fatal(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
- raise NotImplementedError(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
-
- if config.TRAIN.OPTIMIZER.NAME == "SGD":
- if config.TRAIN.GRAD_CLIP:
- clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
- else:
- clip = None
- optimizer = paddle.optimizer.Momentum(
- parameters=model.parameters(),
- learning_rate=scheduler if scheduler is not None else config.TRAIN.BASE_LR,
- weight_decay=config.TRAIN.WEIGHT_DECAY,
- momentum=config.TRAIN.OPTIMIZER.MOMENTUM,
- grad_clip=clip)
- elif config.TRAIN.OPTIMIZER.NAME == "AdamW":
- if config.TRAIN.GRAD_CLIP:
- clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
- else:
- clip = None
- optimizer = paddle.optimizer.AdamW(
- parameters=model.parameters(),
- learning_rate=scheduler if scheduler is not None else config.TRAIN.BASE_LR,
- beta1=config.TRAIN.OPTIMIZER.BETAS[0],
- beta2=config.TRAIN.OPTIMIZER.BETAS[1],
- weight_decay=config.TRAIN.WEIGHT_DECAY,
- epsilon=config.TRAIN.OPTIMIZER.EPS,
- grad_clip=clip,
- #apply_decay_param_fun=get_exclude_from_weight_decay_fn(['pos_embed', 'cls_token']),
- )
- else:
- local_logger.fatal(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
- if local_rank == 0:
- master_logger.fatal(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
- raise NotImplementedError(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
-
-    # STEP 5: Load pretrained model / load resumed model and optimizer states
- if config.MODEL.PRETRAINED:
- if (config.MODEL.PRETRAINED).endswith('.pdparams'):
- raise ValueError(
- f'{config.MODEL.PRETRAINED} should not contain .pdparams')
- assert os.path.isfile(config.MODEL.PRETRAINED + '.pdparams') is True
- model_state = paddle.load(config.MODEL.PRETRAINED+'.pdparams')
- model.set_dict(model_state)
- local_logger.info(f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}")
- if local_rank == 0:
- master_logger.info(
- f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}")
-
- if config.MODEL.RESUME:
- assert os.path.isfile(config.MODEL.RESUME + '.pdparams') is True
- assert os.path.isfile(config.MODEL.RESUME + '.pdopt') is True
- model_state = paddle.load(config.MODEL.RESUME + '.pdparams')
- model.set_dict(model_state)
- opt_state = paddle.load(config.MODEL.RESUME + '.pdopt')
- optimizer.set_state_dict(opt_state)
- local_logger.info(
- f"----- Resume: Load model and optmizer from {config.MODEL.RESUME}")
- if local_rank == 0:
- master_logger.info(
- f"----- Resume Training: Load model and optmizer from {config.MODEL.RESUME}")
-
- # STEP 6: Start training (train mode)
- local_logger.info(f"Start training from epoch {last_epoch+1}.")
- if local_rank == 0:
- master_logger.info(f"Start training from epoch {last_epoch+1}.")
- for epoch in range(last_epoch+1, config.TRAIN.NUM_EPOCHS+1):
- # train
- local_logger.info(f"Now training epoch {epoch}. LR={optimizer.get_lr():.6f}")
- if local_rank == 0:
- master_logger.info(f"Now training epoch {epoch}. LR={optimizer.get_lr():.6f}")
-
- train_loss, avg_loss, train_time = train(
- dataloader=dataloader_train,
- patch_size=config.MODEL.TRANS.PATCH_SIZE,
- model=model,
- criterion=criterion,
- optimizer=optimizer,
- epoch=epoch,
- total_epochs=config.TRAIN.NUM_EPOCHS,
- total_batch=total_batch_train,
- debug_steps=config.REPORT_FREQ,
- accum_iter=config.TRAIN.ACCUM_ITER,
- amp=config.AMP,
- local_logger=local_logger,
- master_logger=master_logger)
-
- scheduler.step()
-
- local_logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Train Loss: {train_loss:.4f}, " +
- f"time: {train_time:.2f}")
- if local_rank == 0:
- master_logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Train Loss: {avg_loss:.4f}, " +
- f"time: {train_time:.2f}")
-
- # model save
- if local_rank == 0:
- if epoch % config.SAVE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
- model_path = os.path.join(
- config.SAVE, f"{config.MODEL.TYPE}-Epoch-{epoch}-Loss-{train_loss}")
- paddle.save(model.state_dict(), model_path + '.pdparams')
- paddle.save(optimizer.state_dict(), model_path + '.pdopt')
- local_logger.info(f"----- Save model: {model_path}.pdparams")
- local_logger.info(f"----- Save optim: {model_path}.pdopt")
- if local_rank == 0:
- master_logger.info(f"----- Save model: {model_path}.pdparams")
- master_logger.info(f"----- Save optim: {model_path}.pdopt")
-
-
-def main():
- # config is updated by: (1) config.py, (2) yaml file, (3) arguments
- arguments = get_arguments()
- config = get_config()
- config = update_config(config, arguments)
-
- # set output folder
- if not config.EVAL:
- config.SAVE = '{}/train-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M-%S'))
- else:
- config.SAVE = '{}/eval-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M-%S'))
-
- if not os.path.exists(config.SAVE):
- os.makedirs(config.SAVE, exist_ok=True)
-
- # get dataset and start DDP
- dataset_train = get_dataset(config, mode='train')
- config.NGPUS = len(paddle.static.cuda_places()) if config.NGPUS == -1 else config.NGPUS
- dist.spawn(main_worker, args=(config, dataset_train, ), nprocs=config.NGPUS)
-
-
-if __name__ == "__main__":
- main()
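
The training loop above builds the LR scheduler first, hands the scheduler object to the optimizer as its `learning_rate`, and then advances it once per epoch with `scheduler.step()`. A minimal sketch of that interaction (illustrative only; the toy `Linear` model and the hyperparameter values are assumptions, not part of the repo):

```python
import paddle

# toy stand-in for the MAE model; only the parameters matter here
model = paddle.nn.Linear(8, 2)

# a scheduler object, not a float, is passed as the optimizer's learning rate
scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=1.5e-4, T_max=800)
optimizer = paddle.optimizer.AdamW(
    parameters=model.parameters(),
    learning_rate=scheduler,
    beta1=0.9,
    beta2=0.95,
    weight_decay=0.05,
    grad_clip=paddle.nn.ClipGradByGlobalNorm(1.0))

for epoch in range(1, 4):
    # optimizer.get_lr() reflects the scheduler's current value
    print(f"epoch {epoch}: lr = {optimizer.get_lr():.6f}")
    scheduler.step()  # advance the schedule once per epoch, as in the loop above
```

The same pattern holds for `WarmupCosineScheduler` and `MultiStepDecay`; only the object passed as `learning_rate` changes.
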
diff --git a/image_classification/MAE/main_single_gpu_finetune.py b/image_classification/MAE/main_single_gpu_finetune.py
deleted file mode 100644
index ea267943..00000000
--- a/image_classification/MAE/main_single_gpu_finetune.py
+++ /dev/null
@@ -1,403 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""ViT finetuning/validation using single GPU """
-
-import sys
-import os
-import time
-import logging
-import argparse
-import random
-import numpy as np
-import paddle
-import paddle.nn as nn
-import paddle.nn.functional as F
-from datasets import get_dataloader
-from datasets import get_dataset
-from transformer import build_mae_finetune as build_model
-from utils import AverageMeter
-from utils import WarmupCosineScheduler
-from config import get_config
-from config import update_config
-from mixup import Mixup
-from losses import LabelSmoothingCrossEntropyLoss
-from losses import SoftTargetCrossEntropyLoss
-
-
-def get_arguments():
- """return argumeents, this will overwrite the config after loading yaml file"""
- parser = argparse.ArgumentParser('ViT')
- parser.add_argument('-cfg', type=str, default=None)
- parser.add_argument('-dataset', type=str, default=None)
- parser.add_argument('-batch_size', type=int, default=None)
- parser.add_argument('-image_size', type=int, default=None)
- parser.add_argument('-data_path', type=str, default=None)
- parser.add_argument('-output', type=str, default=None)
- parser.add_argument('-ngpus', type=int, default=None)
- parser.add_argument('-pretrained', type=str, default=None)
- parser.add_argument('-resume', type=str, default=None)
- parser.add_argument('-last_epoch', type=int, default=None)
- parser.add_argument('-eval', action='store_true')
- parser.add_argument('-mae_pretrain', action='store_true')
- parser.add_argument('-amp', action='store_true')
- arguments = parser.parse_args()
- return arguments
-
-
-def get_logger(filename, logger_name=None):
- """set logging file and format
- Args:
- filename: str, full path of the logger file to write
- logger_name: str, the logger name, e.g., 'master_logger', 'local_logger'
- Return:
- logger: python logger
- """
- log_format = "%(asctime)s %(message)s"
- logging.basicConfig(stream=sys.stdout, level=logging.INFO,
- format=log_format, datefmt="%m%d %I:%M:%S %p")
- # a different name is needed when creating multiple loggers in one process
- logger = logging.getLogger(logger_name)
- fh = logging.FileHandler(os.path.join(filename))
- fh.setFormatter(logging.Formatter(log_format))
- logger.addHandler(fh)
- return logger
-
-
-def train(dataloader,
- model,
- criterion,
- optimizer,
- epoch,
- total_epochs,
- total_batch,
- debug_steps=100,
- accum_iter=1,
- mixup_fn=None,
- amp=False,
- logger=None):
- """Training for one epoch
- Args:
- dataloader: paddle.io.DataLoader, dataloader instance
-        model: nn.Layer, a ViT model
-        criterion: nn.Layer, loss function
-        optimizer: paddle.optimizer, optimizer instance
-        epoch: int, current epoch
-        total_epochs: int, total num of epochs
-        total_batch: int, total num of batches for one epoch
-        debug_steps: int, num of iters to log info, default: 100
-        accum_iter: int, num of iters for accumulating gradients, default: 1
-        mixup_fn: Mixup, mixup instance, default: None
-        amp: bool, if True, use mixed precision training, default: False
- logger: logger for logging, default: None
- Returns:
- train_loss_meter.avg: float, average loss on current process/gpu
- train_acc_meter.avg: float, average top1 accuracy on current process/gpu
- train_time: float, training time
- """
- model.train()
- train_loss_meter = AverageMeter()
- train_acc_meter = AverageMeter()
-
- if amp is True:
- scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
- time_st = time.time()
-
-
- for batch_id, data in enumerate(dataloader):
- image = data[0]
- label = data[1]
- label_orig = label.clone()
-
- if mixup_fn is not None:
- image, label = mixup_fn(image, label_orig)
-
- if amp is True: # mixed precision training
- with paddle.amp.auto_cast():
- output = model(image)
- loss = criterion(output, label)
- scaled = scaler.scale(loss)
- scaled.backward()
-
- if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
- scaler.minimize(optimizer, scaled)
- optimizer.clear_grad()
-
- else:
- output = model(image)
- loss = criterion(output, label)
- # NOTE: division may be needed depending on the loss function
- # Here no division is needed:
- # default 'reduction' param in nn.CrossEntropyLoss is set to 'mean'
- # loss = loss / accum_iter
- loss.backward()
-
- if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
- optimizer.step()
- optimizer.clear_grad()
-
- pred = F.softmax(output)
- if mixup_fn:
- acc = paddle.metric.accuracy(pred, label_orig)
- else:
- acc = paddle.metric.accuracy(pred, label_orig.unsqueeze(1))
-
- batch_size = image.shape[0]
- train_loss_meter.update(loss.numpy()[0], batch_size)
- train_acc_meter.update(acc.numpy()[0], batch_size)
-
- if logger and batch_id % debug_steps == 0:
- logger.info(
- f"Epoch[{epoch:03d}/{total_epochs:03d}], " +
- f"Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {train_loss_meter.avg:.4f}, " +
- f"Avg Acc: {train_acc_meter.avg:.4f}")
-
- train_time = time.time() - time_st
- return train_loss_meter.avg, train_acc_meter.avg, train_time
-
-
-def validate(dataloader, model, criterion, total_batch, debug_steps=100, logger=None):
- """Validation for whole dataset
- Args:
- dataloader: paddle.io.DataLoader, dataloader instance
- model: nn.Layer, a ViT model
- criterion: nn.criterion
- total_batch: int, total num of batches for one epoch
- debug_steps: int, num of iters to log info, default: 100
- logger: logger for logging, default: None
- Returns:
- val_loss_meter.avg: float, average loss on current process/gpu
- val_acc1_meter.avg: float, average top1 accuracy on current process/gpu
- val_acc5_meter.avg: float, average top5 accuracy on current process/gpu
-        val_time: float, validation time
- """
- model.eval()
- val_loss_meter = AverageMeter()
- val_acc1_meter = AverageMeter()
- val_acc5_meter = AverageMeter()
- time_st = time.time()
-
- with paddle.no_grad():
- for batch_id, data in enumerate(dataloader):
- image = data[0]
- label = data[1]
-
- output = model(image)
- loss = criterion(output, label)
-
- pred = F.softmax(output)
- acc1 = paddle.metric.accuracy(pred, label.unsqueeze(1))
- acc5 = paddle.metric.accuracy(pred, label.unsqueeze(1), k=5)
-
- batch_size = image.shape[0]
- val_loss_meter.update(loss.numpy()[0], batch_size)
- val_acc1_meter.update(acc1.numpy()[0], batch_size)
- val_acc5_meter.update(acc5.numpy()[0], batch_size)
-
- if logger and batch_id % debug_steps == 0:
- logger.info(
- f"Val Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {val_loss_meter.avg:.4f}, " +
- f"Avg Acc@1: {val_acc1_meter.avg:.4f}, " +
- f"Avg Acc@5: {val_acc5_meter.avg:.4f}")
-
- val_time = time.time() - time_st
- return val_loss_meter.avg, val_acc1_meter.avg, val_acc5_meter.avg, val_time
-
-
-def main():
- # 0. Preparation
- # config is updated by: (1) config.py, (2) yaml file, (3) arguments
- arguments = get_arguments()
- config = get_config()
- config = update_config(config, arguments)
- # set output folder
- if not config.EVAL:
- config.SAVE = '{}/train-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M-%S'))
- else:
- config.SAVE = '{}/eval-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M-%S'))
- if not os.path.exists(config.SAVE):
- os.makedirs(config.SAVE, exist_ok=True)
- last_epoch = config.TRAIN.LAST_EPOCH
- seed = config.SEED
- paddle.seed(seed)
- np.random.seed(seed)
- random.seed(seed)
- logger = get_logger(filename=os.path.join(config.SAVE, 'log.txt'))
- logger.info(f'\n{config}')
-
- # 1. Create model
- model = build_model(config)
- # 2. Create train dataloader
- dataset_train = get_dataset(config, mode='train')
- dataset_val = get_dataset(config, mode='val')
- dataloader_train = get_dataloader(config, dataset_train, 'train', False)
- dataloader_val = get_dataloader(config, dataset_val, 'val', False)
- # 3. Define Mixup function and criterion
- mixup_fn = None
- if config.TRAIN.MIXUP_PROB > 0 or config.TRAIN.CUTMIX_ALPHA > 0 or config.TRAIN.CUTMIX_MINMAX is not None:
- mixup_fn = Mixup(mixup_alpha=config.TRAIN.MIXUP_ALPHA,
- cutmix_alpha=config.TRAIN.CUTMIX_ALPHA,
- cutmix_minmax=config.TRAIN.CUTMIX_MINMAX,
- prob=config.TRAIN.MIXUP_PROB,
- switch_prob=config.TRAIN.MIXUP_SWITCH_PROB,
- mode=config.TRAIN.MIXUP_MODE,
- label_smoothing=config.TRAIN.SMOOTHING,
- num_classes=config.MODEL.NUM_CLASSES)
-
- if config.TRAIN.MIXUP_PROB > 0.:
- criterion = SoftTargetCrossEntropyLoss()
- elif config.TRAIN.SMOOTHING:
- criterion = LabelSmoothingCrossEntropyLoss()
- else:
- criterion = nn.CrossEntropyLoss()
- # only use cross entropy for val
- criterion_val = nn.CrossEntropyLoss()
- # 4. Define lr_scheduler
- scheduler = None
- if config.TRAIN.LR_SCHEDULER.NAME == "warmupcosine":
- scheduler = WarmupCosineScheduler(learning_rate=config.TRAIN.BASE_LR,
- warmup_start_lr=config.TRAIN.WARMUP_START_LR,
- start_lr=config.TRAIN.BASE_LR,
- end_lr=config.TRAIN.END_LR,
- warmup_epochs=config.TRAIN.WARMUP_EPOCHS,
- total_epochs=config.TRAIN.NUM_EPOCHS,
- last_epoch=config.TRAIN.LAST_EPOCH,
- )
- elif config.TRAIN.LR_SCHEDULER.NAME == "cosine":
- scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=config.TRAIN.BASE_LR,
- T_max=config.TRAIN.NUM_EPOCHS,
- last_epoch=last_epoch)
- elif config.TRAIN.LR_SCHEDULER.NAME == "multi-step":
- milestones = [int(v.strip()) for v in config.TRAIN.LR_SCHEDULER.MILESTONES.split(",")]
- scheduler = paddle.optimizer.lr.MultiStepDecay(learning_rate=config.TRAIN.BASE_LR,
- milestones=milestones,
- gamma=config.TRAIN.LR_SCHEDULER.DECAY_RATE,
- last_epoch=last_epoch)
- else:
- logger.fatal(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
- raise NotImplementedError(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
- # 5. Define optimizer
- if config.TRAIN.OPTIMIZER.NAME == "SGD":
- if config.TRAIN.GRAD_CLIP:
- clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
- else:
- clip = None
- optimizer = paddle.optimizer.Momentum(
- parameters=model.parameters(),
- learning_rate=scheduler if scheduler is not None else config.TRAIN.BASE_LR,
- weight_decay=config.TRAIN.WEIGHT_DECAY,
- momentum=config.TRAIN.OPTIMIZER.MOMENTUM,
- grad_clip=clip)
- elif config.TRAIN.OPTIMIZER.NAME == "AdamW":
- if config.TRAIN.GRAD_CLIP:
- clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
- else:
- clip = None
- optimizer = paddle.optimizer.AdamW(
- parameters=model.parameters(),
- learning_rate=scheduler if scheduler is not None else config.TRAIN.BASE_LR,
- weight_decay=config.TRAIN.WEIGHT_DECAY,
- beta1=config.TRAIN.OPTIMIZER.BETAS[0],
- beta2=config.TRAIN.OPTIMIZER.BETAS[1],
- epsilon=config.TRAIN.OPTIMIZER.EPS,
- grad_clip=clip)
- else:
- logger.fatal(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
- raise NotImplementedError(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
- # 6. Load pretrained model or load resume model and optimizer states
- if config.MODEL.PRETRAINED:
- if (config.MODEL.PRETRAINED).endswith('.pdparams'):
- raise ValueError(f'{config.MODEL.PRETRAINED} should not contain .pdparams')
- assert os.path.isfile(config.MODEL.PRETRAINED + '.pdparams') is True
- model_state = paddle.load(config.MODEL.PRETRAINED+'.pdparams')
- model.set_dict(model_state)
- logger.info(f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}")
-
- if config.MODEL.RESUME:
- assert os.path.isfile(config.MODEL.RESUME + '.pdparams') is True
- assert os.path.isfile(config.MODEL.RESUME + '.pdopt') is True
- model_state = paddle.load(config.MODEL.RESUME + '.pdparams')
- model.set_dict(model_state)
- opt_state = paddle.load(config.MODEL.RESUME + '.pdopt')
- optimizer.set_state_dict(opt_state)
- logger.info(
- f"----- Resume: Load model and optmizer from {config.MODEL.RESUME}")
-
- # 7. Validation (eval mode)
- if config.EVAL:
- logger.info('----- Start Validating')
- val_loss, val_acc1, val_acc5, val_time = validate(
- dataloader=dataloader_val,
- model=model,
- criterion=criterion_val,
- total_batch=len(dataloader_val),
- debug_steps=config.REPORT_FREQ,
- logger=logger)
- logger.info(f"Validation Loss: {val_loss:.4f}, " +
- f"Validation Acc@1: {val_acc1:.4f}, " +
- f"Validation Acc@5: {val_acc5:.4f}, " +
- f"time: {val_time:.2f}")
- return
-
- # 8. Start training and validation (train mode)
- logger.info(f"Start training from epoch {last_epoch+1}.")
- for epoch in range(last_epoch+1, config.TRAIN.NUM_EPOCHS+1):
- # train
- logger.info(f"Now training epoch {epoch}. LR={optimizer.get_lr():.6f}")
- train_loss, train_acc, train_time = train(dataloader=dataloader_train,
- model=model,
- criterion=criterion,
- optimizer=optimizer,
- epoch=epoch,
- total_epochs=config.TRAIN.NUM_EPOCHS,
- total_batch=len(dataloader_train),
- debug_steps=config.REPORT_FREQ,
- accum_iter=config.TRAIN.ACCUM_ITER,
- mixup_fn=mixup_fn,
- amp=config.AMP,
- logger=logger)
- scheduler.step()
-
- logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Train Loss: {train_loss:.4f}, " +
- f"Train Acc: {train_acc:.4f}, " +
- f"time: {train_time:.2f}")
- # validation
- if epoch % config.VALIDATE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
- logger.info(f'----- Validation after Epoch: {epoch}')
- val_loss, val_acc1, val_acc5, val_time = validate(
- dataloader=dataloader_val,
- model=model,
- criterion=criterion_val,
- total_batch=len(dataloader_val),
- debug_steps=config.REPORT_FREQ,
- logger=logger)
- logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Validation Loss: {val_loss:.4f}, " +
- f"Validation Acc@1: {val_acc1:.4f}, " +
- f"Validation Acc@5: {val_acc5:.4f}, " +
- f"time: {val_time:.2f}")
- # model save
- if epoch % config.SAVE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
- model_path = os.path.join(
- config.SAVE, f"{config.MODEL.TYPE}-Epoch-{epoch}-Loss-{train_loss}")
- paddle.save(model.state_dict(), model_path + '.pdparams')
- paddle.save(optimizer.state_dict(), model_path + '.pdopt')
- logger.info(f"----- Save model: {model_path}.pdparams")
- logger.info(f"----- Save optim: {model_path}.pdopt")
-
-
-if __name__ == "__main__":
- main()
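
The AMP branch of `train()` above scales the loss, accumulates gradients across `accum_iter` batches, and only then calls `scaler.minimize` and clears the gradients. A self-contained sketch of that pattern, with a toy model and random tensors standing in for the ViT and the ImageNet batches (all names and sizes here are assumptions for illustration):

```python
import paddle

model = paddle.nn.Linear(16, 10)          # toy stand-in for the finetuned ViT
criterion = paddle.nn.CrossEntropyLoss()
optimizer = paddle.optimizer.AdamW(parameters=model.parameters(), learning_rate=1e-3)
scaler = paddle.amp.GradScaler(init_loss_scaling=1024)

accum_iter = 2
num_batches = 4

for batch_id in range(num_batches):
    image = paddle.randn([8, 16])          # fake batch of 8 samples
    label = paddle.randint(0, 10, [8])     # fake int64 class labels

    with paddle.amp.auto_cast():           # forward pass in mixed precision
        output = model(image)
        loss = criterion(output, label)

    scaled = scaler.scale(loss)            # scale to avoid fp16 underflow
    scaled.backward()                      # gradients accumulate until clear_grad()

    # update weights only every `accum_iter` batches (or on the last batch)
    if (batch_id + 1) % accum_iter == 0 or (batch_id + 1) == num_batches:
        scaler.minimize(optimizer, scaled)  # unscale gradients, step, update loss scaling
        optimizer.clear_grad()
```
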
diff --git a/image_classification/MAE/main_single_gpu_pretrain.py b/image_classification/MAE/main_single_gpu_pretrain.py
deleted file mode 100644
index cf315a42..00000000
--- a/image_classification/MAE/main_single_gpu_pretrain.py
+++ /dev/null
@@ -1,308 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""MAE pre-training using single GPU, this is just a demo, we recommand using multi-gpu version"""
-
-import sys
-import os
-import time
-import logging
-import argparse
-import random
-import numpy as np
-import paddle
-import paddle.nn as nn
-import paddle.nn.functional as F
-from datasets import get_dataloader
-from datasets import get_dataset
-from transformer import build_mae_pretrain as build_model
-from utils import AverageMeter
-from utils import WarmupCosineScheduler
-from config import get_config
-from config import update_config
-
-
-def get_arguments():
- """return argumeents, this will overwrite the config after loading yaml file"""
- parser = argparse.ArgumentParser('MAE')
- parser.add_argument('-cfg', type=str, default=None)
- parser.add_argument('-dataset', type=str, default=None)
- parser.add_argument('-batch_size', type=int, default=None)
- parser.add_argument('-image_size', type=int, default=None)
- parser.add_argument('-data_path', type=str, default=None)
- parser.add_argument('-output', type=str, default=None)
- parser.add_argument('-ngpus', type=int, default=None)
- parser.add_argument('-pretrained', type=str, default=None)
- parser.add_argument('-resume', type=str, default=None)
- parser.add_argument('-last_epoch', type=int, default=None)
- parser.add_argument('-eval', action='store_true')
- parser.add_argument('-mae_pretrain', action='store_true')
- parser.add_argument('-amp', action='store_true')
- arguments = parser.parse_args()
- return arguments
-
-
-def get_logger(filename, logger_name=None):
- """set logging file and format
- Args:
- filename: str, full path of the logger file to write
- logger_name: str, the logger name, e.g., 'master_logger', 'local_logger'
- Return:
- logger: python logger
- """
- log_format = "%(asctime)s %(message)s"
- logging.basicConfig(stream=sys.stdout, level=logging.INFO,
- format=log_format, datefmt="%m%d %I:%M:%S %p")
- # a different name is needed when creating multiple loggers in one process
- logger = logging.getLogger(logger_name)
- fh = logging.FileHandler(os.path.join(filename))
- fh.setFormatter(logging.Formatter(log_format))
- logger.addHandler(fh)
- return logger
-
-
-def train(dataloader,
- patch_size,
- model,
- criterion,
- optimizer,
- epoch,
- total_epochs,
- total_batch,
- normalize_target=True,
- debug_steps=100,
- accum_iter=1,
- amp=False,
- logger=None):
- """Training for one epoch
- Args:
-        dataloader: paddle.io.DataLoader, dataloader instance
-        patch_size: int, patch size used to cut target images into patches
-        model: nn.Layer, a MAE pretraining model
-        criterion: nn.Layer, loss function
-        optimizer: paddle.optimizer, optimizer instance
-        epoch: int, current epoch
-        total_epochs: int, total num of epochs
-        total_batch: int, total num of batches for one epoch
-        normalize_target: bool, if True, normalize each target patch, default: True
-        debug_steps: int, num of iters to log info, default: 100
-        accum_iter: int, num of iters for accumulating gradients, default: 1
-        amp: bool, if True, use mixed precision training, default: False
-        logger: logger for logging, default: None
- Returns:
- train_loss_meter.avg: float, average loss on current process/gpu
- train_time: float, training time
- """
- model.train()
- train_loss_meter = AverageMeter()
-
- if amp is True:
- scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
- time_st = time.time()
-
- for batch_id, data in enumerate(dataloader):
- images = data[0]
- masks = paddle.to_tensor(data[1], dtype='bool')
-
- with paddle.no_grad():
- mean = paddle.to_tensor([0.485, 0.456, 0.406]).reshape([1, 3, 1, 1])
- std = paddle.to_tensor([0.229, 0.224, 0.225]).reshape([1, 3, 1, 1])
- unnorm_images = images * std + mean
- B, C, H, W = images.shape
- if normalize_target:
- images_patch = unnorm_images.reshape([B, C, H // patch_size, patch_size, W // patch_size, patch_size])
- images_patch = images_patch.transpose([0, 2, 4, 3, 5, 1])
- images_patch = images_patch.reshape([B, -1, patch_size * patch_size, C])
- images_patch = (images_patch - images_patch.mean(axis=-2, keepdim=True)) / (
- images_patch.var(axis=-2, keepdim=True).sqrt() + 1e-6)
- images_patch = images_patch.flatten(-2)
- else:
- images_patch = unnorm_images.reshape([B, C, H//patch_size, patch_size, W//patch_size, patch_size])
- images_patch = images_patch.transpose([0, 2, 4, 3, 5, 1])
- images_patch = images_patch.reshape([B, -1, patch_size * patch_size, C])
- images_patch = images_patch.flatten(-2)
-
- B, _, C = images_patch.shape
- labels = images_patch[masks[:, 1:]].reshape([B, -1, C])
-
- if amp is True:
- with paddle.amp.auto_cast():
- reconstructed_patches = model(images, masks)
- loss = criterion(reconstructed_patches, labels)
- scaled = scaler.scale(loss)
- scaled.backward()
-
- if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
- scaler.minimize(optimizer, scaled)
- optimizer.clear_grad()
- else:
- reconstructed_patches = model(images, masks)
- loss = criterion(reconstructed_patches, labels)
- # NOTE: division may be needed depending on the loss function
- # Here no division is needed:
- # default 'reduction' param in nn.MSELoss is set to 'mean'
- # loss = loss / accum_iter
- loss.backward()
-
- if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
- optimizer.step()
- optimizer.clear_grad()
-
- batch_size = images.shape[0]
- train_loss_meter.update(loss.numpy()[0], batch_size)
-
- if logger and batch_id % debug_steps == 0:
- logger.info(
- f"Epoch[{epoch:03d}/{total_epochs:03d}], " +
- f"Step[{batch_id:04d}/{total_batch:04d}], " +
- f"Avg Loss: {train_loss_meter.avg:.4f}")
-
- train_time = time.time() - time_st
- return train_loss_meter.avg, train_time
-
-
-def main():
- # 0. Preparation
- # config is updated by: (1) config.py, (2) yaml file, (3) arguments
- arguments = get_arguments()
- config = get_config()
- config = update_config(config, arguments)
- # set output folder
- if not config.EVAL:
- config.SAVE = '{}/train-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M-%S'))
- else:
- config.SAVE = '{}/eval-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M-%S'))
- if not os.path.exists(config.SAVE):
- os.makedirs(config.SAVE, exist_ok=True)
- last_epoch = config.TRAIN.LAST_EPOCH
- seed = config.SEED
- paddle.seed(seed)
- np.random.seed(seed)
- random.seed(seed)
- logger = get_logger(filename=os.path.join(config.SAVE, 'log.txt'))
- logger.info(f'\n{config}')
-
- # 1. Create model
- model = build_model(config)
- # 2. Create train dataloader
- dataset_train = get_dataset(config, mode='train')
- dataloader_train = get_dataloader(config, dataset_train, 'train', False)
- # 3. Define criterion
- criterion = nn.MSELoss()
- # 4. Define lr_scheduler
- scheduler = None
- if config.TRAIN.LR_SCHEDULER.NAME == "warmupcosine":
- scheduler = WarmupCosineScheduler(learning_rate=config.TRAIN.BASE_LR,
- warmup_start_lr=config.TRAIN.WARMUP_START_LR,
- start_lr=config.TRAIN.BASE_LR,
- end_lr=config.TRAIN.END_LR,
- warmup_epochs=config.TRAIN.WARMUP_EPOCHS,
- total_epochs=config.TRAIN.NUM_EPOCHS,
- last_epoch=config.TRAIN.LAST_EPOCH,
- )
- elif config.TRAIN.LR_SCHEDULER.NAME == "cosine":
- scheduler = paddle.optimizer.lr.CosineAnnealingDecay(learning_rate=config.TRAIN.BASE_LR,
- T_max=config.TRAIN.NUM_EPOCHS,
- last_epoch=last_epoch)
- elif config.TRAIN.LR_SCHEDULER.NAME == "multi-step":
- milestones = [int(v.strip()) for v in config.TRAIN.LR_SCHEDULER.MILESTONES.split(",")]
- scheduler = paddle.optimizer.lr.MultiStepDecay(learning_rate=config.TRAIN.BASE_LR,
- milestones=milestones,
- gamma=config.TRAIN.LR_SCHEDULER.DECAY_RATE,
- last_epoch=last_epoch)
- else:
- logger.fatal(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
- raise NotImplementedError(f"Unsupported Scheduler: {config.TRAIN.LR_SCHEDULER}.")
- # 5. Define optimizer
- if config.TRAIN.OPTIMIZER.NAME == "SGD":
- if config.TRAIN.GRAD_CLIP:
- clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
- else:
- clip = None
- optimizer = paddle.optimizer.Momentum(
- parameters=model.parameters(),
- learning_rate=scheduler if scheduler is not None else config.TRAIN.BASE_LR,
- weight_decay=config.TRAIN.WEIGHT_DECAY,
- momentum=config.TRAIN.OPTIMIZER.MOMENTUM,
- grad_clip=clip)
- elif config.TRAIN.OPTIMIZER.NAME == "AdamW":
- if config.TRAIN.GRAD_CLIP:
- clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
- else:
- clip = None
- optimizer = paddle.optimizer.AdamW(
- parameters=model.parameters(),
- learning_rate=scheduler if scheduler is not None else config.TRAIN.BASE_LR,
- weight_decay=config.TRAIN.WEIGHT_DECAY,
- beta1=config.TRAIN.OPTIMIZER.BETAS[0],
- beta2=config.TRAIN.OPTIMIZER.BETAS[1],
- epsilon=config.TRAIN.OPTIMIZER.EPS,
- grad_clip=clip)
- else:
- logger.fatal(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
- raise NotImplementedError(f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}.")
- # 6. Load pretrained model or load resume model and optimizer states
- if config.MODEL.PRETRAINED:
- if (config.MODEL.PRETRAINED).endswith('.pdparams'):
- raise ValueError(f'{config.MODEL.PRETRAINED} should not contain .pdparams')
- assert os.path.isfile(config.MODEL.PRETRAINED + '.pdparams') is True
- model_state = paddle.load(config.MODEL.PRETRAINED+'.pdparams')
- model.set_dict(model_state)
- logger.info(f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}")
-
- if config.MODEL.RESUME:
- assert os.path.isfile(config.MODEL.RESUME + '.pdparams') is True
- assert os.path.isfile(config.MODEL.RESUME + '.pdopt') is True
- model_state = paddle.load(config.MODEL.RESUME + '.pdparams')
- model.set_dict(model_state)
- opt_state = paddle.load(config.MODEL.RESUME + '.pdopt')
- optimizer.set_state_dict(opt_state)
- logger.info(
- f"----- Resume: Load model and optmizer from {config.MODEL.RESUME}")
-
- # 7. Start training
- logging.info(f"Start training from epoch {last_epoch + 1}.")
- for epoch in range(last_epoch + 1, config.TRAIN.NUM_EPOCHS + 1):
- # train
- logging.info(f"Now training epoch {epoch}. LR={optimizer.get_lr():.6f}")
- train_loss, train_time = train(dataloader=dataloader_train,
- patch_size=config.MODEL.TRANS.PATCH_SIZE,
- model=model,
- criterion=criterion,
- optimizer=optimizer,
- epoch=epoch,
- total_epochs=config.TRAIN.NUM_EPOCHS,
- total_batch=len(dataloader_train),
- normalize_target=config.TRAIN.NORMALIZE_TARGET,
- debug_steps=config.REPORT_FREQ,
- accum_iter=config.TRAIN.ACCUM_ITER,
- amp=config.AMP,
- logger=logger)
- scheduler.step()
-
- logger.info(f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
- f"Train Loss: {train_loss:.4f}, " +
- f"time: {train_time:.2f}")
- # validation
- # No need to do validation during pretraining
-
- # model save
- if epoch % config.SAVE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
- model_path = os.path.join(
- config.SAVE, f"{config.MODEL.TYPE}-Epoch-{epoch}-Loss-{train_loss}")
- paddle.save(model.state_dict(), model_path + '.pdparams')
- paddle.save(optimizer.state_dict(), model_path + '.pdopt')
- logger.info(f"----- Save model: {model_path}.pdparams")
- logger.info(f"----- Save optim: {model_path}.pdopt")
-
-
-if __name__ == "__main__":
- main()
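
In `train()` above, the reconstruction targets are built by undoing the ImageNet normalization, cutting the image into non-overlapping patches, and, when `TRAIN.NORMALIZE_TARGET` is on, normalizing each patch by its own mean and variance. A standalone sketch of that target construction (the random batch and patch size below are assumptions for illustration):

```python
import paddle

patch_size = 16
images = paddle.randn([2, 3, 224, 224])    # stand-in for a normalized input batch

# undo the ImageNet normalization applied by the dataloader
mean = paddle.to_tensor([0.485, 0.456, 0.406]).reshape([1, 3, 1, 1])
std = paddle.to_tensor([0.229, 0.224, 0.225]).reshape([1, 3, 1, 1])
unnorm_images = images * std + mean

B, C, H, W = images.shape
# [B, C, H, W] -> [B, num_patches, patch_size*patch_size, C]
patches = unnorm_images.reshape([B, C, H // patch_size, patch_size, W // patch_size, patch_size])
patches = patches.transpose([0, 2, 4, 3, 5, 1])
patches = patches.reshape([B, -1, patch_size * patch_size, C])

# per-patch normalization: each patch gets zero mean and unit variance
patches = (patches - patches.mean(axis=-2, keepdim=True)) / (
    patches.var(axis=-2, keepdim=True).sqrt() + 1e-6)

# flatten pixels and channels: [B, num_patches, patch_size*patch_size*C]
patches = patches.flatten(-2)
print(patches.shape)   # [2, 196, 768] for 224x224 images with 16x16 patches
```
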
diff --git a/image_classification/MAE/masking_generator.py b/image_classification/MAE/masking_generator.py
deleted file mode 100644
index 9271dd4e..00000000
--- a/image_classification/MAE/masking_generator.py
+++ /dev/null
@@ -1,50 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""
-random mask generator for MAE pretraining
-"""
-
-import random
-import math
-import numpy as np
-
-class RandomMaskingGenerator:
- def __init__(self, input_size, mask_ratio, with_cls_token=True):
- if not isinstance(input_size, tuple):
- input_size = (input_size, ) * 2
-
- self.height = input_size[0]
- self.width = input_size[1]
- self.num_patches = self.height * self.width
- self.num_mask = int(mask_ratio * self.num_patches)
- self.with_cls_token = with_cls_token
-
- def __call__(self):
- mask = np.hstack([np.zeros(self.num_patches - self.num_mask),
- np.ones(self.num_mask)])
- np.random.shuffle(mask)
- if self.with_cls_token:
- mask = np.insert(mask, 0, 0)
- return mask
-
-
-#def main():
-# rmg = RandomMaskingGenerator(input_size=32, mask_ratio=0.75)
-# mask = rmg()
-# for v in mask:
-# print(v, end=', ')
-#
-#if __name__ == "__main__":
-# main()
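
A short usage sketch of `RandomMaskingGenerator` above (assuming the module is importable as `masking_generator`): for a 14x14 grid of patches (a 224x224 image with 16x16 patches) and `mask_ratio=0.75`, the generator returns a flat 0/1 mask with `int(0.75 * 196) = 147` masked positions plus a leading 0 reserved for the cls token.

```python
from masking_generator import RandomMaskingGenerator  # assumed import path

rmg = RandomMaskingGenerator(input_size=14, mask_ratio=0.75, with_cls_token=True)
mask = rmg()                 # np.ndarray of shape (196 + 1,), values in {0.0, 1.0}

print(mask.shape)            # (197,) -> 196 patch slots + 1 cls-token slot
print(int(mask.sum()))       # 147   -> number of masked patches
```
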
diff --git a/image_classification/MAE/nohup.out b/image_classification/MAE/nohup.out
deleted file mode 100644
index 6e00dda7..00000000
--- a/image_classification/MAE/nohup.out
+++ /dev/null
@@ -1,9507 +0,0 @@
-Traceback (most recent call last):
- File "main_multi_gpu_pretrain.py", line 24, in
- import paddle
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/__init__.py", line 25, in
- from .fluid import monkey_patch_variable
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/__init__.py", line 45, in
- from . import dataset
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dataset.py", line 19, in
- from ..utils import deprecated
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/utils/__init__.py", line 26, in
- from . import download # noqa: F401
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/utils/download.py", line 23, in
- import requests
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/requests/__init__.py", line 112, in
- from . import utils
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/requests/utils.py", line 24, in
- from . import certs
- File "", line 971, in _find_and_load
- File "", line 955, in _find_and_load_unlocked
- File "", line 665, in _load_unlocked
- File "", line 674, in exec_module
- File "", line 764, in get_code
- File "", line 833, in get_data
-KeyboardInterrupt
-merging config from ./configs/vit_base_patch16_224_pretrain_dec1.yaml
------ Imagenet2012 image train list len = 1281167
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:30053', '127.0.0.1:54949', '127.0.0.1:41862', '127.0.0.1:28777', '127.0.0.1:55177', '127.0.0.1:18423', '127.0.0.1:46681']
-I1219 16:59:41.631045 23562 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:30053 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:54949', '127.0.0.1:41862', '127.0.0.1:28777', '127.0.0.1:55177', '127.0.0.1:18423', '127.0.0.1:46681']
-I1219 16:59:44.247634 23580 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:54949 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:41862', '127.0.0.1:28777', '127.0.0.1:55177', '127.0.0.1:18423', '127.0.0.1:46681']
-I1219 16:59:46.636570 23595 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:41862 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:28777', '127.0.0.1:55177', '127.0.0.1:18423', '127.0.0.1:46681']
-I1219 16:59:48.816335 23610 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:28777 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:55177', '127.0.0.1:18423', '127.0.0.1:46681']
-I1219 16:59:51.517431 23627 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:55177 successful.
-I1219 16:59:53.801396 23642 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:18423 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:46681']
-I1219 16:59:56.182962 23659 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:46681 successful.
-I1219 16:59:56.935767 23580 nccl_context.cc:74] init nccl context nranks: 8 local rank: 2 gpu id: 2 ring id: 0
-I1219 16:59:56.935765 23562 nccl_context.cc:74] init nccl context nranks: 8 local rank: 1 gpu id: 1 ring id: 0
-I1219 16:59:56.935781 23627 nccl_context.cc:74] init nccl context nranks: 8 local rank: 5 gpu id: 5 ring id: 0
-I1219 16:59:56.935775 23595 nccl_context.cc:74] init nccl context nranks: 8 local rank: 3 gpu id: 3 ring id: 0
-I1219 16:59:56.935791 23642 nccl_context.cc:74] init nccl context nranks: 8 local rank: 6 gpu id: 6 ring id: 0
-I1219 16:59:56.935806 23610 nccl_context.cc:74] init nccl context nranks: 8 local rank: 4 gpu id: 4 ring id: 0
-I1219 16:59:56.935818 23659 nccl_context.cc:74] init nccl context nranks: 8 local rank: 7 gpu id: 7 ring id: 0
-I1219 16:59:56.935837 23545 nccl_context.cc:74] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 0
-W1219 17:00:00.904070 23545 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:00:00.904078 23562 device_context.cc:447] Please NOTE: device: 1, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:00:00.904153 23595 device_context.cc:447] Please NOTE: device: 3, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:00:00.904173 23610 device_context.cc:447] Please NOTE: device: 4, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:00:00.904186 23659 device_context.cc:447] Please NOTE: device: 7, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:00:00.904246 23642 device_context.cc:447] Please NOTE: device: 6, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:00:00.904264 23627 device_context.cc:447] Please NOTE: device: 5, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:00:00.906248 23580 device_context.cc:447] Please NOTE: device: 2, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:00:00.957355 23562 device_context.cc:465] device: 1, cuDNN Version: 7.6.
-W1219 17:00:00.957355 23659 device_context.cc:465] device: 7, cuDNN Version: 7.6.
-W1219 17:00:00.957358 23595 device_context.cc:465] device: 3, cuDNN Version: 7.6.
-W1219 17:00:00.957360 23545 device_context.cc:465] device: 0, cuDNN Version: 7.6.
-W1219 17:00:00.957374 23610 device_context.cc:465] device: 4, cuDNN Version: 7.6.
-W1219 17:00:00.957383 23642 device_context.cc:465] device: 6, cuDNN Version: 7.6.
-W1219 17:00:00.957394 23580 device_context.cc:465] device: 2, cuDNN Version: 7.6.
-W1219 17:00:00.957394 23627 device_context.cc:465] device: 5, cuDNN Version: 7.6.
-INFO:local_logger:----- world_size = 8, local_rank = 6
-INFO:local_logger:----- world_size = 8, local_rank = 3
-INFO:master_logger:
-AMP: False
-BASE: ['']
-DATA:
- BATCH_SIZE: 256
- BATCH_SIZE_EVAL: 8
- CROP_PCT: 0.875
- DATASET: imagenet2012
- DATA_PATH: /dataset/imagenet
- IMAGE_SIZE: 224
- NUM_WORKERS: 4
-EVAL: False
-LOCAL_RANK: 0
-MODEL:
- ATTENTION_DROPOUT: 0.1
- DROPOUT: 0.1
- DROPPATH: 0.0
- MAE_PRETRAIN: True
- NAME: vit_base_patch16_224_dec1
- NUM_CLASSES: 1000
- PRETRAINED: None
- RESUME: None
- TRANS:
- DECODER:
- DEPTH: 1
- EMBED_DIM: 512
- NUM_HEADS: 8
- ENCODER:
- DEPTH: 12
- EMBED_DIM: 768
- NUM_HEADS: 12
- MASK_RATIO: 0.75
- MLP_RATIO: 4.0
- PATCH_SIZE: 16
- QKV_BIAS: True
- TYPE: MAE
-NGPUS: 8
-REPORT_FREQ: 100
-SAVE: ./output/train-20211219-16-59-32
-SAVE_FREQ: 1
-SEED: 0
-TAG: default
-TRAIN:
- ACCUM_ITER: 2
- BASE_LR: 0.00015
- CUTMIX_ALPHA: 1.0
- CUTMIX_MINMAX: None
- END_LR: 0.0005
- GRAD_CLIP: 1
- LAST_EPOCH: 0
- LINEAR_SCALED_LR: None
- LR_SCHEDULER:
- DECAY_EPOCHS: 30
- DECAY_RATE: 0.1
- MILESTONES: 30, 60, 90
- NAME: warmupcosine
- MIXUP_ALPHA: 0.8
- MIXUP_MODE: batch
- MIXUP_PROB: 1.0
- MIXUP_SWITCH_PROB: 0.5
- NORMALIZE_TARGET: True
- NUM_EPOCHS: 800
- OPTIMIZER:
- BETAS: (0.9, 0.95)
- EPS: 1e-08
- MOMENTUM: 0.9
- NAME: AdamW
- RAND_AUGMENT: False
- RAND_AUGMENT_LAYERS: 9
- RAND_AUGMENT_MAGNITUDE: 5
- SMOOTHING: 0.1
- WARMUP_EPOCHS: 40
- WARMUP_START_LR: 1e-06
- WEIGHT_DECAY: 0.05
-VALIDATE_FREQ: 100
-INFO:local_logger:----- world_size = 8, local_rank = 0
-INFO:master_logger:----- world_size = 8, local_rank = 0
-INFO:local_logger:----- world_size = 8, local_rank = 7
-INFO:local_logger:----- world_size = 8, local_rank = 5
-INFO:local_logger:----- world_size = 8, local_rank = 1
-INFO:local_logger:----- world_size = 8, local_rank = 2
-INFO:local_logger:----- world_size = 8, local_rank = 4
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:master_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:master_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:master_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- ERROR: Unexpected BUS error encountered in DataLoader worker. This might be caused by insufficient shared memory (shm), please check whether use_shared_memory is set and storage space in /dev/shm is enough
- Exception in thread Thread-1:
-Traceback (most recent call last):
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 583, in _get_data
- data = self._data_queue.get(timeout=self._timeout)
- File "/opt/conda/envs/py36/lib/python3.6/multiprocessing/queues.py", line 105, in get
- raise Empty
-queue.Empty
-
-During handling of the above exception, another exception occurred:
-
-Traceback (most recent call last):
- File "/opt/conda/envs/py36/lib/python3.6/threading.py", line 916, in _bootstrap_inner
- self.run()
- File "/opt/conda/envs/py36/lib/python3.6/threading.py", line 864, in run
- self._target(*self._args, **self._kwargs)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 505, in _thread_loop
- batch = self._get_data()
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dataloader/dataloader_iter.py", line 599, in _get_data
- "pids: {}".format(len(failed_workers), pids))
-RuntimeError: DataLoader 1 workers exit unexpectedly, pids: 23832
-
-
-
---------------------------------------
-C++ Traceback (most recent call last):
---------------------------------------
-No stack trace in paddle, may be caused by external reasons.
-
-----------------------
-Error Message Summary:
-----------------------
-FatalError: `Termination signal` is detected by the operating system.
- [TimeInfo: *** Aborted at 1639904442 (unix time) try "date -d @1639904442" if you are using GNU date ***]
- [SignalInfo: *** SIGTERM (@0x5be5) received by PID 23545 (TID 0x7f5dda7df700) from PID 23525 ***]
-
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 20 leaked semaphores to clean up at shutdown
- len(cache))
-Traceback (most recent call last):
- File "main_multi_gpu_pretrain.py", line 416, in
- main()
- File "main_multi_gpu_pretrain.py", line 412, in main
- dist.spawn(main_worker, args=(config, dataset_train, ), nprocs=config.NGPUS)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 502, in spawn
- while not context.join():
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 312, in join
- self._throw_exception(error_index)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 330, in _throw_exception
- raise Exception(msg)
-Exception:
-
-----------------------------------------------
-Process 3 terminated with the following error:
-----------------------------------------------
-
-Traceback (most recent call last):
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 261, in _func_wrapper
- result = func(*args)
- File "/workspace/ppvit_github/PaddleViT_raw/PaddleViT/image_classification/MAE/main_multi_gpu_pretrain.py", line 368, in main_worker
- master_logger=master_logger)
- File "/workspace/ppvit_github/PaddleViT_raw/PaddleViT/image_classification/MAE/main_multi_gpu_pretrain.py", line 157, in train
- reconstructed_patches = model(images, masks)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dygraph/layers.py", line 914, in __call__
- outputs = self.forward(*inputs, **kwargs)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dygraph/parallel.py", line 695, in forward
- outputs = self._layers(*inputs, **kwargs)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dygraph/layers.py", line 914, in __call__
- outputs = self.forward(*inputs, **kwargs)
- File "/workspace/ppvit_github/PaddleViT_raw/PaddleViT/image_classification/MAE/transformer.py", line 537, in forward
- enc_out = self.encoder(no_mask_x)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dygraph/layers.py", line 914, in __call__
- outputs = self.forward(*inputs, **kwargs)
- File "/workspace/ppvit_github/PaddleViT_raw/PaddleViT/image_classification/MAE/transformer.py", line 364, in forward
- x = layer(x)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dygraph/layers.py", line 914, in __call__
- outputs = self.forward(*inputs, **kwargs)
- File "/workspace/ppvit_github/PaddleViT_raw/PaddleViT/image_classification/MAE/transformer.py", line 310, in forward
- x = self.mlp(x)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dygraph/layers.py", line 914, in __call__
- outputs = self.forward(*inputs, **kwargs)
- File "/workspace/ppvit_github/PaddleViT_raw/PaddleViT/image_classification/MAE/transformer.py", line 245, in forward
- x = self.fc1(x)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dygraph/layers.py", line 914, in __call__
- outputs = self.forward(*inputs, **kwargs)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/nn/layer/common.py", line 172, in forward
- x=input, weight=self.weight, bias=self.bias, name=self.name)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/nn/functional/common.py", line 1474, in linear
- False)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/multiprocess_utils.py", line 134, in __handler__
- core._throw_error_if_process_failed()
-SystemError: (Fatal) DataLoader process (pid 1. If run DataLoader by DataLoader.from_generator(...), queue capacity is set by from_generator(..., capacity=xx, ...).
- 2. If run DataLoader by DataLoader(dataset, ...), queue capacity is set as 2 times of the max value of num_workers and len(places).
- 3. If run by DataLoader(dataset, ..., use_shared_memory=True), set use_shared_memory=False for not using shared memory.) exited is killed by signal: 23723.
- It may be caused by insufficient shared storage space. This problem usually occurs when using docker as a development environment.
- Please use command `df -h` to check the storage space of `/dev/shm`. Shared storage space needs to be greater than (DataLoader Num * DataLoader queue capacity * 1 batch data size).
- You can solve this problem by increasing the shared storage space or reducing the queue capacity appropriately.
-Bus error (at /paddle/paddle/fluid/imperative/data_loader.cc:177)
-
-
-merging config from ./configs/vit_base_patch16_224_pretrain_dec1.yaml
------ Imagenet2012 image train list len = 1281167
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:58819', '127.0.0.1:34756', '127.0.0.1:44071', '127.0.0.1:12661', '127.0.0.1:44311', '127.0.0.1:14139', '127.0.0.1:51679']
-I1219 17:02:09.309500 24382 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:58819 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:34756', '127.0.0.1:44071', '127.0.0.1:12661', '127.0.0.1:44311', '127.0.0.1:14139', '127.0.0.1:51679']
-I1219 17:02:11.901250 24397 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:34756 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:44071', '127.0.0.1:12661', '127.0.0.1:44311', '127.0.0.1:14139', '127.0.0.1:51679']
-I1219 17:02:14.341609 24414 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:44071 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:12661', '127.0.0.1:44311', '127.0.0.1:14139', '127.0.0.1:51679']
-I1219 17:02:17.001890 24429 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:12661 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:44311', '127.0.0.1:14139', '127.0.0.1:51679']
-I1219 17:02:19.379423 24447 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:44311 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:14139', '127.0.0.1:51679']
-I1219 17:02:22.029084 24463 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:14139 successful.
-I1219 17:02:24.569348 24481 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:51679 successful.
-I1219 17:02:24.931157 24382 nccl_context.cc:74] init nccl context nranks: 8 local rank: 1 gpu id: 1 ring id: 0
-I1219 17:02:24.931161 24397 nccl_context.cc:74] init nccl context nranks: 8 local rank: 2 gpu id: 2 ring id: 0
-I1219 17:02:24.931192 24414 nccl_context.cc:74] init nccl context nranks: 8 local rank: 3 gpu id: 3 ring id: 0
-I1219 17:02:24.931200 24429 nccl_context.cc:74] init nccl context nranks: 8 local rank: 4 gpu id: 4 ring id: 0
-I1219 17:02:24.931208 24447 nccl_context.cc:74] init nccl context nranks: 8 local rank: 5 gpu id: 5 ring id: 0
-I1219 17:02:24.931213 24463 nccl_context.cc:74] init nccl context nranks: 8 local rank: 6 gpu id: 6 ring id: 0
-I1219 17:02:24.931216 24481 nccl_context.cc:74] init nccl context nranks: 8 local rank: 7 gpu id: 7 ring id: 0
-I1219 17:02:24.931238 24365 nccl_context.cc:74] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 0
-W1219 17:02:28.374552 24365 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:02:28.374681 24397 device_context.cc:447] Please NOTE: device: 2, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:02:28.374711 24414 device_context.cc:447] Please NOTE: device: 3, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:02:28.374712 24429 device_context.cc:447] Please NOTE: device: 4, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:02:28.374729 24447 device_context.cc:447] Please NOTE: device: 5, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:02:28.374773 24382 device_context.cc:447] Please NOTE: device: 1, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:02:28.374810 24463 device_context.cc:447] Please NOTE: device: 6, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:02:28.376953 24481 device_context.cc:447] Please NOTE: device: 7, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:02:28.382552 24414 device_context.cc:465] device: 3, cuDNN Version: 7.6.
-W1219 17:02:28.382556 24365 device_context.cc:465] device: 0, cuDNN Version: 7.6.
-W1219 17:02:28.382561 24447 device_context.cc:465] device: 5, cuDNN Version: 7.6.
-W1219 17:02:28.382565 24397 device_context.cc:465] device: 2, cuDNN Version: 7.6.
-W1219 17:02:28.382582 24463 device_context.cc:465] device: 6, cuDNN Version: 7.6.
-W1219 17:02:28.382568 24429 device_context.cc:465] device: 4, cuDNN Version: 7.6.
-W1219 17:02:28.382580 24382 device_context.cc:465] device: 1, cuDNN Version: 7.6.
-W1219 17:02:28.382681 24481 device_context.cc:465] device: 7, cuDNN Version: 7.6.
-INFO:local_logger:----- world_size = 8, local_rank = 1
-INFO:local_logger:----- world_size = 8, local_rank = 5
-INFO:local_logger:----- world_size = 8, local_rank = 3
-INFO:local_logger:----- world_size = 8, local_rank = 2
-INFO:local_logger:----- world_size = 8, local_rank = 7
-INFO:local_logger:----- world_size = 8, local_rank = 6
-INFO:master_logger:
-AMP: False
-BASE: ['']
-DATA:
- BATCH_SIZE: 256
- BATCH_SIZE_EVAL: 8
- CROP_PCT: 0.875
- DATASET: imagenet2012
- DATA_PATH: /dataset/imagenet
- IMAGE_SIZE: 224
- NUM_WORKERS: 4
-EVAL: False
-LOCAL_RANK: 0
-MODEL:
- ATTENTION_DROPOUT: 0.1
- DROPOUT: 0.1
- DROPPATH: 0.0
- MAE_PRETRAIN: True
- NAME: vit_base_patch16_224_dec1
- NUM_CLASSES: 1000
- PRETRAINED: None
- RESUME: None
- TRANS:
- DECODER:
- DEPTH: 1
- EMBED_DIM: 512
- NUM_HEADS: 8
- ENCODER:
- DEPTH: 12
- EMBED_DIM: 768
- NUM_HEADS: 12
- MASK_RATIO: 0.75
- MLP_RATIO: 4.0
- PATCH_SIZE: 16
- QKV_BIAS: True
- TYPE: MAE
-NGPUS: 8
-REPORT_FREQ: 100
-SAVE: ./output/train-20211219-17-02-00
-SAVE_FREQ: 1
-SEED: 0
-TAG: default
-TRAIN:
- ACCUM_ITER: 2
- BASE_LR: 0.00015
- CUTMIX_ALPHA: 1.0
- CUTMIX_MINMAX: None
- END_LR: 0.0005
- GRAD_CLIP: 1
- LAST_EPOCH: 0
- LINEAR_SCALED_LR: None
- LR_SCHEDULER:
- DECAY_EPOCHS: 30
- DECAY_RATE: 0.1
- MILESTONES: 30, 60, 90
- NAME: warmupcosine
- MIXUP_ALPHA: 0.8
- MIXUP_MODE: batch
- MIXUP_PROB: 1.0
- MIXUP_SWITCH_PROB: 0.5
- NORMALIZE_TARGET: True
- NUM_EPOCHS: 800
- OPTIMIZER:
- BETAS: (0.9, 0.95)
- EPS: 1e-08
- MOMENTUM: 0.9
- NAME: AdamW
- RAND_AUGMENT: False
- RAND_AUGMENT_LAYERS: 9
- RAND_AUGMENT_MAGNITUDE: 5
- SMOOTHING: 0.1
- WARMUP_EPOCHS: 40
- WARMUP_START_LR: 1e-06
- WEIGHT_DECAY: 0.05
-VALIDATE_FREQ: 100
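-
-The nested DATA/MODEL/TRAIN dump above is the merged training configuration. A minimal
-sketch of reading such a YAML with yacs — an assumption here; the repo's own config
-loader and defaults may differ — using the config file named in this log:
-
-```python
-# Minimal sketch, assuming a yacs-style config; the path is the one named in this log.
-from yacs.config import CfgNode as CN
-
-with open('./configs/vit_base_patch16_224_pretrain_dec1.yaml') as f:
-    cfg = CN.load_cfg(f)          # parse the YAML into a nested config node
-cfg.freeze()                      # lock the config before training starts
-print(cfg)                        # yacs prints the same nested tree format as the dump above
-```
-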
-INFO:local_logger:----- world_size = 8, local_rank = 0
-INFO:master_logger:----- world_size = 8, local_rank = 0
-INFO:local_logger:----- world_size = 8, local_rank = 4
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:master_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:master_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:master_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1452
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1431
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1469
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1481
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1408
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1501
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1475
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1440
-INFO:master_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1457
-
-
---------------------------------------
-C++ Traceback (most recent call last):
---------------------------------------
-No stack trace in paddle, may be caused by external reasons.
-
-----------------------
-Error Message Summary:
-----------------------
-FatalError: `Termination signal` is detected by the operating system.
- [TimeInfo: *** Aborted at 1639904603 (unix time) try "date -d @1639904603" if you are using GNU date ***]
- [SignalInfo: *** SIGTERM (@0x5f17) received by PID 24365 (TID 0x7f5d5ca46700) from PID 24343 ***]
-
-Traceback (most recent call last):
- File "main_multi_gpu_pretrain.py", line 416, in
- main()
- File "main_multi_gpu_pretrain.py", line 412, in main
- dist.spawn(main_worker, args=(config, dataset_train, ), nprocs=config.NGPUS)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 502, in spawn
- while not context.join():
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 312, in join
- self._throw_exception(error_index)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 330, in _throw_exception
- raise Exception(msg)
-Exception:
-
-----------------------------------------------
-Process 1 terminated with the following error:
-----------------------------------------------
-
-Traceback (most recent call last):
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 261, in _func_wrapper
- result = func(*args)
- File "/workspace/ppvit_github/PaddleViT_raw/PaddleViT/image_classification/MAE/main_multi_gpu_pretrain.py", line 368, in main_worker
- master_logger=master_logger)
- File "/workspace/ppvit_github/PaddleViT_raw/PaddleViT/image_classification/MAE/main_multi_gpu_pretrain.py", line 163, in train
- loss.backward()
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/decorator.py", line 232, in fun
- return caller(func, *(extras + args), **kw)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/wrapped_decorator.py", line 25, in __impl__
- return wrapped_func(*args, **kwargs)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/framework.py", line 229, in __impl__
- return func(*args, **kwargs)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/fluid/dygraph/varbase_patch_methods.py", line 239, in backward
- framework._dygraph_tracer())
-OSError: (External) ResourceExhaustedError:
-
-Out of memory error on GPU 1. Cannot allocate 394.000244MB memory on GPU 1, 15.719788GB memory has been allocated and available memory is only 63.437500MB.
-
-Please check whether there is any other process using GPU 1.
-1. If yes, please stop them, or start PaddlePaddle on another GPU.
-2. If no, please decrease the batch size of your model.
-
- (at /paddle/paddle/fluid/memory/allocation/cuda_allocator.cc:79)
- (at /paddle/paddle/fluid/imperative/basic_engine.cc:568)
-
-
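-The run above is killed after its first logged step: allocating ~394 MB on GPU 1 fails
-with ~15.7 GB already in use, and the error text lists the two standard remedies (free
-the GPU, or reduce the batch size). The retry below keeps BATCH_SIZE at 256 but enables
-AMP and drops NUM_WORKERS from 4 to 2. As a rough, hypothetical sketch (placeholders,
-not this script's code) of what a mixed-precision step looks like in Paddle dygraph:
-
-```python
-# Hypothetical AMP step in Paddle dygraph; `model`, `optimizer`, `images` are placeholders.
-# Mixed precision is one way to reduce activation memory at a fixed batch size.
-import paddle
-
-scaler = paddle.amp.GradScaler(init_loss_scaling=1024)
-
-def train_step(model, optimizer, images):
-    with paddle.amp.auto_cast():        # run the forward pass in float16 where safe
-        loss = model(images)            # assume the pretraining model returns its loss
-    scaled = scaler.scale(loss)         # scale the loss to avoid fp16 gradient underflow
-    scaled.backward()
-    scaler.minimize(optimizer, scaled)  # unscale, apply the update, adjust loss scaling
-    optimizer.clear_grad()
-    return loss
-```
-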
-merging config from ./configs/vit_base_patch16_224_pretrain_dec1.yaml
------ Imagenet2012 image train list len = 1281167
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:45480', '127.0.0.1:58605', '127.0.0.1:23406', '127.0.0.1:16014', '127.0.0.1:60086', '127.0.0.1:60603', '127.0.0.1:46782']
-I1219 17:07:49.286090 25456 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:45480 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:58605', '127.0.0.1:23406', '127.0.0.1:16014', '127.0.0.1:60086', '127.0.0.1:60603', '127.0.0.1:46782']
-I1219 17:07:51.690086 25473 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:58605 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:23406', '127.0.0.1:16014', '127.0.0.1:60086', '127.0.0.1:60603', '127.0.0.1:46782']
-I1219 17:07:54.058967 25488 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:23406 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:16014', '127.0.0.1:60086', '127.0.0.1:60603', '127.0.0.1:46782']
-I1219 17:07:57.064612 25503 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:16014 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:60086', '127.0.0.1:60603', '127.0.0.1:46782']
-I1219 17:07:59.496040 25520 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:60086 successful.
-server not ready, wait 3 sec to retry...
-not ready endpoints:['127.0.0.1:60603', '127.0.0.1:46782']
-I1219 17:08:02.203279 25537 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:60603 successful.
-I1219 17:08:04.597697 25554 gen_comm_id_helper.cc:190] Server listening on: 127.0.0.1:46782 successful.
-I1219 17:08:05.017540 25473 nccl_context.cc:74] init nccl context nranks: 8 local rank: 2 gpu id: 2 ring id: 0
-I1219 17:08:05.017537 25456 nccl_context.cc:74] init nccl context nranks: 8 local rank: 1 gpu id: 1 ring id: 0
-I1219 17:08:05.017560 25488 nccl_context.cc:74] init nccl context nranks: 8 local rank: 3 gpu id: 3 ring id: 0
-I1219 17:08:05.017565 25537 nccl_context.cc:74] init nccl context nranks: 8 local rank: 6 gpu id: 6 ring id: 0
-I1219 17:08:05.017578 25503 nccl_context.cc:74] init nccl context nranks: 8 local rank: 4 gpu id: 4 ring id: 0
-I1219 17:08:05.017585 25520 nccl_context.cc:74] init nccl context nranks: 8 local rank: 5 gpu id: 5 ring id: 0
-I1219 17:08:05.017601 25554 nccl_context.cc:74] init nccl context nranks: 8 local rank: 7 gpu id: 7 ring id: 0
-I1219 17:08:05.017613 25441 nccl_context.cc:74] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 0
-W1219 17:08:09.206136 25441 device_context.cc:447] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:08:09.206564 25456 device_context.cc:447] Please NOTE: device: 1, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:08:09.206579 25554 device_context.cc:447] Please NOTE: device: 7, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:08:09.206670 25488 device_context.cc:447] Please NOTE: device: 3, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:08:09.206694 25520 device_context.cc:447] Please NOTE: device: 5, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:08:09.206728 25503 device_context.cc:447] Please NOTE: device: 4, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:08:09.209081 25537 device_context.cc:447] Please NOTE: device: 6, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:08:09.209785 25473 device_context.cc:447] Please NOTE: device: 2, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
-W1219 17:08:09.212059 25456 device_context.cc:465] device: 1, cuDNN Version: 7.6.
-W1219 17:08:09.212066 25554 device_context.cc:465] device: 7, cuDNN Version: 7.6.
-W1219 17:08:09.212080 25503 device_context.cc:465] device: 4, cuDNN Version: 7.6.
-W1219 17:08:09.212086 25520 device_context.cc:465] device: 5, cuDNN Version: 7.6.
-W1219 17:08:09.212086 25488 device_context.cc:465] device: 3, cuDNN Version: 7.6.
-W1219 17:08:09.212239 25441 device_context.cc:465] device: 0, cuDNN Version: 7.6.
-W1219 17:08:09.213409 25537 device_context.cc:465] device: 6, cuDNN Version: 7.6.
-W1219 17:08:09.214195 25473 device_context.cc:465] device: 2, cuDNN Version: 7.6.
-INFO:local_logger:----- world_size = 8, local_rank = 4
-INFO:local_logger:----- world_size = 8, local_rank = 1
-INFO:local_logger:----- world_size = 8, local_rank = 2
-INFO:master_logger:
-AMP: True
-BASE: ['']
-DATA:
- BATCH_SIZE: 256
- BATCH_SIZE_EVAL: 8
- CROP_PCT: 0.875
- DATASET: imagenet2012
- DATA_PATH: /dataset/imagenet
- IMAGE_SIZE: 224
- NUM_WORKERS: 2
-EVAL: False
-LOCAL_RANK: 0
-MODEL:
- ATTENTION_DROPOUT: 0.0
- DROPOUT: 0.0
- DROPPATH: 0.0
- MAE_PRETRAIN: True
- NAME: vit_base_patch16_224_dec1
- NUM_CLASSES: 1000
- PRETRAINED: None
- RESUME: None
- TRANS:
- DECODER:
- DEPTH: 1
- EMBED_DIM: 512
- NUM_HEADS: 8
- ENCODER:
- DEPTH: 12
- EMBED_DIM: 768
- NUM_HEADS: 12
- MASK_RATIO: 0.75
- MLP_RATIO: 4.0
- PATCH_SIZE: 16
- QKV_BIAS: True
- TYPE: MAE
-NGPUS: 8
-REPORT_FREQ: 100
-SAVE: ./output/train-20211219-17-07-40
-SAVE_FREQ: 1
-SEED: 0
-TAG: default
-TRAIN:
- ACCUM_ITER: 2
- BASE_LR: 0.00015
- CUTMIX_ALPHA: 1.0
- CUTMIX_MINMAX: None
- END_LR: 0.0005
- GRAD_CLIP: 1
- LAST_EPOCH: 0
- LINEAR_SCALED_LR: None
- LR_SCHEDULER:
- DECAY_EPOCHS: 30
- DECAY_RATE: 0.1
- MILESTONES: 30, 60, 90
- NAME: warmupcosine
- MIXUP_ALPHA: 0.8
- MIXUP_MODE: batch
- MIXUP_PROB: 1.0
- MIXUP_SWITCH_PROB: 0.5
- NORMALIZE_TARGET: True
- NUM_EPOCHS: 800
- OPTIMIZER:
- BETAS: (0.9, 0.95)
- EPS: 1e-08
- MOMENTUM: 0.9
- NAME: AdamW
- RAND_AUGMENT: False
- RAND_AUGMENT_LAYERS: 9
- RAND_AUGMENT_MAGNITUDE: 5
- SMOOTHING: 0.1
- WARMUP_EPOCHS: 40
- WARMUP_START_LR: 1e-06
- WEIGHT_DECAY: 0.05
-VALIDATE_FREQ: 100
-INFO:local_logger:----- world_size = 8, local_rank = 0
-INFO:master_logger:----- world_size = 8, local_rank = 0
-INFO:local_logger:----- world_size = 8, local_rank = 6
-INFO:local_logger:----- world_size = 8, local_rank = 5
-INFO:local_logger:----- world_size = 8, local_rank = 7
-INFO:local_logger:----- world_size = 8, local_rank = 3
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:master_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:master_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:master_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
-INFO:local_logger:----- Total # of train batch (single gpu): 626
-INFO:local_logger:Start training from epoch 1.
-INFO:local_logger:Now training epoch 1. LR=0.000005
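-
-A quick check of the startup numbers above, using values from this config: 1,281,167
-training images over 8 GPUs at a per-GPU batch size of 256 gives the 626 batches per
-GPU reported by each worker, and (if ACCUM_ITER counts gradient-accumulation steps)
-each optimizer update sees an effective batch of 256 x 8 x 2 = 4096 images.
-
-```python
-# Arithmetic only; the constants are copied from the config dump and train list above.
-import math
-
-images, ngpus, batch_size, accum_iter = 1281167, 8, 256, 2
-print(math.ceil(images / (ngpus * batch_size)))   # 626 train batches per GPU per epoch
-print(ngpus * batch_size * accum_iter)            # 4096 images per effective update
-```
-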
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1468
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1446
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1495
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1428
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1450
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1461
-INFO:master_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1454
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1459
-INFO:local_logger:Epoch[001/800], Step[0000/0626], Avg Loss: 1.1427
-INFO:local_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1136
-INFO:local_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1140
-INFO:local_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1137
-INFO:local_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1132
-INFO:local_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1132
-INFO:master_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1136
-INFO:local_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1135
-INFO:local_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1138
-INFO:local_logger:Epoch[001/800], Step[0100/0626], Avg Loss: 1.1139
-INFO:local_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0903
-INFO:local_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0904
-INFO:local_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0904
-INFO:local_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0908
-INFO:local_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0903
-INFO:local_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0900
-INFO:local_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0904
-INFO:local_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0902
-INFO:master_logger:Epoch[001/800], Step[0200/0626], Avg Loss: 1.0904
-INFO:local_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0723
-INFO:local_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0717
-INFO:local_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0718
-INFO:local_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0716
-INFO:local_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0719
-INFO:master_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0719
-INFO:local_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0718
-INFO:local_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0720
-INFO:local_logger:Epoch[001/800], Step[0300/0626], Avg Loss: 1.0720
-INFO:local_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0576
-INFO:local_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0572
-INFO:local_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0572
-INFO:local_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0570
-INFO:local_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0573
-INFO:local_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0570
-INFO:local_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0573
-INFO:master_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0572
-INFO:local_logger:Epoch[001/800], Step[0400/0626], Avg Loss: 1.0574
-INFO:local_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0461
-INFO:local_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0459
-INFO:local_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0459
-INFO:local_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0461
-INFO:local_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0457
-INFO:local_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0461
-INFO:master_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0460
-INFO:local_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0463
-INFO:local_logger:Epoch[001/800], Step[0500/0626], Avg Loss: 1.0461
-INFO:local_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0374
-INFO:local_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0374
-INFO:local_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0375
-INFO:local_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0375
-INFO:master_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0375
-INFO:local_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0372
-INFO:local_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0377
-INFO:local_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0379
-INFO:local_logger:Epoch[001/800], Step[0600/0626], Avg Loss: 1.0374
-INFO:local_logger:----- Epoch[001/800], Train Loss: 1.0359, time: 934.80
-INFO:local_logger:Now training epoch 2. LR=0.000008
-INFO:local_logger:----- Epoch[001/800], Train Loss: 1.0356, time: 934.81
-INFO:local_logger:Now training epoch 2. LR=0.000008
-INFO:local_logger:----- Epoch[001/800], Train Loss: 1.0354, time: 934.86
-INFO:local_logger:Now training epoch 2. LR=0.000008
-INFO:local_logger:----- Epoch[001/800], Train Loss: 1.0361, time: 934.98
-INFO:local_logger:Now training epoch 2. LR=0.000008
-INFO:local_logger:----- Epoch[001/800], Train Loss: 1.0358, time: 935.03
-INFO:master_logger:----- Epoch[001/800], Train Loss: 1.0357, time: 935.03
-INFO:local_logger:----- Epoch[001/800], Train Loss: 1.0358, time: 935.07
-INFO:local_logger:Now training epoch 2. LR=0.000008
-INFO:local_logger:----- Epoch[001/800], Train Loss: 1.0356, time: 935.07
-INFO:local_logger:Now training epoch 2. LR=0.000008
-INFO:local_logger:----- Epoch[001/800], Train Loss: 1.0357, time: 935.09
-INFO:local_logger:Now training epoch 2. LR=0.000008
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-1-Loss-1.0357822933105671.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-1-Loss-1.0357822933105671.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-1-Loss-1.0357822933105671.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-1-Loss-1.0357822933105671.pdopt
-INFO:local_logger:Now training epoch 2. LR=0.000008
-INFO:master_logger:Now training epoch 2. LR=0.000008
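-
-The learning rates printed at each epoch boundary (0.000005 for epoch 1, 0.000008 for
-epoch 2, and so on) are consistent with a linear warmup from WARMUP_START_LR=1e-06 to
-BASE_LR=0.00015 over WARMUP_EPOCHS=40, before the cosine schedule named in the config
-(warmupcosine) takes over. A small reproduction of those values — a sanity check, not
-the repo's scheduler code:
-
-```python
-# Linear warmup that reproduces the "Now training epoch N. LR=..." values above.
-warmup_start_lr, base_lr, warmup_epochs = 1e-06, 0.00015, 40
-
-def warmup_lr(epoch):                 # epoch is 1-based, matching the log
-    return warmup_start_lr + (base_lr - warmup_start_lr) * epoch / warmup_epochs
-
-for epoch in range(1, 10):
-    print(f"epoch {epoch}: LR={warmup_lr(epoch):.6f}")
-# epoch 1: LR=0.000005, epoch 2: LR=0.000008, ..., epoch 9: LR=0.000035
-```
-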
-INFO:local_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9953
-INFO:master_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9905
-INFO:local_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9836
-INFO:local_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9941
-INFO:local_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9887
-INFO:local_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9872
-INFO:local_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9919
-INFO:local_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9949
-INFO:local_logger:Epoch[002/800], Step[0000/0626], Avg Loss: 0.9885
-INFO:local_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9896
-INFO:local_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9894
-INFO:local_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9900
-INFO:local_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9895
-INFO:local_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9901
-INFO:local_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9887
-INFO:local_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9897
-INFO:master_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9896
-INFO:local_logger:Epoch[002/800], Step[0100/0626], Avg Loss: 0.9900
-INFO:local_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9880
-INFO:local_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9889
-INFO:local_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9887
-INFO:local_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9883
-INFO:local_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9887
-INFO:local_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9887
-INFO:local_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9883
-INFO:master_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9885
-INFO:local_logger:Epoch[002/800], Step[0200/0626], Avg Loss: 0.9883
-INFO:local_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9878
-INFO:local_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9874
-INFO:local_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9873
-INFO:local_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9875
-INFO:master_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9876
-INFO:local_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9877
-INFO:local_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9880
-INFO:local_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9878
-INFO:local_logger:Epoch[002/800], Step[0300/0626], Avg Loss: 0.9872
-INFO:local_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9872
-INFO:local_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9870
-INFO:local_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9867
-INFO:local_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9867
-INFO:local_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9870
-INFO:local_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9871
-INFO:local_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9870
-INFO:local_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9868
-INFO:master_logger:Epoch[002/800], Step[0400/0626], Avg Loss: 0.9869
-INFO:local_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9862
-INFO:local_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9865
-INFO:local_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9861
-INFO:local_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9864
-INFO:local_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9863
-INFO:local_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9861
-INFO:local_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9862
-INFO:local_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9863
-INFO:master_logger:Epoch[002/800], Step[0500/0626], Avg Loss: 0.9863
-INFO:local_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9856
-INFO:local_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9858
-INFO:local_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9858
-INFO:local_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9855
-INFO:local_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9855
-INFO:local_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9856
-INFO:master_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9856
-INFO:local_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9856
-INFO:local_logger:Epoch[002/800], Step[0600/0626], Avg Loss: 0.9856
-INFO:local_logger:----- Epoch[002/800], Train Loss: 0.9857, time: 891.36
-INFO:local_logger:Now training epoch 3. LR=0.000012
-INFO:local_logger:----- Epoch[002/800], Train Loss: 0.9855, time: 891.28
-INFO:local_logger:Now training epoch 3. LR=0.000012
-INFO:local_logger:----- Epoch[002/800], Train Loss: 0.9853, time: 891.70
-INFO:local_logger:Now training epoch 3. LR=0.000012
-INFO:local_logger:----- Epoch[002/800], Train Loss: 0.9855, time: 891.46
-INFO:local_logger:Now training epoch 3. LR=0.000012
-INFO:local_logger:----- Epoch[002/800], Train Loss: 0.9853, time: 891.66
-INFO:local_logger:Now training epoch 3. LR=0.000012
-INFO:local_logger:----- Epoch[002/800], Train Loss: 0.9855, time: 891.47
-INFO:local_logger:Now training epoch 3. LR=0.000012
-INFO:local_logger:----- Epoch[002/800], Train Loss: 0.9857, time: 891.56
-INFO:local_logger:Now training epoch 3. LR=0.000012
-INFO:local_logger:----- Epoch[002/800], Train Loss: 0.9854, time: 887.62
-INFO:master_logger:----- Epoch[002/800], Train Loss: 0.9855, time: 887.62
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-2-Loss-0.9854484576284688.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-2-Loss-0.9854484576284688.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-2-Loss-0.9854484576284688.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-2-Loss-0.9854484576284688.pdopt
-INFO:local_logger:Now training epoch 3. LR=0.000012
-INFO:master_logger:Now training epoch 3. LR=0.000012
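-
-Every epoch ends with a `.pdparams` / `.pdopt` pair written under the SAVE directory,
-named after the epoch and its train loss as in the lines above. A hedged sketch of
-restoring such a pair with standard Paddle I/O (the repo's own resume logic is driven
-by the RESUME config key and may differ):
-
-```python
-# Hypothetical resume helper using standard Paddle APIs; `model` and `optimizer`
-# are placeholders for the pretraining objects built by the training script.
-import paddle
-
-def resume(model, optimizer, ckpt_prefix):
-    """Load a model/optimizer checkpoint pair saved as in the log lines above."""
-    model.set_state_dict(paddle.load(ckpt_prefix + '.pdparams'))
-    optimizer.set_state_dict(paddle.load(ckpt_prefix + '.pdopt'))
-
-# e.g. resume(model, optimizer,
-#             './output/train-20211219-17-07-40/MAE-Epoch-2-Loss-0.9854484576284688')
-```
-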
-INFO:local_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9859
-INFO:local_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9784
-INFO:local_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9751
-INFO:master_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9809
-INFO:local_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9834
-INFO:local_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9795
-INFO:local_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9809
-INFO:local_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9833
-INFO:local_logger:Epoch[003/800], Step[0000/0626], Avg Loss: 0.9810
-INFO:local_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9816
-INFO:local_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9810
-INFO:local_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9814
-INFO:local_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9810
-INFO:master_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9813
-INFO:local_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9813
-INFO:local_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9814
-INFO:local_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9813
-INFO:local_logger:Epoch[003/800], Step[0100/0626], Avg Loss: 0.9814
-INFO:local_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9807
-INFO:local_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9808
-INFO:local_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9808
-INFO:local_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9806
-INFO:local_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9806
-INFO:local_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9804
-INFO:local_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9804
-INFO:local_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9804
-INFO:master_logger:Epoch[003/800], Step[0200/0626], Avg Loss: 0.9806
-INFO:local_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9797
-INFO:local_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9799
-INFO:local_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9799
-INFO:local_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9802
-INFO:local_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9797
-INFO:master_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9799
-INFO:local_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9798
-INFO:local_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9799
-INFO:local_logger:Epoch[003/800], Step[0300/0626], Avg Loss: 0.9798
-INFO:local_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9791
-INFO:local_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9790
-INFO:local_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9793
-INFO:local_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9789
-INFO:local_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9789
-INFO:master_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9790
-INFO:local_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9789
-INFO:local_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9789
-INFO:local_logger:Epoch[003/800], Step[0400/0626], Avg Loss: 0.9791
-INFO:local_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9780
-INFO:local_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9782
-INFO:local_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9782
-INFO:local_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9783
-INFO:master_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9782
-INFO:local_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9781
-INFO:local_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9786
-INFO:local_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9783
-INFO:local_logger:Epoch[003/800], Step[0500/0626], Avg Loss: 0.9781
-INFO:local_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9776
-INFO:local_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9776
-INFO:local_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9774
-INFO:local_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9774
-INFO:local_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9778
-INFO:master_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9775
-INFO:local_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9774
-INFO:local_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9773
-INFO:local_logger:Epoch[003/800], Step[0600/0626], Avg Loss: 0.9773
-INFO:local_logger:----- Epoch[003/800], Train Loss: 0.9774, time: 893.09
-INFO:local_logger:Now training epoch 4. LR=0.000016
-INFO:local_logger:----- Epoch[003/800], Train Loss: 0.9772, time: 893.23
-INFO:local_logger:Now training epoch 4. LR=0.000016
-INFO:local_logger:----- Epoch[003/800], Train Loss: 0.9776, time: 893.27
-INFO:local_logger:Now training epoch 4. LR=0.000016
-INFO:local_logger:----- Epoch[003/800], Train Loss: 0.9771, time: 893.31
-INFO:local_logger:Now training epoch 4. LR=0.000016
-INFO:local_logger:----- Epoch[003/800], Train Loss: 0.9772, time: 893.74
-INFO:local_logger:Now training epoch 4. LR=0.000016
-INFO:local_logger:----- Epoch[003/800], Train Loss: 0.9772, time: 889.63
-INFO:master_logger:----- Epoch[003/800], Train Loss: 0.9773, time: 889.63
-INFO:local_logger:----- Epoch[003/800], Train Loss: 0.9773, time: 893.40
-INFO:local_logger:Now training epoch 4. LR=0.000016
-INFO:local_logger:----- Epoch[003/800], Train Loss: 0.9775, time: 893.56
-INFO:local_logger:Now training epoch 4. LR=0.000016
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-3-Loss-0.9772286424963117.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-3-Loss-0.9772286424963117.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-3-Loss-0.9772286424963117.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-3-Loss-0.9772286424963117.pdopt
-INFO:local_logger:Now training epoch 4. LR=0.000016
-INFO:master_logger:Now training epoch 4. LR=0.000016
-INFO:local_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9778
-INFO:local_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9751
-INFO:local_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9713
-INFO:master_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9734
-INFO:local_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9753
-INFO:local_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9753
-INFO:local_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9704
-INFO:local_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9683
-INFO:local_logger:Epoch[004/800], Step[0000/0626], Avg Loss: 0.9740
-INFO:local_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9727
-INFO:local_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9724
-INFO:local_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9730
-INFO:master_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9728
-INFO:local_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9731
-INFO:local_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9730
-INFO:local_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9729
-INFO:local_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9730
-INFO:local_logger:Epoch[004/800], Step[0100/0626], Avg Loss: 0.9726
-INFO:local_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9724
-INFO:local_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9725
-INFO:local_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9721
-INFO:local_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9721
-INFO:local_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9721
-INFO:local_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9720
-INFO:local_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9722
-INFO:local_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9724
-INFO:master_logger:Epoch[004/800], Step[0200/0626], Avg Loss: 0.9722
-INFO:local_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9715
-INFO:local_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9717
-INFO:local_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9717
-INFO:local_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9720
-INFO:master_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9717
-INFO:local_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9712
-INFO:local_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9718
-INFO:local_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9718
-INFO:local_logger:Epoch[004/800], Step[0300/0626], Avg Loss: 0.9716
-INFO:local_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9712
-INFO:local_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9711
-INFO:local_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9711
-INFO:local_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9715
-INFO:local_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9712
-INFO:local_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9709
-INFO:master_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9712
-INFO:local_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9714
-INFO:local_logger:Epoch[004/800], Step[0400/0626], Avg Loss: 0.9714
-INFO:local_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9707
-INFO:local_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9706
-INFO:local_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9709
-INFO:local_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9709
-INFO:local_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9707
-INFO:local_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9708
-INFO:local_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9705
-INFO:master_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9707
-INFO:local_logger:Epoch[004/800], Step[0500/0626], Avg Loss: 0.9706
-INFO:local_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9701
-INFO:local_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9704
-INFO:local_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9703
-INFO:local_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9701
-INFO:local_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9703
-INFO:local_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9701
-INFO:local_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9704
-INFO:master_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9703
-INFO:local_logger:Epoch[004/800], Step[0600/0626], Avg Loss: 0.9704
-INFO:local_logger:----- Epoch[004/800], Train Loss: 0.9702, time: 854.73
-INFO:local_logger:Now training epoch 5. LR=0.000020
-INFO:local_logger:----- Epoch[004/800], Train Loss: 0.9703, time: 851.06
-INFO:master_logger:----- Epoch[004/800], Train Loss: 0.9702, time: 851.06
-INFO:local_logger:----- Epoch[004/800], Train Loss: 0.9703, time: 854.82
-INFO:local_logger:Now training epoch 5. LR=0.000020
-INFO:local_logger:----- Epoch[004/800], Train Loss: 0.9700, time: 855.11
-INFO:local_logger:Now training epoch 5. LR=0.000020
-INFO:local_logger:----- Epoch[004/800], Train Loss: 0.9700, time: 855.36
-INFO:local_logger:Now training epoch 5. LR=0.000020
-INFO:local_logger:----- Epoch[004/800], Train Loss: 0.9703, time: 855.48
-INFO:local_logger:----- Epoch[004/800], Train Loss: 0.9700, time: 855.31
-INFO:local_logger:Now training epoch 5. LR=0.000020
-INFO:local_logger:Now training epoch 5. LR=0.000020
-INFO:local_logger:----- Epoch[004/800], Train Loss: 0.9702, time: 855.19
-INFO:local_logger:Now training epoch 5. LR=0.000020
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-4-Loss-0.97028241060033.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-4-Loss-0.97028241060033.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-4-Loss-0.97028241060033.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-4-Loss-0.97028241060033.pdopt
-INFO:local_logger:Now training epoch 5. LR=0.000020
-INFO:master_logger:Now training epoch 5. LR=0.000020
-INFO:local_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9655
-INFO:local_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9667
-INFO:local_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9651
-INFO:master_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9667
-INFO:local_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9671
-INFO:local_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9619
-INFO:local_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9712
-INFO:local_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9685
-INFO:local_logger:Epoch[005/800], Step[0000/0626], Avg Loss: 0.9674
-INFO:local_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9675
-INFO:local_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9674
-INFO:local_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9672
-INFO:local_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9682
-INFO:local_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9673
-INFO:local_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9671
-INFO:local_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9679
-INFO:master_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9675
-INFO:local_logger:Epoch[005/800], Step[0100/0626], Avg Loss: 0.9672
-INFO:local_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9670
-INFO:local_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9665
-INFO:local_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9669
-INFO:local_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9669
-INFO:local_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9666
-INFO:local_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9673
-INFO:local_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9672
-INFO:local_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9671
-INFO:master_logger:Epoch[005/800], Step[0200/0626], Avg Loss: 0.9669
-INFO:local_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9661
-INFO:local_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9663
-INFO:local_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9665
-INFO:local_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9665
-INFO:local_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9664
-INFO:local_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9667
-INFO:master_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9665
-INFO:local_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9668
-INFO:local_logger:Epoch[005/800], Step[0300/0626], Avg Loss: 0.9665
-INFO:local_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9661
-INFO:master_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9660
-INFO:local_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9662
-INFO:local_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9660
-INFO:local_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9661
-INFO:local_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9658
-INFO:local_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9660
-INFO:local_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9658
-INFO:local_logger:Epoch[005/800], Step[0400/0626], Avg Loss: 0.9660
-INFO:local_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9655
-INFO:local_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9655
-INFO:local_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9657
-INFO:local_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9657
-INFO:local_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9656
-INFO:local_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9656
-INFO:local_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9657
-INFO:master_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9656
-INFO:local_logger:Epoch[005/800], Step[0500/0626], Avg Loss: 0.9654
-INFO:local_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9651
-INFO:local_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9653
-INFO:local_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9653
-INFO:local_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9654
-INFO:local_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9652
-INFO:local_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9652
-INFO:local_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9649
-INFO:master_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9652
-INFO:local_logger:Epoch[005/800], Step[0600/0626], Avg Loss: 0.9651
-INFO:local_logger:----- Epoch[005/800], Train Loss: 0.9648, time: 889.02
-INFO:local_logger:Now training epoch 6. LR=0.000023
-INFO:local_logger:----- Epoch[005/800], Train Loss: 0.9651, time: 889.10
-INFO:local_logger:Now training epoch 6. LR=0.000023
-INFO:local_logger:----- Epoch[005/800], Train Loss: 0.9652, time: 889.53
-INFO:local_logger:Now training epoch 6. LR=0.000023
-INFO:local_logger:----- Epoch[005/800], Train Loss: 0.9652, time: 885.85
-INFO:master_logger:----- Epoch[005/800], Train Loss: 0.9651, time: 885.85
-INFO:local_logger:----- Epoch[005/800], Train Loss: 0.9651, time: 889.20
-INFO:local_logger:Now training epoch 6. LR=0.000023
-INFO:local_logger:----- Epoch[005/800], Train Loss: 0.9650, time: 889.56
-INFO:local_logger:Now training epoch 6. LR=0.000023
-INFO:local_logger:----- Epoch[005/800], Train Loss: 0.9650, time: 889.67
-INFO:local_logger:Now training epoch 6. LR=0.000023
-INFO:local_logger:----- Epoch[005/800], Train Loss: 0.9653, time: 890.15
-INFO:local_logger:Now training epoch 6. LR=0.000023
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-5-Loss-0.9652042168475674.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-5-Loss-0.9652042168475674.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-5-Loss-0.9652042168475674.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-5-Loss-0.9652042168475674.pdopt
-INFO:local_logger:Now training epoch 6. LR=0.000023
-INFO:master_logger:Now training epoch 6. LR=0.000023
-INFO:local_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9660
-INFO:local_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9643
-INFO:master_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9604
-INFO:local_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9476
-INFO:local_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9637
-INFO:local_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9585
-INFO:local_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9542
-INFO:local_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9641
-INFO:local_logger:Epoch[006/800], Step[0000/0626], Avg Loss: 0.9651
-INFO:local_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9622
-INFO:local_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9624
-INFO:local_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9627
-INFO:local_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9625
-INFO:local_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9626
-INFO:local_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9626
-INFO:master_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9626
-INFO:local_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9627
-INFO:local_logger:Epoch[006/800], Step[0100/0626], Avg Loss: 0.9632
-INFO:local_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9624
-INFO:local_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9619
-INFO:local_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9619
-INFO:local_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9624
-INFO:master_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9622
-INFO:local_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9620
-INFO:local_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9622
-INFO:local_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9624
-INFO:local_logger:Epoch[006/800], Step[0200/0626], Avg Loss: 0.9620
-INFO:local_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9613
-INFO:local_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9620
-INFO:local_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9620
-INFO:local_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9620
-INFO:local_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9615
-INFO:local_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9618
-INFO:local_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9621
-INFO:master_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9618
-INFO:local_logger:Epoch[006/800], Step[0300/0626], Avg Loss: 0.9617
-INFO:local_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9610
-INFO:local_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9615
-INFO:local_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9612
-INFO:local_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9614
-INFO:local_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9614
-INFO:local_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9612
-INFO:local_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9616
-INFO:master_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9613
-INFO:local_logger:Epoch[006/800], Step[0400/0626], Avg Loss: 0.9614
-INFO:local_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9609
-INFO:local_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9609
-INFO:local_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9612
-INFO:local_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9608
-INFO:local_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9611
-INFO:local_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9608
-INFO:local_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9610
-INFO:master_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9610
-INFO:local_logger:Epoch[006/800], Step[0500/0626], Avg Loss: 0.9612
-INFO:local_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9604
-INFO:local_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9608
-INFO:local_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9606
-INFO:local_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9607
-INFO:local_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9605
-INFO:local_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9606
-INFO:local_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9603
-INFO:local_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9609
-INFO:master_logger:Epoch[006/800], Step[0600/0626], Avg Loss: 0.9606
-INFO:local_logger:----- Epoch[006/800], Train Loss: 0.9605, time: 860.53
-INFO:local_logger:Now training epoch 7. LR=0.000027
-INFO:local_logger:----- Epoch[006/800], Train Loss: 0.9607, time: 860.72
-INFO:local_logger:Now training epoch 7. LR=0.000027
-INFO:local_logger:----- Epoch[006/800], Train Loss: 0.9604, time: 857.72
-INFO:master_logger:----- Epoch[006/800], Train Loss: 0.9605, time: 857.72
-INFO:local_logger:----- Epoch[006/800], Train Loss: 0.9607, time: 861.65
-INFO:local_logger:Now training epoch 7. LR=0.000027
-INFO:local_logger:----- Epoch[006/800], Train Loss: 0.9602, time: 861.47
-INFO:local_logger:Now training epoch 7. LR=0.000027
-INFO:local_logger:----- Epoch[006/800], Train Loss: 0.9603, time: 861.13
-INFO:local_logger:----- Epoch[006/800], Train Loss: 0.9608, time: 861.53
-INFO:local_logger:Now training epoch 7. LR=0.000027
-INFO:local_logger:Now training epoch 7. LR=0.000027
-INFO:local_logger:----- Epoch[006/800], Train Loss: 0.9603, time: 861.59
-INFO:local_logger:Now training epoch 7. LR=0.000027
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-6-Loss-0.9604088297024008.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-6-Loss-0.9604088297024008.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-6-Loss-0.9604088297024008.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-6-Loss-0.9604088297024008.pdopt
-INFO:local_logger:Now training epoch 7. LR=0.000027
-INFO:master_logger:Now training epoch 7. LR=0.000027
-INFO:local_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9534
-INFO:local_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9591
-INFO:local_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9552
-INFO:master_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9581
-INFO:local_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9540
-INFO:local_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9572
-INFO:local_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9591
-INFO:local_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9645
-INFO:local_logger:Epoch[007/800], Step[0000/0626], Avg Loss: 0.9624
-INFO:local_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9583
-INFO:local_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9576
-INFO:local_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9586
-INFO:local_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9575
-INFO:local_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9582
-INFO:local_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9584
-INFO:local_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9589
-INFO:master_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9582
-INFO:local_logger:Epoch[007/800], Step[0100/0626], Avg Loss: 0.9584
-INFO:local_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9580
-INFO:local_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9575
-INFO:local_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9573
-INFO:master_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9578
-INFO:local_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9580
-INFO:local_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9581
-INFO:local_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9578
-INFO:local_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9578
-INFO:local_logger:Epoch[007/800], Step[0200/0626], Avg Loss: 0.9577
-INFO:local_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9571
-INFO:local_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9570
-INFO:local_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9575
-INFO:local_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9573
-INFO:local_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9570
-INFO:local_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9574
-INFO:master_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9573
-INFO:local_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9574
-INFO:local_logger:Epoch[007/800], Step[0300/0626], Avg Loss: 0.9577
-INFO:local_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9566
-INFO:local_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9566
-INFO:local_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9567
-INFO:local_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9572
-INFO:local_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9568
-INFO:master_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9568
-INFO:local_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9568
-INFO:local_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9570
-INFO:local_logger:Epoch[007/800], Step[0400/0626], Avg Loss: 0.9568
-INFO:local_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9563
-INFO:local_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9568
-INFO:local_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9561
-INFO:local_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9565
-INFO:local_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9565
-INFO:local_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9563
-INFO:master_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9564
-INFO:local_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9565
-INFO:local_logger:Epoch[007/800], Step[0500/0626], Avg Loss: 0.9563
-INFO:local_logger:Epoch[007/800], Step[0600/0626], Avg Loss: 0.9564
-INFO:local_logger:Epoch[007/800], Step[0600/0626], Avg Loss: 0.9561
-INFO:local_logger:Epoch[007/800], Step[0600/0626], Avg Loss: 0.9561
-INFO:local_logger:Epoch[007/800], Step[0600/0626], Avg Loss: 0.9559
-INFO:local_logger:Epoch[007/800], Step[0600/0626], Avg Loss: 0.9559
-The raw pre-training log for epochs 7 through 18 is condensed below. At every logged step the eight per-GPU `local_logger` lines repeat essentially the same running average as the single `master_logger` line, so only the per-epoch `master_logger` summaries are kept: the average train loss over the 626 steps, the wall-clock time, the learning rate used for the epoch, and the checkpoint written to `./output/train-20211219-17-07-40/` (a `.pdparams` weights file and a matching `.pdopt` optimizer file are saved after every epoch).
-
-| Epoch | LR | Avg Train Loss | Time (s) | Checkpoint |
-|-------|----------|----------------|----------|------------|
-| 007 | n/a (not in this excerpt) | 0.9559 | 885.04 | MAE-Epoch-7-Loss-0.9557424400537671 |
-| 008 | 0.000031 | 0.9505 | 852.20 | MAE-Epoch-8-Loss-0.950418085337367 |
-| 009 | 0.000035 | 0.9424 | 886.67 | MAE-Epoch-9-Loss-0.9425096387053156 |
-| 010 | 0.000038 | 0.9318 | 854.67 | MAE-Epoch-10-Loss-0.9318290638491608 |
-| 011 | 0.000042 | 0.9209 | 884.67 | MAE-Epoch-11-Loss-0.9209032249693648 |
-| 012 | 0.000046 | 0.9101 | 847.34 | MAE-Epoch-12-Loss-0.9097320030754859 |
-| 013 | 0.000049 | 0.9000 | 879.83 | MAE-Epoch-13-Loss-0.9000374903566999 |
-| 014 | 0.000053 | 0.8893 | 842.90 | MAE-Epoch-14-Loss-0.8892871914493445 |
-| 015 | 0.000057 | 0.8804 | 893.90 | MAE-Epoch-15-Loss-0.8804958925234925 |
-| 016 | 0.000061 | 0.8696 | 859.59 | MAE-Epoch-16-Loss-0.8694310493630203 |
-| 017 | 0.000064 | 0.8615 | 887.14 | MAE-Epoch-17-Loss-0.8613511298173326 |
-| 018 | 0.000068 | 0.8537 (running avg at step 0300/0626; epoch still in progress) | n/a | n/a |
-
-Over these epochs the average train loss falls steadily from 0.9559 to 0.8615 while the learning rate is still being ramped up by roughly 0.000004 per epoch, which is consistent with a linear warm-up schedule.
-INFO:local_logger:Epoch[018/800], Step[0400/0626], Avg Loss: 0.8534
-INFO:local_logger:Epoch[018/800], Step[0400/0626], Avg Loss: 0.8534
-INFO:local_logger:Epoch[018/800], Step[0400/0626], Avg Loss: 0.8532
-INFO:local_logger:Epoch[018/800], Step[0400/0626], Avg Loss: 0.8542
-INFO:local_logger:Epoch[018/800], Step[0400/0626], Avg Loss: 0.8532
-INFO:local_logger:Epoch[018/800], Step[0400/0626], Avg Loss: 0.8537
-INFO:master_logger:Epoch[018/800], Step[0400/0626], Avg Loss: 0.8535
-INFO:local_logger:Epoch[018/800], Step[0400/0626], Avg Loss: 0.8530
-INFO:local_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8533
-INFO:local_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8536
-INFO:local_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8529
-INFO:local_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8531
-INFO:local_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8534
-INFO:local_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8529
-INFO:local_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8534
-INFO:master_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8533
-INFO:local_logger:Epoch[018/800], Step[0500/0626], Avg Loss: 0.8541
-INFO:local_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8533
-INFO:local_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8525
-INFO:local_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8531
-INFO:local_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8529
-INFO:local_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8528
-INFO:local_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8525
-INFO:local_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8536
-INFO:local_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8528
-INFO:master_logger:Epoch[018/800], Step[0600/0626], Avg Loss: 0.8529
-INFO:local_logger:----- Epoch[018/800], Train Loss: 0.8532, time: 859.28
-INFO:local_logger:Now training epoch 19. LR=0.000072
-INFO:local_logger:----- Epoch[018/800], Train Loss: 0.8524, time: 859.95
-INFO:local_logger:Now training epoch 19. LR=0.000072
-INFO:local_logger:----- Epoch[018/800], Train Loss: 0.8527, time: 855.56
-INFO:local_logger:----- Epoch[018/800], Train Loss: 0.8523, time: 859.27
-INFO:local_logger:----- Epoch[018/800], Train Loss: 0.8527, time: 859.94
-INFO:master_logger:----- Epoch[018/800], Train Loss: 0.8528, time: 855.56
-INFO:local_logger:Now training epoch 19. LR=0.000072
-INFO:local_logger:Now training epoch 19. LR=0.000072
-INFO:local_logger:----- Epoch[018/800], Train Loss: 0.8530, time: 859.29
-INFO:local_logger:Now training epoch 19. LR=0.000072
-INFO:local_logger:----- Epoch[018/800], Train Loss: 0.8528, time: 859.96
-INFO:local_logger:Now training epoch 19. LR=0.000072
-INFO:local_logger:----- Epoch[018/800], Train Loss: 0.8534, time: 859.27
-INFO:local_logger:Now training epoch 19. LR=0.000072
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-18-Loss-0.8526818839083388.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-18-Loss-0.8526818839083388.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-18-Loss-0.8526818839083388.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-18-Loss-0.8526818839083388.pdopt
-INFO:local_logger:Now training epoch 19. LR=0.000072
-INFO:master_logger:Now training epoch 19. LR=0.000072
-INFO:local_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8466
-INFO:local_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8442
-INFO:local_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8470
-INFO:local_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8424
-INFO:local_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8531
-INFO:local_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8522
-INFO:master_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8474
-INFO:local_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8452
-INFO:local_logger:Epoch[019/800], Step[0000/0626], Avg Loss: 0.8487
-INFO:local_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8481
-INFO:local_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8485
-INFO:local_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8477
-INFO:local_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8475
-INFO:local_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8477
-INFO:local_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8473
-INFO:local_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8491
-INFO:master_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8480
-INFO:local_logger:Epoch[019/800], Step[0100/0626], Avg Loss: 0.8481
-INFO:local_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8476
-INFO:local_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8477
-INFO:local_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8476
-INFO:local_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8481
-INFO:local_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8480
-INFO:master_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8478
-INFO:local_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8483
-INFO:local_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8476
-INFO:local_logger:Epoch[019/800], Step[0200/0626], Avg Loss: 0.8478
-INFO:local_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8470
-INFO:local_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8465
-INFO:master_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8468
-INFO:local_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8469
-INFO:local_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8470
-INFO:local_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8470
-INFO:local_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8465
-INFO:local_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8468
-INFO:local_logger:Epoch[019/800], Step[0300/0626], Avg Loss: 0.8467
-INFO:local_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8458
-INFO:local_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8460
-INFO:local_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8460
-INFO:local_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8464
-INFO:local_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8461
-INFO:local_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8460
-INFO:local_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8465
-INFO:local_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8463
-INFO:master_logger:Epoch[019/800], Step[0400/0626], Avg Loss: 0.8461
-INFO:local_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8452
-INFO:local_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8454
-INFO:local_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8450
-INFO:local_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8455
-INFO:local_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8451
-INFO:local_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8452
-INFO:local_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8452
-INFO:local_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8455
-INFO:master_logger:Epoch[019/800], Step[0500/0626], Avg Loss: 0.8452
-INFO:local_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8445
-INFO:local_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8442
-INFO:local_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8446
-INFO:local_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8445
-INFO:local_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8447
-INFO:local_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8445
-INFO:master_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8445
-INFO:local_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8443
-INFO:local_logger:Epoch[019/800], Step[0600/0626], Avg Loss: 0.8445
-INFO:local_logger:----- Epoch[019/800], Train Loss: 0.8446, time: 880.98
-INFO:master_logger:----- Epoch[019/800], Train Loss: 0.8443, time: 880.98
-INFO:local_logger:----- Epoch[019/800], Train Loss: 0.8443, time: 885.40
-INFO:local_logger:Now training epoch 20. LR=0.000075
-INFO:local_logger:----- Epoch[019/800], Train Loss: 0.8443, time: 885.43
-INFO:local_logger:Now training epoch 20. LR=0.000075
-INFO:local_logger:----- Epoch[019/800], Train Loss: 0.8444, time: 885.46
-INFO:local_logger:Now training epoch 20. LR=0.000075
-INFO:local_logger:----- Epoch[019/800], Train Loss: 0.8441, time: 885.49
-INFO:local_logger:Now training epoch 20. LR=0.000075
-INFO:local_logger:----- Epoch[019/800], Train Loss: 0.8441, time: 885.53
-INFO:local_logger:Now training epoch 20. LR=0.000075
-INFO:local_logger:----- Epoch[019/800], Train Loss: 0.8443, time: 885.54
-INFO:local_logger:Now training epoch 20. LR=0.000075
-INFO:local_logger:----- Epoch[019/800], Train Loss: 0.8446, time: 885.53
-INFO:local_logger:Now training epoch 20. LR=0.000075
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-19-Loss-0.8445631699389794.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-19-Loss-0.8445631699389794.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-19-Loss-0.8445631699389794.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-19-Loss-0.8445631699389794.pdopt
-INFO:local_logger:Now training epoch 20. LR=0.000075
-INFO:master_logger:Now training epoch 20. LR=0.000075
-INFO:local_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8395
-INFO:local_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8579
-INFO:local_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8337
-INFO:master_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8394
-INFO:local_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8377
-INFO:local_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8425
-INFO:local_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8297
-INFO:local_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8371
-INFO:local_logger:Epoch[020/800], Step[0000/0626], Avg Loss: 0.8374
-INFO:local_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8389
-INFO:local_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8399
-INFO:local_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8385
-INFO:local_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8402
-INFO:local_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8398
-INFO:local_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8399
-INFO:master_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8396
-INFO:local_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8398
-INFO:local_logger:Epoch[020/800], Step[0100/0626], Avg Loss: 0.8400
-INFO:local_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8402
-INFO:local_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8410
-INFO:local_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8400
-INFO:local_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8406
-INFO:local_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8409
-INFO:master_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8408
-INFO:local_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8403
-INFO:local_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8416
-INFO:local_logger:Epoch[020/800], Step[0200/0626], Avg Loss: 0.8415
-INFO:local_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8399
-INFO:local_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8411
-INFO:local_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8404
-INFO:local_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8406
-INFO:local_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8406
-INFO:master_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8406
-INFO:local_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8403
-INFO:local_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8415
-INFO:local_logger:Epoch[020/800], Step[0300/0626], Avg Loss: 0.8403
-INFO:local_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8397
-INFO:local_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8397
-INFO:local_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8393
-INFO:local_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8400
-INFO:local_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8395
-INFO:local_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8400
-INFO:local_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8397
-INFO:master_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8398
-INFO:local_logger:Epoch[020/800], Step[0400/0626], Avg Loss: 0.8404
-INFO:local_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8384
-INFO:local_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8386
-INFO:local_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8384
-INFO:local_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8388
-INFO:local_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8383
-INFO:local_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8387
-INFO:local_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8384
-INFO:master_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8386
-INFO:local_logger:Epoch[020/800], Step[0500/0626], Avg Loss: 0.8391
-INFO:local_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8387
-INFO:local_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8378
-INFO:local_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8380
-INFO:local_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8382
-INFO:local_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8383
-INFO:local_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8381
-INFO:master_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8382
-INFO:local_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8380
-INFO:local_logger:Epoch[020/800], Step[0600/0626], Avg Loss: 0.8383
-INFO:local_logger:----- Epoch[020/800], Train Loss: 0.8379, time: 856.21
-INFO:local_logger:Now training epoch 21. LR=0.000079
-INFO:local_logger:----- Epoch[020/800], Train Loss: 0.8378, time: 856.64
-INFO:local_logger:----- Epoch[020/800], Train Loss: 0.8381, time: 856.54
-INFO:local_logger:Now training epoch 21. LR=0.000079
-INFO:local_logger:Now training epoch 21. LR=0.000079
-INFO:local_logger:----- Epoch[020/800], Train Loss: 0.8378, time: 856.55
-INFO:local_logger:Now training epoch 21. LR=0.000079
-INFO:local_logger:----- Epoch[020/800], Train Loss: 0.8378, time: 856.54
-INFO:local_logger:Now training epoch 21. LR=0.000079
-INFO:local_logger:----- Epoch[020/800], Train Loss: 0.8385, time: 856.62
-INFO:local_logger:----- Epoch[020/800], Train Loss: 0.8379, time: 856.58
-INFO:local_logger:Now training epoch 21. LR=0.000079
-INFO:local_logger:Now training epoch 21. LR=0.000079
-INFO:local_logger:----- Epoch[020/800], Train Loss: 0.8377, time: 853.74
-INFO:master_logger:----- Epoch[020/800], Train Loss: 0.8379, time: 853.74
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-20-Loss-0.837697342612629.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-20-Loss-0.837697342612629.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-20-Loss-0.837697342612629.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-20-Loss-0.837697342612629.pdopt
-INFO:local_logger:Now training epoch 21. LR=0.000079
-INFO:master_logger:Now training epoch 21. LR=0.000079
-INFO:local_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8247
-INFO:local_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8468
-INFO:master_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8311
-INFO:local_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8307
-INFO:local_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8301
-INFO:local_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8220
-INFO:local_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8352
-INFO:local_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8284
-INFO:local_logger:Epoch[021/800], Step[0000/0626], Avg Loss: 0.8313
-INFO:local_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8345
-INFO:local_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8327
-INFO:local_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8336
-INFO:local_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8344
-INFO:local_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8331
-INFO:local_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8343
-INFO:master_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8338
-INFO:local_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8339
-INFO:local_logger:Epoch[021/800], Step[0100/0626], Avg Loss: 0.8339
-INFO:local_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8343
-INFO:local_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8340
-INFO:local_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8344
-INFO:local_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8339
-INFO:master_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8338
-INFO:local_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8333
-INFO:local_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8332
-INFO:local_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8337
-INFO:local_logger:Epoch[021/800], Step[0200/0626], Avg Loss: 0.8335
-INFO:local_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8328
-INFO:local_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8336
-INFO:local_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8330
-INFO:local_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8331
-INFO:local_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8331
-INFO:local_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8337
-INFO:local_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8328
-INFO:local_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8336
-INFO:master_logger:Epoch[021/800], Step[0300/0626], Avg Loss: 0.8332
-INFO:local_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8324
-INFO:local_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8322
-INFO:local_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8329
-INFO:local_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8326
-INFO:local_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8333
-INFO:local_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8324
-INFO:local_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8332
-INFO:master_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8327
-INFO:local_logger:Epoch[021/800], Step[0400/0626], Avg Loss: 0.8328
-INFO:local_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8323
-INFO:local_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8322
-INFO:master_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8321
-INFO:local_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8325
-INFO:local_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8319
-INFO:local_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8317
-INFO:local_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8323
-INFO:local_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8319
-INFO:local_logger:Epoch[021/800], Step[0500/0626], Avg Loss: 0.8320
-INFO:local_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8319
-INFO:local_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8316
-INFO:local_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8318
-INFO:local_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8314
-INFO:local_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8317
-INFO:local_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8314
-INFO:local_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8315
-INFO:master_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8316
-INFO:local_logger:Epoch[021/800], Step[0600/0626], Avg Loss: 0.8317
-INFO:local_logger:----- Epoch[021/800], Train Loss: 0.8314, time: 903.73
-INFO:master_logger:----- Epoch[021/800], Train Loss: 0.8313, time: 903.73
-INFO:local_logger:----- Epoch[021/800], Train Loss: 0.8311, time: 908.09
-INFO:local_logger:Now training epoch 22. LR=0.000083
-INFO:local_logger:----- Epoch[021/800], Train Loss: 0.8312, time: 908.50
-INFO:local_logger:Now training epoch 22. LR=0.000083
-INFO:local_logger:----- Epoch[021/800], Train Loss: 0.8312, time: 908.98
-INFO:local_logger:Now training epoch 22. LR=0.000083
-INFO:local_logger:----- Epoch[021/800], Train Loss: 0.8314, time: 908.52
-INFO:local_logger:Now training epoch 22. LR=0.000083
-INFO:local_logger:----- Epoch[021/800], Train Loss: 0.8314, time: 908.52
-INFO:local_logger:Now training epoch 22. LR=0.000083
-INFO:local_logger:----- Epoch[021/800], Train Loss: 0.8311, time: 908.52
-INFO:local_logger:Now training epoch 22. LR=0.000083
-INFO:local_logger:----- Epoch[021/800], Train Loss: 0.8317, time: 908.52
-INFO:local_logger:Now training epoch 22. LR=0.000083
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-21-Loss-0.8314437446567381.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-21-Loss-0.8314437446567381.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-21-Loss-0.8314437446567381.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-21-Loss-0.8314437446567381.pdopt
-INFO:local_logger:Now training epoch 22. LR=0.000083
-INFO:master_logger:Now training epoch 22. LR=0.000083
-INFO:local_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8238
-INFO:local_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8287
-INFO:master_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8236
-INFO:local_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8314
-INFO:local_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8120
-INFO:local_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8206
-INFO:local_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8245
-INFO:local_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8214
-INFO:local_logger:Epoch[022/800], Step[0000/0626], Avg Loss: 0.8266
-INFO:local_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8256
-INFO:local_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8255
-INFO:local_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8256
-INFO:local_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8256
-INFO:local_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8274
-INFO:local_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8262
-INFO:local_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8270
-INFO:master_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8262
-INFO:local_logger:Epoch[022/800], Step[0100/0626], Avg Loss: 0.8269
-INFO:local_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8246
-INFO:local_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8258
-INFO:local_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8262
-INFO:local_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8250
-INFO:master_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8252
-INFO:local_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8250
-INFO:local_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8256
-INFO:local_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8245
-INFO:local_logger:Epoch[022/800], Step[0200/0626], Avg Loss: 0.8253
-INFO:local_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8236
-INFO:local_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8237
-INFO:local_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8235
-INFO:local_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8239
-INFO:local_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8245
-INFO:local_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8242
-INFO:local_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8232
-INFO:local_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8245
-INFO:master_logger:Epoch[022/800], Step[0300/0626], Avg Loss: 0.8239
-INFO:local_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8234
-INFO:local_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8226
-INFO:local_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8230
-INFO:local_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8239
-INFO:local_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8240
-INFO:local_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8231
-INFO:local_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8231
-INFO:local_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8235
-INFO:master_logger:Epoch[022/800], Step[0400/0626], Avg Loss: 0.8233
-INFO:local_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8231
-INFO:local_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8231
-INFO:local_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8222
-INFO:local_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8226
-INFO:master_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8225
-INFO:local_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8222
-INFO:local_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8223
-INFO:local_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8226
-INFO:local_logger:Epoch[022/800], Step[0500/0626], Avg Loss: 0.8222
-INFO:local_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8221
-INFO:local_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8219
-INFO:local_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8219
-INFO:local_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8220
-INFO:local_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8223
-INFO:local_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8226
-INFO:master_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8222
-INFO:local_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8228
-INFO:local_logger:Epoch[022/800], Step[0600/0626], Avg Loss: 0.8222
-INFO:local_logger:----- Epoch[022/800], Train Loss: 0.8221, time: 859.97
-INFO:master_logger:----- Epoch[022/800], Train Loss: 0.8221, time: 859.97
-INFO:local_logger:----- Epoch[022/800], Train Loss: 0.8221, time: 863.27
-INFO:local_logger:Now training epoch 23. LR=0.000087
-INFO:local_logger:----- Epoch[022/800], Train Loss: 0.8219, time: 864.02
-INFO:local_logger:Now training epoch 23. LR=0.000087
-INFO:local_logger:----- Epoch[022/800], Train Loss: 0.8219, time: 863.88
-INFO:local_logger:Now training epoch 23. LR=0.000087
-INFO:local_logger:----- Epoch[022/800], Train Loss: 0.8227, time: 863.94
-INFO:local_logger:Now training epoch 23. LR=0.000087
-INFO:local_logger:----- Epoch[022/800], Train Loss: 0.8220, time: 863.94
-INFO:local_logger:Now training epoch 23. LR=0.000087
-INFO:local_logger:----- Epoch[022/800], Train Loss: 0.8223, time: 863.94
-INFO:local_logger:Now training epoch 23. LR=0.000087
-INFO:local_logger:----- Epoch[022/800], Train Loss: 0.8219, time: 863.94
-INFO:local_logger:Now training epoch 23. LR=0.000087
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-22-Loss-0.8221496500572387.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-22-Loss-0.8221496500572387.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-22-Loss-0.8221496500572387.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-22-Loss-0.8221496500572387.pdopt
-INFO:local_logger:Now training epoch 23. LR=0.000087
-INFO:master_logger:Now training epoch 23. LR=0.000087
-INFO:local_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8185
-INFO:local_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8079
-INFO:local_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8203
-INFO:local_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8216
-INFO:master_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8178
-INFO:local_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8182
-INFO:local_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8154
-INFO:local_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8235
-INFO:local_logger:Epoch[023/800], Step[0000/0626], Avg Loss: 0.8168
-INFO:local_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8184
-INFO:local_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8174
-INFO:local_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8173
-INFO:local_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8176
-INFO:local_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8182
-INFO:master_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8177
-INFO:local_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8171
-INFO:local_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8178
-INFO:local_logger:Epoch[023/800], Step[0100/0626], Avg Loss: 0.8173
-INFO:local_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8173
-INFO:local_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8172
-INFO:local_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8169
-INFO:local_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8168
-INFO:local_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8171
-INFO:local_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8171
-INFO:local_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8172
-INFO:local_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8171
-INFO:master_logger:Epoch[023/800], Step[0200/0626], Avg Loss: 0.8171
-INFO:local_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8166
-INFO:local_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8163
-INFO:local_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8166
-INFO:local_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8167
-INFO:local_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8164
-INFO:master_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8166
-INFO:local_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8170
-INFO:local_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8170
-INFO:local_logger:Epoch[023/800], Step[0300/0626], Avg Loss: 0.8166
-INFO:local_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8161
-INFO:local_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8159
-INFO:local_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8161
-INFO:local_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8159
-INFO:local_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8158
-INFO:local_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8165
-INFO:master_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8161
-INFO:local_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8163
-INFO:local_logger:Epoch[023/800], Step[0400/0626], Avg Loss: 0.8165
-INFO:local_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8157
-INFO:local_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8160
-INFO:local_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8161
-INFO:local_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8155
-INFO:master_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8157
-INFO:local_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8154
-INFO:local_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8155
-INFO:local_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8156
-INFO:local_logger:Epoch[023/800], Step[0500/0626], Avg Loss: 0.8155
-INFO:local_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8151
-INFO:local_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8149
-INFO:local_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8154
-INFO:local_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8151
-INFO:local_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8152
-INFO:local_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8149
-INFO:master_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8151
-INFO:local_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8151
-INFO:local_logger:Epoch[023/800], Step[0600/0626], Avg Loss: 0.8153
-INFO:local_logger:----- Epoch[023/800], Train Loss: 0.8150, time: 884.65
-INFO:master_logger:----- Epoch[023/800], Train Loss: 0.8150, time: 884.65
-INFO:local_logger:----- Epoch[023/800], Train Loss: 0.8151, time: 888.50
-INFO:local_logger:Now training epoch 24. LR=0.000090
-INFO:local_logger:----- Epoch[023/800], Train Loss: 0.8152, time: 889.17
-INFO:local_logger:Now training epoch 24. LR=0.000090
-INFO:local_logger:----- Epoch[023/800], Train Loss: 0.8152, time: 888.59
-INFO:local_logger:Now training epoch 24. LR=0.000090
-INFO:local_logger:----- Epoch[023/800], Train Loss: 0.8148, time: 888.85
-INFO:local_logger:Now training epoch 24. LR=0.000090
-INFO:local_logger:----- Epoch[023/800], Train Loss: 0.8149, time: 888.51
-INFO:local_logger:Now training epoch 24. LR=0.000090
-INFO:local_logger:----- Epoch[023/800], Train Loss: 0.8149, time: 888.52
-INFO:local_logger:Now training epoch 24. LR=0.000090
-INFO:local_logger:----- Epoch[023/800], Train Loss: 0.8146, time: 888.53
-INFO:local_logger:Now training epoch 24. LR=0.000090
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-23-Loss-0.8150022067021212.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-23-Loss-0.8150022067021212.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-23-Loss-0.8150022067021212.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-23-Loss-0.8150022067021212.pdopt
-INFO:local_logger:Now training epoch 24. LR=0.000090
-INFO:master_logger:Now training epoch 24. LR=0.000090
-INFO:local_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8150
-INFO:local_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8170
-INFO:local_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8108
-INFO:local_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8043
-INFO:local_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8090
-INFO:local_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8224
-INFO:local_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8215
-INFO:master_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8135
-INFO:local_logger:Epoch[024/800], Step[0000/0626], Avg Loss: 0.8082
-INFO:local_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8115
-INFO:local_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8115
-INFO:local_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8110
-INFO:local_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8112
-INFO:local_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8117
-INFO:local_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8124
-INFO:local_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8113
-INFO:master_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8114
-INFO:local_logger:Epoch[024/800], Step[0100/0626], Avg Loss: 0.8105
-INFO:local_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8104
-INFO:local_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8110
-INFO:local_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8111
-INFO:local_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8115
-INFO:local_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8114
-INFO:local_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8115
-INFO:local_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8108
-INFO:local_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8112
-INFO:master_logger:Epoch[024/800], Step[0200/0626], Avg Loss: 0.8111
-INFO:local_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8101
-INFO:local_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8100
-INFO:local_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8106
-INFO:local_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8106
-INFO:local_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8099
-INFO:local_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8100
-INFO:local_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8101
-INFO:local_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8107
-INFO:master_logger:Epoch[024/800], Step[0300/0626], Avg Loss: 0.8103
-INFO:local_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8096
-INFO:local_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8096
-INFO:local_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8096
-INFO:local_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8099
-INFO:local_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8098
-INFO:local_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8097
-INFO:local_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8101
-INFO:master_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8098
-INFO:local_logger:Epoch[024/800], Step[0400/0626], Avg Loss: 0.8103
-INFO:local_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8089
-INFO:local_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8095
-INFO:local_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8094
-INFO:local_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8090
-INFO:local_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8094
-INFO:local_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8090
-INFO:local_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8091
-INFO:master_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8092
-INFO:local_logger:Epoch[024/800], Step[0500/0626], Avg Loss: 0.8090
-INFO:local_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8088
-INFO:local_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8088
-INFO:local_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8093
-INFO:local_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8088
-INFO:local_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8087
-INFO:local_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8091
-INFO:local_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8091
-INFO:master_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8089
-INFO:local_logger:Epoch[024/800], Step[0600/0626], Avg Loss: 0.8088
-INFO:local_logger:----- Epoch[024/800], Train Loss: 0.8087, time: 870.26
-INFO:local_logger:Now training epoch 25. LR=0.000094
-INFO:local_logger:----- Epoch[024/800], Train Loss: 0.8084, time: 870.28
-INFO:local_logger:Now training epoch 25. LR=0.000094
-INFO:local_logger:----- Epoch[024/800], Train Loss: 0.8092, time: 867.64
-INFO:master_logger:----- Epoch[024/800], Train Loss: 0.8088, time: 867.64
-INFO:local_logger:----- Epoch[024/800], Train Loss: 0.8090, time: 870.78
-INFO:local_logger:Now training epoch 25. LR=0.000094
-INFO:local_logger:----- Epoch[024/800], Train Loss: 0.8089, time: 870.77
-INFO:local_logger:Now training epoch 25. LR=0.000094
-INFO:local_logger:----- Epoch[024/800], Train Loss: 0.8086, time: 870.75
-INFO:local_logger:Now training epoch 25. LR=0.000094
-INFO:local_logger:----- Epoch[024/800], Train Loss: 0.8086, time: 870.78
-INFO:local_logger:----- Epoch[024/800], Train Loss: 0.8087, time: 870.75
-INFO:local_logger:Now training epoch 25. LR=0.000094
-INFO:local_logger:Now training epoch 25. LR=0.000094
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-24-Loss-0.8091736378081739.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-24-Loss-0.8091736378081739.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-24-Loss-0.8091736378081739.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-24-Loss-0.8091736378081739.pdopt
-INFO:local_logger:Now training epoch 25. LR=0.000094
-INFO:master_logger:Now training epoch 25. LR=0.000094
-INFO:local_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.8030
-INFO:local_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.8073
-INFO:local_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.7969
-INFO:master_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.8051
-INFO:local_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.8085
-INFO:local_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.8000
-INFO:local_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.8034
-INFO:local_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.8158
-INFO:local_logger:Epoch[025/800], Step[0000/0626], Avg Loss: 0.8061
-INFO:local_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8055
-INFO:local_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8046
-INFO:local_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8059
-INFO:local_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8052
-INFO:local_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8053
-INFO:local_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8054
-INFO:local_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8052
-INFO:local_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8057
-INFO:master_logger:Epoch[025/800], Step[0100/0626], Avg Loss: 0.8053
-INFO:local_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8049
-INFO:local_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8050
-INFO:local_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8051
-INFO:local_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8047
-INFO:local_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8054
-INFO:local_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8050
-INFO:local_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8057
-INFO:master_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8051
-INFO:local_logger:Epoch[025/800], Step[0200/0626], Avg Loss: 0.8051
-INFO:local_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8052
-INFO:local_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8049
-INFO:local_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8046
-INFO:local_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8044
-INFO:local_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8048
-INFO:local_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8046
-INFO:master_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8047
-INFO:local_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8044
-INFO:local_logger:Epoch[025/800], Step[0300/0626], Avg Loss: 0.8044
-INFO:local_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8043
-INFO:master_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8042
-INFO:local_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8042
-INFO:local_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8043
-INFO:local_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8040
-INFO:local_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8045
-INFO:local_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8038
-INFO:local_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8043
-INFO:local_logger:Epoch[025/800], Step[0400/0626], Avg Loss: 0.8043
-INFO:local_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8040
-INFO:local_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8039
-INFO:local_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8032
-INFO:local_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8034
-INFO:master_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8036
-INFO:local_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8037
-INFO:local_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8037
-INFO:local_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8037
-INFO:local_logger:Epoch[025/800], Step[0500/0626], Avg Loss: 0.8035
-INFO:local_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8036
-INFO:local_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8029
-INFO:master_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8033
-INFO:local_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8032
-INFO:local_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8032
-INFO:local_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8030
-INFO:local_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8035
-INFO:local_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8036
-INFO:local_logger:Epoch[025/800], Step[0600/0626], Avg Loss: 0.8035
-INFO:local_logger:----- Epoch[025/800], Train Loss: 0.8033, time: 888.93
-INFO:local_logger:Now training epoch 26. LR=0.000098
-INFO:local_logger:----- Epoch[025/800], Train Loss: 0.8035, time: 885.88
-INFO:master_logger:----- Epoch[025/800], Train Loss: 0.8032, time: 885.88
-INFO:local_logger:----- Epoch[025/800], Train Loss: 0.8029, time: 889.72
-INFO:local_logger:Now training epoch 26. LR=0.000098
-INFO:local_logger:----- Epoch[025/800], Train Loss: 0.8033, time: 889.81
-INFO:local_logger:Now training epoch 26. LR=0.000098
-INFO:local_logger:----- Epoch[025/800], Train Loss: 0.8031, time: 889.83
-INFO:local_logger:Now training epoch 26. LR=0.000098
-INFO:local_logger:----- Epoch[025/800], Train Loss: 0.8031, time: 890.36
-INFO:local_logger:Now training epoch 26. LR=0.000098
-INFO:local_logger:----- Epoch[025/800], Train Loss: 0.8035, time: 890.36
-INFO:local_logger:Now training epoch 26. LR=0.000098
-INFO:local_logger:----- Epoch[025/800], Train Loss: 0.8028, time: 889.87
-INFO:local_logger:Now training epoch 26. LR=0.000098
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-25-Loss-0.8034991228641365.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-25-Loss-0.8034991228641365.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-25-Loss-0.8034991228641365.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-25-Loss-0.8034991228641365.pdopt
-INFO:local_logger:Now training epoch 26. LR=0.000098
-INFO:master_logger:Now training epoch 26. LR=0.000098
-INFO:local_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.7943
-INFO:local_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.7989
-INFO:master_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.7988
-INFO:local_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.7899
-INFO:local_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.7949
-INFO:local_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.7996
-INFO:local_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.8022
-INFO:local_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.8063
-INFO:local_logger:Epoch[026/800], Step[0000/0626], Avg Loss: 0.8043
-INFO:local_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7989
-INFO:local_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7997
-INFO:local_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7993
-INFO:local_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7994
-INFO:local_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7998
-INFO:local_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7976
-INFO:master_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7992
-INFO:local_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7992
-INFO:local_logger:Epoch[026/800], Step[0100/0626], Avg Loss: 0.7992
-INFO:local_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7993
-INFO:local_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7986
-INFO:local_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7985
-INFO:local_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7986
-INFO:local_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7987
-INFO:local_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7988
-INFO:local_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7980
-INFO:master_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7988
-INFO:local_logger:Epoch[026/800], Step[0200/0626], Avg Loss: 0.7999
-INFO:local_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7980
-INFO:local_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7983
-INFO:local_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7983
-INFO:local_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7992
-INFO:local_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7980
-INFO:local_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7983
-INFO:local_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7985
-INFO:master_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7983
-INFO:local_logger:Epoch[026/800], Step[0300/0626], Avg Loss: 0.7979
-INFO:local_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7977
-INFO:local_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7973
-INFO:local_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7976
-INFO:local_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7977
-INFO:local_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7975
-INFO:local_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7976
-INFO:local_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7979
-INFO:master_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7977
-INFO:local_logger:Epoch[026/800], Step[0400/0626], Avg Loss: 0.7984
-INFO:local_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7970
-INFO:local_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7967
-INFO:local_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7967
-INFO:local_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7976
-INFO:local_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7970
-INFO:local_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7965
-INFO:master_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7969
-INFO:local_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7968
-INFO:local_logger:Epoch[026/800], Step[0500/0626], Avg Loss: 0.7969
-INFO:local_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7961
-INFO:local_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7961
-INFO:local_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7959
-INFO:local_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7961
-INFO:local_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7969
-INFO:local_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7961
-INFO:master_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7962
-INFO:local_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7963
-INFO:local_logger:Epoch[026/800], Step[0600/0626], Avg Loss: 0.7964
-INFO:local_logger:----- Epoch[026/800], Train Loss: 0.7968, time: 871.34
-INFO:local_logger:Now training epoch 27. LR=0.000102
-INFO:local_logger:----- Epoch[026/800], Train Loss: 0.7961, time: 867.96
-INFO:master_logger:----- Epoch[026/800], Train Loss: 0.7962, time: 867.96
-INFO:local_logger:----- Epoch[026/800], Train Loss: 0.7959, time: 871.51
-INFO:local_logger:Now training epoch 27. LR=0.000102
-INFO:local_logger:----- Epoch[026/800], Train Loss: 0.7960, time: 871.97
-INFO:local_logger:Now training epoch 27. LR=0.000102
-INFO:local_logger:----- Epoch[026/800], Train Loss: 0.7960, time: 871.99
-INFO:local_logger:Now training epoch 27. LR=0.000102
-INFO:local_logger:----- Epoch[026/800], Train Loss: 0.7963, time: 872.03
-INFO:local_logger:Now training epoch 27. LR=0.000102
-INFO:local_logger:----- Epoch[026/800], Train Loss: 0.7962, time: 872.93
-INFO:local_logger:Now training epoch 27. LR=0.000102
-INFO:local_logger:----- Epoch[026/800], Train Loss: 0.7961, time: 872.05
-INFO:local_logger:Now training epoch 27. LR=0.000102
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-26-Loss-0.796081899062454.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-26-Loss-0.796081899062454.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-26-Loss-0.796081899062454.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-26-Loss-0.796081899062454.pdopt
-INFO:local_logger:Now training epoch 27. LR=0.000102
-INFO:master_logger:Now training epoch 27. LR=0.000102
-INFO:local_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.7991
-INFO:local_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.8009
-INFO:local_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.7909
-INFO:master_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.7963
-INFO:local_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.8054
-INFO:local_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.7875
-INFO:local_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.7990
-INFO:local_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.7946
-INFO:local_logger:Epoch[027/800], Step[0000/0626], Avg Loss: 0.7932
-INFO:local_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7923
-INFO:local_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7922
-INFO:local_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7923
-INFO:master_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7920
-INFO:local_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7924
-INFO:local_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7913
-INFO:local_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7925
-INFO:local_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7917
-INFO:local_logger:Epoch[027/800], Step[0100/0626], Avg Loss: 0.7914
-INFO:local_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7922
-INFO:local_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7922
-INFO:local_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7924
-INFO:local_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7924
-INFO:local_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7915
-INFO:local_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7919
-INFO:master_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7920
-INFO:local_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7914
-INFO:local_logger:Epoch[027/800], Step[0200/0626], Avg Loss: 0.7921
-INFO:local_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7914
-INFO:local_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7917
-INFO:local_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7905
-INFO:local_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7909
-INFO:master_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7911
-INFO:local_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7907
-INFO:local_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7915
-INFO:local_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7912
-INFO:local_logger:Epoch[027/800], Step[0300/0626], Avg Loss: 0.7909
-INFO:local_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7909
-INFO:local_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7900
-INFO:local_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7909
-INFO:local_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7902
-INFO:local_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7904
-INFO:master_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7906
-INFO:local_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7907
-INFO:local_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7903
-INFO:local_logger:Epoch[027/800], Step[0400/0626], Avg Loss: 0.7911
-INFO:local_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7902
-INFO:local_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7899
-INFO:local_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7898
-INFO:local_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7903
-INFO:local_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7903
-INFO:master_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7900
-INFO:local_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7904
-INFO:local_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7895
-INFO:local_logger:Epoch[027/800], Step[0500/0626], Avg Loss: 0.7898
-INFO:local_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7896
-INFO:local_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7893
-INFO:local_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7898
-INFO:local_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7894
-INFO:local_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7893
-INFO:local_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7899
-INFO:local_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7891
-INFO:local_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7894
-INFO:master_logger:Epoch[027/800], Step[0600/0626], Avg Loss: 0.7895
-INFO:local_logger:----- Epoch[027/800], Train Loss: 0.7892, time: 878.78
-INFO:local_logger:Now training epoch 28. LR=0.000105
-INFO:local_logger:----- Epoch[027/800], Train Loss: 0.7897, time: 878.81
-INFO:local_logger:Now training epoch 28. LR=0.000105
-INFO:local_logger:----- Epoch[027/800], Train Loss: 0.7894, time: 875.73
-INFO:master_logger:----- Epoch[027/800], Train Loss: 0.7893, time: 875.73
-INFO:local_logger:----- Epoch[027/800], Train Loss: 0.7890, time: 878.88
-INFO:local_logger:Now training epoch 28. LR=0.000105
-INFO:local_logger:----- Epoch[027/800], Train Loss: 0.7892, time: 878.97
-INFO:local_logger:Now training epoch 28. LR=0.000105
-INFO:local_logger:----- Epoch[027/800], Train Loss: 0.7891, time: 879.29
-INFO:local_logger:Now training epoch 28. LR=0.000105
-INFO:local_logger:----- Epoch[027/800], Train Loss: 0.7895, time: 879.32
-INFO:local_logger:Now training epoch 28. LR=0.000105
-INFO:local_logger:----- Epoch[027/800], Train Loss: 0.7895, time: 879.32
-INFO:local_logger:Now training epoch 28. LR=0.000105
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-27-Loss-0.7894227700390196.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-27-Loss-0.7894227700390196.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-27-Loss-0.7894227700390196.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-27-Loss-0.7894227700390196.pdopt
-INFO:local_logger:Now training epoch 28. LR=0.000105
-INFO:master_logger:Now training epoch 28. LR=0.000105
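The `Save model` / `Save optim` lines show that every epoch writes a pair of files: model weights as `.pdparams` and optimizer state as `.pdopt`, with the epoch index and mean train loss embedded in the filename. A hedged sketch of that step using `paddle.save` follows; the function name and argument list are illustrative, not the training script's actual API.

```python
# Sketch of the per-epoch checkpointing that produces the paired
# .pdparams / .pdopt files named in the log above.
import os

import paddle


def save_checkpoint(model, optimizer, save_dir, epoch, train_loss):
    prefix = os.path.join(save_dir, f"MAE-Epoch-{epoch}-Loss-{train_loss}")
    paddle.save(model.state_dict(), prefix + ".pdparams")   # model weights
    paddle.save(optimizer.state_dict(), prefix + ".pdopt")  # optimizer state
    return prefix

# e.g. save_checkpoint(model, optimizer,
#                      "./output/train-20211219-17-07-40", 27, 0.7894227700390196)
```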
-INFO:local_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7943
-INFO:master_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7880
-INFO:local_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7931
-INFO:local_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7814
-INFO:local_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7872
-INFO:local_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7910
-INFO:local_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7793
-INFO:local_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7897
-INFO:local_logger:Epoch[028/800], Step[0000/0626], Avg Loss: 0.7883
-INFO:local_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7855
-INFO:local_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7868
-INFO:local_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7873
-INFO:local_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7853
-INFO:local_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7862
-INFO:local_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7875
-INFO:master_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7866
-INFO:local_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7874
-INFO:local_logger:Epoch[028/800], Step[0100/0626], Avg Loss: 0.7871
-INFO:local_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7865
-INFO:local_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7866
-INFO:local_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7859
-INFO:local_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7855
-INFO:local_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7855
-INFO:local_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7858
-INFO:local_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7860
-INFO:master_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7861
-INFO:local_logger:Epoch[028/800], Step[0200/0626], Avg Loss: 0.7865
-INFO:local_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7850
-INFO:local_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7850
-INFO:local_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7851
-INFO:local_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7845
-INFO:master_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7851
-INFO:local_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7851
-INFO:local_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7855
-INFO:local_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7846
-INFO:local_logger:Epoch[028/800], Step[0300/0626], Avg Loss: 0.7860
-INFO:local_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7841
-INFO:local_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7844
-INFO:local_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7839
-INFO:local_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7850
-INFO:master_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7843
-INFO:local_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7837
-INFO:local_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7843
-INFO:local_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7843
-INFO:local_logger:Epoch[028/800], Step[0400/0626], Avg Loss: 0.7845
-INFO:local_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7842
-INFO:local_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7838
-INFO:local_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7835
-INFO:local_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7841
-INFO:local_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7840
-INFO:local_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7839
-INFO:master_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7840
-INFO:local_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7846
-INFO:local_logger:Epoch[028/800], Step[0500/0626], Avg Loss: 0.7837
-INFO:local_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7838
-INFO:local_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7831
-INFO:local_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7837
-INFO:local_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7836
-INFO:local_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7836
-INFO:local_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7840
-INFO:local_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7839
-INFO:local_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7833
-INFO:master_logger:Epoch[028/800], Step[0600/0626], Avg Loss: 0.7836
-INFO:local_logger:----- Epoch[028/800], Train Loss: 0.7837, time: 871.38
-INFO:local_logger:Now training epoch 29. LR=0.000109
-INFO:local_logger:----- Epoch[028/800], Train Loss: 0.7835, time: 868.22
-INFO:master_logger:----- Epoch[028/800], Train Loss: 0.7835, time: 868.22
-INFO:local_logger:----- Epoch[028/800], Train Loss: 0.7831, time: 872.18
-INFO:local_logger:Now training epoch 29. LR=0.000109
-INFO:local_logger:----- Epoch[028/800], Train Loss: 0.7830, time: 873.00
-INFO:local_logger:----- Epoch[028/800], Train Loss: 0.7835, time: 871.85
-INFO:local_logger:Now training epoch 29. LR=0.000109
-INFO:local_logger:Now training epoch 29. LR=0.000109
-INFO:local_logger:----- Epoch[028/800], Train Loss: 0.7834, time: 871.89
-INFO:local_logger:Now training epoch 29. LR=0.000109
-INFO:local_logger:----- Epoch[028/800], Train Loss: 0.7837, time: 873.03
-INFO:local_logger:----- Epoch[028/800], Train Loss: 0.7838, time: 872.29
-INFO:local_logger:Now training epoch 29. LR=0.000109
-INFO:local_logger:Now training epoch 29. LR=0.000109
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-28-Loss-0.783455797444855.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-28-Loss-0.783455797444855.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-28-Loss-0.783455797444855.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-28-Loss-0.783455797444855.pdopt
-INFO:local_logger:Now training epoch 29. LR=0.000109
-INFO:master_logger:Now training epoch 29. LR=0.000109
-INFO:local_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7766
-INFO:local_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7720
-INFO:master_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7785
-INFO:local_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7816
-INFO:local_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7862
-INFO:local_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7662
-INFO:local_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7767
-INFO:local_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7916
-INFO:local_logger:Epoch[029/800], Step[0000/0626], Avg Loss: 0.7771
-INFO:local_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7823
-INFO:local_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7822
-INFO:local_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7822
-INFO:local_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7816
-INFO:local_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7818
-INFO:local_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7815
-INFO:local_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7809
-INFO:master_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7818
-INFO:local_logger:Epoch[029/800], Step[0100/0626], Avg Loss: 0.7822
-INFO:local_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7802
-INFO:local_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7802
-INFO:local_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7795
-INFO:local_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7806
-INFO:local_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7806
-INFO:local_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7807
-INFO:local_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7800
-INFO:local_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7800
-INFO:master_logger:Epoch[029/800], Step[0200/0626], Avg Loss: 0.7802
-INFO:local_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7794
-INFO:local_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7794
-INFO:local_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7796
-INFO:local_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7798
-INFO:master_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7795
-INFO:local_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7797
-INFO:local_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7796
-INFO:local_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7797
-INFO:local_logger:Epoch[029/800], Step[0300/0626], Avg Loss: 0.7789
-INFO:local_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7788
-INFO:local_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7781
-INFO:local_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7786
-INFO:local_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7787
-INFO:local_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7785
-INFO:local_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7787
-INFO:local_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7786
-INFO:master_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7786
-INFO:local_logger:Epoch[029/800], Step[0400/0626], Avg Loss: 0.7788
-INFO:local_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7782
-INFO:local_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7776
-INFO:local_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7779
-INFO:local_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7781
-INFO:local_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7782
-INFO:local_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7780
-INFO:master_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7781
-INFO:local_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7781
-INFO:local_logger:Epoch[029/800], Step[0500/0626], Avg Loss: 0.7782
-INFO:local_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7776
-INFO:local_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7777
-INFO:local_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7777
-INFO:local_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7778
-INFO:local_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7776
-INFO:local_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7776
-INFO:master_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7776
-INFO:local_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7772
-INFO:local_logger:Epoch[029/800], Step[0600/0626], Avg Loss: 0.7776
-INFO:local_logger:----- Epoch[029/800], Train Loss: 0.7778, time: 869.47
-INFO:local_logger:Now training epoch 30. LR=0.000113
-INFO:local_logger:----- Epoch[029/800], Train Loss: 0.7776, time: 865.73
-INFO:local_logger:----- Epoch[029/800], Train Loss: 0.7773, time: 868.98
-INFO:master_logger:----- Epoch[029/800], Train Loss: 0.7776, time: 865.73
-INFO:local_logger:Now training epoch 30. LR=0.000113
-INFO:local_logger:----- Epoch[029/800], Train Loss: 0.7777, time: 869.01
-INFO:local_logger:Now training epoch 30. LR=0.000113
-INFO:local_logger:----- Epoch[029/800], Train Loss: 0.7777, time: 869.04
-INFO:local_logger:Now training epoch 30. LR=0.000113
-INFO:local_logger:----- Epoch[029/800], Train Loss: 0.7774, time: 869.04
-INFO:local_logger:Now training epoch 30. LR=0.000113
-INFO:local_logger:----- Epoch[029/800], Train Loss: 0.7776, time: 869.04
-INFO:local_logger:Now training epoch 30. LR=0.000113
-INFO:local_logger:----- Epoch[029/800], Train Loss: 0.7776, time: 869.04
-INFO:local_logger:Now training epoch 30. LR=0.000113
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-29-Loss-0.77762673581754.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-29-Loss-0.77762673581754.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-29-Loss-0.77762673581754.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-29-Loss-0.77762673581754.pdopt
-INFO:local_logger:Now training epoch 30. LR=0.000113
-INFO:master_logger:Now training epoch 30. LR=0.000113
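The per-epoch `LR=` values grow by roughly 3.75e-6 per epoch (0.000102 at epoch 27 up to 0.000139 at epoch 37), which is consistent with a linear warmup toward a peak learning rate of about 1.5e-4 over roughly 40 epochs. This is an inference from the logged values only, not a statement of the actual training config; the quick check below makes the arithmetic explicit.

```python
# Inference only: the logged per-epoch LR values are roughly consistent with
#   lr(epoch) ≈ peak_lr * epoch / warmup_epochs
# with peak_lr ≈ 1.5e-4 and warmup_epochs ≈ 40. Both constants are read off
# the log, not taken from the config.
peak_lr, warmup_epochs = 1.5e-4, 40
logged = {27: 0.000102, 30: 0.000113, 33: 0.000124, 37: 0.000139}

for epoch, lr in logged.items():
    predicted = peak_lr * epoch / warmup_epochs
    print(f"epoch {epoch}: predicted ~{predicted:.6f}, logged {lr:.6f}")
# The two columns agree to within about 1e-6, i.e. within the log's precision.
```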
-INFO:local_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7671
-INFO:master_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7736
-INFO:local_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7764
-INFO:local_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7666
-INFO:local_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7815
-INFO:local_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7786
-INFO:local_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7819
-INFO:local_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7683
-INFO:local_logger:Epoch[030/800], Step[0000/0626], Avg Loss: 0.7686
-INFO:local_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7759
-INFO:local_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7765
-INFO:local_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7745
-INFO:local_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7773
-INFO:local_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7759
-INFO:local_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7772
-INFO:local_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7763
-INFO:master_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7762
-INFO:local_logger:Epoch[030/800], Step[0100/0626], Avg Loss: 0.7756
-INFO:local_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7744
-INFO:local_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7748
-INFO:local_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7754
-INFO:local_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7743
-INFO:local_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7742
-INFO:local_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7749
-INFO:master_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7747
-INFO:local_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7747
-INFO:local_logger:Epoch[030/800], Step[0200/0626], Avg Loss: 0.7747
-INFO:local_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7740
-INFO:local_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7748
-INFO:local_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7739
-INFO:local_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7744
-INFO:local_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7741
-INFO:local_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7739
-INFO:master_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7741
-INFO:local_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7744
-INFO:local_logger:Epoch[030/800], Step[0300/0626], Avg Loss: 0.7736
-INFO:local_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7737
-INFO:local_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7736
-INFO:local_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7732
-INFO:local_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7735
-INFO:local_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7740
-INFO:local_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7732
-INFO:local_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7734
-INFO:local_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7732
-INFO:master_logger:Epoch[030/800], Step[0400/0626], Avg Loss: 0.7735
-INFO:local_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7730
-INFO:local_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7729
-INFO:local_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7729
-INFO:local_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7732
-INFO:local_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7731
-INFO:local_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7727
-INFO:local_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7729
-INFO:local_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7725
-INFO:master_logger:Epoch[030/800], Step[0500/0626], Avg Loss: 0.7729
-INFO:local_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7723
-INFO:local_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7723
-INFO:local_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7722
-INFO:local_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7720
-INFO:local_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7726
-INFO:local_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7724
-INFO:master_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7722
-INFO:local_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7723
-INFO:local_logger:Epoch[030/800], Step[0600/0626], Avg Loss: 0.7719
-INFO:local_logger:----- Epoch[030/800], Train Loss: 0.7722, time: 884.56
-INFO:local_logger:Now training epoch 31. LR=0.000116
-INFO:local_logger:----- Epoch[030/800], Train Loss: 0.7718, time: 884.68
-INFO:local_logger:Now training epoch 31. LR=0.000116
-INFO:local_logger:----- Epoch[030/800], Train Loss: 0.7719, time: 881.45
-INFO:master_logger:----- Epoch[030/800], Train Loss: 0.7721, time: 881.45
-INFO:local_logger:----- Epoch[030/800], Train Loss: 0.7721, time: 885.20
-INFO:local_logger:Now training epoch 31. LR=0.000116
-INFO:local_logger:----- Epoch[030/800], Train Loss: 0.7721, time: 885.20
-INFO:local_logger:----- Epoch[030/800], Train Loss: 0.7722, time: 885.14
-INFO:local_logger:Now training epoch 31. LR=0.000116
-INFO:local_logger:Now training epoch 31. LR=0.000116
-INFO:local_logger:----- Epoch[030/800], Train Loss: 0.7724, time: 885.17
-INFO:local_logger:Now training epoch 31. LR=0.000116
-INFO:local_logger:----- Epoch[030/800], Train Loss: 0.7723, time: 885.20
-INFO:local_logger:Now training epoch 31. LR=0.000116
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-30-Loss-0.7718740027551073.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-30-Loss-0.7718740027551073.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-30-Loss-0.7718740027551073.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-30-Loss-0.7718740027551073.pdopt
-INFO:local_logger:Now training epoch 31. LR=0.000116
-INFO:master_logger:Now training epoch 31. LR=0.000116
-INFO:local_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7753
-INFO:local_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7631
-INFO:local_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7707
-INFO:local_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7713
-INFO:local_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7666
-INFO:local_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7545
-INFO:master_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7651
-INFO:local_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7590
-INFO:local_logger:Epoch[031/800], Step[0000/0626], Avg Loss: 0.7607
-INFO:local_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7683
-INFO:master_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7684
-INFO:local_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7667
-INFO:local_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7685
-INFO:local_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7688
-INFO:local_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7677
-INFO:local_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7694
-INFO:local_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7688
-INFO:local_logger:Epoch[031/800], Step[0100/0626], Avg Loss: 0.7686
-INFO:local_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7679
-INFO:local_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7681
-INFO:local_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7678
-INFO:local_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7680
-INFO:master_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7680
-INFO:local_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7683
-INFO:local_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7685
-INFO:local_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7681
-INFO:local_logger:Epoch[031/800], Step[0200/0626], Avg Loss: 0.7672
-INFO:local_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7678
-INFO:local_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7684
-INFO:local_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7676
-INFO:local_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7680
-INFO:local_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7676
-INFO:master_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7677
-INFO:local_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7668
-INFO:local_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7674
-INFO:local_logger:Epoch[031/800], Step[0300/0626], Avg Loss: 0.7679
-INFO:local_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7667
-INFO:local_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7676
-INFO:local_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7674
-INFO:local_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7676
-INFO:local_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7675
-INFO:local_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7678
-INFO:local_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7683
-INFO:local_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7678
-INFO:master_logger:Epoch[031/800], Step[0400/0626], Avg Loss: 0.7676
-INFO:local_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7669
-INFO:local_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7673
-INFO:local_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7674
-INFO:local_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7670
-INFO:local_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7672
-INFO:master_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7671
-INFO:local_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7663
-INFO:local_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7671
-INFO:local_logger:Epoch[031/800], Step[0500/0626], Avg Loss: 0.7676
-INFO:local_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7669
-INFO:local_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7668
-INFO:local_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7664
-INFO:local_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7667
-INFO:local_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7661
-INFO:local_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7667
-INFO:master_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7667
-INFO:local_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7670
-INFO:local_logger:Epoch[031/800], Step[0600/0626], Avg Loss: 0.7669
-INFO:local_logger:----- Epoch[031/800], Train Loss: 0.7663, time: 868.34
-INFO:local_logger:Now training epoch 32. LR=0.000120
-INFO:local_logger:----- Epoch[031/800], Train Loss: 0.7660, time: 868.35
-INFO:local_logger:Now training epoch 32. LR=0.000120
-INFO:local_logger:----- Epoch[031/800], Train Loss: 0.7666, time: 868.93
-INFO:local_logger:Now training epoch 32. LR=0.000120
-INFO:local_logger:----- Epoch[031/800], Train Loss: 0.7667, time: 868.45
-INFO:local_logger:Now training epoch 32. LR=0.000120
-INFO:local_logger:----- Epoch[031/800], Train Loss: 0.7668, time: 868.50
-INFO:local_logger:Now training epoch 32. LR=0.000120
-INFO:local_logger:----- Epoch[031/800], Train Loss: 0.7666, time: 864.87
-INFO:master_logger:----- Epoch[031/800], Train Loss: 0.7665, time: 864.87
-INFO:local_logger:----- Epoch[031/800], Train Loss: 0.7668, time: 869.11
-INFO:local_logger:Now training epoch 32. LR=0.000120
-INFO:local_logger:----- Epoch[031/800], Train Loss: 0.7665, time: 868.66
-INFO:local_logger:Now training epoch 32. LR=0.000120
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-31-Loss-0.7666413477179265.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-31-Loss-0.7666413477179265.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-31-Loss-0.7666413477179265.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-31-Loss-0.7666413477179265.pdopt
-INFO:local_logger:Now training epoch 32. LR=0.000120
-INFO:master_logger:Now training epoch 32. LR=0.000120
-INFO:local_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7643
-INFO:local_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7629
-INFO:local_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7742
-INFO:master_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7666
-INFO:local_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7638
-INFO:local_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7575
-INFO:local_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7728
-INFO:local_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7710
-INFO:local_logger:Epoch[032/800], Step[0000/0626], Avg Loss: 0.7662
-INFO:local_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7641
-INFO:local_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7647
-INFO:local_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7646
-INFO:master_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7645
-INFO:local_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7645
-INFO:local_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7638
-INFO:local_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7636
-INFO:local_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7646
-INFO:local_logger:Epoch[032/800], Step[0100/0626], Avg Loss: 0.7657
-INFO:local_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7632
-INFO:local_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7627
-INFO:local_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7635
-INFO:local_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7633
-INFO:local_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7638
-INFO:master_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7635
-INFO:local_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7637
-INFO:local_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7629
-INFO:local_logger:Epoch[032/800], Step[0200/0626], Avg Loss: 0.7644
-INFO:local_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7629
-INFO:local_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7628
-INFO:local_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7626
-INFO:local_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7631
-INFO:local_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7633
-INFO:local_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7634
-INFO:master_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7629
-INFO:local_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7626
-INFO:local_logger:Epoch[032/800], Step[0300/0626], Avg Loss: 0.7628
-INFO:local_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7623
-INFO:local_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7625
-INFO:local_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7620
-INFO:local_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7630
-INFO:local_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7621
-INFO:local_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7628
-INFO:master_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7624
-INFO:local_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7622
-INFO:local_logger:Epoch[032/800], Step[0400/0626], Avg Loss: 0.7621
-INFO:local_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7617
-INFO:local_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7623
-INFO:local_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7623
-INFO:master_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7619
-INFO:local_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7618
-INFO:local_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7617
-INFO:local_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7619
-INFO:local_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7617
-INFO:local_logger:Epoch[032/800], Step[0500/0626], Avg Loss: 0.7617
-INFO:local_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7613
-INFO:local_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7613
-INFO:local_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7616
-INFO:local_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7612
-INFO:local_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7615
-INFO:local_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7612
-INFO:master_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7614
-INFO:local_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7616
-INFO:local_logger:Epoch[032/800], Step[0600/0626], Avg Loss: 0.7611
-INFO:local_logger:----- Epoch[032/800], Train Loss: 0.7611, time: 877.39
-INFO:local_logger:Now training epoch 33. LR=0.000124
-INFO:local_logger:----- Epoch[032/800], Train Loss: 0.7610, time: 878.25
-INFO:local_logger:Now training epoch 33. LR=0.000124
-INFO:local_logger:----- Epoch[032/800], Train Loss: 0.7609, time: 878.35
-INFO:local_logger:Now training epoch 33. LR=0.000124
-INFO:local_logger:----- Epoch[032/800], Train Loss: 0.7615, time: 874.33
-INFO:master_logger:----- Epoch[032/800], Train Loss: 0.7612, time: 874.33
-INFO:local_logger:----- Epoch[032/800], Train Loss: 0.7612, time: 878.35
-INFO:local_logger:Now training epoch 33. LR=0.000124
-INFO:local_logger:----- Epoch[032/800], Train Loss: 0.7612, time: 878.38
-INFO:local_logger:Now training epoch 33. LR=0.000124
-INFO:local_logger:----- Epoch[032/800], Train Loss: 0.7612, time: 878.23
-INFO:local_logger:Now training epoch 33. LR=0.000124
-INFO:local_logger:----- Epoch[032/800], Train Loss: 0.7615, time: 878.10
-INFO:local_logger:Now training epoch 33. LR=0.000124
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-32-Loss-0.761453199911086.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-32-Loss-0.761453199911086.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-32-Loss-0.761453199911086.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-32-Loss-0.761453199911086.pdopt
-INFO:local_logger:Now training epoch 33. LR=0.000124
-INFO:master_logger:Now training epoch 33. LR=0.000124
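Each epoch runs 626 steps per worker across 8 workers (eight `local_logger` lines per reported step plus one `master_logger` line). On the 1,281,167-image ImageNet-2012 train set, that step count is consistent with a per-GPU batch size of about 256, since 1,281,167 / (8 × 256) ≈ 625.6, i.e. 626 steps when the final partial batch is kept. The batch size is inferred from the logged step count, not read from the config; the arithmetic is sketched below.

```python
# Back-of-envelope check (inference, not config): 626 steps/epoch with 8 workers
# on the 1,281,167-image ImageNet-2012 train set implies a per-GPU batch of ~256.
import math

train_images, num_workers, steps_per_epoch = 1_281_167, 8, 626
per_gpu_batch = train_images / (num_workers * steps_per_epoch)
print(round(per_gpu_batch))                           # ~256 images per GPU
print(math.ceil(train_images / (num_workers * 256)))  # 626 steps per epoch
```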
-INFO:local_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7557
-INFO:local_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7650
-INFO:local_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7551
-INFO:master_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7556
-INFO:local_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7455
-INFO:local_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7599
-INFO:local_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7565
-INFO:local_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7625
-INFO:local_logger:Epoch[033/800], Step[0000/0626], Avg Loss: 0.7448
-INFO:local_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7586
-INFO:local_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7590
-INFO:local_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7584
-INFO:local_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7588
-INFO:local_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7580
-INFO:local_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7570
-INFO:local_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7575
-INFO:local_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7570
-INFO:master_logger:Epoch[033/800], Step[0100/0626], Avg Loss: 0.7580
-INFO:local_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7577
-INFO:local_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7575
-INFO:local_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7584
-INFO:local_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7580
-INFO:local_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7584
-INFO:local_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7571
-INFO:local_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7574
-INFO:master_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7578
-INFO:local_logger:Epoch[033/800], Step[0200/0626], Avg Loss: 0.7576
-INFO:local_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7572
-INFO:local_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7579
-INFO:local_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7576
-INFO:local_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7568
-INFO:local_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7577
-INFO:local_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7579
-INFO:master_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7575
-INFO:local_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7579
-INFO:local_logger:Epoch[033/800], Step[0300/0626], Avg Loss: 0.7572
-INFO:local_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7574
-INFO:local_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7571
-INFO:local_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7568
-INFO:local_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7573
-INFO:local_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7571
-INFO:local_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7573
-INFO:local_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7572
-INFO:local_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7565
-INFO:master_logger:Epoch[033/800], Step[0400/0626], Avg Loss: 0.7571
-INFO:local_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7565
-INFO:local_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7567
-INFO:local_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7570
-INFO:local_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7569
-INFO:local_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7564
-INFO:local_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7570
-INFO:local_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7569
-INFO:master_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7568
-INFO:local_logger:Epoch[033/800], Step[0500/0626], Avg Loss: 0.7569
-INFO:local_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7564
-INFO:local_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7561
-INFO:local_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7567
-INFO:local_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7561
-INFO:local_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7566
-INFO:local_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7567
-INFO:master_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7564
-INFO:local_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7563
-INFO:local_logger:Epoch[033/800], Step[0600/0626], Avg Loss: 0.7566
-INFO:local_logger:----- Epoch[033/800], Train Loss: 0.7561, time: 852.50
-INFO:master_logger:----- Epoch[033/800], Train Loss: 0.7564, time: 852.50
-INFO:local_logger:----- Epoch[033/800], Train Loss: 0.7566, time: 856.70
-INFO:local_logger:Now training epoch 34. LR=0.000128
-INFO:local_logger:----- Epoch[033/800], Train Loss: 0.7566, time: 856.72
-INFO:local_logger:Now training epoch 34. LR=0.000128
-INFO:local_logger:----- Epoch[033/800], Train Loss: 0.7560, time: 856.75
-INFO:local_logger:Now training epoch 34. LR=0.000128
-INFO:local_logger:----- Epoch[033/800], Train Loss: 0.7566, time: 857.55
-INFO:local_logger:Now training epoch 34. LR=0.000128
-INFO:local_logger:----- Epoch[033/800], Train Loss: 0.7562, time: 856.91
-INFO:local_logger:Now training epoch 34. LR=0.000128
-INFO:local_logger:----- Epoch[033/800], Train Loss: 0.7565, time: 856.90
-INFO:local_logger:Now training epoch 34. LR=0.000128
-INFO:local_logger:----- Epoch[033/800], Train Loss: 0.7563, time: 856.92
-INFO:local_logger:Now training epoch 34. LR=0.000128
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-33-Loss-0.7561021761674307.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-33-Loss-0.7561021761674307.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-33-Loss-0.7561021761674307.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-33-Loss-0.7561021761674307.pdopt
-INFO:local_logger:Now training epoch 34. LR=0.000128
-INFO:master_logger:Now training epoch 34. LR=0.000128
-INFO:local_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7407
-INFO:master_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7518
-INFO:local_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7494
-INFO:local_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7433
-INFO:local_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7598
-INFO:local_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7563
-INFO:local_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7612
-INFO:local_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7477
-INFO:local_logger:Epoch[034/800], Step[0000/0626], Avg Loss: 0.7559
-INFO:local_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7547
-INFO:local_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7548
-INFO:local_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7543
-INFO:local_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7539
-INFO:local_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7534
-INFO:local_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7536
-INFO:master_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7540
-INFO:local_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7534
-INFO:local_logger:Epoch[034/800], Step[0100/0626], Avg Loss: 0.7540
-INFO:local_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7539
-INFO:local_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7529
-INFO:local_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7541
-INFO:local_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7541
-INFO:local_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7531
-INFO:master_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7536
-INFO:local_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7536
-INFO:local_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7534
-INFO:local_logger:Epoch[034/800], Step[0200/0626], Avg Loss: 0.7533
-INFO:local_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7529
-INFO:local_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7533
-INFO:local_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7532
-INFO:local_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7531
-INFO:local_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7528
-INFO:local_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7534
-INFO:master_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7531
-INFO:local_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7539
-INFO:local_logger:Epoch[034/800], Step[0300/0626], Avg Loss: 0.7523
-INFO:local_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7527
-INFO:local_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7527
-INFO:local_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7526
-INFO:local_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7520
-INFO:local_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7536
-INFO:local_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7525
-INFO:local_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7525
-INFO:local_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7528
-INFO:master_logger:Epoch[034/800], Step[0400/0626], Avg Loss: 0.7527
-INFO:local_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7526
-INFO:local_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7520
-INFO:local_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7518
-INFO:local_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7523
-INFO:local_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7525
-INFO:local_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7524
-INFO:local_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7522
-INFO:local_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7532
-INFO:master_logger:Epoch[034/800], Step[0500/0626], Avg Loss: 0.7524
-INFO:local_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7520
-INFO:local_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7521
-INFO:local_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7516
-INFO:local_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7522
-INFO:local_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7516
-INFO:master_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7520
-INFO:local_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7521
-INFO:local_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7526
-INFO:local_logger:Epoch[034/800], Step[0600/0626], Avg Loss: 0.7516
-INFO:local_logger:----- Epoch[034/800], Train Loss: 0.7519, time: 885.18
-INFO:local_logger:Now training epoch 35. LR=0.000131
-INFO:local_logger:----- Epoch[034/800], Train Loss: 0.7520, time: 885.17
-INFO:local_logger:Now training epoch 35. LR=0.000131
-INFO:local_logger:----- Epoch[034/800], Train Loss: 0.7519, time: 885.32
-INFO:local_logger:Now training epoch 35. LR=0.000131
-INFO:local_logger:----- Epoch[034/800], Train Loss: 0.7516, time: 882.34
-INFO:master_logger:----- Epoch[034/800], Train Loss: 0.7519, time: 882.34
-INFO:local_logger:----- Epoch[034/800], Train Loss: 0.7522, time: 885.50
-INFO:local_logger:Now training epoch 35. LR=0.000131
-INFO:local_logger:----- Epoch[034/800], Train Loss: 0.7516, time: 885.48
-INFO:local_logger:Now training epoch 35. LR=0.000131
-INFO:local_logger:----- Epoch[034/800], Train Loss: 0.7516, time: 885.48
-INFO:local_logger:Now training epoch 35. LR=0.000131
-INFO:local_logger:----- Epoch[034/800], Train Loss: 0.7524, time: 885.48
-INFO:local_logger:Now training epoch 35. LR=0.000131
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-34-Loss-0.7516406552539668.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-34-Loss-0.7516406552539668.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-34-Loss-0.7516406552539668.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-34-Loss-0.7516406552539668.pdopt
-INFO:local_logger:Now training epoch 35. LR=0.000131
-INFO:master_logger:Now training epoch 35. LR=0.000131
-INFO:local_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7487
-INFO:local_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7414
-INFO:local_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7520
-INFO:master_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7511
-INFO:local_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7453
-INFO:local_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7505
-INFO:local_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7633
-INFO:local_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7544
-INFO:local_logger:Epoch[035/800], Step[0000/0626], Avg Loss: 0.7535
-INFO:local_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7500
-INFO:local_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7504
-INFO:local_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7496
-INFO:local_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7499
-INFO:local_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7493
-INFO:local_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7492
-INFO:local_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7490
-INFO:master_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7494
-INFO:local_logger:Epoch[035/800], Step[0100/0626], Avg Loss: 0.7481
-INFO:local_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7496
-INFO:local_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7496
-INFO:local_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7495
-INFO:local_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7485
-INFO:local_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7487
-INFO:master_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7493
-INFO:local_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7490
-INFO:local_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7498
-INFO:local_logger:Epoch[035/800], Step[0200/0626], Avg Loss: 0.7493
-INFO:local_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7484
-INFO:local_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7486
-INFO:local_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7491
-INFO:local_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7494
-INFO:local_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7494
-INFO:master_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7490
-INFO:local_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7491
-INFO:local_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7488
-INFO:local_logger:Epoch[035/800], Step[0300/0626], Avg Loss: 0.7495
-INFO:local_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7488
-INFO:local_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7490
-INFO:local_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7483
-INFO:local_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7490
-INFO:local_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7484
-INFO:local_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7489
-INFO:local_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7480
-INFO:master_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7486
-INFO:local_logger:Epoch[035/800], Step[0400/0626], Avg Loss: 0.7484
-INFO:local_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7485
-INFO:local_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7488
-INFO:local_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7483
-INFO:local_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7482
-INFO:local_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7479
-INFO:local_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7482
-INFO:local_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7491
-INFO:master_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7484
-INFO:local_logger:Epoch[035/800], Step[0500/0626], Avg Loss: 0.7484
-INFO:local_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7482
-INFO:local_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7485
-INFO:local_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7478
-INFO:local_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7480
-INFO:local_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7480
-INFO:local_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7481
-INFO:local_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7478
-INFO:local_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7485
-INFO:master_logger:Epoch[035/800], Step[0600/0626], Avg Loss: 0.7481
-INFO:local_logger:----- Epoch[035/800], Train Loss: 0.7484, time: 858.24
-INFO:local_logger:Now training epoch 36. LR=0.000135
-INFO:local_logger:----- Epoch[035/800], Train Loss: 0.7479, time: 859.58
-INFO:local_logger:Now training epoch 36. LR=0.000135
-INFO:local_logger:----- Epoch[035/800], Train Loss: 0.7481, time: 859.55
-INFO:local_logger:Now training epoch 36. LR=0.000135
-INFO:local_logger:----- Epoch[035/800], Train Loss: 0.7480, time: 859.83
-INFO:local_logger:Now training epoch 36. LR=0.000135
-INFO:local_logger:----- Epoch[035/800], Train Loss: 0.7480, time: 859.45
-INFO:local_logger:Now training epoch 36. LR=0.000135
-INFO:local_logger:----- Epoch[035/800], Train Loss: 0.7484, time: 859.45
-INFO:local_logger:Now training epoch 36. LR=0.000135
-INFO:local_logger:----- Epoch[035/800], Train Loss: 0.7477, time: 855.75
-INFO:master_logger:----- Epoch[035/800], Train Loss: 0.7480, time: 855.75
-INFO:local_logger:----- Epoch[035/800], Train Loss: 0.7478, time: 859.47
-INFO:local_logger:Now training epoch 36. LR=0.000135
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-35-Loss-0.7477136553201034.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-35-Loss-0.7477136553201034.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-35-Loss-0.7477136553201034.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-35-Loss-0.7477136553201034.pdopt
-INFO:local_logger:Now training epoch 36. LR=0.000135
-INFO:master_logger:Now training epoch 36. LR=0.000135
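Over the epochs shown, the pretraining loss reported by `master_logger` falls steadily, from about 0.796 at epoch 26 to about 0.745 at epoch 36. The short, self-contained snippet below pulls that per-epoch master train loss out of a log in this format; the filename `train.log` is illustrative, so point it at the actual log file.

```python
# Extract the master_logger per-epoch train loss from a log in this format.
import re

pattern = re.compile(
    r"INFO:master_logger:----- Epoch\[(\d+)/\d+\], Train Loss: ([0-9.]+)")

losses = {}
with open("train.log") as f:
    for line in f:
        m = pattern.search(line)
        if m:
            losses[int(m.group(1))] = float(m.group(2))

for epoch in sorted(losses):
    print(f"epoch {epoch:03d}: train loss {losses[epoch]:.4f}")
```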
-INFO:local_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7546
-INFO:local_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7470
-INFO:local_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7469
-INFO:master_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7497
-INFO:local_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7536
-INFO:local_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7544
-INFO:local_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7537
-INFO:local_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7427
-INFO:local_logger:Epoch[036/800], Step[0000/0626], Avg Loss: 0.7446
-INFO:local_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7474
-INFO:local_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7461
-INFO:local_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7458
-INFO:local_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7467
-INFO:local_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7456
-INFO:local_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7469
-INFO:local_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7457
-INFO:master_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7463
-INFO:local_logger:Epoch[036/800], Step[0100/0626], Avg Loss: 0.7461
-INFO:local_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7459
-INFO:local_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7473
-INFO:local_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7459
-INFO:local_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7461
-INFO:local_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7460
-INFO:local_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7450
-INFO:master_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7460
-INFO:local_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7464
-INFO:local_logger:Epoch[036/800], Step[0200/0626], Avg Loss: 0.7453
-INFO:local_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7464
-INFO:local_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7457
-INFO:local_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7456
-INFO:local_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7461
-INFO:local_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7456
-INFO:master_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7456
-INFO:local_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7451
-INFO:local_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7456
-INFO:local_logger:Epoch[036/800], Step[0300/0626], Avg Loss: 0.7450
-INFO:local_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7453
-INFO:local_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7449
-INFO:local_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7457
-INFO:local_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7456
-INFO:local_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7456
-INFO:local_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7450
-INFO:master_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7453
-INFO:local_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7452
-INFO:local_logger:Epoch[036/800], Step[0400/0626], Avg Loss: 0.7449
-INFO:local_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7452
-INFO:local_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7450
-INFO:local_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7453
-INFO:local_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7449
-INFO:local_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7449
-INFO:local_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7452
-INFO:local_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7448
-INFO:local_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7451
-INFO:master_logger:Epoch[036/800], Step[0500/0626], Avg Loss: 0.7451
-INFO:local_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7446
-INFO:local_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7446
-INFO:local_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7446
-INFO:local_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7449
-INFO:local_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7446
-INFO:local_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7444
-INFO:local_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7450
-INFO:local_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7450
-INFO:master_logger:Epoch[036/800], Step[0600/0626], Avg Loss: 0.7447
-INFO:local_logger:----- Epoch[036/800], Train Loss: 0.7449, time: 890.54
-INFO:local_logger:Now training epoch 37. LR=0.000139
-INFO:local_logger:----- Epoch[036/800], Train Loss: 0.7445, time: 891.48
-INFO:local_logger:----- Epoch[036/800], Train Loss: 0.7445, time: 891.09
-INFO:local_logger:Now training epoch 37. LR=0.000139
-INFO:local_logger:Now training epoch 37. LR=0.000139
-INFO:local_logger:----- Epoch[036/800], Train Loss: 0.7446, time: 887.41
-INFO:master_logger:----- Epoch[036/800], Train Loss: 0.7447, time: 887.41
-INFO:local_logger:----- Epoch[036/800], Train Loss: 0.7448, time: 891.48
-INFO:local_logger:Now training epoch 37. LR=0.000139
-INFO:local_logger:----- Epoch[036/800], Train Loss: 0.7449, time: 891.12
-INFO:local_logger:Now training epoch 37. LR=0.000139
-INFO:local_logger:----- Epoch[036/800], Train Loss: 0.7446, time: 892.33
-INFO:local_logger:Now training epoch 37. LR=0.000139
-INFO:local_logger:----- Epoch[036/800], Train Loss: 0.7444, time: 891.12
-INFO:local_logger:Now training epoch 37. LR=0.000139
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-36-Loss-0.7446498155174043.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-36-Loss-0.7446498155174043.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-36-Loss-0.7446498155174043.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-36-Loss-0.7446498155174043.pdopt
-INFO:local_logger:Now training epoch 37. LR=0.000139
-INFO:master_logger:Now training epoch 37. LR=0.000139
-INFO:local_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7435
-INFO:local_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7441
-INFO:local_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7454
-INFO:local_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7397
-INFO:master_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7456
-INFO:local_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7399
-INFO:local_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7476
-INFO:local_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7504
-INFO:local_logger:Epoch[037/800], Step[0000/0626], Avg Loss: 0.7542
-INFO:local_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7433
-INFO:local_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7414
-INFO:local_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7429
-INFO:local_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7416
-INFO:local_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7421
-INFO:master_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7426
-INFO:local_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7423
-INFO:local_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7431
-INFO:local_logger:Epoch[037/800], Step[0100/0626], Avg Loss: 0.7438
-INFO:local_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7429
-INFO:local_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7434
-INFO:local_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7425
-INFO:local_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7427
-INFO:local_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7427
-INFO:local_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7428
-INFO:local_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7429
-INFO:local_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7424
-INFO:master_logger:Epoch[037/800], Step[0200/0626], Avg Loss: 0.7428
-INFO:local_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7422
-INFO:local_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7425
-INFO:local_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7417
-INFO:local_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7424
-INFO:local_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7423
-INFO:local_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7424
-INFO:local_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7427
-INFO:local_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7425
-INFO:master_logger:Epoch[037/800], Step[0300/0626], Avg Loss: 0.7423
-INFO:local_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7423
-INFO:local_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7419
-INFO:local_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7416
-INFO:local_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7420
-INFO:local_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7422
-INFO:local_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7414
-INFO:local_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7420
-INFO:local_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7419
-INFO:master_logger:Epoch[037/800], Step[0400/0626], Avg Loss: 0.7419
-INFO:local_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7414
-INFO:local_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7414
-INFO:local_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7416
-INFO:local_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7419
-INFO:local_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7417
-INFO:local_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7419
-INFO:local_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7415
-INFO:local_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7418
-INFO:master_logger:Epoch[037/800], Step[0500/0626], Avg Loss: 0.7417
-INFO:local_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7410
-INFO:local_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7416
-INFO:local_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7416
-INFO:local_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7417
-INFO:local_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7413
-INFO:local_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7413
-INFO:local_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7412
-INFO:master_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7414
-INFO:local_logger:Epoch[037/800], Step[0600/0626], Avg Loss: 0.7414
-INFO:local_logger:----- Epoch[037/800], Train Loss: 0.7415, time: 861.30
-INFO:local_logger:Now training epoch 38. LR=0.000143
-INFO:local_logger:----- Epoch[037/800], Train Loss: 0.7415, time: 861.30
-INFO:local_logger:Now training epoch 38. LR=0.000143
-INFO:local_logger:----- Epoch[037/800], Train Loss: 0.7414, time: 861.29
-INFO:local_logger:Now training epoch 38. LR=0.000143
-INFO:local_logger:----- Epoch[037/800], Train Loss: 0.7412, time: 861.53
-INFO:local_logger:Now training epoch 38. LR=0.000143
-INFO:local_logger:----- Epoch[037/800], Train Loss: 0.7412, time: 862.22
-INFO:local_logger:Now training epoch 38. LR=0.000143
-INFO:local_logger:----- Epoch[037/800], Train Loss: 0.7411, time: 861.67
-INFO:local_logger:Now training epoch 38. LR=0.000143
-INFO:local_logger:----- Epoch[037/800], Train Loss: 0.7410, time: 861.66
-INFO:local_logger:Now training epoch 38. LR=0.000143
-INFO:local_logger:----- Epoch[037/800], Train Loss: 0.7415, time: 858.01
-INFO:master_logger:----- Epoch[037/800], Train Loss: 0.7413, time: 858.01
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-37-Loss-0.7415279359559235.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-37-Loss-0.7415279359559235.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-37-Loss-0.7415279359559235.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-37-Loss-0.7415279359559235.pdopt
-INFO:local_logger:Now training epoch 38. LR=0.000143
-INFO:master_logger:Now training epoch 38. LR=0.000143
-INFO:local_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7326
-INFO:local_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7470
-INFO:master_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7394
-INFO:local_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7295
-INFO:local_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7452
-INFO:local_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7429
-INFO:local_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7354
-INFO:local_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7382
-INFO:local_logger:Epoch[038/800], Step[0000/0626], Avg Loss: 0.7443
-INFO:local_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7383
-INFO:local_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7398
-INFO:local_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7393
-INFO:local_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7389
-INFO:local_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7386
-INFO:local_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7387
-INFO:master_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7392
-INFO:local_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7411
-INFO:local_logger:Epoch[038/800], Step[0100/0626], Avg Loss: 0.7392
-INFO:local_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7388
-INFO:local_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7388
-INFO:local_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7386
-INFO:local_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7386
-INFO:local_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7399
-INFO:local_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7386
-INFO:local_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7391
-INFO:master_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7389
-INFO:local_logger:Epoch[038/800], Step[0200/0626], Avg Loss: 0.7389
-INFO:local_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7388
-INFO:local_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7390
-INFO:local_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7388
-INFO:local_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7386
-INFO:local_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7388
-INFO:local_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7384
-INFO:local_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7385
-INFO:master_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7387
-INFO:local_logger:Epoch[038/800], Step[0300/0626], Avg Loss: 0.7389
-INFO:local_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7382
-INFO:local_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7388
-INFO:local_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7390
-INFO:local_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7387
-INFO:local_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7388
-INFO:master_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7387
-INFO:local_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7384
-INFO:local_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7386
-INFO:local_logger:Epoch[038/800], Step[0400/0626], Avg Loss: 0.7388
-INFO:local_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7383
-INFO:local_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7385
-INFO:local_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7385
-INFO:local_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7385
-INFO:local_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7390
-INFO:local_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7377
-INFO:local_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7385
-INFO:master_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7385
-INFO:local_logger:Epoch[038/800], Step[0500/0626], Avg Loss: 0.7387
-INFO:local_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7378
-INFO:local_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7382
-INFO:local_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7389
-INFO:local_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7381
-INFO:local_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7381
-INFO:local_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7381
-INFO:master_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7383
-INFO:local_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7381
-INFO:local_logger:Epoch[038/800], Step[0600/0626], Avg Loss: 0.7386
-INFO:local_logger:----- Epoch[038/800], Train Loss: 0.7381, time: 895.61
-INFO:local_logger:Now training epoch 39. LR=0.000146
-INFO:local_logger:----- Epoch[038/800], Train Loss: 0.7378, time: 895.58
-INFO:local_logger:Now training epoch 39. LR=0.000146
-INFO:local_logger:----- Epoch[038/800], Train Loss: 0.7380, time: 895.99
-INFO:local_logger:Now training epoch 39. LR=0.000146
-INFO:local_logger:----- Epoch[038/800], Train Loss: 0.7381, time: 896.21
-INFO:local_logger:Now training epoch 39. LR=0.000146
-INFO:local_logger:----- Epoch[038/800], Train Loss: 0.7381, time: 892.19
-INFO:master_logger:----- Epoch[038/800], Train Loss: 0.7382, time: 892.19
-INFO:local_logger:----- Epoch[038/800], Train Loss: 0.7385, time: 895.95
-INFO:local_logger:Now training epoch 39. LR=0.000146
-INFO:local_logger:----- Epoch[038/800], Train Loss: 0.7388, time: 896.06
-INFO:local_logger:Now training epoch 39. LR=0.000146
-INFO:local_logger:----- Epoch[038/800], Train Loss: 0.7381, time: 895.94
-INFO:local_logger:Now training epoch 39. LR=0.000146
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-38-Loss-0.7380608445157859.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-38-Loss-0.7380608445157859.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-38-Loss-0.7380608445157859.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-38-Loss-0.7380608445157859.pdopt
-INFO:local_logger:Now training epoch 39. LR=0.000146
-INFO:master_logger:Now training epoch 39. LR=0.000146
-INFO:local_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7392
-INFO:master_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7382
-INFO:local_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7373
-INFO:local_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7385
-INFO:local_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7375
-INFO:local_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7414
-INFO:local_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7378
-INFO:local_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7312
-INFO:local_logger:Epoch[039/800], Step[0000/0626], Avg Loss: 0.7426
-INFO:local_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7358
-INFO:local_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7363
-INFO:local_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7362
-INFO:local_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7361
-INFO:local_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7359
-INFO:local_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7369
-INFO:local_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7365
-INFO:master_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7363
-INFO:local_logger:Epoch[039/800], Step[0100/0626], Avg Loss: 0.7366
-INFO:local_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7366
-INFO:local_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7359
-INFO:local_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7371
-INFO:local_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7356
-INFO:local_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7361
-INFO:master_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7362
-INFO:local_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7363
-INFO:local_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7364
-INFO:local_logger:Epoch[039/800], Step[0200/0626], Avg Loss: 0.7360
-INFO:local_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7354
-INFO:local_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7356
-INFO:local_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7368
-INFO:local_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7364
-INFO:local_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7363
-INFO:master_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7360
-INFO:local_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7357
-INFO:local_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7359
-INFO:local_logger:Epoch[039/800], Step[0300/0626], Avg Loss: 0.7360
-INFO:local_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7359
-INFO:local_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7357
-INFO:local_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7355
-INFO:local_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7359
-INFO:local_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7359
-INFO:master_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7358
-INFO:local_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7365
-INFO:local_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7355
-INFO:local_logger:Epoch[039/800], Step[0400/0626], Avg Loss: 0.7356
-INFO:local_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7352
-INFO:local_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7352
-INFO:local_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7356
-INFO:local_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7355
-INFO:local_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7358
-INFO:local_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7352
-INFO:local_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7356
-INFO:master_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7354
-INFO:local_logger:Epoch[039/800], Step[0500/0626], Avg Loss: 0.7353
-INFO:local_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7351
-INFO:local_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7351
-INFO:local_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7354
-INFO:local_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7353
-INFO:local_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7351
-INFO:local_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7356
-INFO:local_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7356
-INFO:master_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7354
-INFO:local_logger:Epoch[039/800], Step[0600/0626], Avg Loss: 0.7355
-INFO:local_logger:----- Epoch[039/800], Train Loss: 0.7351, time: 865.25
-INFO:local_logger:----- Epoch[039/800], Train Loss: 0.7350, time: 864.90
-INFO:local_logger:Now training epoch 40. LR=0.000150
-INFO:local_logger:Now training epoch 40. LR=0.000150
-INFO:local_logger:----- Epoch[039/800], Train Loss: 0.7353, time: 860.87
-INFO:master_logger:----- Epoch[039/800], Train Loss: 0.7353, time: 860.87
-INFO:local_logger:----- Epoch[039/800], Train Loss: 0.7352, time: 864.70
-INFO:local_logger:Now training epoch 40. LR=0.000150
-INFO:local_logger:----- Epoch[039/800], Train Loss: 0.7355, time: 864.66
-INFO:local_logger:Now training epoch 40. LR=0.000150
-INFO:local_logger:----- Epoch[039/800], Train Loss: 0.7355, time: 864.70
-INFO:local_logger:Now training epoch 40. LR=0.000150
-INFO:local_logger:----- Epoch[039/800], Train Loss: 0.7356, time: 864.70
-INFO:local_logger:Now training epoch 40. LR=0.000150
-INFO:local_logger:----- Epoch[039/800], Train Loss: 0.7351, time: 865.02
-INFO:local_logger:Now training epoch 40. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-39-Loss-0.7353187804344304.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-39-Loss-0.7353187804344304.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-39-Loss-0.7353187804344304.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-39-Loss-0.7353187804344304.pdopt
-INFO:local_logger:Now training epoch 40. LR=0.000150
-INFO:master_logger:Now training epoch 40. LR=0.000150
-INFO:local_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7343
-INFO:local_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7310
-INFO:local_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7444
-INFO:local_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7305
-INFO:master_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7355
-INFO:local_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7357
-INFO:local_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7310
-INFO:local_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7439
-INFO:local_logger:Epoch[040/800], Step[0000/0626], Avg Loss: 0.7330
-INFO:local_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7348
-INFO:local_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7344
-INFO:local_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7346
-INFO:local_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7348
-INFO:master_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7341
-INFO:local_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7345
-INFO:local_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7336
-INFO:local_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7331
-INFO:local_logger:Epoch[040/800], Step[0100/0626], Avg Loss: 0.7333
-INFO:local_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7340
-INFO:local_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7343
-INFO:local_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7335
-INFO:local_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7340
-INFO:local_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7335
-INFO:local_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7334
-INFO:local_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7326
-INFO:master_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7336
-INFO:local_logger:Epoch[040/800], Step[0200/0626], Avg Loss: 0.7339
-INFO:local_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7338
-INFO:local_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7336
-INFO:local_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7331
-INFO:local_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7338
-INFO:local_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7332
-INFO:master_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7334
-INFO:local_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7329
-INFO:local_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7334
-INFO:local_logger:Epoch[040/800], Step[0300/0626], Avg Loss: 0.7338
-INFO:local_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7335
-INFO:local_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7335
-INFO:local_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7336
-INFO:local_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7334
-INFO:local_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7328
-INFO:local_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7332
-INFO:master_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7333
-INFO:local_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7332
-INFO:local_logger:Epoch[040/800], Step[0400/0626], Avg Loss: 0.7330
-INFO:local_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7330
-INFO:local_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7323
-INFO:local_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7333
-INFO:local_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7333
-INFO:local_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7329
-INFO:local_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7334
-INFO:local_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7330
-INFO:local_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7329
-INFO:master_logger:Epoch[040/800], Step[0500/0626], Avg Loss: 0.7330
-INFO:local_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7329
-INFO:local_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7329
-INFO:local_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7328
-INFO:local_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7329
-INFO:local_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7328
-INFO:local_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7329
-INFO:local_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7322
-INFO:master_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7328
-INFO:local_logger:Epoch[040/800], Step[0600/0626], Avg Loss: 0.7328
-INFO:local_logger:----- Epoch[040/800], Train Loss: 0.7328, time: 895.01
-INFO:local_logger:Now training epoch 41. LR=0.000150
-INFO:local_logger:----- Epoch[040/800], Train Loss: 0.7328, time: 890.99
-INFO:local_logger:----- Epoch[040/800], Train Loss: 0.7328, time: 895.12
-INFO:master_logger:----- Epoch[040/800], Train Loss: 0.7327, time: 890.99
-INFO:local_logger:Now training epoch 41. LR=0.000150
-INFO:local_logger:----- Epoch[040/800], Train Loss: 0.7328, time: 895.13
-INFO:local_logger:Now training epoch 41. LR=0.000150
-INFO:local_logger:----- Epoch[040/800], Train Loss: 0.7328, time: 894.99
-INFO:local_logger:Now training epoch 41. LR=0.000150
-INFO:local_logger:----- Epoch[040/800], Train Loss: 0.7328, time: 894.98
-INFO:local_logger:Now training epoch 41. LR=0.000150
-INFO:local_logger:----- Epoch[040/800], Train Loss: 0.7321, time: 894.99
-INFO:local_logger:Now training epoch 41. LR=0.000150
-INFO:local_logger:----- Epoch[040/800], Train Loss: 0.7327, time: 895.07
-INFO:local_logger:Now training epoch 41. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-40-Loss-0.7327702230552797.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-40-Loss-0.7327702230552797.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-40-Loss-0.7327702230552797.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-40-Loss-0.7327702230552797.pdopt
-INFO:local_logger:Now training epoch 41. LR=0.000150
-INFO:master_logger:Now training epoch 41. LR=0.000150
-INFO:local_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7292
-INFO:master_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7311
-INFO:local_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7233
-INFO:local_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7219
-INFO:local_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7395
-INFO:local_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7295
-INFO:local_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7279
-INFO:local_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7431
-INFO:local_logger:Epoch[041/800], Step[0000/0626], Avg Loss: 0.7341
-INFO:local_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7312
-INFO:local_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7315
-INFO:local_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7304
-INFO:local_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7317
-INFO:local_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7307
-INFO:master_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7311
-INFO:local_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7308
-INFO:local_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7302
-INFO:local_logger:Epoch[041/800], Step[0100/0626], Avg Loss: 0.7321
-INFO:local_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7308
-INFO:local_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7315
-INFO:local_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7308
-INFO:local_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7317
-INFO:local_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7303
-INFO:local_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7313
-INFO:master_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7310
-INFO:local_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7309
-INFO:local_logger:Epoch[041/800], Step[0200/0626], Avg Loss: 0.7308
-INFO:local_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7310
-INFO:local_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7307
-INFO:local_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7304
-INFO:local_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7307
-INFO:master_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7308
-INFO:local_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7310
-INFO:local_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7308
-INFO:local_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7312
-INFO:local_logger:Epoch[041/800], Step[0300/0626], Avg Loss: 0.7304
-INFO:local_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7312
-INFO:local_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7309
-INFO:local_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7301
-INFO:local_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7305
-INFO:local_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7305
-INFO:master_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7306
-INFO:local_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7307
-INFO:local_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7303
-INFO:local_logger:Epoch[041/800], Step[0400/0626], Avg Loss: 0.7307
-INFO:local_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7303
-INFO:local_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7302
-INFO:local_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7306
-INFO:local_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7305
-INFO:local_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7300
-INFO:local_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7306
-INFO:local_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7304
-INFO:local_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7304
-INFO:master_logger:Epoch[041/800], Step[0500/0626], Avg Loss: 0.7304
-INFO:local_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7303
-INFO:local_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7304
-INFO:local_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7301
-INFO:local_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7299
-INFO:local_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7302
-INFO:master_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7301
-INFO:local_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7299
-INFO:local_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7302
-INFO:local_logger:Epoch[041/800], Step[0600/0626], Avg Loss: 0.7301
-INFO:local_logger:----- Epoch[041/800], Train Loss: 0.7305, time: 866.86
-INFO:local_logger:Now training epoch 42. LR=0.000150
-INFO:local_logger:----- Epoch[041/800], Train Loss: 0.7300, time: 866.90
-INFO:local_logger:Now training epoch 42. LR=0.000150
-INFO:local_logger:----- Epoch[041/800], Train Loss: 0.7304, time: 867.40
-INFO:local_logger:Now training epoch 42. LR=0.000150
-INFO:local_logger:----- Epoch[041/800], Train Loss: 0.7303, time: 863.76
-INFO:master_logger:----- Epoch[041/800], Train Loss: 0.7302, time: 863.76
-INFO:local_logger:----- Epoch[041/800], Train Loss: 0.7301, time: 867.52
-INFO:local_logger:Now training epoch 42. LR=0.000150
-INFO:local_logger:----- Epoch[041/800], Train Loss: 0.7303, time: 867.54
-INFO:local_logger:Now training epoch 42. LR=0.000150
-INFO:local_logger:----- Epoch[041/800], Train Loss: 0.7302, time: 867.65
-INFO:local_logger:Now training epoch 42. LR=0.000150
-INFO:local_logger:----- Epoch[041/800], Train Loss: 0.7300, time: 867.66
-INFO:local_logger:Now training epoch 42. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-41-Loss-0.7302703064055491.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-41-Loss-0.7302703064055491.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-41-Loss-0.7302703064055491.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-41-Loss-0.7302703064055491.pdopt
-INFO:local_logger:Now training epoch 42. LR=0.000150
-INFO:master_logger:Now training epoch 42. LR=0.000150
-INFO:local_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7300
-INFO:local_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7398
-INFO:master_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7281
-INFO:local_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7322
-INFO:local_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7185
-INFO:local_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7206
-INFO:local_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7231
-INFO:local_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7259
-INFO:local_logger:Epoch[042/800], Step[0000/0626], Avg Loss: 0.7347
-INFO:local_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7299
-INFO:local_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7279
-INFO:local_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7292
-INFO:local_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7300
-INFO:local_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7287
-INFO:local_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7292
-INFO:master_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7291
-INFO:local_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7283
-INFO:local_logger:Epoch[042/800], Step[0100/0626], Avg Loss: 0.7297
-INFO:local_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7285
-INFO:local_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7286
-INFO:local_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7290
-INFO:local_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7289
-INFO:local_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7292
-INFO:local_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7282
-INFO:master_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7286
-INFO:local_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7284
-INFO:local_logger:Epoch[042/800], Step[0200/0626], Avg Loss: 0.7283
-INFO:local_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7283
-INFO:local_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7288
-INFO:local_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7283
-INFO:local_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7277
-INFO:local_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7284
-INFO:local_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7286
-INFO:master_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7285
-INFO:local_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7287
-INFO:local_logger:Epoch[042/800], Step[0300/0626], Avg Loss: 0.7288
-INFO:local_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7279
-INFO:local_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7285
-INFO:local_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7287
-INFO:local_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7288
-INFO:local_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7283
-INFO:local_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7275
-INFO:local_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7284
-INFO:master_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7283
-INFO:local_logger:Epoch[042/800], Step[0400/0626], Avg Loss: 0.7279
-INFO:local_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7278
-INFO:local_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7284
-INFO:local_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7285
-INFO:local_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7284
-INFO:local_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7277
-INFO:local_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7279
-INFO:local_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7278
-INFO:local_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7285
-INFO:master_logger:Epoch[042/800], Step[0500/0626], Avg Loss: 0.7281
-INFO:local_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7278
-INFO:local_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7276
-INFO:local_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7285
-INFO:local_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7278
-INFO:local_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7277
-INFO:local_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7283
-INFO:local_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7282
-INFO:master_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7280
-INFO:local_logger:Epoch[042/800], Step[0600/0626], Avg Loss: 0.7280
-INFO:local_logger:----- Epoch[042/800], Train Loss: 0.7281, time: 892.19
-INFO:local_logger:Now training epoch 43. LR=0.000150
-INFO:local_logger:----- Epoch[042/800], Train Loss: 0.7280, time: 892.20
-INFO:local_logger:Now training epoch 43. LR=0.000150
-INFO:local_logger:----- Epoch[042/800], Train Loss: 0.7277, time: 892.57
-INFO:local_logger:Now training epoch 43. LR=0.000150
-INFO:local_logger:----- Epoch[042/800], Train Loss: 0.7276, time: 893.34
-INFO:local_logger:Now training epoch 43. LR=0.000150
-INFO:local_logger:----- Epoch[042/800], Train Loss: 0.7275, time: 892.63
-INFO:local_logger:Now training epoch 43. LR=0.000150
-INFO:local_logger:----- Epoch[042/800], Train Loss: 0.7283, time: 889.09
-INFO:master_logger:----- Epoch[042/800], Train Loss: 0.7279, time: 889.09
-INFO:local_logger:----- Epoch[042/800], Train Loss: 0.7282, time: 893.43
-INFO:local_logger:Now training epoch 43. LR=0.000150
-INFO:local_logger:----- Epoch[042/800], Train Loss: 0.7279, time: 892.89
-INFO:local_logger:Now training epoch 43. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-42-Loss-0.728333370828915.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-42-Loss-0.728333370828915.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-42-Loss-0.728333370828915.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-42-Loss-0.728333370828915.pdopt
-INFO:local_logger:Now training epoch 43. LR=0.000150
-INFO:master_logger:Now training epoch 43. LR=0.000150
-INFO:local_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7259
-INFO:local_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7157
-INFO:master_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7299
-INFO:local_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7296
-INFO:local_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7389
-INFO:local_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7312
-INFO:local_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7351
-INFO:local_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7178
-INFO:local_logger:Epoch[043/800], Step[0000/0626], Avg Loss: 0.7452
-INFO:local_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7266
-INFO:local_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7263
-INFO:local_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7264
-INFO:local_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7253
-INFO:master_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7262
-INFO:local_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7269
-INFO:local_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7261
-INFO:local_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7256
-INFO:local_logger:Epoch[043/800], Step[0100/0626], Avg Loss: 0.7263
-INFO:local_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7260
-INFO:local_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7257
-INFO:local_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7268
-INFO:local_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7256
-INFO:local_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7263
-INFO:local_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7260
-INFO:master_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7260
-INFO:local_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7257
-INFO:local_logger:Epoch[043/800], Step[0200/0626], Avg Loss: 0.7258
-INFO:local_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7257
-INFO:local_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7262
-INFO:master_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7259
-INFO:local_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7260
-INFO:local_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7256
-INFO:local_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7264
-INFO:local_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7262
-INFO:local_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7257
-INFO:local_logger:Epoch[043/800], Step[0300/0626], Avg Loss: 0.7256
-INFO:local_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7257
-INFO:local_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7261
-INFO:local_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7257
-INFO:master_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7258
-INFO:local_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7257
-INFO:local_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7254
-INFO:local_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7253
-INFO:local_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7260
-INFO:local_logger:Epoch[043/800], Step[0400/0626], Avg Loss: 0.7263
-INFO:local_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7253
-INFO:local_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7257
-INFO:local_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7255
-INFO:local_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7258
-INFO:local_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7260
-INFO:master_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7256
-INFO:local_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7253
-INFO:local_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7255
-INFO:local_logger:Epoch[043/800], Step[0500/0626], Avg Loss: 0.7259
-INFO:local_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7256
-INFO:local_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7255
-INFO:local_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7258
-INFO:local_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7258
-INFO:local_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7253
-INFO:master_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7256
-INFO:local_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7254
-INFO:local_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7252
-INFO:local_logger:Epoch[043/800], Step[0600/0626], Avg Loss: 0.7258
-INFO:local_logger:----- Epoch[043/800], Train Loss: 0.7254, time: 856.34
-INFO:local_logger:Now training epoch 44. LR=0.000150
-INFO:local_logger:----- Epoch[043/800], Train Loss: 0.7256, time: 855.79
-INFO:local_logger:Now training epoch 44. LR=0.000150
-INFO:local_logger:----- Epoch[043/800], Train Loss: 0.7253, time: 856.30
-INFO:local_logger:Now training epoch 44. LR=0.000150
-INFO:local_logger:----- Epoch[043/800], Train Loss: 0.7258, time: 856.24
-INFO:local_logger:Now training epoch 44. LR=0.000150
-INFO:local_logger:----- Epoch[043/800], Train Loss: 0.7256, time: 852.55
-INFO:master_logger:----- Epoch[043/800], Train Loss: 0.7255, time: 852.55
-INFO:local_logger:----- Epoch[043/800], Train Loss: 0.7252, time: 856.26
-INFO:local_logger:Now training epoch 44. LR=0.000150
-INFO:local_logger:----- Epoch[043/800], Train Loss: 0.7258, time: 856.83
-INFO:local_logger:Now training epoch 44. LR=0.000150
-INFO:local_logger:----- Epoch[043/800], Train Loss: 0.7257, time: 856.33
-INFO:local_logger:Now training epoch 44. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-43-Loss-0.7255999422779928.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-43-Loss-0.7255999422779928.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-43-Loss-0.7255999422779928.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-43-Loss-0.7255999422779928.pdopt
-INFO:local_logger:Now training epoch 44. LR=0.000150
-INFO:master_logger:Now training epoch 44. LR=0.000150
-INFO:local_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7265
-INFO:local_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7228
-INFO:master_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7233
-INFO:local_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7197
-INFO:local_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7206
-INFO:local_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7307
-INFO:local_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7300
-INFO:local_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7199
-INFO:local_logger:Epoch[044/800], Step[0000/0626], Avg Loss: 0.7161
-INFO:local_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7253
-INFO:local_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7241
-INFO:local_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7232
-INFO:local_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7238
-INFO:local_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7251
-INFO:local_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7248
-INFO:local_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7246
-INFO:local_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7250
-INFO:master_logger:Epoch[044/800], Step[0100/0626], Avg Loss: 0.7245
-INFO:local_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7238
-INFO:local_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7246
-INFO:local_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7237
-INFO:local_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7243
-INFO:local_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7241
-INFO:local_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7251
-INFO:master_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7243
-INFO:local_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7249
-INFO:local_logger:Epoch[044/800], Step[0200/0626], Avg Loss: 0.7239
-INFO:local_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7239
-INFO:local_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7241
-INFO:master_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7240
-INFO:local_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7239
-INFO:local_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7241
-INFO:local_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7235
-INFO:local_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7242
-INFO:local_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7247
-INFO:local_logger:Epoch[044/800], Step[0300/0626], Avg Loss: 0.7236
-INFO:local_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7238
-INFO:local_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7236
-INFO:local_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7241
-INFO:master_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7238
-INFO:local_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7236
-INFO:local_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7233
-INFO:local_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7243
-INFO:local_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7240
-INFO:local_logger:Epoch[044/800], Step[0400/0626], Avg Loss: 0.7239
-INFO:local_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7237
-INFO:local_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7240
-INFO:local_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7234
-INFO:master_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7236
-INFO:local_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7238
-INFO:local_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7231
-INFO:local_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7238
-INFO:local_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7237
-INFO:local_logger:Epoch[044/800], Step[0500/0626], Avg Loss: 0.7235
-INFO:local_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7236
-INFO:local_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7237
-INFO:local_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7235
-INFO:local_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7235
-INFO:local_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7231
-INFO:master_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7235
-INFO:local_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7237
-INFO:local_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7235
-INFO:local_logger:Epoch[044/800], Step[0600/0626], Avg Loss: 0.7233
-INFO:local_logger:----- Epoch[044/800], Train Loss: 0.7236, time: 893.49
-INFO:local_logger:Now training epoch 45. LR=0.000150
-INFO:local_logger:----- Epoch[044/800], Train Loss: 0.7231, time: 893.57
-INFO:local_logger:Now training epoch 45. LR=0.000150
-INFO:local_logger:----- Epoch[044/800], Train Loss: 0.7234, time: 893.55
-INFO:local_logger:Now training epoch 45. LR=0.000150
-INFO:local_logger:----- Epoch[044/800], Train Loss: 0.7234, time: 893.57
-INFO:local_logger:Now training epoch 45. LR=0.000150
-INFO:local_logger:----- Epoch[044/800], Train Loss: 0.7235, time: 889.85
-INFO:master_logger:----- Epoch[044/800], Train Loss: 0.7235, time: 889.85
-INFO:local_logger:----- Epoch[044/800], Train Loss: 0.7237, time: 894.16
-INFO:local_logger:Now training epoch 45. LR=0.000150
-INFO:local_logger:----- Epoch[044/800], Train Loss: 0.7238, time: 894.14
-INFO:local_logger:Now training epoch 45. LR=0.000150
-INFO:local_logger:----- Epoch[044/800], Train Loss: 0.7235, time: 893.86
-INFO:local_logger:Now training epoch 45. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-44-Loss-0.723522978396613.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-44-Loss-0.723522978396613.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-44-Loss-0.723522978396613.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-44-Loss-0.723522978396613.pdopt
-INFO:local_logger:Now training epoch 45. LR=0.000150
-INFO:master_logger:Now training epoch 45. LR=0.000150
-INFO:local_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7270
-INFO:local_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7223
-INFO:local_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7158
-INFO:master_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7205
-INFO:local_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7192
-INFO:local_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7172
-INFO:local_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7222
-INFO:local_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7194
-INFO:local_logger:Epoch[045/800], Step[0000/0626], Avg Loss: 0.7208
-INFO:local_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7217
-INFO:local_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7221
-INFO:local_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7229
-INFO:local_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7221
-INFO:local_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7228
-INFO:local_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7213
-INFO:master_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7223
-INFO:local_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7222
-INFO:local_logger:Epoch[045/800], Step[0100/0626], Avg Loss: 0.7230
-INFO:local_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7218
-INFO:local_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7221
-INFO:local_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7231
-INFO:local_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7222
-INFO:local_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7228
-INFO:local_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7222
-INFO:local_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7222
-INFO:master_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7223
-INFO:local_logger:Epoch[045/800], Step[0200/0626], Avg Loss: 0.7219
-INFO:local_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7219
-INFO:local_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7220
-INFO:local_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7218
-INFO:local_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7219
-INFO:local_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7220
-INFO:master_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7219
-INFO:local_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7227
-INFO:local_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7213
-INFO:local_logger:Epoch[045/800], Step[0300/0626], Avg Loss: 0.7219
-INFO:local_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7217
-INFO:local_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7217
-INFO:local_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7217
-INFO:master_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7218
-INFO:local_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7212
-INFO:local_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7219
-INFO:local_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7220
-INFO:local_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7217
-INFO:local_logger:Epoch[045/800], Step[0400/0626], Avg Loss: 0.7224
-INFO:local_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7217
-INFO:local_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7217
-INFO:local_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7220
-INFO:master_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7218
-INFO:local_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7216
-INFO:local_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7218
-INFO:local_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7213
-INFO:local_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7217
-INFO:local_logger:Epoch[045/800], Step[0500/0626], Avg Loss: 0.7223
-INFO:local_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7212
-INFO:master_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7216
-INFO:local_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7216
-INFO:local_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7216
-INFO:local_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7220
-INFO:local_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7217
-INFO:local_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7219
-INFO:local_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7214
-INFO:local_logger:Epoch[045/800], Step[0600/0626], Avg Loss: 0.7213
-INFO:local_logger:----- Epoch[045/800], Train Loss: 0.7212, time: 859.11
-INFO:local_logger:Now training epoch 46. LR=0.000150
-INFO:local_logger:----- Epoch[045/800], Train Loss: 0.7212, time: 855.42
-INFO:master_logger:----- Epoch[045/800], Train Loss: 0.7216, time: 855.42
-INFO:local_logger:----- Epoch[045/800], Train Loss: 0.7216, time: 859.01
-INFO:local_logger:Now training epoch 46. LR=0.000150
-INFO:local_logger:----- Epoch[045/800], Train Loss: 0.7216, time: 859.54
-INFO:local_logger:Now training epoch 46. LR=0.000150
-INFO:local_logger:----- Epoch[045/800], Train Loss: 0.7219, time: 859.82
-INFO:local_logger:Now training epoch 46. LR=0.000150
-INFO:local_logger:----- Epoch[045/800], Train Loss: 0.7218, time: 859.92
-INFO:local_logger:----- Epoch[045/800], Train Loss: 0.7214, time: 859.73
-INFO:local_logger:Now training epoch 46. LR=0.000150
-INFO:local_logger:Now training epoch 46. LR=0.000150
-INFO:local_logger:----- Epoch[045/800], Train Loss: 0.7217, time: 859.75
-INFO:local_logger:Now training epoch 46. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-45-Loss-0.7212178304741391.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-45-Loss-0.7212178304741391.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-45-Loss-0.7212178304741391.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-45-Loss-0.7212178304741391.pdopt
-INFO:local_logger:Now training epoch 46. LR=0.000150
-INFO:master_logger:Now training epoch 46. LR=0.000150
-INFO:local_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7165
-INFO:local_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7255
-INFO:local_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7341
-INFO:master_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7218
-INFO:local_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7214
-INFO:local_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7314
-INFO:local_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7055
-INFO:local_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7244
-INFO:local_logger:Epoch[046/800], Step[0000/0626], Avg Loss: 0.7156
-INFO:local_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7201
-INFO:local_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7210
-INFO:local_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7219
-INFO:master_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7206
-INFO:local_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7207
-INFO:local_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7203
-INFO:local_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7207
-INFO:local_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7196
-INFO:local_logger:Epoch[046/800], Step[0100/0626], Avg Loss: 0.7207
-INFO:local_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7191
-INFO:local_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7207
-INFO:local_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7206
-INFO:local_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7199
-INFO:local_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7209
-INFO:local_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7203
-INFO:local_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7207
-INFO:master_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7204
-INFO:local_logger:Epoch[046/800], Step[0200/0626], Avg Loss: 0.7208
-INFO:local_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7199
-INFO:local_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7205
-INFO:local_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7204
-INFO:local_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7199
-INFO:local_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7201
-INFO:local_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7202
-INFO:local_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7202
-INFO:local_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7202
-INFO:master_logger:Epoch[046/800], Step[0300/0626], Avg Loss: 0.7202
-INFO:local_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7200
-INFO:local_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7205
-INFO:local_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7202
-INFO:master_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7200
-INFO:local_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7199
-INFO:local_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7201
-INFO:local_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[046/800], Step[0400/0626], Avg Loss: 0.7200
-INFO:local_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7197
-INFO:local_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7200
-INFO:local_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7201
-INFO:local_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7198
-INFO:master_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7199
-INFO:local_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7199
-INFO:local_logger:Epoch[046/800], Step[0500/0626], Avg Loss: 0.7204
-INFO:local_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7197
-INFO:local_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7203
-INFO:local_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7197
-INFO:local_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7199
-INFO:master_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[046/800], Step[0600/0626], Avg Loss: 0.7197
-INFO:local_logger:----- Epoch[046/800], Train Loss: 0.7199, time: 894.85
-INFO:local_logger:Now training epoch 47. LR=0.000150
-INFO:local_logger:----- Epoch[046/800], Train Loss: 0.7197, time: 894.90
-INFO:local_logger:Now training epoch 47. LR=0.000150
-INFO:local_logger:----- Epoch[046/800], Train Loss: 0.7198, time: 895.63
-INFO:local_logger:Now training epoch 47. LR=0.000150
-INFO:local_logger:----- Epoch[046/800], Train Loss: 0.7197, time: 895.84
-INFO:local_logger:Now training epoch 47. LR=0.000150
-INFO:local_logger:----- Epoch[046/800], Train Loss: 0.7197, time: 895.79
-INFO:local_logger:Now training epoch 47. LR=0.000150
-INFO:local_logger:----- Epoch[046/800], Train Loss: 0.7198, time: 892.44
-INFO:master_logger:----- Epoch[046/800], Train Loss: 0.7198, time: 892.44
-INFO:local_logger:----- Epoch[046/800], Train Loss: 0.7198, time: 895.48
-INFO:local_logger:Now training epoch 47. LR=0.000150
-INFO:local_logger:----- Epoch[046/800], Train Loss: 0.7204, time: 895.48
-INFO:local_logger:Now training epoch 47. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-46-Loss-0.7197591380592546.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-46-Loss-0.7197591380592546.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-46-Loss-0.7197591380592546.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-46-Loss-0.7197591380592546.pdopt
-INFO:local_logger:Now training epoch 47. LR=0.000150
-INFO:master_logger:Now training epoch 47. LR=0.000150
-INFO:local_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7141
-INFO:local_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7210
-INFO:local_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7275
-INFO:local_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7216
-INFO:local_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7156
-INFO:local_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7147
-INFO:local_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7261
-INFO:local_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7192
-INFO:master_logger:Epoch[047/800], Step[0000/0626], Avg Loss: 0.7199
-INFO:local_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7184
-INFO:local_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7187
-INFO:master_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7189
-INFO:local_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7194
-INFO:local_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7183
-INFO:local_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7189
-INFO:local_logger:Epoch[047/800], Step[0100/0626], Avg Loss: 0.7196
-INFO:local_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7184
-INFO:local_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7190
-INFO:local_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7180
-INFO:master_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7185
-INFO:local_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7195
-INFO:local_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7186
-INFO:local_logger:Epoch[047/800], Step[0200/0626], Avg Loss: 0.7192
-INFO:local_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7182
-INFO:local_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7187
-INFO:local_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7182
-INFO:local_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7179
-INFO:local_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7184
-INFO:master_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7184
-INFO:local_logger:Epoch[047/800], Step[0300/0626], Avg Loss: 0.7184
-INFO:local_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7183
-INFO:local_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7180
-INFO:local_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7188
-INFO:master_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7184
-INFO:local_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7183
-INFO:local_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7185
-INFO:local_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0400/0626], Avg Loss: 0.7178
-INFO:local_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7187
-INFO:local_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7189
-INFO:local_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7179
-INFO:local_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7187
-INFO:local_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7182
-INFO:master_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7183
-INFO:local_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7179
-INFO:local_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7183
-INFO:local_logger:Epoch[047/800], Step[0500/0626], Avg Loss: 0.7180
-INFO:local_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7181
-INFO:local_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7185
-INFO:local_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7185
-INFO:local_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7188
-INFO:local_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7180
-INFO:local_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7179
-INFO:local_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7177
-INFO:local_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7181
-INFO:master_logger:Epoch[047/800], Step[0600/0626], Avg Loss: 0.7182
-INFO:local_logger:----- Epoch[047/800], Train Loss: 0.7188, time: 864.28
-INFO:local_logger:Now training epoch 48. LR=0.000150
-INFO:local_logger:----- Epoch[047/800], Train Loss: 0.7180, time: 863.82
-INFO:local_logger:Now training epoch 48. LR=0.000150
-INFO:local_logger:----- Epoch[047/800], Train Loss: 0.7183, time: 863.91
-INFO:local_logger:Now training epoch 48. LR=0.000150
-INFO:local_logger:----- Epoch[047/800], Train Loss: 0.7185, time: 864.34
-INFO:local_logger:Now training epoch 48. LR=0.000150
-INFO:local_logger:----- Epoch[047/800], Train Loss: 0.7177, time: 864.67
-INFO:local_logger:Now training epoch 48. LR=0.000150
-INFO:local_logger:----- Epoch[047/800], Train Loss: 0.7179, time: 864.05
-INFO:local_logger:Now training epoch 48. LR=0.000150
-INFO:local_logger:----- Epoch[047/800], Train Loss: 0.7182, time: 864.42
-INFO:local_logger:Now training epoch 48. LR=0.000150
-INFO:local_logger:----- Epoch[047/800], Train Loss: 0.7181, time: 860.27
-INFO:master_logger:----- Epoch[047/800], Train Loss: 0.7182, time: 860.27
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-47-Loss-0.7181137765215982.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-47-Loss-0.7181137765215982.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-47-Loss-0.7181137765215982.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-47-Loss-0.7181137765215982.pdopt
-INFO:local_logger:Now training epoch 48. LR=0.000150
-INFO:master_logger:Now training epoch 48. LR=0.000150
-INFO:local_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7296
-INFO:local_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7261
-INFO:local_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7148
-INFO:local_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7056
-INFO:local_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7238
-INFO:master_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7205
-INFO:local_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7255
-INFO:local_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7134
-INFO:local_logger:Epoch[048/800], Step[0000/0626], Avg Loss: 0.7255
-INFO:local_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7177
-INFO:local_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7189
-INFO:local_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7174
-INFO:local_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7174
-INFO:local_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7175
-INFO:master_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7177
-INFO:local_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7172
-INFO:local_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7192
-INFO:local_logger:Epoch[048/800], Step[0100/0626], Avg Loss: 0.7164
-INFO:local_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7174
-INFO:local_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7176
-INFO:local_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7172
-INFO:local_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7181
-INFO:local_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7172
-INFO:master_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7174
-INFO:local_logger:Epoch[048/800], Step[0200/0626], Avg Loss: 0.7183
-INFO:local_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7169
-INFO:local_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7173
-INFO:local_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7171
-INFO:local_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7171
-INFO:local_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7168
-INFO:master_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7170
-INFO:local_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7172
-INFO:local_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7164
-INFO:local_logger:Epoch[048/800], Step[0300/0626], Avg Loss: 0.7176
-INFO:local_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7171
-INFO:local_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7171
-INFO:local_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7172
-INFO:master_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7170
-INFO:local_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7173
-INFO:local_logger:Epoch[048/800], Step[0400/0626], Avg Loss: 0.7170
-INFO:local_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7171
-INFO:local_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7169
-INFO:local_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7167
-INFO:local_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7169
-INFO:master_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7164
-INFO:local_logger:Epoch[048/800], Step[0500/0626], Avg Loss: 0.7170
-INFO:local_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7165
-INFO:local_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7168
-INFO:local_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7162
-INFO:local_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7166
-INFO:local_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7168
-INFO:master_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7166
-INFO:local_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7166
-INFO:local_logger:Epoch[048/800], Step[0600/0626], Avg Loss: 0.7168
-INFO:local_logger:----- Epoch[048/800], Train Loss: 0.7162, time: 894.29
-INFO:local_logger:Now training epoch 49. LR=0.000150
-INFO:local_logger:----- Epoch[048/800], Train Loss: 0.7165, time: 894.54
-INFO:local_logger:Now training epoch 49. LR=0.000150
-INFO:local_logger:----- Epoch[048/800], Train Loss: 0.7169, time: 894.25
-INFO:local_logger:Now training epoch 49. LR=0.000150
-INFO:local_logger:----- Epoch[048/800], Train Loss: 0.7165, time: 894.58
-INFO:local_logger:Now training epoch 49. LR=0.000150
-INFO:local_logger:----- Epoch[048/800], Train Loss: 0.7168, time: 894.96
-INFO:local_logger:Now training epoch 49. LR=0.000150
-INFO:local_logger:----- Epoch[048/800], Train Loss: 0.7165, time: 891.19
-INFO:local_logger:----- Epoch[048/800], Train Loss: 0.7169, time: 894.94
-INFO:master_logger:----- Epoch[048/800], Train Loss: 0.7166, time: 891.19
-INFO:local_logger:Now training epoch 49. LR=0.000150
-INFO:local_logger:----- Epoch[048/800], Train Loss: 0.7167, time: 895.17
-INFO:local_logger:Now training epoch 49. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-48-Loss-0.7164831838117034.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-48-Loss-0.7164831838117034.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-48-Loss-0.7164831838117034.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-48-Loss-0.7164831838117034.pdopt
-INFO:local_logger:Now training epoch 49. LR=0.000150
-INFO:master_logger:Now training epoch 49. LR=0.000150
-INFO:local_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7106
-INFO:local_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7079
-INFO:local_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7136
-INFO:master_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7139
-INFO:local_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7094
-INFO:local_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7180
-INFO:local_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7324
-INFO:local_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7095
-INFO:local_logger:Epoch[049/800], Step[0000/0626], Avg Loss: 0.7095
-INFO:local_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7149
-INFO:local_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7149
-INFO:local_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7153
-INFO:local_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7157
-INFO:local_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7161
-INFO:local_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7156
-INFO:master_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7153
-INFO:local_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7150
-INFO:local_logger:Epoch[049/800], Step[0100/0626], Avg Loss: 0.7151
-INFO:local_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7156
-INFO:local_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7158
-INFO:local_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7151
-INFO:local_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7161
-INFO:local_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7155
-INFO:local_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7153
-INFO:local_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7156
-INFO:local_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7152
-INFO:master_logger:Epoch[049/800], Step[0200/0626], Avg Loss: 0.7155
-INFO:local_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7155
-INFO:local_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7153
-INFO:local_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7159
-INFO:local_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7162
-INFO:master_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7156
-INFO:local_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7157
-INFO:local_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7152
-INFO:local_logger:Epoch[049/800], Step[0300/0626], Avg Loss: 0.7156
-INFO:local_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7155
-INFO:local_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7150
-INFO:local_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7160
-INFO:local_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7158
-INFO:master_logger:Epoch[049/800], Step[0400/0626], Avg Loss: 0.7155
-INFO:local_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7151
-INFO:local_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7156
-INFO:local_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7155
-INFO:local_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7156
-INFO:master_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7153
-INFO:local_logger:Epoch[049/800], Step[0500/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7151
-INFO:local_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7151
-INFO:local_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7153
-INFO:local_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7154
-INFO:local_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7152
-INFO:local_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7151
-INFO:local_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7150
-INFO:master_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7152
-INFO:local_logger:Epoch[049/800], Step[0600/0626], Avg Loss: 0.7153
-INFO:local_logger:----- Epoch[049/800], Train Loss: 0.7150, time: 858.69
-INFO:local_logger:Now training epoch 50. LR=0.000150
-INFO:local_logger:----- Epoch[049/800], Train Loss: 0.7151, time: 857.91
-INFO:local_logger:Now training epoch 50. LR=0.000150
-INFO:local_logger:----- Epoch[049/800], Train Loss: 0.7152, time: 858.69
-INFO:local_logger:Now training epoch 50. LR=0.000150
-INFO:local_logger:----- Epoch[049/800], Train Loss: 0.7152, time: 857.99
-INFO:local_logger:Now training epoch 50. LR=0.000150
-INFO:local_logger:----- Epoch[049/800], Train Loss: 0.7151, time: 858.78
-INFO:local_logger:Now training epoch 50. LR=0.000150
-INFO:local_logger:----- Epoch[049/800], Train Loss: 0.7151, time: 858.00
-INFO:local_logger:Now training epoch 50. LR=0.000150
-INFO:local_logger:----- Epoch[049/800], Train Loss: 0.7154, time: 858.39
-INFO:local_logger:Now training epoch 50. LR=0.000150
-INFO:local_logger:----- Epoch[049/800], Train Loss: 0.7153, time: 854.33
-INFO:master_logger:----- Epoch[049/800], Train Loss: 0.7152, time: 854.33
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-49-Loss-0.715259619077928.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-49-Loss-0.715259619077928.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-49-Loss-0.715259619077928.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-49-Loss-0.715259619077928.pdopt
-INFO:local_logger:Now training epoch 50. LR=0.000150
-INFO:master_logger:Now training epoch 50. LR=0.000150
-INFO:local_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7160
-INFO:local_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7233
-INFO:master_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7153
-INFO:local_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7204
-INFO:local_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7224
-INFO:local_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7126
-INFO:local_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7130
-INFO:local_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7084
-INFO:local_logger:Epoch[050/800], Step[0000/0626], Avg Loss: 0.7066
-INFO:local_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7145
-INFO:local_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7148
-INFO:local_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7134
-INFO:local_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7152
-INFO:local_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7152
-INFO:local_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7142
-INFO:master_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7145
-INFO:local_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7144
-INFO:local_logger:Epoch[050/800], Step[0100/0626], Avg Loss: 0.7141
-INFO:local_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7143
-INFO:local_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7138
-INFO:local_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7142
-INFO:local_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7137
-INFO:local_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7148
-INFO:master_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7142
-INFO:local_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7145
-INFO:local_logger:Epoch[050/800], Step[0200/0626], Avg Loss: 0.7143
-INFO:local_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7144
-INFO:local_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7141
-INFO:local_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7143
-INFO:local_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7141
-INFO:local_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7143
-INFO:local_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7137
-INFO:local_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7137
-INFO:master_logger:Epoch[050/800], Step[0300/0626], Avg Loss: 0.7141
-INFO:local_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7139
-INFO:local_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7143
-INFO:local_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7136
-INFO:local_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7137
-INFO:local_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7139
-INFO:local_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7139
-INFO:local_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7142
-INFO:master_logger:Epoch[050/800], Step[0400/0626], Avg Loss: 0.7139
-INFO:local_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7137
-INFO:local_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7142
-INFO:local_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7137
-INFO:master_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7139
-INFO:local_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7136
-INFO:local_logger:Epoch[050/800], Step[0500/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7139
-INFO:local_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7142
-INFO:local_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7138
-INFO:local_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7141
-INFO:local_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7141
-INFO:local_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7141
-INFO:master_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[050/800], Step[0600/0626], Avg Loss: 0.7141
-INFO:local_logger:----- Epoch[050/800], Train Loss: 0.7141, time: 882.33
-INFO:local_logger:Now training epoch 51. LR=0.000150
-INFO:local_logger:----- Epoch[050/800], Train Loss: 0.7141, time: 882.32
-INFO:local_logger:Now training epoch 51. LR=0.000150
-INFO:local_logger:----- Epoch[050/800], Train Loss: 0.7139, time: 878.40
-INFO:master_logger:----- Epoch[050/800], Train Loss: 0.7140, time: 878.40
-INFO:local_logger:----- Epoch[050/800], Train Loss: 0.7144, time: 882.28
-INFO:local_logger:Now training epoch 51. LR=0.000150
-INFO:local_logger:----- Epoch[050/800], Train Loss: 0.7139, time: 882.66
-INFO:local_logger:Now training epoch 51. LR=0.000150
-INFO:local_logger:----- Epoch[050/800], Train Loss: 0.7140, time: 882.66
-INFO:local_logger:Now training epoch 51. LR=0.000150
-INFO:local_logger:----- Epoch[050/800], Train Loss: 0.7138, time: 882.67
-INFO:local_logger:Now training epoch 51. LR=0.000150
-INFO:local_logger:----- Epoch[050/800], Train Loss: 0.7140, time: 882.77
-INFO:local_logger:Now training epoch 51. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-50-Loss-0.7138677369669435.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-50-Loss-0.7138677369669435.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-50-Loss-0.7138677369669435.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-50-Loss-0.7138677369669435.pdopt
-INFO:local_logger:Now training epoch 51. LR=0.000150
-INFO:master_logger:Now training epoch 51. LR=0.000150
-INFO:local_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7092
-INFO:local_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7248
-INFO:local_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7198
-INFO:local_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7048
-INFO:master_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7142
-INFO:local_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7200
-INFO:local_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7138
-INFO:local_logger:Epoch[051/800], Step[0000/0626], Avg Loss: 0.7145
-INFO:local_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7135
-INFO:local_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7127
-INFO:local_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7135
-INFO:local_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7137
-INFO:local_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7118
-INFO:local_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7132
-INFO:master_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7129
-INFO:local_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7122
-INFO:local_logger:Epoch[051/800], Step[0100/0626], Avg Loss: 0.7129
-INFO:local_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7131
-INFO:local_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7137
-INFO:local_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7133
-INFO:local_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7130
-INFO:local_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7130
-INFO:local_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7133
-INFO:local_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7129
-INFO:master_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7130
-INFO:local_logger:Epoch[051/800], Step[0200/0626], Avg Loss: 0.7120
-INFO:local_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7131
-INFO:local_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7130
-INFO:local_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7125
-INFO:master_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7129
-INFO:local_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7131
-INFO:local_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7127
-INFO:local_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7129
-INFO:local_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7128
-INFO:local_logger:Epoch[051/800], Step[0300/0626], Avg Loss: 0.7129
-INFO:local_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7126
-INFO:local_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7126
-INFO:local_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7130
-INFO:local_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7130
-INFO:local_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7128
-INFO:local_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7122
-INFO:master_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7128
-INFO:local_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7131
-INFO:local_logger:Epoch[051/800], Step[0400/0626], Avg Loss: 0.7127
-INFO:local_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7124
-INFO:local_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7128
-INFO:local_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7127
-INFO:local_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7122
-INFO:local_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7129
-INFO:local_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7129
-INFO:master_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7126
-INFO:local_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7126
-INFO:local_logger:Epoch[051/800], Step[0500/0626], Avg Loss: 0.7126
-INFO:local_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7126
-INFO:local_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7128
-INFO:local_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7125
-INFO:local_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7128
-INFO:local_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7128
-INFO:local_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7122
-INFO:local_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7120
-INFO:master_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7125
-INFO:local_logger:Epoch[051/800], Step[0600/0626], Avg Loss: 0.7125
-INFO:local_logger:----- Epoch[051/800], Train Loss: 0.7127, time: 848.53
-INFO:local_logger:Now training epoch 52. LR=0.000150
-INFO:local_logger:----- Epoch[051/800], Train Loss: 0.7120, time: 845.42
-INFO:master_logger:----- Epoch[051/800], Train Loss: 0.7125, time: 845.42
-INFO:local_logger:----- Epoch[051/800], Train Loss: 0.7125, time: 849.57
-INFO:local_logger:Now training epoch 52. LR=0.000150
-INFO:local_logger:----- Epoch[051/800], Train Loss: 0.7126, time: 849.50
-INFO:local_logger:Now training epoch 52. LR=0.000150
-INFO:local_logger:----- Epoch[051/800], Train Loss: 0.7127, time: 849.57
-INFO:local_logger:Now training epoch 52. LR=0.000150
-INFO:local_logger:----- Epoch[051/800], Train Loss: 0.7126, time: 849.14
-INFO:local_logger:Now training epoch 52. LR=0.000150
-INFO:local_logger:----- Epoch[051/800], Train Loss: 0.7122, time: 849.16
-INFO:local_logger:Now training epoch 52. LR=0.000150
-INFO:local_logger:----- Epoch[051/800], Train Loss: 0.7126, time: 849.16
-INFO:local_logger:Now training epoch 52. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-51-Loss-0.7120018708518765.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-51-Loss-0.7120018708518765.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-51-Loss-0.7120018708518765.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-51-Loss-0.7120018708518765.pdopt
-INFO:local_logger:Now training epoch 52. LR=0.000150
-INFO:master_logger:Now training epoch 52. LR=0.000150
-INFO:local_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.7075
-INFO:local_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.7092
-INFO:local_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.7105
-INFO:master_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.7076
-INFO:local_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.6988
-INFO:local_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.7034
-INFO:local_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.7050
-INFO:local_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.7185
-INFO:local_logger:Epoch[052/800], Step[0000/0626], Avg Loss: 0.7081
-INFO:local_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7119
-INFO:local_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7123
-INFO:local_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7110
-INFO:local_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7121
-INFO:local_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7118
-INFO:local_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7113
-INFO:local_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7116
-INFO:master_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7117
-INFO:local_logger:Epoch[052/800], Step[0100/0626], Avg Loss: 0.7119
-INFO:local_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7110
-INFO:local_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7116
-INFO:local_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7122
-INFO:local_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7114
-INFO:local_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7115
-INFO:local_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7111
-INFO:master_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7115
-INFO:local_logger:Epoch[052/800], Step[0200/0626], Avg Loss: 0.7123
-INFO:local_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7116
-INFO:local_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7117
-INFO:local_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7109
-INFO:local_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7114
-INFO:master_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7113
-INFO:local_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7111
-INFO:local_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7113
-INFO:local_logger:Epoch[052/800], Step[0300/0626], Avg Loss: 0.7116
-INFO:local_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7117
-INFO:local_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7113
-INFO:local_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7110
-INFO:master_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7111
-INFO:local_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7107
-INFO:local_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7117
-INFO:local_logger:Epoch[052/800], Step[0400/0626], Avg Loss: 0.7111
-INFO:local_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7107
-INFO:local_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7115
-INFO:local_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7114
-INFO:local_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7107
-INFO:master_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7111
-INFO:local_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7109
-INFO:local_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7116
-INFO:local_logger:Epoch[052/800], Step[0500/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7106
-INFO:local_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7113
-INFO:local_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7114
-INFO:local_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7109
-INFO:local_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7113
-INFO:local_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7110
-INFO:local_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7113
-INFO:master_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7111
-INFO:local_logger:Epoch[052/800], Step[0600/0626], Avg Loss: 0.7107
-INFO:local_logger:----- Epoch[052/800], Train Loss: 0.7107, time: 899.12
-INFO:local_logger:Now training epoch 53. LR=0.000150
-INFO:local_logger:----- Epoch[052/800], Train Loss: 0.7109, time: 899.76
-INFO:local_logger:Now training epoch 53. LR=0.000150
-INFO:local_logger:----- Epoch[052/800], Train Loss: 0.7113, time: 899.78
-INFO:local_logger:Now training epoch 53. LR=0.000150
-INFO:local_logger:----- Epoch[052/800], Train Loss: 0.7113, time: 900.39
-INFO:local_logger:----- Epoch[052/800], Train Loss: 0.7114, time: 896.39
-INFO:local_logger:Now training epoch 53. LR=0.000150
-INFO:master_logger:----- Epoch[052/800], Train Loss: 0.7110, time: 896.39
-INFO:local_logger:----- Epoch[052/800], Train Loss: 0.7109, time: 899.78
-INFO:local_logger:Now training epoch 53. LR=0.000150
-INFO:local_logger:----- Epoch[052/800], Train Loss: 0.7105, time: 899.79
-INFO:local_logger:Now training epoch 53. LR=0.000150
-INFO:local_logger:----- Epoch[052/800], Train Loss: 0.7112, time: 899.77
-INFO:local_logger:Now training epoch 53. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-52-Loss-0.7113885321068728.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-52-Loss-0.7113885321068728.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-52-Loss-0.7113885321068728.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-52-Loss-0.7113885321068728.pdopt
-INFO:local_logger:Now training epoch 53. LR=0.000150
-INFO:master_logger:Now training epoch 53. LR=0.000150
-INFO:local_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7064
-INFO:master_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7098
-INFO:local_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7075
-INFO:local_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7109
-INFO:local_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7135
-INFO:local_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7272
-INFO:local_logger:Epoch[053/800], Step[0000/0626], Avg Loss: 0.7107
-INFO:local_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7100
-INFO:master_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7105
-INFO:local_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7106
-INFO:local_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7117
-INFO:local_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7093
-INFO:local_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7106
-INFO:local_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7110
-INFO:local_logger:Epoch[053/800], Step[0100/0626], Avg Loss: 0.7096
-INFO:local_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7102
-INFO:local_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7106
-INFO:local_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7117
-INFO:local_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7098
-INFO:local_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7102
-INFO:local_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7105
-INFO:master_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7103
-INFO:local_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7102
-INFO:local_logger:Epoch[053/800], Step[0200/0626], Avg Loss: 0.7094
-INFO:local_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7105
-INFO:local_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7101
-INFO:local_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7099
-INFO:local_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7094
-INFO:local_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7103
-INFO:local_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7107
-INFO:master_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7101
-INFO:local_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7099
-INFO:local_logger:Epoch[053/800], Step[0300/0626], Avg Loss: 0.7101
-INFO:local_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7103
-INFO:local_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7096
-INFO:local_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7104
-INFO:local_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7103
-INFO:local_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7102
-INFO:master_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7102
-INFO:local_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7104
-INFO:local_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7101
-INFO:local_logger:Epoch[053/800], Step[0400/0626], Avg Loss: 0.7103
-INFO:local_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7105
-INFO:local_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7104
-INFO:local_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7103
-INFO:local_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7101
-INFO:local_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7096
-INFO:local_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7102
-INFO:local_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7102
-INFO:master_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7102
-INFO:local_logger:Epoch[053/800], Step[0500/0626], Avg Loss: 0.7103
-INFO:local_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7094
-INFO:local_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7102
-INFO:local_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7101
-INFO:local_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7100
-INFO:local_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7102
-INFO:local_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7101
-INFO:master_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7100
-INFO:local_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7101
-INFO:local_logger:Epoch[053/800], Step[0600/0626], Avg Loss: 0.7098
-INFO:local_logger:----- Epoch[053/800], Train Loss: 0.7102, time: 889.34
-INFO:master_logger:----- Epoch[053/800], Train Loss: 0.7100, time: 889.34
-INFO:local_logger:----- Epoch[053/800], Train Loss: 0.7102, time: 893.08
-INFO:local_logger:Now training epoch 54. LR=0.000150
-INFO:local_logger:----- Epoch[053/800], Train Loss: 0.7100, time: 893.08
-INFO:local_logger:Now training epoch 54. LR=0.000150
-INFO:local_logger:----- Epoch[053/800], Train Loss: 0.7094, time: 893.08
-INFO:local_logger:Now training epoch 54. LR=0.000150
-INFO:local_logger:----- Epoch[053/800], Train Loss: 0.7100, time: 893.10
-INFO:local_logger:Now training epoch 54. LR=0.000150
-INFO:local_logger:----- Epoch[053/800], Train Loss: 0.7100, time: 893.10
-INFO:local_logger:Now training epoch 54. LR=0.000150
-INFO:local_logger:----- Epoch[053/800], Train Loss: 0.7100, time: 893.75
-INFO:local_logger:Now training epoch 54. LR=0.000150
-INFO:local_logger:----- Epoch[053/800], Train Loss: 0.7098, time: 893.11
-INFO:local_logger:Now training epoch 54. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-53-Loss-0.7102464560284915.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-53-Loss-0.7102464560284915.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-53-Loss-0.7102464560284915.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-53-Loss-0.7102464560284915.pdopt
-INFO:local_logger:Now training epoch 54. LR=0.000150
-INFO:master_logger:Now training epoch 54. LR=0.000150
-INFO:local_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7001
-INFO:local_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7124
-INFO:local_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7103
-INFO:local_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7021
-INFO:master_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7094
-INFO:local_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7254
-INFO:local_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7077
-INFO:local_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7109
-INFO:local_logger:Epoch[054/800], Step[0000/0626], Avg Loss: 0.7063
-INFO:local_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7094
-INFO:local_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7105
-INFO:local_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7093
-INFO:master_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7094
-INFO:local_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7095
-INFO:local_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7089
-INFO:local_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7095
-INFO:local_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7097
-INFO:local_logger:Epoch[054/800], Step[0100/0626], Avg Loss: 0.7087
-INFO:local_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7082
-INFO:local_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7086
-INFO:local_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7092
-INFO:local_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7092
-INFO:local_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7087
-INFO:local_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7085
-INFO:master_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0200/0626], Avg Loss: 0.7093
-INFO:local_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7083
-INFO:local_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7089
-INFO:local_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7089
-INFO:local_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7089
-INFO:local_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7085
-INFO:master_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0300/0626], Avg Loss: 0.7092
-INFO:local_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7089
-INFO:local_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7090
-INFO:local_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7083
-INFO:master_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7090
-INFO:local_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7091
-INFO:local_logger:Epoch[054/800], Step[0400/0626], Avg Loss: 0.7083
-INFO:local_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7087
-INFO:local_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7083
-INFO:local_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7084
-INFO:local_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7085
-INFO:local_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7087
-INFO:local_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7090
-INFO:local_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7090
-INFO:master_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7086
-INFO:local_logger:Epoch[054/800], Step[0500/0626], Avg Loss: 0.7083
-INFO:local_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7083
-INFO:local_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7085
-INFO:local_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7087
-INFO:master_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7085
-INFO:local_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7081
-INFO:local_logger:Epoch[054/800], Step[0600/0626], Avg Loss: 0.7081
-INFO:local_logger:----- Epoch[054/800], Train Loss: 0.7087, time: 903.80
-INFO:local_logger:Now training epoch 55. LR=0.000150
-INFO:local_logger:----- Epoch[054/800], Train Loss: 0.7082, time: 903.85
-INFO:local_logger:Now training epoch 55. LR=0.000150
-INFO:local_logger:----- Epoch[054/800], Train Loss: 0.7083, time: 904.37
-INFO:local_logger:Now training epoch 55. LR=0.000150
-INFO:local_logger:----- Epoch[054/800], Train Loss: 0.7081, time: 904.37
-INFO:local_logger:Now training epoch 55. LR=0.000150
-INFO:local_logger:----- Epoch[054/800], Train Loss: 0.7085, time: 904.35
-INFO:local_logger:Now training epoch 55. LR=0.000150
-INFO:local_logger:----- Epoch[054/800], Train Loss: 0.7087, time: 904.43
-INFO:local_logger:Now training epoch 55. LR=0.000150
-INFO:local_logger:----- Epoch[054/800], Train Loss: 0.7088, time: 900.86
-INFO:master_logger:----- Epoch[054/800], Train Loss: 0.7085, time: 900.86
-INFO:local_logger:----- Epoch[054/800], Train Loss: 0.7087, time: 904.47
-INFO:local_logger:Now training epoch 55. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-54-Loss-0.7087632936406654.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-54-Loss-0.7087632936406654.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-54-Loss-0.7087632936406654.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-54-Loss-0.7087632936406654.pdopt
-INFO:local_logger:Now training epoch 55. LR=0.000150
-INFO:master_logger:Now training epoch 55. LR=0.000150
-INFO:local_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.6897
-INFO:master_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.7057
-INFO:local_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.7023
-INFO:local_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.6937
-INFO:local_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.7040
-INFO:local_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.7203
-INFO:local_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.7165
-INFO:local_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.7079
-INFO:local_logger:Epoch[055/800], Step[0000/0626], Avg Loss: 0.7108
-INFO:local_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7093
-INFO:local_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7084
-INFO:local_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7081
-INFO:local_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7084
-INFO:master_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7083
-INFO:local_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7082
-INFO:local_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7082
-INFO:local_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7078
-INFO:local_logger:Epoch[055/800], Step[0100/0626], Avg Loss: 0.7078
-INFO:local_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7086
-INFO:local_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7080
-INFO:local_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7086
-INFO:master_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7081
-INFO:local_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7088
-INFO:local_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7078
-INFO:local_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7078
-INFO:local_logger:Epoch[055/800], Step[0200/0626], Avg Loss: 0.7077
-INFO:local_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7071
-INFO:local_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7086
-INFO:local_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7076
-INFO:master_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7080
-INFO:local_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7086
-INFO:local_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7078
-INFO:local_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7079
-INFO:local_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7079
-INFO:local_logger:Epoch[055/800], Step[0300/0626], Avg Loss: 0.7083
-INFO:local_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7073
-INFO:local_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7081
-INFO:local_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7079
-INFO:local_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7076
-INFO:local_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7078
-INFO:local_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7087
-INFO:master_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7079
-INFO:local_logger:Epoch[055/800], Step[0400/0626], Avg Loss: 0.7081
-INFO:local_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7087
-INFO:local_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7077
-INFO:local_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7082
-INFO:local_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7076
-INFO:local_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7073
-INFO:master_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7078
-INFO:local_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7079
-INFO:local_logger:Epoch[055/800], Step[0500/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7073
-INFO:local_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7081
-INFO:local_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7072
-INFO:local_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7084
-INFO:master_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7076
-INFO:local_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7077
-INFO:local_logger:Epoch[055/800], Step[0600/0626], Avg Loss: 0.7074
-INFO:local_logger:----- Epoch[055/800], Train Loss: 0.7081, time: 859.40
-INFO:local_logger:Now training epoch 56. LR=0.000150
-INFO:local_logger:----- Epoch[055/800], Train Loss: 0.7074, time: 855.79
-INFO:master_logger:----- Epoch[055/800], Train Loss: 0.7076, time: 855.79
-INFO:local_logger:----- Epoch[055/800], Train Loss: 0.7073, time: 859.94
-INFO:local_logger:Now training epoch 56. LR=0.000150
-INFO:local_logger:----- Epoch[055/800], Train Loss: 0.7073, time: 859.85
-INFO:local_logger:Now training epoch 56. LR=0.000150
-INFO:local_logger:----- Epoch[055/800], Train Loss: 0.7072, time: 860.45
-INFO:local_logger:Now training epoch 56. LR=0.000150
-INFO:local_logger:----- Epoch[055/800], Train Loss: 0.7074, time: 859.97
-INFO:local_logger:Now training epoch 56. LR=0.000150
-INFO:local_logger:----- Epoch[055/800], Train Loss: 0.7077, time: 859.89
-INFO:local_logger:Now training epoch 56. LR=0.000150
-INFO:local_logger:----- Epoch[055/800], Train Loss: 0.7083, time: 860.52
-INFO:local_logger:Now training epoch 56. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-55-Loss-0.7074442110429905.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-55-Loss-0.7074442110429905.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-55-Loss-0.7074442110429905.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-55-Loss-0.7074442110429905.pdopt
-INFO:local_logger:Now training epoch 56. LR=0.000150
-INFO:master_logger:Now training epoch 56. LR=0.000150
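
The deleted log above repeats one per-epoch pattern: each of the eight workers' `local_logger` reports its running average loss every 100 of the 626 steps, `master_logger` reports the mean across workers, and at the end of every epoch the model and optimizer states are written as `MAE-Epoch-<N>-Loss-<avg>.pdparams` / `.pdopt` before the next epoch starts at LR=0.000150. The sketch below reproduces that pattern in miniature; it is not the repository's training script, and the tiny linear model, dummy reconstruction loss, and single-process "workers" are placeholders for illustration only.

```python
import logging
import os

import paddle
from paddle import nn

logging.basicConfig(level=logging.INFO, format="%(levelname)s:%(name)s:%(message)s")
local_logger = logging.getLogger("local_logger")    # per-worker logger, as in the log
master_logger = logging.getLogger("master_logger")  # rank-0 / averaged logger

model = nn.Linear(16, 16)  # placeholder for the MAE encoder-decoder
optimizer = paddle.optimizer.AdamW(learning_rate=1.5e-4, parameters=model.parameters())

num_epochs, steps_per_epoch, report_every = 2, 8, 4  # 800 / 626 / 100 in the real run
save_dir = "./output/train-demo"
os.makedirs(save_dir, exist_ok=True)

for epoch in range(1, num_epochs + 1):
    running_loss, seen = 0.0, 0
    for step in range(steps_per_epoch):
        x = paddle.randn([4, 16])
        loss = ((model(x) - x) ** 2).mean()  # dummy reconstruction-style loss
        loss.backward()
        optimizer.step()
        optimizer.clear_grad()
        running_loss += float(loss)
        seen += 1
        if step % report_every == 0:
            avg = running_loss / seen
            msg = (f"Epoch[{epoch:03d}/{num_epochs:03d}], "
                   f"Step[{step:04d}/{steps_per_epoch:04d}], Avg Loss: {avg:.4f}")
            local_logger.info(msg)   # every worker logs its own running average
            master_logger.info(msg)  # rank 0 would log the mean across workers
    avg = running_loss / seen
    master_logger.info(f"----- Epoch[{epoch:03d}/{num_epochs:03d}], Train Loss: {avg:.4f}")
    prefix = os.path.join(save_dir, f"MAE-Epoch-{epoch}-Loss-{avg}")
    paddle.save(model.state_dict(), prefix + ".pdparams")   # weights
    paddle.save(optimizer.state_dict(), prefix + ".pdopt")  # optimizer state
    master_logger.info(f"----- Save model: {prefix}.pdparams")
    master_logger.info(f"----- Save optim: {prefix}.pdopt")
    master_logger.info(f"Now training epoch {epoch + 1}. LR={1.5e-4:.6f}")
```

In the actual multi-GPU run the master values come from averaging each worker's running loss; here both loggers simply print the same number, which is why the real log shows eight slightly different local lines per report followed by one master line.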
-INFO:local_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.6969
-INFO:local_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.7058
-INFO:local_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.7068
-INFO:master_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.7070
-INFO:local_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.7114
-INFO:local_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.7124
-INFO:local_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.6985
-INFO:local_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.7150
-INFO:local_logger:Epoch[056/800], Step[0000/0626], Avg Loss: 0.7089
-INFO:local_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7066
-INFO:local_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7079
-INFO:local_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7056
-INFO:master_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7066
-INFO:local_logger:Epoch[056/800], Step[0100/0626], Avg Loss: 0.7070
-INFO:local_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7071
-INFO:local_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7066
-INFO:local_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7075
-INFO:local_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7059
-INFO:master_logger:Epoch[056/800], Step[0200/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7077
-INFO:local_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7064
-INFO:local_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7067
-INFO:local_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7067
-INFO:local_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7059
-INFO:local_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7065
-INFO:master_logger:Epoch[056/800], Step[0300/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7060
-INFO:local_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7065
-INFO:local_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7066
-INFO:local_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7073
-INFO:local_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7072
-INFO:master_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7067
-INFO:local_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7063
-INFO:local_logger:Epoch[056/800], Step[0400/0626], Avg Loss: 0.7067
-INFO:local_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7063
-INFO:local_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7069
-INFO:master_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7066
-INFO:local_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7067
-INFO:local_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[056/800], Step[0500/0626], Avg Loss: 0.7066
-INFO:local_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7066
-INFO:local_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7069
-INFO:local_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7063
-INFO:local_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7061
-INFO:master_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7065
-INFO:local_logger:Epoch[056/800], Step[0600/0626], Avg Loss: 0.7065
-INFO:local_logger:----- Epoch[056/800], Train Loss: 0.7067, time: 890.40
-INFO:local_logger:Now training epoch 57. LR=0.000150
-INFO:local_logger:----- Epoch[056/800], Train Loss: 0.7065, time: 890.61
-INFO:local_logger:Now training epoch 57. LR=0.000150
-INFO:local_logger:----- Epoch[056/800], Train Loss: 0.7066, time: 890.94
-INFO:local_logger:Now training epoch 57. LR=0.000150
-INFO:local_logger:----- Epoch[056/800], Train Loss: 0.7068, time: 891.55
-INFO:local_logger:----- Epoch[056/800], Train Loss: 0.7063, time: 890.99
-INFO:local_logger:Now training epoch 57. LR=0.000150
-INFO:local_logger:----- Epoch[056/800], Train Loss: 0.7060, time: 891.01
-INFO:local_logger:Now training epoch 57. LR=0.000150
-INFO:local_logger:Now training epoch 57. LR=0.000150
-INFO:local_logger:----- Epoch[056/800], Train Loss: 0.7063, time: 887.67
-INFO:master_logger:----- Epoch[056/800], Train Loss: 0.7065, time: 887.67
-INFO:local_logger:----- Epoch[056/800], Train Loss: 0.7069, time: 890.98
-INFO:local_logger:Now training epoch 57. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-56-Loss-0.7062517588045653.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-56-Loss-0.7062517588045653.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-56-Loss-0.7062517588045653.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-56-Loss-0.7062517588045653.pdopt
-INFO:local_logger:Now training epoch 57. LR=0.000150
-INFO:master_logger:Now training epoch 57. LR=0.000150
-INFO:local_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.7140
-INFO:local_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.6994
-INFO:local_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.7131
-INFO:master_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.7096
-INFO:local_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.7165
-INFO:local_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.7022
-INFO:local_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.7034
-INFO:local_logger:Epoch[057/800], Step[0000/0626], Avg Loss: 0.7171
-INFO:local_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7063
-INFO:local_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7057
-INFO:local_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7054
-INFO:local_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7067
-INFO:local_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7053
-INFO:local_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7065
-INFO:master_logger:Epoch[057/800], Step[0100/0626], Avg Loss: 0.7061
-INFO:local_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7056
-INFO:local_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7067
-INFO:local_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7058
-INFO:local_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7057
-INFO:local_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7051
-INFO:master_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7059
-INFO:local_logger:Epoch[057/800], Step[0200/0626], Avg Loss: 0.7061
-INFO:local_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7056
-INFO:local_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7059
-INFO:local_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7052
-INFO:local_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7058
-INFO:local_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7054
-INFO:master_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7057
-INFO:local_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7054
-INFO:local_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7063
-INFO:local_logger:Epoch[057/800], Step[0300/0626], Avg Loss: 0.7061
-INFO:local_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7058
-INFO:local_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7060
-INFO:local_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7052
-INFO:local_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7059
-INFO:local_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7056
-INFO:local_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7063
-INFO:local_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7055
-INFO:master_logger:Epoch[057/800], Step[0400/0626], Avg Loss: 0.7058
-INFO:local_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7054
-INFO:local_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7050
-INFO:local_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7059
-INFO:local_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7055
-INFO:local_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7061
-INFO:local_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7057
-INFO:local_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7059
-INFO:master_logger:Epoch[057/800], Step[0500/0626], Avg Loss: 0.7057
-INFO:local_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7051
-INFO:local_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7058
-INFO:local_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7054
-INFO:local_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7054
-INFO:local_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7057
-INFO:local_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7062
-INFO:local_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7056
-INFO:master_logger:Epoch[057/800], Step[0600/0626], Avg Loss: 0.7057
-INFO:local_logger:----- Epoch[057/800], Train Loss: 0.7060, time: 853.51
-INFO:local_logger:Now training epoch 58. LR=0.000150
-INFO:local_logger:----- Epoch[057/800], Train Loss: 0.7053, time: 853.54
-INFO:local_logger:Now training epoch 58. LR=0.000150
-INFO:local_logger:----- Epoch[057/800], Train Loss: 0.7057, time: 854.16
-INFO:local_logger:Now training epoch 58. LR=0.000150
-INFO:local_logger:----- Epoch[057/800], Train Loss: 0.7050, time: 854.00
-INFO:local_logger:----- Epoch[057/800], Train Loss: 0.7061, time: 854.36
-INFO:local_logger:Now training epoch 58. LR=0.000150
-INFO:local_logger:Now training epoch 58. LR=0.000150
-INFO:local_logger:----- Epoch[057/800], Train Loss: 0.7056, time: 849.91
-INFO:local_logger:----- Epoch[057/800], Train Loss: 0.7054, time: 853.96
-INFO:master_logger:----- Epoch[057/800], Train Loss: 0.7056, time: 849.91
-INFO:local_logger:Now training epoch 58. LR=0.000150
-INFO:local_logger:----- Epoch[057/800], Train Loss: 0.7056, time: 853.96
-INFO:local_logger:Now training epoch 58. LR=0.000150
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-57-Loss-0.70561545900947.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-57-Loss-0.70561545900947.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-57-Loss-0.70561545900947.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-57-Loss-0.70561545900947.pdopt
-INFO:local_logger:Now training epoch 58. LR=0.000150
-INFO:master_logger:Now training epoch 58. LR=0.000150
-INFO:local_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.7087
-INFO:local_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.6950
-INFO:master_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.7032
-INFO:local_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.7035
-INFO:local_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.7008
-INFO:local_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.6990
-INFO:local_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.7091
-INFO:local_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.7010
-INFO:local_logger:Epoch[058/800], Step[0000/0626], Avg Loss: 0.7083
-INFO:local_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7057
-INFO:local_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7047
-INFO:local_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7033
-INFO:local_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7035
-INFO:local_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7044
-INFO:local_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7042
-INFO:local_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7048
-INFO:local_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7051
-INFO:master_logger:Epoch[058/800], Step[0100/0626], Avg Loss: 0.7045
-INFO:local_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7056
-INFO:local_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7048
-INFO:local_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7050
-INFO:local_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7052
-INFO:local_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7048
-INFO:master_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7048
-INFO:local_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7051
-INFO:local_logger:Epoch[058/800], Step[0200/0626], Avg Loss: 0.7032
-INFO:local_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7048
-INFO:local_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7052
-INFO:local_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7051
-INFO:local_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7046
-INFO:master_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7047
-INFO:local_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7049
-INFO:local_logger:Epoch[058/800], Step[0300/0626], Avg Loss: 0.7035
-INFO:local_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7047
-INFO:local_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7047
-INFO:local_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7048
-INFO:local_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7035
-INFO:local_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7048
-INFO:master_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7047
-INFO:local_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0400/0626], Avg Loss: 0.7047
-INFO:local_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7048
-INFO:local_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7043
-INFO:local_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7034
-INFO:master_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7044
-INFO:local_logger:Epoch[058/800], Step[0500/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7045
-INFO:local_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7045
-INFO:local_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7046
-INFO:master_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7044
-INFO:local_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7045
-INFO:local_logger:Epoch[058/800], Step[0600/0626], Avg Loss: 0.7042
-INFO:local_logger:----- Epoch[058/800], Train Loss: 0.7045, time: 883.86
-INFO:master_logger:----- Epoch[058/800], Train Loss: 0.7044, time: 883.86
-INFO:local_logger:----- Epoch[058/800], Train Loss: 0.7044, time: 888.09
-INFO:local_logger:Now training epoch 59. LR=0.000151
-INFO:local_logger:----- Epoch[058/800], Train Loss: 0.7043, time: 888.09
-INFO:local_logger:Now training epoch 59. LR=0.000151
-INFO:local_logger:----- Epoch[058/800], Train Loss: 0.7045, time: 887.67
-INFO:local_logger:Now training epoch 59. LR=0.000151
-INFO:local_logger:----- Epoch[058/800], Train Loss: 0.7036, time: 887.78
-INFO:local_logger:Now training epoch 59. LR=0.000151
-INFO:local_logger:----- Epoch[058/800], Train Loss: 0.7046, time: 888.37
-INFO:local_logger:Now training epoch 59. LR=0.000151
-INFO:local_logger:----- Epoch[058/800], Train Loss: 0.7045, time: 887.96
-INFO:local_logger:Now training epoch 59. LR=0.000151
-INFO:local_logger:----- Epoch[058/800], Train Loss: 0.7046, time: 887.96
-INFO:local_logger:Now training epoch 59. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-58-Loss-0.704508643255555.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-58-Loss-0.704508643255555.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-58-Loss-0.704508643255555.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-58-Loss-0.704508643255555.pdopt
-INFO:local_logger:Now training epoch 59. LR=0.000151
-INFO:master_logger:Now training epoch 59. LR=0.000151
-INFO:local_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.6945
-INFO:master_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.7041
-INFO:local_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.7028
-INFO:local_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.7075
-INFO:local_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.7132
-INFO:local_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.7134
-INFO:local_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.7138
-INFO:local_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.6925
-INFO:local_logger:Epoch[059/800], Step[0000/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7022
-INFO:local_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7047
-INFO:local_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7039
-INFO:local_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7043
-INFO:master_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7041
-INFO:local_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7048
-INFO:local_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7046
-INFO:local_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7044
-INFO:local_logger:Epoch[059/800], Step[0100/0626], Avg Loss: 0.7034
-INFO:local_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7040
-INFO:local_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7040
-INFO:local_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7041
-INFO:local_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7032
-INFO:local_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7039
-INFO:master_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7037
-INFO:local_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7021
-INFO:local_logger:Epoch[059/800], Step[0200/0626], Avg Loss: 0.7043
-INFO:local_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7041
-INFO:local_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7040
-INFO:master_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7033
-INFO:local_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7027
-INFO:local_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7040
-INFO:local_logger:Epoch[059/800], Step[0300/0626], Avg Loss: 0.7034
-INFO:local_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7037
-INFO:local_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7040
-INFO:local_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7029
-INFO:local_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7037
-INFO:local_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7035
-INFO:local_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7033
-INFO:local_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7039
-INFO:master_logger:Epoch[059/800], Step[0400/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7037
-INFO:local_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7034
-INFO:local_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7039
-INFO:local_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7031
-INFO:local_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7037
-INFO:master_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7035
-INFO:local_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7031
-INFO:local_logger:Epoch[059/800], Step[0500/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7035
-INFO:local_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7037
-INFO:local_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7035
-INFO:local_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7034
-INFO:master_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7034
-INFO:local_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7037
-INFO:local_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7034
-INFO:local_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7032
-INFO:local_logger:Epoch[059/800], Step[0600/0626], Avg Loss: 0.7030
-INFO:local_logger:----- Epoch[059/800], Train Loss: 0.7034, time: 855.63
-INFO:master_logger:----- Epoch[059/800], Train Loss: 0.7034, time: 855.63
-INFO:local_logger:----- Epoch[059/800], Train Loss: 0.7034, time: 859.41
-INFO:local_logger:Now training epoch 60. LR=0.000151
-INFO:local_logger:----- Epoch[059/800], Train Loss: 0.7035, time: 859.16
-INFO:local_logger:Now training epoch 60. LR=0.000151
-INFO:local_logger:----- Epoch[059/800], Train Loss: 0.7030, time: 859.83
-INFO:local_logger:Now training epoch 60. LR=0.000151
-INFO:local_logger:----- Epoch[059/800], Train Loss: 0.7033, time: 859.66
-INFO:local_logger:Now training epoch 60. LR=0.000151
-INFO:local_logger:----- Epoch[059/800], Train Loss: 0.7036, time: 859.94
-INFO:local_logger:Now training epoch 60. LR=0.000151
-INFO:local_logger:----- Epoch[059/800], Train Loss: 0.7037, time: 859.65
-INFO:local_logger:Now training epoch 60. LR=0.000151
-INFO:local_logger:----- Epoch[059/800], Train Loss: 0.7032, time: 859.97
-INFO:local_logger:Now training epoch 60. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-59-Loss-0.7033895635093321.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-59-Loss-0.7033895635093321.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-59-Loss-0.7033895635093321.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-59-Loss-0.7033895635093321.pdopt
-INFO:local_logger:Now training epoch 60. LR=0.000151
-INFO:master_logger:Now training epoch 60. LR=0.000151
-INFO:local_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.7205
-INFO:local_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.7122
-INFO:local_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.7150
-INFO:master_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.7093
-INFO:local_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.6991
-INFO:local_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.7068
-INFO:local_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.7212
-INFO:local_logger:Epoch[060/800], Step[0000/0626], Avg Loss: 0.6992
-INFO:local_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7019
-INFO:local_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7031
-INFO:local_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7024
-INFO:local_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7027
-INFO:local_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7024
-INFO:local_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7041
-INFO:master_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7029
-INFO:local_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7026
-INFO:local_logger:Epoch[060/800], Step[0100/0626], Avg Loss: 0.7037
-INFO:local_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7033
-INFO:local_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7022
-INFO:local_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7043
-INFO:local_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7030
-INFO:master_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7029
-INFO:local_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[060/800], Step[0200/0626], Avg Loss: 0.7033
-INFO:local_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7030
-INFO:local_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7022
-INFO:local_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7026
-INFO:local_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7029
-INFO:local_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7034
-INFO:master_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7028
-INFO:local_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7027
-INFO:local_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[060/800], Step[0300/0626], Avg Loss: 0.7030
-INFO:local_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7024
-INFO:local_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7024
-INFO:local_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7026
-INFO:local_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7031
-INFO:local_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7029
-INFO:master_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7026
-INFO:local_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7028
-INFO:local_logger:Epoch[060/800], Step[0400/0626], Avg Loss: 0.7023
-INFO:local_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7026
-INFO:local_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7023
-INFO:local_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7029
-INFO:local_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7024
-INFO:local_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7029
-INFO:local_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7028
-INFO:local_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7022
-INFO:master_logger:Epoch[060/800], Step[0500/0626], Avg Loss: 0.7026
-INFO:local_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7026
-INFO:local_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7029
-INFO:local_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7025
-INFO:master_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7023
-INFO:local_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7021
-INFO:local_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7023
-INFO:local_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7024
-INFO:local_logger:Epoch[060/800], Step[0600/0626], Avg Loss: 0.7028
-INFO:local_logger:----- Epoch[060/800], Train Loss: 0.7020, time: 883.22
-INFO:local_logger:Now training epoch 61. LR=0.000151
-INFO:local_logger:----- Epoch[060/800], Train Loss: 0.7027, time: 884.34
-INFO:local_logger:Now training epoch 61. LR=0.000151
-INFO:local_logger:----- Epoch[060/800], Train Loss: 0.7025, time: 883.83
-INFO:local_logger:Now training epoch 61. LR=0.000151
-INFO:local_logger:----- Epoch[060/800], Train Loss: 0.7022, time: 883.86
-INFO:local_logger:Now training epoch 61. LR=0.000151
-INFO:local_logger:----- Epoch[060/800], Train Loss: 0.7030, time: 883.87
-INFO:local_logger:Now training epoch 61. LR=0.000151
-INFO:local_logger:----- Epoch[060/800], Train Loss: 0.7023, time: 883.92
-INFO:local_logger:Now training epoch 61. LR=0.000151
-INFO:local_logger:----- Epoch[060/800], Train Loss: 0.7024, time: 883.95
-INFO:local_logger:Now training epoch 61. LR=0.000151
-INFO:local_logger:----- Epoch[060/800], Train Loss: 0.7025, time: 880.74
-INFO:master_logger:----- Epoch[060/800], Train Loss: 0.7024, time: 880.74
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-60-Loss-0.7024735177213701.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-60-Loss-0.7024735177213701.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-60-Loss-0.7024735177213701.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-60-Loss-0.7024735177213701.pdopt
-INFO:local_logger:Now training epoch 61. LR=0.000151
-INFO:master_logger:Now training epoch 61. LR=0.000151
-INFO:local_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.6994
-INFO:local_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.6940
-INFO:master_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.6945
-INFO:local_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.6741
-INFO:local_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.7017
-INFO:local_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.6991
-INFO:local_logger:Epoch[061/800], Step[0000/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7019
-INFO:local_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7020
-INFO:local_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7019
-INFO:local_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7011
-INFO:local_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7002
-INFO:local_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7016
-INFO:master_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7018
-INFO:local_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7024
-INFO:local_logger:Epoch[061/800], Step[0100/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7020
-INFO:local_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7028
-INFO:local_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7020
-INFO:local_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7015
-INFO:local_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7027
-INFO:local_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7022
-INFO:local_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7017
-INFO:local_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7011
-INFO:master_logger:Epoch[061/800], Step[0200/0626], Avg Loss: 0.7020
-INFO:local_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7016
-INFO:local_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7010
-INFO:local_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7024
-INFO:local_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7012
-INFO:local_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7020
-INFO:local_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7020
-INFO:master_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7018
-INFO:local_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7017
-INFO:local_logger:Epoch[061/800], Step[0300/0626], Avg Loss: 0.7026
-INFO:local_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7012
-INFO:local_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7012
-INFO:master_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7018
-INFO:local_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7019
-INFO:local_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7021
-INFO:local_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7020
-INFO:local_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7022
-INFO:local_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7016
-INFO:local_logger:Epoch[061/800], Step[0400/0626], Avg Loss: 0.7019
-INFO:local_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7012
-INFO:local_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7013
-INFO:local_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7017
-INFO:local_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7019
-INFO:local_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7022
-INFO:local_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7022
-INFO:master_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7017
-INFO:local_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7018
-INFO:local_logger:Epoch[061/800], Step[0500/0626], Avg Loss: 0.7015
-INFO:local_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7021
-INFO:local_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7013
-INFO:local_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7018
-INFO:local_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7013
-INFO:local_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7012
-INFO:local_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7015
-INFO:local_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7014
-INFO:master_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7016
-INFO:local_logger:Epoch[061/800], Step[0600/0626], Avg Loss: 0.7019
-INFO:local_logger:----- Epoch[061/800], Train Loss: 0.7015, time: 862.93
-INFO:local_logger:Now training epoch 62. LR=0.000151
-INFO:local_logger:----- Epoch[061/800], Train Loss: 0.7014, time: 863.80
-INFO:local_logger:Now training epoch 62. LR=0.000151
-INFO:local_logger:----- Epoch[061/800], Train Loss: 0.7013, time: 863.80
-INFO:local_logger:Now training epoch 62. LR=0.000151
-INFO:local_logger:----- Epoch[061/800], Train Loss: 0.7012, time: 860.38
-INFO:master_logger:----- Epoch[061/800], Train Loss: 0.7015, time: 860.38
-INFO:local_logger:----- Epoch[061/800], Train Loss: 0.7017, time: 864.16
-INFO:local_logger:Now training epoch 62. LR=0.000151
-INFO:local_logger:----- Epoch[061/800], Train Loss: 0.7012, time: 864.25
-INFO:local_logger:Now training epoch 62. LR=0.000151
-INFO:local_logger:----- Epoch[061/800], Train Loss: 0.7019, time: 865.44
-INFO:local_logger:Now training epoch 62. LR=0.000151
-INFO:local_logger:----- Epoch[061/800], Train Loss: 0.7021, time: 864.25
-INFO:local_logger:Now training epoch 62. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-61-Loss-0.7012385332086987.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-61-Loss-0.7012385332086987.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-61-Loss-0.7012385332086987.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-61-Loss-0.7012385332086987.pdopt
-INFO:local_logger:Now training epoch 62. LR=0.000151
-INFO:master_logger:Now training epoch 62. LR=0.000151
-INFO:local_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.6976
-INFO:local_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.7062
-INFO:master_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.7003
-INFO:local_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.6922
-INFO:local_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.7018
-INFO:local_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.7010
-INFO:local_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.7098
-INFO:local_logger:Epoch[062/800], Step[0000/0626], Avg Loss: 0.7053
-INFO:local_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.7008
-INFO:local_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.7009
-INFO:local_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.7013
-INFO:local_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.7017
-INFO:local_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.6986
-INFO:master_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0100/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.7008
-INFO:local_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.7011
-INFO:local_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.6999
-INFO:local_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.6996
-INFO:local_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.7010
-INFO:local_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.7009
-INFO:master_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0200/0626], Avg Loss: 0.7012
-INFO:local_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.6999
-INFO:local_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.7012
-INFO:local_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.7001
-INFO:local_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.7012
-INFO:local_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.7011
-INFO:master_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.7007
-INFO:local_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.7008
-INFO:local_logger:Epoch[062/800], Step[0300/0626], Avg Loss: 0.7007
-INFO:local_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7004
-INFO:local_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7012
-INFO:local_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7000
-INFO:master_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7008
-INFO:local_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[062/800], Step[0400/0626], Avg Loss: 0.7011
-INFO:local_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7002
-INFO:local_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7014
-INFO:local_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7009
-INFO:local_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7009
-INFO:local_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7002
-INFO:master_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0500/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7014
-INFO:local_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7008
-INFO:local_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7001
-INFO:local_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7008
-INFO:local_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7006
-INFO:master_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7002
-INFO:local_logger:Epoch[062/800], Step[0600/0626], Avg Loss: 0.7006
-INFO:local_logger:----- Epoch[062/800], Train Loss: 0.7001, time: 880.83
-INFO:local_logger:Now training epoch 63. LR=0.000151
-INFO:local_logger:----- Epoch[062/800], Train Loss: 0.7001, time: 877.21
-INFO:master_logger:----- Epoch[062/800], Train Loss: 0.7005, time: 877.21
-INFO:local_logger:----- Epoch[062/800], Train Loss: 0.7008, time: 880.96
-INFO:local_logger:Now training epoch 63. LR=0.000151
-INFO:local_logger:----- Epoch[062/800], Train Loss: 0.7008, time: 881.01
-INFO:local_logger:Now training epoch 63. LR=0.000151
-INFO:local_logger:----- Epoch[062/800], Train Loss: 0.7006, time: 881.03
-INFO:local_logger:Now training epoch 63. LR=0.000151
-INFO:local_logger:----- Epoch[062/800], Train Loss: 0.7005, time: 881.40
-INFO:local_logger:Now training epoch 63. LR=0.000151
-INFO:local_logger:----- Epoch[062/800], Train Loss: 0.7001, time: 881.51
-INFO:local_logger:Now training epoch 63. LR=0.000151
-INFO:local_logger:----- Epoch[062/800], Train Loss: 0.7013, time: 882.39
-INFO:local_logger:Now training epoch 63. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-62-Loss-0.7001281701651857.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-62-Loss-0.7001281701651857.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-62-Loss-0.7001281701651857.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-62-Loss-0.7001281701651857.pdopt
-INFO:local_logger:Now training epoch 63. LR=0.000151
-INFO:master_logger:Now training epoch 63. LR=0.000151
-INFO:local_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.7051
-INFO:local_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.7172
-INFO:local_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.6949
-INFO:master_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.7023
-INFO:local_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.7033
-INFO:local_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[063/800], Step[0000/0626], Avg Loss: 0.6763
-INFO:local_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.7004
-INFO:local_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.7001
-INFO:master_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.7002
-INFO:local_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[063/800], Step[0100/0626], Avg Loss: 0.7001
-INFO:local_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.7004
-INFO:local_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.6985
-INFO:local_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.7006
-INFO:local_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.6999
-INFO:master_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.6998
-INFO:local_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[063/800], Step[0200/0626], Avg Loss: 0.7001
-INFO:local_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.6999
-INFO:local_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.7002
-INFO:local_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.7013
-INFO:master_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.7001
-INFO:local_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.7002
-INFO:local_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.6989
-INFO:local_logger:Epoch[063/800], Step[0300/0626], Avg Loss: 0.6998
-INFO:local_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.6996
-INFO:local_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.6991
-INFO:local_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.7004
-INFO:local_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.7011
-INFO:local_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.7001
-INFO:master_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.6999
-INFO:local_logger:Epoch[063/800], Step[0400/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.6998
-INFO:local_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.7001
-INFO:local_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.7001
-INFO:local_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.7008
-INFO:master_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[063/800], Step[0500/0626], Avg Loss: 0.6994
-INFO:local_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.6993
-INFO:local_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.6993
-INFO:local_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.6998
-INFO:local_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.7005
-INFO:local_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.6997
-INFO:master_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[063/800], Step[0600/0626], Avg Loss: 0.6998
-INFO:local_logger:----- Epoch[063/800], Train Loss: 0.6997, time: 870.66
-INFO:master_logger:----- Epoch[063/800], Train Loss: 0.6997, time: 870.66
-INFO:local_logger:----- Epoch[063/800], Train Loss: 0.6998, time: 874.64
-INFO:local_logger:Now training epoch 64. LR=0.000151
-INFO:local_logger:----- Epoch[063/800], Train Loss: 0.6993, time: 874.57
-INFO:local_logger:Now training epoch 64. LR=0.000151
-INFO:local_logger:----- Epoch[063/800], Train Loss: 0.6998, time: 874.56
-INFO:local_logger:Now training epoch 64. LR=0.000151
-INFO:local_logger:----- Epoch[063/800], Train Loss: 0.6997, time: 874.53
-INFO:local_logger:Now training epoch 64. LR=0.000151
-INFO:local_logger:----- Epoch[063/800], Train Loss: 0.6993, time: 874.53
-INFO:local_logger:Now training epoch 64. LR=0.000151
-INFO:local_logger:----- Epoch[063/800], Train Loss: 0.6998, time: 874.79
-INFO:local_logger:Now training epoch 64. LR=0.000151
-INFO:local_logger:----- Epoch[063/800], Train Loss: 0.7004, time: 874.66
-INFO:local_logger:Now training epoch 64. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-63-Loss-0.6997380468063858.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-63-Loss-0.6997380468063858.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-63-Loss-0.6997380468063858.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-63-Loss-0.6997380468063858.pdopt
-INFO:local_logger:Now training epoch 64. LR=0.000151
-INFO:master_logger:Now training epoch 64. LR=0.000151
-INFO:local_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.7015
-INFO:local_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.6893
-INFO:master_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.7009
-INFO:local_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.7021
-INFO:local_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.7124
-INFO:local_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.7052
-INFO:local_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.7017
-INFO:local_logger:Epoch[064/800], Step[0000/0626], Avg Loss: 0.6999
-INFO:local_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.7004
-INFO:local_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.6991
-INFO:local_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.6999
-INFO:local_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.6988
-INFO:local_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.6993
-INFO:master_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.6986
-INFO:local_logger:Epoch[064/800], Step[0100/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.6994
-INFO:local_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.6987
-INFO:local_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.7007
-INFO:local_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.6998
-INFO:master_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.6999
-INFO:local_logger:Epoch[064/800], Step[0200/0626], Avg Loss: 0.7003
-INFO:local_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6992
-INFO:local_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6998
-INFO:local_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6999
-INFO:local_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6986
-INFO:master_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6994
-INFO:local_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6991
-INFO:local_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6993
-INFO:local_logger:Epoch[064/800], Step[0300/0626], Avg Loss: 0.6998
-INFO:local_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6996
-INFO:local_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6997
-INFO:local_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6996
-INFO:local_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6992
-INFO:master_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6993
-INFO:local_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6992
-INFO:local_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6992
-INFO:local_logger:Epoch[064/800], Step[0400/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6989
-INFO:local_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6983
-INFO:local_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6994
-INFO:local_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6992
-INFO:local_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6996
-INFO:local_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6991
-INFO:local_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6988
-INFO:master_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6991
-INFO:local_logger:Epoch[064/800], Step[0500/0626], Avg Loss: 0.6994
-INFO:local_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6991
-INFO:local_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6992
-INFO:local_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6987
-INFO:local_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6993
-INFO:local_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6989
-INFO:master_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6990
-INFO:local_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6988
-INFO:local_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6993
-INFO:local_logger:Epoch[064/800], Step[0600/0626], Avg Loss: 0.6984
-INFO:local_logger:----- Epoch[064/800], Train Loss: 0.6992, time: 883.12
-INFO:local_logger:Now training epoch 65. LR=0.000151
-INFO:local_logger:----- Epoch[064/800], Train Loss: 0.6983, time: 883.49
-INFO:local_logger:Now training epoch 65. LR=0.000151
-INFO:local_logger:----- Epoch[064/800], Train Loss: 0.6991, time: 883.58
-INFO:local_logger:Now training epoch 65. LR=0.000151
-INFO:local_logger:----- Epoch[064/800], Train Loss: 0.6987, time: 883.60
-INFO:local_logger:Now training epoch 65. LR=0.000151
-INFO:local_logger:----- Epoch[064/800], Train Loss: 0.6992, time: 880.22
-INFO:master_logger:----- Epoch[064/800], Train Loss: 0.6990, time: 880.22
-INFO:local_logger:----- Epoch[064/800], Train Loss: 0.6988, time: 883.73
-INFO:local_logger:Now training epoch 65. LR=0.000151
-INFO:local_logger:----- Epoch[064/800], Train Loss: 0.6989, time: 883.71
-INFO:local_logger:Now training epoch 65. LR=0.000151
-INFO:local_logger:----- Epoch[064/800], Train Loss: 0.6993, time: 883.76
-INFO:local_logger:Now training epoch 65. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-64-Loss-0.6992388257000982.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-64-Loss-0.6992388257000982.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-64-Loss-0.6992388257000982.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-64-Loss-0.6992388257000982.pdopt
-INFO:local_logger:Now training epoch 65. LR=0.000151
-INFO:master_logger:Now training epoch 65. LR=0.000151
-INFO:local_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.7020
-INFO:local_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.7021
-INFO:local_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.7097
-INFO:master_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.7040
-INFO:local_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.6936
-INFO:local_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.6956
-INFO:local_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[065/800], Step[0000/0626], Avg Loss: 0.6983
-INFO:local_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6990
-INFO:local_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6990
-INFO:local_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6976
-INFO:master_logger:Epoch[065/800], Step[0100/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6989
-INFO:local_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6979
-INFO:local_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6986
-INFO:local_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6989
-INFO:local_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6983
-INFO:master_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6985
-INFO:local_logger:Epoch[065/800], Step[0200/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6983
-INFO:local_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6990
-INFO:local_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6986
-INFO:local_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6982
-INFO:master_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[065/800], Step[0300/0626], Avg Loss: 0.6979
-INFO:local_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6985
-INFO:local_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6990
-INFO:master_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6982
-INFO:local_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6978
-INFO:local_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0400/0626], Avg Loss: 0.6982
-INFO:local_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6983
-INFO:local_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6990
-INFO:local_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6976
-INFO:local_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6980
-INFO:master_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0500/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6982
-INFO:local_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6982
-INFO:local_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6988
-INFO:master_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6978
-INFO:local_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6975
-INFO:local_logger:Epoch[065/800], Step[0600/0626], Avg Loss: 0.6981
-INFO:local_logger:----- Epoch[065/800], Train Loss: 0.6981, time: 871.05
-INFO:local_logger:Now training epoch 66. LR=0.000151
-INFO:local_logger:----- Epoch[065/800], Train Loss: 0.6988, time: 867.64
-INFO:master_logger:----- Epoch[065/800], Train Loss: 0.6980, time: 867.64
-INFO:local_logger:----- Epoch[065/800], Train Loss: 0.6977, time: 871.33
-INFO:local_logger:Now training epoch 66. LR=0.000151
-INFO:local_logger:----- Epoch[065/800], Train Loss: 0.6980, time: 872.01
-INFO:local_logger:Now training epoch 66. LR=0.000151
-INFO:local_logger:----- Epoch[065/800], Train Loss: 0.6975, time: 871.43
-INFO:local_logger:Now training epoch 66. LR=0.000151
-INFO:local_logger:----- Epoch[065/800], Train Loss: 0.6982, time: 871.92
-INFO:local_logger:Now training epoch 66. LR=0.000151
-INFO:local_logger:----- Epoch[065/800], Train Loss: 0.6981, time: 871.81
-INFO:local_logger:Now training epoch 66. LR=0.000151
-INFO:local_logger:----- Epoch[065/800], Train Loss: 0.6978, time: 871.98
-INFO:local_logger:Now training epoch 66. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-65-Loss-0.6988199688404415.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-65-Loss-0.6988199688404415.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-65-Loss-0.6988199688404415.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-65-Loss-0.6988199688404415.pdopt
-INFO:local_logger:Now training epoch 66. LR=0.000151
-INFO:master_logger:Now training epoch 66. LR=0.000151
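Each epoch ends with a paired checkpoint whose filename embeds the epoch index and the master train loss: a `.pdparams` file (model weights) and a `.pdopt` file (optimizer state). A minimal sketch of inspecting such a pair with PaddlePaddle's `paddle.load`; the path is copied from the log above, and restoring onto a model/optimizer is only indicated in comments because the pretraining builder is not part of this log:

```python
import paddle

# Checkpoint pair written at the end of epoch 65 (path taken from the log above).
ckpt = "./output/train-20211219-17-07-40/MAE-Epoch-65-Loss-0.6988199688404415"

model_state = paddle.load(ckpt + ".pdparams")  # dict: parameter name -> tensor
opt_state = paddle.load(ckpt + ".pdopt")       # optimizer state (moments, LR schedule, ...)

print(len(model_state), "parameter tensors in checkpoint")

# To resume, a model and optimizer built with the same pretraining config would call
#   model.set_state_dict(model_state)
#   optimizer.set_state_dict(opt_state)
# (construction of that model is not reproduced here).
```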
-INFO:local_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.6946
-INFO:local_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.6997
-INFO:master_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.6995
-INFO:local_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.6986
-INFO:local_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.7179
-INFO:local_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.7034
-INFO:local_logger:Epoch[066/800], Step[0000/0626], Avg Loss: 0.6970
-INFO:local_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6978
-INFO:local_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6987
-INFO:master_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6973
-INFO:local_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6973
-INFO:local_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6979
-INFO:local_logger:Epoch[066/800], Step[0100/0626], Avg Loss: 0.6965
-INFO:local_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6971
-INFO:local_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6974
-INFO:local_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6976
-INFO:local_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6988
-INFO:local_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6973
-INFO:local_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6983
-INFO:master_logger:Epoch[066/800], Step[0200/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6978
-INFO:local_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6973
-INFO:local_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6976
-INFO:local_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6971
-INFO:local_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6975
-INFO:local_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6984
-INFO:master_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[066/800], Step[0300/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6973
-INFO:local_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6975
-INFO:local_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6974
-INFO:local_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6970
-INFO:master_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6976
-INFO:local_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[066/800], Step[0400/0626], Avg Loss: 0.6979
-INFO:local_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6978
-INFO:local_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6971
-INFO:local_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6970
-INFO:master_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6974
-INFO:local_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6975
-INFO:local_logger:Epoch[066/800], Step[0500/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6971
-INFO:local_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6976
-INFO:local_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6970
-INFO:local_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6976
-INFO:local_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6970
-INFO:local_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6975
-INFO:master_logger:Epoch[066/800], Step[0600/0626], Avg Loss: 0.6973
-INFO:local_logger:----- Epoch[066/800], Train Loss: 0.6970, time: 874.18
-INFO:master_logger:----- Epoch[066/800], Train Loss: 0.6973, time: 874.18
-INFO:local_logger:----- Epoch[066/800], Train Loss: 0.6976, time: 878.36
-INFO:local_logger:Now training epoch 67. LR=0.000151
-INFO:local_logger:----- Epoch[066/800], Train Loss: 0.6969, time: 877.55
-INFO:local_logger:Now training epoch 67. LR=0.000151
-INFO:local_logger:----- Epoch[066/800], Train Loss: 0.6971, time: 878.04
-INFO:local_logger:Now training epoch 67. LR=0.000151
-INFO:local_logger:----- Epoch[066/800], Train Loss: 0.6977, time: 877.55
-INFO:local_logger:Now training epoch 67. LR=0.000151
-INFO:local_logger:----- Epoch[066/800], Train Loss: 0.6970, time: 878.01
-INFO:local_logger:Now training epoch 67. LR=0.000151
-INFO:local_logger:----- Epoch[066/800], Train Loss: 0.6975, time: 878.03
-INFO:local_logger:Now training epoch 67. LR=0.000151
-INFO:local_logger:----- Epoch[066/800], Train Loss: 0.6975, time: 878.01
-INFO:local_logger:Now training epoch 67. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-66-Loss-0.6970274622691075.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-66-Loss-0.6970274622691075.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-66-Loss-0.6970274622691075.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-66-Loss-0.6970274622691075.pdopt
-INFO:local_logger:Now training epoch 67. LR=0.000151
-INFO:master_logger:Now training epoch 67. LR=0.000151
-INFO:local_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.6978
-INFO:master_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.6925
-INFO:local_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.6998
-INFO:local_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.7094
-INFO:local_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.6943
-INFO:local_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.6945
-INFO:local_logger:Epoch[067/800], Step[0000/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6981
-INFO:local_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6969
-INFO:local_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6956
-INFO:local_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6962
-INFO:master_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[067/800], Step[0100/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6973
-INFO:local_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6974
-INFO:local_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6971
-INFO:master_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6974
-INFO:local_logger:Epoch[067/800], Step[0200/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6969
-INFO:local_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6967
-INFO:local_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6973
-INFO:local_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6964
-INFO:master_logger:Epoch[067/800], Step[0300/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6964
-INFO:local_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6971
-INFO:local_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6967
-INFO:local_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6966
-INFO:master_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[067/800], Step[0400/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6959
-INFO:local_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6965
-INFO:local_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6964
-INFO:local_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6968
-INFO:master_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[067/800], Step[0500/0626], Avg Loss: 0.6971
-INFO:local_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6965
-INFO:local_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6970
-INFO:local_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6965
-INFO:local_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6965
-INFO:local_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6970
-INFO:master_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6965
-INFO:local_logger:Epoch[067/800], Step[0600/0626], Avg Loss: 0.6961
-INFO:local_logger:----- Epoch[067/800], Train Loss: 0.6970, time: 875.35
-INFO:local_logger:Now training epoch 68. LR=0.000151
-INFO:local_logger:----- Epoch[067/800], Train Loss: 0.6968, time: 872.12
-INFO:master_logger:----- Epoch[067/800], Train Loss: 0.6965, time: 872.12
-INFO:local_logger:----- Epoch[067/800], Train Loss: 0.6965, time: 876.01
-INFO:local_logger:Now training epoch 68. LR=0.000151
-INFO:local_logger:----- Epoch[067/800], Train Loss: 0.6965, time: 875.96
-INFO:local_logger:Now training epoch 68. LR=0.000151
-INFO:local_logger:----- Epoch[067/800], Train Loss: 0.6970, time: 875.59
-INFO:local_logger:Now training epoch 68. LR=0.000151
-INFO:local_logger:----- Epoch[067/800], Train Loss: 0.6961, time: 876.05
-INFO:local_logger:Now training epoch 68. LR=0.000151
-INFO:local_logger:----- Epoch[067/800], Train Loss: 0.6965, time: 875.92
-INFO:local_logger:Now training epoch 68. LR=0.000151
-INFO:local_logger:----- Epoch[067/800], Train Loss: 0.6960, time: 876.13
-INFO:local_logger:Now training epoch 68. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-67-Loss-0.696807305239932.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-67-Loss-0.696807305239932.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-67-Loss-0.696807305239932.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-67-Loss-0.696807305239932.pdopt
-INFO:local_logger:Now training epoch 68. LR=0.000151
-INFO:master_logger:Now training epoch 68. LR=0.000151
-INFO:local_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.6982
-INFO:local_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.7004
-INFO:master_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.6972
-INFO:local_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.7004
-INFO:local_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.7004
-INFO:local_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.7036
-INFO:local_logger:Epoch[068/800], Step[0000/0626], Avg Loss: 0.6894
-INFO:local_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6969
-INFO:master_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6964
-INFO:local_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6970
-INFO:local_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[068/800], Step[0100/0626], Avg Loss: 0.6976
-INFO:local_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6962
-INFO:local_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6954
-INFO:local_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6952
-INFO:local_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6969
-INFO:local_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6954
-INFO:master_logger:Epoch[068/800], Step[0200/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6964
-INFO:local_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6963
-INFO:local_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6970
-INFO:local_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6958
-INFO:master_logger:Epoch[068/800], Step[0300/0626], Avg Loss: 0.6959
-INFO:local_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6963
-INFO:local_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6962
-INFO:local_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6956
-INFO:local_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6965
-INFO:local_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6959
-INFO:master_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6959
-INFO:local_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[068/800], Step[0400/0626], Avg Loss: 0.6955
-INFO:local_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6962
-INFO:local_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6954
-INFO:local_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6946
-INFO:local_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6965
-INFO:local_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6959
-INFO:local_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6957
-INFO:master_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[068/800], Step[0500/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6946
-INFO:local_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6957
-INFO:local_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6957
-INFO:local_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6962
-INFO:local_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6960
-INFO:master_logger:Epoch[068/800], Step[0600/0626], Avg Loss: 0.6957
-INFO:local_logger:----- Epoch[068/800], Train Loss: 0.6946, time: 870.06
-INFO:local_logger:Now training epoch 69. LR=0.000151
-INFO:local_logger:----- Epoch[068/800], Train Loss: 0.6959, time: 870.67
-INFO:local_logger:----- Epoch[068/800], Train Loss: 0.6958, time: 871.34
-INFO:local_logger:Now training epoch 69. LR=0.000151
-INFO:local_logger:Now training epoch 69. LR=0.000151
-INFO:local_logger:----- Epoch[068/800], Train Loss: 0.6961, time: 870.65
-INFO:local_logger:Now training epoch 69. LR=0.000151
-INFO:local_logger:----- Epoch[068/800], Train Loss: 0.6958, time: 870.67
-INFO:local_logger:Now training epoch 69. LR=0.000151
-INFO:local_logger:----- Epoch[068/800], Train Loss: 0.6961, time: 870.74
-INFO:local_logger:Now training epoch 69. LR=0.000151
-INFO:local_logger:----- Epoch[068/800], Train Loss: 0.6960, time: 870.73
-INFO:local_logger:Now training epoch 69. LR=0.000151
-INFO:local_logger:----- Epoch[068/800], Train Loss: 0.6957, time: 867.42
-INFO:master_logger:----- Epoch[068/800], Train Loss: 0.6957, time: 867.42
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-68-Loss-0.6956601794348751.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-68-Loss-0.6956601794348751.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-68-Loss-0.6956601794348751.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-68-Loss-0.6956601794348751.pdopt
-INFO:local_logger:Now training epoch 69. LR=0.000151
-INFO:master_logger:Now training epoch 69. LR=0.000151
-INFO:local_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.6951
-INFO:local_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.6998
-INFO:local_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.6932
-INFO:master_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.7177
-INFO:local_logger:Epoch[069/800], Step[0000/0626], Avg Loss: 0.6914
-INFO:local_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6957
-INFO:local_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6964
-INFO:local_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6955
-INFO:local_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6955
-INFO:local_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6945
-INFO:local_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6959
-INFO:master_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6956
-INFO:local_logger:Epoch[069/800], Step[0100/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6954
-INFO:local_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6949
-INFO:master_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[069/800], Step[0200/0626], Avg Loss: 0.6945
-INFO:local_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6953
-INFO:local_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6944
-INFO:master_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6951
-INFO:local_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[069/800], Step[0300/0626], Avg Loss: 0.6946
-INFO:local_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6952
-INFO:local_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6953
-INFO:master_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6952
-INFO:local_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[069/800], Step[0400/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6956
-INFO:local_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6954
-INFO:local_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6955
-INFO:local_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6950
-INFO:master_logger:Epoch[069/800], Step[0500/0626], Avg Loss: 0.6951
-INFO:local_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6946
-INFO:local_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6957
-INFO:local_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6947
-INFO:master_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6953
-INFO:local_logger:Epoch[069/800], Step[0600/0626], Avg Loss: 0.6953
-INFO:local_logger:----- Epoch[069/800], Train Loss: 0.6957, time: 877.32
-INFO:local_logger:Now training epoch 70. LR=0.000151
-INFO:local_logger:----- Epoch[069/800], Train Loss: 0.6947, time: 876.82
-INFO:local_logger:Now training epoch 70. LR=0.000151
-INFO:local_logger:----- Epoch[069/800], Train Loss: 0.6954, time: 877.40
-INFO:local_logger:Now training epoch 70. LR=0.000151
-INFO:local_logger:----- Epoch[069/800], Train Loss: 0.6947, time: 877.36
-INFO:local_logger:Now training epoch 70. LR=0.000151
-INFO:local_logger:----- Epoch[069/800], Train Loss: 0.6953, time: 877.48
-INFO:local_logger:Now training epoch 70. LR=0.000151
-INFO:local_logger:----- Epoch[069/800], Train Loss: 0.6948, time: 877.49
-INFO:local_logger:Now training epoch 70. LR=0.000151
-INFO:local_logger:----- Epoch[069/800], Train Loss: 0.6945, time: 877.44
-INFO:local_logger:Now training epoch 70. LR=0.000151
-INFO:local_logger:----- Epoch[069/800], Train Loss: 0.6950, time: 873.74
-INFO:master_logger:----- Epoch[069/800], Train Loss: 0.6950, time: 873.74
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-69-Loss-0.6949500013049544.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-69-Loss-0.6949500013049544.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-69-Loss-0.6949500013049544.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-69-Loss-0.6949500013049544.pdopt
-INFO:local_logger:Now training epoch 70. LR=0.000151
-INFO:master_logger:Now training epoch 70. LR=0.000151
-INFO:local_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.6925
-INFO:local_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.6983
-INFO:master_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.6937
-INFO:local_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.7030
-INFO:local_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.7050
-INFO:local_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.6782
-INFO:local_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.6987
-INFO:local_logger:Epoch[070/800], Step[0000/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6955
-INFO:local_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6946
-INFO:master_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6943
-INFO:local_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6940
-INFO:local_logger:Epoch[070/800], Step[0100/0626], Avg Loss: 0.6951
-INFO:local_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6956
-INFO:local_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6952
-INFO:local_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6944
-INFO:master_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6951
-INFO:local_logger:Epoch[070/800], Step[0200/0626], Avg Loss: 0.6958
-INFO:local_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6945
-INFO:local_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6952
-INFO:local_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6946
-INFO:master_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6959
-INFO:local_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6953
-INFO:local_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[070/800], Step[0300/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6945
-INFO:local_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6946
-INFO:local_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6954
-INFO:local_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6943
-INFO:local_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6951
-INFO:local_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6942
-INFO:local_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6949
-INFO:master_logger:Epoch[070/800], Step[0400/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6942
-INFO:local_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6954
-INFO:local_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6941
-INFO:master_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6946
-INFO:local_logger:Epoch[070/800], Step[0500/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6942
-INFO:local_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6941
-INFO:master_logger:Epoch[070/800], Step[0600/0626], Avg Loss: 0.6945
-INFO:local_logger:----- Epoch[070/800], Train Loss: 0.6942, time: 864.91
-INFO:local_logger:Now training epoch 71. LR=0.000151
-INFO:local_logger:----- Epoch[070/800], Train Loss: 0.6942, time: 865.61
-INFO:local_logger:Now training epoch 71. LR=0.000151
-INFO:local_logger:----- Epoch[070/800], Train Loss: 0.6941, time: 865.52
-INFO:local_logger:Now training epoch 71. LR=0.000151
-INFO:local_logger:----- Epoch[070/800], Train Loss: 0.6949, time: 864.93
-INFO:local_logger:Now training epoch 71. LR=0.000151
-INFO:local_logger:----- Epoch[070/800], Train Loss: 0.6946, time: 861.12
-INFO:master_logger:----- Epoch[070/800], Train Loss: 0.6945, time: 861.12
-INFO:local_logger:----- Epoch[070/800], Train Loss: 0.6945, time: 864.92
-INFO:local_logger:Now training epoch 71. LR=0.000151
-INFO:local_logger:----- Epoch[070/800], Train Loss: 0.6947, time: 864.88
-INFO:local_logger:Now training epoch 71. LR=0.000151
-INFO:local_logger:----- Epoch[070/800], Train Loss: 0.6948, time: 864.89
-INFO:local_logger:Now training epoch 71. LR=0.000151
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-70-Loss-0.6946037372341894.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-70-Loss-0.6946037372341894.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-70-Loss-0.6946037372341894.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-70-Loss-0.6946037372341894.pdopt
-INFO:local_logger:Now training epoch 71. LR=0.000151
-INFO:master_logger:Now training epoch 71. LR=0.000151
-INFO:local_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.6850
-INFO:master_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.6757
-INFO:local_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.7043
-INFO:local_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.7002
-INFO:local_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.6977
-INFO:local_logger:Epoch[071/800], Step[0000/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6936
-INFO:local_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6950
-INFO:local_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6932
-INFO:local_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6924
-INFO:local_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6937
-INFO:master_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[071/800], Step[0100/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6943
-INFO:local_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6942
-INFO:local_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6935
-INFO:local_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6942
-INFO:local_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6931
-INFO:local_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6936
-INFO:master_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[071/800], Step[0200/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6945
-INFO:local_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6937
-INFO:local_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6944
-INFO:master_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6937
-INFO:local_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[071/800], Step[0300/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6946
-INFO:local_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6931
-INFO:local_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6940
-INFO:local_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6931
-INFO:local_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6940
-INFO:master_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[071/800], Step[0400/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6945
-INFO:local_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6937
-INFO:local_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6944
-INFO:master_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[071/800], Step[0500/0626], Avg Loss: 0.6932
-INFO:local_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6937
-INFO:master_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[071/800], Step[0600/0626], Avg Loss: 0.6943
-INFO:local_logger:----- Epoch[071/800], Train Loss: 0.6943, time: 885.74
-INFO:local_logger:Now training epoch 72. LR=0.000152
-INFO:local_logger:----- Epoch[071/800], Train Loss: 0.6939, time: 886.70
-INFO:local_logger:Now training epoch 72. LR=0.000152
-INFO:local_logger:----- Epoch[071/800], Train Loss: 0.6940, time: 886.74
-INFO:local_logger:----- Epoch[071/800], Train Loss: 0.6935, time: 886.79
-INFO:local_logger:Now training epoch 72. LR=0.000152
-INFO:local_logger:Now training epoch 72. LR=0.000152
-INFO:local_logger:----- Epoch[071/800], Train Loss: 0.6938, time: 886.70
-INFO:local_logger:Now training epoch 72. LR=0.000152
-INFO:local_logger:----- Epoch[071/800], Train Loss: 0.6934, time: 886.76
-INFO:local_logger:Now training epoch 72. LR=0.000152
-INFO:local_logger:----- Epoch[071/800], Train Loss: 0.6937, time: 882.95
-INFO:master_logger:----- Epoch[071/800], Train Loss: 0.6939, time: 882.95
-INFO:local_logger:----- Epoch[071/800], Train Loss: 0.6945, time: 886.78
-INFO:local_logger:Now training epoch 72. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-71-Loss-0.6937192819392223.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-71-Loss-0.6937192819392223.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-71-Loss-0.6937192819392223.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-71-Loss-0.6937192819392223.pdopt
-INFO:local_logger:Now training epoch 72. LR=0.000152
-INFO:master_logger:Now training epoch 72. LR=0.000152
-INFO:local_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.6943
-INFO:master_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.6974
-INFO:local_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.7013
-INFO:local_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.6924
-INFO:local_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.7045
-INFO:local_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.6954
-INFO:local_logger:Epoch[072/800], Step[0000/0626], Avg Loss: 0.7056
-INFO:local_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6924
-INFO:local_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6944
-INFO:master_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[072/800], Step[0100/0626], Avg Loss: 0.6936
-INFO:local_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6936
-INFO:local_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6937
-INFO:master_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6935
-INFO:local_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[072/800], Step[0200/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6932
-INFO:local_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6936
-INFO:local_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6935
-INFO:master_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6942
-INFO:local_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6931
-INFO:local_logger:Epoch[072/800], Step[0300/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6931
-INFO:local_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6936
-INFO:local_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6930
-INFO:master_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[072/800], Step[0400/0626], Avg Loss: 0.6926
-INFO:local_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6937
-INFO:master_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[072/800], Step[0500/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6936
-INFO:local_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6937
-INFO:master_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[072/800], Step[0600/0626], Avg Loss: 0.6937
-INFO:local_logger:----- Epoch[072/800], Train Loss: 0.6928, time: 870.67
-INFO:local_logger:Now training epoch 73. LR=0.000152
-INFO:local_logger:----- Epoch[072/800], Train Loss: 0.6930, time: 869.72
-INFO:local_logger:Now training epoch 73. LR=0.000152
-INFO:local_logger:----- Epoch[072/800], Train Loss: 0.6936, time: 869.72
-INFO:local_logger:Now training epoch 73. LR=0.000152
-INFO:local_logger:----- Epoch[072/800], Train Loss: 0.6930, time: 869.77
-INFO:local_logger:Now training epoch 73. LR=0.000152
-INFO:local_logger:----- Epoch[072/800], Train Loss: 0.6934, time: 870.14
-INFO:local_logger:Now training epoch 73. LR=0.000152
-INFO:local_logger:----- Epoch[072/800], Train Loss: 0.6936, time: 870.14
-INFO:local_logger:Now training epoch 73. LR=0.000152
-INFO:local_logger:----- Epoch[072/800], Train Loss: 0.6936, time: 866.43
-INFO:local_logger:----- Epoch[072/800], Train Loss: 0.6929, time: 870.14
-INFO:master_logger:----- Epoch[072/800], Train Loss: 0.6932, time: 866.43
-INFO:local_logger:Now training epoch 73. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-72-Loss-0.693554089917601.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-72-Loss-0.693554089917601.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-72-Loss-0.693554089917601.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-72-Loss-0.693554089917601.pdopt
-INFO:local_logger:Now training epoch 73. LR=0.000152
-INFO:master_logger:Now training epoch 73. LR=0.000152
-INFO:local_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.6943
-INFO:master_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.6940
-INFO:local_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.6941
-INFO:local_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.6980
-INFO:local_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.6944
-INFO:local_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[073/800], Step[0000/0626], Avg Loss: 0.7038
-INFO:local_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6935
-INFO:local_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6940
-INFO:master_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[073/800], Step[0100/0626], Avg Loss: 0.6922
-INFO:local_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6924
-INFO:local_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6925
-INFO:master_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6925
-INFO:local_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[073/800], Step[0200/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6932
-INFO:local_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6922
-INFO:local_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6937
-INFO:master_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[073/800], Step[0300/0626], Avg Loss: 0.6926
-INFO:local_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6923
-INFO:local_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6917
-INFO:local_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6933
-INFO:master_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[073/800], Step[0400/0626], Avg Loss: 0.6926
-INFO:local_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6920
-INFO:local_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6923
-INFO:local_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6931
-INFO:local_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6929
-INFO:master_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6926
-INFO:local_logger:Epoch[073/800], Step[0500/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6928
-INFO:master_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6925
-INFO:local_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[073/800], Step[0600/0626], Avg Loss: 0.6929
-INFO:local_logger:----- Epoch[073/800], Train Loss: 0.6926, time: 884.48
-INFO:local_logger:Now training epoch 74. LR=0.000152
-INFO:local_logger:----- Epoch[073/800], Train Loss: 0.6921, time: 884.48
-INFO:local_logger:Now training epoch 74. LR=0.000152
-INFO:local_logger:----- Epoch[073/800], Train Loss: 0.6928, time: 884.08
-INFO:local_logger:Now training epoch 74. LR=0.000152
-INFO:local_logger:----- Epoch[073/800], Train Loss: 0.6926, time: 880.49
-INFO:master_logger:----- Epoch[073/800], Train Loss: 0.6925, time: 880.49
-INFO:local_logger:----- Epoch[073/800], Train Loss: 0.6917, time: 884.29
-INFO:local_logger:Now training epoch 74. LR=0.000152
-INFO:local_logger:----- Epoch[073/800], Train Loss: 0.6920, time: 884.32
-INFO:local_logger:Now training epoch 74. LR=0.000152
-INFO:local_logger:----- Epoch[073/800], Train Loss: 0.6929, time: 884.79
-INFO:local_logger:Now training epoch 74. LR=0.000152
-INFO:local_logger:----- Epoch[073/800], Train Loss: 0.6930, time: 884.73
-INFO:local_logger:Now training epoch 74. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-73-Loss-0.6925669066549239.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-73-Loss-0.6925669066549239.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-73-Loss-0.6925669066549239.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-73-Loss-0.6925669066549239.pdopt
-INFO:local_logger:Now training epoch 74. LR=0.000152
-INFO:master_logger:Now training epoch 74. LR=0.000152
-INFO:local_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.6904
-INFO:local_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.6894
-INFO:master_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.7013
-INFO:local_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.6899
-INFO:local_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.6982
-INFO:local_logger:Epoch[074/800], Step[0000/0626], Avg Loss: 0.7004
-INFO:local_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6923
-INFO:local_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6927
-INFO:master_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6920
-INFO:local_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6914
-INFO:local_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[074/800], Step[0100/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6924
-INFO:local_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6925
-INFO:local_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6917
-INFO:local_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6916
-INFO:master_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[074/800], Step[0200/0626], Avg Loss: 0.6927
-INFO:local_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6920
-INFO:local_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6929
-INFO:master_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6920
-INFO:local_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6923
-INFO:local_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6924
-INFO:local_logger:Epoch[074/800], Step[0300/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6925
-INFO:local_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6926
-INFO:local_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6920
-INFO:master_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6920
-INFO:local_logger:Epoch[074/800], Step[0400/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6922
-INFO:local_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6923
-INFO:local_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6922
-INFO:local_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6920
-INFO:master_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6920
-INFO:local_logger:Epoch[074/800], Step[0500/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6923
-INFO:local_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6917
-INFO:local_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6917
-INFO:local_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6922
-INFO:master_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6920
-INFO:local_logger:Epoch[074/800], Step[0600/0626], Avg Loss: 0.6922
-INFO:local_logger:----- Epoch[074/800], Train Loss: 0.6922, time: 868.83
-INFO:local_logger:Now training epoch 75. LR=0.000152
-INFO:local_logger:----- Epoch[074/800], Train Loss: 0.6922, time: 869.13
-INFO:local_logger:Now training epoch 75. LR=0.000152
-INFO:local_logger:----- Epoch[074/800], Train Loss: 0.6916, time: 868.88
-INFO:local_logger:Now training epoch 75. LR=0.000152
-INFO:local_logger:----- Epoch[074/800], Train Loss: 0.6920, time: 868.87
-INFO:local_logger:Now training epoch 75. LR=0.000152
-INFO:local_logger:----- Epoch[074/800], Train Loss: 0.6917, time: 869.16
-INFO:local_logger:Now training epoch 75. LR=0.000152
-INFO:local_logger:----- Epoch[074/800], Train Loss: 0.6917, time: 868.97
-INFO:local_logger:Now training epoch 75. LR=0.000152
-INFO:local_logger:----- Epoch[074/800], Train Loss: 0.6918, time: 865.36
-INFO:master_logger:----- Epoch[074/800], Train Loss: 0.6919, time: 865.36
-INFO:local_logger:----- Epoch[074/800], Train Loss: 0.6922, time: 869.28
-INFO:local_logger:Now training epoch 75. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-74-Loss-0.6918195025700205.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-74-Loss-0.6918195025700205.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-74-Loss-0.6918195025700205.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-74-Loss-0.6918195025700205.pdopt
-INFO:local_logger:Now training epoch 75. LR=0.000152
-INFO:master_logger:Now training epoch 75. LR=0.000152
-INFO:local_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6855
-INFO:master_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6909
-INFO:local_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6973
-INFO:local_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6943
-INFO:local_logger:Epoch[075/800], Step[0000/0626], Avg Loss: 0.6994
-INFO:local_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6910
-INFO:local_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6919
-INFO:local_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6928
-INFO:master_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6914
-INFO:local_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[075/800], Step[0100/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6914
-INFO:local_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6911
-INFO:master_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[075/800], Step[0200/0626], Avg Loss: 0.6904
-INFO:local_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6911
-INFO:local_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6913
-INFO:master_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[075/800], Step[0300/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6910
-INFO:local_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6917
-INFO:local_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6910
-INFO:local_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6919
-INFO:master_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6914
-INFO:local_logger:Epoch[075/800], Step[0400/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6911
-INFO:local_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6917
-INFO:local_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6911
-INFO:local_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6911
-INFO:local_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6913
-INFO:master_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[075/800], Step[0500/0626], Avg Loss: 0.6904
-INFO:local_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6916
-INFO:local_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6910
-INFO:master_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[075/800], Step[0600/0626], Avg Loss: 0.6913
-INFO:local_logger:----- Epoch[075/800], Train Loss: 0.6918, time: 889.31
-INFO:local_logger:Now training epoch 76. LR=0.000152
-INFO:local_logger:----- Epoch[075/800], Train Loss: 0.6917, time: 889.53
-INFO:local_logger:Now training epoch 76. LR=0.000152
-INFO:local_logger:----- Epoch[075/800], Train Loss: 0.6907, time: 889.63
-INFO:local_logger:Now training epoch 76. LR=0.000152
-INFO:local_logger:----- Epoch[075/800], Train Loss: 0.6910, time: 885.80
-INFO:master_logger:----- Epoch[075/800], Train Loss: 0.6913, time: 885.80
-INFO:local_logger:----- Epoch[075/800], Train Loss: 0.6913, time: 889.63
-INFO:local_logger:Now training epoch 76. LR=0.000152
-INFO:local_logger:----- Epoch[075/800], Train Loss: 0.6912, time: 890.10
-INFO:local_logger:Now training epoch 76. LR=0.000152
-INFO:local_logger:----- Epoch[075/800], Train Loss: 0.6916, time: 890.11
-INFO:local_logger:Now training epoch 76. LR=0.000152
-INFO:local_logger:----- Epoch[075/800], Train Loss: 0.6911, time: 890.09
-INFO:local_logger:Now training epoch 76. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-75-Loss-0.6910110246189355.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-75-Loss-0.6910110246189355.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-75-Loss-0.6910110246189355.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-75-Loss-0.6910110246189355.pdopt
-INFO:local_logger:Now training epoch 76. LR=0.000152
-INFO:master_logger:Now training epoch 76. LR=0.000152
-INFO:local_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6949
-INFO:local_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6967
-INFO:master_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6921
-INFO:local_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6918
-INFO:local_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6833
-INFO:local_logger:Epoch[076/800], Step[0000/0626], Avg Loss: 0.6889
-INFO:local_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6912
-INFO:master_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0100/0626], Avg Loss: 0.6922
-INFO:local_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6914
-INFO:local_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6904
-INFO:local_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6911
-INFO:local_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6909
-INFO:local_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6909
-INFO:local_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6905
-INFO:master_logger:Epoch[076/800], Step[0200/0626], Avg Loss: 0.6909
-INFO:local_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6911
-INFO:local_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6903
-INFO:local_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6903
-INFO:local_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6906
-INFO:master_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[076/800], Step[0300/0626], Avg Loss: 0.6910
-INFO:local_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6901
-INFO:local_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6908
-INFO:master_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[076/800], Step[0400/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6911
-INFO:local_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6903
-INFO:master_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0500/0626], Avg Loss: 0.6910
-INFO:local_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6904
-INFO:local_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6902
-INFO:master_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6910
-INFO:local_logger:Epoch[076/800], Step[0600/0626], Avg Loss: 0.6903
-INFO:local_logger:----- Epoch[076/800], Train Loss: 0.6911, time: 859.45
-INFO:local_logger:Now training epoch 77. LR=0.000152
-INFO:local_logger:----- Epoch[076/800], Train Loss: 0.6906, time: 855.76
-INFO:master_logger:----- Epoch[076/800], Train Loss: 0.6906, time: 855.76
-INFO:local_logger:----- Epoch[076/800], Train Loss: 0.6910, time: 859.92
-INFO:local_logger:Now training epoch 77. LR=0.000152
-INFO:local_logger:----- Epoch[076/800], Train Loss: 0.6907, time: 859.58
-INFO:local_logger:Now training epoch 77. LR=0.000152
-INFO:local_logger:----- Epoch[076/800], Train Loss: 0.6906, time: 859.59
-INFO:local_logger:Now training epoch 77. LR=0.000152
-INFO:local_logger:----- Epoch[076/800], Train Loss: 0.6903, time: 860.18
-INFO:local_logger:Now training epoch 77. LR=0.000152
-INFO:local_logger:----- Epoch[076/800], Train Loss: 0.6903, time: 860.24
-INFO:local_logger:Now training epoch 77. LR=0.000152
-INFO:local_logger:----- Epoch[076/800], Train Loss: 0.6904, time: 859.60
-INFO:local_logger:Now training epoch 77. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-76-Loss-0.6905609986769955.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-76-Loss-0.6905609986769955.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-76-Loss-0.6905609986769955.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-76-Loss-0.6905609986769955.pdopt
-INFO:local_logger:Now training epoch 77. LR=0.000152
-INFO:master_logger:Now training epoch 77. LR=0.000152
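Each epoch above ends with a `.pdparams`/`.pdopt` pair being written. As a rough guide, the sketch below shows how such a pair can be written and reloaded to resume pretraining using standard PaddlePaddle 2.x APIs; the helper names and `path_prefix` argument are illustrative assumptions, not the repo's exact save/resume code.

```python
# Hedged sketch using standard PaddlePaddle 2.x APIs (paddle.save/paddle.load):
# write and restore the ".pdparams"/".pdopt" checkpoint pairs seen in the log.
import paddle


def save_checkpoint(model, optimizer, path_prefix):
    # Produces "<prefix>.pdparams" and "<prefix>.pdopt",
    # matching the file names logged above.
    paddle.save(model.state_dict(), path_prefix + ".pdparams")
    paddle.save(optimizer.state_dict(), path_prefix + ".pdopt")


def load_checkpoint(model, optimizer, path_prefix):
    # Restore both model weights and optimizer state to resume training.
    model.set_state_dict(paddle.load(path_prefix + ".pdparams"))
    optimizer.set_state_dict(paddle.load(path_prefix + ".pdopt"))
```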
-INFO:local_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.6953
-INFO:local_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.6740
-INFO:local_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.6827
-INFO:master_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.7074
-INFO:local_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.7042
-INFO:local_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.6947
-INFO:local_logger:Epoch[077/800], Step[0000/0626], Avg Loss: 0.6790
-INFO:local_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6899
-INFO:local_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6911
-INFO:local_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6909
-INFO:local_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6900
-INFO:master_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[077/800], Step[0100/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6901
-INFO:local_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6903
-INFO:local_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6907
-INFO:local_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6892
-INFO:master_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[077/800], Step[0200/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6904
-INFO:local_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6908
-INFO:master_logger:Epoch[077/800], Step[0300/0626], Avg Loss: 0.6901
-INFO:local_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6903
-INFO:local_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6902
-INFO:master_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[077/800], Step[0400/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6903
-INFO:local_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6899
-INFO:local_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6896
-INFO:master_logger:Epoch[077/800], Step[0500/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6906
-INFO:local_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6899
-INFO:local_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6898
-INFO:master_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[077/800], Step[0600/0626], Avg Loss: 0.6904
-INFO:local_logger:----- Epoch[077/800], Train Loss: 0.6899, time: 882.70
-INFO:local_logger:Now training epoch 78. LR=0.000152
-INFO:local_logger:----- Epoch[077/800], Train Loss: 0.6898, time: 884.37
-INFO:local_logger:Now training epoch 78. LR=0.000152
-INFO:local_logger:----- Epoch[077/800], Train Loss: 0.6901, time: 883.82
-INFO:local_logger:Now training epoch 78. LR=0.000152
-INFO:local_logger:----- Epoch[077/800], Train Loss: 0.6906, time: 883.80
-INFO:local_logger:----- Epoch[077/800], Train Loss: 0.6902, time: 883.80
-INFO:local_logger:Now training epoch 78. LR=0.000152
-INFO:local_logger:Now training epoch 78. LR=0.000152
-INFO:local_logger:----- Epoch[077/800], Train Loss: 0.6905, time: 883.93
-INFO:local_logger:Now training epoch 78. LR=0.000152
-INFO:local_logger:----- Epoch[077/800], Train Loss: 0.6898, time: 880.45
-INFO:master_logger:----- Epoch[077/800], Train Loss: 0.6902, time: 880.45
-INFO:local_logger:----- Epoch[077/800], Train Loss: 0.6905, time: 883.84
-INFO:local_logger:Now training epoch 78. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-77-Loss-0.6897522572932259.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-77-Loss-0.6897522572932259.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-77-Loss-0.6897522572932259.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-77-Loss-0.6897522572932259.pdopt
-INFO:local_logger:Now training epoch 78. LR=0.000152
-INFO:master_logger:Now training epoch 78. LR=0.000152
-INFO:local_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6877
-INFO:master_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6979
-INFO:local_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[078/800], Step[0000/0626], Avg Loss: 0.6743
-INFO:local_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6903
-INFO:local_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6884
-INFO:local_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6884
-INFO:local_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6902
-INFO:master_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0100/0626], Avg Loss: 0.6903
-INFO:local_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6895
-INFO:master_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6899
-INFO:local_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[078/800], Step[0200/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6896
-INFO:master_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6903
-INFO:local_logger:Epoch[078/800], Step[0300/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6901
-INFO:master_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6894
-INFO:local_logger:Epoch[078/800], Step[0400/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6894
-INFO:local_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6898
-INFO:master_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0500/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6897
-INFO:master_logger:Epoch[078/800], Step[0600/0626], Avg Loss: 0.6896
-INFO:local_logger:----- Epoch[078/800], Train Loss: 0.6895, time: 850.97
-INFO:local_logger:Now training epoch 79. LR=0.000152
-INFO:local_logger:----- Epoch[078/800], Train Loss: 0.6897, time: 849.85
-INFO:local_logger:Now training epoch 79. LR=0.000152
-INFO:local_logger:----- Epoch[078/800], Train Loss: 0.6897, time: 850.45
-INFO:local_logger:Now training epoch 79. LR=0.000152
-INFO:local_logger:----- Epoch[078/800], Train Loss: 0.6894, time: 850.45
-INFO:local_logger:Now training epoch 79. LR=0.000152
-INFO:local_logger:----- Epoch[078/800], Train Loss: 0.6895, time: 846.72
-INFO:master_logger:----- Epoch[078/800], Train Loss: 0.6896, time: 846.72
-INFO:local_logger:----- Epoch[078/800], Train Loss: 0.6897, time: 850.44
-INFO:local_logger:Now training epoch 79. LR=0.000152
-INFO:local_logger:----- Epoch[078/800], Train Loss: 0.6894, time: 850.46
-INFO:local_logger:Now training epoch 79. LR=0.000152
-INFO:local_logger:----- Epoch[078/800], Train Loss: 0.6898, time: 850.47
-INFO:local_logger:Now training epoch 79. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-78-Loss-0.6895287958086865.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-78-Loss-0.6895287958086865.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-78-Loss-0.6895287958086865.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-78-Loss-0.6895287958086865.pdopt
-INFO:local_logger:Now training epoch 79. LR=0.000152
-INFO:master_logger:Now training epoch 79. LR=0.000152
-INFO:local_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.6870
-INFO:master_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.7025
-INFO:local_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.6939
-INFO:local_logger:Epoch[079/800], Step[0000/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6901
-INFO:local_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6892
-INFO:master_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6902
-INFO:local_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[079/800], Step[0100/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6900
-INFO:master_logger:Epoch[079/800], Step[0200/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6894
-INFO:master_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6894
-INFO:local_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6889
-INFO:local_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[079/800], Step[0300/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6894
-INFO:local_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6891
-INFO:master_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[079/800], Step[0400/0626], Avg Loss: 0.6894
-INFO:local_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6895
-INFO:local_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6888
-INFO:master_logger:Epoch[079/800], Step[0500/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6894
-INFO:local_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6892
-INFO:master_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6889
-INFO:local_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6894
-INFO:local_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6896
-INFO:local_logger:Epoch[079/800], Step[0600/0626], Avg Loss: 0.6891
-INFO:local_logger:----- Epoch[079/800], Train Loss: 0.6892, time: 888.30
-INFO:local_logger:Now training epoch 80. LR=0.000152
-INFO:local_logger:----- Epoch[079/800], Train Loss: 0.6889, time: 888.13
-INFO:local_logger:----- Epoch[079/800], Train Loss: 0.6891, time: 888.74
-INFO:local_logger:Now training epoch 80. LR=0.000152
-INFO:local_logger:Now training epoch 80. LR=0.000152
-INFO:local_logger:----- Epoch[079/800], Train Loss: 0.6896, time: 888.14
-INFO:local_logger:Now training epoch 80. LR=0.000152
-INFO:local_logger:----- Epoch[079/800], Train Loss: 0.6886, time: 884.16
-INFO:master_logger:----- Epoch[079/800], Train Loss: 0.6892, time: 884.16
-INFO:local_logger:----- Epoch[079/800], Train Loss: 0.6893, time: 888.24
-INFO:local_logger:Now training epoch 80. LR=0.000152
-INFO:local_logger:----- Epoch[079/800], Train Loss: 0.6896, time: 888.24
-INFO:local_logger:Now training epoch 80. LR=0.000152
-INFO:local_logger:----- Epoch[079/800], Train Loss: 0.6893, time: 888.26
-INFO:local_logger:Now training epoch 80. LR=0.000152
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-79-Loss-0.6886302635034396.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-79-Loss-0.6886302635034396.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-79-Loss-0.6886302635034396.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-79-Loss-0.6886302635034396.pdopt
-INFO:local_logger:Now training epoch 80. LR=0.000152
-INFO:master_logger:Now training epoch 80. LR=0.000152
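The `master_logger` value printed alongside the eight `local_logger` lines tracks (approximately) the mean of the per-rank averages. One plausible way to obtain such a value, shown below only as an illustration and not as the repo's implementation, is an all-reduce of the local running loss followed by division by the world size; it assumes the distributed environment has already been initialized by the launcher.

```python
# Illustrative sketch: average a per-rank running loss across workers so a
# single "master" value can be logged (assumes dist.init_parallel_env()
# has already been called by the multi-GPU launcher).
import paddle
import paddle.distributed as dist


def averaged_over_ranks(local_avg_loss):
    t = paddle.to_tensor(local_avg_loss, dtype="float32")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)   # sum over all ranks
    return float(t) / dist.get_world_size()    # mean of the per-rank averages
```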
-INFO:local_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.6859
-INFO:master_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.7038
-INFO:local_logger:Epoch[080/800], Step[0000/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6889
-INFO:local_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6877
-INFO:master_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[080/800], Step[0100/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6886
-INFO:master_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[080/800], Step[0200/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6889
-INFO:local_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6887
-INFO:master_logger:Epoch[080/800], Step[0300/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6886
-INFO:master_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[080/800], Step[0400/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6888
-INFO:master_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6889
-INFO:local_logger:Epoch[080/800], Step[0500/0626], Avg Loss: 0.6889
-INFO:local_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6888
-INFO:master_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6889
-INFO:local_logger:Epoch[080/800], Step[0600/0626], Avg Loss: 0.6890
-INFO:local_logger:----- Epoch[080/800], Train Loss: 0.6890, time: 849.18
-INFO:local_logger:Now training epoch 81. LR=0.000153
-INFO:local_logger:----- Epoch[080/800], Train Loss: 0.6888, time: 849.16
-INFO:local_logger:Now training epoch 81. LR=0.000153
-INFO:local_logger:----- Epoch[080/800], Train Loss: 0.6890, time: 849.36
-INFO:local_logger:Now training epoch 81. LR=0.000153
-INFO:local_logger:----- Epoch[080/800], Train Loss: 0.6887, time: 849.41
-INFO:local_logger:Now training epoch 81. LR=0.000153
-INFO:local_logger:----- Epoch[080/800], Train Loss: 0.6888, time: 849.87
-INFO:local_logger:Now training epoch 81. LR=0.000153
-INFO:local_logger:----- Epoch[080/800], Train Loss: 0.6889, time: 849.31
-INFO:local_logger:Now training epoch 81. LR=0.000153
-INFO:local_logger:----- Epoch[080/800], Train Loss: 0.6888, time: 845.47
-INFO:master_logger:----- Epoch[080/800], Train Loss: 0.6889, time: 845.47
-INFO:local_logger:----- Epoch[080/800], Train Loss: 0.6891, time: 849.50
-INFO:local_logger:Now training epoch 81. LR=0.000153
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-80-Loss-0.6887507163269282.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-80-Loss-0.6887507163269282.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-80-Loss-0.6887507163269282.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-80-Loss-0.6887507163269282.pdopt
-INFO:local_logger:Now training epoch 81. LR=0.000153
-INFO:master_logger:Now training epoch 81. LR=0.000153
-INFO:local_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.6999
-INFO:master_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.6984
-INFO:local_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.7003
-INFO:local_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[081/800], Step[0000/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6890
-INFO:local_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6882
-INFO:master_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[081/800], Step[0100/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6892
-INFO:local_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6886
-INFO:master_logger:Epoch[081/800], Step[0200/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6881
-INFO:master_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[081/800], Step[0300/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6886
-INFO:master_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[081/800], Step[0400/0626], Avg Loss: 0.6888
-INFO:local_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6884
-INFO:local_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6878
-INFO:master_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[081/800], Step[0500/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6886
-INFO:master_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[081/800], Step[0600/0626], Avg Loss: 0.6885
-INFO:local_logger:----- Epoch[081/800], Train Loss: 0.6878, time: 886.96
-INFO:local_logger:Now training epoch 82. LR=0.000153
-INFO:local_logger:----- Epoch[081/800], Train Loss: 0.6882, time: 887.25
-INFO:local_logger:Now training epoch 82. LR=0.000153
-INFO:local_logger:----- Epoch[081/800], Train Loss: 0.6882, time: 887.32
-INFO:local_logger:Now training epoch 82. LR=0.000153
-INFO:local_logger:----- Epoch[081/800], Train Loss: 0.6883, time: 887.55
-INFO:local_logger:Now training epoch 82. LR=0.000153
-INFO:local_logger:----- Epoch[081/800], Train Loss: 0.6878, time: 887.45
-INFO:local_logger:Now training epoch 82. LR=0.000153
-INFO:local_logger:----- Epoch[081/800], Train Loss: 0.6883, time: 887.60
-INFO:local_logger:Now training epoch 82. LR=0.000153
-INFO:local_logger:----- Epoch[081/800], Train Loss: 0.6880, time: 887.62
-INFO:local_logger:Now training epoch 82. LR=0.000153
-INFO:local_logger:----- Epoch[081/800], Train Loss: 0.6885, time: 883.70
-INFO:master_logger:----- Epoch[081/800], Train Loss: 0.6882, time: 883.70
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-81-Loss-0.6885438528329545.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-81-Loss-0.6885438528329545.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-81-Loss-0.6885438528329545.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-81-Loss-0.6885438528329545.pdopt
-INFO:local_logger:Now training epoch 82. LR=0.000153
-INFO:master_logger:Now training epoch 82. LR=0.000153
-INFO:local_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.6723
-INFO:local_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.6813
-INFO:master_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.6948
-INFO:local_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.6790
-INFO:local_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.6783
-INFO:local_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.7052
-INFO:local_logger:Epoch[082/800], Step[0000/0626], Avg Loss: 0.6743
-INFO:local_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6875
-INFO:local_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6874
-INFO:master_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6871
-INFO:local_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[082/800], Step[0100/0626], Avg Loss: 0.6882
-INFO:local_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6875
-INFO:master_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[082/800], Step[0200/0626], Avg Loss: 0.6884
-INFO:local_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6882
-INFO:master_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[082/800], Step[0300/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6871
-INFO:local_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6879
-INFO:master_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[082/800], Step[0400/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6872
-INFO:local_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6874
-INFO:master_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[082/800], Step[0500/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6873
-INFO:master_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6871
-INFO:local_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[082/800], Step[0600/0626], Avg Loss: 0.6880
-INFO:local_logger:----- Epoch[082/800], Train Loss: 0.6876, time: 851.19
-INFO:local_logger:Now training epoch 83. LR=0.000153
-INFO:local_logger:----- Epoch[082/800], Train Loss: 0.6870, time: 851.29
-INFO:local_logger:Now training epoch 83. LR=0.000153
-INFO:local_logger:----- Epoch[082/800], Train Loss: 0.6876, time: 851.29
-INFO:local_logger:Now training epoch 83. LR=0.000153
-INFO:local_logger:----- Epoch[082/800], Train Loss: 0.6881, time: 851.15
-INFO:local_logger:Now training epoch 83. LR=0.000153
-INFO:local_logger:----- Epoch[082/800], Train Loss: 0.6881, time: 851.69
-INFO:local_logger:Now training epoch 83. LR=0.000153
-INFO:local_logger:----- Epoch[082/800], Train Loss: 0.6873, time: 851.08
-INFO:local_logger:Now training epoch 83. LR=0.000153
-INFO:local_logger:----- Epoch[082/800], Train Loss: 0.6877, time: 847.31
-INFO:master_logger:----- Epoch[082/800], Train Loss: 0.6876, time: 847.31
-INFO:local_logger:----- Epoch[082/800], Train Loss: 0.6879, time: 851.21
-INFO:local_logger:Now training epoch 83. LR=0.000153
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-82-Loss-0.687688001142508.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-82-Loss-0.687688001142508.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-82-Loss-0.687688001142508.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-82-Loss-0.687688001142508.pdopt
-INFO:local_logger:Now training epoch 83. LR=0.000153
-INFO:master_logger:Now training epoch 83. LR=0.000153
-INFO:local_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.6878
-INFO:master_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.6793
-INFO:local_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.6913
-INFO:local_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.6781
-INFO:local_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.7000
-INFO:local_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.6871
-INFO:local_logger:Epoch[083/800], Step[0000/0626], Avg Loss: 0.6893
-INFO:local_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6891
-INFO:local_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6875
-INFO:local_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6882
-INFO:master_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[083/800], Step[0100/0626], Avg Loss: 0.6872
-INFO:local_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6871
-INFO:local_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6890
-INFO:master_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[083/800], Step[0200/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6877
-INFO:local_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6871
-INFO:local_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6868
-INFO:local_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6872
-INFO:master_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[083/800], Step[0300/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6879
-INFO:local_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6873
-INFO:master_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6867
-INFO:local_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[083/800], Step[0400/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6875
-INFO:local_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6866
-INFO:master_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6872
-INFO:local_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[083/800], Step[0500/0626], Avg Loss: 0.6868
-INFO:local_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6872
-INFO:local_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6871
-INFO:local_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6875
-INFO:local_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6866
-INFO:master_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6872
-INFO:local_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[083/800], Step[0600/0626], Avg Loss: 0.6867
-INFO:local_logger:----- Epoch[083/800], Train Loss: 0.6872, time: 894.39
-INFO:local_logger:----- Epoch[083/800], Train Loss: 0.6868, time: 894.42
-INFO:local_logger:Now training epoch 84. LR=0.000153
-INFO:local_logger:Now training epoch 84. LR=0.000153
-INFO:local_logger:----- Epoch[083/800], Train Loss: 0.6876, time: 894.42
-INFO:local_logger:Now training epoch 84. LR=0.000153
-INFO:local_logger:----- Epoch[083/800], Train Loss: 0.6871, time: 890.84
-INFO:master_logger:----- Epoch[083/800], Train Loss: 0.6872, time: 890.84
-INFO:local_logger:----- Epoch[083/800], Train Loss: 0.6876, time: 894.60
-INFO:local_logger:Now training epoch 84. LR=0.000153
-INFO:local_logger:----- Epoch[083/800], Train Loss: 0.6873, time: 894.67
-INFO:local_logger:Now training epoch 84. LR=0.000153
-INFO:local_logger:----- Epoch[083/800], Train Loss: 0.6875, time: 894.68
-INFO:local_logger:Now training epoch 84. LR=0.000153
-INFO:local_logger:----- Epoch[083/800], Train Loss: 0.6866, time: 894.74
-INFO:local_logger:Now training epoch 84. LR=0.000153
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-83-Loss-0.6870564654976226.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-83-Loss-0.6870564654976226.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-83-Loss-0.6870564654976226.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-83-Loss-0.6870564654976226.pdopt
-INFO:local_logger:Now training epoch 84. LR=0.000153
-INFO:master_logger:Now training epoch 84. LR=0.000153
-INFO:local_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6927
-INFO:master_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6881
-INFO:local_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6745
-INFO:local_logger:Epoch[084/800], Step[0000/0626], Avg Loss: 0.6914
-INFO:local_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6875
-INFO:local_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6880
-INFO:master_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6867
-INFO:local_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[084/800], Step[0100/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6867
-INFO:master_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6884
-INFO:local_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[084/800], Step[0200/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6867
-INFO:local_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6878
-INFO:local_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6867
-INFO:local_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6868
-INFO:local_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6868
-INFO:master_logger:Epoch[084/800], Step[0300/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6867
-INFO:local_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6875
-INFO:local_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6876
-INFO:master_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[084/800], Step[0400/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6866
-INFO:master_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6868
-INFO:local_logger:Epoch[084/800], Step[0500/0626], Avg Loss: 0.6867
-INFO:local_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6875
-INFO:local_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6867
-INFO:master_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[084/800], Step[0600/0626], Avg Loss: 0.6865
-INFO:local_logger:----- Epoch[084/800], Train Loss: 0.6870, time: 854.61
-INFO:master_logger:----- Epoch[084/800], Train Loss: 0.6869, time: 854.61
-INFO:local_logger:----- Epoch[084/800], Train Loss: 0.6868, time: 858.62
-INFO:local_logger:Now training epoch 85. LR=0.000153
-INFO:local_logger:----- Epoch[084/800], Train Loss: 0.6865, time: 858.88
-INFO:local_logger:Now training epoch 85. LR=0.000153
-INFO:local_logger:----- Epoch[084/800], Train Loss: 0.6866, time: 858.64
-INFO:local_logger:Now training epoch 85. LR=0.000153
-INFO:local_logger:----- Epoch[084/800], Train Loss: 0.6874, time: 858.90
-INFO:local_logger:Now training epoch 85. LR=0.000153
-INFO:local_logger:----- Epoch[084/800], Train Loss: 0.6868, time: 858.67
-INFO:local_logger:Now training epoch 85. LR=0.000153
-INFO:local_logger:----- Epoch[084/800], Train Loss: 0.6875, time: 858.57
-INFO:local_logger:Now training epoch 85. LR=0.000153
-INFO:local_logger:----- Epoch[084/800], Train Loss: 0.6864, time: 858.93
-INFO:local_logger:Now training epoch 85. LR=0.000153
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-84-Loss-0.6870198997374206.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-84-Loss-0.6870198997374206.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-84-Loss-0.6870198997374206.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-84-Loss-0.6870198997374206.pdopt
-INFO:local_logger:Now training epoch 85. LR=0.000153
-INFO:master_logger:Now training epoch 85. LR=0.000153
-INFO:local_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6915
-INFO:local_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6830
-INFO:master_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6898
-INFO:local_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6935
-INFO:local_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6929
-INFO:local_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[085/800], Step[0000/0626], Avg Loss: 0.6803
-INFO:local_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6868
-INFO:local_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6843
-INFO:master_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[085/800], Step[0100/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6868
-INFO:local_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6865
-INFO:master_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[085/800], Step[0200/0626], Avg Loss: 0.6856
-INFO:local_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6867
-INFO:local_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6866
-INFO:master_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[085/800], Step[0300/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6856
-INFO:local_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6869
-INFO:local_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6866
-INFO:master_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[085/800], Step[0400/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6868
-INFO:local_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6860
-INFO:master_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[085/800], Step[0500/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6859
-INFO:master_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[085/800], Step[0600/0626], Avg Loss: 0.6862
-INFO:local_logger:----- Epoch[085/800], Train Loss: 0.6862, time: 892.65
-INFO:local_logger:Now training epoch 86. LR=0.000153
-INFO:local_logger:----- Epoch[085/800], Train Loss: 0.6862, time: 893.08
-INFO:local_logger:Now training epoch 86. LR=0.000153
-INFO:local_logger:----- Epoch[085/800], Train Loss: 0.6859, time: 890.01
-INFO:master_logger:----- Epoch[085/800], Train Loss: 0.6862, time: 890.01
-INFO:local_logger:----- Epoch[085/800], Train Loss: 0.6864, time: 893.68
-INFO:local_logger:Now training epoch 86. LR=0.000153
-INFO:local_logger:----- Epoch[085/800], Train Loss: 0.6857, time: 893.70
-INFO:local_logger:Now training epoch 86. LR=0.000153
-INFO:local_logger:----- Epoch[085/800], Train Loss: 0.6863, time: 893.73
-INFO:local_logger:Now training epoch 86. LR=0.000153
-INFO:local_logger:----- Epoch[085/800], Train Loss: 0.6867, time: 893.72
-INFO:local_logger:Now training epoch 86. LR=0.000153
-INFO:local_logger:----- Epoch[085/800], Train Loss: 0.6864, time: 893.76
-INFO:local_logger:Now training epoch 86. LR=0.000153
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-85-Loss-0.6859439270970955.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-85-Loss-0.6859439270970955.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-85-Loss-0.6859439270970955.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-85-Loss-0.6859439270970955.pdopt
-INFO:local_logger:Now training epoch 86. LR=0.000153
-INFO:master_logger:Now training epoch 86. LR=0.000153
-INFO:local_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.6904
-INFO:local_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.7015
-INFO:local_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.7112
-INFO:local_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.6865
-INFO:local_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.6773
-INFO:master_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.6887
-INFO:local_logger:Epoch[086/800], Step[0000/0626], Avg Loss: 0.6796
-INFO:local_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6876
-INFO:master_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[086/800], Step[0100/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6862
-INFO:master_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[086/800], Step[0200/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6860
-INFO:master_logger:Epoch[086/800], Step[0300/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6862
-INFO:master_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[086/800], Step[0400/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6862
-INFO:master_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[086/800], Step[0500/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6858
-INFO:master_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[086/800], Step[0600/0626], Avg Loss: 0.6858
-INFO:local_logger:----- Epoch[086/800], Train Loss: 0.6860, time: 859.95
-INFO:local_logger:Now training epoch 87. LR=0.000153
-INFO:local_logger:----- Epoch[086/800], Train Loss: 0.6859, time: 860.07
-INFO:local_logger:Now training epoch 87. LR=0.000153
-INFO:local_logger:----- Epoch[086/800], Train Loss: 0.6862, time: 856.92
-INFO:master_logger:----- Epoch[086/800], Train Loss: 0.6860, time: 856.92
-INFO:local_logger:----- Epoch[086/800], Train Loss: 0.6860, time: 861.39
-INFO:local_logger:Now training epoch 87. LR=0.000153
-INFO:local_logger:----- Epoch[086/800], Train Loss: 0.6862, time: 860.97
-INFO:local_logger:Now training epoch 87. LR=0.000153
-INFO:local_logger:----- Epoch[086/800], Train Loss: 0.6857, time: 860.38
-INFO:local_logger:Now training epoch 87. LR=0.000153
-INFO:local_logger:----- Epoch[086/800], Train Loss: 0.6858, time: 860.35
-INFO:local_logger:Now training epoch 87. LR=0.000153
-INFO:local_logger:----- Epoch[086/800], Train Loss: 0.6859, time: 860.35
-INFO:local_logger:Now training epoch 87. LR=0.000153
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-86-Loss-0.6862062182739747.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-86-Loss-0.6862062182739747.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-86-Loss-0.6862062182739747.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-86-Loss-0.6862062182739747.pdopt
-INFO:local_logger:Now training epoch 87. LR=0.000153
-INFO:master_logger:Now training epoch 87. LR=0.000153
-INFO:local_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6924
-INFO:local_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6883
-INFO:master_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6722
-INFO:local_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6880
-INFO:local_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6993
-INFO:local_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6908
-INFO:local_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[087/800], Step[0000/0626], Avg Loss: 0.6712
-INFO:local_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6852
-INFO:local_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6868
-INFO:local_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6862
-INFO:master_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0100/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6856
-INFO:local_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6864
-INFO:master_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[087/800], Step[0200/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6856
-INFO:local_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6860
-INFO:master_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0300/0626], Avg Loss: 0.6864
-INFO:local_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6852
-INFO:local_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6862
-INFO:local_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6858
-INFO:master_logger:Epoch[087/800], Step[0400/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6852
-INFO:master_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[087/800], Step[0500/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6858
-INFO:master_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6856
-INFO:local_logger:Epoch[087/800], Step[0600/0626], Avg Loss: 0.6852
-INFO:local_logger:----- Epoch[087/800], Train Loss: 0.6853, time: 895.08
-INFO:local_logger:Now training epoch 88. LR=0.000153
-INFO:local_logger:----- Epoch[087/800], Train Loss: 0.6855, time: 895.09
-INFO:local_logger:Now training epoch 88. LR=0.000153
-INFO:local_logger:----- Epoch[087/800], Train Loss: 0.6860, time: 895.73
-INFO:local_logger:Now training epoch 88. LR=0.000153
-INFO:local_logger:----- Epoch[087/800], Train Loss: 0.6852, time: 896.01
-INFO:local_logger:Now training epoch 88. LR=0.000153
-INFO:local_logger:----- Epoch[087/800], Train Loss: 0.6856, time: 895.77
-INFO:local_logger:----- Epoch[087/800], Train Loss: 0.6857, time: 896.11
-INFO:local_logger:Now training epoch 88. LR=0.000153
-INFO:local_logger:Now training epoch 88. LR=0.000153
-INFO:local_logger:----- Epoch[087/800], Train Loss: 0.6857, time: 892.15
-INFO:master_logger:----- Epoch[087/800], Train Loss: 0.6856, time: 892.15
-INFO:local_logger:----- Epoch[087/800], Train Loss: 0.6855, time: 895.77
-INFO:local_logger:Now training epoch 88. LR=0.000153
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-87-Loss-0.6857299549603768.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-87-Loss-0.6857299549603768.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-87-Loss-0.6857299549603768.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-87-Loss-0.6857299549603768.pdopt
-INFO:local_logger:Now training epoch 88. LR=0.000153
-INFO:master_logger:Now training epoch 88. LR=0.000153
-INFO:local_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6931
-INFO:local_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6886
-INFO:local_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6765
-INFO:local_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6795
-INFO:local_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6839
-INFO:master_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[088/800], Step[0000/0626], Avg Loss: 0.6800
-INFO:local_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6876
-INFO:local_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6868
-INFO:master_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[088/800], Step[0100/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6863
-INFO:local_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6852
-INFO:local_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6860
-INFO:master_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[088/800], Step[0200/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6861
-INFO:local_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6859
-INFO:master_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[088/800], Step[0300/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6859
-INFO:master_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6856
-INFO:local_logger:Epoch[088/800], Step[0400/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6857
-INFO:master_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[088/800], Step[0500/0626], Avg Loss: 0.6857
-INFO:local_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6854
-INFO:master_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[088/800], Step[0600/0626], Avg Loss: 0.6853
-INFO:local_logger:----- Epoch[088/800], Train Loss: 0.6855, time: 854.47
-INFO:master_logger:----- Epoch[088/800], Train Loss: 0.6853, time: 854.47
-INFO:local_logger:----- Epoch[088/800], Train Loss: 0.6854, time: 859.85
-INFO:local_logger:Now training epoch 89. LR=0.000154
-INFO:local_logger:----- Epoch[088/800], Train Loss: 0.6849, time: 859.24
-INFO:local_logger:Now training epoch 89. LR=0.000154
-INFO:local_logger:----- Epoch[088/800], Train Loss: 0.6852, time: 859.32
-INFO:local_logger:Now training epoch 89. LR=0.000154
-INFO:local_logger:----- Epoch[088/800], Train Loss: 0.6854, time: 859.33
-INFO:local_logger:Now training epoch 89. LR=0.000154
-INFO:local_logger:----- Epoch[088/800], Train Loss: 0.6853, time: 859.35
-INFO:local_logger:----- Epoch[088/800], Train Loss: 0.6853, time: 859.33
-INFO:local_logger:Now training epoch 89. LR=0.000154
-INFO:local_logger:Now training epoch 89. LR=0.000154
-INFO:local_logger:----- Epoch[088/800], Train Loss: 0.6856, time: 860.00
-INFO:local_logger:Now training epoch 89. LR=0.000154
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-88-Loss-0.6854734038612285.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-88-Loss-0.6854734038612285.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-88-Loss-0.6854734038612285.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-88-Loss-0.6854734038612285.pdopt
-INFO:local_logger:Now training epoch 89. LR=0.000154
-INFO:master_logger:Now training epoch 89. LR=0.000154
-INFO:local_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6780
-INFO:local_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6912
-INFO:local_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6884
-INFO:local_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6960
-INFO:local_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6791
-INFO:local_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6747
-INFO:local_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6772
-INFO:local_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6825
-INFO:master_logger:Epoch[089/800], Step[0000/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6856
-INFO:local_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6850
-INFO:master_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6848
-INFO:local_logger:Epoch[089/800], Step[0100/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6851
-INFO:local_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6852
-INFO:local_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6844
-INFO:master_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6852
-INFO:local_logger:Epoch[089/800], Step[0200/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6854
-INFO:local_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6852
-INFO:local_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6847
-INFO:master_logger:Epoch[089/800], Step[0300/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6854
-INFO:master_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6851
-INFO:local_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6856
-INFO:local_logger:Epoch[089/800], Step[0400/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6851
-INFO:local_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6848
-INFO:local_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6849
-INFO:master_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0500/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6845
-INFO:master_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6848
-INFO:local_logger:Epoch[089/800], Step[0600/0626], Avg Loss: 0.6849
-INFO:local_logger:----- Epoch[089/800], Train Loss: 0.6845, time: 884.60
-INFO:local_logger:Now training epoch 90. LR=0.000154
-INFO:local_logger:----- Epoch[089/800], Train Loss: 0.6849, time: 885.46
-INFO:local_logger:Now training epoch 90. LR=0.000154
-INFO:local_logger:----- Epoch[089/800], Train Loss: 0.6846, time: 882.71
-INFO:master_logger:----- Epoch[089/800], Train Loss: 0.6848, time: 882.71
-INFO:local_logger:----- Epoch[089/800], Train Loss: 0.6850, time: 885.49
-INFO:local_logger:Now training epoch 90. LR=0.000154
-INFO:local_logger:----- Epoch[089/800], Train Loss: 0.6845, time: 885.58
-INFO:local_logger:Now training epoch 90. LR=0.000154
-INFO:local_logger:----- Epoch[089/800], Train Loss: 0.6848, time: 885.61
-INFO:local_logger:Now training epoch 90. LR=0.000154
-INFO:local_logger:----- Epoch[089/800], Train Loss: 0.6849, time: 885.54
-INFO:local_logger:Now training epoch 90. LR=0.000154
-INFO:local_logger:----- Epoch[089/800], Train Loss: 0.6849, time: 885.50
-INFO:local_logger:Now training epoch 90. LR=0.000154
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-89-Loss-0.6845652029200572.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-89-Loss-0.6845652029200572.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-89-Loss-0.6845652029200572.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-89-Loss-0.6845652029200572.pdopt
-INFO:local_logger:Now training epoch 90. LR=0.000154
-INFO:master_logger:Now training epoch 90. LR=0.000154
-INFO:local_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6924
-INFO:local_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6899
-INFO:local_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6763
-INFO:local_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6748
-INFO:master_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6787
-INFO:local_logger:Epoch[090/800], Step[0000/0626], Avg Loss: 0.6883
-INFO:local_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6851
-INFO:local_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6853
-INFO:master_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[090/800], Step[0100/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6850
-INFO:master_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[090/800], Step[0200/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6851
-INFO:local_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6848
-INFO:local_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6841
-INFO:master_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[090/800], Step[0300/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6845
-INFO:master_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[090/800], Step[0400/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6842
-INFO:master_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[090/800], Step[0500/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6842
-INFO:master_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[090/800], Step[0600/0626], Avg Loss: 0.6845
-INFO:local_logger:----- Epoch[090/800], Train Loss: 0.6845, time: 851.60
-INFO:local_logger:Now training epoch 91. LR=0.000154
-INFO:local_logger:----- Epoch[090/800], Train Loss: 0.6839, time: 851.05
-INFO:local_logger:Now training epoch 91. LR=0.000154
-INFO:local_logger:----- Epoch[090/800], Train Loss: 0.6844, time: 851.17
-INFO:local_logger:Now training epoch 91. LR=0.000154
-INFO:local_logger:----- Epoch[090/800], Train Loss: 0.6844, time: 851.26
-INFO:local_logger:Now training epoch 91. LR=0.000154
-INFO:local_logger:----- Epoch[090/800], Train Loss: 0.6847, time: 851.26
-INFO:local_logger:Now training epoch 91. LR=0.000154
-INFO:local_logger:----- Epoch[090/800], Train Loss: 0.6844, time: 851.23
-INFO:local_logger:----- Epoch[090/800], Train Loss: 0.6841, time: 847.55
-INFO:local_logger:Now training epoch 91. LR=0.000154
-INFO:master_logger:----- Epoch[090/800], Train Loss: 0.6843, time: 847.55
-INFO:local_logger:----- Epoch[090/800], Train Loss: 0.6840, time: 851.24
-INFO:local_logger:Now training epoch 91. LR=0.000154
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-90-Loss-0.6841203801495528.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-90-Loss-0.6841203801495528.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-90-Loss-0.6841203801495528.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-90-Loss-0.6841203801495528.pdopt
-INFO:local_logger:Now training epoch 91. LR=0.000154
-INFO:master_logger:Now training epoch 91. LR=0.000154
-INFO:local_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6872
-INFO:local_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6756
-INFO:master_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6859
-INFO:local_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6748
-INFO:local_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6793
-INFO:local_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6961
-INFO:local_logger:Epoch[091/800], Step[0000/0626], Avg Loss: 0.6897
-INFO:local_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6833
-INFO:local_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6833
-INFO:master_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[091/800], Step[0100/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6833
-INFO:local_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6837
-INFO:master_logger:Epoch[091/800], Step[0200/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6834
-INFO:master_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[091/800], Step[0300/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6847
-INFO:master_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[091/800], Step[0400/0626], Avg Loss: 0.6848
-INFO:local_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6840
-INFO:master_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[091/800], Step[0500/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6842
-INFO:master_logger:Epoch[091/800], Step[0600/0626], Avg Loss: 0.6842
-INFO:local_logger:----- Epoch[091/800], Train Loss: 0.6843, time: 886.91
-INFO:local_logger:Now training epoch 92. LR=0.000154
-INFO:local_logger:----- Epoch[091/800], Train Loss: 0.6838, time: 886.90
-INFO:local_logger:Now training epoch 92. LR=0.000154
-INFO:local_logger:----- Epoch[091/800], Train Loss: 0.6848, time: 887.37
-INFO:local_logger:Now training epoch 92. LR=0.000154
-INFO:local_logger:----- Epoch[091/800], Train Loss: 0.6844, time: 887.49
-INFO:local_logger:Now training epoch 92. LR=0.000154
-INFO:local_logger:----- Epoch[091/800], Train Loss: 0.6842, time: 887.83
-INFO:local_logger:Now training epoch 92. LR=0.000154
-INFO:local_logger:----- Epoch[091/800], Train Loss: 0.6839, time: 887.30
-INFO:local_logger:Now training epoch 92. LR=0.000154
-INFO:local_logger:----- Epoch[091/800], Train Loss: 0.6839, time: 883.60
-INFO:master_logger:----- Epoch[091/800], Train Loss: 0.6841, time: 883.60
-INFO:local_logger:----- Epoch[091/800], Train Loss: 0.6838, time: 887.32
-INFO:local_logger:Now training epoch 92. LR=0.000154
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-91-Loss-0.6838902651592956.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-91-Loss-0.6838902651592956.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-91-Loss-0.6838902651592956.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-91-Loss-0.6838902651592956.pdopt
-INFO:local_logger:Now training epoch 92. LR=0.000154
-INFO:master_logger:Now training epoch 92. LR=0.000154
-INFO:local_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6957
-INFO:local_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6676
-INFO:local_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6676
-INFO:local_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6866
-INFO:master_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6922
-INFO:local_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6936
-INFO:local_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[092/800], Step[0000/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6840
-INFO:master_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[092/800], Step[0100/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6831
-INFO:master_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[092/800], Step[0200/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6839
-INFO:master_logger:Epoch[092/800], Step[0300/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6841
-INFO:master_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0400/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6840
-INFO:master_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[092/800], Step[0500/0626], Avg Loss: 0.6842
-INFO:local_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6839
-INFO:master_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[092/800], Step[0600/0626], Avg Loss: 0.6836
-INFO:local_logger:----- Epoch[092/800], Train Loss: 0.6839, time: 858.55
-INFO:local_logger:Now training epoch 93. LR=0.000154
-INFO:local_logger:----- Epoch[092/800], Train Loss: 0.6839, time: 858.56
-INFO:local_logger:Now training epoch 93. LR=0.000154
-INFO:local_logger:----- Epoch[092/800], Train Loss: 0.6836, time: 858.97
-INFO:local_logger:Now training epoch 93. LR=0.000154
-INFO:local_logger:----- Epoch[092/800], Train Loss: 0.6833, time: 858.75
-INFO:local_logger:Now training epoch 93. LR=0.000154
-INFO:local_logger:----- Epoch[092/800], Train Loss: 0.6837, time: 858.76
-INFO:local_logger:Now training epoch 93. LR=0.000154
-INFO:local_logger:----- Epoch[092/800], Train Loss: 0.6835, time: 858.85
-INFO:local_logger:Now training epoch 93. LR=0.000154
-INFO:local_logger:----- Epoch[092/800], Train Loss: 0.6841, time: 855.23
-INFO:master_logger:----- Epoch[092/800], Train Loss: 0.6837, time: 855.23
-INFO:local_logger:----- Epoch[092/800], Train Loss: 0.6836, time: 859.34
-INFO:local_logger:Now training epoch 93. LR=0.000154
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-92-Loss-0.6840875268930822.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-92-Loss-0.6840875268930822.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-92-Loss-0.6840875268930822.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-92-Loss-0.6840875268930822.pdopt
-INFO:local_logger:Now training epoch 93. LR=0.000154
-INFO:master_logger:Now training epoch 93. LR=0.000154
-INFO:local_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6662
-INFO:local_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6933
-INFO:local_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6938
-INFO:local_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6859
-INFO:master_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6846
-INFO:local_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6770
-INFO:local_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6934
-INFO:local_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6802
-INFO:local_logger:Epoch[093/800], Step[0000/0626], Avg Loss: 0.6873
-INFO:local_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6848
-INFO:local_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6832
-INFO:master_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[093/800], Step[0100/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6835
-INFO:master_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[093/800], Step[0200/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6831
-INFO:master_logger:Epoch[093/800], Step[0300/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6830
-INFO:master_logger:Epoch[093/800], Step[0400/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6830
-INFO:master_logger:Epoch[093/800], Step[0500/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6833
-INFO:local_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6827
-INFO:master_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6833
-INFO:local_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[093/800], Step[0600/0626], Avg Loss: 0.6835
-INFO:local_logger:----- Epoch[093/800], Train Loss: 0.6837, time: 883.02
-INFO:local_logger:Now training epoch 94. LR=0.000154
-INFO:local_logger:----- Epoch[093/800], Train Loss: 0.6832, time: 882.93
-INFO:local_logger:Now training epoch 94. LR=0.000154
-INFO:local_logger:----- Epoch[093/800], Train Loss: 0.6827, time: 879.60
-INFO:master_logger:----- Epoch[093/800], Train Loss: 0.6834, time: 879.60
-INFO:local_logger:----- Epoch[093/800], Train Loss: 0.6832, time: 883.71
-INFO:local_logger:Now training epoch 94. LR=0.000154
-INFO:local_logger:----- Epoch[093/800], Train Loss: 0.6834, time: 883.93
-INFO:local_logger:Now training epoch 94. LR=0.000154
-INFO:local_logger:----- Epoch[093/800], Train Loss: 0.6837, time: 883.92
-INFO:local_logger:Now training epoch 94. LR=0.000154
-INFO:local_logger:----- Epoch[093/800], Train Loss: 0.6836, time: 884.00
-INFO:local_logger:Now training epoch 94. LR=0.000154
-INFO:local_logger:----- Epoch[093/800], Train Loss: 0.6834, time: 883.71
-INFO:local_logger:Now training epoch 94. LR=0.000154
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-93-Loss-0.6827037774682657.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-93-Loss-0.6827037774682657.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-93-Loss-0.6827037774682657.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-93-Loss-0.6827037774682657.pdopt
-INFO:local_logger:Now training epoch 94. LR=0.000154
-INFO:master_logger:Now training epoch 94. LR=0.000154
-INFO:local_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6799
-INFO:local_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6772
-INFO:local_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6722
-INFO:master_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6904
-INFO:local_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6963
-INFO:local_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6866
-INFO:local_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[094/800], Step[0000/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6834
-INFO:master_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[094/800], Step[0100/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6847
-INFO:local_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6828
-INFO:master_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[094/800], Step[0200/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6849
-INFO:local_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6828
-INFO:master_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[094/800], Step[0300/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6845
-INFO:local_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6831
-INFO:master_logger:Epoch[094/800], Step[0400/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6833
-INFO:local_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6832
-INFO:master_logger:Epoch[094/800], Step[0500/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6840
-INFO:local_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6828
-INFO:master_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[094/800], Step[0600/0626], Avg Loss: 0.6825
-INFO:local_logger:----- Epoch[094/800], Train Loss: 0.6835, time: 860.64
-INFO:local_logger:Now training epoch 95. LR=0.000155
-INFO:local_logger:----- Epoch[094/800], Train Loss: 0.6829, time: 861.10
-INFO:local_logger:Now training epoch 95. LR=0.000155
-INFO:local_logger:----- Epoch[094/800], Train Loss: 0.6834, time: 861.04
-INFO:local_logger:Now training epoch 95. LR=0.000155
-INFO:local_logger:----- Epoch[094/800], Train Loss: 0.6827, time: 861.88
-INFO:local_logger:Now training epoch 95. LR=0.000155
-INFO:local_logger:----- Epoch[094/800], Train Loss: 0.6838, time: 861.14
-INFO:local_logger:Now training epoch 95. LR=0.000155
-INFO:local_logger:----- Epoch[094/800], Train Loss: 0.6830, time: 857.69
-INFO:master_logger:----- Epoch[094/800], Train Loss: 0.6831, time: 857.69
-INFO:local_logger:----- Epoch[094/800], Train Loss: 0.6825, time: 861.06
-INFO:local_logger:Now training epoch 95. LR=0.000155
-INFO:local_logger:----- Epoch[094/800], Train Loss: 0.6830, time: 861.17
-INFO:local_logger:Now training epoch 95. LR=0.000155
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-94-Loss-0.6830039001247509.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-94-Loss-0.6830039001247509.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-94-Loss-0.6830039001247509.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-94-Loss-0.6830039001247509.pdopt
-INFO:local_logger:Now training epoch 95. LR=0.000155
-INFO:master_logger:Now training epoch 95. LR=0.000155
-INFO:local_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6832
-INFO:master_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6771
-INFO:local_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6798
-INFO:local_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6706
-INFO:local_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6905
-INFO:local_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6843
-INFO:local_logger:Epoch[095/800], Step[0000/0626], Avg Loss: 0.6855
-INFO:local_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6826
-INFO:master_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[095/800], Step[0100/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6833
-INFO:local_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6829
-INFO:master_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0200/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6833
-INFO:local_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6820
-INFO:master_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0300/0626], Avg Loss: 0.6835
-INFO:local_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6826
-INFO:master_logger:Epoch[095/800], Step[0400/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6832
-INFO:local_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6828
-INFO:master_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[095/800], Step[0500/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6825
-INFO:master_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[095/800], Step[0600/0626], Avg Loss: 0.6826
-INFO:local_logger:----- Epoch[095/800], Train Loss: 0.6831, time: 887.46
-INFO:local_logger:Now training epoch 96. LR=0.000155
-INFO:local_logger:----- Epoch[095/800], Train Loss: 0.6830, time: 886.49
-INFO:local_logger:Now training epoch 96. LR=0.000155
-INFO:local_logger:----- Epoch[095/800], Train Loss: 0.6825, time: 886.66
-INFO:local_logger:Now training epoch 96. LR=0.000155
-INFO:local_logger:----- Epoch[095/800], Train Loss: 0.6825, time: 886.64
-INFO:local_logger:Now training epoch 96. LR=0.000155
-INFO:local_logger:----- Epoch[095/800], Train Loss: 0.6829, time: 886.77
-INFO:local_logger:Now training epoch 96. LR=0.000155
-INFO:local_logger:----- Epoch[095/800], Train Loss: 0.6826, time: 886.77
-INFO:local_logger:Now training epoch 96. LR=0.000155
-INFO:local_logger:----- Epoch[095/800], Train Loss: 0.6826, time: 883.07
-INFO:master_logger:----- Epoch[095/800], Train Loss: 0.6827, time: 883.07
-INFO:local_logger:----- Epoch[095/800], Train Loss: 0.6827, time: 886.80
-INFO:local_logger:Now training epoch 96. LR=0.000155
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-95-Loss-0.6825694100624208.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-95-Loss-0.6825694100624208.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-95-Loss-0.6825694100624208.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-95-Loss-0.6825694100624208.pdopt
-INFO:local_logger:Now training epoch 96. LR=0.000155
-INFO:master_logger:Now training epoch 96. LR=0.000155
-INFO:local_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6925
-INFO:local_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6757
-INFO:local_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6684
-INFO:master_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6799
-INFO:local_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6804
-INFO:local_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6683
-INFO:local_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6801
-INFO:local_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6900
-INFO:local_logger:Epoch[096/800], Step[0000/0626], Avg Loss: 0.6837
-INFO:local_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6836
-INFO:master_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6841
-INFO:local_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[096/800], Step[0100/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6825
-INFO:master_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[096/800], Step[0200/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6819
-INFO:master_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[096/800], Step[0300/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6822
-INFO:master_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[096/800], Step[0400/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6821
-INFO:master_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[096/800], Step[0500/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6827
-INFO:master_logger:Epoch[096/800], Step[0600/0626], Avg Loss: 0.6824
-INFO:local_logger:----- Epoch[096/800], Train Loss: 0.6821, time: 868.69
-INFO:local_logger:Now training epoch 97. LR=0.000155
-INFO:local_logger:----- Epoch[096/800], Train Loss: 0.6821, time: 869.09
-INFO:local_logger:Now training epoch 97. LR=0.000155
-INFO:local_logger:----- Epoch[096/800], Train Loss: 0.6823, time: 868.99
-INFO:local_logger:Now training epoch 97. LR=0.000155
-INFO:local_logger:----- Epoch[096/800], Train Loss: 0.6826, time: 868.81
-INFO:local_logger:Now training epoch 97. LR=0.000155
-INFO:local_logger:----- Epoch[096/800], Train Loss: 0.6824, time: 868.69
-INFO:local_logger:----- Epoch[096/800], Train Loss: 0.6827, time: 864.96
-INFO:local_logger:Now training epoch 97. LR=0.000155
-INFO:master_logger:----- Epoch[096/800], Train Loss: 0.6824, time: 864.96
-INFO:local_logger:----- Epoch[096/800], Train Loss: 0.6828, time: 868.82
-INFO:local_logger:Now training epoch 97. LR=0.000155
-INFO:local_logger:----- Epoch[096/800], Train Loss: 0.6826, time: 868.66
-INFO:local_logger:Now training epoch 97. LR=0.000155
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-96-Loss-0.6826821214191926.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-96-Loss-0.6826821214191926.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-96-Loss-0.6826821214191926.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-96-Loss-0.6826821214191926.pdopt
-INFO:local_logger:Now training epoch 97. LR=0.000155
-INFO:master_logger:Now training epoch 97. LR=0.000155
-INFO:local_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6880
-INFO:master_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6940
-INFO:local_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6844
-INFO:local_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6755
-INFO:local_logger:Epoch[097/800], Step[0000/0626], Avg Loss: 0.6928
-INFO:local_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6822
-INFO:master_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6831
-INFO:local_logger:Epoch[097/800], Step[0100/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6823
-INFO:master_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[097/800], Step[0200/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6822
-INFO:master_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[097/800], Step[0300/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6818
-INFO:master_logger:Epoch[097/800], Step[0400/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6826
-INFO:master_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[097/800], Step[0500/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6825
-INFO:master_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[097/800], Step[0600/0626], Avg Loss: 0.6814
-INFO:local_logger:----- Epoch[097/800], Train Loss: 0.6822, time: 881.18
-INFO:local_logger:Now training epoch 98. LR=0.000155
-INFO:local_logger:----- Epoch[097/800], Train Loss: 0.6818, time: 882.30
-INFO:local_logger:Now training epoch 98. LR=0.000155
-INFO:local_logger:----- Epoch[097/800], Train Loss: 0.6822, time: 882.31
-INFO:local_logger:Now training epoch 98. LR=0.000155
-INFO:local_logger:----- Epoch[097/800], Train Loss: 0.6826, time: 882.32
-INFO:local_logger:Now training epoch 98. LR=0.000155
-INFO:local_logger:----- Epoch[097/800], Train Loss: 0.6813, time: 882.38
-INFO:local_logger:Now training epoch 98. LR=0.000155
-INFO:local_logger:----- Epoch[097/800], Train Loss: 0.6820, time: 882.40
-INFO:local_logger:Now training epoch 98. LR=0.000155
-INFO:local_logger:----- Epoch[097/800], Train Loss: 0.6818, time: 878.64
-INFO:master_logger:----- Epoch[097/800], Train Loss: 0.6820, time: 878.64
-INFO:local_logger:----- Epoch[097/800], Train Loss: 0.6823, time: 882.40
-INFO:local_logger:Now training epoch 98. LR=0.000155
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-97-Loss-0.6818455600972856.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-97-Loss-0.6818455600972856.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-97-Loss-0.6818455600972856.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-97-Loss-0.6818455600972856.pdopt
-INFO:local_logger:Now training epoch 98. LR=0.000155
-INFO:master_logger:Now training epoch 98. LR=0.000155
-INFO:local_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.6930
-INFO:local_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.6821
-INFO:master_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.6860
-INFO:local_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.6802
-INFO:local_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.6848
-INFO:local_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.6756
-INFO:local_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.7011
-INFO:local_logger:Epoch[098/800], Step[0000/0626], Avg Loss: 0.6899
-INFO:local_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6824
-INFO:master_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[098/800], Step[0100/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6826
-INFO:master_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[098/800], Step[0200/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6822
-INFO:master_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[098/800], Step[0300/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6827
-INFO:local_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6817
-INFO:master_logger:Epoch[098/800], Step[0400/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6823
-INFO:master_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[098/800], Step[0500/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6816
-INFO:master_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[098/800], Step[0600/0626], Avg Loss: 0.6820
-INFO:local_logger:----- Epoch[098/800], Train Loss: 0.6819, time: 870.20
-INFO:master_logger:----- Epoch[098/800], Train Loss: 0.6820, time: 870.20
-INFO:local_logger:----- Epoch[098/800], Train Loss: 0.6823, time: 875.16
-INFO:local_logger:Now training epoch 99. LR=0.000155
-INFO:local_logger:----- Epoch[098/800], Train Loss: 0.6820, time: 874.01
-INFO:local_logger:Now training epoch 99. LR=0.000155
-INFO:local_logger:----- Epoch[098/800], Train Loss: 0.6825, time: 874.00
-INFO:local_logger:Now training epoch 99. LR=0.000155
-INFO:local_logger:----- Epoch[098/800], Train Loss: 0.6817, time: 874.10
-INFO:local_logger:Now training epoch 99. LR=0.000155
-INFO:local_logger:----- Epoch[098/800], Train Loss: 0.6822, time: 874.02
-INFO:local_logger:Now training epoch 99. LR=0.000155
-INFO:local_logger:----- Epoch[098/800], Train Loss: 0.6815, time: 874.10
-INFO:local_logger:Now training epoch 99. LR=0.000155
-INFO:local_logger:----- Epoch[098/800], Train Loss: 0.6816, time: 874.02
-INFO:local_logger:Now training epoch 99. LR=0.000155
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-98-Loss-0.681889634827903.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-98-Loss-0.681889634827903.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-98-Loss-0.681889634827903.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-98-Loss-0.681889634827903.pdopt
-INFO:local_logger:Now training epoch 99. LR=0.000155
-INFO:master_logger:Now training epoch 99. LR=0.000155
-INFO:local_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6968
-INFO:local_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6870
-INFO:local_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6865
-INFO:master_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6853
-INFO:local_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6660
-INFO:local_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6719
-INFO:local_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6836
-INFO:local_logger:Epoch[099/800], Step[0000/0626], Avg Loss: 0.6839
-INFO:local_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6834
-INFO:local_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6826
-INFO:master_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[099/800], Step[0100/0626], Avg Loss: 0.6826
-INFO:local_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6828
-INFO:local_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6822
-INFO:master_logger:Epoch[099/800], Step[0200/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6811
-INFO:master_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[099/800], Step[0300/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6825
-INFO:local_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6816
-INFO:master_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[099/800], Step[0400/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6815
-INFO:master_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[099/800], Step[0500/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6817
-INFO:master_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[099/800], Step[0600/0626], Avg Loss: 0.6817
-INFO:local_logger:----- Epoch[099/800], Train Loss: 0.6811, time: 873.44
-INFO:local_logger:Now training epoch 100. LR=0.000155
-INFO:local_logger:----- Epoch[099/800], Train Loss: 0.6817, time: 874.12
-INFO:local_logger:Now training epoch 100. LR=0.000155
-INFO:local_logger:----- Epoch[099/800], Train Loss: 0.6814, time: 874.14
-INFO:local_logger:Now training epoch 100. LR=0.000155
-INFO:local_logger:----- Epoch[099/800], Train Loss: 0.6815, time: 874.31
-INFO:local_logger:Now training epoch 100. LR=0.000155
-INFO:local_logger:----- Epoch[099/800], Train Loss: 0.6816, time: 874.69
-INFO:local_logger:Now training epoch 100. LR=0.000155
-INFO:local_logger:----- Epoch[099/800], Train Loss: 0.6820, time: 874.75
-INFO:local_logger:Now training epoch 100. LR=0.000155
-INFO:local_logger:----- Epoch[099/800], Train Loss: 0.6814, time: 871.01
-INFO:master_logger:----- Epoch[099/800], Train Loss: 0.6816, time: 871.01
-INFO:local_logger:----- Epoch[099/800], Train Loss: 0.6820, time: 874.69
-INFO:local_logger:Now training epoch 100. LR=0.000155
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-99-Loss-0.6813920508235197.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-99-Loss-0.6813920508235197.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-99-Loss-0.6813920508235197.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-99-Loss-0.6813920508235197.pdopt
-INFO:local_logger:Now training epoch 100. LR=0.000155
-INFO:master_logger:Now training epoch 100. LR=0.000155
-INFO:local_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6874
-INFO:local_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6858
-INFO:local_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6763
-INFO:master_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6794
-INFO:local_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6727
-INFO:local_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6920
-INFO:local_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6764
-INFO:local_logger:Epoch[100/800], Step[0000/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6824
-INFO:local_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6801
-INFO:local_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6821
-INFO:master_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[100/800], Step[0100/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6804
-INFO:local_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6812
-INFO:master_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[100/800], Step[0200/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6812
-INFO:master_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[100/800], Step[0300/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6815
-INFO:master_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[100/800], Step[0400/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6812
-INFO:master_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[100/800], Step[0500/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6814
-INFO:master_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[100/800], Step[0600/0626], Avg Loss: 0.6814
-INFO:local_logger:----- Epoch[100/800], Train Loss: 0.6815, time: 871.01
-INFO:local_logger:Now training epoch 101. LR=0.000156
-INFO:local_logger:----- Epoch[100/800], Train Loss: 0.6814, time: 870.82
-INFO:local_logger:Now training epoch 101. LR=0.000156
-INFO:local_logger:----- Epoch[100/800], Train Loss: 0.6814, time: 871.44
-INFO:local_logger:Now training epoch 101. LR=0.000156
-INFO:local_logger:----- Epoch[100/800], Train Loss: 0.6812, time: 871.50
-INFO:local_logger:Now training epoch 101. LR=0.000156
-INFO:local_logger:----- Epoch[100/800], Train Loss: 0.6812, time: 867.05
-INFO:master_logger:----- Epoch[100/800], Train Loss: 0.6813, time: 867.05
-INFO:local_logger:----- Epoch[100/800], Train Loss: 0.6811, time: 872.20
-INFO:local_logger:Now training epoch 101. LR=0.000156
-INFO:local_logger:----- Epoch[100/800], Train Loss: 0.6814, time: 870.95
-INFO:local_logger:Now training epoch 101. LR=0.000156
-INFO:local_logger:----- Epoch[100/800], Train Loss: 0.6814, time: 871.00
-INFO:local_logger:Now training epoch 101. LR=0.000156
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-100-Loss-0.6812047341083004.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-100-Loss-0.6812047341083004.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-100-Loss-0.6812047341083004.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-100-Loss-0.6812047341083004.pdopt
-INFO:local_logger:Now training epoch 101. LR=0.000156
-INFO:master_logger:Now training epoch 101. LR=0.000156
-INFO:local_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6922
-INFO:local_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6755
-INFO:local_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6834
-INFO:master_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6772
-INFO:local_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6966
-INFO:local_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6690
-INFO:local_logger:Epoch[101/800], Step[0000/0626], Avg Loss: 0.6717
-INFO:local_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6798
-INFO:local_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6819
-INFO:master_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[101/800], Step[0100/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6813
-INFO:master_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[101/800], Step[0200/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6807
-INFO:master_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[101/800], Step[0300/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6804
-INFO:local_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6810
-INFO:master_logger:Epoch[101/800], Step[0400/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6810
-INFO:master_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[101/800], Step[0500/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6808
-INFO:master_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[101/800], Step[0600/0626], Avg Loss: 0.6809
-INFO:local_logger:----- Epoch[101/800], Train Loss: 0.6805, time: 867.69
-INFO:local_logger:Now training epoch 102. LR=0.000156
-INFO:local_logger:----- Epoch[101/800], Train Loss: 0.6806, time: 868.18
-INFO:local_logger:Now training epoch 102. LR=0.000156
-INFO:local_logger:----- Epoch[101/800], Train Loss: 0.6806, time: 868.58
-INFO:local_logger:Now training epoch 102. LR=0.000156
-INFO:local_logger:----- Epoch[101/800], Train Loss: 0.6808, time: 864.53
-INFO:master_logger:----- Epoch[101/800], Train Loss: 0.6808, time: 864.53
-INFO:local_logger:----- Epoch[101/800], Train Loss: 0.6807, time: 868.26
-INFO:local_logger:Now training epoch 102. LR=0.000156
-INFO:local_logger:----- Epoch[101/800], Train Loss: 0.6810, time: 868.29
-INFO:local_logger:Now training epoch 102. LR=0.000156
-INFO:local_logger:----- Epoch[101/800], Train Loss: 0.6811, time: 868.42
-INFO:local_logger:Now training epoch 102. LR=0.000156
-INFO:local_logger:----- Epoch[101/800], Train Loss: 0.6812, time: 868.25
-INFO:local_logger:Now training epoch 102. LR=0.000156
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-101-Loss-0.6808065795139766.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-101-Loss-0.6808065795139766.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-101-Loss-0.6808065795139766.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-101-Loss-0.6808065795139766.pdopt
-INFO:local_logger:Now training epoch 102. LR=0.000156
-INFO:master_logger:Now training epoch 102. LR=0.000156
-INFO:local_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6861
-INFO:master_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6794
-INFO:local_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6910
-INFO:local_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6753
-INFO:local_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6669
-INFO:local_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6722
-INFO:local_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6901
-INFO:local_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6797
-INFO:local_logger:Epoch[102/800], Step[0000/0626], Avg Loss: 0.6743
-INFO:local_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6830
-INFO:local_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6822
-INFO:local_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6813
-INFO:local_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6817
-INFO:master_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[102/800], Step[0100/0626], Avg Loss: 0.6804
-INFO:local_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6807
-INFO:master_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6819
-INFO:local_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[102/800], Step[0200/0626], Avg Loss: 0.6803
-INFO:local_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6814
-INFO:master_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6821
-INFO:local_logger:Epoch[102/800], Step[0300/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6816
-INFO:local_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6804
-INFO:local_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6810
-INFO:master_logger:Epoch[102/800], Step[0400/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6820
-INFO:local_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6811
-INFO:master_logger:Epoch[102/800], Step[0500/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6817
-INFO:local_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6810
-INFO:master_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6811
-INFO:local_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[102/800], Step[0600/0626], Avg Loss: 0.6813
-INFO:local_logger:----- Epoch[102/800], Train Loss: 0.6812, time: 872.04
-INFO:local_logger:Now training epoch 103. LR=0.000156
-INFO:local_logger:----- Epoch[102/800], Train Loss: 0.6812, time: 868.78
-INFO:master_logger:----- Epoch[102/800], Train Loss: 0.6811, time: 868.78
-INFO:local_logger:----- Epoch[102/800], Train Loss: 0.6809, time: 872.56
-INFO:local_logger:Now training epoch 103. LR=0.000156
-INFO:local_logger:----- Epoch[102/800], Train Loss: 0.6805, time: 872.77
-INFO:local_logger:Now training epoch 103. LR=0.000156
-INFO:local_logger:----- Epoch[102/800], Train Loss: 0.6808, time: 873.71
-INFO:local_logger:Now training epoch 103. LR=0.000156
-INFO:local_logger:----- Epoch[102/800], Train Loss: 0.6809, time: 873.10
-INFO:local_logger:Now training epoch 103. LR=0.000156
-INFO:local_logger:----- Epoch[102/800], Train Loss: 0.6815, time: 873.19
-INFO:local_logger:Now training epoch 103. LR=0.000156
-INFO:local_logger:----- Epoch[102/800], Train Loss: 0.6815, time: 873.09
-INFO:local_logger:Now training epoch 103. LR=0.000156
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-102-Loss-0.681159841605351.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-102-Loss-0.681159841605351.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-102-Loss-0.681159841605351.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-102-Loss-0.681159841605351.pdopt
-INFO:local_logger:Now training epoch 103. LR=0.000156
-INFO:master_logger:Now training epoch 103. LR=0.000156
-INFO:local_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6787
-INFO:master_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6646
-INFO:local_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6794
-INFO:local_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6850
-INFO:local_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6838
-INFO:local_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6909
-INFO:local_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6721
-INFO:local_logger:Epoch[103/800], Step[0000/0626], Avg Loss: 0.6917
-INFO:local_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6811
-INFO:master_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6829
-INFO:local_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6810
-INFO:local_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6795
-INFO:local_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6823
-INFO:local_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6818
-INFO:local_logger:Epoch[103/800], Step[0100/0626], Avg Loss: 0.6802
-INFO:local_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6799
-INFO:local_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6810
-INFO:master_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6815
-INFO:local_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6804
-INFO:local_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6797
-INFO:local_logger:Epoch[103/800], Step[0200/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6803
-INFO:local_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6799
-INFO:local_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6803
-INFO:local_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6801
-INFO:local_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6806
-INFO:master_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[103/800], Step[0300/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6802
-INFO:local_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6800
-INFO:local_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6805
-INFO:master_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6814
-INFO:local_logger:Epoch[103/800], Step[0400/0626], Avg Loss: 0.6803
-INFO:local_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6802
-INFO:local_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6812
-INFO:local_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6802
-INFO:local_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6807
-INFO:local_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6808
-INFO:local_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6804
-INFO:master_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[103/800], Step[0500/0626], Avg Loss: 0.6804
-INFO:local_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6806
-INFO:local_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6801
-INFO:local_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6802
-INFO:master_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6805
-INFO:local_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6802
-INFO:local_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6809
-INFO:local_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6802
-INFO:local_logger:Epoch[103/800], Step[0600/0626], Avg Loss: 0.6806
-INFO:local_logger:----- Epoch[103/800], Train Loss: 0.6801, time: 859.70
-INFO:local_logger:Now training epoch 104. LR=0.000156
-INFO:local_logger:----- Epoch[103/800], Train Loss: 0.6803, time: 859.89
-INFO:local_logger:Now training epoch 104. LR=0.000156
-INFO:local_logger:----- Epoch[103/800], Train Loss: 0.6805, time: 859.89
-INFO:local_logger:Now training epoch 104. LR=0.000156
-INFO:local_logger:----- Epoch[103/800], Train Loss: 0.6809, time: 856.80
-INFO:master_logger:----- Epoch[103/800], Train Loss: 0.6805, time: 856.80
-INFO:local_logger:----- Epoch[103/800], Train Loss: 0.6806, time: 860.28
-INFO:local_logger:Now training epoch 104. LR=0.000156
-INFO:local_logger:----- Epoch[103/800], Train Loss: 0.6801, time: 859.89
-INFO:local_logger:Now training epoch 104. LR=0.000156
-INFO:local_logger:----- Epoch[103/800], Train Loss: 0.6802, time: 859.91
-INFO:local_logger:Now training epoch 104. LR=0.000156
-INFO:local_logger:----- Epoch[103/800], Train Loss: 0.6809, time: 860.42
-INFO:local_logger:Now training epoch 104. LR=0.000156
-INFO:local_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-103-Loss-0.6808819352382769.pdparams
-INFO:local_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-103-Loss-0.6808819352382769.pdopt
-INFO:master_logger:----- Save model: ./output/train-20211219-17-07-40/MAE-Epoch-103-Loss-0.6808819352382769.pdparams
-INFO:master_logger:----- Save optim: ./output/train-20211219-17-07-40/MAE-Epoch-103-Loss-0.6808819352382769.pdopt
-INFO:local_logger:Now training epoch 104. LR=0.000156
-INFO:master_logger:Now training epoch 104. LR=0.000156
-INFO:local_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6583
-INFO:local_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6660
-INFO:local_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6799
-INFO:master_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6722
-INFO:local_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6668
-INFO:local_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6885
-INFO:local_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6790
-INFO:local_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6707
-INFO:local_logger:Epoch[104/800], Step[0000/0626], Avg Loss: 0.6680
-
-
---------------------------------------
-C++ Traceback (most recent call last):
---------------------------------------
-0 paddle::platform::GpuMemcpySync(void*, void const*, unsigned long, cudaMemcpyKind)
-
-----------------------
-Error Message Summary:
-----------------------
-FatalError: `Termination signal` is detected by the operating system.
- [TimeInfo: *** Aborted at 1639995159 (unix time) try "date -d @1639995159" if you are using GNU date ***]
- [SignalInfo: *** SIGTERM (@0x84e5) received by PID 25456 (TID 0x7f771efbe700) from PID 34021 ***]
-
-
-
---------------------------------------
-C++ Traceback (most recent call last):
---------------------------------------
-0 paddle::platform::GpuMemcpySync(void*, void const*, unsigned long, cudaMemcpyKind)
-
-----------------------
-Error Message Summary:
-----------------------
-FatalError: `Termination signal` is detected by the operating system.
- [TimeInfo: *** Aborted at 1639995171 (unix time) try "date -d @1639995171" if you are using GNU date ***]
- [SignalInfo: *** SIGTERM (@0x84e5) received by PID 25537 (TID 0x7fcf37fc6700) from PID 34021 ***]
-
-Traceback (most recent call last):
-  File "main_multi_gpu_pretrain.py", line 416, in <module>
- main()
- File "main_multi_gpu_pretrain.py", line 412, in main
- dist.spawn(main_worker, args=(config, dataset_train, ), nprocs=config.NGPUS)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 502, in spawn
- while not context.join():
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 312, in join
- self._throw_exception(error_index)
- File "/opt/conda/envs/py36/lib/python3.6/site-packages/paddle/distributed/spawn.py", line 320, in _throw_exception
- (error_index, name))
-Exception: Process 7 terminated with signal SIGTERM.
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 14 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 20 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 20 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 20 leaked semaphores to clean up at shutdown
- len(cache))
-/opt/conda/envs/py36/lib/python3.6/multiprocessing/semaphore_tracker.py:143: UserWarning: semaphore_tracker: There appear to be 20 leaked semaphores to clean up at shutdown
- len(cache))
diff --git a/image_classification/MAE/run_finetune.sh b/image_classification/MAE/run_finetune.sh
deleted file mode 100644
index c4d60575..00000000
--- a/image_classification/MAE/run_finetune.sh
+++ /dev/null
@@ -1,8 +0,0 @@
-CUDA_VISIBLE_DEVICES=0 \
-python main_single_gpu_finetune.py \
--cfg='./configs/vit_base_patch16_224_finetune.yaml' \
--dataset='imagenet2012' \
--batch_size=8 \
--data_path='/dataset/imagenet' \
--amp \
--pretrained='./output/train-20211203-14-42-46/MAE-Epoch-10-Loss-0'
diff --git a/image_classification/MAE/run_finetune_multi.sh b/image_classification/MAE/run_finetune_multi.sh
deleted file mode 100644
index 719a5cd1..00000000
--- a/image_classification/MAE/run_finetune_multi.sh
+++ /dev/null
@@ -1,7 +0,0 @@
-CUDA_VISIBLE_DEVICES=0,1 \
-python main_multi_gpu_finetune.py \
--cfg='./configs/vit_base_patch16_224_finetune.yaml' \
--dataset='imagenet2012' \
--batch_size=8 \
--data_path='/dataset/imagenet' \
--amp \
diff --git a/image_classification/MAE/run_pretrain.sh b/image_classification/MAE/run_pretrain.sh
deleted file mode 100644
index 8c5b1b7b..00000000
--- a/image_classification/MAE/run_pretrain.sh
+++ /dev/null
@@ -1,8 +0,0 @@
-CUDA_VISIBLE_DEVICES=0 \
-python main_single_gpu_pretrain.py \
--cfg='./configs/vit_base_patch16_224_pretrain.yaml' \
--dataset='imagenet2012' \
--batch_size=8 \
--data_path='/dataset/imagenet' \
--mae_pretrain \
-#-amp
diff --git a/image_classification/MAE/run_pretrain_multi.sh b/image_classification/MAE/run_pretrain_multi.sh
deleted file mode 100644
index 6fb6b864..00000000
--- a/image_classification/MAE/run_pretrain_multi.sh
+++ /dev/null
@@ -1,8 +0,0 @@
-CUDA_VISIBLE_DEVICES=0,1,2,3,4 \
-python main_multi_gpu_pretrain.py \
--cfg='./configs/vit_base_patch16_224_pretrain_dec1.yaml' \
--dataset='imagenet2012' \
--batch_size=8 \
--data_path='/dataset/imagenet' \
--mae_pretrain \
-#-amp
diff --git a/image_classification/MAE/run_pretrain_multi_resume.sh b/image_classification/MAE/run_pretrain_multi_resume.sh
deleted file mode 100644
index 1ff2fd94..00000000
--- a/image_classification/MAE/run_pretrain_multi_resume.sh
+++ /dev/null
@@ -1,10 +0,0 @@
-CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
-python main_multi_gpu_pretrain.py \
--cfg='./configs/vit_base_patch16_224_pretrain.yaml' \
--dataset='imagenet2012' \
--batch_size=256 \
--data_path='/dataset/imagenet' \
--resume='./output/train-20211210-08-41-14/MAE-Epoch-12-Loss-0.9377176860235059' \
--last_epoch=12 \
--mae_pretrain \
--amp
diff --git a/image_classification/MAE/stat_define.py b/image_classification/MAE/stat_define.py
deleted file mode 100644
index 963482d7..00000000
--- a/image_classification/MAE/stat_define.py
+++ /dev/null
@@ -1,61 +0,0 @@
-import os
-import glob
-import paddle
-from config import get_config
-from transformer import build_mae_pretrain as build_model
-
-def count_gelu(layer, inputs, output):
- activation_flops = 8
- x = inputs[0]
- num = x.numel()
- layer.total_ops += num * activation_flops
-
-
-def count_softmax(layer, inputs, output):
-    softmax_flops = 5 # max/subtract, exp, sum, divide
- x = inputs[0]
- num = x.numel()
- layer.total_ops += num * softmax_flops
-
-
-def count_layernorm(layer, inputs, output):
- layer_norm_flops = 5 # get mean (sum), get variance (square and sum), scale(multiply)
- x = inputs[0]
- num = x.numel()
- layer.total_ops += num * layer_norm_flops
-
-
-cfg = './configs/vit_large_patch32_384.yaml'
-#input_size = (1, 3, 224, 224)
-input_size = (1, 3, 384, 384)
-config = get_config(cfg)
-model = build_model(config)
-
-custom_ops = {paddle.nn.GELU: count_gelu,
- paddle.nn.LayerNorm: count_layernorm,
- paddle.nn.Softmax: count_softmax,
- }
-print(os.path.basename(cfg))
-paddle.flops(model,
- input_size=input_size,
- custom_ops=custom_ops,
- print_detail=False)
-
-
-#for cfg in glob.glob('./configs/*.yaml'):
-# #cfg = './configs/swin_base_patch4_window7_224.yaml'
-# input_size = (1, 3, int(cfg[-8:-5]), int(cfg[-8:-5]))
-# config = get_config(cfg)
-# model = build_model(config)
-#
-#
-# custom_ops = {paddle.nn.GELU: count_gelu,
-# paddle.nn.LayerNorm: count_layernorm,
-# paddle.nn.Softmax: count_softmax,
-# }
-# print(os.path.basename(cfg))
-# paddle.flops(model,
-# input_size=input_size,
-# custom_ops=custom_ops,
-# print_detail=False)
-# print('-----------')
diff --git a/image_classification/MAE/tests/__init__.py b/image_classification/MAE/tests/__init__.py
deleted file mode 100644
index 84952a81..00000000
--- a/image_classification/MAE/tests/__init__.py
+++ /dev/null
@@ -1 +0,0 @@
-# init
\ No newline at end of file
diff --git a/image_classification/MAE/tests/test_config.py b/image_classification/MAE/tests/test_config.py
deleted file mode 100644
index 6806e8a1..00000000
--- a/image_classification/MAE/tests/test_config.py
+++ /dev/null
@@ -1,72 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import unittest
-import argparse
-from config import update_config, get_config
-
-class ConfigTest(unittest.TestCase):
- def setUp(self):
- parser = argparse.ArgumentParser('')
- parser.add_argument('-cfg', type=str, default=None)
- parser.add_argument('-dataset', type=str, default="cifar10")
- parser.add_argument('-batch_size', type=int, default=128)
- parser.add_argument('-image_size', type=int, default=256)
- parser.add_argument('-ngpus', type=int, default=None)
- parser.add_argument('-data_path', type=str, default='/cifar10/')
- parser.add_argument('-eval', action='store_false') # enable eval
- parser.add_argument('-pretrained', type=str, default='pretrained')
- parser.add_argument('-resume', type=str, default=None)
- parser.add_argument('-last_epoch', type=int, default=None)
- self.args = parser.parse_args()
-
- def tearDown(self):
- pass
-
- def test_update_config(self):
- config = get_config()
- config = update_config(config, self.args)
-
- self.assertEqual(config.DATA.DATASET, 'cifar10')
- self.assertEqual(config.DATA.BATCH_SIZE, 128)
- self.assertEqual(config.DATA.IMAGE_SIZE, 256)
- self.assertEqual(config.DATA.DATA_PATH, '/cifar10/')
- self.assertEqual(config.EVAL, True)
- self.assertEqual(config.DATA.BATCH_SIZE_EVAL, 128)
- self.assertEqual(config.MODEL.PRETRAINED, 'pretrained')
-
- def test_update_config_from_file(self):
- config = get_config()
- self.args.cfg = './tests/test_config.yaml'
- self.args.image_size = None
- self.args.ngpus = None
- config = update_config(config, self.args)
-
- self.assertEqual(config.DATA.IMAGE_SIZE, 384)
- self.assertEqual(config.DATA.CROP_PCT, 1.0)
-
- self.assertEqual(config.MODEL.TRANS.PATCH_SIZE, 16)
- self.assertEqual(config.MODEL.TRANS.EMBED_DIM, 768)
- self.assertEqual(config.MODEL.TRANS.MLP_RATIO, 4.0)
- self.assertEqual(config.MODEL.TRANS.DEPTH, 12)
- self.assertEqual(config.MODEL.TRANS.NUM_HEADS, 12)
- self.assertEqual(config.MODEL.TRANS.QKV_BIAS, True)
-
- self.assertEqual(config.MODEL.NAME, 'vit_base_patch16_224')
- self.assertEqual(config.MODEL.TYPE, 'ViT')
-
- def test_get_config(self):
- config1 = get_config()
- config2 = get_config()
- self.assertEqual(config1, config2)
diff --git a/image_classification/MAE/tests/test_config.yaml b/image_classification/MAE/tests/test_config.yaml
deleted file mode 100644
index 19709906..00000000
--- a/image_classification/MAE/tests/test_config.yaml
+++ /dev/null
@@ -1,14 +0,0 @@
-DATA:
- IMAGE_SIZE: 384
- CROP_PCT: 1.0
-MODEL:
- TYPE: ViT
- NAME: vit_base_patch16_224
- TRANS:
- PATCH_SIZE: 16
- EMBED_DIM: 768
- MLP_RATIO: 4.0
- DEPTH: 12
- NUM_HEADS: 12
- QKV_BIAS: true
-
diff --git a/image_classification/MAE/tests/test_datasets.py b/image_classification/MAE/tests/test_datasets.py
deleted file mode 100644
index 79952137..00000000
--- a/image_classification/MAE/tests/test_datasets.py
+++ /dev/null
@@ -1,147 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import unittest
-import argparse
-import numpy as np
-import paddle
-import paddle.distributed as dist
-from config import *
-from datasets import *
-from paddle.io import DataLoader
-#from multiprocessing import SimpleQueue
-
-#paddle.set_device('cpu')
-
-class DatasetTest(unittest.TestCase):
- @classmethod
- def setUpClass(cls):
- parser = argparse.ArgumentParser('')
- parser.add_argument('-cfg', type=str, default=None)
- parser.add_argument('-dataset', type=str, default='imagenet2012')
- parser.add_argument('-batch_size', type=int, default=4)
- parser.add_argument('-image_size', type=int, default=224)
- parser.add_argument('-ngpus', type=int, default=None)
- parser.add_argument('-data_path', type=str, default='/dataset/imagenet')
- parser.add_argument('-eval', action='store_true')
- parser.add_argument('-pretrained', type=str, default=None)
- parser.add_argument('-resume', type=str, default=None)
- parser.add_argument('-last_epoch', type=int, default=None)
- cls.args = parser.parse_args()
- cls.config = get_config()
- cls.config = update_config(cls.config, cls.args)
-
- cls.dataset_train = get_dataset(DatasetTest.config, mode='train')
- cls.dataset_test = get_dataset(DatasetTest.config, mode='val')
-
- @classmethod
- def tearDown(cls):
- pass
-
- @unittest.skip('skip for debug')
- def test_shape(self):
- sample = next(iter(DatasetTest.dataset_train))
- self.assertEqual([3, 224, 224], sample[0].shape)
-
- sample = next(iter(DatasetTest.dataset_test))
- self.assertEqual([3, 224, 224], sample[0].shape)
-
- @unittest.skip('skip for debug')
- def test_scaling(self):
- sample = next(iter(DatasetTest.dataset_train))[0]
- self.assertTrue(paddle.any(sample < 0))
- self.assertTrue(paddle.any(sample > 0))
- self.assertGreaterEqual(1, sample.max().cpu().numpy())
- self.assertLessEqual(-1, sample.min().cpu().numpy())
-
- sample = next(iter(DatasetTest.dataset_test))[0]
- self.assertGreaterEqual(1, sample.max().cpu().numpy())
- self.assertLessEqual(-1, sample.min().cpu().numpy())
- self.assertTrue(paddle.any(sample < 0))
- self.assertTrue(paddle.any(sample > 0))
-
- @unittest.skip('skip for debug')
- def test_single_process_dataloader(self):
- self._test_loader(DatasetTest.dataset_train, 'train', False)
- self._test_loader(DatasetTest.dataset_test, 'test', False)
-
- def _test_loader(self, dataset, mode, multi_process):
- dataloader = get_dataloader(DatasetTest.config,
- dataset,
- mode=mode,
- multi_process=multi_process)
- for idx, _ in enumerate(dataloader):
- if idx > 0 and idx % 1 == 0:
- print(f'----- test single process dataloader: {idx}/{len(dataloader)}')
- if idx == 10:
- return
-
- @unittest.skip('skip for debug')
- def test_multi_process_dataloader(self):
- tester = Tester()
- tester.run()
- self.assertEqual(tester.n_samples, 50000)
-
-
-
-
-class Tester:
- def __init__(self):
- parser = argparse.ArgumentParser('')
- parser.add_argument('-cfg', type=str, default=None)
- parser.add_argument('-dataset', type=str, default='imagenet2012')
- parser.add_argument('-batch_size', type=int, default=256)
- parser.add_argument('-image_size', type=int, default=224)
- parser.add_argument('-data_path', type=str, default='/dataset/imagenet/')
- parser.add_argument('-eval', action='store_false') # set test batch size
- parser.add_argument('-pretrained', type=str, default=None)
- args = parser.parse_args()
- self.config = get_config()
- self.config = update_config(self.config, args)
- self.dataset_train = get_dataset(self.config, mode='train')
- self.dataset_test = get_dataset(self.config, mode='val')
- self.n_samples = 0
-
- def run(self, mode='test'):
- # https://github.com/PaddlePaddle/Paddle/blob/5d8e4395b61929627151f6fd4a607589288a78bf/python/paddle/distributed/spawn.py#L272
- context = dist.spawn(self.main_worker, args=(mode,))
- self.n_samples = context.return_queues[0].get()
- print(f'----- total samples: {self.n_samples}')
-
- def main_worker(self, *args):
- mode = args[0]
- dist.init_parallel_env()
- local_rank = dist.get_rank()
- if mode == 'train':
- n_samples = self._test_loader(self.config, self.dataset_train, 'train', True)
- else:
- n_samples = self._test_loader(self.config, self.dataset_test, 'test', True)
-
- n_samples = paddle.to_tensor(np.array([n_samples]))
- dist.reduce(n_samples, 0)
- if local_rank == 0:
- return n_samples.cpu().numpy()
-
-
- def _test_loader(self, config, dataset, mode, multi_process):
- n_samples = 0
- dataloader = get_dataloader(config,
- dataset,
- mode=mode,
- multi_process=multi_process)
- local_rank = dist.get_rank()
- for idx, data in enumerate(dataloader):
- if idx > 0 and idx % 1 == 0:
- print(f'----- test single process({local_rank}) dataloader: {idx}/{len(dataloader)}')
- #print(local_rank, data[1])
- n_samples += data[0].shape[0]
-
- return n_samples
diff --git a/image_classification/MAE/tests/test_transformer.py b/image_classification/MAE/tests/test_transformer.py
deleted file mode 100644
index bbfefc49..00000000
--- a/image_classification/MAE/tests/test_transformer.py
+++ /dev/null
@@ -1,115 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import unittest
-import numpy as np
-import paddle
-import paddle.nn as nn
-import paddle.nn.functional as F
-from config import *
-from transformer import build_mae_pretrain
-from transformer import PatchEmbedding
-from transformer import Attention
-from transformer import Mlp
-from transformer import Encoder
-
-
-class TransformerTest(unittest.TestCase):
- @classmethod
- def setUpClass(cls):
- paddle.set_device('cpu')
- cls.config = get_config()
- cls.dummy_img = np.random.randn(4, 3, 224, 224).astype('float32')
- cls.dummy_tensor = paddle.to_tensor(cls.dummy_img)
- cls.mae = build_mae_pretrain(cls.config)
- cls.mae.train()
-
- @classmethod
- def tearDown(cls):
- pass
-
- # @unittest.skip('skip for debug')
- def test_out_shape(self):
- reconstruct, mask = TransformerTest.mae(TransformerTest.dummy_tensor)
- self.assertEqual(reconstruct.shape, [4, 49, 768])
- self.assertEqual(mask.shape, [4, 49, 768])
-
- @unittest.skip('skip for debug')
- def test_all_parameters_updated(self):
- optim = paddle.optimizer.SGD(parameters=TransformerTest.mae.parameters(), learning_rate=0.1)
- reconstruct, masked_image = TransformerTest.mae(TransformerTest.dummy_tensor)
- loss = F.mse_loss(reconstruct, masked_image)
- loss.backward()
-
- for name, param in TransformerTest.mae.named_parameters():
- if not param.stop_gradient:
- self.assertIsNotNone(param.gradient())
- # self.assertNotEqual(0, np.sum(param.gradient() ** 2))
-
- # @unittest.skip('skip for debug')
- def test_embeddings(self):
- embed = PatchEmbedding()
- dummy_img = np.random.randn(4, 3, 224, 224).astype('float32')
- dummy_tensor = paddle.to_tensor(dummy_img)
-
- patch_out = embed.patch_embedding(dummy_tensor)
- embed_out = embed(dummy_tensor)
- self.assertEqual(patch_out.shape, [4, 768, 14, 14])
- self.assertEqual(embed.cls_token.shape, [1, 1, 768])
- self.assertEqual(embed_out.shape, [4, 14 * 14 + 1, 768])
-
- # @unittest.skip('skip for debug')
- def test_attention(self):
- attn_op = Attention(
- TransformerTest.config.MODEL.TRANS.ENCODER.EMBED_DIM,
- TransformerTest.config.MODEL.TRANS.ENCODER.NUM_HEADS,
- TransformerTest.config.MODEL.TRANS.QKV_BIAS)
- dummy_img = np.random.randn(4, 50, 768).astype('float32')
- dummy_tensor = paddle.to_tensor(dummy_img)
-
- out, attn = attn_op(dummy_tensor)
- self.assertEqual(attn.shape, [4, 12, 50, 50])
- self.assertEqual(out.shape, [4, 50, 768])
-
- def test_mlp(self):
- mlp_op = Mlp(
- TransformerTest.config.MODEL.TRANS.ENCODER.EMBED_DIM,
- TransformerTest.config.MODEL.TRANS.MLP_RATIO)
- dummy_img = np.random.randn(4, 50, 768).astype('float32')
- dummy_tensor = paddle.to_tensor(dummy_img)
-
- out = mlp_op(dummy_tensor)
- self.assertEqual(out.shape, [4, 50, 768])
-
- def test_position_embedding_not_update(self):
- origin = TransformerTest.mae.position_embedding.get_encoder_embedding().clone()
- optim = paddle.optimizer.SGD(parameters=TransformerTest.mae.parameters(), learning_rate=0.1)
- reconstruct, masked_image = TransformerTest.mae(TransformerTest.dummy_tensor)
- loss = F.mse_loss(reconstruct, masked_image)
- loss.backward()
- optim.step()
- update = TransformerTest.mae.position_embedding.get_encoder_embedding().clone()
- self.assertTrue((origin.numpy() == update.numpy()).all())
-
- def test_encoder(self):
- encoder_op = Encoder(
- TransformerTest.config.MODEL.TRANS.ENCODER.EMBED_DIM,
- TransformerTest.config.MODEL.TRANS.ENCODER.NUM_HEADS,
- TransformerTest.config.MODEL.TRANS.ENCODER.DEPTH,
- )
- dummy_img = np.random.randn(4, 50, 768).astype('float32')
- dummy_tensor = paddle.to_tensor(dummy_img)
-
- out, _ = encoder_op(dummy_tensor)
- self.assertEqual(out.shape, [4, 50, 768])
diff --git a/image_classification/MAE/tests/test_utils.py b/image_classification/MAE/tests/test_utils.py
deleted file mode 100644
index 49366af4..00000000
--- a/image_classification/MAE/tests/test_utils.py
+++ /dev/null
@@ -1,90 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-import unittest
-import paddle
-import paddle.nn as nn
-from utils import AverageMeter
-from utils import WarmupCosineScheduler
-from utils import get_exclude_from_weight_decay_fn
-
-
-class UtilTest(unittest.TestCase):
- @classmethod
- def setUpClass(cls):
- pass
-
- @classmethod
- def tearDown(cls):
- pass
-
- def test_average_meter(self):
- meter = AverageMeter()
- for i in range(1, 101):
- meter.update(i, 1)
- self.assertEqual(meter.avg, 50.5)
-
- def test_warmup_cosine_scheduler(self):
- sch = WarmupCosineScheduler(learning_rate=0.1,
- warmup_start_lr=1e-5,
- start_lr=0.1,
- end_lr=0.0,
- warmup_epochs=10,
- total_epochs=100,
- last_epoch=-1)
- lrs = []
- for epoch in range(100):
- lr = sch.get_lr()
- lrs.append(lr)
- sch.step()
- lrs.append(sch.get_lr())
-
- self.assertEqual(lrs[0], 1e-5)
- self.assertEqual(lrs[10], 0.1)
- self.assertEqual(lrs[-1], 0.0)
- self.assertGreaterEqual(min(lrs[0:10]), 1e-5)
- self.assertLessEqual(max(lrs[0:10]), 0.1)
- self.assertGreaterEqual(min(lrs[10::]), 0.0)
- self.assertLessEqual(max(lrs[10::]), 0.1)
-
- def test_warmup_cosine_scheduler_last_epoch(self):
- sch = WarmupCosineScheduler(learning_rate=0.1,
- warmup_start_lr=1e-5,
- start_lr=0.1,
- end_lr=0.0,
- warmup_epochs=10,
- total_epochs=100,
- last_epoch=9)
- lrs = []
- for epoch in range(10, 100):
- lr = sch.get_lr()
- lrs.append(lr)
- sch.step()
- lrs.append(sch.get_lr())
-
- self.assertEqual(lrs[0], 0.1)
- self.assertEqual(lrs[-1], 0.0)
- self.assertGreaterEqual(min(lrs[::]), 0.0)
- self.assertLessEqual(max(lrs[::]), 0.1)
-
- def test_get_exclude_from_weight_decay_fn(self):
- model = nn.Linear(10, 100, bias_attr=True)
- exclude_list = ['bias']
- fn = get_exclude_from_weight_decay_fn(exclude_list)
- # should return false if name in exclude_list
- for name, param in model.named_parameters():
- if name.endswith('weight'):
- self.assertTrue(fn(name))
- elif name.endswith('bias'):
- self.assertFalse(fn(name))
diff --git a/image_classification/MAE/utils.py b/image_classification/MAE/utils.py
deleted file mode 100644
index 44800527..00000000
--- a/image_classification/MAE/utils.py
+++ /dev/null
@@ -1,120 +0,0 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-"""utils for ViT
-
-Contains AverageMeter for monitoring, get_exclude_from_decay_fn for training
-and WarmupCosineScheduler for training
-
-"""
-
-import math
-from paddle.optimizer.lr import LRScheduler
-
-
-class AverageMeter():
- """ Meter for monitoring losses"""
- def __init__(self):
- self.avg = 0
- self.sum = 0
- self.cnt = 0
- self.reset()
-
- def reset(self):
- """reset all values to zeros"""
- self.avg = 0
- self.sum = 0
- self.cnt = 0
-
- def update(self, val, n=1):
- """update avg by val and n, where val is the avg of n values"""
- self.sum += val * n
- self.cnt += n
- self.avg = self.sum / self.cnt
-
-
-
-def get_exclude_from_weight_decay_fn(exclude_list=[]):
- """ Set params with no weight decay during the training
-
- For certain params, e.g., positional encoding in ViT, weight decay
- may not needed during the learning, this method is used to find
- these params.
-
- Args:
- exclude_list: a list of params names which need to exclude
- from weight decay.
- Returns:
- exclude_from_weight_decay_fn: a function returns True if param
- will be excluded from weight decay
- """
- if len(exclude_list) == 0:
- exclude_from_weight_decay_fn = None
- else:
- def exclude_fn(param):
- for name in exclude_list:
- if param.endswith(name):
- return False
- return True
- exclude_from_weight_decay_fn = exclude_fn
- return exclude_from_weight_decay_fn
-
-
-class WarmupCosineScheduler(LRScheduler):
- """Warmup Cosine Scheduler
-
- First apply linear warmup, then apply cosine decay schedule.
- Linearly increase learning rate from "warmup_start_lr" to "start_lr" over "warmup_epochs"
- Cosinely decrease learning rate from "start_lr" to "end_lr" over remaining
- "total_epochs - warmup_epochs"
-
- Attributes:
- learning_rate: the starting learning rate (without warmup), not used here!
- warmup_start_lr: warmup starting learning rate
- start_lr: the starting learning rate (without warmup)
- end_lr: the ending learning rate after whole loop
- warmup_epochs: # of epochs for warmup
- total_epochs: # of total epochs (include warmup)
- """
- def __init__(self,
- learning_rate,
- warmup_start_lr,
- start_lr,
- end_lr,
- warmup_epochs,
- total_epochs,
- cycles=0.5,
- last_epoch=-1,
- verbose=False):
- """init WarmupCosineScheduler """
- self.warmup_epochs = warmup_epochs
- self.total_epochs = total_epochs
- self.warmup_start_lr = warmup_start_lr
- self.start_lr = start_lr
- self.end_lr = end_lr
- self.cycles = cycles
- super(WarmupCosineScheduler, self).__init__(learning_rate, last_epoch, verbose)
-
- def get_lr(self):
- """ return lr value """
- if self.last_epoch < self.warmup_epochs:
- val = (self.start_lr - self.warmup_start_lr) * float(
- self.last_epoch)/float(self.warmup_epochs) + self.warmup_start_lr
- return val
-
- progress = float(self.last_epoch - self.warmup_epochs) / float(
- max(1, self.total_epochs - self.warmup_epochs))
- val = max(0.0, 0.5 * (1. + math.cos(math.pi * float(self.cycles) * 2.0 * progress)))
- val = max(0.0, val * (self.start_lr - self.end_lr) + self.end_lr)
- return val
diff --git a/image_classification/README.md b/image_classification/README.md
index 025a21f3..2d23bfe5 100644
--- a/image_classification/README.md
+++ b/image_classification/README.md
@@ -6,6 +6,7 @@ PaddlePaddle training/validation code and pretrained models for **Image Classifi
This implementation is part of [PaddleViT](https://github.com/BR-IDL/PaddleViT.git) project.
## Update
+* Update (2022-02-14): Add imagenet train_list.txt and val_list.txt links.
* Update (2021-12-30): Add MobileViT model and multi scale sampler.
* Update (2021-12-28): Add HvT model.
* Update (2021-12-24): Add CvT model.
@@ -78,6 +79,8 @@ cd PaddleViT/image_classification
ImageNet2012 dataset is used in the following folder structure:
```
│imagenet/
+├──train_list.txt
+├──val_list.txt
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
@@ -91,6 +94,10 @@ ImageNet2012 dataset is used in the following folder structure:
│ │ ├── ......
│ ├── ......
```
+- `train_list.txt`: list of relative paths and labels of training images. You can download it from: [google](https://drive.google.com/file/d/10YGzx_aO3IYjBOhInKT_gY6p0mC3beaC/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1G5xYPczfs9koDb7rM4c0lA?pwd=a4vm)(a4vm)
+- `val_list.txt`: list of relative paths and labels of validation images. You can download it from: [google](https://drive.google.com/file/d/1aXHu0svock6MJSur4-FKjW0nyjiJaWHE/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1TFGda7uBZjR7g-A6YjQo-g?pwd=kdga)(kdga)
+
+
### Demo Example
To use the model with pretrained weights, go to the specific subfolder, then download the `.pdparam` weight file and change related file paths in the following python scripts. The model config files are located in `./configs/`.
diff --git a/image_classification/README_cn.md b/image_classification/README_cn.md
index 4bf06982..6711855d 100644
--- a/image_classification/README_cn.md
+++ b/image_classification/README_cn.md
@@ -6,6 +6,7 @@ PaddlePaddle用于图像分类的训练/评估代码和预训练模型。
此实现是 [PaddleViT](https://github.com/BR-IDL/PaddleViT.git) 项目的一部分.
## 更新
+* 更新 (2022-02-14): 添加 imagenet1k 的 train_list.txt 和 val_list.txt 链接
* 更新 (2021-12-30): 添加 MobileViT 模型和 multi scale sampler.
* 更新 (2021-12-28): 添加 HvT 模型.
* 更新 (2021-12-24): 添加 CvT 模型.
@@ -74,9 +75,11 @@ cd PaddleViT/image_classification
## 基本用法
### 数据准备
-ImageNet2012 数据集用于以下文件结构:
+ImageNet2012 数据集使用以下的格式存储:
```
│imagenet/
+├──train_list.txt
+├──val_list.txt
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
@@ -90,6 +93,9 @@ ImageNet2012 数据集用于以下文件结构:
│ │ ├── ......
│ ├── ......
```
+- `train_list.txt`: 训练集图片的相对路径和标签。下载链接: [google](https://drive.google.com/file/d/10YGzx_aO3IYjBOhInKT_gY6p0mC3beaC/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1G5xYPczfs9koDb7rM4c0lA?pwd=a4vm)(a4vm)
+- `val_list.txt`: 验证集图片的相对路径和标签。下载链接: [google](https://drive.google.com/file/d/1aXHu0svock6MJSur4-FKjW0nyjiJaWHE/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1TFGda7uBZjR7g-A6YjQo-g?pwd=kdga)(kdga)
+
### Demo 示例
如果需要使用具有预训练权重的模型,请转到特定子文件夹,然后下载 `.pdparam` 权重文件,并在以下python脚本中更改相关文件路径,模型配置文件位于 `./configs/`.
diff --git a/self_supervised_learning/MAE/README.md b/self_supervised_learning/MAE/README.md
new file mode 100644
index 00000000..216e757e
--- /dev/null
+++ b/self_supervised_learning/MAE/README.md
@@ -0,0 +1,180 @@
+# Masked Autoencoders Are Scalable Vision Learners, [arxiv](https://arxiv.org/abs/2111.06377)
+
+PaddlePaddle training/validation code and pretrained models for **MAE**.
+
+The official pytorch implementation is [here](https://github.com/facebookresearch/mae).
+
+This implementation is developed by [PaddleViT](https://github.com/BR-IDL/PaddleViT.git).
+
+*(figure: MAE Model Overview)*
+
+
+### Update
+- Update (2022-03-02): Code is refactored and bugs are fixed.
+- Update (2022-02-15): Code is refactored and ported weights are uploaded.
+- Update (2021-12-13): Code is released.
+
+## Note:
+The current version requires an extra package to be installed: `paddlenlp`.
+You can use the following command to install paddlenlp:
+```shell
+pip install paddlenlp
+```
+> Note: we use paddlenlp because we found that the AdamW optimizer in paddle does not handle layer-wise learning rate decay properly, while `paddlenlp.ops.optimizer.AdamWLD` works well in our case, so we import this op as a temporary fix.
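+
+The snippet below is only a minimal sketch of the layer-wise decay idea (each transformer block gets a learning-rate scale of `decay ** (n_layers + 1 - depth)`); the helper name and the way depths are parsed from parameter names are illustrative assumptions, not the actual PaddleViT implementation:
+
+```python
+def layerwise_lr_scale(param_name, n_layers=12, decay=0.75):
+    """Illustrative only: map a parameter name to a learning-rate scale."""
+    if any(k in param_name for k in ('patch_embed', 'pos_embed', 'cls_token')):
+        depth = 0                      # input embeddings decay the most
+    elif 'blocks.' in param_name:
+        depth = int(param_name.split('blocks.')[1].split('.')[0]) + 1
+    else:
+        depth = n_layers + 1           # e.g. norm/head parameters, no extra decay
+    return decay ** (n_layers + 1 - depth)
+```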
+
+
+## Models Zoo
+| Finetuned Model | Acc@1 | Acc@5 | #Params | FLOPs | Image Size | Crop_pct | Interpolation | Link |
+|-------------------------------|-------|-------|---------|--------|------------|----------|---------------|--------------|
+| mae_finetuned_vit_base | 83.72 | 96.54 | 86.4M | 17.0G | 224 | 0.875 | bicubic | [google](https://drive.google.com/file/d/1txV3fWnu_Jr17tCCqk9e_pFeuh7GkmvU/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1rIV2lYHEIYhD0ScTxmMi5A?pwd=svaw)(svaw) |
+| mae_finetuned_vit_large | 85.95 | 97.57 | 304.1M | 59.9G | 224 | 0.875 | bicubic | [google](https://drive.google.com/file/d/1dzVWxQ0_XTKqKKpA3pSSVU57rT_g8nOe/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1zlqmA-_fqCNZiuKOPMTtQA?pwd=tp48)(tp48) |
+| mae_finetuned_vit_huge | 86.90 | 98.07 | 631.7M | 162.5G | 224 | 0.875 | bicubic | [google](https://drive.google.com/file/d/1xqjdPez4uG495w3akVbHbn4YqUB1Nmmk/view?usp=sharing)/[baidu](https://pan.baidu.com/s/17z-NK-akSlvYJSRZkUU2CQ?pwd=1fds)(1fds) |
+> *The results are evaluated on ImageNet2012 validation set.
+
+| Pretrained Model | Link |
+|-------------------------------|--------------|
+| mae_pretrain_vit_base | [google](https://drive.google.com/file/d/1K7ZEaDj1D56i7uTX46hSelf0Ydbpmtie/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1aFdDhA61-5lB9g6LoAlKoQ?pwd=3fu3)(3fu3) |
+| mae_pretrain_vit_large | [google](https://drive.google.com/file/d/1UagT3mz_cLHcjyIQfyyLOkXtJXda3UbS/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1UIZuA_3uk5v-AHX41rjd0A?pwd=9c3s)(9c3s) |
+| mae_pretrain_vit_huge | [google](https://drive.google.com/file/d/1Y1lIO_COL2vkz2YvrmYt2yI8iAiRNiPh/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1XN-WkiiICqQUXcmv44PUxw?pwd=vc42)(vc42) |
+
+> Note: the current model weights are ported from the official repo to paddle; our trained model weights are coming soon.
+
+## Notebooks
+We provide a few notebooks in aistudio to help you get started:
+
+**\*(coming soon)\***
+
+
+## Requirements
+- Python>=3.6
+- yaml>=0.2.5
+- [PaddlePaddle](https://www.paddlepaddle.org.cn/documentation/docs/en/install/index_en.html)>=2.2.0
+- [yacs](https://github.com/rbgirshick/yacs)>=0.1.8
+
+## Data
+ImageNet2012 dataset is used in the following folder structure:
+```
+│imagenet/
+├──train_list.txt
+├──val_list.txt
+├──train/
+│ ├── n01440764
+│ │ ├── n01440764_10026.JPEG
+│ │ ├── n01440764_10027.JPEG
+│ │ ├── ......
+│ ├── ......
+├──val/
+│ ├── n01440764
+│ │ ├── ILSVRC2012_val_00000293.JPEG
+│ │ ├── ILSVRC2012_val_00002138.JPEG
+│ │ ├── ......
+│ ├── ......
+```
+- `train_list.txt`: list of relative paths and labels of training images. You can download it from: [google](https://drive.google.com/file/d/10YGzx_aO3IYjBOhInKT_gY6p0mC3beaC/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1G5xYPczfs9koDb7rM4c0lA?pwd=a4vm)(a4vm)
+- `val_list.txt`: list of relative paths and labels of validation images. You can download it from: [google](https://drive.google.com/file/d/1aXHu0svock6MJSur4-FKjW0nyjiJaWHE/view?usp=sharing)/[baidu](https://pan.baidu.com/s/1TFGda7uBZjR7g-A6YjQo-g?pwd=kdga)(kdga)
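+
+> For reference, each line in these list files is expected to pair an image path with an integer class label; the exact relative-path layout shown below is an illustrative assumption, not a guaranteed format:
+```
+train/n01440764/n01440764_10026.JPEG 0
+train/n01440764/n01440764_10027.JPEG 0
+```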
+
+## Usage
+To use the model with pretrained weights, download the `.pdparams` weight file and change related file paths in the following python scripts. The model config files are located in `./configs/`.
+
+For example, assuming the downloaded weight file is stored in `./vit_base_patch16_224.pdparams`, you can use the `vit_base_patch16_224` model in python as follows:
+```python
+import paddle
+from config import get_config
+from transformer import build_transformer as build_model
+# config files in ./configs/
+config = get_config('./configs/vit_base_patch16_224.yaml')
+# build model
+model = build_model(config)
+# load pretrained weights from the .pdparams file
+model_state_dict = paddle.load('./vit_base_patch16_224.pdparams')
+model.set_state_dict(model_state_dict)
+```
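+
+As a quick sanity check (illustrative sketch; it assumes the 224x224 finetune config above with a standard 1000-class ImageNet head), you can run a forward pass on a random batch:
+
+```python
+import paddle
+
+model.eval()
+dummy = paddle.randn([1, 3, 224, 224])  # one NCHW image-sized batch
+with paddle.no_grad():
+    logits = model(dummy)
+print(logits.shape)  # expect [1, 1000] if the head has 1000 classes
+```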
+
+## Evaluation
+To evaluate ViT model performance on ImageNet2012, run the following script from the command line:
+```shell
+sh run_eval_multi.sh
+```
+or
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3 \
+python main_multi_gpu_finetune.py \
+ -cfg='./configs/vit_base_patch16_224_finetune.yaml' \
+ -dataset='imagenet2012' \
+ -batch_size=32 \
+ -data_path='/dataset/imagenet' \
+ -eval \
+ -pretrained='./mae_finetuned_vit_base'
+```
+
+
+## Finetuning
+To finetune the ViT model on ImageNet2012, run the following script from the command line:
+
+```shell
+sh run_finetune_multi.sh
+```
+or
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+python main_multi_gpu_finetune.py \
+ -cfg='./configs/vit_base_patch16_224_finetune.yaml' \
+ -dataset='imagenet2012' \
+ -batch_size=32 \
+ -data_path='/dataset/imagenet' \
+  -pretrained='./mae_pretrain_vit_base' \
+ -amp
+```
+
+## Linear probing
+To finetune (linear probe) the ViT model on ImageNet2012, run the following script from the command line:
+
+```shell
+sh run_linear_probe_multi.sh
+```
+or
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+python main_multi_gpu_linearprobe.py \
+ -cfg='./configs/vit_base_patch16_224_linearprobe.yaml' \
+ -dataset='imagenet2012' \
+ -batch_size=32 \
+ -data_path='/dataset/imagenet' \
+  -pretrained='./mae_pretrain_vit_base' \
+ -amp
+```
+
+## Pretraining
+To pretrain the ViT model on ImageNet2012, run the following script from the command line:
+
+```shell
+sh run_pretrain_multi.sh
+```
+or
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+python main_multi_gpu_pretrain.py \
+-cfg='./configs/vit_base_patch16_224_pretrain.yaml' \
+-dataset='imagenet2012' \
+-batch_size=32 \
+-data_path='/dataset/imagenet' \
+-amp
+```
+
+> Note: it is recommended to train the MAE model on multi-node GPUs.
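+
+If you need to resume an interrupted pretraining run, the `-resume` and `-last_epoch` options (as used by the earlier `run_pretrain_multi_resume.sh` script) can be added to the same command; the checkpoint path and epoch number below are placeholders:
+
+```shell
+CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+python main_multi_gpu_pretrain.py \
+-cfg='./configs/vit_base_patch16_224_pretrain.yaml' \
+-dataset='imagenet2012' \
+-batch_size=32 \
+-data_path='/dataset/imagenet' \
+-resume='./output/your-train-dir/MAE-Epoch-100-Loss-x.xxxx' \
+-last_epoch=100 \
+-amp
+```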
+
+## Visualization Attention Map
+**(coming soon)**
+
+## Reference
+```
+@Article{MaskedAutoencoders2021,
+ author = {Kaiming He and Xinlei Chen and Saining Xie and Yanghao Li and Piotr Doll{\'a}r and Ross Girshick},
+ journal = {arXiv:2111.06377},
+ title = {Masked Autoencoders Are Scalable Vision Learners},
+ year = {2021},
+}
+```
diff --git a/self_supervised_learning/MAE/augment.py b/self_supervised_learning/MAE/augment.py
new file mode 100644
index 00000000..51b41090
--- /dev/null
+++ b/self_supervised_learning/MAE/augment.py
@@ -0,0 +1,506 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Augmentation
+RandAug:
+- reference: RandAugment: Practical automated data augmentation with a reduced search space
+- https://arxiv.org/abs/1909.13719
+AutoAug:
+- reference: AutoAugment: Learning Augmentation Policies from Data
+- https://arxiv.org/abs/1805.09501
+"""
+
+import random
+import numpy as np
+from PIL import Image, ImageEnhance, ImageOps
+
+LEVEL_DENOM = 10
+# fill color is set to 128 instead of image mean
+
+
+def auto_augment_policy_v0():
+ """policy v0: hack from timm"""
+ # ImageNet v0 policy from TPU EfficientNet impl, cannot find a paper reference.
+ policy = [
+ [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
+ [('Color', 0.4, 9), ('Equalize', 0.6, 3)],
+ [('Color', 0.4, 1), ('Rotate', 0.6, 8)],
+ [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
+ [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
+ [('Color', 0.2, 0), ('Equalize', 0.8, 8)],
+ [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
+ [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
+ [('Color', 0.6, 1), ('Equalize', 1.0, 2)],
+ [('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
+ [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
+ [('Color', 0.4, 7), ('Equalize', 0.6, 0)],
+ [('Posterize', 0.4, 6), ('AutoContrast', 0.4, 7)],
+ [('Solarize', 0.6, 8), ('Color', 0.6, 9)],
+ [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
+ [('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
+ [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
+ [('ShearY', 0.8, 0), ('Color', 0.6, 4)],
+ [('Color', 1.0, 0), ('Rotate', 0.6, 2)],
+ [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
+ [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
+ [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
+ [('Posterize', 0.8, 2), ('Solarize', 0.6, 10)],
+ [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
+ [('Color', 0.8, 6), ('Rotate', 0.4, 5)],
+ ]
+ policy = [[SubPolicy(*args) for args in subpolicy] for subpolicy in policy]
+ return policy
+
+
+def auto_augment_policy_v0r():
+ """policy v0r: hack from timm"""
+ # ImageNet v0 policy from TPU EfficientNet impl, with variation of Posterize used
+ # in Google research implementation (number of bits discarded increases with magnitude)
+ policy = [
+ [('Equalize', 0.8, 1), ('ShearY', 0.8, 4)],
+ [('Color', 0.4, 9), ('Equalize', 0.6, 3)],
+ [('Color', 0.4, 1), ('Rotate', 0.6, 8)],
+ [('Solarize', 0.8, 3), ('Equalize', 0.4, 7)],
+ [('Solarize', 0.4, 2), ('Solarize', 0.6, 2)],
+ [('Color', 0.2, 0), ('Equalize', 0.8, 8)],
+ [('Equalize', 0.4, 8), ('SolarizeAdd', 0.8, 3)],
+ [('ShearX', 0.2, 9), ('Rotate', 0.6, 8)],
+ [('Color', 0.6, 1), ('Equalize', 1.0, 2)],
+ [('Invert', 0.4, 9), ('Rotate', 0.6, 0)],
+ [('Equalize', 1.0, 9), ('ShearY', 0.6, 3)],
+ [('Color', 0.4, 7), ('Equalize', 0.6, 0)],
+ [('PosterizeIncreasing', 0.4, 6), ('AutoContrast', 0.4, 7)],
+ [('Solarize', 0.6, 8), ('Color', 0.6, 9)],
+ [('Solarize', 0.2, 4), ('Rotate', 0.8, 9)],
+ [('Rotate', 1.0, 7), ('TranslateYRel', 0.8, 9)],
+ [('ShearX', 0.0, 0), ('Solarize', 0.8, 4)],
+ [('ShearY', 0.8, 0), ('Color', 0.6, 4)],
+ [('Color', 1.0, 0), ('Rotate', 0.6, 2)],
+ [('Equalize', 0.8, 4), ('Equalize', 0.0, 8)],
+ [('Equalize', 1.0, 4), ('AutoContrast', 0.6, 2)],
+ [('ShearY', 0.4, 7), ('SolarizeAdd', 0.6, 7)],
+ [('PosterizeIncreasing', 0.8, 2), ('Solarize', 0.6, 10)],
+ [('Solarize', 0.6, 8), ('Equalize', 0.6, 1)],
+ [('Color', 0.8, 6), ('Rotate', 0.4, 5)],
+ ]
+ policy = [[SubPolicy(*args) for args in subpolicy] for subpolicy in policy]
+ return policy
+
+
+def auto_augment_policy_originalr():
+ """policy originalr: hack from timm"""
+ # ImageNet policy from https://arxiv.org/abs/1805.09501 with research posterize variation
+ policy = [
+ [('PosterizeIncreasing', 0.4, 8), ('Rotate', 0.6, 9)],
+ [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
+ [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
+ [('PosterizeIncreasing', 0.6, 7), ('PosterizeIncreasing', 0.6, 6)],
+ [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
+ [('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
+ [('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
+ [('PosterizeIncreasing', 0.8, 5), ('Equalize', 1.0, 2)],
+ [('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
+ [('Equalize', 0.6, 8), ('PosterizeIncreasing', 0.4, 6)],
+ [('Rotate', 0.8, 8), ('Color', 0.4, 0)],
+ [('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
+ [('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
+ [('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
+ [('Color', 0.6, 4), ('Contrast', 1.0, 8)],
+ [('Rotate', 0.8, 8), ('Color', 1.0, 2)],
+ [('Color', 0.8, 8), ('Solarize', 0.8, 7)],
+ [('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
+ [('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
+ [('Color', 0.4, 0), ('Equalize', 0.6, 3)],
+ [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
+ [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
+ [('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
+ [('Color', 0.6, 4), ('Contrast', 1.0, 8)],
+ [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
+ ]
+ policy = [[SubPolicy(*args) for args in subpolicy] for subpolicy in policy]
+ return policy
+
+
+def auto_augment_policy_original():
+ """25 types of augment policies in original paper"""
+ policy = [
+ [('PosterizeOriginal', 0.4, 8), ('Rotate', 0.6, 9)],
+ [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
+ [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
+ [('PosterizeOriginal', 0.6, 7), ('PosterizeOriginal', 0.6, 6)],
+ [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
+ [('Equalize', 0.4, 4), ('Rotate', 0.8, 8)],
+ [('Solarize', 0.6, 3), ('Equalize', 0.6, 7)],
+ [('PosterizeOriginal', 0.8, 5), ('Equalize', 1.0, 2)],
+ [('Rotate', 0.2, 3), ('Solarize', 0.6, 8)],
+ [('Equalize', 0.6, 8), ('PosterizeOriginal', 0.4, 6)],
+ [('Rotate', 0.8, 8), ('Color', 0.4, 0)],
+ [('Rotate', 0.4, 9), ('Equalize', 0.6, 2)],
+ [('Equalize', 0.0, 7), ('Equalize', 0.8, 8)],
+ [('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
+ [('Color', 0.6, 4), ('Contrast', 1.0, 8)],
+ [('Rotate', 0.8, 8), ('Color', 1.0, 2)],
+ [('Color', 0.8, 8), ('Solarize', 0.8, 7)],
+ [('Sharpness', 0.4, 7), ('Invert', 0.6, 8)],
+ [('ShearX', 0.6, 5), ('Equalize', 1.0, 9)],
+ [('Color', 0.4, 0), ('Equalize', 0.6, 3)],
+ [('Equalize', 0.4, 7), ('Solarize', 0.2, 4)],
+ [('Solarize', 0.6, 5), ('AutoContrast', 0.6, 5)],
+ [('Invert', 0.6, 4), ('Equalize', 1.0, 8)],
+ [('Color', 0.6, 4), ('Contrast', 1.0, 8)],
+ [('Equalize', 0.8, 8), ('Equalize', 0.6, 3)],
+ ]
+ policy = [[SubPolicy(*args) for args in subpolicy] for subpolicy in policy]
+ return policy
+
+
+class AutoAugment():
+ """Auto Augment
+ Randomly choose a tuple of augment ops from a list of policy
+ Then apply the tuple of augment ops to input image
+
+ Examples:
+ policy = auto_augment_policy_original()
+ augment = AutoAugment(policy)
+ transformed_image = augment(image)
+ """
+
+ def __init__(self, policy):
+ self.policy = policy
+
+ def __call__(self, image, policy_idx=None):
+ if policy_idx is None:
+ policy_idx = random.randint(0, len(self.policy) - 1)
+
+ sub_policy = self.policy[policy_idx]
+ for operation in sub_policy:
+ image = operation(image)
+ return image
+
+
+def rand_augment_policy_increasing(prob=0.5, magnitude_idx=9, magnitude_std=0.5):
+ """
+ Rand augment policy: default rand-m9-mstd0.5-inc1
+ """
+ policy = [
+ ('AutoContrast', prob, magnitude_idx, magnitude_std),
+ ('Equalize', prob, magnitude_idx, magnitude_std),
+ ('Invert', prob, magnitude_idx, magnitude_std),
+ ('Rotate', prob, magnitude_idx, magnitude_std),
+
+ ('PosterizeIncreasing', prob, magnitude_idx, magnitude_std),
+ ('SolarizeIncreasing', prob, magnitude_idx, magnitude_std),
+ ('SolarizeAdd', prob, magnitude_idx, magnitude_std),
+ ('ColorIncreasing', prob, magnitude_idx, magnitude_std),
+ ('ContrastIncreasing', prob, magnitude_idx, magnitude_std),
+ ('BrightnessIncreasing', prob, magnitude_idx, magnitude_std),
+ ('SharpnessIncreasing', prob, magnitude_idx, magnitude_std),
+
+ ('ShearX', prob, magnitude_idx, magnitude_std),
+ ('ShearY', prob, magnitude_idx, magnitude_std),
+ ('TranslateX', prob, magnitude_idx, magnitude_std),
+ ('TranslateY', prob, magnitude_idx, magnitude_std),
+ ]
+ policy = [SubPolicy(*args) for args in policy]
+ return policy
+
+
+class RandAugment():
+ """Rand Augment
+ Randomly choose N augment ops from a list of K policies
+ Then apply the N ops to input image
+
+ Examples:
+        policy = rand_augment_policy_increasing(magnitude_idx=9)
+ augment = RandAugment(policy)
+ transformed_image = augment(image)
+ """
+
+ def __init__(self, policy, num_layers=2):
+ """
+ Args:
+ policy: list of SubPolicy
+ num_layers: int
+ """
+ self.policy = policy
+ self.num_layers = num_layers
+
+ def __call__(self, image):
+ selected_idx = np.random.choice(len(self.policy), self.num_layers)
+
+ for policy_idx in selected_idx:
+ sub_policy = self.policy[policy_idx]
+ image = sub_policy(image)
+ return image
+
+
+class SubPolicy:
+ """Subpolicy
+ Read augment name and magnitude, apply augment with probability
+ Args:
+ op_name: str, augment operation name
+ prob: float, if prob > random prob, apply augment
+ magnitude: int, index of magnitude in preset magnitude ranges
+ magnitude_std: float, std of magnitude in preset magnitude ranges
+ """
+
+ def __init__(self, op_name, prob, magnitude, magnitude_std=0.5):
+ image_ops = {
+ 'ShearX': shear_x,
+ 'ShearY': shear_y,
+ 'TranslateX': translate_x_absolute,
+ 'TranslateY': translate_y_absolute,
+ 'TranslateXRel': translate_x_relative,
+ 'TranslateYRel': translate_y_relative,
+ 'Rotate': rotate,
+ 'AutoContrast': auto_contrast,
+ 'Invert': invert,
+ 'Equalize': equalize,
+ 'Solarize': solarize,
+ 'SolarizeIncreasing': solarize,
+ 'SolarizeAdd': solarize_add,
+ 'Posterize': posterize,
+ 'PosterizeIncreasing': posterize,
+ 'PosterizeOriginal': posterize,
+ 'Contrast': contrast,
+ 'ContrastIncreasing': contrast,
+ 'Color': color,
+ 'ColorIncreasing': color,
+ 'Brightness': brightness,
+ 'BrightnessIncreasing': brightness,
+ 'Sharpness': sharpness,
+ 'SharpnessIncreasing': sharpness,
+ }
+
+ level_fn = {
+ 'ShearX': shear_level_to_arg,
+ 'ShearY': shear_level_to_arg,
+ 'TranslateX': translate_absolute_level_to_arg,
+ 'TranslateY': translate_absolute_level_to_arg,
+ 'TranslateXRel': translate_relative_level_to_arg,
+ 'TranslateYRel': translate_relative_level_to_arg,
+ 'Rotate': rotate_level_to_arg,
+ 'AutoContrast': None,
+ 'Invert': None,
+ 'Equalize': None,
+ 'Solarize': solarize_level_to_arg,
+ 'SolarizeIncreasing': solarize_increasing_level_to_arg,
+ 'SolarizeAdd': solarize_add_level_to_arg,
+ 'Posterize': posterize_level_to_arg,
+ 'PosterizeIncreasing': posterize_increasing_level_to_arg,
+ 'PosterizeOriginal': posterize_original_level_to_arg,
+ 'Contrast': enhance_level_to_arg,
+ 'ContrastIncreasing': enhance_increasing_level_to_arg,
+ 'Color': enhance_level_to_arg,
+ 'ColorIncreasing': enhance_increasing_level_to_arg,
+ 'Brightness': enhance_level_to_arg,
+ 'BrightnessIncreasing': enhance_increasing_level_to_arg,
+ 'Sharpness': enhance_level_to_arg,
+ 'SharpnessIncreasing': enhance_increasing_level_to_arg,
+ }
+
+ self.prob = prob
+ self.magnitude = magnitude
+ self.magnitude_std = magnitude_std
+
+ self.ops = image_ops[op_name]
+ self.level_fn = level_fn[op_name]
+
+ def __call__(self, image):
+ if self.prob < 1.0 and random.random() > self.prob:
+ return image
+
+ magnitude = self.magnitude
+ # hack from timm auto_augment.py
+ if self.magnitude_std > 0:
+ if self.magnitude_std == float('inf'):
+ magnitude = random.uniform(0, magnitude)
+ elif self.magnitude_std > 0:
+ magnitude = random.gauss(magnitude, self.magnitude_std)
+ upper_bound = LEVEL_DENOM
+ magnitude = max(0, min(magnitude, upper_bound))
+ level_args = self.level_fn(magnitude) if self.level_fn is not None else tuple()
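+        # e.g. for SubPolicy('Rotate', 0.8, 9): level_args is roughly (+/-27.0,)
+        # since rotate_level_to_arg scales 9/10 of the 30-degree range and the
+        # magnitude above is jittered by a gaussian with std magnitude_std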
+ image = self.ops(image, *level_args)
+ return image
+
+
+#################################################################
+# Convert level to Image op arguments
+#################################################################
+def randomly_negate(value):
+ """negate the value with 0.5 prob"""
+ return -value if random.random() > 0.5 else value
+
+
+def shear_level_to_arg(level):
+ # range [-0.3, 0.3]
+ level = (level / LEVEL_DENOM) * 0.3
+ level = randomly_negate(level)
+ return level,
+
+
+def translate_absolute_level_to_arg(level):
+ # translate const = 100
+ level = (level / LEVEL_DENOM) * 100.
+ level = randomly_negate(level)
+ return level,
+
+
+def translate_relative_level_to_arg(level):
+ # range [-0.45, 0.45]
+ level = (level / LEVEL_DENOM) * 0.45
+ level = randomly_negate(level)
+ return level,
+
+
+def rotate_level_to_arg(level):
+ # range [-30, 30]
+ level = (level / LEVEL_DENOM) * 30.
+ level = randomly_negate(level)
+ return level,
+
+
+def solarize_level_to_arg(level):
+ # range [0, 256]
+ # intensity/severity of augmentation decreases with level
+ return int((level / LEVEL_DENOM) * 256),
+
+
+def solarize_increasing_level_to_arg(level):
+ # range [0, 256]
+ # intensity/severity of augmentation increases with level
+ return 256 - int((level / LEVEL_DENOM) * 256),
+
+
+def solarize_add_level_to_arg(level):
+ # range [0, 110]
+ return int((level / LEVEL_DENOM) * 110),
+
+
+def posterize_level_to_arg(level):
+ # range [0, 4]
+ # intensity/severity of augmentation decreases with level
+ return int((level / LEVEL_DENOM) * 4),
+
+
+def posterize_increasing_level_to_arg(level):
+ # range [4, 0]
+ # intensity/severity of augmentation increases with level
+ return 4 - int((level / LEVEL_DENOM) * 4),
+
+
+def posterize_original_level_to_arg(level):
+ # range [4, 8]
+ # intensity/severity of augmentation decreases with level
+ return int((level / LEVEL_DENOM) * 4) + 4,
+
+
+# For Contrast, Color, Brightness, Sharpness
+def enhance_level_to_arg(level):
+ # range [0.1, 1.9]
+ return (level / LEVEL_DENOM) * 1.8 + 0.1,
+
+
+# For ContrastIncreasing, ColorIncreasing, BrightnessIncreasing, SharpnessIncreasing
+def enhance_increasing_level_to_arg(level):
+ # range [0.1, 1.9]
+ level = (level / LEVEL_DENOM) * 0.9
+ level = max(0.1, 1.0 + randomly_negate(level))
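+    # e.g. level 0 -> factor 1.0 (identity); level 10 -> factor 0.1 or 1.9,
+    # since the 0.9 offset is randomly negated, so severity grows with level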
+ return level,
+
+
+#################################################################
+# PIL Image transforms
+# https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.transform
+#################################################################
+def shear_x(image, factor, fillcolor=(128, 128, 128)):
+ return image.transform(image.size, Image.AFFINE, (1, factor, 0, 0, 1, 0), fillcolor=fillcolor)
+
+
+def shear_y(image, factor, fillcolor=(128, 128, 128)):
+ return image.transform(image.size, Image.AFFINE, (1, 0, 0, factor, 1, 0), fillcolor=fillcolor)
+
+
+def translate_x_absolute(image, pixels, fillcolor=(128, 128, 128)):
+ return image.transform(image.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0), fillcolor=fillcolor)
+
+
+def translate_y_absolute(image, pixels, fillcolor=(128, 128, 128)):
+ return image.transform(image.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels), fillcolor=fillcolor)
+
+
+def translate_x_relative(image, pct, fillcolor=(128, 128, 128)):
+ pixels = pct * image.size[0]
+ return image.transform(image.size, Image.AFFINE, (1, 0, pixels, 0, 1, 0), fillcolor=fillcolor)
+
+
+def translate_y_relative(image, pct, fillcolor=(128, 128, 128)):
+    pixels = pct * image.size[1]  # y translation is relative to the image height
+ return image.transform(image.size, Image.AFFINE, (1, 0, 0, 0, 1, pixels), fillcolor=fillcolor)
+
+
+def rotate(image, degrees):
+ return image.rotate(degrees)
+
+
+def auto_contrast(image, magnitude=None):
+ return ImageOps.autocontrast(image)
+
+
+def invert(image, magnitude=None):
+ return ImageOps.invert(image)
+
+
+def equalize(image, magnitude=None):
+ return ImageOps.equalize(image)
+
+
+def solarize(image, thresh):
+ return ImageOps.solarize(image, thresh)
+
+
+def solarize_add(image, add, thresh=128):
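+    # build a 256-entry lookup table: pixels below thresh get `add` added (clipped to 255), others are unchanged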
+ lut = []
+ for i in range(256):
+ if i < thresh:
+ lut.append(min(255, i + add))
+ else:
+ lut.append(i)
+ if image.mode in ("L", "RGB"):
+ if image.mode == "RGB" and len(lut) == 256:
+ lut = lut + lut + lut
+ return image.point(lut)
+
+ return image
+
+
+def posterize(image, bits_to_keep):
+ if bits_to_keep >= 8:
+ return image
+ return ImageOps.posterize(image, bits_to_keep)
+
+
+def contrast(image, factor):
+ return ImageEnhance.Contrast(image).enhance(factor)
+
+
+def color(image, factor):
+ return ImageEnhance.Color(image).enhance(factor)
+
+
+def brightness(image, factor):
+ return ImageEnhance.Brightness(image).enhance(factor)
+
+
+def sharpness(image, factor):
+ return ImageEnhance.Sharpness(image).enhance(factor)
diff --git a/image_classification/MAE/config.py b/self_supervised_learning/MAE/config.py
similarity index 51%
rename from image_classification/MAE/config.py
rename to self_supervised_learning/MAE/config.py
index 7a2cf65b..3860b1e9 100644
--- a/image_classification/MAE/config.py
+++ b/self_supervised_learning/MAE/config.py
@@ -13,11 +13,8 @@
# limitations under the License.
"""Configuration
-
-Configuration for data, model archtecture, and training, etc.
-Config can be set by .yaml file or by argparser(limited usage)
-
-
+Configurations for (1) data processing, (2) model architecture, and (3) training settings, etc.
+Config can be set by .yaml file or by argparser
"""
import os
from yacs.config import CfgNode as CN
@@ -28,20 +25,20 @@
# data settings
_C.DATA = CN()
-_C.DATA.BATCH_SIZE = 256 # 256 # train batch_size for single GPU
-_C.DATA.BATCH_SIZE_EVAL = 8 # 64 # val batch_size for single GPU
+_C.DATA.BATCH_SIZE = 256 # train batch_size on single GPU
+_C.DATA.BATCH_SIZE_EVAL = None # val batch_size on single GPU (set from args in update_config)
_C.DATA.DATA_PATH = '/dataset/imagenet/' # path to dataset
-_C.DATA.DATASET = 'imagenet2012' # dataset name
-_C.DATA.IMAGE_SIZE = 224 # input image size: 224 for pretrain, 384 for finetune
-# input image scale ratio, scale is applied before centercrop in eval mode
-_C.DATA.CROP_PCT = 0.875
-_C.DATA.NUM_WORKERS = 4 # number of data loading threads
-_C.DATA.IMAGENET_MEAN = [0.485, 0.456, 0.406] # [0.5, 0.5, 0.5]
-_C.DATA.IMAGENET_STD = [0.229, 0.224, 0.225] # [0.5, 0.5, 0.5]
+_C.DATA.DATASET = 'imagenet2012' # dataset name, currently only support imagenet2012
+_C.DATA.IMAGE_SIZE = 224 # input image size e.g., 224
+_C.DATA.IMAGE_CHANNELS = 3 # input image channels: e.g., 3
+_C.DATA.CROP_PCT = 0.875 # input image scale ratio, scale is applied before centercrop in eval mode
+_C.DATA.NUM_WORKERS = 2 # number of data loading threads
+_C.DATA.IMAGENET_MEAN = [0.485, 0.456, 0.406] # imagenet mean values
+_C.DATA.IMAGENET_STD = [0.229, 0.224, 0.225] # imagenet std values
# model settings
_C.MODEL = CN()
-_C.MODEL.TYPE = 'MAE'
+_C.MODEL.TYPE = 'PRETRAIN' # [PRETRAIN, FINETUNE, LINEARPROBE] # used to fetch data augmentation
_C.MODEL.NAME = 'MAE'
_C.MODEL.RESUME = None
_C.MODEL.PRETRAINED = None
@@ -49,22 +46,22 @@
_C.MODEL.DROPOUT = 0.0
_C.MODEL.DROPPATH = 0.0
_C.MODEL.ATTENTION_DROPOUT = 0.0
-_C.MODEL.MAE_PRETRAIN = True
+_C.MODEL.GLOBAL_POOL = False # Pretrain: N/A, Finetune: True, Linearprobe: False
# transformer settings
-_C.MODEL.TRANS = CN()
-_C.MODEL.TRANS.PATCH_SIZE = 16
-_C.MODEL.TRANS.MLP_RATIO = 4.0
-_C.MODEL.TRANS.QKV_BIAS = True
-_C.MODEL.TRANS.MASK_RATIO = 0.75
-_C.MODEL.TRANS.ENCODER = CN()
-_C.MODEL.TRANS.ENCODER.DEPTH = 12
-_C.MODEL.TRANS.ENCODER.EMBED_DIM = 768
-_C.MODEL.TRANS.ENCODER.NUM_HEADS = 12
-_C.MODEL.TRANS.DECODER = CN()
-_C.MODEL.TRANS.DECODER.DEPTH = 8
-_C.MODEL.TRANS.DECODER.EMBED_DIM = 512
-_C.MODEL.TRANS.DECODER.NUM_HEADS = 8
+_C.MODEL.PATCH_SIZE = 16
+_C.MODEL.MLP_RATIO = 4.0
+_C.MODEL.QKV_BIAS = True
+_C.MODEL.MASK_RATIO = 0.75
+_C.MODEL.NORM_PIX_LOSS = True # effective only for Pretrain
+_C.MODEL.ENCODER = CN()
+_C.MODEL.ENCODER.DEPTH = 12
+_C.MODEL.ENCODER.EMBED_DIM = 768
+_C.MODEL.ENCODER.NUM_HEADS = 12
+_C.MODEL.DECODER = CN()
+_C.MODEL.DECODER.DEPTH = 8
+_C.MODEL.DECODER.EMBED_DIM = 512
+_C.MODEL.DECODER.NUM_HEADS = 16
# training settings (for Vit-L/16 pretrain)
@@ -73,53 +70,59 @@
_C.TRAIN.NUM_EPOCHS = 800
_C.TRAIN.WARMUP_EPOCHS = 40
_C.TRAIN.WEIGHT_DECAY = 0.05
-_C.TRAIN.BASE_LR = 1.5e-4
-_C.TRAIN.WARMUP_START_LR = 1e-6 # 0.0
-_C.TRAIN.END_LR = 0.0
+_C.TRAIN.BASE_LR = 1.5e-4
+_C.TRAIN.WARMUP_START_LR = 0.0
+_C.TRAIN.END_LR = 0.0 # 1e-6
_C.TRAIN.GRAD_CLIP = None
-_C.TRAIN.ACCUM_ITER = 1
-_C.TRAIN.LINEAR_SCALED_LR = 256
-_C.TRAIN.NORMALIZE_TARGET = True
+_C.TRAIN.ACCUM_ITER = 1
+_C.TRAIN.LINEAR_SCALED_LR = 512
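+# NOTE: with linear lr scaling, absolute_lr = BASE_LR * total_batch_size / LINEAR_SCALED_LR (see yaml comments)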
+_C.TRAIN.LAYER_DECAY = None # used for finetuning only
+
+# optimizer
+_C.TRAIN.OPTIMIZER = CN()
+_C.TRAIN.OPTIMIZER.NAME = 'AdamW'
+_C.TRAIN.OPTIMIZER.EPS = 1e-8
+_C.TRAIN.OPTIMIZER.BETAS = (0.9, 0.95)
# train augmentation (only for finetune)
_C.TRAIN.SMOOTHING = 0.1
-_C.TRAIN.RAND_AUGMENT = False
-_C.TRAIN.RAND_AUGMENT_LAYERS = 9
-_C.TRAIN.RAND_AUGMENT_MAGNITUDE = 5 # scale from 0 to 10
+_C.TRAIN.COLOR_JITTER = 0.4
+_C.TRAIN.AUTO_AUGMENT = False
+_C.TRAIN.RAND_AUGMENT = True
+_C.TRAIN.RAND_AUGMENT_LAYERS = 2
+_C.TRAIN.RAND_AUGMENT_MAGNITUDE = 9 # scale from 0 to 9
+# mixup params
_C.TRAIN.MIXUP_ALPHA = 0.8
_C.TRAIN.MIXUP_PROB = 1.0
_C.TRAIN.MIXUP_SWITCH_PROB = 0.5
_C.TRAIN.MIXUP_MODE = 'batch'
_C.TRAIN.CUTMIX_ALPHA = 1.0
_C.TRAIN.CUTMIX_MINMAX = None
-
-_C.TRAIN.LR_SCHEDULER = CN()
-_C.TRAIN.LR_SCHEDULER.NAME = 'warmupcosine'
-_C.TRAIN.LR_SCHEDULER.MILESTONES = "30, 60, 90" # only used in StepLRScheduler
-_C.TRAIN.LR_SCHEDULER.DECAY_EPOCHS = 30 # only used in StepLRScheduler
-_C.TRAIN.LR_SCHEDULER.DECAY_RATE = 0.1 # only used in StepLRScheduler
-
-_C.TRAIN.OPTIMIZER = CN()
-_C.TRAIN.OPTIMIZER.NAME = 'AdamW'
-_C.TRAIN.OPTIMIZER.EPS = 1e-8
-_C.TRAIN.OPTIMIZER.BETAS = (0.9, 0.95) # same as MAE paper, for adamW
-_C.TRAIN.OPTIMIZER.MOMENTUM = 0.9
-
+# random erase parameters
+_C.TRAIN.RANDOM_ERASE_PROB = 0.25
+_C.TRAIN.RANDOM_ERASE_MODE = 'pixel'
+_C.TRAIN.RANDOM_ERASE_COUNT = 1
+_C.TRAIN.RANDOM_ERASE_SPLIT = False
# misc
-_C.SAVE = "./output"
-_C.TAG = "default"
-_C.SAVE_FREQ = 1 # freq to save chpt
-_C.REPORT_FREQ = 100 # freq to logging info
-_C.VALIDATE_FREQ = 100 # freq to do validation
-_C.SEED = 0
+_C.SAVE = "./output" # output folder, saves logs and weights
+_C.SAVE_FREQ = 10 # freq to save checkpoint
+_C.REPORT_FREQ = 20 # freq to log info
+_C.VALIDATE_FREQ = 1 # freq to do validation
+_C.SEED = 0 # random seed
_C.EVAL = False # run evaluation only
-_C.AMP = False # mix precision training
-_C.LOCAL_RANK = 0
-_C.NGPUS = -1
+_C.AMP = False # automatic mixed precision (AMP) training
def _update_config_from_file(config, cfg_file):
+ """Load cfg file (.yaml) and update config object
+
+ Args:
+ config: config object
+ cfg_file: config file (.yaml)
+ Return:
+ None
+ """
config.defrost()
with open(cfg_file, 'r') as infile:
yaml_cfg = yaml.load(infile, Loader=yaml.FullLoader)
@@ -128,13 +131,13 @@ def _update_config_from_file(config, cfg_file):
_update_config_from_file(
config, os.path.join(os.path.dirname(cfg_file), cfg)
)
- print('merging config from {}'.format(cfg_file))
config.merge_from_file(cfg_file)
config.freeze()
def update_config(config, args):
"""Update config by ArgumentParser
+ Configs that are often used can be updated from arguments
Args:
args: ArgumentParser contains options
Return:
@@ -145,40 +148,33 @@ def update_config(config, args):
config.defrost()
if args.dataset:
config.DATA.DATASET = args.dataset
- if args.eval:
- config.EVAL = True
if args.batch_size:
config.DATA.BATCH_SIZE = args.batch_size
- if config.EVAL:
- config.DATA.BATCH_SIZE_EVAL = args.batch_size
+ config.DATA.BATCH_SIZE_EVAL = args.batch_size
+ if args.batch_size_eval:
+ config.DATA.BATCH_SIZE_EVAL = args.batch_size_eval
if args.image_size:
config.DATA.IMAGE_SIZE = args.image_size
+ if args.accum_iter:
+ config.TRAIN.ACCUM_ITER = args.accum_iter
if args.data_path:
config.DATA.DATA_PATH = args.data_path
- if args.output is not None:
- config.SAVE = args.output
- if args.ngpus:
- config.NGPUS = args.ngpus
+ if args.eval:
+ config.EVAL = True
if args.pretrained:
config.MODEL.PRETRAINED = args.pretrained
- if args.mae_pretrain:
- config.MODEL.MAE_PRETRAIN = args.mae_pretrain
if args.resume:
config.MODEL.RESUME = args.resume
if args.last_epoch:
config.TRAIN.LAST_EPOCH = args.last_epoch
- if args.amp: # only during training
- if config.EVAL is True:
- config.AMP = False
- else:
- config.AMP = True
-
+ if args.amp: # only for training
+ config.AMP = not config.EVAL
# config.freeze()
return config
def get_config(cfg_file=None):
- """Return a clone of config or load from yaml file"""
+ """Return a clone of config and optionally overwrite it from yaml file"""
config = _C.clone()
if cfg_file:
_update_config_from_file(config, cfg_file)
diff --git a/self_supervised_learning/MAE/configs/vit_base_patch16_224_finetune.yaml b/self_supervised_learning/MAE/configs/vit_base_patch16_224_finetune.yaml
new file mode 100644
index 00000000..6bd2e558
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_base_patch16_224_finetune.yaml
@@ -0,0 +1,46 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: FINETUNE
+ NAME: vit_base_patch16_224
+ DROPPATH: 0.1
+ GLOBAL_POOL: True
+ PATCH_SIZE: 16
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ ENCODER:
+ EMBED_DIM: 768
+ DEPTH: 12
+ NUM_HEADS: 12
+TRAIN:
+ NUM_EPOCHS: 100 # same as MAE official readme
+ WARMUP_EPOCHS: 5
+ WEIGHT_DECAY: 0.05
+ BASE_LR: 5e-4
+ LINEAR_SCALED_LR: 256
+ END_LR: 1e-6
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'AdamWDL'
+ BETAS: (0.9, 0.999)
+ LAYER_DECAY: 0.65
+ SMOOTHING: 0.1
+ RAND_AUGMENT: True
+ RAND_AUGMENT_LAYERS: 2
+ RAND_AUGMENT_MAGNITUDE: 9
+ MIXUP_ALPHA: 0.8
+ MIXUP_PROB: 1.0
+ MIXUP_SWITCH_PROB: 0.5
+ MIXUP_MODE: 'batch'
+ CUTMIX_ALPHA: 1.0
+ CUTMIX_MINMAX: None
+ RANDOM_ERASE_PROB: 0.25
+ RANDOM_ERASE_MODE: 'pixel'
+ RANDOM_ERASE_COUNT: 1
+ RANDOM_ERASE_SPLIT: False
+
+VALIDATE_FREQ: 1
+SAVE_FREQ: 10
+REPORT_FREQ: 100
+
diff --git a/self_supervised_learning/MAE/configs/vit_base_patch16_224_linearprobe.yaml b/self_supervised_learning/MAE/configs/vit_base_patch16_224_linearprobe.yaml
new file mode 100644
index 00000000..b4046fe8
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_base_patch16_224_linearprobe.yaml
@@ -0,0 +1,24 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: LINEARPROBE
+ NAME: vit_base_patch16_224
+ GLOBAL_POOL: False # enable cls_token
+ PATCH_SIZE: 16
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ ENCODER:
+ EMBED_DIM: 768
+ DEPTH: 12
+ NUM_HEADS: 12
+TRAIN:
+ NUM_EPOCHS: 90
+ WARMUP_EPOCHS: 10
+ WEIGHT_DECAY: 0.0
+ BASE_LR: 0.1
+ LINEAR_SCALED_LR: 256
+ END_LR: 0.0
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'LARS'
diff --git a/self_supervised_learning/MAE/configs/vit_base_patch16_224_linearprobe_single_node.yaml b/self_supervised_learning/MAE/configs/vit_base_patch16_224_linearprobe_single_node.yaml
new file mode 100644
index 00000000..2fb83fbc
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_base_patch16_224_linearprobe_single_node.yaml
@@ -0,0 +1,24 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: LINEARPROBE
+ NAME: vit_base_patch16_224
+ GLOBAL_POOL: False # enable cls_token
+ PATCH_SIZE: 16
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ ENCODER:
+ EMBED_DIM: 768
+ DEPTH: 12
+ NUM_HEADS: 12
+TRAIN:
+ ACCUM_ITER: 4
+ NUM_EPOCHS: 90
+ WARMUP_EPOCHS: 10
+ WEIGHT_DECAY: 0.0
+ BASE_LR: 0.1
+ LINEAR_SCALED_LR: 256
+ END_LR: 0.0
+ OPTIMIZER:
+ NAME: 'LARS'
diff --git a/self_supervised_learning/MAE/configs/vit_base_patch16_224_pretrain.yaml b/self_supervised_learning/MAE/configs/vit_base_patch16_224_pretrain.yaml
new file mode 100644
index 00000000..df89c2e6
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_base_patch16_224_pretrain.yaml
@@ -0,0 +1,32 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: PRETRAIN
+ NAME: vit_base_patch16_224
+ DROPPATH: 0.0
+ PATCH_SIZE: 16
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ MASK_RATIO: 0.75
+ ENCODER:
+ EMBED_DIM: 768
+ DEPTH: 12
+ NUM_HEADS: 12
+ DECODER:
+ EMBED_DIM: 512
+ DEPTH: 8
+ NUM_HEADS: 16
+ NORM_PIX_LOSS: True
+TRAIN:
+ NUM_EPOCHS: 800
+ WARMUP_EPOCHS: 40
+ WEIGHT_DECAY: 0.05
+ BASE_LR: 1.5e-4
+ END_LR: 0.0
+ LINEAR_SCALED_LR: 256
+ GRAD_CLIP: None
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'AdamW'
+ BETAS: (0.9, 0.95)
diff --git a/self_supervised_learning/MAE/configs/vit_base_patch16_224_pretrain_dec1.yaml b/self_supervised_learning/MAE/configs/vit_base_patch16_224_pretrain_dec1.yaml
new file mode 100644
index 00000000..20646c66
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_base_patch16_224_pretrain_dec1.yaml
@@ -0,0 +1,34 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: PRETRAIN
+ NAME: vit_base_patch16_224
+ DROPPATH: 0.0
+ PATCH_SIZE: 16
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ MASK_RATIO: 0.75
+ ENCODER:
+ EMBED_DIM: 768
+ DEPTH: 12
+ NUM_HEADS: 12
+ DECODER:
+ EMBED_DIM: 512
+ DEPTH: 1
+ NUM_HEADS: 16
+ NORM_PIX_LOSS: True
+TRAIN:
+ NUM_EPOCHS: 800
+ WARMUP_EPOCHS: 40
+ WEIGHT_DECAY: 0.05
+ BASE_LR: 1.5e-4
+ END_LR: 0.0
+ LINEAR_SCALED_LR: 256
+ GRAD_CLIP: None
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'AdamWDL'
+ BETAS: (0.9, 0.95)
+
+SAVE_FREQ: 1
diff --git a/self_supervised_learning/MAE/configs/vit_huge_patch14_224_finetune.yaml b/self_supervised_learning/MAE/configs/vit_huge_patch14_224_finetune.yaml
new file mode 100644
index 00000000..f3bd0df1
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_huge_patch14_224_finetune.yaml
@@ -0,0 +1,46 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: FINETUNE
+ NAME: vit_huge_patch14_224
+ DROPPATH: 0.3
+ GLOBAL_POOL: True
+ PATCH_SIZE: 14
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ ENCODER:
+ EMBED_DIM: 1280
+ DEPTH: 32
+ NUM_HEADS: 16
+TRAIN:
+ NUM_EPOCHS: 50
+ WARMUP_EPOCHS: 5
+ WEIGHT_DECAY: 0.05
+ BASE_LR: 1e-3 # absolute_lr = base_lr * total_batch_size / 256
+ LINEAR_SCALED_LR: 256
+ END_LR: 1e-6
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'AdamWDL'
+ BETAS: (0.9, 0.999)
+ LAYER_DECAY: 0.75 # same as MAE official readme
+ SMOOTHING: 0.1
+ RAND_AUGMENT: True
+ RAND_AUGMENT_LAYERS: 2
+ RAND_AUGMENT_MAGNITUDE: 9
+ MIXUP_ALPHA: 0.8
+ MIXUP_PROB: 1.0
+ MIXUP_SWITCH_PROB: 0.5
+ MIXUP_MODE: 'batch'
+ CUTMIX_ALPHA: 1.0
+ CUTMIX_MINMAX: None
+ RANDOM_ERASE_PROB: 0.25
+ RANDOM_ERASE_MODE: 'pixel'
+ RANDOM_ERASE_COUNT: 1
+ RANDOM_ERASE_SPLIT: False
+
+VALIDATE_FREQ: 1
+SAVE_FREQ: 10
+REPORT_FREQ: 100
+
diff --git a/self_supervised_learning/MAE/configs/vit_huge_patch14_224_linearprobe.yaml b/self_supervised_learning/MAE/configs/vit_huge_patch14_224_linearprobe.yaml
new file mode 100644
index 00000000..465d53dc
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_huge_patch14_224_linearprobe.yaml
@@ -0,0 +1,24 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: LINEARPROBE
+ NAME: vit_huge_patch14_224
+ GLOBAL_POOL: False
+ PATCH_SIZE: 14
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ ENCODER:
+ EMBED_DIM: 1280
+ DEPTH: 32
+ NUM_HEADS: 16
+TRAIN:
+ NUM_EPOCHS: 90
+ WARMUP_EPOCHS: 10
+ WEIGHT_DECAY: 0.0
+ BASE_LR: 0.1
+ LINEAR_SCALED_LR: 256
+ END_LR: 0.0
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'LARS'
diff --git a/self_supervised_learning/MAE/configs/vit_huge_patch14_224_pretrain.yaml b/self_supervised_learning/MAE/configs/vit_huge_patch14_224_pretrain.yaml
new file mode 100644
index 00000000..a9bbc101
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_huge_patch14_224_pretrain.yaml
@@ -0,0 +1,32 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: PRETRAIN
+ NAME: vit_huge_patch14_224
+ DROPPATH: 0.0
+ PATCH_SIZE: 14
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ MASK_RATIO: 0.75
+ ENCODER:
+ EMBED_DIM: 1280
+ DEPTH: 32
+ NUM_HEADS: 16
+ DECODER:
+ EMBED_DIM: 512
+ DEPTH: 8
+ NUM_HEADS: 16
+ NORM_PIX_LOSS: True
+TRAIN:
+ NUM_EPOCHS: 800
+ WARMUP_EPOCHS: 40
+ WEIGHT_DECAY: 0.05
+ BASE_LR: 1.5e-4
+ END_LR: 0.0
+ LINEAR_SCALED_LR: 256
+ GRAD_CLIP: None
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'AdamW'
+ BETAS: (0.9, 0.95)
diff --git a/self_supervised_learning/MAE/configs/vit_large_patch16_224_finetune.yaml b/self_supervised_learning/MAE/configs/vit_large_patch16_224_finetune.yaml
new file mode 100644
index 00000000..e09b9e4a
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_large_patch16_224_finetune.yaml
@@ -0,0 +1,46 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: FINETUNE
+ NAME: vit_large_patch16_224
+ DROPPATH: 0.2 # same as MAE official readme
+ GLOBAL_POOL: True
+ PATCH_SIZE: 16
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ ENCODER:
+ EMBED_DIM: 1024
+ DEPTH: 24
+ NUM_HEADS: 16
+TRAIN:
+ NUM_EPOCHS: 50
+ WARMUP_EPOCHS: 5
+ WEIGHT_DECAY: 0.05
+ BASE_LR: 1e-3 # absolute_lr = base_lr * total_batch_size / 256
+ LINEAR_SCALED_LR: 256
+ END_LR: 1e-6
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'AdamWDL'
+ BETAS: (0.9, 0.999)
+ LAYER_DECAY: 0.75 # same as MAE official readme
+ SMOOTHING: 0.1
+ RAND_AUGMENT: True
+ RAND_AUGMENT_LAYERS: 2
+ RAND_AUGMENT_MAGNITUDE: 9
+ MIXUP_ALPHA: 0.8
+ MIXUP_PROB: 1.0
+ MIXUP_SWITCH_PROB: 0.5
+ MIXUP_MODE: 'batch'
+ CUTMIX_ALPHA: 1.0
+ CUTMIX_MINMAX: None
+ RANDOM_ERASE_PROB: 0.25
+ RANDOM_ERASE_MODE: 'pixel'
+ RANDOM_ERASE_COUNT: 1
+ RANDOM_ERASE_SPLIT: False
+
+VALIDATE_FREQ: 1
+SAVE_FREQ: 10
+REPORT_FREQ: 100
+
diff --git a/self_supervised_learning/MAE/configs/vit_large_patch16_224_linearprobe.yaml b/self_supervised_learning/MAE/configs/vit_large_patch16_224_linearprobe.yaml
new file mode 100644
index 00000000..249afd16
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_large_patch16_224_linearprobe.yaml
@@ -0,0 +1,24 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: LINEARPROBE
+ NAME: vit_large_patch16_224
+ GLOBAL_POOL: False # enable cls_token
+ PATCH_SIZE: 16
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ ENCODER:
+ EMBED_DIM: 1024
+ DEPTH: 24
+ NUM_HEADS: 16
+TRAIN:
+ NUM_EPOCHS: 90
+ WARMUP_EPOCHS: 10
+ WEIGHT_DECAY: 0.0
+ BASE_LR: 0.1
+ LINEAR_SCALED_LR: 256
+ END_LR: 0.0
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'LARS'
diff --git a/self_supervised_learning/MAE/configs/vit_large_patch16_224_pretrain.yaml b/self_supervised_learning/MAE/configs/vit_large_patch16_224_pretrain.yaml
new file mode 100644
index 00000000..0fbd3d8e
--- /dev/null
+++ b/self_supervised_learning/MAE/configs/vit_large_patch16_224_pretrain.yaml
@@ -0,0 +1,32 @@
+DATA:
+ IMAGE_SIZE: 224
+ CROP_PCT: 0.875
+MODEL:
+ TYPE: PRETRAIN
+ NAME: vit_large_patch16_224
+ DROPPATH: 0.0
+ PATCH_SIZE: 16
+ MLP_RATIO: 4.0
+ QKV_BIAS: true
+ MASK_RATIO: 0.75
+ ENCODER:
+ EMBED_DIM: 1024
+ DEPTH: 24
+ NUM_HEADS: 16
+ DECODER:
+ EMBED_DIM: 512
+ DEPTH: 8
+ NUM_HEADS: 16
+ NORM_PIX_LOSS: True
+TRAIN:
+ NUM_EPOCHS: 800
+ WARMUP_EPOCHS: 40
+ WEIGHT_DECAY: 0.05
+ BASE_LR: 1.5e-4
+ END_LR: 0.0
+ LINEAR_SCALED_LR: 256
+ GRAD_CLIP: None
+ ACCUM_ITER: 1
+ OPTIMIZER:
+ NAME: 'AdamW'
+ BETAS: (0.9, 0.95)
diff --git a/image_classification/MAE/datasets.py b/self_supervised_learning/MAE/datasets.py
similarity index 51%
rename from image_classification/MAE/datasets.py
rename to self_supervised_learning/MAE/datasets.py
index 1d6c17d3..91f8b30b 100644
--- a/image_classification/MAE/datasets.py
+++ b/self_supervised_learning/MAE/datasets.py
@@ -12,32 +12,29 @@
# See the License for the specific language governing permissions and
# limitations under the License.
-"""
-Dataset related classes and methods for ViT training and validation
-Cifar10, Cifar100 and ImageNet2012 are supported
-"""
+"""Dataset related classes and methods for ViT training and validation"""
import os
import math
-from PIL import Image
from paddle.io import Dataset
from paddle.io import DataLoader
from paddle.io import DistributedBatchSampler
from paddle.vision import transforms
-from paddle.vision import datasets
from paddle.vision import image_load
from augment import auto_augment_policy_original
from augment import AutoAugment
-from augment import rand_augment_policy_original
+from augment import rand_augment_policy_increasing
from augment import RandAugment
-from masking_generator import RandomMaskingGenerator
-from transforms import RandomHorizontalFlip
from random_erasing import RandomErasing
+
class ImageNet2012Dataset(Dataset):
"""Build ImageNet2012 dataset
    This class gets train/val imagenet datasets, which loads transformed data and labels.
+ Note:
+        train_list.txt and val_list.txt are required.
+        Please refer to https://github.com/BR-IDL/PaddleViT/image_classification#data-preparation
Attributes:
file_folder: path where imagenet images are stored
@@ -46,27 +43,17 @@ class ImageNet2012Dataset(Dataset):
label_list: list of labels of whole dataset
"""
- def __init__(self, file_folder, mode="train", transform=None):
+ def __init__(self, file_folder, is_train=True, transform_ops=None):
"""Init ImageNet2012 Dataset with dataset file path, mode(train/val), and transform"""
- super(ImageNet2012Dataset, self).__init__()
- assert mode in ["train", "val"]
+ super().__init__()
self.file_folder = file_folder
-
- if isinstance(transform, tuple):
- # training: transform = [transform, mask_generator]
- self.transform = transform[0]
- self.mask_generator = transform[1] # if mae finetune, mask_generator is None
- else:
- # val: transform = transform
- self.transform = transform
- self.mask_generator = None
+ self.transforms = transform_ops
self.img_path_list = []
self.label_list = []
- if mode == "train":
- self.list_file = os.path.join(self.file_folder, "train_list.txt")
- else:
- self.list_file = os.path.join(self.file_folder, "val_list.txt")
+ list_name = 'train_list.txt' if is_train else 'val_list.txt'
+ self.list_file = os.path.join(self.file_folder, list_name)
+        assert os.path.isfile(self.list_file), f'{self.list_file} does not exist!'
with open(self.list_file, 'r') as infile:
for line in infile:
@@ -74,54 +61,64 @@ def __init__(self, file_folder, mode="train", transform=None):
img_label = int(line.strip().split()[1])
self.img_path_list.append(os.path.join(self.file_folder, img_path))
self.label_list.append(img_label)
- print(f'----- Imagenet2012 image {mode} list len = {len(self.label_list)}')
+ print(f'----- Imagenet2012 {list_name} len = {len(self.label_list)}')
def __len__(self):
return len(self.label_list)
def __getitem__(self, index):
data = image_load(self.img_path_list[index]).convert('RGB')
- data = self.transform(data)
- if self.mask_generator is not None:
- mask = self.mask_generator()
- else:
- mask = None
+ data = self.transforms(data)
+ label = self.label_list[index]
- if mask is None:
- label = self.label_list[index]
- return data, label
+ return data, label
- return data, mask
+def get_train_transforms_pretrain(config):
+ """Simple augmentation for pretraining"""
+ aug_op_list = [transforms.RandomResizedCrop(size=(config.DATA.IMAGE_SIZE, config.DATA.IMAGE_SIZE),
+ scale=(0.2, 1.0),
+ interpolation='bicubic'), # same as MAE pytorch
+ transforms.RandomHorizontalFlip(),
+ transforms.ToTensor(),
+ transforms.Normalize(mean=config.DATA.IMAGENET_MEAN, std=config.DATA.IMAGENET_STD)]
+ transforms_train = transforms.Compose(aug_op_list)
+ return transforms_train
-def get_train_transforms(config):
- """ Get training transforms
- For training, a RandomResizedCrop is applied, then normalization is applied with
- [0.5, 0.5, 0.5] mean and std. The input pixel values must be rescaled to [0, 1.]
- Outputs is converted to tensor
+def get_train_transforms_linearprobe(config):
+ """Weak augmentation for linear probing"""
+ aug_op_list = [transforms.RandomResizedCrop(size=(config.DATA.IMAGE_SIZE, config.DATA.IMAGE_SIZE),
+ scale=(0.08, 1.0),
+ interpolation='bicubic'), # same as MAE pytorch
+ transforms.RandomHorizontalFlip(),
+ transforms.ToTensor(),
+ transforms.Normalize(mean=config.DATA.IMAGENET_MEAN, std=config.DATA.IMAGENET_STD)]
+ transforms_train = transforms.Compose(aug_op_list)
+ return transforms_train
- Args:
- config: configs contains IMAGE_SIZE, see config.py for details
- Returns:
- transforms_train: training transforms
- """
+def get_train_transforms_finetune(config):
+ """Full augmentation for finetuning"""
aug_op_list = []
# STEP1: random crop and resize
aug_op_list.append(
transforms.RandomResizedCrop((config.DATA.IMAGE_SIZE, config.DATA.IMAGE_SIZE),
- scale=(0.05, 1.0), interpolation='bicubic'))
- # STEP2: auto_augment or color jitter
- if config.TRAIN.AUTO_AUGMENT:
+ scale=(0.08, 1.0), interpolation='bicubic'))# Same as MAE pytorch
+    # STEP2: random horizontal flip
+ aug_op_list.append(transforms.RandomHorizontalFlip())
+ # STEP3: rand_augment or auto_augment or color jitter
+ if config.TRAIN.RAND_AUGMENT: # MAE: True
+ policy = rand_augment_policy_increasing(
+ magnitude_idx=config.TRAIN.RAND_AUGMENT_MAGNITUDE)
+ rand_augment = RandAugment(
+ policy=policy, num_layers=config.TRAIN.RAND_AUGMENT_LAYERS)
+ aug_op_list.append(rand_augment)
+ elif config.TRAIN.AUTO_AUGMENT: # MAE: None
policy = auto_augment_policy_original()
auto_augment = AutoAugment(policy)
aug_op_list.append(auto_augment)
- elif config.TRAIN.RAND_AUGMENT:
- policy = rand_augment_policy_original()
- rand_augment = RandAugment(policy)
- aug_op_list.append(rand_augment)
- else:
+ else: # MAE: None
jitter = (float(config.TRAIN.COLOR_JITTER), ) * 3
aug_op_list.append(transforms.ColorJitter(*jitter))
    # STEP4: other ops
@@ -138,29 +135,50 @@ def get_train_transforms(config):
# Final: compose transforms and return
transforms_train = transforms.Compose(aug_op_list)
- if config.MODEL.MAE_PRETRAIN:
- # for MAE pretraining
- mask_generator = RandomMaskingGenerator(
- input_size=config.DATA.IMAGE_SIZE // config.MODEL.TRANS.PATCH_SIZE,
- mask_ratio=config.MODEL.TRANS.MASK_RATIO)
+ return transforms_train
+
+
+def get_train_transforms(config):
+ """ Get training transforms
+
+ For training, a RandomResizedCrop is applied, then normalization is applied with
+ mean and std. The input pixel values must be rescaled to [0, 1.]
+ Outputs is converted to tensor
+
+ Args:
+ config: configs contains IMAGE_SIZE, see config.py for details
+ Returns:
+ transforms_train: training transforms
+ """
+ assert config.MODEL.TYPE in ["PRETRAIN", "FINETUNE", "LINEARPROBE"]
+ if config.MODEL.TYPE == "PRETRAIN":
+ transforms_train = get_train_transforms_pretrain
+ elif config.MODEL.TYPE == "FINETUNE":
+ transforms_train = get_train_transforms_finetune
+ elif config.MODEL.TYPE == "LINEARPROBE":
+ transforms_train = get_train_transforms_linearprobe
else:
- mask_generator = None
+ raise ValueError(f'{config.MODEL.TYPE} not supported!')
- return (transforms_train, mask_generator)
+    transform_ops = transforms_train(config)  # avoid shadowing the imported paddle.vision `transforms` module
+    print(transform_ops)
+
+    return transform_ops
+# val transform is for MAE finetune and linear probing
def get_val_transforms(config):
""" Get training transforms
For validation, image is first Resize then CenterCrop to image_size.
- Then normalization is applied with [0.5, 0.5, 0.5] mean and std.
+ Then normalization is applied with mean and std.
The input pixel values must be rescaled to [0, 1.]
Outputs is converted to tensor
Args:
config: configs contains IMAGE_SIZE, see config.py for details
Returns:
- transforms_train: training transforms
+ transforms_val: transform ops
"""
scale_size = int(math.floor(config.DATA.IMAGE_SIZE / config.DATA.CROP_PCT))
@@ -168,78 +186,60 @@ def get_val_transforms(config):
transforms.Resize(scale_size, 'bicubic'), # single int for resize shorter side of image
transforms.CenterCrop((config.DATA.IMAGE_SIZE, config.DATA.IMAGE_SIZE)),
transforms.ToTensor(),
- transforms.Normalize(mean=config.DATA.IMAGENET_MEAN, std=config.DATA.IMAGENET_STD),
- ])
+ transforms.Normalize(mean=config.DATA.IMAGENET_MEAN, std=config.DATA.IMAGENET_STD)])
return transforms_val
-def get_dataset(config, mode='train'):
+def get_dataset(config, is_train=True):
""" Get dataset from config and mode (train/val)
-
Returns the related dataset object according to configs and mode(train/val)
Args:
config: configs contains dataset related settings. see config.py for details
+ is_train: bool, set True to use training set, otherwise val set. Default: True
Returns:
dataset: dataset object
"""
- assert mode in ['train', 'val']
- if config.DATA.DATASET == "cifar10":
- if mode == 'train':
- dataset = datasets.Cifar10(mode=mode, transform=get_train_transforms(config))
- else:
- mode = 'test'
- dataset = datasets.Cifar10(mode=mode, transform=get_val_transforms(config))
- elif config.DATA.DATASET == "cifar100":
- if mode == 'train':
- dataset = datasets.Cifar100(mode=mode, transform=get_train_transforms(config))
+ if config.DATA.DATASET == "imagenet2012":
+ if is_train:
+ transform_ops = get_train_transforms(config)
else:
- mode = 'test'
- dataset = datasets.Cifar100(mode=mode, transform=get_val_transforms(config))
- elif config.DATA.DATASET == "imagenet2012":
- if mode == 'train':
- dataset = ImageNet2012Dataset(config.DATA.DATA_PATH,
- mode=mode,
- transform=get_train_transforms(config))
- else:
- dataset = ImageNet2012Dataset(config.DATA.DATA_PATH,
- mode=mode,
- transform=get_val_transforms(config))
+ transform_ops = get_val_transforms(config)
+ dataset = ImageNet2012Dataset(config.DATA.DATA_PATH,
+ is_train=is_train,
+ transform_ops=transform_ops)
else:
raise NotImplementedError(
- "[{config.DATA.DATASET}] Only cifar10, cifar100, imagenet2012 are supported now")
+ "Wrong dataset name: [{config.DATA.DATASET}]. Only 'imagenet2012' is supported now")
return dataset
-def get_dataloader(config, dataset, mode='train', multi_process=False):
- """Get dataloader with config, dataset, mode as input, allows multiGPU settings.
-
- Multi-GPU loader is implements as distributedBatchSampler.
+def get_dataloader(config, dataset, is_train=True, use_dist_sampler=False):
+ """Get dataloader from dataset, allows multiGPU settings.
+    Multi-GPU loader is implemented as DistributedBatchSampler.
Args:
config: see config.py for details
dataset: paddle.io.dataset object
- mode: train/val
- multi_process: if True, use DistributedBatchSampler to support multi-processing
+ is_train: bool, when False, shuffle is off and BATCH_SIZE_EVAL is used, default: True
+ use_dist_sampler: if True, DistributedBatchSampler is used, default: False
Returns:
dataloader: paddle.io.DataLoader object.
"""
+ batch_size = config.DATA.BATCH_SIZE if is_train else config.DATA.BATCH_SIZE_EVAL
- if mode == 'train':
- batch_size = config.DATA.BATCH_SIZE
- else:
- batch_size = config.DATA.BATCH_SIZE_EVAL
-
- if multi_process is True:
- sampler = DistributedBatchSampler(dataset,
+ if use_dist_sampler is True:
+ sampler = DistributedBatchSampler(dataset=dataset,
batch_size=batch_size,
- shuffle=(mode == 'train'))
- dataloader = DataLoader(dataset,
+ shuffle=is_train,
+ drop_last=is_train)
+ dataloader = DataLoader(dataset=dataset,
batch_sampler=sampler,
num_workers=config.DATA.NUM_WORKERS)
else:
- dataloader = DataLoader(dataset,
+ dataloader = DataLoader(dataset=dataset,
batch_size=batch_size,
num_workers=config.DATA.NUM_WORKERS,
- shuffle=(mode == 'train'))
+ shuffle=is_train,
+ drop_last=is_train)
return dataloader
diff --git a/image_classification/MAE/droppath.py b/self_supervised_learning/MAE/droppath.py
similarity index 93%
rename from image_classification/MAE/droppath.py
rename to self_supervised_learning/MAE/droppath.py
index 25b8d5ff..b32f7310 100644
--- a/image_classification/MAE/droppath.py
+++ b/self_supervised_learning/MAE/droppath.py
@@ -15,7 +15,6 @@
"""
Droppath, reimplement from https://github.com/yueatsprograms/Stochastic_Depth
"""
-
import paddle
import paddle.nn as nn
@@ -23,7 +22,7 @@
class DropPath(nn.Layer):
"""DropPath class"""
def __init__(self, drop_prob=None):
- super(DropPath, self).__init__()
+ super().__init__()
self.drop_prob = drop_prob
def drop_path(self, inputs):
@@ -43,7 +42,7 @@ def drop_path(self, inputs):
shape = (inputs.shape[0], ) + (1, ) * (inputs.ndim - 1) # shape=(N, 1, 1, 1)
random_tensor = keep_prob + paddle.rand(shape, dtype=inputs.dtype)
random_tensor = random_tensor.floor() # mask
- output = inputs.divide(keep_prob) * random_tensor #divide is to keep same output expectation
+ output = inputs.divide(keep_prob) * random_tensor # divide to keep same output expectation
return output
def forward(self, inputs):
diff --git a/self_supervised_learning/MAE/load_pytorch_weights.py b/self_supervised_learning/MAE/load_pytorch_weights.py
new file mode 100644
index 00000000..28a118ca
--- /dev/null
+++ b/self_supervised_learning/MAE/load_pytorch_weights.py
@@ -0,0 +1,279 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import numpy as np
+import paddle
+import torch
+import timm
+from mae_pytorch import models_mae, models_vit
+from transformer import build_mae_pretrain as build_model
+#from transformer import build_transformer as build_model
+from config import *
+import random
+
+seed = 0
+torch.manual_seed(seed)
+paddle.seed(seed)
+np.random.seed(seed)
+random.seed(seed)
+
+
+model_type = 'base'
+#model_type = 'large'
+#model_type = 'huge'
+
+if model_type == 'base':
+ model_name = 'mae_vit_base_patch16'
+ config = get_config(f'./configs/vit_base_patch16_224_pretrain.yaml')
+ pth_model_path = './mae_pretrain_vit_base.pth'
+ pd_model_path = './mae_pretrain_vit_base.pdparams'
+ npatches = 196
+elif model_type == 'large':
+ model_name = 'mae_vit_large_patch16'
+ config = get_config(f'./configs/vit_large_patch16_224_pretrain.yaml')
+ pth_model_path = './mae_pretrain_vit_large.pth'
+ pd_model_path = './mae_pretrain_vit_large.pdparams'
+ npatches = 196
+elif model_type == 'huge':
+ model_name = 'mae_vit_huge_patch14'
+ config = get_config(f'./configs/vit_huge_patch14_224_pretrain.yaml')
+ pth_model_path = './mae_pretrain_vit_huge.pth'
+ pd_model_path = './mae_pretrain_vit_huge.pdparams'
+ npatches = 256
+
+
+def print_model_named_params(model):
+ print('----------------------------------')
+ for name, param in model.named_parameters():
+ print(name, param.shape)
+ print('----------------------------------')
+
+
+def print_model_named_buffers(model):
+ print('----------------------------------')
+ for name, param in model.named_buffers():
+ print(name, param.shape)
+ print('----------------------------------')
+
+
+def torch_to_paddle_mapping():
+ mapping = [
+ ('cls_token', f'cls_token'),
+ ('mask_token', f'mask_token'),
+ ('pos_embed', f'encoder_position_embedding'),
+ ('patch_embed.proj', f'patch_embedding.patch_embedding'),
+ ('norm', 'encoder.norm'),
+ ('decoder_embed', f'linear_projection'),
+ ('decoder_pos_embed', f'decoder_position_embedding'),
+ ('decoder_norm', f'decoder.norm'),
+ ('decoder_pred', f'decoder_pred'),
+
+ ]
+
+ if 'large' in model_name:
+ num_enc_layers = 24
+ num_dec_layers = 8
+ elif 'base' in model_name:
+ num_enc_layers = 12
+ num_dec_layers = 8
+ elif 'huge' in model_name:
+ num_enc_layers = 32
+ num_dec_layers = 8
+ else:
+        raise ValueError('only base, large and huge model conversion is supported')
+
+ for idx in range(num_enc_layers):
+ pp_prefix = f'encoder.layers.{idx}'
+ th_prefix = f'blocks.{idx}'
+ layer_mapping = [
+ (f'{th_prefix}.norm1', f'{pp_prefix}.attn_norm'),
+ (f'{th_prefix}.norm2', f'{pp_prefix}.mlp_norm'),
+ (f'{th_prefix}.mlp.fc1', f'{pp_prefix}.mlp.fc1'),
+ (f'{th_prefix}.mlp.fc2', f'{pp_prefix}.mlp.fc2'),
+ (f'{th_prefix}.attn.qkv', f'{pp_prefix}.attn.qkv'),
+ (f'{th_prefix}.attn.proj', f'{pp_prefix}.attn.out'),
+ ]
+ mapping.extend(layer_mapping)
+
+ for idx in range(num_dec_layers):
+ pp_prefix = f'decoder.layers.{idx}'
+ th_prefix = f'decoder_blocks.{idx}'
+ layer_mapping = [
+ (f'{th_prefix}.norm1', f'{pp_prefix}.attn_norm'),
+ (f'{th_prefix}.norm2', f'{pp_prefix}.mlp_norm'),
+ (f'{th_prefix}.mlp.fc1', f'{pp_prefix}.mlp.fc1'),
+ (f'{th_prefix}.mlp.fc2', f'{pp_prefix}.mlp.fc2'),
+ (f'{th_prefix}.attn.qkv', f'{pp_prefix}.attn.qkv'),
+ (f'{th_prefix}.attn.proj', f'{pp_prefix}.attn.out'),
+ ]
+ mapping.extend(layer_mapping)
+
+ #head_mapping = [
+ # #('head', 'classifier')
+ #]
+ #mapping.extend(head_mapping)
+
+ return mapping
+
+
+def convert(torch_model, paddle_model):
+ def _set_value(th_name, pd_name, transpose=True):
+ th_shape = th_params[th_name].shape
+ pd_shape = tuple(pd_params[pd_name].shape) # paddle shape default type is list
+ #assert th_shape == pd_shape, f'{th_shape} != {pd_shape}'
+ print(f'**SET** {th_name} {th_shape} **TO** {pd_name} {pd_shape}')
+ if isinstance(th_params[th_name], torch.nn.parameter.Parameter):
+ value = th_params[th_name].data.numpy()
+ else:
+ value = th_params[th_name].numpy()
+
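+        # 2-D weights need a transpose: torch nn.Linear stores (out_features, in_features),
+        # while paddle nn.Linear stores (in_features, out_features)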
+ if len(value.shape) == 2 and transpose:
+ value = value.transpose((1, 0))
+ pd_params[pd_name].set_value(value)
+
+ # 1. get paddle and torch model parameters
+ pd_params = {}
+ th_params = {}
+ for name, param in paddle_model.named_parameters():
+ pd_params[name] = param
+ for name, param in torch_model.named_parameters():
+ th_params[name] = param
+
+ for name, param in paddle_model.named_buffers():
+ pd_params[name] = param
+ for name, param in torch_model.named_buffers():
+ th_params[name] = param
+
+ # 2. get name mapping pairs
+ mapping = torch_to_paddle_mapping()
+
+ # 3. set torch param values to paddle params: may needs transpose on weights
+ for th_name, pd_name in mapping:
+ if th_name in th_params.keys(): # nn.Parameters
+ _set_value(th_name, pd_name)
+ else: # weight & bias
+ th_name_w = f'{th_name}.weight'
+ pd_name_w = f'{pd_name}.weight'
+ _set_value(th_name_w, pd_name_w)
+
+ if f'{th_name}.bias' in th_params.keys():
+ th_name_b = f'{th_name}.bias'
+ pd_name_b = f'{pd_name}.bias'
+ _set_value(th_name_b, pd_name_b)
+
+ return paddle_model
+
+
+def main():
+
+ paddle.set_device('cpu')
+ paddle_model = build_model(config)
+ paddle_model.eval()
+ print_model_named_params(paddle_model)
+ print_model_named_buffers(paddle_model)
+
+ print('+++++++++++++++++++++++++++++++++++')
+ device = torch.device('cpu')
+ #torch_model = models_vit.__dict__[model_name](global_pool=True)
+ torch_model = models_mae.__dict__[model_name](norm_pix_loss=True)
+ print_model_named_params(torch_model)
+ print_model_named_buffers(torch_model)
+ state_dict = torch.load(pth_model_path, map_location='cpu')['model']
+ print('===========================')
+ for key in state_dict:
+ print(key)
+ print('===========================')
+ torch_model.load_state_dict(state_dict, strict=False)
+ torch_model = torch_model.to(device)
+ torch_model.eval()
+
+ # convert weights
+ paddle_model = convert(torch_model, paddle_model)
+
+ # check correctness
+ x = np.random.randn(4, 3, 224, 224).astype('float32')
+ x_paddle = paddle.to_tensor(x)
+ x_torch = torch.Tensor(x).to(device)
+
+ # manually set the same rand probs(noise) for random masking
+ rp = np.random.rand(4, npatches)
+ rand_probs = paddle.to_tensor(rp)
+ noise = torch.Tensor(rp)
+
+ # encoder out
+ # NOTE: need to modify the mae pytorch implementation
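+    # (forward_encoder must accept the externally generated noise so both models mask identical patches)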
+ out_torch = torch_model.forward_encoder(x_torch, 0.75, noise)[0]
+ out_paddle = paddle_model.forward_encoder(x_paddle, 0.75, rand_probs)[0]
+
+ out_torch = out_torch.data.cpu().numpy()
+ out_paddle = out_paddle.cpu().numpy()
+
+ print(out_torch.shape, out_paddle.shape)
+ print(out_torch[0, 0:100])
+ print('========================================================')
+ print(out_paddle[0, 0:100])
+ assert np.allclose(out_torch, out_paddle, atol = 1e-5)
+
+
+ # encoder out: mask
+ out_torch = torch_model.forward_encoder(x_torch, 0.75, noise)[1]
+ out_paddle = paddle_model.forward_encoder(x_paddle, 0.75, rand_probs)[1]
+
+ out_torch = out_torch.data.cpu().numpy()
+ out_paddle = out_paddle.cpu().numpy()
+
+ print(out_torch.shape, out_paddle.shape)
+ print(out_torch[0, 0:100])
+ print('========================================================')
+ print(out_paddle[0, 0:100])
+ assert np.allclose(out_torch, out_paddle, atol = 1e-5)
+
+
+
+ # manually set the same rand probs(noise) for random masking
+ rp = np.random.rand(4, npatches)
+ rand_probs = paddle.to_tensor(rp)
+ noise = torch.Tensor(rp)
+ # [0]: loss, [1]: decoder_out
+ out_torch = torch_model(x_torch, 0.75, noise)[0]
+ out_paddle = paddle_model(x_paddle, 0.75, rand_probs)[0]
+
+ out_torch = out_torch.data.cpu().numpy()
+ out_paddle = out_paddle.cpu().numpy()
+
+ print('torch loss = ', out_torch)
+ print('paddle loss = ', out_paddle)
+
+ print(out_torch.shape, out_paddle.shape)
+ #print(out_torch[0, 0:100])
+ #print('========================================================')
+ #print(out_paddle[0, 0:100])
+ #print('--------------------------------------------------------')
+ #print(out_torch[1, 0:100])
+ #print('========================================================')
+ #print(out_paddle[1, 0:100])
+ assert np.allclose(out_torch, out_paddle, atol = 1e-5)
+ #assert np.allclose(out_torch[0, :, :], out_paddle[0, :, :], atol = 1e-5)
+ #assert np.allclose(out_torch[1, :, :], out_paddle[1, :, :], atol = 1e-5)
+
+
+ ## save weights for paddle model
+    paddle.save(paddle_model.state_dict(), pd_model_path)
+ print('all done')
+
+
+if __name__ == "__main__":
+ main()
diff --git a/self_supervised_learning/MAE/load_pytorch_weights_finetune.py b/self_supervised_learning/MAE/load_pytorch_weights_finetune.py
new file mode 100644
index 00000000..db8346d6
--- /dev/null
+++ b/self_supervised_learning/MAE/load_pytorch_weights_finetune.py
@@ -0,0 +1,188 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import numpy as np
+import paddle
+import torch
+import timm
+from mae_pytorch import models_mae, models_vit
+from transformer import build_transformer as build_model
+from config import *
+
+## vit-base
+model_path='./mae_finetuned_vit_base'
+model_name = 'vit_base_patch16'
+config = get_config(f'./configs/vit_base_patch16_224_finetune.yaml')
+
+# vit-large
+#model_path='./mae_finetuned_vit_large'
+#model_name = 'vit_large_patch16'
+#config = get_config(f'./configs/vit_large_patch16_224_finetune.yaml')
+
+# vit-huge
+#model_path='./mae_finetuned_vit_huge'
+#model_name = 'vit_huge_patch14'
+#config = get_config(f'./configs/vit_huge_patch14_224_finetune.yaml')
+
+
+def print_model_named_params(model):
+ print('----------------------------------')
+ for name, param in model.named_parameters():
+ print(name, param.shape)
+ print('----------------------------------')
+
+
+def print_model_named_buffers(model):
+ print('----------------------------------')
+ for name, param in model.named_buffers():
+ print(name, param.shape)
+ print('----------------------------------')
+
+
+def torch_to_paddle_mapping():
+ mapping = [
+ ('cls_token', f'cls_token'),
+ ('pos_embed', f'encoder_position_embedding'),
+ ('patch_embed.proj', f'patch_embedding.patch_embedding'),
+ ]
+
+ if 'large' in model_name:
+ num_layers = 24
+ elif 'base' in model_name:
+ num_layers = 12
+ elif 'huge' in model_name:
+ num_layers = 32
+ else:
+        raise ValueError('only base, large and huge model conversion is supported')
+
+ for idx in range(num_layers):
+ pp_prefix = f'encoder.layers.{idx}'
+ th_prefix = f'blocks.{idx}'
+ layer_mapping = [
+ (f'{th_prefix}.norm1', f'{pp_prefix}.attn_norm'),
+ (f'{th_prefix}.norm2', f'{pp_prefix}.mlp_norm'),
+ (f'{th_prefix}.mlp.fc1', f'{pp_prefix}.mlp.fc1'),
+ (f'{th_prefix}.mlp.fc2', f'{pp_prefix}.mlp.fc2'),
+ (f'{th_prefix}.attn.qkv', f'{pp_prefix}.attn.qkv'),
+ (f'{th_prefix}.attn.proj', f'{pp_prefix}.attn.out'),
+ ]
+ mapping.extend(layer_mapping)
+
+ head_mapping = [
+ #('norm', 'encoder_norm'),
+ ('fc_norm', 'encoder_norm'),
+ ('head', 'classifier')
+ ]
+ mapping.extend(head_mapping)
+
+ return mapping
+
+
+
+def convert(torch_model, paddle_model):
+ def _set_value(th_name, pd_name, transpose=True):
+ th_shape = th_params[th_name].shape
+ pd_shape = tuple(pd_params[pd_name].shape) # paddle shape default type is list
+ #assert th_shape == pd_shape, f'{th_shape} != {pd_shape}'
+ print(f'**SET** {th_name} {th_shape} **TO** {pd_name} {pd_shape}')
+ if isinstance(th_params[th_name], torch.nn.parameter.Parameter):
+ value = th_params[th_name].data.numpy()
+ else:
+ value = th_params[th_name].numpy()
+
+ if len(value.shape) == 2 and transpose:
+ value = value.transpose((1, 0))
+ pd_params[pd_name].set_value(value)
+
+ # 1. get paddle and torch model parameters
+ pd_params = {}
+ th_params = {}
+ for name, param in paddle_model.named_parameters():
+ pd_params[name] = param
+ for name, param in torch_model.named_parameters():
+ th_params[name] = param
+
+ for name, param in paddle_model.named_buffers():
+ pd_params[name] = param
+ for name, param in torch_model.named_buffers():
+ th_params[name] = param
+
+ # 2. get name mapping pairs
+ mapping = torch_to_paddle_mapping()
+
+ # 3. set torch param values to paddle params: may needs transpose on weights
+ for th_name, pd_name in mapping:
+ if th_name in th_params.keys(): # nn.Parameters
+ _set_value(th_name, pd_name)
+ else: # weight & bias
+ th_name_w = f'{th_name}.weight'
+ pd_name_w = f'{pd_name}.weight'
+ _set_value(th_name_w, pd_name_w)
+
+ if f'{th_name}.bias' in th_params.keys():
+ th_name_b = f'{th_name}.bias'
+ pd_name_b = f'{pd_name}.bias'
+ _set_value(th_name_b, pd_name_b)
+
+ return paddle_model
+
+
+def main():
+
+ paddle.set_device('cpu')
+ paddle_model = build_model(config)
+ paddle_model.eval()
+ print_model_named_params(paddle_model)
+ print_model_named_buffers(paddle_model)
+
+ print('+++++++++++++++++++++++++++++++++++')
+ device = torch.device('cpu')
+ torch_model = models_vit.__dict__[model_name](global_pool=True)
+ print_model_named_params(torch_model)
+ print_model_named_buffers(torch_model)
+ state_dict = torch.load(f'{model_path}.pth', map_location='cpu')['model']
+ torch_model.load_state_dict(state_dict, strict=False)
+ torch_model = torch_model.to(device)
+ torch_model.eval()
+
+ #return
+
+ # convert weights
+ paddle_model = convert(torch_model, paddle_model)
+
+ # check correctness
+ x = np.random.randn(2, 3, 224, 224).astype('float32')
+ x_paddle = paddle.to_tensor(x)
+ x_torch = torch.Tensor(x).to(device)
+
+ out_torch = torch_model(x_torch)
+ out_paddle = paddle_model(x_paddle)
+
+ out_torch = out_torch.data.cpu().numpy()
+ out_paddle = out_paddle.cpu().numpy()
+
+ print(out_torch.shape, out_paddle.shape)
+ print(out_torch[0, 0:100])
+ print('========================================================')
+ print(out_paddle[0, 0:100])
+ assert np.allclose(out_torch, out_paddle, atol = 1e-5)
+
+ # save weights for paddle model
+ paddle.save(paddle_model.state_dict(), f'{model_path}.pdparams')
+ print('all done')
+
+
+if __name__ == "__main__":
+ main()
diff --git a/image_classification/MAE/losses.py b/self_supervised_learning/MAE/losses.py
similarity index 93%
rename from image_classification/MAE/losses.py
rename to self_supervised_learning/MAE/losses.py
index f67780a2..674d6b41 100644
--- a/image_classification/MAE/losses.py
+++ b/self_supervised_learning/MAE/losses.py
@@ -54,9 +54,6 @@ class SoftTargetCrossEntropyLoss(nn.Layer):
Returns:
loss: float, the mean loss value
"""
- def __init__(self):
- super().__init__()
-
def forward(self, x, target):
loss = paddle.sum(-target * F.log_softmax(x, axis=-1), axis=-1)
return loss.mean()
@@ -64,16 +61,16 @@ def forward(self, x, target):
class DistillationLoss(nn.Layer):
"""Distillation loss function
- This layer includes the orginal loss (criterion) and a extra
- distillation loss (criterion), which computes the loss with
- different type options, between current model and
+    This layer includes the original loss (criterion) and an extra
+    distillation loss (criterion), which computes the loss with
+    different type options, between the current model and
a teacher model as its supervision.
Args:
base_criterion: nn.Layer, the original criterion
teacher_model: nn.Layer, the teacher model as supervision
distillation_type: str, one of ['none', 'soft', 'hard']
- alpha: float, ratio of base loss (* (1-alpha))
+ alpha: float, ratio of base loss (* (1-alpha))
and distillation loss( * alpha)
tao: float, temperature in distillation
"""
@@ -101,7 +98,9 @@ def forward(self, inputs, outputs, targets):
in the last layer of the model
targets: tensor, the labels for the base criterion
"""
- outputs, outputs_kd = outputs[0], outputs[1]
+ outputs_kd = None
+ if not isinstance(outputs, paddle.Tensor):
+ outputs, outputs_kd = outputs[0], outputs[1]
base_loss = self.base_criterion(outputs, targets)
if self.type == 'none':
return base_loss
diff --git a/self_supervised_learning/MAE/lr_decay.py b/self_supervised_learning/MAE/lr_decay.py
new file mode 100644
index 00000000..2efe3592
--- /dev/null
+++ b/self_supervised_learning/MAE/lr_decay.py
@@ -0,0 +1,86 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""parameters groups for layer-wise lr decay, used in BeiT and MAE"""
+
+import json
+
+# Note: param_groups_lrd is NOT used because the paddle Adam optimizer seems to have problems that we have
+# not identified; instead, we use paddlenlp.ops.optimizer.AdamWDL with lr_setting (see below) as a temporary fix.
+def param_groups_lrd(model, weight_decay=0.05, no_weight_decay_list=[], layer_decay=0.75):
+ """layer-wise decay
+ set learning rate decay according to layer depth
+ Note:
+ 1. In Paddle param_groups, dict key 'learning_rate' is in fact the 'lr_mult'
+ 2. param_names in no_weight_decay_list will have no decay
+ 3. model.encoder.layers may need to change for models other than MAE_finetune
+ """
+ param_group_names = {}
+ param_groups = {}
+ num_layers = len(model.encoder.layers) + 1
+ layer_scales = list(layer_decay ** (num_layers - i) for i in range(num_layers + 1))
+
+ for name, param in model.named_parameters():
+ if param.stop_gradient is True:
+ continue
+
+ # no decay
+ if param.ndim == 1 or name.endswith('.bias') or name in no_weight_decay_list:
+ g_decay = 'no_decay'
+ this_weight_decay = 0.
+ else:
+ g_decay = 'decay'
+ this_weight_decay = weight_decay
+
+ layer_id = get_layer_id_for_vit(name, num_layers)
+ group_name = f"layer_{layer_id}_{g_decay}"
+
+ if group_name not in param_group_names:
+ this_scale = layer_scales[layer_id]
+ param_group_names[group_name] = {
+ "learning_rate": this_scale,
+ "weight_decay": this_weight_decay,
+ "params": [],
+ }
+ param_groups[group_name] = {
+ "learning_rate": this_scale,
+ "weight_decay": this_weight_decay,
+ "params": [],
+ }
+
+ param_group_names[group_name]["params"].append(name)
+ param_groups[group_name]["params"].append(param)
+
+ print("parameter groups: \n%s" % json.dumps(param_group_names, indent=2))
+ return list(param_groups.values())
+
+
+def get_layer_id_for_vit(name, num_layers):
+ """assign a parameter with its layer id"""
+ if name in ['cls_token', 'mask_token', 'encoder_position_embedding']:
+ return 0
+ elif name.startswith('patch_embedding'):
+ return 0
+ elif name.startswith('encoder.layers'):
+ return int(name.split('.')[2]) + 1
+ else:
+ return num_layers
+
+
+def lr_setting(layer_decay, name_dict, num_layers, param):
+ layer_scales = list(layer_decay ** (num_layers - i) for i in range(num_layers + 1))
+ static_name = name_dict[param.name]
+ #print('static_name= ', static_name, ', param.name= ', param.name)
+ layer_id = get_layer_id_for_vit(static_name, num_layers)
+ param.optimize_attr["learning_rate"] *= layer_scales[layer_id]
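+
+# Rough usage sketch (an assumption; not verified against paddlenlp's exact AdamWDL signature):
+#   from functools import partial
+#   name_dict = {p.name: n for n, p in model.named_parameters()}
+#   num_layers = len(model.encoder.layers) + 1
+#   lr_fn = partial(lr_setting, layer_decay, name_dict, num_layers)
+#   # pass lr_fn to AdamWDL so each parameter's learning rate is scaled by its layer depth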
diff --git a/self_supervised_learning/MAE/mae.png b/self_supervised_learning/MAE/mae.png
new file mode 100644
index 00000000..6ca07def
Binary files /dev/null and b/self_supervised_learning/MAE/mae.png differ
diff --git a/self_supervised_learning/MAE/main_multi_gpu_finetune.py b/self_supervised_learning/MAE/main_multi_gpu_finetune.py
new file mode 100644
index 00000000..77d66666
--- /dev/null
+++ b/self_supervised_learning/MAE/main_multi_gpu_finetune.py
@@ -0,0 +1,611 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""MAE finetuning using multiple GPU """
+
+import sys
+import os
+import time
+import argparse
+import random
+import math
+import numpy as np
+import paddle
+from paddle.distributed import fleet
+from datasets import get_dataloader
+from datasets import get_dataset
+from config import get_config
+from config import update_config
+from utils import AverageMeter
+from utils import get_logger
+from utils import write_log
+from utils import all_reduce_mean
+from utils import skip_weight_decay_fn
+from utils import get_params_groups
+from utils import adjust_learning_rate
+from mixup import Mixup
+from losses import LabelSmoothingCrossEntropyLoss
+from losses import SoftTargetCrossEntropyLoss
+from utils import interpolate_pos_embed
+import lr_decay
+from transformer import build_transformer as build_model
+import paddlenlp
+
+
+def get_arguments():
+ """return argumeents, this will overwrite the config by (1) yaml file (2) argument values"""
+ parser = argparse.ArgumentParser('MAE Finetune')
+ parser.add_argument('-cfg', type=str, default=None)
+ parser.add_argument('-dataset', type=str, default=None)
+ parser.add_argument('-data_path', type=str, default=None)
+ parser.add_argument('-output', type=str, default=None)
+ parser.add_argument('-batch_size', type=int, default=None)
+ parser.add_argument('-batch_size_eval', type=int, default=None)
+ parser.add_argument('-image_size', type=int, default=None)
+ parser.add_argument('-accum_iter', type=int, default=None)
+ parser.add_argument('-pretrained', type=str, default=None)
+ parser.add_argument('-resume', type=str, default=None)
+ parser.add_argument('-last_epoch', type=int, default=None)
+ parser.add_argument('-eval', action='store_true')
+ parser.add_argument('-amp', action='store_true')
+ arguments = parser.parse_args()
+ return arguments
+
+
+def train(dataloader,
+ model,
+ optimizer,
+ criterion,
+ lr_scheduler,
+ base_lr,
+ min_lr,
+ epoch,
+ warmup_epochs,
+ total_epochs,
+ total_batches,
+ debug_steps=100,
+ accum_iter=1,
+ mixup_fn=None,
+ amp_grad_scaler=None,
+ local_logger=None,
+ master_logger=None):
+ """Training for one epoch
+ Args:
+ dataloader: paddle.io.DataLoader, dataloader instance
+ model: nn.Layer, a ViT model
+ optimizer: nn.optimizer
+ criterion: nn.XXLoss
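+        lr_scheduler: learning rate scheduler, stepped once per (accumulation) iteration
+        base_lr: float, base learning rate
+        min_lr: float, lower bound (end value) of the learning rate
+        warmup_epochs: int, num of warmup epochs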
+ epoch: int, current epoch
+ total_epochs: int, total num of epochs
+ total_batches: int, total num of batches for one epoch
+ debug_steps: int, num of iters to log info, default: 100
+ accum_iter: int, num of iters for accumulating gradients, default: 1
+ mixup_fn: Mixup, mixup instance, default: None
+ amp_grad_scaler: GradScaler/None, if not None, pass the GradScaler and enable AMP training, default: None
+ local_logger: logger for local process/gpu, default: None
+ master_logger: logger for main process, default: None
+ Returns:
+ train_loss_meter.avg: float, average loss on current process/gpu
+ train_acc_meter.avg: float, average acc@1 on current process/gpu
+ master_loss_meter.avg: float, average loss on all processes/gpus
+ master_acc_meter.avg: float, average acc@1 on all processes/gpus
+ train_time: float, training time
+ """
+ model.train()
+ train_loss_meter = AverageMeter()
+ train_acc_meter = AverageMeter()
+ master_loss_meter = AverageMeter()
+ master_acc_meter = AverageMeter()
+
+ time_st = time.time()
+
+ #if amp is True:
+ # scaler = paddle.amp.GradScaler() # default init_loss_scaling = 32768
+ optimizer.clear_grad()
+
+ for batch_id, data in enumerate(dataloader):
+ # get data
+ images = data[0]
+ label = data[1]
+ label_orig = label.clone()
+ batch_size = images.shape[0]
+
+ if mixup_fn is not None:
+ images, label = mixup_fn(images, label_orig)
+
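+        # step the scheduler with a fractional epoch value so the warmup/cosine lr
+        # schedule is updated per iteration rather than once per epoch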
+ if batch_id % accum_iter == 0:
+            lr_scheduler.step(batch_id / total_batches + epoch - 1)
+ #adjust_learning_rate(optimizer,
+ # base_lr,
+ # min_lr,
+ # batch_id / total_batches + epoch - 1,
+ # warmup_epochs,
+ # total_epochs)
+ # forward
+ with paddle.amp.auto_cast(amp_grad_scaler is not None):
+ output = model(images)
+ loss = criterion(output, label)
+
+ loss_value = loss.item()
+ if not math.isfinite(loss_value):
+ print("Loss is {}, stopping training".format(loss_value))
+ sys.exit(1)
+
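+        # divide the loss by accum_iter so gradients accumulated over accum_iter
+        # micro-batches average to a single effective large-batch update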
+ loss = loss / accum_iter
+
+ # backward and step
+ if amp_grad_scaler is None: # fp32
+ loss.backward()
+ if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
+ optimizer.step()
+ optimizer.clear_grad()
+ else: # amp
+ scaled_loss = amp_grad_scaler.scale(loss)
+ scaled_loss.backward()
+ if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
+ # amp for param group reference: https://github.com/PaddlePaddle/Paddle/issues/37188
+ amp_grad_scaler.step(optimizer)
+ amp_grad_scaler.update()
+ optimizer.clear_grad()
+
+ pred = paddle.nn.functional.softmax(output)
+ if mixup_fn:
+ acc = paddle.metric.accuracy(pred, label_orig).item()
+ else:
+ acc = paddle.metric.accuracy(pred, label_orig.unsqueeze(1)).item()
+
+ # sync from other gpus for overall loss and acc
+
+ master_loss = all_reduce_mean(loss_value)
+ master_acc = all_reduce_mean(acc)
+ master_batch_size = all_reduce_mean(batch_size)
+
+ master_loss_meter.update(master_loss, master_batch_size)
+ master_acc_meter.update(master_acc, master_batch_size)
+ train_loss_meter.update(loss_value, batch_size)
+ train_acc_meter.update(acc, batch_size)
+
+ if batch_id % debug_steps == 0 or batch_id + 1 == len(dataloader):
+ general_message = (f"Epoch[{epoch:03d}/{total_epochs:03d}], "
+ f"Step[{batch_id:04d}/{total_batches:04d}], "
+ f"Lr: {optimizer.get_lr():.6e}, ")
+ local_message = (general_message +
+ f"Loss: {loss_value:.4f} ({train_loss_meter.avg:.4f}), "
+ f"Avg Acc: {train_acc_meter.avg:.4f}")
+ master_message = (general_message +
+ f"Loss: {master_loss:.4f} ({master_loss_meter.avg:.4f}), "
+ f"Avg Acc: {master_acc_meter.avg:.4f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+
+ train_time = time.time() - time_st
+ paddle.distributed.barrier()
+ return (train_loss_meter.avg,
+ train_acc_meter.avg,
+ master_loss_meter.avg,
+ master_acc_meter.avg,
+ train_time)
+
+
+@paddle.no_grad()
+def validate(dataloader,
+ model,
+ criterion,
+ total_batches,
+ debug_steps=100,
+ local_logger=None,
+ master_logger=None):
+ """Validation for the whole dataset
+ Args:
+ dataloader: paddle.io.DataLoader, dataloader instance
+ model: nn.Layer, a ViT model
+ total_batches: int, total num of batches for one epoch
+ debug_steps: int, num of iters to log info, default: 100
+ local_logger: logger for local process/gpu, default: None
+ master_logger: logger for main process, default: None
+ Returns:
+ val_loss_meter.avg: float, average loss on current process/gpu
+ val_acc1_meter.avg: float, average top1 accuracy on current processes/gpus
+ val_acc5_meter.avg: float, average top5 accuracy on current processes/gpus
+ master_loss_meter.avg: float, average loss on all processes/gpus
+ master_acc1_meter.avg: float, average top1 accuracy on all processes/gpus
+ master_acc5_meter.avg: float, average top5 accuracy on all processes/gpus
+ val_time: float, validation time
+ """
+ model.eval()
+ val_loss_meter = AverageMeter()
+ val_acc1_meter = AverageMeter()
+ val_acc5_meter = AverageMeter()
+ master_loss_meter = AverageMeter()
+ master_acc1_meter = AverageMeter()
+ master_acc5_meter = AverageMeter()
+
+ time_st = time.time()
+
+ for batch_id, data in enumerate(dataloader):
+ # get data
+ images = data[0]
+ label = data[1]
+ batch_size = images.shape[0]
+
+ output = model(images)
+ loss = criterion(output, label)
+ loss_value = loss.item()
+
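+        # paddle.metric.accuracy expects labels of shape [N, 1], hence the unsqueeze below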
+ pred = paddle.nn.functional.softmax(output)
+ acc1 = paddle.metric.accuracy(pred, label.unsqueeze(1)).item()
+ acc5 = paddle.metric.accuracy(pred, label.unsqueeze(1), k=5).item()
+
+ # sync from other gpus for overall loss and acc
+ master_loss = all_reduce_mean(loss_value)
+ master_acc1 = all_reduce_mean(acc1)
+ master_acc5 = all_reduce_mean(acc5)
+ master_batch_size = all_reduce_mean(batch_size)
+
+ master_loss_meter.update(master_loss, master_batch_size)
+ master_acc1_meter.update(master_acc1, master_batch_size)
+ master_acc5_meter.update(master_acc5, master_batch_size)
+ val_loss_meter.update(loss_value, batch_size)
+ val_acc1_meter.update(acc1, batch_size)
+ val_acc5_meter.update(acc5, batch_size)
+
+ if batch_id % debug_steps == 0:
+ local_message = (f"Step[{batch_id:04d}/{total_batches:04d}], " +
+ f"Avg Loss: {val_loss_meter.avg:.4f}, " +
+ f"Avg Acc@1: {val_acc1_meter.avg:.4f}, " +
+ f"Avg Acc@5: {val_acc5_meter.avg:.4f}")
+ master_message = (f"Step[{batch_id:04d}/{total_batches:04d}], " +
+ f"Avg Loss: {master_loss_meter.avg:.4f}, " +
+ f"Avg Acc@1: {master_acc1_meter.avg:.4f}, " +
+ f"Avg Acc@5: {master_acc5_meter.avg:.4f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+    paddle.distributed.barrier()
+ val_time = time.time() - time_st
+ return (val_loss_meter.avg,
+ val_acc1_meter.avg,
+ val_acc5_meter.avg,
+ master_loss_meter.avg,
+ master_acc1_meter.avg,
+ master_acc5_meter.avg,
+ val_time)
+
+
+def main_worker(*args):
+ """main method for each process"""
+ # STEP 0: Preparation
+ paddle.device.set_device('gpu')
+ #paddle.distributed.init_parallel_env()
+ world_size = paddle.distributed.get_world_size()
+ local_rank = paddle.distributed.get_rank()
+ config = args[0]
+ last_epoch = config.TRAIN.LAST_EPOCH
+ seed = config.SEED + local_rank
+ paddle.seed(seed)
+ np.random.seed(seed)
+ random.seed(seed)
+ # logger for each process/gpu
+ local_logger, master_logger = get_logger(config.SAVE)
+ message = (f'----- world_size = {world_size}, local_rank = {local_rank} \n'
+ f'----- {config}')
+ write_log(local_logger, master_logger, message)
+
+ # STEP 1: Create model
+ model = build_model(config)
+ if paddle.distributed.get_world_size() > 1:
+ strategy = fleet.DistributedStrategy()
+ ## Hybrid Parallel Training
+ strategy.hybrid_configs = {}
+ fleet.init(is_collective=True, strategy=strategy)
+
+ # STEP 2: Create train and val dataloader
+ if not config.EVAL:
+ dataset_train = args[1]
+ dataloader_train = get_dataloader(config, dataset_train, True, True)
+ total_batch_train = len(dataloader_train)
+ message = f'----- Total # of train batch (single gpu): {total_batch_train}'
+ write_log(local_logger, master_logger, message)
+
+ dataset_val = args[2]
+ dataloader_val = get_dataloader(config, dataset_val, False, True)
+ total_batch_val = len(dataloader_val)
+ message = f'----- Total # of val batch (single gpu): {total_batch_val}'
+ write_log(local_logger, master_logger, message)
+
+ # STEP 3: Define Mixup function
+ mixup_fn = None
+ if config.TRAIN.MIXUP_PROB > 0 or config.TRAIN.CUTMIX_ALPHA > 0 or config.TRAIN.CUTMIX_MINMAX is not None:
+ mixup_fn = Mixup(mixup_alpha=config.TRAIN.MIXUP_ALPHA,
+ cutmix_alpha=config.TRAIN.CUTMIX_ALPHA,
+ cutmix_minmax=config.TRAIN.CUTMIX_MINMAX,
+ prob=config.TRAIN.MIXUP_PROB,
+ switch_prob=config.TRAIN.MIXUP_SWITCH_PROB,
+ mode=config.TRAIN.MIXUP_MODE,
+ label_smoothing=config.TRAIN.SMOOTHING)
+
+ # STEP 4: Define criterion
+ if config.TRAIN.MIXUP_PROB > 0.:
+ criterion = SoftTargetCrossEntropyLoss()
+ elif config.TRAIN.SMOOTHING:
+ criterion = LabelSmoothingCrossEntropyLoss()
+ else:
+ criterion = paddle.nn.CrossEntropyLoss()
+ # only use cross entropy for val
+ criterion_val = paddle.nn.CrossEntropyLoss()
+
+ # STEP 5: Define optimizer and lr_scheduler
+    # scale lr according to batch size and world size (adapted from the official Swin Transformer code)
+ if not config.EVAL:
+ if config.TRAIN.LINEAR_SCALED_LR is not None:
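+            # linear lr scaling rule: lr = base_lr * (batch_size * accum_iter * world_size) / LINEAR_SCALED_LR,
+            # where LINEAR_SCALED_LR is the reference batch size the base lr was tuned for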
+ effective_batch_size = config.DATA.BATCH_SIZE * config.TRAIN.ACCUM_ITER * world_size
+ config.TRAIN.BASE_LR = (
+ config.TRAIN.BASE_LR * effective_batch_size / config.TRAIN.LINEAR_SCALED_LR
+ )
+ write_log(local_logger, master_logger, f'Base lr is scaled to: {config.TRAIN.BASE_LR}')
+
+ # define scaler for amp training
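+    # (GradScaler dynamically scales the loss to avoid fp16 gradient underflow during AMP;
+    # when it is None, the training loop below falls back to plain fp32 updates)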
+ amp_grad_scaler = paddle.amp.GradScaler() if config.AMP else None
+ # set gradient clip
+ if config.TRAIN.GRAD_CLIP:
+ clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
+ else:
+ clip = None
+ # set optimizer
+ # create warmup and cosine decay lr scheduler
+ if config.TRAIN.WARMUP_EPOCHS > 0:
+ cosine_lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
+ learning_rate=config.TRAIN.BASE_LR,
+ T_max=config.TRAIN.NUM_EPOCHS - config.TRAIN.WARMUP_EPOCHS,
+ eta_min=config.TRAIN.END_LR,
+ last_epoch=-1) # do not set last epoch, handled in warmup sched get_lr()
+ lr_scheduler = paddle.optimizer.lr.LinearWarmup(
+ learning_rate=cosine_lr_scheduler, # use cosine lr sched after warmup
+            warmup_steps=config.TRAIN.WARMUP_EPOCHS,  # only supports positive integers
+ start_lr=config.TRAIN.WARMUP_START_LR,
+ end_lr=config.TRAIN.BASE_LR,
+ last_epoch=config.TRAIN.LAST_EPOCH)
+ else: # create cosine decay lr scheduler if no warmup epochs
+ lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
+ learning_rate=config.TRAIN.BASE_LR,
+ T_max=config.TRAIN.NUM_EPOCHS,
+ eta_min=config.TRAIN.END_LR,
+ last_epoch=config.TRAIN.LAST_EPOCH)
+
+ if config.TRAIN.OPTIMIZER.NAME == "AdamW":
+ params_groups = lr_decay.param_groups_lrd(
+ model=model,
+ no_weight_decay_list=['encoder_position_embedding', 'cls_token'],
+ weight_decay=config.TRAIN.WEIGHT_DECAY,
+ layer_decay=config.TRAIN.LAYER_DECAY)
+
+ optimizer = paddle.optimizer.AdamW(
+ parameters=params_groups,
+ learning_rate=lr_scheduler, # now only support warmup + cosine
+ beta1=config.TRAIN.OPTIMIZER.BETAS[0],
+ beta2=config.TRAIN.OPTIMIZER.BETAS[1],
+            weight_decay=config.TRAIN.WEIGHT_DECAY,  # weight decay is set per group in params_groups, so this value has no effect
+ epsilon=config.TRAIN.OPTIMIZER.EPS,
+ grad_clip=clip)
+ elif config.TRAIN.OPTIMIZER.NAME == "AdamWDL":
+ name_dict = dict()
+ for n, p in model.named_parameters():
+ # name_dict is for AdamWDL argument 'name_dict'
+ name_dict[p.name] = n
+ optimizer = paddlenlp.ops.optimizer.AdamWDL(
+ learning_rate=lr_scheduler,
+ weight_decay=config.TRAIN.WEIGHT_DECAY,
+ layerwise_decay=config.TRAIN.LAYER_DECAY,
+ n_layers=config.MODEL.ENCODER.DEPTH,
+ set_param_lr_fun=lr_decay.lr_setting,
+ parameters=model.parameters(),
+ name_dict=name_dict,
+ apply_decay_param_fun=skip_weight_decay_fn(
+ model, # skip bn and bias in model
+ ['encoder_position_embedding', 'cls_token']), # skip custom ops
+ beta1=config.TRAIN.OPTIMIZER.BETAS[0],
+ beta2=config.TRAIN.OPTIMIZER.BETAS[1],
+ epsilon=config.TRAIN.OPTIMIZER.EPS,
+ grad_clip=clip)
+ else:
+ message = f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}."
+ write_log(local_logger, master_logger, message, None, 'fatal')
+ raise NotImplementedError(message)
+
+    # STEP 6: Load pretrained model / resume model and optimizer states
+ if config.MODEL.PRETRAINED:
+ assert os.path.isfile(config.MODEL.PRETRAINED) is True
+ model_state = paddle.load(config.MODEL.PRETRAINED)
+        if 'model' in model_state: # checkpoint contains multiple items: model, optimizer, and epoch
+ # pretrain only load model weight, opt and epoch are ignored
+ model_state = model_state['model']
+ if not config.EVAL:
+            keys = ['encoder.norm.weight', 'encoder.norm.bias',
+                    'classifier.weight', 'classifier.bias']
+ if config.MODEL.GLOBAL_POOL:
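+                # with global pooling, the final encoder norm and the classifier head are
+                # re-initialized for fine-tuning, so the corresponding pretrained keys are
+                # dropped to avoid name/shape mismatches when loading the state dict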
+ if keys[0] in model_state:
+ del model_state[keys[0]]
+ if keys[1] in model_state:
+ del model_state[keys[1]]
+ if keys[2] in model_state:
+ del model_state[keys[2]]
+ if keys[3] in model_state:
+ del model_state[keys[3]]
+
+ # interpolate position embedding
+ interpolate_pos_embed(model, model_state)
+
+ model.set_state_dict(model_state)
+            # re-initialize the classifier (fc) layer weights (following the official MAE code)
+            init_fn = paddle.nn.initializer.TruncatedNormal(std=0.02)
+            init_fn(model.classifier.weight)
+
+ message = f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}"
+ write_log(local_logger, master_logger, message)
+
+ if config.MODEL.RESUME:
+ assert os.path.isfile(config.MODEL.RESUME) is True
+ model_state = paddle.load(config.MODEL.RESUME)
+        if 'model' in model_state: # checkpoint contains multiple items: model, optimizer, and epoch
+ model.set_state_dict(model_state['model'])
+ if 'optimizer' in model_state:
+ optimizer.set_state_dict(model_state['optimizer'])
+ if 'epoch' in model_state:
+ config.TRAIN.LAST_EPOCH = model_state['epoch']
+ if 'lr_scheduler' in model_state and lr_scheduler is not None:
+ lr_scheduler.set_state_dict(model_state['lr_scheduler'])
+ if 'amp_grad_scaler' in model_state and amp_grad_scaler is not None:
+ amp_grad_scaler.load_state_dict(model_state['amp_grad_scaler'])
+ lr_scheduler.step(config.TRAIN.LAST_EPOCH)
+ message = (f"----- Resume Training: Load model from {config.MODEL.RESUME}, "
+ f"opt = [{'optimizer' in model_state}], "
+ f"lr_scheduler = [{'lr_scheduler' in model_state}], "
+ f"model_ema = [{'model_ema' in model_state}], "
+ f"epoch = [{model_state.get('epoch', -1)}], "
+ f"amp_grad_scaler = [{'amp_grad_scaler' in model_state}]")
+ write_log(local_logger, master_logger, message)
+ else: # direct load pdparams without other items
+ message = f"----- Resume Training: Load from {config.MODEL.RESUME}, no opt/epoch/scaler"
+            write_log(local_logger, master_logger, message, None, 'warning')
+ model.set_state_dict(model_state)
+
+ if paddle.distributed.get_world_size() > 1:
+ model = fleet.distributed_model(model)
+
+ # STEP 7: Validation (eval mode)
+ if config.EVAL:
+ write_log(local_logger, master_logger, f"----- Start Validation")
+ val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
+ dataloader=dataloader_val,
+ model=model,
+ criterion=criterion_val,
+ total_batches=total_batch_val,
+ debug_steps=config.REPORT_FREQ,
+ local_logger=local_logger,
+ master_logger=master_logger)
+
+ local_message = (f"----- Validation: " +
+ f"Validation Loss: {val_loss:.4f}, " +
+ f"Validation Acc@1: {val_acc1:.4f}, " +
+ f"Validation Acc@5: {val_acc5:.4f}, " +
+ f"time: {val_time:.2f}")
+
+ master_message = (f"----- Validation: " +
+ f"Validation Loss: {avg_loss:.4f}, " +
+ f"Validation Acc@1: {avg_acc1:.4f}, " +
+ f"Validation Acc@5: {avg_acc5:.4f}, " +
+ f"time: {val_time:.2f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+ return
+
+
+    # STEP 8: Start training (train mode)
+ write_log(local_logger, master_logger, f"----- Start training from epoch {last_epoch+1}.")
+ for epoch in range(last_epoch + 1, config.TRAIN.NUM_EPOCHS + 1):
+ # train
+ write_log(local_logger, master_logger, f"Train epoch {epoch}. LR={optimizer.get_lr():.6e}")
+
+ train_loss, train_acc, avg_loss, avg_acc, train_time = train(
+ dataloader=dataloader_train,
+ model=model,
+ optimizer=optimizer,
+ criterion=criterion,
+ lr_scheduler=lr_scheduler,
+ base_lr=config.TRAIN.BASE_LR,
+ min_lr=config.TRAIN.END_LR,
+ epoch=epoch,
+ warmup_epochs=config.TRAIN.WARMUP_EPOCHS,
+ total_epochs=config.TRAIN.NUM_EPOCHS,
+ total_batches=total_batch_train,
+ debug_steps=config.REPORT_FREQ,
+ accum_iter=config.TRAIN.ACCUM_ITER,
+ mixup_fn=mixup_fn,
+ amp_grad_scaler=amp_grad_scaler,
+ local_logger=local_logger,
+ master_logger=master_logger)
+
+ general_message = (f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], "
+ f"Lr: {optimizer.get_lr():.6e}, "
+ f"time: {train_time:.2f}")
+
+ local_message = (general_message +
+ f"Train Loss: {train_loss:.4f}, "
+ f"Train Acc: {train_acc:.4f}")
+ master_message = (general_message +
+ f"Train Loss: {avg_loss:.4f}, "
+ f"Train Acc: {avg_acc:.4f}")
+
+ write_log(local_logger, master_logger, local_message, master_message)
+
+ # validation
+ if epoch % config.VALIDATE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
+ write_log(local_logger, master_logger, f'----- Validation after Epoch: {epoch}')
+ val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
+ dataloader=dataloader_val,
+ model=model,
+ criterion=criterion_val,
+ total_batches=total_batch_val,
+ debug_steps=config.REPORT_FREQ,
+ local_logger=local_logger,
+ master_logger=master_logger)
+
+ local_message = (f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
+ f"Validation Loss: {val_loss:.4f}, " +
+ f"Validation Acc@1: {val_acc1:.4f}, " +
+ f"Validation Acc@5: {val_acc5:.4f}, " +
+ f"time: {val_time:.2f}")
+
+ master_message = (f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
+ f"Validation Loss: {avg_loss:.4f}, " +
+ f"Validation Acc@1: {avg_acc1:.4f}, " +
+ f"Validation Acc@5: {avg_acc5:.4f}, " +
+ f"time: {val_time:.2f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+
+ # model save
+ if local_rank == 0:
+ if epoch % config.SAVE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
+ model_path = os.path.join(
+ config.SAVE, f"{config.MODEL.TYPE}-Epoch-{epoch}-Loss-{avg_loss}.pdparams")
+ state_dict = dict()
+ state_dict['model'] = model.state_dict()
+ state_dict['optimizer'] = optimizer.state_dict()
+ state_dict['epoch'] = epoch
+ if amp_grad_scaler is not None:
+ state_dict['amp_grad_scaler'] = amp_grad_scaler.state_dict()
+ if lr_scheduler is not None:
+ state_dict['lr_scheduler'] = lr_scheduler.state_dict()
+ paddle.save(state_dict, model_path)
+ message = (f"----- Save model: {model_path}")
+ write_log(local_logger, master_logger, message)
+
+
+def main():
+ # config is updated in order: (1) default in config.py, (2) yaml file, (3) arguments
+ config = update_config(get_config(), get_arguments())
+
+ # set output folder
+ config.SAVE = os.path.join(config.SAVE,
+ f"{'eval' if config.EVAL else 'finetune'}-{time.strftime('%Y%m%d-%H-%M')}")
+ if not os.path.exists(config.SAVE):
+ os.makedirs(config.SAVE, exist_ok=True)
+
+ # get train dataset if in train mode and val dataset
+ dataset_train = get_dataset(config, is_train=True) if not config.EVAL else None
+ dataset_val = get_dataset(config, is_train=False)
+
+    # dist spawn launch: use CUDA_VISIBLE_DEVICES to set available gpus
+ #paddle.distributed.spawn(main_worker, args=(config, dataset_train, dataset_val))
+ main_worker(config, dataset_train, dataset_val)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/self_supervised_learning/MAE/main_multi_gpu_linearprobe.py b/self_supervised_learning/MAE/main_multi_gpu_linearprobe.py
new file mode 100644
index 00000000..8b30c4c6
--- /dev/null
+++ b/self_supervised_learning/MAE/main_multi_gpu_linearprobe.py
@@ -0,0 +1,622 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""MAE linear probing using multiple GPU """
+
+import sys
+import os
+import time
+import argparse
+import random
+import math
+import numpy as np
+import paddle
+from paddle.distributed import fleet
+from datasets import get_dataloader
+from datasets import get_dataset
+from config import get_config
+from config import update_config
+from utils import AverageMeter
+from utils import get_logger
+from utils import write_log
+from utils import all_reduce_mean
+from utils import skip_weight_decay_fn
+from utils import get_params_groups
+from utils import adjust_learning_rate
+from mixup import Mixup
+from losses import LabelSmoothingCrossEntropyLoss
+from losses import SoftTargetCrossEntropyLoss
+from utils import interpolate_pos_embed
+import lr_decay
+from transformer import build_transformer as build_model
+import paddlenlp
+
+
+def get_arguments():
+ """return argumeents, this will overwrite the config by (1) yaml file (2) argument values"""
+ parser = argparse.ArgumentParser('MAE Linearprobe')
+ parser.add_argument('-cfg', type=str, default=None)
+ parser.add_argument('-dataset', type=str, default=None)
+ parser.add_argument('-data_path', type=str, default=None)
+ parser.add_argument('-output', type=str, default=None)
+ parser.add_argument('-batch_size', type=int, default=None)
+ parser.add_argument('-batch_size_eval', type=int, default=None)
+ parser.add_argument('-image_size', type=int, default=None)
+ parser.add_argument('-accum_iter', type=int, default=None)
+ parser.add_argument('-pretrained', type=str, default=None)
+ parser.add_argument('-resume', type=str, default=None)
+ parser.add_argument('-last_epoch', type=int, default=None)
+ parser.add_argument('-eval', action='store_true')
+ parser.add_argument('-amp', action='store_true')
+ arguments = parser.parse_args()
+ return arguments
+
+
+def train(dataloader,
+ model,
+ optimizer,
+ criterion,
+ lr_scheduler,
+ base_lr,
+ min_lr,
+ epoch,
+ warmup_epochs,
+ total_epochs,
+ total_batches,
+ debug_steps=100,
+ accum_iter=1,
+ mixup_fn=None,
+ amp_grad_scaler=None,
+ local_logger=None,
+ master_logger=None):
+ """Training for one epoch
+ Args:
+ dataloader: paddle.io.DataLoader, dataloader instance
+ model: nn.Layer, a ViT model
+ optimizer: nn.optimizer
+ criterion: nn.XXLoss
+ epoch: int, current epoch
+ total_epochs: int, total num of epochs
+ total_batches: int, total num of batches for one epoch
+ debug_steps: int, num of iters to log info, default: 100
+ accum_iter: int, num of iters for accumulating gradients, default: 1
+ mixup_fn: Mixup, mixup instance, default: None
+ amp_grad_scaler: GradScaler/None, if not None, pass the GradScaler and enable AMP training, default: None
+ local_logger: logger for local process/gpu, default: None
+ master_logger: logger for main process, default: None
+ Returns:
+ train_loss_meter.avg: float, average loss on current process/gpu
+ train_acc_meter.avg: float, average acc@1 on current process/gpu
+ master_loss_meter.avg: float, average loss on all processes/gpus
+ master_acc_meter.avg: float, average acc@1 on all processes/gpus
+ train_time: float, training time
+ """
+ model.train()
+ train_loss_meter = AverageMeter()
+ train_acc_meter = AverageMeter()
+ master_loss_meter = AverageMeter()
+ master_acc_meter = AverageMeter()
+
+ time_st = time.time()
+
+ #if amp is True:
+ # scaler = paddle.amp.GradScaler() # default init_loss_scaling = 32768
+ optimizer.clear_grad()
+
+ for batch_id, data in enumerate(dataloader):
+ # get data
+ images = data[0]
+ label = data[1]
+ label_orig = label.clone()
+ batch_size = images.shape[0]
+
+ if mixup_fn is not None:
+ images, label = mixup_fn(images, label_orig)
+
+ if batch_id % accum_iter == 0:
+            lr_scheduler.step(batch_id / total_batches + epoch - 1)
+ #adjust_learning_rate(optimizer,
+ # base_lr,
+ # min_lr,
+ # batch_id / total_batches + epoch - 1,
+ # warmup_epochs,
+ # total_epochs)
+ # forward
+ with paddle.amp.auto_cast(amp_grad_scaler is not None):
+ output = model(images)
+ loss = criterion(output, label)
+
+ loss_value = loss.item()
+ if not math.isfinite(loss_value):
+ print("Loss is {}, stopping training".format(loss_value))
+ sys.exit(1)
+
+ loss = loss / accum_iter
+
+ # backward and step
+ if amp_grad_scaler is None: # fp32
+ loss.backward()
+ if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
+ optimizer.step()
+ optimizer.clear_grad()
+ else: # amp
+ scaled_loss = amp_grad_scaler.scale(loss)
+ scaled_loss.backward()
+ if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
+ # amp for param group reference: https://github.com/PaddlePaddle/Paddle/issues/37188
+ amp_grad_scaler.step(optimizer)
+ amp_grad_scaler.update()
+ optimizer.clear_grad()
+
+ pred = paddle.nn.functional.softmax(output)
+ if mixup_fn:
+ acc = paddle.metric.accuracy(pred, label_orig).item()
+ else:
+ acc = paddle.metric.accuracy(pred, label_orig.unsqueeze(1)).item()
+
+ # sync from other gpus for overall loss and acc
+
+ master_loss = all_reduce_mean(loss_value)
+ master_acc = all_reduce_mean(acc)
+ master_batch_size = all_reduce_mean(batch_size)
+
+ master_loss_meter.update(master_loss, master_batch_size)
+ master_acc_meter.update(master_acc, master_batch_size)
+ train_loss_meter.update(loss_value, batch_size)
+ train_acc_meter.update(acc, batch_size)
+
+ if batch_id % debug_steps == 0 or batch_id + 1 == len(dataloader):
+ general_message = (f"Epoch[{epoch:03d}/{total_epochs:03d}], "
+ f"Step[{batch_id:04d}/{total_batches:04d}], "
+ f"Lr: {optimizer.get_lr():.6e}, ")
+ local_message = (general_message +
+ f"Loss: {loss_value:.4f} ({train_loss_meter.avg:.4f}), "
+ f"Avg Acc: {train_acc_meter.avg:.4f}")
+ master_message = (general_message +
+ f"Loss: {master_loss:.4f} ({master_loss_meter.avg:.4f}), "
+ f"Avg Acc: {master_acc_meter.avg:.4f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+
+ train_time = time.time() - time_st
+ paddle.distributed.barrier()
+ return (train_loss_meter.avg,
+ train_acc_meter.avg,
+ master_loss_meter.avg,
+ master_acc_meter.avg,
+ train_time)
+
+
+@paddle.no_grad()
+def validate(dataloader,
+ model,
+ criterion,
+ total_batches,
+ debug_steps=100,
+ local_logger=None,
+ master_logger=None):
+ """Validation for the whole dataset
+ Args:
+ dataloader: paddle.io.DataLoader, dataloader instance
+ model: nn.Layer, a ViT model
+ total_batches: int, total num of batches for one epoch
+ debug_steps: int, num of iters to log info, default: 100
+ local_logger: logger for local process/gpu, default: None
+ master_logger: logger for main process, default: None
+ Returns:
+ val_loss_meter.avg: float, average loss on current process/gpu
+ val_acc1_meter.avg: float, average top1 accuracy on current processes/gpus
+ val_acc5_meter.avg: float, average top5 accuracy on current processes/gpus
+ master_loss_meter.avg: float, average loss on all processes/gpus
+ master_acc1_meter.avg: float, average top1 accuracy on all processes/gpus
+ master_acc5_meter.avg: float, average top5 accuracy on all processes/gpus
+ val_time: float, validation time
+ """
+ model.eval()
+ val_loss_meter = AverageMeter()
+ val_acc1_meter = AverageMeter()
+ val_acc5_meter = AverageMeter()
+ master_loss_meter = AverageMeter()
+ master_acc1_meter = AverageMeter()
+ master_acc5_meter = AverageMeter()
+
+ time_st = time.time()
+
+ for batch_id, data in enumerate(dataloader):
+ # get data
+ images = data[0]
+ label = data[1]
+ batch_size = images.shape[0]
+
+ output = model(images)
+ loss = criterion(output, label)
+ loss_value = loss.item()
+
+ pred = paddle.nn.functional.softmax(output)
+ acc1 = paddle.metric.accuracy(pred, label.unsqueeze(1)).item()
+ acc5 = paddle.metric.accuracy(pred, label.unsqueeze(1), k=5).item()
+
+ # sync from other gpus for overall loss and acc
+ master_loss = all_reduce_mean(loss_value)
+ master_acc1 = all_reduce_mean(acc1)
+ master_acc5 = all_reduce_mean(acc5)
+ master_batch_size = all_reduce_mean(batch_size)
+
+ master_loss_meter.update(master_loss, master_batch_size)
+ master_acc1_meter.update(master_acc1, master_batch_size)
+ master_acc5_meter.update(master_acc5, master_batch_size)
+ val_loss_meter.update(loss_value, batch_size)
+ val_acc1_meter.update(acc1, batch_size)
+ val_acc5_meter.update(acc5, batch_size)
+
+ if batch_id % debug_steps == 0:
+ local_message = (f"Step[{batch_id:04d}/{total_batches:04d}], " +
+ f"Avg Loss: {val_loss_meter.avg:.4f}, " +
+ f"Avg Acc@1: {val_acc1_meter.avg:.4f}, " +
+ f"Avg Acc@5: {val_acc5_meter.avg:.4f}")
+ master_message = (f"Step[{batch_id:04d}/{total_batches:04d}], " +
+ f"Avg Loss: {master_loss_meter.avg:.4f}, " +
+ f"Avg Acc@1: {master_acc1_meter.avg:.4f}, " +
+ f"Avg Acc@5: {master_acc5_meter.avg:.4f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+ paddle.distributed.barrier()
+ val_time = time.time() - time_st
+ return (val_loss_meter.avg,
+ val_acc1_meter.avg,
+ val_acc5_meter.avg,
+ master_loss_meter.avg,
+ master_acc1_meter.avg,
+ master_acc5_meter.avg,
+ val_time)
+
+
+def main_worker(*args):
+ """main method for each process"""
+ # STEP 0: Preparation
+ paddle.device.set_device('gpu')
+ #paddle.distributed.init_parallel_env()
+ world_size = paddle.distributed.get_world_size()
+ local_rank = paddle.distributed.get_rank()
+ config = args[0]
+ last_epoch = config.TRAIN.LAST_EPOCH
+ seed = config.SEED + local_rank
+ paddle.seed(seed)
+ np.random.seed(seed)
+ random.seed(seed)
+ # logger for each process/gpu
+ local_logger, master_logger = get_logger(config.SAVE)
+ message = (f'----- world_size = {world_size}, local_rank = {local_rank} \n'
+ f'----- {config}')
+ write_log(local_logger, master_logger, message)
+
+ # STEP 1: Create model
+ model = build_model(config)
+    # for linear probing: prepend a BatchNorm1D to the classifier layer
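+    # (the extra BatchNorm1D has no affine params and only normalizes the frozen features,
+    # following the common MAE linear probing recipe)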
+ model.classifier = paddle.nn.Sequential(
+ paddle.nn.BatchNorm1D(model.classifier.weight.shape[0], weight_attr=False, bias_attr=False, epsilon=1e-6),
+ model.classifier)
+ # freeze all but the classifier
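+    # in Paddle, stop_gradient=True excludes a parameter from backprop, i.e. freezes it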
+ for _, p in model.named_parameters():
+ p.stop_gradient = True
+ for _, p in model.classifier.named_parameters():
+ p.stop_gradient = False
+
+ for n, p in model.named_parameters():
+ print(n, p.shape, p.stop_gradient)
+
+ if paddle.distributed.get_world_size() > 1:
+ strategy = fleet.DistributedStrategy()
+ # lars
+ if config.TRAIN.OPTIMIZER.NAME == "LARS":
+ strategy.lars = True
+ strategy.lars_configs = {
+ "lars_coeff": 0.001,
+ "lars_weight_decay": config.TRAIN.WEIGHT_DECAY,
+ "exclude_from_weight_decay": ['cls_token', 'encoder_position_embedding', 'classifier.0', 'classifier.1.bias']
+ }
+ ## Hybrid Parallel Training
+ strategy.hybrid_configs = {}
+ fleet.init(is_collective=True, strategy=strategy)
+
+ # STEP 2: Create train and val dataloader
+ if not config.EVAL:
+ dataset_train = args[1]
+ dataloader_train = get_dataloader(config, dataset_train, True, True)
+ total_batch_train = len(dataloader_train)
+ message = f'----- Total # of train batch (single gpu): {total_batch_train}'
+ write_log(local_logger, master_logger, message)
+
+ dataset_val = args[2]
+ dataloader_val = get_dataloader(config, dataset_val, False, True)
+ total_batch_val = len(dataloader_val)
+ message = f'----- Total # of val batch (single gpu): {total_batch_val}'
+ write_log(local_logger, master_logger, message)
+
+ # STEP 3: Define criterion
+ criterion = paddle.nn.CrossEntropyLoss()
+
+ # STEP 4: Define optimizer and lr_scheduler
+    # scale lr according to batch size and world size (adapted from the official Swin Transformer code)
+ if not config.EVAL:
+ if config.TRAIN.LINEAR_SCALED_LR is not None:
+ effective_batch_size = config.DATA.BATCH_SIZE * config.TRAIN.ACCUM_ITER * world_size
+ config.TRAIN.BASE_LR = (
+ config.TRAIN.BASE_LR * effective_batch_size / config.TRAIN.LINEAR_SCALED_LR
+ )
+ write_log(local_logger, master_logger, f'Base lr is scaled to: {config.TRAIN.BASE_LR}')
+
+ # define scaler for amp training
+ amp_grad_scaler = paddle.amp.GradScaler() if config.AMP else None
+ # set gradient clip
+ if config.TRAIN.GRAD_CLIP:
+ clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
+ else:
+ clip = None
+ # set optimizer
+ # create warmup and cosine decay lr scheduler
+ if config.TRAIN.WARMUP_EPOCHS > 0:
+ cosine_lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
+ learning_rate=config.TRAIN.BASE_LR,
+ T_max=config.TRAIN.NUM_EPOCHS - config.TRAIN.WARMUP_EPOCHS,
+ eta_min=config.TRAIN.END_LR,
+ last_epoch=-1) # do not set last epoch, handled in warmup sched get_lr()
+ lr_scheduler = paddle.optimizer.lr.LinearWarmup(
+ learning_rate=cosine_lr_scheduler, # use cosine lr sched after warmup
+            warmup_steps=config.TRAIN.WARMUP_EPOCHS,  # only supports positive integers
+ start_lr=config.TRAIN.WARMUP_START_LR,
+ end_lr=config.TRAIN.BASE_LR,
+ last_epoch=config.TRAIN.LAST_EPOCH)
+ else: # create cosine decay lr scheduler if no warmup epochs
+ lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
+ learning_rate=config.TRAIN.BASE_LR,
+ T_max=config.TRAIN.NUM_EPOCHS,
+ eta_min=config.TRAIN.END_LR,
+ last_epoch=config.TRAIN.LAST_EPOCH)
+
+ if config.TRAIN.OPTIMIZER.NAME == "AdamW":
+ params_groups = lr_decay.param_groups_lrd(
+ model=model,
+ no_weight_decay_list=['encoder_position_embedding', 'cls_token'],
+ weight_decay=config.TRAIN.WEIGHT_DECAY,
+ layer_decay=config.TRAIN.LAYER_DECAY)
+
+ optimizer = paddle.optimizer.AdamW(
+ parameters=params_groups,
+ learning_rate=lr_scheduler, # now only support warmup + cosine
+ beta1=config.TRAIN.OPTIMIZER.BETAS[0],
+ beta2=config.TRAIN.OPTIMIZER.BETAS[1],
+            weight_decay=config.TRAIN.WEIGHT_DECAY,  # weight decay is set per group in params_groups, so this value has no effect
+ epsilon=config.TRAIN.OPTIMIZER.EPS,
+ grad_clip=clip)
+ elif config.TRAIN.OPTIMIZER.NAME == "AdamWDL":
+ name_dict = dict()
+ for n, p in model.named_parameters():
+ # name_dict is for AdamWDL argument 'name_dict'
+ name_dict[p.name] = n
+
+ optimizer = paddlenlp.ops.optimizer.AdamWDL(
+ learning_rate=lr_scheduler,
+ weight_decay=config.TRAIN.WEIGHT_DECAY,
+ layerwise_decay=config.TRAIN.LAYER_DECAY,
+ n_layers=config.MODEL.ENCODER.DEPTH,
+ set_param_lr_fun=lr_decay.lr_setting,
+ parameters=model.classifier.parameters(),
+ name_dict=name_dict,
+ apply_decay_param_fun=skip_weight_decay_fn(
+ model, # skip bn and bias in model
+ ['encoder_position_embedding', 'cls_token']), # skip custom ops
+ beta1=config.TRAIN.OPTIMIZER.BETAS[0],
+ beta2=config.TRAIN.OPTIMIZER.BETAS[1],
+ epsilon=config.TRAIN.OPTIMIZER.EPS,
+ grad_clip=clip)
+ elif config.TRAIN.OPTIMIZER.NAME == "LARS":
+ optimizer = paddle.optimizer.Momentum(
+ parameters=model.classifier.parameters(),
+ learning_rate=lr_scheduler,
+ momentum=0.9,
+ grad_clip=clip,
+ weight_decay=None, # set by fleet lars
+ )
+ else:
+ message = f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}."
+ write_log(local_logger, master_logger, message, None, 'fatal')
+ raise NotImplementedError(message)
+
+
+    # STEP 5: Load pretrained model / resume model and optimizer states
+ if config.MODEL.PRETRAINED:
+ assert os.path.isfile(config.MODEL.PRETRAINED) is True
+ model_state = paddle.load(config.MODEL.PRETRAINED)
+        if 'model' in model_state: # checkpoint contains multiple items: model, optimizer, and epoch
+ # pretrain only load model weight, opt and epoch are ignored
+ model_state = model_state['model']
+ if not config.EVAL:
+            keys = ['encoder.norm.weight', 'encoder.norm.bias',
+                    'classifier.weight', 'classifier.bias']
+ if config.MODEL.GLOBAL_POOL:
+ if keys[0] in model_state:
+ del model_state[keys[0]]
+ if keys[1] in model_state:
+ del model_state[keys[1]]
+ if keys[2] in model_state:
+ del model_state[keys[2]]
+ if keys[3] in model_state:
+ del model_state[keys[3]]
+
+ # interpolate position embedding
+ interpolate_pos_embed(model, model_state)
+
+ model.set_state_dict(model_state)
+ message = f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}"
+ write_log(local_logger, master_logger, message)
+
+ if config.MODEL.RESUME:
+ assert os.path.isfile(config.MODEL.RESUME) is True
+ model_state = paddle.load(config.MODEL.RESUME)
+        if 'model' in model_state: # checkpoint contains multiple items: model, optimizer, and epoch
+ model.set_state_dict(model_state['model'])
+ if 'optimizer' in model_state:
+ optimizer.set_state_dict(model_state['optimizer'])
+ if 'epoch' in model_state:
+ config.TRAIN.LAST_EPOCH = model_state['epoch']
+ if 'lr_scheduler' in model_state and lr_scheduler is not None:
+ lr_scheduler.set_state_dict(model_state['lr_scheduler'])
+ if 'amp_grad_scaler' in model_state and amp_grad_scaler is not None:
+ amp_grad_scaler.load_state_dict(model_state['amp_grad_scaler'])
+ lr_scheduler.step(config.TRAIN.LAST_EPOCH)
+ message = (f"----- Resume Training: Load model from {config.MODEL.RESUME}, "
+ f"opt = [{'optimizer' in model_state}], "
+ f"lr_scheduler = [{'lr_scheduler' in model_state}], "
+ f"model_ema = [{'model_ema' in model_state}], "
+ f"epoch = [{model_state.get('epoch', -1)}], "
+ f"amp_grad_scaler = [{'amp_grad_scaler' in model_state}]")
+ write_log(local_logger, master_logger, message)
+ else: # direct load pdparams without other items
+ message = f"----- Resume Training: Load from {config.MODEL.RESUME}, no opt/epoch/scaler"
+            write_log(local_logger, master_logger, message, None, 'warning')
+ model.set_state_dict(model_state)
+
+ if paddle.distributed.get_world_size() > 1:
+ model = fleet.distributed_model(model)
+ if not config.EVAL:
+ optimizer = fleet.distributed_optimizer(optimizer)
+
+    # STEP 6: Validation (eval mode)
+ if config.EVAL:
+ write_log(local_logger, master_logger, f"----- Start Validation")
+ val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
+ dataloader=dataloader_val,
+ model=model,
+ criterion=criterion,
+ total_batches=total_batch_val,
+ debug_steps=config.REPORT_FREQ,
+ local_logger=local_logger,
+ master_logger=master_logger)
+
+ local_message = (f"----- Validation: " +
+ f"Validation Loss: {val_loss:.4f}, " +
+ f"Validation Acc@1: {val_acc1:.4f}, " +
+ f"Validation Acc@5: {val_acc5:.4f}, " +
+ f"time: {val_time:.2f}")
+
+ master_message = (f"----- Validation: " +
+ f"Validation Loss: {avg_loss:.4f}, " +
+ f"Validation Acc@1: {avg_acc1:.4f}, " +
+ f"Validation Acc@5: {avg_acc5:.4f}, " +
+ f"time: {val_time:.2f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+ return
+
+
+ # STEP 7: Start training (train mode)
+ write_log(local_logger, master_logger, f"----- Start training from epoch {last_epoch+1}.")
+ for epoch in range(last_epoch + 1, config.TRAIN.NUM_EPOCHS + 1):
+ # train
+ write_log(local_logger, master_logger, f"Train epoch {epoch}. LR={optimizer.get_lr():.6e}")
+
+ train_loss, train_acc, avg_loss, avg_acc, train_time = train(
+ dataloader=dataloader_train,
+ model=model,
+ optimizer=optimizer,
+ criterion=criterion,
+ lr_scheduler=lr_scheduler,
+ base_lr=config.TRAIN.BASE_LR,
+ min_lr=config.TRAIN.END_LR,
+ epoch=epoch,
+ warmup_epochs=config.TRAIN.WARMUP_EPOCHS,
+ total_epochs=config.TRAIN.NUM_EPOCHS,
+ total_batches=total_batch_train,
+ debug_steps=config.REPORT_FREQ,
+ accum_iter=config.TRAIN.ACCUM_ITER,
+ mixup_fn=None,
+ amp_grad_scaler=amp_grad_scaler,
+ local_logger=local_logger,
+ master_logger=master_logger)
+
+ general_message = (f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], "
+ f"Lr: {optimizer.get_lr():.6e}, "
+ f"time: {train_time:.2f}")
+
+ local_message = (general_message +
+ f"Train Loss: {train_loss:.4f}, "
+ f"Train Acc: {train_acc:.4f}")
+ master_message = (general_message +
+ f"Train Loss: {avg_loss:.4f}, "
+ f"Train Acc: {avg_acc:.4f}")
+
+ write_log(local_logger, master_logger, local_message, master_message)
+
+ # validation
+ if epoch % config.VALIDATE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
+ write_log(local_logger, master_logger, f'----- Validation after Epoch: {epoch}')
+ val_loss, val_acc1, val_acc5, avg_loss, avg_acc1, avg_acc5, val_time = validate(
+ dataloader=dataloader_val,
+ model=model,
+ criterion=criterion,
+ total_batches=total_batch_val,
+ debug_steps=config.REPORT_FREQ,
+ local_logger=local_logger,
+ master_logger=master_logger)
+
+ local_message = (f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
+ f"Validation Loss: {val_loss:.4f}, " +
+ f"Validation Acc@1: {val_acc1:.4f}, " +
+ f"Validation Acc@5: {val_acc5:.4f}, " +
+ f"time: {val_time:.2f}")
+
+ master_message = (f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], " +
+ f"Validation Loss: {avg_loss:.4f}, " +
+ f"Validation Acc@1: {avg_acc1:.4f}, " +
+ f"Validation Acc@5: {avg_acc5:.4f}, " +
+ f"time: {val_time:.2f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+
+ # model save
+ if local_rank == 0:
+ if epoch % config.SAVE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
+ model_path = os.path.join(
+ config.SAVE, f"{config.MODEL.TYPE}-Epoch-{epoch}-Loss-{avg_loss}.pdparams")
+ state_dict = dict()
+ state_dict['model'] = model.state_dict()
+ state_dict['optimizer'] = optimizer.state_dict()
+ state_dict['epoch'] = epoch
+ if amp_grad_scaler is not None:
+ state_dict['amp_grad_scaler'] = amp_grad_scaler.state_dict()
+ if lr_scheduler is not None:
+ state_dict['lr_scheduler'] = lr_scheduler.state_dict()
+ paddle.save(state_dict, model_path)
+ message = (f"----- Save model: {model_path}")
+ write_log(local_logger, master_logger, message)
+
+
+def main():
+ # config is updated in order: (1) default in config.py, (2) yaml file, (3) arguments
+ config = update_config(get_config(), get_arguments())
+
+ # set output folder
+ config.SAVE = os.path.join(config.SAVE,
+ f"{'eval' if config.EVAL else 'linearprobe'}-{time.strftime('%Y%m%d-%H-%M')}")
+ if not os.path.exists(config.SAVE):
+ os.makedirs(config.SAVE, exist_ok=True)
+
+ # get train dataset if in train mode and val dataset
+ dataset_train = get_dataset(config, is_train=True) if not config.EVAL else None
+ dataset_val = get_dataset(config, is_train=False)
+
+    # dist spawn launch: use CUDA_VISIBLE_DEVICES to set available gpus
+ #paddle.distributed.spawn(main_worker, args=(config, dataset_train, dataset_val))
+ main_worker(config, dataset_train, dataset_val)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/self_supervised_learning/MAE/main_multi_gpu_pretrain.py b/self_supervised_learning/MAE/main_multi_gpu_pretrain.py
new file mode 100644
index 00000000..31a59041
--- /dev/null
+++ b/self_supervised_learning/MAE/main_multi_gpu_pretrain.py
@@ -0,0 +1,378 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""MAE pretraining using multiple GPU """
+
+import sys
+import os
+import time
+import argparse
+import random
+import math
+import numpy as np
+import paddle
+from paddle.distributed import fleet
+from datasets import get_dataloader
+from datasets import get_dataset
+from config import get_config
+from config import update_config
+from utils import AverageMeter
+from utils import get_logger
+from utils import write_log
+from utils import all_reduce_mean
+from utils import skip_weight_decay_fn
+from utils import get_params_groups
+from utils import adjust_learning_rate
+#from mixup import Mixup
+#from losses import LabelSmoothingCrossEntropyLoss
+#from losses import SoftTargetCrossEntropyLoss
+from transformer import build_mae_pretrain as build_model
+import lr_decay
+import paddlenlp
+
+def get_arguments():
+ """return argumeents, this will overwrite the config by (1) yaml file (2) argument values"""
+ parser = argparse.ArgumentParser('MAE Pretrain')
+ parser.add_argument('-cfg', type=str, default=None)
+ parser.add_argument('-dataset', type=str, default=None)
+ parser.add_argument('-data_path', type=str, default=None)
+ parser.add_argument('-output', type=str, default=None)
+ parser.add_argument('-batch_size', type=int, default=None)
+ parser.add_argument('-batch_size_eval', type=int, default=None)
+ parser.add_argument('-image_size', type=int, default=None)
+ parser.add_argument('-accum_iter', type=int, default=None)
+ parser.add_argument('-pretrained', type=str, default=None)
+ parser.add_argument('-resume', type=str, default=None)
+ parser.add_argument('-last_epoch', type=int, default=None)
+ parser.add_argument('-eval', action='store_true')
+ parser.add_argument('-amp', action='store_true')
+ arguments = parser.parse_args()
+ return arguments
+
+
+def train(dataloader,
+ model,
+ mask_ratio,
+ optimizer,
+ lr_scheduler,
+ base_lr,
+ min_lr,
+ epoch,
+ warmup_epochs,
+ total_epochs,
+ total_batches,
+ debug_steps=100,
+ accum_iter=1,
+ amp_grad_scaler=None,
+ local_logger=None,
+ master_logger=None):
+ """Training for one epoch
+ Args:
+ dataloader: paddle.io.DataLoader, dataloader instance
+ model: nn.Layer, a ViT model
+        mask_ratio: float, ratio of patches to mask out
+ optimizer: nn.optimizer
+ base_lr: float, base learning rate
+ min_lr: float, minimum lr
+ epoch: int, current epoch
+ total_epochs: int, total num of epochs
+ total_batches: int, total num of batches for one epoch
+ debug_steps: int, num of iters to log info, default: 100
+ accum_iter: int, num of iters for accumulating gradients, default: 1
+ amp_grad_scaler: GradScaler/None, if not None, pass the GradScaler and enable AMP training, default: None
+ local_logger: logger for local process/gpu, default: None
+ master_logger: logger for main process, default: None
+ Returns:
+ train_loss_meter.avg: float, average loss on current process/gpu
+ master_loss_meter.avg: float, average loss on all processes/gpus
+ train_time: float, training time
+ """
+ model.train()
+ train_loss_meter = AverageMeter()
+ master_loss_meter = AverageMeter()
+
+ time_st = time.time()
+
+ #if amp is True:
+ # scaler = paddle.amp.GradScaler() # default init_loss_scaling = 32768
+ optimizer.clear_grad()
+
+ for batch_id, data in enumerate(dataloader):
+ # get data
+ images = data[0]
+ batch_size = images.shape[0]
+ # adjust learning rate
+ if batch_id % accum_iter == 0:
+            lr_scheduler.step(batch_id / total_batches + epoch - 1)
+ #adjust_learning_rate(optimizer,
+ # base_lr,
+ # min_lr,
+ # batch_id / total_batches + epoch - 1,
+ # warmup_epochs,
+ # total_epochs)
+ # forward
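+        # the MAE model computes the masked-patch reconstruction loss internally and returns the
+        # loss plus two auxiliary outputs (typically the reconstruction and the mask), so no labels are needed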
+ with paddle.amp.auto_cast(amp_grad_scaler is not None):
+ loss, _, _ = model(images)
+
+ loss_value = loss.item()
+ if not math.isfinite(loss_value):
+ print("Loss is {}, stopping training".format(loss_value))
+ sys.exit(1)
+
+ loss = loss / accum_iter
+
+ # backward and step
+ if amp_grad_scaler is None: # fp32
+ loss.backward()
+ if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
+ optimizer.step()
+ optimizer.clear_grad()
+ else: # amp
+ scaled_loss = amp_grad_scaler.scale(loss)
+ scaled_loss.backward()
+ if ((batch_id + 1) % accum_iter == 0) or (batch_id + 1 == len(dataloader)):
+ # amp for param group reference: https://github.com/PaddlePaddle/Paddle/issues/37188
+ amp_grad_scaler.step(optimizer)
+ amp_grad_scaler.update()
+ optimizer.clear_grad()
+
+ # sync from other gpus for overall loss and acc
+ master_loss = all_reduce_mean(loss_value)
+ master_batch_size = all_reduce_mean(batch_size)
+ master_loss_meter.update(master_loss, master_batch_size)
+ train_loss_meter.update(loss_value, batch_size)
+ if batch_id % debug_steps == 0 or batch_id + 1 == len(dataloader):
+ general_message = (f"Epoch[{epoch:03d}/{total_epochs:03d}], "
+ f"Step[{batch_id:04d}/{total_batches:04d}], "
+ f"Lr: {optimizer.get_lr():.6e}, ")
+ local_message = (general_message +
+ f"Loss: {loss_value:.4f} ({train_loss_meter.avg:.4f})")
+ master_message = (general_message +
+ f"Loss: {master_loss:.4f} ({master_loss_meter.avg:.4f})")
+ write_log(local_logger, master_logger, local_message, master_message)
+
+ paddle.distributed.barrier()
+ train_time = time.time() - time_st
+ return train_loss_meter.avg, master_loss_meter.avg, train_time
+
+
+def main_worker(*args):
+ """main method for each process"""
+ # STEP 0: Preparation
+ paddle.device.set_device('gpu')
+ #paddle.distributed.init_parallel_env()
+ world_size = paddle.distributed.get_world_size()
+ local_rank = paddle.distributed.get_rank()
+ config = args[0]
+ last_epoch = config.TRAIN.LAST_EPOCH
+ seed = config.SEED + local_rank
+ paddle.seed(seed)
+ np.random.seed(seed)
+ random.seed(seed)
+ # logger for each process/gpu
+ local_logger, master_logger = get_logger(config.SAVE)
+ message = (f'----- world_size = {world_size}, local_rank = {local_rank} \n'
+ f'----- {config}')
+ write_log(local_logger, master_logger, message)
+
+ # STEP 1: Create model
+ model = build_model(config)
+ if paddle.distributed.get_world_size() > 1:
+ strategy = fleet.DistributedStrategy()
+ ## Hybrid Parallel Training
+ strategy.hybrid_configs = {}
+ fleet.init(is_collective=True, strategy=strategy)
+
+ # STEP 2: Create train dataloader
+ dataset_train = args[1]
+ dataloader_train = get_dataloader(config, dataset_train, True, True)
+ total_batch_train = len(dataloader_train)
+ message = f'----- Total # of train batch (single gpu): {total_batch_train}'
+ write_log(local_logger, master_logger, message)
+
+ # STEP 3: Define optimizer and lr_scheduler
+    # scale lr according to batch size and world size (adapted from the official Swin Transformer code)
+ if config.TRAIN.LINEAR_SCALED_LR is not None:
+ effective_batch_size = config.DATA.BATCH_SIZE * config.TRAIN.ACCUM_ITER * world_size
+ config.TRAIN.BASE_LR = (
+ config.TRAIN.BASE_LR * effective_batch_size / config.TRAIN.LINEAR_SCALED_LR
+ )
+ write_log(local_logger, master_logger, f'Base lr is scaled to: {config.TRAIN.BASE_LR}')
+ # define scaler for amp training
+ amp_grad_scaler = paddle.amp.GradScaler() if config.AMP else None
+ # set gradient clip
+ if config.TRAIN.GRAD_CLIP:
+ clip = paddle.nn.ClipGradByGlobalNorm(config.TRAIN.GRAD_CLIP)
+ else:
+ clip = None
+ # set optimizer
+ # create warmup and cosine decay lr scheduler
+ if config.TRAIN.WARMUP_EPOCHS > 0:
+ cosine_lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
+ learning_rate=config.TRAIN.BASE_LR,
+ T_max=config.TRAIN.NUM_EPOCHS - config.TRAIN.WARMUP_EPOCHS,
+ eta_min=config.TRAIN.END_LR,
+ last_epoch=-1) # do not set last epoch, handled in warmup sched get_lr()
+ lr_scheduler = paddle.optimizer.lr.LinearWarmup(
+ learning_rate=cosine_lr_scheduler, # use cosine lr sched after warmup
+            warmup_steps=config.TRAIN.WARMUP_EPOCHS,  # only supports positive integers
+ start_lr=config.TRAIN.WARMUP_START_LR,
+ end_lr=config.TRAIN.BASE_LR,
+ last_epoch=config.TRAIN.LAST_EPOCH)
+ else: # create cosine decay lr scheduler if no warmup epochs
+ lr_scheduler = paddle.optimizer.lr.CosineAnnealingDecay(
+ learning_rate=config.TRAIN.BASE_LR,
+ T_max=config.TRAIN.NUM_EPOCHS,
+ eta_min=config.TRAIN.END_LR,
+ last_epoch=config.TRAIN.LAST_EPOCH)
+
+ if config.TRAIN.OPTIMIZER.NAME == "AdamW":
+ optimizer = paddle.optimizer.AdamW(
+ parameters=model.parameters(),
+            learning_rate=lr_scheduler,  # now only supports warmup + cosine
+ beta1=config.TRAIN.OPTIMIZER.BETAS[0],
+ beta2=config.TRAIN.OPTIMIZER.BETAS[1],
+ weight_decay=config.TRAIN.WEIGHT_DECAY,
+ epsilon=config.TRAIN.OPTIMIZER.EPS,
+ grad_clip=clip,
+ apply_decay_param_fun=skip_weight_decay_fn(
+ model, # skip bn and bias in model
+ ['encoder_position_embedding', 'cls_token']), # skip custom ops
+ )
+ elif config.TRAIN.OPTIMIZER.NAME == "AdamWDL": # using paddlenlp's impl
+ optimizer = paddlenlp.ops.optimizer.AdamWDL(
+ learning_rate=lr_scheduler,
+ weight_decay=config.TRAIN.WEIGHT_DECAY,
+ layerwise_decay=config.TRAIN.LAYER_DECAY,
+ n_layers=config.MODEL.ENCODER.DEPTH,
+ set_param_lr_fun=lr_decay.lr_setting,
+ parameters=model.parameters(),
+ name_dict=name_dict,
+ apply_decay_param_fun=skip_weight_decay_fn(
+ model, # skip bn and bias in model
+ ['encoder_position_embedding', 'cls_token']), # skip custom ops
+ beta1=config.TRAIN.OPTIMIZER.BETAS[0],
+ beta2=config.TRAIN.OPTIMIZER.BETAS[1],
+ epsilon=config.TRAIN.OPTIMIZER.EPS,
+ grad_clip=clip)
+ else:
+ message = f"Unsupported Optimizer: {config.TRAIN.OPTIMIZER.NAME}."
+ write_log(local_logger, master_logger, message, None, 'fatal')
+ raise NotImplementedError(message)
+
+    # STEP 4: Load pretrained model / resume model and optimizer states
+ if config.MODEL.PRETRAINED:
+ assert os.path.isfile(config.MODEL.PRETRAINED) is True
+ model_state = paddle.load(config.MODEL.PRETRAINED)
+        if 'model' in model_state: # checkpoint contains multiple items: model, optimizer, and epoch
+ # pretrain only load model weight, opt and epoch are ignored
+ model.set_state_dict(model_state['model'])
+ else: # direct load pdparams without other items
+ model.set_state_dict(model_state)
+ message = f"----- Pretrained: Load model state from {config.MODEL.PRETRAINED}"
+ write_log(local_logger, master_logger, message)
+
+ if config.MODEL.RESUME:
+ assert os.path.isfile(config.MODEL.RESUME) is True
+ model_state = paddle.load(config.MODEL.RESUME)
+        if 'model' in model_state: # checkpoint contains multiple items: model, optimizer, and epoch
+ model.set_state_dict(model_state['model'])
+ if 'optimizer' in model_state:
+ optimizer.set_state_dict(model_state['optimizer'])
+ if 'lr_scheduler' in model_state and lr_scheduler is not None:
+ lr_scheduler.set_state_dict(model_state['lr_scheduler'])
+ if 'epoch' in model_state:
+ config.TRAIN.LAST_EPOCH = model_state['epoch']
+ if 'amp_grad_scaler' in model_state and amp_grad_scaler is not None:
+ amp_grad_scaler.load_state_dict(model_state['amp_grad_scaler'])
+            # model_ema is only restored when an EMA model has been created and the checkpoint contains it
+            if config.TRAIN.MODEL_EMA and 'model_ema' in model_state:
+                model_ema.module.set_state_dict(model_state['model_ema'])
+ lr_scheduler.step(config.TRAIN.LAST_EPOCH)
+ message = (f"----- Resume Training: Load model from {config.MODEL.RESUME}, "
+ f"opt = [{'optimizer' in model_state}], "
+ f"lr_scheduler = [{'lr_scheduler' in model_state}], "
+ f"model_ema = [{'model_ema' in model_state}], "
+ f"epoch = [{model_state.get('epoch', -1)}], "
+ f"amp_grad_scaler = [{'amp_grad_scaler' in model_state}]")
+ write_log(local_logger, master_logger, message)
+ else: # direct load pdparams without other items
+ message = f"----- Resume Training: Load from {config.MODEL.RESUME}, no opt/epoch/scaler"
+            write_log(local_logger, master_logger, message, None, 'warning')
+ model.set_state_dict(model_state)
+
+ write_log(local_logger, master_logger, f"----- Start training from epoch {last_epoch + 1}.")
+ for epoch in range(last_epoch + 1, config.TRAIN.NUM_EPOCHS + 1):
+ # Train one epoch
+ write_log(local_logger, master_logger, f"Train epoch {epoch}. LR={optimizer.get_lr():.6e}")
+
+ train_loss, avg_loss, train_time = train(
+ dataloader=dataloader_train,
+ model=model,
+ mask_ratio=config.MODEL.MASK_RATIO,
+ optimizer=optimizer,
+ lr_scheduler=lr_scheduler,
+ base_lr=config.TRAIN.BASE_LR,
+ min_lr=config.TRAIN.END_LR,
+ epoch=epoch,
+ warmup_epochs=config.TRAIN.WARMUP_EPOCHS,
+ total_epochs=config.TRAIN.NUM_EPOCHS,
+ total_batches=total_batch_train,
+ debug_steps=config.REPORT_FREQ,
+ accum_iter=config.TRAIN.ACCUM_ITER,
+ amp_grad_scaler=amp_grad_scaler,
+ local_logger=local_logger,
+ master_logger=master_logger)
+
+ general_message = (f"----- Epoch[{epoch:03d}/{config.TRAIN.NUM_EPOCHS:03d}], "
+ f"Lr: {optimizer.get_lr():.6e}, "
+ f"time: {train_time:.2f}, ")
+ local_message = (general_message +
+ f"Train Loss: {train_loss:.4f}")
+ master_message = (general_message +
+ f"Train Loss: {avg_loss:.4f}")
+ write_log(local_logger, master_logger, local_message, master_message)
+
+ # model save
+ if local_rank == 0:
+ if epoch % config.SAVE_FREQ == 0 or epoch == config.TRAIN.NUM_EPOCHS:
+ model_path = os.path.join(
+ config.SAVE, f"{config.MODEL.TYPE}-Epoch-{epoch}-Loss-{avg_loss}.pdparams")
+ state_dict = dict()
+ state_dict['model'] = model.state_dict()
+ state_dict['optimizer'] = optimizer.state_dict()
+ state_dict['epoch'] = epoch
+ if amp_grad_scaler is not None:
+ state_dict['amp_grad_scaler'] = amp_grad_scaler.state_dict()
+ if lr_scheduler is not None:
+ state_dict['lr_scheduler'] = lr_scheduler.state_dict()
+ paddle.save(state_dict, model_path)
+ message = (f"----- Save model: {model_path}")
+ write_log(local_logger, master_logger, message)
+
+
+def main():
+ # config is updated in order: (1) default in config.py, (2) yaml file, (3) arguments
+ config = update_config(get_config(), get_arguments())
+ # set output folder
+ config.SAVE = '{}/pretrain-{}'.format(config.SAVE, time.strftime('%Y%m%d-%H-%M'))
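+    # each run writes to its own timestamped subfolder, e.g. pretrain-20220322-22-11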
+    os.makedirs(config.SAVE, exist_ok=True)
+ # get dataset
+ dataset_train = get_dataset(config, is_train=True)
+ # start training
+ #paddle.distributed.spawn(main_worker, args=(config, dataset_train, ))
+ main_worker(config, dataset_train, )
+
+
+if __name__ == "__main__":
+ main()
diff --git a/image_classification/MAE/mixup.py b/self_supervised_learning/MAE/mixup.py
similarity index 97%
rename from image_classification/MAE/mixup.py
rename to self_supervised_learning/MAE/mixup.py
index 1d2db493..c365dcdf 100644
--- a/image_classification/MAE/mixup.py
+++ b/self_supervised_learning/MAE/mixup.py
@@ -1,4 +1,4 @@
-# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
@@ -43,7 +43,7 @@ def rand_bbox(image_shape, lam, count=None):
bbox_y2 = np.clip(cy + cut_h // 2, 0, image_h)
# NOTE: in paddle, tensor indexing e.g., a[x1:x2],
- # if x1 == x2, paddle will raise ValueErros,
+    # if x1 == x2, paddle will raise ValueError,
# while in pytorch, it will return [] tensor
return bbox_x1, bbox_y1, bbox_x2, bbox_y2
@@ -63,8 +63,8 @@ def rand_bbox_minmax(image_shape, minmax, count=None):
image_h, image_w = image_shape[-2:]
min_ratio = minmax[0]
max_ratio = minmax[1]
- cut_h = np.random.randint(int(image_h * min_ratio), int(image_h * max_ratio), size=count)
- cut_w = np.random.randint(int(image_w * min_ratio), int(image_w * max_ratio), size=count)
+ cut_h = np.random.randint(int(image_h * min_ratio), int(image_h * max_ratio), size=count)
+ cut_w = np.random.randint(int(image_w * min_ratio), int(image_w * max_ratio), size=count)
bbox_x1 = np.random.randint(0, image_w - cut_w, size=count)
bbox_y1 = np.random.randint(0, image_h - cut_h, size=count)
@@ -213,7 +213,7 @@ def _mix_batch(self, x):
correct_lam=self.correct_lam)
# NOTE: in paddle, tensor indexing e.g., a[x1:x2],
- # if x1 == x2, paddle will raise ValueErros,
+            # if x1 == x2, paddle will raise ValueError,
# but in pytorch, it will return [] tensor without errors
if int(bbox_x1) != int(bbox_x2) and int(bbox_y1) != int(bbox_y2):
x[:, :, int(bbox_x1): int(bbox_x2), int(bbox_y1): int(bbox_y2)] = x.flip(axis=[0])[
diff --git a/self_supervised_learning/MAE/nohup.out b/self_supervised_learning/MAE/nohup.out
new file mode 100644
index 00000000..5bd262f2
--- /dev/null
+++ b/self_supervised_learning/MAE/nohup.out
@@ -0,0 +1,1677 @@
+WARNING 2022-03-22 22:11:35,173 launch.py:503] Not found distinct arguments and compiled with cuda or xpu or npu. Default use collective mode
+INFO 2022-03-22 22:11:35,175 launch_utils.py:557] Local start 8 processes. First process distributed environment info (Only For Debug):
+ +=======================================================================================+
+ | Distributed Envs Value |
+ +---------------------------------------------------------------------------------------+
+ | PADDLE_TRAINER_ID 0 |
+ | PADDLE_CURRENT_ENDPOINT 127.0.0.1:11075 |
+ | PADDLE_TRAINERS_NUM 8 |
+ | PADDLE_TRAINER_ENDPOINTS ... 0.1:15064,127.0.0.1:41881,127.0.0.1:50174|
+ | PADDLE_RANK_IN_NODE 0 |
+ | PADDLE_LOCAL_DEVICE_IDS 0 |
+ | PADDLE_WORLD_DEVICE_IDS 0,1,2,3,4,5,6,7 |
+ | FLAGS_selected_gpus 0 |
+ | FLAGS_selected_accelerators 0 |
+ +=======================================================================================+
+
+INFO 2022-03-22 22:11:35,175 launch_utils.py:562] details about PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
+----------- Configuration Arguments -----------
+backend: auto
+cluster_topo_path: None
+elastic_pre_hook: None
+elastic_server: None
+enable_auto_mapping: False
+force: False
+gpus: 0,1,2,3,4,5,6,7
+heter_devices:
+heter_worker_num: None
+heter_workers:
+host: None
+http_port: None
+ips: 127.0.0.1
+job_id: None
+log_dir: log
+np: None
+nproc_per_node: None
+rank_mapping_path: None
+run_mode: None
+scale: 0
+server_num: None
+servers:
+training_script: main_multi_gpu_linearprobe.py
+training_script_args: ['-cfg=./configs/vit_base_patch16_224_linearprobe_single_node.yaml', '-dataset=imagenet2012', '-batch_size=512', '-data_path=/dataset/imagenet', '-pretrained=./mae_pretrain_vit_base.pdparams', '-amp']
+worker_num: None
+workers:
+------------------------------------------------
+launch train in GPU mode!
+launch proc_id:5826 idx:0
+launch proc_id:5829 idx:1
+launch proc_id:5832 idx:2
+launch proc_id:5835 idx:3
+launch proc_id:5838 idx:4
+launch proc_id:5841 idx:5
+launch proc_id:5844 idx:6
+launch proc_id:5847 idx:7
+/usr/local/lib/python3.7/site-packages/paddlenlp/transformers/funnel/modeling.py:30: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
+ from collections import Iterable
+Compose(
+
+
+
+
+)
+----- Imagenet2012 train_list.txt len = 1281167
+----- Imagenet2012 val_list.txt len = 50000
+2022-03-22 22:11:41,621 MASTER_LOG ----- world_size = 8, local_rank = 0
+----- AMP: True
+BASE: ['']
+DATA:
+ BATCH_SIZE: 512
+ BATCH_SIZE_EVAL: 512
+ CROP_PCT: 0.875
+ DATASET: imagenet2012
+ DATA_PATH: /dataset/imagenet
+ IMAGENET_MEAN: [0.485, 0.456, 0.406]
+ IMAGENET_STD: [0.229, 0.224, 0.225]
+ IMAGE_CHANNELS: 3
+ IMAGE_SIZE: 224
+ NUM_WORKERS: 2
+EVAL: False
+MODEL:
+ ATTENTION_DROPOUT: 0.0
+ DECODER:
+ DEPTH: 8
+ EMBED_DIM: 512
+ NUM_HEADS: 16
+ DROPOUT: 0.0
+ DROPPATH: 0.0
+ ENCODER:
+ DEPTH: 12
+ EMBED_DIM: 768
+ NUM_HEADS: 12
+ GLOBAL_POOL: False
+ MASK_RATIO: 0.75
+ MLP_RATIO: 4.0
+ NAME: vit_base_patch16_224
+ NORM_PIX_LOSS: True
+ NUM_CLASSES: 1000
+ PATCH_SIZE: 16
+ PRETRAINED: ./mae_pretrain_vit_base.pdparams
+ QKV_BIAS: True
+ RESUME: None
+ TYPE: LINEARPROBE
+REPORT_FREQ: 20
+SAVE: ./output/linearprobe-20220322-22-11
+SAVE_FREQ: 10
+SEED: 0
+TRAIN:
+ ACCUM_ITER: 4
+ AUTO_AUGMENT: False
+ BASE_LR: 0.1
+ COLOR_JITTER: 0.4
+ CUTMIX_ALPHA: 1.0
+ CUTMIX_MINMAX: None
+ END_LR: 0.0
+ GRAD_CLIP: None
+ LAST_EPOCH: 0
+ LAYER_DECAY: None
+ LINEAR_SCALED_LR: 256
+ MIXUP_ALPHA: 0.8
+ MIXUP_MODE: batch
+ MIXUP_PROB: 1.0
+ MIXUP_SWITCH_PROB: 0.5
+ NUM_EPOCHS: 90
+ OPTIMIZER:
+ BETAS: (0.9, 0.95)
+ EPS: 1e-08
+ NAME: LARS
+ RANDOM_ERASE_COUNT: 1
+ RANDOM_ERASE_MODE: pixel
+ RANDOM_ERASE_PROB: 0.25
+ RANDOM_ERASE_SPLIT: False
+ RAND_AUGMENT: True
+ RAND_AUGMENT_LAYERS: 2
+ RAND_AUGMENT_MAGNITUDE: 9
+ SMOOTHING: 0.1
+ WARMUP_EPOCHS: 10
+ WARMUP_START_LR: 0.0
+ WEIGHT_DECAY: 0.0
+VALIDATE_FREQ: 1
+W0322 22:11:41.623504 5826 gpu_context.cc:240] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
+W0322 22:11:41.630729 5826 gpu_context.cc:268] device: 0, cuDNN Version: 7.6.
+encoder_position_embedding [1, 197, 768] True
+cls_token [1, 1, 768] True
+patch_embedding.patch_embedding.weight [768, 3, 16, 16] True
+patch_embedding.patch_embedding.bias [768] True
+encoder.layers.0.attn_norm.weight [768] True
+encoder.layers.0.attn_norm.bias [768] True
+encoder.layers.0.attn.qkv.weight [768, 2304] True
+encoder.layers.0.attn.qkv.bias [2304] True
+encoder.layers.0.attn.out.weight [768, 768] True
+encoder.layers.0.attn.out.bias [768] True
+encoder.layers.0.mlp_norm.weight [768] True
+encoder.layers.0.mlp_norm.bias [768] True
+encoder.layers.0.mlp.fc1.weight [768, 3072] True
+encoder.layers.0.mlp.fc1.bias [3072] True
+encoder.layers.0.mlp.fc2.weight [3072, 768] True
+encoder.layers.0.mlp.fc2.bias [768] True
+encoder.layers.1.attn_norm.weight [768] True
+encoder.layers.1.attn_norm.bias [768] True
+encoder.layers.1.attn.qkv.weight [768, 2304] True
+encoder.layers.1.attn.qkv.bias [2304] True
+encoder.layers.1.attn.out.weight [768, 768] True
+encoder.layers.1.attn.out.bias [768] True
+encoder.layers.1.mlp_norm.weight [768] True
+encoder.layers.1.mlp_norm.bias [768] True
+encoder.layers.1.mlp.fc1.weight [768, 3072] True
+encoder.layers.1.mlp.fc1.bias [3072] True
+encoder.layers.1.mlp.fc2.weight [3072, 768] True
+encoder.layers.1.mlp.fc2.bias [768] True
+encoder.layers.2.attn_norm.weight [768] True
+encoder.layers.2.attn_norm.bias [768] True
+encoder.layers.2.attn.qkv.weight [768, 2304] True
+encoder.layers.2.attn.qkv.bias [2304] True
+encoder.layers.2.attn.out.weight [768, 768] True
+encoder.layers.2.attn.out.bias [768] True
+encoder.layers.2.mlp_norm.weight [768] True
+encoder.layers.2.mlp_norm.bias [768] True
+encoder.layers.2.mlp.fc1.weight [768, 3072] True
+encoder.layers.2.mlp.fc1.bias [3072] True
+encoder.layers.2.mlp.fc2.weight [3072, 768] True
+encoder.layers.2.mlp.fc2.bias [768] True
+encoder.layers.3.attn_norm.weight [768] True
+encoder.layers.3.attn_norm.bias [768] True
+encoder.layers.3.attn.qkv.weight [768, 2304] True
+encoder.layers.3.attn.qkv.bias [2304] True
+encoder.layers.3.attn.out.weight [768, 768] True
+encoder.layers.3.attn.out.bias [768] True
+encoder.layers.3.mlp_norm.weight [768] True
+encoder.layers.3.mlp_norm.bias [768] True
+encoder.layers.3.mlp.fc1.weight [768, 3072] True
+encoder.layers.3.mlp.fc1.bias [3072] True
+encoder.layers.3.mlp.fc2.weight [3072, 768] True
+encoder.layers.3.mlp.fc2.bias [768] True
+encoder.layers.4.attn_norm.weight [768] True
+encoder.layers.4.attn_norm.bias [768] True
+encoder.layers.4.attn.qkv.weight [768, 2304] True
+encoder.layers.4.attn.qkv.bias [2304] True
+encoder.layers.4.attn.out.weight [768, 768] True
+encoder.layers.4.attn.out.bias [768] True
+encoder.layers.4.mlp_norm.weight [768] True
+encoder.layers.4.mlp_norm.bias [768] True
+encoder.layers.4.mlp.fc1.weight [768, 3072] True
+encoder.layers.4.mlp.fc1.bias [3072] True
+encoder.layers.4.mlp.fc2.weight [3072, 768] True
+encoder.layers.4.mlp.fc2.bias [768] True
+encoder.layers.5.attn_norm.weight [768] True
+encoder.layers.5.attn_norm.bias [768] True
+encoder.layers.5.attn.qkv.weight [768, 2304] True
+encoder.layers.5.attn.qkv.bias [2304] True
+encoder.layers.5.attn.out.weight [768, 768] True
+encoder.layers.5.attn.out.bias [768] True
+encoder.layers.5.mlp_norm.weight [768] True
+encoder.layers.5.mlp_norm.bias [768] True
+encoder.layers.5.mlp.fc1.weight [768, 3072] True
+encoder.layers.5.mlp.fc1.bias [3072] True
+encoder.layers.5.mlp.fc2.weight [3072, 768] True
+encoder.layers.5.mlp.fc2.bias [768] True
+encoder.layers.6.attn_norm.weight [768] True
+encoder.layers.6.attn_norm.bias [768] True
+encoder.layers.6.attn.qkv.weight [768, 2304] True
+encoder.layers.6.attn.qkv.bias [2304] True
+encoder.layers.6.attn.out.weight [768, 768] True
+encoder.layers.6.attn.out.bias [768] True
+encoder.layers.6.mlp_norm.weight [768] True
+encoder.layers.6.mlp_norm.bias [768] True
+encoder.layers.6.mlp.fc1.weight [768, 3072] True
+encoder.layers.6.mlp.fc1.bias [3072] True
+encoder.layers.6.mlp.fc2.weight [3072, 768] True
+encoder.layers.6.mlp.fc2.bias [768] True
+encoder.layers.7.attn_norm.weight [768] True
+encoder.layers.7.attn_norm.bias [768] True
+encoder.layers.7.attn.qkv.weight [768, 2304] True
+encoder.layers.7.attn.qkv.bias [2304] True
+encoder.layers.7.attn.out.weight [768, 768] True
+encoder.layers.7.attn.out.bias [768] True
+encoder.layers.7.mlp_norm.weight [768] True
+encoder.layers.7.mlp_norm.bias [768] True
+encoder.layers.7.mlp.fc1.weight [768, 3072] True
+encoder.layers.7.mlp.fc1.bias [3072] True
+encoder.layers.7.mlp.fc2.weight [3072, 768] True
+encoder.layers.7.mlp.fc2.bias [768] True
+encoder.layers.8.attn_norm.weight [768] True
+encoder.layers.8.attn_norm.bias [768] True
+encoder.layers.8.attn.qkv.weight [768, 2304] True
+encoder.layers.8.attn.qkv.bias [2304] True
+encoder.layers.8.attn.out.weight [768, 768] True
+encoder.layers.8.attn.out.bias [768] True
+INFO 2022-03-22 22:12:02,351 launch_utils.py:321] terminate process group gid:5838
+INFO 2022-03-22 22:12:02,351 launch_utils.py:321] terminate process group gid:5841
+INFO 2022-03-22 22:12:02,351 launch_utils.py:321] terminate process group gid:5844
+INFO 2022-03-22 22:12:02,352 launch_utils.py:321] terminate process group gid:5847
+INFO 2022-03-22 22:12:06,355 launch_utils.py:342] terminate all the procs
+ERROR 2022-03-22 22:12:06,355 launch_utils.py:638] ABORT!!! Out of all 8 trainers, the trainer process with rank=[0, 1, 2, 3] was aborted. Please check its log.
+INFO 2022-03-22 22:12:10,359 launch_utils.py:342] terminate all the procs
+INFO 2022-03-22 22:12:10,359 launch.py:391] Local processes completed.
+encoder.layers.8.mlp_norm.weight [768] True
+encoder.layers.8.mlp_norm.bias [768] True
+encoder.layers.8.mlp.fc1.weight [768, 3072] True
+encoder.layers.8.mlp.fc1.bias [3072] True
+encoder.layers.8.mlp.fc2.weight [3072, 768] True
+encoder.layers.8.mlp.fc2.bias [768] True
+encoder.layers.9.attn_norm.weight [768] True
+encoder.layers.9.attn_norm.bias [768] True
+encoder.layers.9.attn.qkv.weight [768, 2304] True
+encoder.layers.9.attn.qkv.bias [2304] True
+encoder.layers.9.attn.out.weight [768, 768] True
+encoder.layers.9.attn.out.bias [768] True
+encoder.layers.9.mlp_norm.weight [768] True
+encoder.layers.9.mlp_norm.bias [768] True
+encoder.layers.9.mlp.fc1.weight [768, 3072] True
+encoder.layers.9.mlp.fc1.bias [3072] True
+encoder.layers.9.mlp.fc2.weight [3072, 768] True
+encoder.layers.9.mlp.fc2.bias [768] True
+encoder.layers.10.attn_norm.weight [768] True
+encoder.layers.10.attn_norm.bias [768] True
+encoder.layers.10.attn.qkv.weight [768, 2304] True
+encoder.layers.10.attn.qkv.bias [2304] True
+encoder.layers.10.attn.out.weight [768, 768] True
+encoder.layers.10.attn.out.bias [768] True
+encoder.layers.10.mlp_norm.weight [768] True
+encoder.layers.10.mlp_norm.bias [768] True
+encoder.layers.10.mlp.fc1.weight [768, 3072] True
+encoder.layers.10.mlp.fc1.bias [3072] True
+encoder.layers.10.mlp.fc2.weight [3072, 768] True
+encoder.layers.10.mlp.fc2.bias [768] True
+encoder.layers.11.attn_norm.weight [768] True
+encoder.layers.11.attn_norm.bias [768] True
+encoder.layers.11.attn.qkv.weight [768, 2304] True
+encoder.layers.11.attn.qkv.bias [2304] True
+encoder.layers.11.attn.out.weight [768, 768] True
+encoder.layers.11.attn.out.bias [768] True
+encoder.layers.11.mlp_norm.weight [768] True
+encoder.layers.11.mlp_norm.bias [768] True
+encoder.layers.11.mlp.fc1.weight [768, 3072] True
+encoder.layers.11.mlp.fc1.bias [3072] True
+encoder.layers.11.mlp.fc2.weight [3072, 768] True
+encoder.layers.11.mlp.fc2.bias [768] True
+encoder.norm.weight [768] True
+encoder.norm.bias [768] True
+classifier.0.weight [768] False
+classifier.0.bias [768] False
+classifier.0._mean [768] False
+classifier.0._variance [768] False
+classifier.1.weight [768, 1000] False
+classifier.1.bias [1000] False
+server not ready, wait 3 sec to retry...
+not ready endpoints:['127.0.0.1:17571', '127.0.0.1:20006', '127.0.0.1:55374', '127.0.0.1:25328', '127.0.0.1:15064', '127.0.0.1:41881', '127.0.0.1:50174']
+I0322 22:11:54.265852 5826 nccl_context.cc:82] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 0
+I0322 22:11:56.234949 5826 nccl_context.cc:114] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 10
+2022-03-22 22:11:57,826-INFO: [topology.py:169:__init__] HybridParallelInfo: rank_id: 0, mp_degree: 1, sharding_degree: 1, pp_degree: 1, dp_degree: 8, mp_group: [0], sharding_group: [0], pp_group: [0], dp_group: [0, 1, 2, 3, 4, 5, 6, 7], check/clip group: [0]
+2022-03-22 22:11:57,828 MASTER_LOG ----- Total # of train batch (single gpu): 312
+2022-03-22 22:11:57,828 MASTER_LOG ----- Total # of val batch (single gpu): 13
+2022-03-22 22:11:57,829 MASTER_LOG Base lr is scaled to: 6.4
+Traceback (most recent call last):
+ File "main_multi_gpu_linearprobe.py", line 622, in
+ main()
+ File "main_multi_gpu_linearprobe.py", line 618, in main
+ main_worker(config, dataset_train, dataset_val)
+ File "main_multi_gpu_linearprobe.py", line 438, in main_worker
+ assert os.path.isfile(config.MODEL.PRETRAINED) is True
+AssertionError
+WARNING 2022-03-23 10:07:48,191 launch.py:503] Not found distinct arguments and compiled with cuda or xpu or npu. Default use collective mode
+INFO 2022-03-23 10:07:48,193 launch_utils.py:557] Local start 8 processes. First process distributed environment info (Only For Debug):
+ +=======================================================================================+
+ | Distributed Envs Value |
+ +---------------------------------------------------------------------------------------+
+ | PADDLE_TRAINER_ID 0 |
+ | PADDLE_CURRENT_ENDPOINT 127.0.0.1:16322 |
+ | PADDLE_TRAINERS_NUM 8 |
+ | PADDLE_TRAINER_ENDPOINTS ... 0.1:33084,127.0.0.1:60249,127.0.0.1:48028|
+ | PADDLE_RANK_IN_NODE 0 |
+ | PADDLE_LOCAL_DEVICE_IDS 0 |
+ | PADDLE_WORLD_DEVICE_IDS 0,1,2,3,4,5,6,7 |
+ | FLAGS_selected_gpus 0 |
+ | FLAGS_selected_accelerators 0 |
+ +=======================================================================================+
+
+INFO 2022-03-23 10:07:48,193 launch_utils.py:562] details about PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
+----------- Configuration Arguments -----------
+backend: auto
+cluster_topo_path: None
+elastic_pre_hook: None
+elastic_server: None
+enable_auto_mapping: False
+force: False
+gpus: 0,1,2,3,4,5,6,7
+heter_devices:
+heter_worker_num: None
+heter_workers:
+host: None
+http_port: None
+ips: 127.0.0.1
+job_id: None
+log_dir: log
+np: None
+nproc_per_node: None
+rank_mapping_path: None
+run_mode: None
+scale: 0
+server_num: None
+servers:
+training_script: main_multi_gpu_linearprobe.py
+training_script_args: ['-cfg=./configs/vit_base_patch16_224_linearprobe_single_node.yaml', '-dataset=imagenet2012', '-batch_size=512', '-data_path=/dataset/imagenet', '-pretrained=./mae_pretrain_vit_base.pdparams', '-amp']
+worker_num: None
+workers:
+------------------------------------------------
+launch train in GPU mode!
+launch proc_id:6102 idx:0
+launch proc_id:6105 idx:1
+launch proc_id:6108 idx:2
+launch proc_id:6111 idx:3
+launch proc_id:6114 idx:4
+launch proc_id:6117 idx:5
+launch proc_id:6120 idx:6
+launch proc_id:6123 idx:7
+/usr/local/lib/python3.7/site-packages/paddlenlp/transformers/funnel/modeling.py:30: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
+ from collections import Iterable
+Compose(
+
+
+
+
+)
+----- Imagenet2012 train_list.txt len = 1281167
+----- Imagenet2012 val_list.txt len = 50000
+2022-03-23 10:07:54,812 MASTER_LOG ----- world_size = 8, local_rank = 0
+----- AMP: True
+BASE: ['']
+DATA:
+ BATCH_SIZE: 512
+ BATCH_SIZE_EVAL: 512
+ CROP_PCT: 0.875
+ DATASET: imagenet2012
+ DATA_PATH: /dataset/imagenet
+ IMAGENET_MEAN: [0.485, 0.456, 0.406]
+ IMAGENET_STD: [0.229, 0.224, 0.225]
+ IMAGE_CHANNELS: 3
+ IMAGE_SIZE: 224
+ NUM_WORKERS: 2
+EVAL: False
+MODEL:
+ ATTENTION_DROPOUT: 0.0
+ DECODER:
+ DEPTH: 8
+ EMBED_DIM: 512
+ NUM_HEADS: 16
+ DROPOUT: 0.0
+ DROPPATH: 0.0
+ ENCODER:
+ DEPTH: 12
+ EMBED_DIM: 768
+ NUM_HEADS: 12
+ GLOBAL_POOL: False
+ MASK_RATIO: 0.75
+ MLP_RATIO: 4.0
+ NAME: vit_base_patch16_224
+ NORM_PIX_LOSS: True
+ NUM_CLASSES: 1000
+ PATCH_SIZE: 16
+ PRETRAINED: ./mae_pretrain_vit_base.pdparams
+ QKV_BIAS: True
+ RESUME: None
+ TYPE: LINEARPROBE
+REPORT_FREQ: 20
+SAVE: ./output/linearprobe-20220323-10-07
+SAVE_FREQ: 10
+SEED: 0
+TRAIN:
+ ACCUM_ITER: 4
+ AUTO_AUGMENT: False
+ BASE_LR: 0.1
+ COLOR_JITTER: 0.4
+ CUTMIX_ALPHA: 1.0
+ CUTMIX_MINMAX: None
+ END_LR: 0.0
+ GRAD_CLIP: None
+ LAST_EPOCH: 0
+ LAYER_DECAY: None
+ LINEAR_SCALED_LR: 256
+ MIXUP_ALPHA: 0.8
+ MIXUP_MODE: batch
+ MIXUP_PROB: 1.0
+ MIXUP_SWITCH_PROB: 0.5
+ NUM_EPOCHS: 90
+ OPTIMIZER:
+ BETAS: (0.9, 0.95)
+ EPS: 1e-08
+ NAME: LARS
+ RANDOM_ERASE_COUNT: 1
+ RANDOM_ERASE_MODE: pixel
+ RANDOM_ERASE_PROB: 0.25
+ RANDOM_ERASE_SPLIT: False
+ RAND_AUGMENT: True
+ RAND_AUGMENT_LAYERS: 2
+ RAND_AUGMENT_MAGNITUDE: 9
+ SMOOTHING: 0.1
+ WARMUP_EPOCHS: 10
+ WARMUP_START_LR: 0.0
+ WEIGHT_DECAY: 0.0
+VALIDATE_FREQ: 1
+W0323 10:07:54.817029 6102 gpu_context.cc:240] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
+W0323 10:07:54.823293 6102 gpu_context.cc:268] device: 0, cuDNN Version: 7.6.
+encoder_position_embedding [1, 197, 768] True
+cls_token [1, 1, 768] True
+patch_embedding.patch_embedding.weight [768, 3, 16, 16] True
+patch_embedding.patch_embedding.bias [768] True
+encoder.layers.0.attn_norm.weight [768] True
+encoder.layers.0.attn_norm.bias [768] True
+encoder.layers.0.attn.qkv.weight [768, 2304] True
+encoder.layers.0.attn.qkv.bias [2304] True
+encoder.layers.0.attn.out.weight [768, 768] True
+encoder.layers.0.attn.out.bias [768] True
+encoder.layers.0.mlp_norm.weight [768] True
+encoder.layers.0.mlp_norm.bias [768] True
+encoder.layers.0.mlp.fc1.weight [768, 3072] True
+encoder.layers.0.mlp.fc1.bias [3072] True
+encoder.layers.0.mlp.fc2.weight [3072, 768] True
+encoder.layers.0.mlp.fc2.bias [768] True
+encoder.layers.1.attn_norm.weight [768] True
+encoder.layers.1.attn_norm.bias [768] True
+encoder.layers.1.attn.qkv.weight [768, 2304] True
+encoder.layers.1.attn.qkv.bias [2304] True
+encoder.layers.1.attn.out.weight [768, 768] True
+encoder.layers.1.attn.out.bias [768] True
+encoder.layers.1.mlp_norm.weight [768] True
+encoder.layers.1.mlp_norm.bias [768] True
+encoder.layers.1.mlp.fc1.weight [768, 3072] True
+encoder.layers.1.mlp.fc1.bias [3072] True
+encoder.layers.1.mlp.fc2.weight [3072, 768] True
+encoder.layers.1.mlp.fc2.bias [768] True
+encoder.layers.2.attn_norm.weight [768] True
+encoder.layers.2.attn_norm.bias [768] True
+encoder.layers.2.attn.qkv.weight [768, 2304] True
+encoder.layers.2.attn.qkv.bias [2304] True
+encoder.layers.2.attn.out.weight [768, 768] True
+encoder.layers.2.attn.out.bias [768] True
+encoder.layers.2.mlp_norm.weight [768] True
+encoder.layers.2.mlp_norm.bias [768] True
+encoder.layers.2.mlp.fc1.weight [768, 3072] True
+encoder.layers.2.mlp.fc1.bias [3072] True
+encoder.layers.2.mlp.fc2.weight [3072, 768] True
+encoder.layers.2.mlp.fc2.bias [768] True
+encoder.layers.3.attn_norm.weight [768] True
+encoder.layers.3.attn_norm.bias [768] True
+encoder.layers.3.attn.qkv.weight [768, 2304] True
+encoder.layers.3.attn.qkv.bias [2304] True
+encoder.layers.3.attn.out.weight [768, 768] True
+encoder.layers.3.attn.out.bias [768] True
+encoder.layers.3.mlp_norm.weight [768] True
+encoder.layers.3.mlp_norm.bias [768] True
+encoder.layers.3.mlp.fc1.weight [768, 3072] True
+encoder.layers.3.mlp.fc1.bias [3072] True
+encoder.layers.3.mlp.fc2.weight [3072, 768] True
+encoder.layers.3.mlp.fc2.bias [768] True
+encoder.layers.4.attn_norm.weight [768] True
+encoder.layers.4.attn_norm.bias [768] True
+encoder.layers.4.attn.qkv.weight [768, 2304] True
+encoder.layers.4.attn.qkv.bias [2304] True
+encoder.layers.4.attn.out.weight [768, 768] True
+encoder.layers.4.attn.out.bias [768] True
+encoder.layers.4.mlp_norm.weight [768] True
+encoder.layers.4.mlp_norm.bias [768] True
+encoder.layers.4.mlp.fc1.weight [768, 3072] True
+encoder.layers.4.mlp.fc1.bias [3072] True
+encoder.layers.4.mlp.fc2.weight [3072, 768] True
+encoder.layers.4.mlp.fc2.bias [768] True
+encoder.layers.5.attn_norm.weight [768] True
+encoder.layers.5.attn_norm.bias [768] True
+encoder.layers.5.attn.qkv.weight [768, 2304] True
+encoder.layers.5.attn.qkv.bias [2304] True
+encoder.layers.5.attn.out.weight [768, 768] True
+encoder.layers.5.attn.out.bias [768] True
+encoder.layers.5.mlp_norm.weight [768] True
+encoder.layers.5.mlp_norm.bias [768] True
+encoder.layers.5.mlp.fc1.weight [768, 3072] True
+encoder.layers.5.mlp.fc1.bias [3072] True
+encoder.layers.5.mlp.fc2.weight [3072, 768] True
+encoder.layers.5.mlp.fc2.bias [768] True
+encoder.layers.6.attn_norm.weight [768] True
+encoder.layers.6.attn_norm.bias [768] True
+encoder.layers.6.attn.qkv.weight [768, 2304] True
+encoder.layers.6.attn.qkv.bias [2304] True
+encoder.layers.6.attn.out.weight [768, 768] True
+encoder.layers.6.attn.out.bias [768] True
+encoder.layers.6.mlp_norm.weight [768] True
+encoder.layers.6.mlp_norm.bias [768] True
+encoder.layers.6.mlp.fc1.weight [768, 3072] True
+encoder.layers.6.mlp.fc1.bias [3072] True
+encoder.layers.6.mlp.fc2.weight [3072, 768] True
+encoder.layers.6.mlp.fc2.bias [768] True
+encoder.layers.7.attn_norm.weight [768] True
+encoder.layers.7.attn_norm.bias [768] True
+encoder.layers.7.attn.qkv.weight [768, 2304] True
+encoder.layers.7.attn.qkv.bias [2304] True
+encoder.layers.7.attn.out.weight [768, 768] True
+encoder.layers.7.attn.out.bias [768] True
+encoder.layers.7.mlp_norm.weight [768] True
+encoder.layers.7.mlp_norm.bias [768] True
+encoder.layers.7.mlp.fc1.weight [768, 3072] True
+encoder.layers.7.mlp.fc1.bias [3072] True
+encoder.layers.7.mlp.fc2.weight [3072, 768] True
+encoder.layers.7.mlp.fc2.bias [768] True
+encoder.layers.8.attn_norm.weight [768] True
+encoder.layers.8.attn_norm.bias [768] True
+encoder.layers.8.attn.qkv.weight [768, 2304] True
+encoder.layers.8.attn.qkv.bias [2304] True
+encoder.layers.8.attn.out.weight [768, 768] True
+encoder.layers.8.attn.out.bias [768] True
+INFO 2022-03-23 10:21:40,155 launch_utils.py:321] terminate process group gid:6102
+INFO 2022-03-23 10:21:40,155 launch_utils.py:321] terminate process group gid:6117
+INFO 2022-03-23 10:21:44,159 launch_utils.py:342] terminate all the procs
+ERROR 2022-03-23 10:21:44,159 launch_utils.py:638] ABORT!!! Out of all 8 trainers, the trainer process with rank=[1, 2, 3, 4, 6, 7] was aborted. Please check its log.
+INFO 2022-03-23 10:21:48,163 launch_utils.py:342] terminate all the procs
+INFO 2022-03-23 10:21:48,163 launch.py:391] Local processes completed.
+encoder.layers.8.mlp_norm.weight [768] True
+encoder.layers.8.mlp_norm.bias [768] True
+encoder.layers.8.mlp.fc1.weight [768, 3072] True
+encoder.layers.8.mlp.fc1.bias [3072] True
+encoder.layers.8.mlp.fc2.weight [3072, 768] True
+encoder.layers.8.mlp.fc2.bias [768] True
+encoder.layers.9.attn_norm.weight [768] True
+encoder.layers.9.attn_norm.bias [768] True
+encoder.layers.9.attn.qkv.weight [768, 2304] True
+encoder.layers.9.attn.qkv.bias [2304] True
+encoder.layers.9.attn.out.weight [768, 768] True
+encoder.layers.9.attn.out.bias [768] True
+encoder.layers.9.mlp_norm.weight [768] True
+encoder.layers.9.mlp_norm.bias [768] True
+encoder.layers.9.mlp.fc1.weight [768, 3072] True
+encoder.layers.9.mlp.fc1.bias [3072] True
+encoder.layers.9.mlp.fc2.weight [3072, 768] True
+encoder.layers.9.mlp.fc2.bias [768] True
+encoder.layers.10.attn_norm.weight [768] True
+encoder.layers.10.attn_norm.bias [768] True
+encoder.layers.10.attn.qkv.weight [768, 2304] True
+encoder.layers.10.attn.qkv.bias [2304] True
+encoder.layers.10.attn.out.weight [768, 768] True
+encoder.layers.10.attn.out.bias [768] True
+encoder.layers.10.mlp_norm.weight [768] True
+encoder.layers.10.mlp_norm.bias [768] True
+encoder.layers.10.mlp.fc1.weight [768, 3072] True
+encoder.layers.10.mlp.fc1.bias [3072] True
+encoder.layers.10.mlp.fc2.weight [3072, 768] True
+encoder.layers.10.mlp.fc2.bias [768] True
+encoder.layers.11.attn_norm.weight [768] True
+encoder.layers.11.attn_norm.bias [768] True
+encoder.layers.11.attn.qkv.weight [768, 2304] True
+encoder.layers.11.attn.qkv.bias [2304] True
+encoder.layers.11.attn.out.weight [768, 768] True
+encoder.layers.11.attn.out.bias [768] True
+encoder.layers.11.mlp_norm.weight [768] True
+encoder.layers.11.mlp_norm.bias [768] True
+encoder.layers.11.mlp.fc1.weight [768, 3072] True
+encoder.layers.11.mlp.fc1.bias [3072] True
+encoder.layers.11.mlp.fc2.weight [3072, 768] True
+encoder.layers.11.mlp.fc2.bias [768] True
+encoder.norm.weight [768] True
+encoder.norm.bias [768] True
+classifier.0.weight [768] False
+classifier.0.bias [768] False
+classifier.0._mean [768] False
+classifier.0._variance [768] False
+classifier.1.weight [768, 1000] False
+classifier.1.bias [1000] False
+server not ready, wait 3 sec to retry...
+not ready endpoints:['127.0.0.1:24455', '127.0.0.1:49357', '127.0.0.1:29615', '127.0.0.1:17426', '127.0.0.1:33084', '127.0.0.1:60249', '127.0.0.1:48028']
+I0323 10:08:07.629041 6102 nccl_context.cc:82] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 0
+I0323 10:08:09.865425 6102 nccl_context.cc:114] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 10
+2022-03-23 10:08:11,574-INFO: [topology.py:169:__init__] HybridParallelInfo: rank_id: 0, mp_degree: 1, sharding_degree: 1, pp_degree: 1, dp_degree: 8, mp_group: [0], sharding_group: [0], pp_group: [0], dp_group: [0, 1, 2, 3, 4, 5, 6, 7], check/clip group: [0]
+2022-03-23 10:08:11,575 MASTER_LOG ----- Total # of train batch (single gpu): 312
+2022-03-23 10:08:11,576 MASTER_LOG ----- Total # of val batch (single gpu): 13
+2022-03-23 10:08:11,576 MASTER_LOG Base lr is scaled to: 6.4
+2022-03-23 10:08:12,843 MASTER_LOG ----- Pretrained: Load model state from ./mae_pretrain_vit_base.pdparams
+2022-03-23 10:08:12,880 MASTER_LOG ----- Start training from epoch 1.
+2022-03-23 10:08:12,880 MASTER_LOG Train epoch 1. LR=6.400000e-01
+2022-03-23 10:08:20,797 MASTER_LOG Epoch[001/090], Step[0000/0312], Lr: 0.000000e+00, Loss: 6.9421 (6.9421), Avg Acc: 0.0010
+2022-03-23 10:09:09,470 MASTER_LOG Epoch[001/090], Step[0020/0312], Lr: 4.102564e-02, Loss: 6.8971 (6.9324), Avg Acc: 0.0010
+2022-03-23 10:09:58,105 MASTER_LOG Epoch[001/090], Step[0040/0312], Lr: 8.205128e-02, Loss: 6.6101 (6.8617), Avg Acc: 0.0052
+2022-03-23 10:10:45,402 MASTER_LOG Epoch[001/090], Step[0060/0312], Lr: 1.230769e-01, Loss: 6.0317 (6.7012), Avg Acc: 0.0279
+2022-03-23 10:11:32,415 MASTER_LOG Epoch[001/090], Step[0080/0312], Lr: 1.641026e-01, Loss: 5.3711 (6.4640), Avg Acc: 0.0567
+2022-03-23 10:12:20,143 MASTER_LOG Epoch[001/090], Step[0100/0312], Lr: 2.051282e-01, Loss: 4.7358 (6.1868), Avg Acc: 0.0836
+2022-03-23 10:13:08,619 MASTER_LOG Epoch[001/090], Step[0120/0312], Lr: 2.461538e-01, Loss: 4.1960 (5.9053), Avg Acc: 0.1104
+2022-03-23 10:13:55,861 MASTER_LOG Epoch[001/090], Step[0140/0312], Lr: 2.871795e-01, Loss: 3.8316 (5.6387), Avg Acc: 0.1366
+2022-03-23 10:14:44,284 MASTER_LOG Epoch[001/090], Step[0160/0312], Lr: 3.282051e-01, Loss: 3.5256 (5.3938), Avg Acc: 0.1613
+2022-03-23 10:15:32,316 MASTER_LOG Epoch[001/090], Step[0180/0312], Lr: 3.692308e-01, Loss: 3.2858 (5.1746), Avg Acc: 0.1840
+2022-03-23 10:16:21,607 MASTER_LOG Epoch[001/090], Step[0200/0312], Lr: 4.102564e-01, Loss: 3.0855 (4.9790), Avg Acc: 0.2046
+2022-03-23 10:17:09,678 MASTER_LOG Epoch[001/090], Step[0220/0312], Lr: 4.512821e-01, Loss: 2.9397 (4.8059), Avg Acc: 0.2231
+2022-03-23 10:17:57,601 MASTER_LOG Epoch[001/090], Step[0240/0312], Lr: 4.923077e-01, Loss: 2.8253 (4.6502), Avg Acc: 0.2400
+2022-03-23 10:18:45,861 MASTER_LOG Epoch[001/090], Step[0260/0312], Lr: 5.333333e-01, Loss: 2.7706 (4.5104), Avg Acc: 0.2556
+2022-03-23 10:19:33,782 MASTER_LOG Epoch[001/090], Step[0280/0312], Lr: 5.743590e-01, Loss: 2.7140 (4.3843), Avg Acc: 0.2697
+2022-03-23 10:20:22,668 MASTER_LOG Epoch[001/090], Step[0300/0312], Lr: 6.153846e-01, Loss: 2.6050 (4.2708), Avg Acc: 0.2826
+2022-03-23 10:20:47,517 MASTER_LOG Epoch[001/090], Step[0311/0312], Lr: 6.317949e-01, Loss: 2.6399 (4.2128), Avg Acc: 0.2891
+2022-03-23 10:20:49,805 MASTER_LOG ----- Epoch[001/090], Lr: 6.317949e-01, time: 756.92Train Loss: 4.2128, Train Acc: 0.2891
+2022-03-23 10:20:49,805 MASTER_LOG ----- Validation after Epoch: 1
+2022-03-23 10:20:59,660 MASTER_LOG Step[0000/0013], Avg Loss: 1.7821, Avg Acc@1: 0.5928, Avg Acc@5: 0.8276
+Traceback (most recent call last):
+ File "main_multi_gpu_linearprobe.py", line 622, in
+ main()
+ File "main_multi_gpu_linearprobe.py", line 618, in main
+ main_worker(config, dataset_train, dataset_val)
+ File "main_multi_gpu_linearprobe.py", line 569, in main_worker
+ master_logger=master_logger)
+ File "", line 2, in validate
+ File "/usr/local/lib/python3.7/site-packages/paddle/fluid/dygraph/base.py", line 351, in _decorate_function
+ return func(*args, **kwargs)
+ File "main_multi_gpu_linearprobe.py", line 272, in validate
+ paddle.distrtibuted.barrier()
+AttributeError: module 'paddle' has no attribute 'distrtibuted'
+WARNING 2022-03-23 16:59:15,049 launch.py:503] Not found distinct arguments and compiled with cuda or xpu or npu. Default use collective mode
+INFO 2022-03-23 16:59:15,051 launch_utils.py:557] Local start 8 processes. First process distributed environment info (Only For Debug):
+ +=======================================================================================+
+ | Distributed Envs Value |
+ +---------------------------------------------------------------------------------------+
+ | PADDLE_TRAINER_ID 0 |
+ | PADDLE_CURRENT_ENDPOINT 127.0.0.1:43622 |
+ | PADDLE_TRAINERS_NUM 8 |
+ | PADDLE_TRAINER_ENDPOINTS ... 0.1:10870,127.0.0.1:15574,127.0.0.1:32888|
+ | PADDLE_RANK_IN_NODE 0 |
+ | PADDLE_LOCAL_DEVICE_IDS 0 |
+ | PADDLE_WORLD_DEVICE_IDS 0,1,2,3,4,5,6,7 |
+ | FLAGS_selected_gpus 0 |
+ | FLAGS_selected_accelerators 0 |
+ +=======================================================================================+
+
+INFO 2022-03-23 16:59:15,052 launch_utils.py:562] details about PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
+INFO 2022-03-23 16:59:22,188 launch_utils.py:342] terminate all the procs
+ERROR 2022-03-23 16:59:22,188 launch_utils.py:638] ABORT!!! Out of all 8 trainers, the trainer process with rank=[0, 1, 2, 3, 4, 5, 6, 7] was aborted. Please check its log.
+INFO 2022-03-23 16:59:26,192 launch_utils.py:342] terminate all the procs
+INFO 2022-03-23 16:59:26,192 launch.py:391] Local processes completed.
+----------- Configuration Arguments -----------
+backend: auto
+cluster_topo_path: None
+elastic_pre_hook: None
+elastic_server: None
+enable_auto_mapping: False
+force: False
+gpus: 0,1,2,3,4,5,6,7
+heter_devices:
+heter_worker_num: None
+heter_workers:
+host: None
+http_port: None
+ips: 127.0.0.1
+job_id: None
+log_dir: log
+np: None
+nproc_per_node: None
+rank_mapping_path: None
+run_mode: None
+scale: 0
+server_num: None
+servers:
+training_script: main_multi_gpu_linearprobe.py
+training_script_args: ['-cfg=./configs/vit_base_patch16_224_linearprobe_single_node.yaml', '-dataset=imagenet2012', '-batch_size=512', '-data_path=/dataset/imagenet', '-pretrained=./mae_pretrain_vit_base.pdparams', '-amp']
+worker_num: None
+workers:
+------------------------------------------------
+launch train in GPU mode!
+launch proc_id:6736 idx:0
+launch proc_id:6739 idx:1
+launch proc_id:6742 idx:2
+launch proc_id:6745 idx:3
+launch proc_id:6748 idx:4
+launch proc_id:6751 idx:5
+launch proc_id:6754 idx:6
+launch proc_id:6757 idx:7
+Traceback (most recent call last):
+ File "main_multi_gpu_linearprobe.py", line 42, in
+ from transformer import build_transformer as build_model
+ File "/workspace/ppvit_github/PaddleViT_Train/PaddleViT/image_classification/paddlecloud/MAE_gitlab/MAE_paddle/transformer.py", line 748
+ bias_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(0))
+ ^
+SyntaxError: invalid syntax
+WARNING 2022-03-23 17:10:58,917 launch.py:503] Not found distinct arguments and compiled with cuda or xpu or npu. Default use collective mode
+INFO 2022-03-23 17:10:58,919 launch_utils.py:557] Local start 8 processes. First process distributed environment info (Only For Debug):
+ +=======================================================================================+
+ | Distributed Envs Value |
+ +---------------------------------------------------------------------------------------+
+ | PADDLE_TRAINER_ID 0 |
+ | PADDLE_CURRENT_ENDPOINT 127.0.0.1:25091 |
+ | PADDLE_TRAINERS_NUM 8 |
+ | PADDLE_TRAINER_ENDPOINTS ... 0.1:56764,127.0.0.1:55133,127.0.0.1:16222|
+ | PADDLE_RANK_IN_NODE 0 |
+ | PADDLE_LOCAL_DEVICE_IDS 0 |
+ | PADDLE_WORLD_DEVICE_IDS 0,1,2,3,4,5,6,7 |
+ | FLAGS_selected_gpus 0 |
+ | FLAGS_selected_accelerators 0 |
+ +=======================================================================================+
+
+INFO 2022-03-23 17:10:58,919 launch_utils.py:562] details about PADDLE_TRAINER_ENDPOINTS can be found in log/endpoints.log, and detail running logs maybe found in log/workerlog.0
+----------- Configuration Arguments -----------
+backend: auto
+cluster_topo_path: None
+elastic_pre_hook: None
+elastic_server: None
+enable_auto_mapping: False
+force: False
+gpus: 0,1,2,3,4,5,6,7
+heter_devices:
+heter_worker_num: None
+heter_workers:
+host: None
+http_port: None
+ips: 127.0.0.1
+job_id: None
+log_dir: log
+np: None
+nproc_per_node: None
+rank_mapping_path: None
+run_mode: None
+scale: 0
+server_num: None
+servers:
+training_script: main_multi_gpu_linearprobe.py
+training_script_args: ['-cfg=./configs/vit_base_patch16_224_linearprobe_single_node.yaml', '-dataset=imagenet2012', '-batch_size=512', '-data_path=/dataset/imagenet', '-pretrained=./mae_pretrain_vit_base.pdparams', '-amp']
+worker_num: None
+workers:
+------------------------------------------------
+launch train in GPU mode!
+launch proc_id:6911 idx:0
+launch proc_id:6914 idx:1
+launch proc_id:6917 idx:2
+launch proc_id:6920 idx:3
+launch proc_id:6923 idx:4
+launch proc_id:6926 idx:5
+launch proc_id:6929 idx:6
+launch proc_id:6932 idx:7
+/usr/local/lib/python3.7/site-packages/paddlenlp/transformers/funnel/modeling.py:30: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
+ from collections import Iterable
+Compose(
+
+
+
+
+)
+----- Imagenet2012 train_list.txt len = 1281167
+----- Imagenet2012 val_list.txt len = 50000
+2022-03-23 17:11:06,114 MASTER_LOG ----- world_size = 8, local_rank = 0
+----- AMP: True
+BASE: ['']
+DATA:
+ BATCH_SIZE: 512
+ BATCH_SIZE_EVAL: 512
+ CROP_PCT: 0.875
+ DATASET: imagenet2012
+ DATA_PATH: /dataset/imagenet
+ IMAGENET_MEAN: [0.485, 0.456, 0.406]
+ IMAGENET_STD: [0.229, 0.224, 0.225]
+ IMAGE_CHANNELS: 3
+ IMAGE_SIZE: 224
+ NUM_WORKERS: 2
+EVAL: False
+MODEL:
+ ATTENTION_DROPOUT: 0.0
+ DECODER:
+ DEPTH: 8
+ EMBED_DIM: 512
+ NUM_HEADS: 16
+ DROPOUT: 0.0
+ DROPPATH: 0.0
+ ENCODER:
+ DEPTH: 12
+ EMBED_DIM: 768
+ NUM_HEADS: 12
+ GLOBAL_POOL: False
+ MASK_RATIO: 0.75
+ MLP_RATIO: 4.0
+ NAME: vit_base_patch16_224
+ NORM_PIX_LOSS: True
+ NUM_CLASSES: 1000
+ PATCH_SIZE: 16
+ PRETRAINED: ./mae_pretrain_vit_base.pdparams
+ QKV_BIAS: True
+ RESUME: None
+ TYPE: LINEARPROBE
+REPORT_FREQ: 20
+SAVE: ./output/linearprobe-20220323-17-11
+SAVE_FREQ: 10
+SEED: 0
+TRAIN:
+ ACCUM_ITER: 4
+ AUTO_AUGMENT: False
+ BASE_LR: 0.1
+ COLOR_JITTER: 0.4
+ CUTMIX_ALPHA: 1.0
+ CUTMIX_MINMAX: None
+ END_LR: 0.0
+ GRAD_CLIP: None
+ LAST_EPOCH: 0
+ LAYER_DECAY: None
+ LINEAR_SCALED_LR: 256
+ MIXUP_ALPHA: 0.8
+ MIXUP_MODE: batch
+ MIXUP_PROB: 1.0
+ MIXUP_SWITCH_PROB: 0.5
+ NUM_EPOCHS: 90
+ OPTIMIZER:
+ BETAS: (0.9, 0.95)
+ EPS: 1e-08
+ NAME: LARS
+ RANDOM_ERASE_COUNT: 1
+ RANDOM_ERASE_MODE: pixel
+ RANDOM_ERASE_PROB: 0.25
+ RANDOM_ERASE_SPLIT: False
+ RAND_AUGMENT: True
+ RAND_AUGMENT_LAYERS: 2
+ RAND_AUGMENT_MAGNITUDE: 9
+ SMOOTHING: 0.1
+ WARMUP_EPOCHS: 10
+ WARMUP_START_LR: 0.0
+ WEIGHT_DECAY: 0.0
+VALIDATE_FREQ: 1
+W0323 17:11:06.116573 6911 gpu_context.cc:240] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 10.2, Runtime API Version: 10.2
+W0323 17:11:06.121726 6911 gpu_context.cc:268] device: 0, cuDNN Version: 7.6.
+encoder_position_embedding [1, 197, 768] True
+cls_token [1, 1, 768] True
+patch_embedding.patch_embedding.weight [768, 3, 16, 16] True
+patch_embedding.patch_embedding.bias [768] True
+encoder.layers.0.attn_norm.weight [768] True
+encoder.layers.0.attn_norm.bias [768] True
+encoder.layers.0.attn.qkv.weight [768, 2304] True
+encoder.layers.0.attn.qkv.bias [2304] True
+encoder.layers.0.attn.out.weight [768, 768] True
+encoder.layers.0.attn.out.bias [768] True
+encoder.layers.0.mlp_norm.weight [768] True
+encoder.layers.0.mlp_norm.bias [768] True
+encoder.layers.0.mlp.fc1.weight [768, 3072] True
+encoder.layers.0.mlp.fc1.bias [3072] True
+encoder.layers.0.mlp.fc2.weight [3072, 768] True
+encoder.layers.0.mlp.fc2.bias [768] True
+encoder.layers.1.attn_norm.weight [768] True
+encoder.layers.1.attn_norm.bias [768] True
+encoder.layers.1.attn.qkv.weight [768, 2304] True
+encoder.layers.1.attn.qkv.bias [2304] True
+encoder.layers.1.attn.out.weight [768, 768] True
+encoder.layers.1.attn.out.bias [768] True
+encoder.layers.1.mlp_norm.weight [768] True
+encoder.layers.1.mlp_norm.bias [768] True
+encoder.layers.1.mlp.fc1.weight [768, 3072] True
+encoder.layers.1.mlp.fc1.bias [3072] True
+encoder.layers.1.mlp.fc2.weight [3072, 768] True
+encoder.layers.1.mlp.fc2.bias [768] True
+encoder.layers.2.attn_norm.weight [768] True
+encoder.layers.2.attn_norm.bias [768] True
+encoder.layers.2.attn.qkv.weight [768, 2304] True
+encoder.layers.2.attn.qkv.bias [2304] True
+encoder.layers.2.attn.out.weight [768, 768] True
+encoder.layers.2.attn.out.bias [768] True
+encoder.layers.2.mlp_norm.weight [768] True
+encoder.layers.2.mlp_norm.bias [768] True
+encoder.layers.2.mlp.fc1.weight [768, 3072] True
+encoder.layers.2.mlp.fc1.bias [3072] True
+encoder.layers.2.mlp.fc2.weight [3072, 768] True
+encoder.layers.2.mlp.fc2.bias [768] True
+encoder.layers.3.attn_norm.weight [768] True
+encoder.layers.3.attn_norm.bias [768] True
+encoder.layers.3.attn.qkv.weight [768, 2304] True
+encoder.layers.3.attn.qkv.bias [2304] True
+encoder.layers.3.attn.out.weight [768, 768] True
+encoder.layers.3.attn.out.bias [768] True
+encoder.layers.3.mlp_norm.weight [768] True
+encoder.layers.3.mlp_norm.bias [768] True
+encoder.layers.3.mlp.fc1.weight [768, 3072] True
+encoder.layers.3.mlp.fc1.bias [3072] True
+encoder.layers.3.mlp.fc2.weight [3072, 768] True
+encoder.layers.3.mlp.fc2.bias [768] True
+encoder.layers.4.attn_norm.weight [768] True
+encoder.layers.4.attn_norm.bias [768] True
+encoder.layers.4.attn.qkv.weight [768, 2304] True
+encoder.layers.4.attn.qkv.bias [2304] True
+encoder.layers.4.attn.out.weight [768, 768] True
+encoder.layers.4.attn.out.bias [768] True
+encoder.layers.4.mlp_norm.weight [768] True
+encoder.layers.4.mlp_norm.bias [768] True
+encoder.layers.4.mlp.fc1.weight [768, 3072] True
+encoder.layers.4.mlp.fc1.bias [3072] True
+encoder.layers.4.mlp.fc2.weight [3072, 768] True
+encoder.layers.4.mlp.fc2.bias [768] True
+encoder.layers.5.attn_norm.weight [768] True
+encoder.layers.5.attn_norm.bias [768] True
+encoder.layers.5.attn.qkv.weight [768, 2304] True
+encoder.layers.5.attn.qkv.bias [2304] True
+encoder.layers.5.attn.out.weight [768, 768] True
+encoder.layers.5.attn.out.bias [768] True
+encoder.layers.5.mlp_norm.weight [768] True
+encoder.layers.5.mlp_norm.bias [768] True
+encoder.layers.5.mlp.fc1.weight [768, 3072] True
+encoder.layers.5.mlp.fc1.bias [3072] True
+encoder.layers.5.mlp.fc2.weight [3072, 768] True
+encoder.layers.5.mlp.fc2.bias [768] True
+encoder.layers.6.attn_norm.weight [768] True
+encoder.layers.6.attn_norm.bias [768] True
+encoder.layers.6.attn.qkv.weight [768, 2304] True
+encoder.layers.6.attn.qkv.bias [2304] True
+encoder.layers.6.attn.out.weight [768, 768] True
+encoder.layers.6.attn.out.bias [768] True
+encoder.layers.6.mlp_norm.weight [768] True
+encoder.layers.6.mlp_norm.bias [768] True
+encoder.layers.6.mlp.fc1.weight [768, 3072] True
+encoder.layers.6.mlp.fc1.bias [3072] True
+encoder.layers.6.mlp.fc2.weight [3072, 768] True
+encoder.layers.6.mlp.fc2.bias [768] True
+encoder.layers.7.attn_norm.weight [768] True
+encoder.layers.7.attn_norm.bias [768] True
+encoder.layers.7.attn.qkv.weight [768, 2304] True
+encoder.layers.7.attn.qkv.bias [2304] True
+encoder.layers.7.attn.out.weight [768, 768] True
+encoder.layers.7.attn.out.bias [768] True
+encoder.layers.7.mlp_norm.weight [768] True
+encoder.layers.7.mlp_norm.bias [768] True
+encoder.layers.7.mlp.fc1.weight [768, 3072] True
+encoder.layers.7.mlp.fc1.bias [3072] True
+encoder.layers.7.mlp.fc2.weight [3072, 768] True
+encoder.layers.7.mlp.fc2.bias [768] True
+encoder.layers.8.attn_norm.weight [768] True
+encoder.layers.8.attn_norm.bias [768] True
+encoder.layers.8.attn.qkv.weight [768, 2304] True
+encoder.layers.8.attn.qkv.bias [2304] True
+encoder.layers.8.attn.out.weight [768, 768] True
+encoder.layers.8.attn.out.bias [768] True
+encoder.layers.8.mlp_norm.weight [768] True
+encoder.layers.8.mlp_norm.bias [768] True
+encoder.layers.8.mlp.fc1.weight [768, 3072] True
+encoder.layers.8.mlp.fc1.bias [3072] True
+encoder.layers.8.mlp.fc2.weight [3072, 768] True
+encoder.layers.8.mlp.fc2.bias [768] True
+encoder.layers.9.attn_norm.weight [768] True
+encoder.layers.9.attn_norm.bias [768] True
+encoder.layers.9.attn.qkv.weight [768, 2304] True
+encoder.layers.9.attn.qkv.bias [2304] True
+encoder.layers.9.attn.out.weight [768, 768] True
+encoder.layers.9.attn.out.bias [768] True
+encoder.layers.9.mlp_norm.weight [768] True
+encoder.layers.9.mlp_norm.bias [768] True
+encoder.layers.9.mlp.fc1.weight [768, 3072] True
+encoder.layers.9.mlp.fc1.bias [3072] True
+encoder.layers.9.mlp.fc2.weight [3072, 768] True
+encoder.layers.9.mlp.fc2.bias [768] True
+encoder.layers.10.attn_norm.weight [768] True
+encoder.layers.10.attn_norm.bias [768] True
+encoder.layers.10.attn.qkv.weight [768, 2304] True
+encoder.layers.10.attn.qkv.bias [2304] True
+encoder.layers.10.attn.out.weight [768, 768] True
+encoder.layers.10.attn.out.bias [768] True
+encoder.layers.10.mlp_norm.weight [768] True
+encoder.layers.10.mlp_norm.bias [768] True
+encoder.layers.10.mlp.fc1.weight [768, 3072] True
+encoder.layers.10.mlp.fc1.bias [3072] True
+encoder.layers.10.mlp.fc2.weight [3072, 768] True
+encoder.layers.10.mlp.fc2.bias [768] True
+encoder.layers.11.attn_norm.weight [768] True
+encoder.layers.11.attn_norm.bias [768] True
+encoder.layers.11.attn.qkv.weight [768, 2304] True
+encoder.layers.11.attn.qkv.bias [2304] True
+encoder.layers.11.attn.out.weight [768, 768] True
+encoder.layers.11.attn.out.bias [768] True
+encoder.layers.11.mlp_norm.weight [768] True
+encoder.layers.11.mlp_norm.bias [768] True
+encoder.layers.11.mlp.fc1.weight [768, 3072] True
+encoder.layers.11.mlp.fc1.bias [3072] True
+encoder.layers.11.mlp.fc2.weight [3072, 768] True
+encoder.layers.11.mlp.fc2.bias [768] True
+encoder.norm.weight [768] True
+encoder.norm.bias [768] True
+classifier.0.weight [768] False
+classifier.0.bias [768] False
+classifier.0._mean [768] False
+classifier.0._variance [768] False
+classifier.1.weight [768, 1000] False
+classifier.1.bias [1000] False
+server not ready, wait 3 sec to retry...
+not ready endpoints:['127.0.0.1:56293', '127.0.0.1:32775', '127.0.0.1:34540', '127.0.0.1:39983', '127.0.0.1:56764', '127.0.0.1:16222']
+I0323 17:11:19.512193 6911 nccl_context.cc:82] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 0
+I0323 17:11:22.297502 6911 nccl_context.cc:114] init nccl context nranks: 8 local rank: 0 gpu id: 0 ring id: 10
+2022-03-23 17:11:23,813-INFO: [topology.py:169:__init__] HybridParallelInfo: rank_id: 0, mp_degree: 1, sharding_degree: 1, pp_degree: 1, dp_degree: 8, mp_group: [0], sharding_group: [0], pp_group: [0], dp_group: [0, 1, 2, 3, 4, 5, 6, 7], check/clip group: [0]
+2022-03-23 17:11:23,815 MASTER_LOG ----- Total # of train batch (single gpu): 312
+2022-03-23 17:11:23,816 MASTER_LOG ----- Total # of val batch (single gpu): 13
+2022-03-23 17:11:23,816 MASTER_LOG Base lr is scaled to: 6.4
+2022-03-23 17:11:24,992 MASTER_LOG ----- Pretrained: Load model state from ./mae_pretrain_vit_base.pdparams
+2022-03-23 17:11:25,028 MASTER_LOG ----- Start training from epoch 1.
+2022-03-23 17:11:25,028 MASTER_LOG Train epoch 1. LR=6.400000e-01
+2022-03-23 17:11:33,357 MASTER_LOG Epoch[001/090], Step[0000/0312], Lr: 0.000000e+00, Loss: 6.9446 (6.9446), Avg Acc: 0.0007
+2022-03-23 17:12:20,673 MASTER_LOG Epoch[001/090], Step[0020/0312], Lr: 4.102564e-02, Loss: 6.8917 (6.9327), Avg Acc: 0.0012
+2022-03-23 17:13:07,815 MASTER_LOG Epoch[001/090], Step[0040/0312], Lr: 8.205128e-02, Loss: 6.6096 (6.8621), Avg Acc: 0.0055
+2022-03-23 17:13:56,638 MASTER_LOG Epoch[001/090], Step[0060/0312], Lr: 1.230769e-01, Loss: 6.0556 (6.7020), Avg Acc: 0.0276
+2022-03-23 17:14:44,062 MASTER_LOG Epoch[001/090], Step[0080/0312], Lr: 1.641026e-01, Loss: 5.3570 (6.4639), Avg Acc: 0.0569
+2022-03-23 17:15:31,357 MASTER_LOG Epoch[001/090], Step[0100/0312], Lr: 2.051282e-01, Loss: 4.7307 (6.1849), Avg Acc: 0.0838
+2022-03-23 17:16:18,717 MASTER_LOG Epoch[001/090], Step[0120/0312], Lr: 2.461538e-01, Loss: 4.2049 (5.9027), Avg Acc: 0.1108
+2022-03-23 17:17:07,088 MASTER_LOG Epoch[001/090], Step[0140/0312], Lr: 2.871795e-01, Loss: 3.8519 (5.6359), Avg Acc: 0.1372
+2022-03-23 17:17:52,834 MASTER_LOG Epoch[001/090], Step[0160/0312], Lr: 3.282051e-01, Loss: 3.5104 (5.3910), Avg Acc: 0.1618
+2022-03-23 17:18:40,421 MASTER_LOG Epoch[001/090], Step[0180/0312], Lr: 3.692308e-01, Loss: 3.2756 (5.1712), Avg Acc: 0.1848
+2022-03-23 17:19:27,238 MASTER_LOG Epoch[001/090], Step[0200/0312], Lr: 4.102564e-01, Loss: 3.1019 (4.9761), Avg Acc: 0.2053
+2022-03-23 17:20:15,138 MASTER_LOG Epoch[001/090], Step[0220/0312], Lr: 4.512821e-01, Loss: 2.9507 (4.8022), Avg Acc: 0.2238
+2022-03-23 17:21:03,445 MASTER_LOG Epoch[001/090], Step[0240/0312], Lr: 4.923077e-01, Loss: 2.8121 (4.6464), Avg Acc: 0.2408
+2022-03-23 17:21:51,046 MASTER_LOG Epoch[001/090], Step[0260/0312], Lr: 5.333333e-01, Loss: 2.8340 (4.5066), Avg Acc: 0.2561
+2022-03-23 17:22:38,265 MASTER_LOG Epoch[001/090], Step[0280/0312], Lr: 5.743590e-01, Loss: 2.7239 (4.3801), Avg Acc: 0.2704
+2022-03-23 17:23:25,086 MASTER_LOG Epoch[001/090], Step[0300/0312], Lr: 6.153846e-01, Loss: 2.6461 (4.2663), Avg Acc: 0.2833
+2022-03-23 17:23:52,120 MASTER_LOG Epoch[001/090], Step[0311/0312], Lr: 6.317949e-01, Loss: 2.5715 (4.2076), Avg Acc: 0.2901
+2022-03-23 17:23:53,861 MASTER_LOG ----- Epoch[001/090], Lr: 6.317949e-01, time: 748.83Train Loss: 4.2076, Train Acc: 0.2901
+2022-03-23 17:23:53,861 MASTER_LOG ----- Validation after Epoch: 1
+2022-03-23 17:24:03,513 MASTER_LOG Step[0000/0013], Avg Loss: 1.7718, Avg Acc@1: 0.6011, Avg Acc@5: 0.8276
+2022-03-23 17:24:38,941 MASTER_LOG ----- Epoch[001/090], Validation Loss: 2.3240, Validation Acc@1: 0.5098, Validation Acc@5: 0.7485, time: 45.08
+2022-03-23 17:24:38,941 MASTER_LOG Train epoch 2. LR=6.317949e-01
+2022-03-23 17:24:46,390 MASTER_LOG Epoch[002/090], Step[0000/0312], Lr: 6.400000e-01, Loss: 2.6023 (2.6023), Avg Acc: 0.4768
+2022-03-23 17:25:35,702 MASTER_LOG Epoch[002/090], Step[0020/0312], Lr: 6.810256e-01, Loss: 2.5761 (2.5589), Avg Acc: 0.4805
+2022-03-23 17:26:23,060 MASTER_LOG Epoch[002/090], Step[0040/0312], Lr: 7.220513e-01, Loss: 2.4909 (2.5406), Avg Acc: 0.4835
+2022-03-23 17:27:11,715 MASTER_LOG Epoch[002/090], Step[0060/0312], Lr: 7.630769e-01, Loss: 2.4347 (2.5125), Avg Acc: 0.4874
+2022-03-23 17:27:58,314 MASTER_LOG Epoch[002/090], Step[0080/0312], Lr: 8.041026e-01, Loss: 2.4801 (2.4908), Avg Acc: 0.4906
+2022-03-23 17:28:47,354 MASTER_LOG Epoch[002/090], Step[0100/0312], Lr: 8.451282e-01, Loss: 2.3328 (2.4697), Avg Acc: 0.4936
+2022-03-23 17:29:33,551 MASTER_LOG Epoch[002/090], Step[0120/0312], Lr: 8.861538e-01, Loss: 2.3951 (2.4528), Avg Acc: 0.4959
+2022-03-23 17:30:20,979 MASTER_LOG Epoch[002/090], Step[0140/0312], Lr: 9.271795e-01, Loss: 2.3484 (2.4384), Avg Acc: 0.4980
+2022-03-23 17:31:08,425 MASTER_LOG Epoch[002/090], Step[0160/0312], Lr: 9.682051e-01, Loss: 2.3388 (2.4240), Avg Acc: 0.4999
+2022-03-23 17:31:55,103 MASTER_LOG Epoch[002/090], Step[0180/0312], Lr: 1.009231e+00, Loss: 2.2949 (2.4089), Avg Acc: 0.5016
+2022-03-23 17:32:42,267 MASTER_LOG Epoch[002/090], Step[0200/0312], Lr: 1.050256e+00, Loss: 2.2508 (2.3963), Avg Acc: 0.5033
+2022-03-23 17:33:29,309 MASTER_LOG Epoch[002/090], Step[0220/0312], Lr: 1.091282e+00, Loss: 2.2273 (2.3832), Avg Acc: 0.5050
+2022-03-23 17:34:16,509 MASTER_LOG Epoch[002/090], Step[0240/0312], Lr: 1.132308e+00, Loss: 2.2667 (2.3711), Avg Acc: 0.5066
+2022-03-23 17:35:03,547 MASTER_LOG Epoch[002/090], Step[0260/0312], Lr: 1.173333e+00, Loss: 2.1516 (2.3599), Avg Acc: 0.5081
+2022-03-23 17:35:50,751 MASTER_LOG Epoch[002/090], Step[0280/0312], Lr: 1.214359e+00, Loss: 2.1651 (2.3495), Avg Acc: 0.5097
+2022-03-23 17:36:38,494 MASTER_LOG Epoch[002/090], Step[0300/0312], Lr: 1.255385e+00, Loss: 2.1509 (2.3386), Avg Acc: 0.5112
+2022-03-23 17:37:02,761 MASTER_LOG Epoch[002/090], Step[0311/0312], Lr: 1.271795e+00, Loss: 2.1339 (2.3334), Avg Acc: 0.5119
+2022-03-23 17:37:04,586 MASTER_LOG ----- Epoch[002/090], Lr: 1.271795e+00, time: 744.91Train Loss: 2.3334, Train Acc: 0.5119
+2022-03-23 17:37:04,586 MASTER_LOG ----- Validation after Epoch: 2
+2022-03-23 17:37:14,165 MASTER_LOG Step[0000/0013], Avg Loss: 1.4541, Avg Acc@1: 0.6462, Avg Acc@5: 0.8638
+2022-03-23 17:37:49,147 MASTER_LOG ----- Epoch[002/090], Validation Loss: 1.8723, Validation Acc@1: 0.5792, Validation Acc@5: 0.8077, time: 44.56
+2022-03-23 17:37:49,147 MASTER_LOG Train epoch 3. LR=1.271795e+00
+2022-03-23 17:37:56,819 MASTER_LOG Epoch[003/090], Step[0000/0312], Lr: 1.280000e+00, Loss: 2.1946 (2.1946), Avg Acc: 0.5337
+2022-03-23 17:38:46,531 MASTER_LOG Epoch[003/090], Step[0020/0312], Lr: 1.321026e+00, Loss: 2.1591 (2.1450), Avg Acc: 0.5418
+2022-03-23 17:39:33,529 MASTER_LOG Epoch[003/090], Step[0040/0312], Lr: 1.362051e+00, Loss: 2.1572 (2.1336), Avg Acc: 0.5431
+2022-03-23 17:40:22,091 MASTER_LOG Epoch[003/090], Step[0060/0312], Lr: 1.403077e+00, Loss: 2.1495 (2.1311), Avg Acc: 0.5427
+2022-03-23 17:41:09,779 MASTER_LOG Epoch[003/090], Step[0080/0312], Lr: 1.444103e+00, Loss: 2.1288 (2.1306), Avg Acc: 0.5425
+2022-03-23 17:41:57,564 MASTER_LOG Epoch[003/090], Step[0100/0312], Lr: 1.485128e+00, Loss: 2.1564 (2.1282), Avg Acc: 0.5426
+2022-03-23 17:42:46,372 MASTER_LOG Epoch[003/090], Step[0120/0312], Lr: 1.526154e+00, Loss: 2.0892 (2.1246), Avg Acc: 0.5435
+2022-03-23 17:43:33,219 MASTER_LOG Epoch[003/090], Step[0140/0312], Lr: 1.567179e+00, Loss: 2.0518 (2.1210), Avg Acc: 0.5442
+2022-03-23 17:44:19,415 MASTER_LOG Epoch[003/090], Step[0160/0312], Lr: 1.608205e+00, Loss: 2.0591 (2.1162), Avg Acc: 0.5449
+2022-03-23 17:45:06,776 MASTER_LOG Epoch[003/090], Step[0180/0312], Lr: 1.649231e+00, Loss: 2.0602 (2.1116), Avg Acc: 0.5457
+2022-03-23 17:45:55,760 MASTER_LOG Epoch[003/090], Step[0200/0312], Lr: 1.690256e+00, Loss: 2.1306 (2.1082), Avg Acc: 0.5461
+2022-03-23 17:46:44,146 MASTER_LOG Epoch[003/090], Step[0220/0312], Lr: 1.731282e+00, Loss: 2.0988 (2.1037), Avg Acc: 0.5465
+2022-03-23 17:47:31,329 MASTER_LOG Epoch[003/090], Step[0240/0312], Lr: 1.772308e+00, Loss: 2.0522 (2.1021), Avg Acc: 0.5466
+2022-03-23 17:48:18,243 MASTER_LOG Epoch[003/090], Step[0260/0312], Lr: 1.813333e+00, Loss: 2.0880 (2.0995), Avg Acc: 0.5470
+2022-03-23 17:49:06,664 MASTER_LOG Epoch[003/090], Step[0280/0312], Lr: 1.854359e+00, Loss: 2.0344 (2.0961), Avg Acc: 0.5474
+2022-03-23 17:49:55,323 MASTER_LOG Epoch[003/090], Step[0300/0312], Lr: 1.895385e+00, Loss: 2.0421 (2.0943), Avg Acc: 0.5477
+2022-03-23 17:50:19,965 MASTER_LOG Epoch[003/090], Step[0311/0312], Lr: 1.911795e+00, Loss: 2.1169 (2.0920), Avg Acc: 0.5479
+2022-03-23 17:50:21,824 MASTER_LOG ----- Epoch[003/090], Lr: 1.911795e+00, time: 752.63Train Loss: 2.0920, Train Acc: 0.5479
+2022-03-23 17:50:21,824 MASTER_LOG ----- Validation after Epoch: 3
+2022-03-23 17:50:31,269 MASTER_LOG Step[0000/0013], Avg Loss: 1.3419, Avg Acc@1: 0.6663, Avg Acc@5: 0.8821
+2022-03-23 17:51:05,538 MASTER_LOG ----- Epoch[003/090], Validation Loss: 1.7411, Validation Acc@1: 0.5952, Validation Acc@5: 0.8243, time: 43.71
+2022-03-23 17:51:05,538 MASTER_LOG Train epoch 4. LR=1.911795e+00
+2022-03-23 17:51:13,044 MASTER_LOG Epoch[004/090], Step[0000/0312], Lr: 1.920000e+00, Loss: 2.0631 (2.0631), Avg Acc: 0.5610
+2022-03-23 17:52:02,762 MASTER_LOG Epoch[004/090], Step[0020/0312], Lr: 1.961026e+00, Loss: 1.9861 (2.0117), Avg Acc: 0.5606
+2022-03-23 17:52:50,620 MASTER_LOG Epoch[004/090], Step[0040/0312], Lr: 2.002051e+00, Loss: 2.0450 (2.0117), Avg Acc: 0.5607
+2022-03-23 17:53:36,868 MASTER_LOG Epoch[004/090], Step[0060/0312], Lr: 2.043077e+00, Loss: 2.0140 (2.0095), Avg Acc: 0.5615
+2022-03-23 17:54:23,936 MASTER_LOG Epoch[004/090], Step[0080/0312], Lr: 2.084103e+00, Loss: 2.0372 (2.0074), Avg Acc: 0.5618
+2022-03-23 17:55:12,399 MASTER_LOG Epoch[004/090], Step[0100/0312], Lr: 2.125128e+00, Loss: 1.9818 (2.0083), Avg Acc: 0.5613
+2022-03-23 17:55:58,946 MASTER_LOG Epoch[004/090], Step[0120/0312], Lr: 2.166154e+00, Loss: 1.9951 (2.0078), Avg Acc: 0.5612
+2022-03-23 17:56:46,416 MASTER_LOG Epoch[004/090], Step[0140/0312], Lr: 2.207179e+00, Loss: 2.0044 (2.0089), Avg Acc: 0.5608
+2022-03-23 17:57:33,428 MASTER_LOG Epoch[004/090], Step[0160/0312], Lr: 2.248205e+00, Loss: 2.0099 (2.0087), Avg Acc: 0.5608
+2022-03-23 17:58:20,428 MASTER_LOG Epoch[004/090], Step[0180/0312], Lr: 2.289231e+00, Loss: 2.0586 (2.0084), Avg Acc: 0.5606
+2022-03-23 17:59:07,997 MASTER_LOG Epoch[004/090], Step[0200/0312], Lr: 2.330256e+00, Loss: 2.0418 (2.0088), Avg Acc: 0.5607
+2022-03-23 17:59:55,232 MASTER_LOG Epoch[004/090], Step[0220/0312], Lr: 2.371282e+00, Loss: 1.9856 (2.0089), Avg Acc: 0.5605
+2022-03-23 18:00:42,545 MASTER_LOG Epoch[004/090], Step[0240/0312], Lr: 2.412308e+00, Loss: 1.9600 (2.0096), Avg Acc: 0.5602
+2022-03-23 18:01:29,816 MASTER_LOG Epoch[004/090], Step[0260/0312], Lr: 2.453333e+00, Loss: 2.0218 (2.0077), Avg Acc: 0.5604
+2022-03-23 18:02:16,909 MASTER_LOG Epoch[004/090], Step[0280/0312], Lr: 2.494359e+00, Loss: 1.9706 (2.0078), Avg Acc: 0.5604
+2022-03-23 18:03:03,939 MASTER_LOG Epoch[004/090], Step[0300/0312], Lr: 2.535385e+00, Loss: 2.0153 (2.0068), Avg Acc: 0.5605
+2022-03-23 18:03:28,744 MASTER_LOG Epoch[004/090], Step[0311/0312], Lr: 2.551795e+00, Loss: 2.0599 (2.0073), Avg Acc: 0.5604
+2022-03-23 18:03:30,373 MASTER_LOG ----- Epoch[004/090], Lr: 2.551795e+00, time: 744.83Train Loss: 2.0073, Train Acc: 0.5604
+2022-03-23 18:03:30,373 MASTER_LOG ----- Validation after Epoch: 4
+2022-03-23 18:03:40,089 MASTER_LOG Step[0000/0013], Avg Loss: 1.3449, Avg Acc@1: 0.6704, Avg Acc@5: 0.8762
+2022-03-23 18:04:14,236 MASTER_LOG ----- Epoch[004/090], Validation Loss: 1.6943, Validation Acc@1: 0.6047, Validation Acc@5: 0.8297, time: 43.86
+2022-03-23 18:04:14,236 MASTER_LOG Train epoch 5. LR=2.551795e+00
+2022-03-23 18:04:22,001 MASTER_LOG Epoch[005/090], Step[0000/0312], Lr: 2.560000e+00, Loss: 1.9499 (1.9499), Avg Acc: 0.5603
+2022-03-23 18:05:10,757 MASTER_LOG Epoch[005/090], Step[0020/0312], Lr: 2.601026e+00, Loss: 1.9676 (1.9491), Avg Acc: 0.5711
+2022-03-23 18:05:58,470 MASTER_LOG Epoch[005/090], Step[0040/0312], Lr: 2.642051e+00, Loss: 1.9360 (1.9537), Avg Acc: 0.5702
+2022-03-23 18:06:46,148 MASTER_LOG Epoch[005/090], Step[0060/0312], Lr: 2.683077e+00, Loss: 1.9794 (1.9533), Avg Acc: 0.5703
+2022-03-23 18:07:34,693 MASTER_LOG Epoch[005/090], Step[0080/0312], Lr: 2.724103e+00, Loss: 2.0194 (1.9603), Avg Acc: 0.5692
+2022-03-23 18:08:22,115 MASTER_LOG Epoch[005/090], Step[0100/0312], Lr: 2.765128e+00, Loss: 1.9801 (1.9642), Avg Acc: 0.5684
+2022-03-23 18:09:10,254 MASTER_LOG Epoch[005/090], Step[0120/0312], Lr: 2.806154e+00, Loss: 2.0328 (1.9661), Avg Acc: 0.5678
+2022-03-23 18:09:57,442 MASTER_LOG Epoch[005/090], Step[0140/0312], Lr: 2.847179e+00, Loss: 1.9159 (1.9655), Avg Acc: 0.5679
+2022-03-23 18:10:44,426 MASTER_LOG Epoch[005/090], Step[0160/0312], Lr: 2.888205e+00, Loss: 1.9509 (1.9667), Avg Acc: 0.5675
+2022-03-23 18:11:30,605 MASTER_LOG Epoch[005/090], Step[0180/0312], Lr: 2.929231e+00, Loss: 2.0758 (1.9692), Avg Acc: 0.5670
+2022-03-23 18:12:18,510 MASTER_LOG Epoch[005/090], Step[0200/0312], Lr: 2.970256e+00, Loss: 1.9531 (1.9691), Avg Acc: 0.5668
+2022-03-23 18:13:05,109 MASTER_LOG Epoch[005/090], Step[0220/0312], Lr: 3.011282e+00, Loss: 1.9572 (1.9695), Avg Acc: 0.5667
+2022-03-23 18:13:51,330 MASTER_LOG Epoch[005/090], Step[0240/0312], Lr: 3.052308e+00, Loss: 1.9586 (1.9719), Avg Acc: 0.5662
+2022-03-23 18:14:37,712 MASTER_LOG Epoch[005/090], Step[0260/0312], Lr: 3.093333e+00, Loss: 1.9429 (1.9727), Avg Acc: 0.5660
+2022-03-23 18:15:24,336 MASTER_LOG Epoch[005/090], Step[0280/0312], Lr: 3.134359e+00, Loss: 2.0274 (1.9741), Avg Acc: 0.5658
+2022-03-23 18:16:11,806 MASTER_LOG Epoch[005/090], Step[0300/0312], Lr: 3.175385e+00, Loss: 1.9580 (1.9742), Avg Acc: 0.5658
+2022-03-23 18:16:36,199 MASTER_LOG Epoch[005/090], Step[0311/0312], Lr: 3.191795e+00, Loss: 2.0199 (1.9751), Avg Acc: 0.5656
+2022-03-23 18:16:38,181 MASTER_LOG ----- Epoch[005/090], Lr: 3.191795e+00, time: 742.88Train Loss: 1.9751, Train Acc: 0.5656
+2022-03-23 18:16:38,181 MASTER_LOG ----- Validation after Epoch: 5
+2022-03-23 18:16:47,921 MASTER_LOG Step[0000/0013], Avg Loss: 1.3404, Avg Acc@1: 0.6614, Avg Acc@5: 0.8789
+2022-03-23 18:17:21,776 MASTER_LOG ----- Epoch[005/090], Validation Loss: 1.6810, Validation Acc@1: 0.6075, Validation Acc@5: 0.8324, time: 43.59
+2022-03-23 18:17:21,776 MASTER_LOG Train epoch 6. LR=3.191795e+00
+2022-03-23 18:17:28,904 MASTER_LOG Epoch[006/090], Step[0000/0312], Lr: 3.200000e+00, Loss: 2.0079 (2.0079), Avg Acc: 0.5610
+2022-03-23 18:18:17,204 MASTER_LOG Epoch[006/090], Step[0020/0312], Lr: 3.241026e+00, Loss: 1.8793 (1.9536), Avg Acc: 0.5699
+2022-03-23 18:19:05,063 MASTER_LOG Epoch[006/090], Step[0040/0312], Lr: 3.282051e+00, Loss: 1.9792 (1.9587), Avg Acc: 0.5686
+2022-03-23 18:19:54,814 MASTER_LOG Epoch[006/090], Step[0060/0312], Lr: 3.323077e+00, Loss: 1.9389 (1.9552), Avg Acc: 0.5688
+2022-03-23 18:20:43,388 MASTER_LOG Epoch[006/090], Step[0080/0312], Lr: 3.364103e+00, Loss: 1.9704 (1.9550), Avg Acc: 0.5692
+2022-03-23 18:21:31,471 MASTER_LOG Epoch[006/090], Step[0100/0312], Lr: 3.405128e+00, Loss: 1.9553 (1.9595), Avg Acc: 0.5681
+2022-03-23 18:22:19,837 MASTER_LOG Epoch[006/090], Step[0120/0312], Lr: 3.446154e+00, Loss: 2.0309 (1.9606), Avg Acc: 0.5682
+2022-03-23 18:23:08,896 MASTER_LOG Epoch[006/090], Step[0140/0312], Lr: 3.487179e+00, Loss: 1.8885 (1.9628), Avg Acc: 0.5678
+2022-03-23 18:23:57,965 MASTER_LOG Epoch[006/090], Step[0160/0312], Lr: 3.528205e+00, Loss: 1.9828 (1.9651), Avg Acc: 0.5676
+2022-03-23 18:24:46,842 MASTER_LOG Epoch[006/090], Step[0180/0312], Lr: 3.569231e+00, Loss: 1.9718 (1.9668), Avg Acc: 0.5674
+2022-03-23 18:25:37,370 MASTER_LOG Epoch[006/090], Step[0200/0312], Lr: 3.610256e+00, Loss: 1.9931 (1.9692), Avg Acc: 0.5671
+2022-03-23 18:26:26,272 MASTER_LOG Epoch[006/090], Step[0220/0312], Lr: 3.651282e+00, Loss: 1.9926 (1.9708), Avg Acc: 0.5667
+2022-03-23 18:27:14,380 MASTER_LOG Epoch[006/090], Step[0240/0312], Lr: 3.692308e+00, Loss: 1.9778 (1.9703), Avg Acc: 0.5667
+2022-03-23 18:28:02,335 MASTER_LOG Epoch[006/090], Step[0260/0312], Lr: 3.733333e+00, Loss: 1.9698 (1.9701), Avg Acc: 0.5669
+2022-03-23 18:28:49,586 MASTER_LOG Epoch[006/090], Step[0280/0312], Lr: 3.774359e+00, Loss: 2.0005 (1.9708), Avg Acc: 0.5668
+2022-03-23 18:29:37,070 MASTER_LOG Epoch[006/090], Step[0300/0312], Lr: 3.815385e+00, Loss: 1.9354 (1.9712), Avg Acc: 0.5668
+2022-03-23 18:30:05,555 MASTER_LOG Epoch[006/090], Step[0311/0312], Lr: 3.831795e+00, Loss: 1.9675 (1.9715), Avg Acc: 0.5667
+2022-03-23 18:30:08,052 MASTER_LOG ----- Epoch[006/090], Lr: 3.831795e+00, time: 766.27Train Loss: 1.9715, Train Acc: 0.5667
+2022-03-23 18:30:08,053 MASTER_LOG ----- Validation after Epoch: 6
+2022-03-23 18:30:17,520 MASTER_LOG Step[0000/0013], Avg Loss: 1.3391, Avg Acc@1: 0.6692, Avg Acc@5: 0.8872
+2022-03-23 18:30:52,336 MASTER_LOG ----- Epoch[006/090], Validation Loss: 1.6683, Validation Acc@1: 0.6104, Validation Acc@5: 0.8349, time: 44.28
+2022-03-23 18:30:52,337 MASTER_LOG Train epoch 7. LR=3.831795e+00
+2022-03-23 18:30:59,768 MASTER_LOG Epoch[007/090], Step[0000/0312], Lr: 3.840000e+00, Loss: 1.9089 (1.9089), Avg Acc: 0.5718
+2022-03-23 18:31:48,528 MASTER_LOG Epoch[007/090], Step[0020/0312], Lr: 3.881026e+00, Loss: 1.8937 (1.9345), Avg Acc: 0.5740
+2022-03-23 18:32:36,396 MASTER_LOG Epoch[007/090], Step[0040/0312], Lr: 3.922051e+00, Loss: 1.9312 (1.9355), Avg Acc: 0.5732
+2022-03-23 18:33:24,162 MASTER_LOG Epoch[007/090], Step[0060/0312], Lr: 3.963077e+00, Loss: 1.8898 (1.9373), Avg Acc: 0.5734
+2022-03-23 18:34:09,914 MASTER_LOG Epoch[007/090], Step[0080/0312], Lr: 4.004103e+00, Loss: 1.9330 (1.9451), Avg Acc: 0.5721
+2022-03-23 18:34:57,262 MASTER_LOG Epoch[007/090], Step[0100/0312], Lr: 4.045128e+00, Loss: 1.9476 (1.9478), Avg Acc: 0.5713
+2022-03-23 18:35:44,508 MASTER_LOG Epoch[007/090], Step[0120/0312], Lr: 4.086154e+00, Loss: 2.0406 (1.9505), Avg Acc: 0.5715
+2022-03-23 18:36:33,007 MASTER_LOG Epoch[007/090], Step[0140/0312], Lr: 4.127179e+00, Loss: 1.9690 (1.9527), Avg Acc: 0.5712
+2022-03-23 18:37:19,871 MASTER_LOG Epoch[007/090], Step[0160/0312], Lr: 4.168205e+00, Loss: 1.9940 (1.9568), Avg Acc: 0.5708
+2022-03-23 18:38:07,252 MASTER_LOG Epoch[007/090], Step[0180/0312], Lr: 4.209231e+00, Loss: 2.0022 (1.9591), Avg Acc: 0.5704
+2022-03-23 18:38:53,683 MASTER_LOG Epoch[007/090], Step[0200/0312], Lr: 4.250256e+00, Loss: 1.9668 (1.9613), Avg Acc: 0.5702
+2022-03-23 18:39:41,476 MASTER_LOG Epoch[007/090], Step[0220/0312], Lr: 4.291282e+00, Loss: 1.9307 (1.9622), Avg Acc: 0.5700
+2022-03-23 18:40:28,089 MASTER_LOG Epoch[007/090], Step[0240/0312], Lr: 4.332308e+00, Loss: 1.9802 (1.9629), Avg Acc: 0.5700
+2022-03-23 18:41:15,803 MASTER_LOG Epoch[007/090], Step[0260/0312], Lr: 4.373333e+00, Loss: 1.9879 (1.9639), Avg Acc: 0.5699
+2022-03-23 18:42:02,967 MASTER_LOG Epoch[007/090], Step[0280/0312], Lr: 4.414359e+00, Loss: 1.9997 (1.9647), Avg Acc: 0.5698
+2022-03-23 18:42:50,373 MASTER_LOG Epoch[007/090], Step[0300/0312], Lr: 4.455385e+00, Loss: 2.0135 (1.9663), Avg Acc: 0.5695
+2022-03-23 18:43:17,468 MASTER_LOG Epoch[007/090], Step[0311/0312], Lr: 4.471795e+00, Loss: 2.0479 (1.9670), Avg Acc: 0.5694
+2022-03-23 18:43:18,985 MASTER_LOG ----- Epoch[007/090], Lr: 4.471795e+00, time: 746.64Train Loss: 1.9670, Train Acc: 0.5694
+2022-03-23 18:43:18,985 MASTER_LOG ----- Validation after Epoch: 7
+2022-03-23 18:43:28,981 MASTER_LOG Step[0000/0013], Avg Loss: 1.3498, Avg Acc@1: 0.6665, Avg Acc@5: 0.8889
+2022-03-23 18:44:03,968 MASTER_LOG ----- Epoch[007/090], Validation Loss: 1.6847, Validation Acc@1: 0.6066, Validation Acc@5: 0.8333, time: 44.98
+2022-03-23 18:44:03,968 MASTER_LOG Train epoch 8. LR=4.471795e+00
+2022-03-23 18:44:11,385 MASTER_LOG Epoch[008/090], Step[0000/0312], Lr: 4.480000e+00, Loss: 1.9220 (1.9220), Avg Acc: 0.5820
+2022-03-23 18:44:59,945 MASTER_LOG Epoch[008/090], Step[0020/0312], Lr: 4.521026e+00, Loss: 1.9748 (1.9283), Avg Acc: 0.5739
+2022-03-23 18:45:48,443 MASTER_LOG Epoch[008/090], Step[0040/0312], Lr: 4.562051e+00, Loss: 1.9922 (1.9349), Avg Acc: 0.5727
+2022-03-23 18:46:35,541 MASTER_LOG Epoch[008/090], Step[0060/0312], Lr: 4.603077e+00, Loss: 1.9987 (1.9347), Avg Acc: 0.5731
+2022-03-23 18:47:23,778 MASTER_LOG Epoch[008/090], Step[0080/0312], Lr: 4.644103e+00, Loss: 1.9517 (1.9415), Avg Acc: 0.5728
+2022-03-23 18:48:11,672 MASTER_LOG Epoch[008/090], Step[0100/0312], Lr: 4.685128e+00, Loss: 2.1101 (1.9485), Avg Acc: 0.5719
+2022-03-23 18:48:59,875 MASTER_LOG Epoch[008/090], Step[0120/0312], Lr: 4.726154e+00, Loss: 1.9832 (1.9521), Avg Acc: 0.5718
+2022-03-23 18:49:48,702 MASTER_LOG Epoch[008/090], Step[0140/0312], Lr: 4.767179e+00, Loss: 2.0436 (1.9555), Avg Acc: 0.5712
+2022-03-23 18:50:37,095 MASTER_LOG Epoch[008/090], Step[0160/0312], Lr: 4.808205e+00, Loss: 1.8981 (1.9567), Avg Acc: 0.5712
+2022-03-23 18:51:24,224 MASTER_LOG Epoch[008/090], Step[0180/0312], Lr: 4.849231e+00, Loss: 1.9244 (1.9586), Avg Acc: 0.5710
+2022-03-23 18:52:12,147 MASTER_LOG Epoch[008/090], Step[0200/0312], Lr: 4.890256e+00, Loss: 1.9776 (1.9617), Avg Acc: 0.5706
+2022-03-23 18:52:59,311 MASTER_LOG Epoch[008/090], Step[0220/0312], Lr: 4.931282e+00, Loss: 1.9353 (1.9608), Avg Acc: 0.5708
+2022-03-23 18:53:46,599 MASTER_LOG Epoch[008/090], Step[0240/0312], Lr: 4.972308e+00, Loss: 1.9873 (1.9621), Avg Acc: 0.5705
+2022-03-23 18:54:34,561 MASTER_LOG Epoch[008/090], Step[0260/0312], Lr: 5.013333e+00, Loss: 2.0346 (1.9646), Avg Acc: 0.5702
+2022-03-23 18:55:22,271 MASTER_LOG Epoch[008/090], Step[0280/0312], Lr: 5.054359e+00, Loss: 1.9937 (1.9652), Avg Acc: 0.5700
+2022-03-23 18:56:10,719 MASTER_LOG Epoch[008/090], Step[0300/0312], Lr: 5.095385e+00, Loss: 2.0392 (1.9672), Avg Acc: 0.5698
+2022-03-23 18:56:38,115 MASTER_LOG Epoch[008/090], Step[0311/0312], Lr: 5.111795e+00, Loss: 1.8997 (1.9679), Avg Acc: 0.5697
+2022-03-23 18:56:41,882 MASTER_LOG ----- Epoch[008/090], Lr: 5.111795e+00, time: 757.85Train Loss: 1.9679, Train Acc: 0.5697
+2022-03-23 18:56:41,882 MASTER_LOG ----- Validation after Epoch: 8
+2022-03-23 18:56:51,498 MASTER_LOG Step[0000/0013], Avg Loss: 1.3126, Avg Acc@1: 0.6804, Avg Acc@5: 0.8872
+2022-03-23 18:57:27,581 MASTER_LOG ----- Epoch[008/090], Validation Loss: 1.6786, Validation Acc@1: 0.6125, Validation Acc@5: 0.8323, time: 45.70
+2022-03-23 18:57:27,581 MASTER_LOG Train epoch 9. LR=5.111795e+00
+2022-03-23 18:57:35,489 MASTER_LOG Epoch[009/090], Step[0000/0312], Lr: 5.120000e+00, Loss: 1.9738 (1.9738), Avg Acc: 0.5681
+2022-03-23 18:58:27,152 MASTER_LOG Epoch[009/090], Step[0020/0312], Lr: 5.161026e+00, Loss: 1.9118 (1.9394), Avg Acc: 0.5739
+2022-03-23 18:59:13,955 MASTER_LOG Epoch[009/090], Step[0040/0312], Lr: 5.202051e+00, Loss: 2.0332 (1.9442), Avg Acc: 0.5736
+2022-03-23 19:00:02,020 MASTER_LOG Epoch[009/090], Step[0060/0312], Lr: 5.243077e+00, Loss: 2.0133 (1.9440), Avg Acc: 0.5740
+2022-03-23 19:00:50,401 MASTER_LOG Epoch[009/090], Step[0080/0312], Lr: 5.284103e+00, Loss: 1.9586 (1.9533), Avg Acc: 0.5724
+2022-03-23 19:01:39,198 MASTER_LOG Epoch[009/090], Step[0100/0312], Lr: 5.325128e+00, Loss: 1.9244 (1.9562), Avg Acc: 0.5715
+2022-03-23 19:02:28,136 MASTER_LOG Epoch[009/090], Step[0120/0312], Lr: 5.366154e+00, Loss: 2.0224 (1.9611), Avg Acc: 0.5710
+2022-03-23 19:03:15,876 MASTER_LOG Epoch[009/090], Step[0140/0312], Lr: 5.407179e+00, Loss: 2.0974 (1.9666), Avg Acc: 0.5703
+2022-03-23 19:04:03,379 MASTER_LOG Epoch[009/090], Step[0160/0312], Lr: 5.448205e+00, Loss: 1.9680 (1.9666), Avg Acc: 0.5703
+2022-03-23 19:04:51,355 MASTER_LOG Epoch[009/090], Step[0180/0312], Lr: 5.489231e+00, Loss: 2.0121 (1.9676), Avg Acc: 0.5698
+2022-03-23 19:05:39,381 MASTER_LOG Epoch[009/090], Step[0200/0312], Lr: 5.530256e+00, Loss: 1.9885 (1.9704), Avg Acc: 0.5697
+2022-03-23 19:06:28,503 MASTER_LOG Epoch[009/090], Step[0220/0312], Lr: 5.571282e+00, Loss: 1.9638 (1.9721), Avg Acc: 0.5694
+2022-03-23 19:07:16,995 MASTER_LOG Epoch[009/090], Step[0240/0312], Lr: 5.612308e+00, Loss: 1.9483 (1.9715), Avg Acc: 0.5696
+2022-03-23 19:08:05,370 MASTER_LOG Epoch[009/090], Step[0260/0312], Lr: 5.653333e+00, Loss: 2.0017 (1.9725), Avg Acc: 0.5696
+2022-03-23 19:08:53,746 MASTER_LOG Epoch[009/090], Step[0280/0312], Lr: 5.694359e+00, Loss: 1.9450 (1.9744), Avg Acc: 0.5694
+2022-03-23 19:09:41,127 MASTER_LOG Epoch[009/090], Step[0300/0312], Lr: 5.735385e+00, Loss: 1.9227 (1.9739), Avg Acc: 0.5696
+2022-03-23 19:10:05,862 MASTER_LOG Epoch[009/090], Step[0311/0312], Lr: 5.751795e+00, Loss: 2.0063 (1.9745), Avg Acc: 0.5696
+2022-03-23 19:10:08,932 MASTER_LOG ----- Epoch[009/090], Lr: 5.751795e+00, time: 761.35Train Loss: 1.9745, Train Acc: 0.5696
+2022-03-23 19:10:08,932 MASTER_LOG ----- Validation after Epoch: 9
+2022-03-23 19:10:18,322 MASTER_LOG Step[0000/0013], Avg Loss: 1.3230, Avg Acc@1: 0.6697, Avg Acc@5: 0.8926
+2022-03-23 19:10:54,339 MASTER_LOG ----- Epoch[009/090], Validation Loss: 1.6816, Validation Acc@1: 0.6104, Validation Acc@5: 0.8350, time: 45.40
+2022-03-23 19:10:54,339 MASTER_LOG Train epoch 10. LR=5.751795e+00
+2022-03-23 19:11:02,176 MASTER_LOG Epoch[010/090], Step[0000/0312], Lr: 5.760000e+00, Loss: 1.9811 (1.9811), Avg Acc: 0.5703
+2022-03-23 19:11:51,349 MASTER_LOG Epoch[010/090], Step[0020/0312], Lr: 5.801026e+00, Loss: 1.9588 (1.9405), Avg Acc: 0.5732
+2022-03-23 19:12:39,033 MASTER_LOG Epoch[010/090], Step[0040/0312], Lr: 5.842051e+00, Loss: 2.0036 (1.9439), Avg Acc: 0.5741
+2022-03-23 19:13:26,987 MASTER_LOG Epoch[010/090], Step[0060/0312], Lr: 5.883077e+00, Loss: 1.9953 (1.9462), Avg Acc: 0.5749
+2022-03-23 19:14:14,383 MASTER_LOG Epoch[010/090], Step[0080/0312], Lr: 5.924103e+00, Loss: 1.9743 (1.9489), Avg Acc: 0.5744
+2022-03-23 19:15:02,268 MASTER_LOG Epoch[010/090], Step[0100/0312], Lr: 5.965128e+00, Loss: 2.0248 (1.9540), Avg Acc: 0.5736
+2022-03-23 19:15:50,736 MASTER_LOG Epoch[010/090], Step[0120/0312], Lr: 6.006154e+00, Loss: 2.0233 (1.9591), Avg Acc: 0.5728
+2022-03-23 19:16:38,736 MASTER_LOG Epoch[010/090], Step[0140/0312], Lr: 6.047179e+00, Loss: 1.9522 (1.9606), Avg Acc: 0.5723
+2022-03-23 19:17:26,780 MASTER_LOG Epoch[010/090], Step[0160/0312], Lr: 6.088205e+00, Loss: 1.9826 (1.9628), Avg Acc: 0.5717
+2022-03-23 19:18:13,751 MASTER_LOG Epoch[010/090], Step[0180/0312], Lr: 6.129231e+00, Loss: 1.8861 (1.9641), Avg Acc: 0.5714
+2022-03-23 19:19:01,280 MASTER_LOG Epoch[010/090], Step[0200/0312], Lr: 6.170256e+00, Loss: 2.1194 (1.9655), Avg Acc: 0.5711
+2022-03-23 19:19:49,253 MASTER_LOG Epoch[010/090], Step[0220/0312], Lr: 6.211282e+00, Loss: 2.0323 (1.9679), Avg Acc: 0.5709
+2022-03-23 19:20:37,390 MASTER_LOG Epoch[010/090], Step[0240/0312], Lr: 6.252308e+00, Loss: 2.0445 (1.9692), Avg Acc: 0.5708
+2022-03-23 19:21:25,783 MASTER_LOG Epoch[010/090], Step[0260/0312], Lr: 6.293333e+00, Loss: 1.9631 (1.9703), Avg Acc: 0.5706
+2022-03-23 19:22:14,360 MASTER_LOG Epoch[010/090], Step[0280/0312], Lr: 6.334359e+00, Loss: 1.9837 (1.9719), Avg Acc: 0.5702
+2022-03-23 19:23:01,660 MASTER_LOG Epoch[010/090], Step[0300/0312], Lr: 6.375385e+00, Loss: 1.9544 (1.9737), Avg Acc: 0.5700
+2022-03-23 19:23:25,722 MASTER_LOG Epoch[010/090], Step[0311/0312], Lr: 6.391795e+00, Loss: 2.0441 (1.9748), Avg Acc: 0.5699
+2022-03-23 19:23:27,854 MASTER_LOG ----- Epoch[010/090], Lr: 6.391795e+00, time: 752.45Train Loss: 1.9748, Train Acc: 0.5699
+2022-03-23 19:23:27,854 MASTER_LOG ----- Validation after Epoch: 10
+2022-03-23 19:23:37,539 MASTER_LOG Step[0000/0013], Avg Loss: 1.3566, Avg Acc@1: 0.6753, Avg Acc@5: 0.8889
+2022-03-23 19:24:12,907 MASTER_LOG ----- Epoch[010/090], Validation Loss: 1.6874, Validation Acc@1: 0.6084, Validation Acc@5: 0.8349, time: 45.05
+2022-03-23 19:24:13,842 MASTER_LOG ----- Save model: ./output/linearprobe-20220323-17-11/LINEARPROBE-Epoch-10-Loss-1.6873605159378051.pdparams
+2022-03-23 19:24:13,842 MASTER_LOG Train epoch 11. LR=6.391795e+00
+2022-03-23 19:24:20,976 MASTER_LOG Epoch[011/090], Step[0000/0312], Lr: 6.400000e+00, Loss: 1.9267 (1.9267), Avg Acc: 0.5764
+2022-03-23 19:25:08,853 MASTER_LOG Epoch[011/090], Step[0020/0312], Lr: 6.399990e+00, Loss: 1.9046 (1.9351), Avg Acc: 0.5773
+2022-03-23 19:25:56,867 MASTER_LOG Epoch[011/090], Step[0040/0312], Lr: 6.399959e+00, Loss: 1.9576 (1.9370), Avg Acc: 0.5764
+2022-03-23 19:26:43,725 MASTER_LOG Epoch[011/090], Step[0060/0312], Lr: 6.399909e+00, Loss: 1.9133 (1.9371), Avg Acc: 0.5763
+2022-03-23 19:27:31,782 MASTER_LOG Epoch[011/090], Step[0080/0312], Lr: 6.399838e+00, Loss: 1.9721 (1.9435), Avg Acc: 0.5754
+2022-03-23 19:28:19,758 MASTER_LOG Epoch[011/090], Step[0100/0312], Lr: 6.399747e+00, Loss: 1.9133 (1.9431), Avg Acc: 0.5755
+2022-03-23 19:29:07,513 MASTER_LOG Epoch[011/090], Step[0120/0312], Lr: 6.399635e+00, Loss: 1.9686 (1.9452), Avg Acc: 0.5750
+2022-03-23 19:29:53,615 MASTER_LOG Epoch[011/090], Step[0140/0312], Lr: 6.399503e+00, Loss: 1.9385 (1.9446), Avg Acc: 0.5749
+2022-03-23 19:30:42,881 MASTER_LOG Epoch[011/090], Step[0160/0312], Lr: 6.399351e+00, Loss: 1.9348 (1.9454), Avg Acc: 0.5748
+2022-03-23 19:31:29,793 MASTER_LOG Epoch[011/090], Step[0180/0312], Lr: 6.399179e+00, Loss: 1.9315 (1.9463), Avg Acc: 0.5744
+2022-03-23 19:32:16,837 MASTER_LOG Epoch[011/090], Step[0200/0312], Lr: 6.398986e+00, Loss: 1.9348 (1.9478), Avg Acc: 0.5743
+2022-03-23 19:33:04,313 MASTER_LOG Epoch[011/090], Step[0220/0312], Lr: 6.398773e+00, Loss: 1.9564 (1.9496), Avg Acc: 0.5740
+2022-03-23 19:33:52,415 MASTER_LOG Epoch[011/090], Step[0240/0312], Lr: 6.398540e+00, Loss: 1.9416 (1.9496), Avg Acc: 0.5738
+2022-03-23 19:34:39,618 MASTER_LOG Epoch[011/090], Step[0260/0312], Lr: 6.398287e+00, Loss: 1.9005 (1.9498), Avg Acc: 0.5739
+2022-03-23 19:35:27,701 MASTER_LOG Epoch[011/090], Step[0280/0312], Lr: 6.398013e+00, Loss: 1.8958 (1.9496), Avg Acc: 0.5739
+2022-03-23 19:36:15,086 MASTER_LOG Epoch[011/090], Step[0300/0312], Lr: 6.397719e+00, Loss: 1.8763 (1.9488), Avg Acc: 0.5741
+2022-03-23 19:36:42,982 MASTER_LOG Epoch[011/090], Step[0311/0312], Lr: 6.397596e+00, Loss: 1.9948 (1.9486), Avg Acc: 0.5741
+2022-03-23 19:36:44,986 MASTER_LOG ----- Epoch[011/090], Lr: 6.397596e+00, time: 750.56Train Loss: 1.9486, Train Acc: 0.5741
+2022-03-23 19:36:44,986 MASTER_LOG ----- Validation after Epoch: 11
+2022-03-23 19:36:54,640 MASTER_LOG Step[0000/0013], Avg Loss: 1.2828, Avg Acc@1: 0.6833, Avg Acc@5: 0.8860
+2022-03-23 19:37:29,323 MASTER_LOG ----- Epoch[011/090], Validation Loss: 1.6456, Validation Acc@1: 0.6165, Validation Acc@5: 0.8388, time: 44.33
+2022-03-23 19:37:29,324 MASTER_LOG Train epoch 12. LR=6.397596e+00
+2022-03-23 19:37:37,268 MASTER_LOG Epoch[012/090], Step[0000/0312], Lr: 6.397533e+00, Loss: 1.8674 (1.8674), Avg Acc: 0.5964
+2022-03-23 19:38:27,243 MASTER_LOG Epoch[012/090], Step[0020/0312], Lr: 6.397207e+00, Loss: 1.9347 (1.8849), Avg Acc: 0.5854
+2022-03-23 19:39:15,060 MASTER_LOG Epoch[012/090], Step[0040/0312], Lr: 6.396860e+00, Loss: 1.8737 (1.8910), Avg Acc: 0.5832
+2022-03-23 19:40:02,443 MASTER_LOG Epoch[012/090], Step[0060/0312], Lr: 6.396493e+00, Loss: 1.9122 (1.8962), Avg Acc: 0.5830
+2022-03-23 19:40:51,822 MASTER_LOG Epoch[012/090], Step[0080/0312], Lr: 6.396106e+00, Loss: 1.9316 (1.8928), Avg Acc: 0.5835
+2022-03-23 19:41:40,699 MASTER_LOG Epoch[012/090], Step[0100/0312], Lr: 6.395698e+00, Loss: 1.8575 (1.8982), Avg Acc: 0.5827
+2022-03-23 19:42:28,459 MASTER_LOG Epoch[012/090], Step[0120/0312], Lr: 6.395271e+00, Loss: 1.8704 (1.9004), Avg Acc: 0.5824
+2022-03-23 19:43:17,128 MASTER_LOG Epoch[012/090], Step[0140/0312], Lr: 6.394823e+00, Loss: 1.9260 (1.9033), Avg Acc: 0.5820
+2022-03-23 19:44:04,466 MASTER_LOG Epoch[012/090], Step[0160/0312], Lr: 6.394355e+00, Loss: 1.9274 (1.9040), Avg Acc: 0.5818
+2022-03-23 19:44:51,942 MASTER_LOG Epoch[012/090], Step[0180/0312], Lr: 6.393866e+00, Loss: 1.9152 (1.9079), Avg Acc: 0.5811
+2022-03-23 19:45:39,109 MASTER_LOG Epoch[012/090], Step[0200/0312], Lr: 6.393358e+00, Loss: 1.8977 (1.9091), Avg Acc: 0.5808
+2022-03-23 19:46:25,603 MASTER_LOG Epoch[012/090], Step[0220/0312], Lr: 6.392829e+00, Loss: 1.9236 (1.9088), Avg Acc: 0.5805
+2022-03-23 19:47:13,484 MASTER_LOG Epoch[012/090], Step[0240/0312], Lr: 6.392280e+00, Loss: 1.9516 (1.9102), Avg Acc: 0.5803
+2022-03-23 19:48:00,756 MASTER_LOG Epoch[012/090], Step[0260/0312], Lr: 6.391710e+00, Loss: 1.9077 (1.9103), Avg Acc: 0.5802
+2022-03-23 19:48:47,520 MASTER_LOG Epoch[012/090], Step[0280/0312], Lr: 6.391121e+00, Loss: 1.9250 (1.9102), Avg Acc: 0.5804
+2022-03-23 19:49:34,054 MASTER_LOG Epoch[012/090], Step[0300/0312], Lr: 6.390511e+00, Loss: 1.9672 (1.9114), Avg Acc: 0.5802
+2022-03-23 19:50:00,740 MASTER_LOG Epoch[012/090], Step[0311/0312], Lr: 6.390261e+00, Loss: 1.9579 (1.9118), Avg Acc: 0.5802
+2022-03-23 19:50:03,042 MASTER_LOG ----- Epoch[012/090], Lr: 6.390261e+00, time: 753.65Train Loss: 1.9118, Train Acc: 0.5802
+2022-03-23 19:50:03,042 MASTER_LOG ----- Validation after Epoch: 12
+2022-03-23 19:50:12,547 MASTER_LOG Step[0000/0013], Avg Loss: 1.2312, Avg Acc@1: 0.6909, Avg Acc@5: 0.8936
+2022-03-23 19:50:48,825 MASTER_LOG ----- Epoch[012/090], Validation Loss: 1.6169, Validation Acc@1: 0.6248, Validation Acc@5: 0.8394, time: 45.78
+2022-03-23 19:50:48,825 MASTER_LOG Train epoch 13. LR=6.390261e+00
+2022-03-23 19:50:56,169 MASTER_LOG Epoch[013/090], Step[0000/0312], Lr: 6.390135e+00, Loss: 1.7806 (1.7806), Avg Acc: 0.6003
+2022-03-23 19:51:45,457 MASTER_LOG Epoch[013/090], Step[0020/0312], Lr: 6.389493e+00, Loss: 1.8600 (1.8455), Avg Acc: 0.5917
+2022-03-23 19:52:32,690 MASTER_LOG Epoch[013/090], Step[0040/0312], Lr: 6.388831e+00, Loss: 1.8242 (1.8494), Avg Acc: 0.5902
+2022-03-23 19:53:20,785 MASTER_LOG Epoch[013/090], Step[0060/0312], Lr: 6.388148e+00, Loss: 1.8588 (1.8594), Avg Acc: 0.5881
+2022-03-23 19:54:08,357 MASTER_LOG Epoch[013/090], Step[0080/0312], Lr: 6.387446e+00, Loss: 1.8359 (1.8626), Avg Acc: 0.5879
+2022-03-23 19:54:56,150 MASTER_LOG Epoch[013/090], Step[0100/0312], Lr: 6.386723e+00, Loss: 1.8936 (1.8641), Avg Acc: 0.5880
+2022-03-23 19:55:43,479 MASTER_LOG Epoch[013/090], Step[0120/0312], Lr: 6.385980e+00, Loss: 1.8148 (1.8665), Avg Acc: 0.5879
+2022-03-23 19:56:31,705 MASTER_LOG Epoch[013/090], Step[0140/0312], Lr: 6.385216e+00, Loss: 1.9487 (1.8713), Avg Acc: 0.5869
+2022-03-23 19:57:19,358 MASTER_LOG Epoch[013/090], Step[0160/0312], Lr: 6.384433e+00, Loss: 1.8823 (1.8735), Avg Acc: 0.5864
+2022-03-23 19:58:07,391 MASTER_LOG Epoch[013/090], Step[0180/0312], Lr: 6.383629e+00, Loss: 1.7822 (1.8740), Avg Acc: 0.5863
+2022-03-23 19:58:54,818 MASTER_LOG Epoch[013/090], Step[0200/0312], Lr: 6.382805e+00, Loss: 1.8696 (1.8754), Avg Acc: 0.5858
+2022-03-23 19:59:42,340 MASTER_LOG Epoch[013/090], Step[0220/0312], Lr: 6.381961e+00, Loss: 1.9471 (1.8769), Avg Acc: 0.5856
+2022-03-23 20:00:29,417 MASTER_LOG Epoch[013/090], Step[0240/0312], Lr: 6.381097e+00, Loss: 1.8829 (1.8777), Avg Acc: 0.5854
+2022-03-23 20:01:16,663 MASTER_LOG Epoch[013/090], Step[0260/0312], Lr: 6.380213e+00, Loss: 1.8602 (1.8775), Avg Acc: 0.5855
+2022-03-23 20:02:04,954 MASTER_LOG Epoch[013/090], Step[0280/0312], Lr: 6.379308e+00, Loss: 1.8996 (1.8776), Avg Acc: 0.5856
+2022-03-23 20:02:52,912 MASTER_LOG Epoch[013/090], Step[0300/0312], Lr: 6.378384e+00, Loss: 1.8973 (1.8777), Avg Acc: 0.5856
+2022-03-23 20:03:18,219 MASTER_LOG Epoch[013/090], Step[0311/0312], Lr: 6.378008e+00, Loss: 1.8635 (1.8773), Avg Acc: 0.5856
+2022-03-23 20:03:20,292 MASTER_LOG ----- Epoch[013/090], Lr: 6.378008e+00, time: 751.46Train Loss: 1.8773, Train Acc: 0.5856
+2022-03-23 20:03:20,292 MASTER_LOG ----- Validation after Epoch: 13
+2022-03-23 20:03:30,029 MASTER_LOG Step[0000/0013], Avg Loss: 1.2380, Avg Acc@1: 0.6953, Avg Acc@5: 0.8962
+2022-03-23 20:04:04,209 MASTER_LOG ----- Epoch[013/090], Validation Loss: 1.5961, Validation Acc@1: 0.6255, Validation Acc@5: 0.8436, time: 43.91
+2022-03-23 20:04:04,210 MASTER_LOG Train epoch 14. LR=6.378008e+00
+2022-03-23 20:04:11,970 MASTER_LOG Epoch[014/090], Step[0000/0312], Lr: 6.377819e+00, Loss: 1.8118 (1.8118), Avg Acc: 0.5911
+2022-03-23 20:05:01,565 MASTER_LOG Epoch[014/090], Step[0020/0312], Lr: 6.376862e+00, Loss: 1.7704 (1.8262), Avg Acc: 0.5940
+2022-03-23 20:05:50,357 MASTER_LOG Epoch[014/090], Step[0040/0312], Lr: 6.375885e+00, Loss: 1.8377 (1.8226), Avg Acc: 0.5945
+2022-03-23 20:06:38,567 MASTER_LOG Epoch[014/090], Step[0060/0312], Lr: 6.374888e+00, Loss: 1.8551 (1.8296), Avg Acc: 0.5933
+2022-03-23 20:07:26,638 MASTER_LOG Epoch[014/090], Step[0080/0312], Lr: 6.373871e+00, Loss: 1.9680 (1.8363), Avg Acc: 0.5927
+2022-03-23 20:08:13,843 MASTER_LOG Epoch[014/090], Step[0100/0312], Lr: 6.372833e+00, Loss: 1.8188 (1.8341), Avg Acc: 0.5931
+2022-03-23 20:09:02,356 MASTER_LOG Epoch[014/090], Step[0120/0312], Lr: 6.371776e+00, Loss: 1.7792 (1.8363), Avg Acc: 0.5931
+2022-03-23 20:09:50,217 MASTER_LOG Epoch[014/090], Step[0140/0312], Lr: 6.370698e+00, Loss: 1.9810 (1.8397), Avg Acc: 0.5925
+2022-03-23 20:10:37,390 MASTER_LOG Epoch[014/090], Step[0160/0312], Lr: 6.369601e+00, Loss: 1.8568 (1.8431), Avg Acc: 0.5918
+2022-03-23 20:11:24,613 MASTER_LOG Epoch[014/090], Step[0180/0312], Lr: 6.368483e+00, Loss: 1.7800 (1.8441), Avg Acc: 0.5917
+2022-03-23 20:12:11,954 MASTER_LOG Epoch[014/090], Step[0200/0312], Lr: 6.367345e+00, Loss: 1.9335 (1.8458), Avg Acc: 0.5915
+2022-03-23 20:13:01,049 MASTER_LOG Epoch[014/090], Step[0220/0312], Lr: 6.366187e+00, Loss: 1.9046 (1.8477), Avg Acc: 0.5911
+2022-03-23 20:13:49,727 MASTER_LOG Epoch[014/090], Step[0240/0312], Lr: 6.365009e+00, Loss: 1.8469 (1.8478), Avg Acc: 0.5911
+2022-03-23 20:14:37,162 MASTER_LOG Epoch[014/090], Step[0260/0312], Lr: 6.363811e+00, Loss: 1.8067 (1.8487), Avg Acc: 0.5909
+2022-03-23 20:15:25,063 MASTER_LOG Epoch[014/090], Step[0280/0312], Lr: 6.362593e+00, Loss: 1.8619 (1.8498), Avg Acc: 0.5907
+2022-03-23 20:16:13,612 MASTER_LOG Epoch[014/090], Step[0300/0312], Lr: 6.361355e+00, Loss: 1.8062 (1.8506), Avg Acc: 0.5905
+2022-03-23 20:16:37,833 MASTER_LOG Epoch[014/090], Step[0311/0312], Lr: 6.360854e+00, Loss: 1.8878 (1.8512), Avg Acc: 0.5905
+2022-03-23 20:16:40,197 MASTER_LOG ----- Epoch[014/090], Lr: 6.360854e+00, time: 755.96Train Loss: 1.8512, Train Acc: 0.5905
+2022-03-23 20:16:40,197 MASTER_LOG ----- Validation after Epoch: 14
+2022-03-23 20:16:49,756 MASTER_LOG Step[0000/0013], Avg Loss: 1.2487, Avg Acc@1: 0.6885, Avg Acc@5: 0.8977
+2022-03-23 20:17:25,390 MASTER_LOG ----- Epoch[014/090], Validation Loss: 1.5789, Validation Acc@1: 0.6288, Validation Acc@5: 0.8470, time: 45.19
+2022-03-23 20:17:25,390 MASTER_LOG Train epoch 15. LR=6.360854e+00
+2022-03-23 20:17:33,369 MASTER_LOG Epoch[015/090], Step[0000/0312], Lr: 6.360603e+00, Loss: 1.7248 (1.7248), Avg Acc: 0.6096
+2022-03-23 20:18:23,044 MASTER_LOG Epoch[015/090], Step[0020/0312], Lr: 6.359333e+00, Loss: 1.8203 (1.8114), Avg Acc: 0.5978
+2022-03-23 20:19:11,393 MASTER_LOG Epoch[015/090], Step[0040/0312], Lr: 6.358042e+00, Loss: 1.7478 (1.8130), Avg Acc: 0.5982
+2022-03-23 20:20:00,206 MASTER_LOG Epoch[015/090], Step[0060/0312], Lr: 6.356732e+00, Loss: 1.7485 (1.8148), Avg Acc: 0.5973
+2022-03-23 20:20:48,498 MASTER_LOG Epoch[015/090], Step[0080/0312], Lr: 6.355402e+00, Loss: 1.7948 (1.8200), Avg Acc: 0.5963
+2022-03-23 20:21:35,331 MASTER_LOG Epoch[015/090], Step[0100/0312], Lr: 6.354052e+00, Loss: 1.8764 (1.8213), Avg Acc: 0.5960
+2022-03-23 20:22:22,198 MASTER_LOG Epoch[015/090], Step[0120/0312], Lr: 6.352682e+00, Loss: 1.8772 (1.8229), Avg Acc: 0.5954
+2022-03-23 20:23:10,600 MASTER_LOG Epoch[015/090], Step[0140/0312], Lr: 6.351292e+00, Loss: 1.8091 (1.8223), Avg Acc: 0.5958
+2022-03-23 20:23:59,499 MASTER_LOG Epoch[015/090], Step[0160/0312], Lr: 6.349881e+00, Loss: 1.8063 (1.8241), Avg Acc: 0.5954
+2022-03-23 20:24:46,609 MASTER_LOG Epoch[015/090], Step[0180/0312], Lr: 6.348451e+00, Loss: 1.7346 (1.8246), Avg Acc: 0.5954
+2022-03-23 20:25:34,258 MASTER_LOG Epoch[015/090], Step[0200/0312], Lr: 6.347001e+00, Loss: 1.7634 (1.8258), Avg Acc: 0.5953
+2022-03-23 20:26:20,518 MASTER_LOG Epoch[015/090], Step[0220/0312], Lr: 6.345531e+00, Loss: 1.8289 (1.8274), Avg Acc: 0.5949
+2022-03-23 20:27:08,007 MASTER_LOG Epoch[015/090], Step[0240/0312], Lr: 6.344041e+00, Loss: 1.8394 (1.8277), Avg Acc: 0.5950
+2022-03-23 20:27:55,874 MASTER_LOG Epoch[015/090], Step[0260/0312], Lr: 6.342532e+00, Loss: 1.9220 (1.8296), Avg Acc: 0.5945
+2022-03-23 20:28:45,421 MASTER_LOG Epoch[015/090], Step[0280/0312], Lr: 6.341002e+00, Loss: 1.8444 (1.8295), Avg Acc: 0.5944
+2022-03-23 20:29:33,499 MASTER_LOG Epoch[015/090], Step[0300/0312], Lr: 6.339452e+00, Loss: 1.7443 (1.8306), Avg Acc: 0.5941
+2022-03-23 20:29:57,476 MASTER_LOG Epoch[015/090], Step[0311/0312], Lr: 6.338827e+00, Loss: 1.8068 (1.8311), Avg Acc: 0.5940
+2022-03-23 20:30:00,079 MASTER_LOG ----- Epoch[015/090], Lr: 6.338827e+00, time: 754.62Train Loss: 1.8311, Train Acc: 0.5940
+2022-03-23 20:30:00,079 MASTER_LOG ----- Validation after Epoch: 15
+2022-03-23 20:30:09,703 MASTER_LOG Step[0000/0013], Avg Loss: 1.2546, Avg Acc@1: 0.6833, Avg Acc@5: 0.8970
+2022-03-23 20:30:44,103 MASTER_LOG ----- Epoch[015/090], Validation Loss: 1.5601, Validation Acc@1: 0.6327, Validation Acc@5: 0.8483, time: 44.02
+2022-03-23 20:30:44,104 MASTER_LOG Train epoch 16. LR=6.338827e+00
+2022-03-23 20:30:51,567 MASTER_LOG Epoch[016/090], Step[0000/0312], Lr: 6.338513e+00, Loss: 1.8000 (1.8000), Avg Acc: 0.6038
+2022-03-23 20:31:40,302 MASTER_LOG Epoch[016/090], Step[0020/0312], Lr: 6.336931e+00, Loss: 1.8106 (1.7873), Avg Acc: 0.6022
+2022-03-23 20:32:28,795 MASTER_LOG Epoch[016/090], Step[0040/0312], Lr: 6.335330e+00, Loss: 1.8351 (1.7942), Avg Acc: 0.6008
+2022-03-23 20:33:15,444 MASTER_LOG Epoch[016/090], Step[0060/0312], Lr: 6.333709e+00, Loss: 1.7562 (1.7971), Avg Acc: 0.6004
+2022-03-23 20:34:02,536 MASTER_LOG Epoch[016/090], Step[0080/0312], Lr: 6.332068e+00, Loss: 1.8773 (1.8028), Avg Acc: 0.5993
+2022-03-23 20:34:49,342 MASTER_LOG Epoch[016/090], Step[0100/0312], Lr: 6.330407e+00, Loss: 1.8464 (1.8045), Avg Acc: 0.5989
+2022-03-23 20:35:36,534 MASTER_LOG Epoch[016/090], Step[0120/0312], Lr: 6.328726e+00, Loss: 1.8631 (1.8054), Avg Acc: 0.5990
+2022-03-23 20:36:26,081 MASTER_LOG Epoch[016/090], Step[0140/0312], Lr: 6.327026e+00, Loss: 1.7449 (1.8048), Avg Acc: 0.5990
+2022-03-23 20:37:14,640 MASTER_LOG Epoch[016/090], Step[0160/0312], Lr: 6.325305e+00, Loss: 1.7647 (1.8044), Avg Acc: 0.5989
+2022-03-23 20:38:01,642 MASTER_LOG Epoch[016/090], Step[0180/0312], Lr: 6.323565e+00, Loss: 1.8593 (1.8063), Avg Acc: 0.5984
+2022-03-23 20:38:49,399 MASTER_LOG Epoch[016/090], Step[0200/0312], Lr: 6.321805e+00, Loss: 1.8039 (1.8077), Avg Acc: 0.5979
+2022-03-23 20:39:37,607 MASTER_LOG Epoch[016/090], Step[0220/0312], Lr: 6.320025e+00, Loss: 1.8422 (1.8079), Avg Acc: 0.5980
+2022-03-23 20:40:25,830 MASTER_LOG Epoch[016/090], Step[0240/0312], Lr: 6.318226e+00, Loss: 1.8059 (1.8085), Avg Acc: 0.5978
+2022-03-23 20:41:13,189 MASTER_LOG Epoch[016/090], Step[0260/0312], Lr: 6.316406e+00, Loss: 1.8920 (1.8090), Avg Acc: 0.5978
+2022-03-23 20:42:01,783 MASTER_LOG Epoch[016/090], Step[0280/0312], Lr: 6.314567e+00, Loss: 1.8680 (1.8109), Avg Acc: 0.5974
+2022-03-23 20:42:49,418 MASTER_LOG Epoch[016/090], Step[0300/0312], Lr: 6.312708e+00, Loss: 1.8260 (1.8133), Avg Acc: 0.5970
+2022-03-23 20:43:13,592 MASTER_LOG Epoch[016/090], Step[0311/0312], Lr: 6.311959e+00, Loss: 1.8062 (1.8135), Avg Acc: 0.5969
+2022-03-23 20:43:16,076 MASTER_LOG ----- Epoch[016/090], Lr: 6.311959e+00, time: 751.93Train Loss: 1.8135, Train Acc: 0.5969
+2022-03-23 20:43:16,076 MASTER_LOG ----- Validation after Epoch: 16
+2022-03-23 20:43:25,606 MASTER_LOG Step[0000/0013], Avg Loss: 1.2228, Avg Acc@1: 0.6968, Avg Acc@5: 0.8933
+2022-03-23 20:43:59,423 MASTER_LOG ----- Epoch[016/090], Validation Loss: 1.5559, Validation Acc@1: 0.6336, Validation Acc@5: 0.8491, time: 43.34
+2022-03-23 20:43:59,424 MASTER_LOG Train epoch 17. LR=6.311959e+00
+2022-03-23 20:44:06,902 MASTER_LOG Epoch[017/090], Step[0000/0312], Lr: 6.311584e+00, Loss: 1.7571 (1.7571), Avg Acc: 0.6069
+2022-03-23 20:44:55,141 MASTER_LOG Epoch[017/090], Step[0020/0312], Lr: 6.309693e+00, Loss: 1.7478 (1.7928), Avg Acc: 0.6005
+2022-03-23 20:45:42,573 MASTER_LOG Epoch[017/090], Step[0040/0312], Lr: 6.307783e+00, Loss: 1.7913 (1.7867), Avg Acc: 0.6017
+2022-03-23 20:46:30,850 MASTER_LOG Epoch[017/090], Step[0060/0312], Lr: 6.305854e+00, Loss: 1.8293 (1.7863), Avg Acc: 0.6012
+2022-03-23 20:47:18,309 MASTER_LOG Epoch[017/090], Step[0080/0312], Lr: 6.303904e+00, Loss: 1.8258 (1.7845), Avg Acc: 0.6015
+2022-03-23 20:48:05,797 MASTER_LOG Epoch[017/090], Step[0100/0312], Lr: 6.301935e+00, Loss: 1.7645 (1.7847), Avg Acc: 0.6017
+2022-03-23 20:48:53,097 MASTER_LOG Epoch[017/090], Step[0120/0312], Lr: 6.299946e+00, Loss: 1.8011 (1.7883), Avg Acc: 0.6009
+2022-03-23 20:49:39,947 MASTER_LOG Epoch[017/090], Step[0140/0312], Lr: 6.297938e+00, Loss: 1.8073 (1.7917), Avg Acc: 0.6004
+2022-03-23 20:50:26,695 MASTER_LOG Epoch[017/090], Step[0160/0312], Lr: 6.295910e+00, Loss: 1.8019 (1.7931), Avg Acc: 0.6001
+2022-03-23 20:51:13,743 MASTER_LOG Epoch[017/090], Step[0180/0312], Lr: 6.293862e+00, Loss: 1.7651 (1.7948), Avg Acc: 0.5998
+2022-03-23 20:52:01,301 MASTER_LOG Epoch[017/090], Step[0200/0312], Lr: 6.291795e+00, Loss: 1.7256 (1.7951), Avg Acc: 0.5997
+2022-03-23 20:52:48,857 MASTER_LOG Epoch[017/090], Step[0220/0312], Lr: 6.289708e+00, Loss: 1.6860 (1.7972), Avg Acc: 0.5995
+2022-03-23 20:53:36,456 MASTER_LOG Epoch[017/090], Step[0240/0312], Lr: 6.287602e+00, Loss: 1.7771 (1.7983), Avg Acc: 0.5994
+2022-03-23 20:54:23,696 MASTER_LOG Epoch[017/090], Step[0260/0312], Lr: 6.285476e+00, Loss: 1.8034 (1.8003), Avg Acc: 0.5991
+2022-03-23 20:55:12,085 MASTER_LOG Epoch[017/090], Step[0280/0312], Lr: 6.283330e+00, Loss: 1.8201 (1.7999), Avg Acc: 0.5993
+2022-03-23 20:56:01,202 MASTER_LOG Epoch[017/090], Step[0300/0312], Lr: 6.281165e+00, Loss: 1.7719 (1.8012), Avg Acc: 0.5991
+2022-03-23 20:56:25,985 MASTER_LOG Epoch[017/090], Step[0311/0312], Lr: 6.280294e+00, Loss: 1.8351 (1.8015), Avg Acc: 0.5990
+2022-03-23 20:56:28,174 MASTER_LOG ----- Epoch[017/090], Lr: 6.280294e+00, time: 747.51Train Loss: 1.8015, Train Acc: 0.5990
+2022-03-23 20:56:28,174 MASTER_LOG ----- Validation after Epoch: 17
+2022-03-23 20:56:37,729 MASTER_LOG Step[0000/0013], Avg Loss: 1.1828, Avg Acc@1: 0.7012, Avg Acc@5: 0.8999
+2022-03-23 20:57:13,141 MASTER_LOG ----- Epoch[017/090], Validation Loss: 1.5399, Validation Acc@1: 0.6373, Validation Acc@5: 0.8508, time: 44.96
+2022-03-23 20:57:13,141 MASTER_LOG Train epoch 18. LR=6.280294e+00
+2022-03-23 20:57:21,053 MASTER_LOG Epoch[018/090], Step[0000/0312], Lr: 6.279857e+00, Loss: 1.6889 (1.6889), Avg Acc: 0.6179
+2022-03-23 20:58:09,312 MASTER_LOG Epoch[018/090], Step[0020/0312], Lr: 6.277660e+00, Loss: 1.8207 (1.7733), Avg Acc: 0.6061
+2022-03-23 20:58:57,552 MASTER_LOG Epoch[018/090], Step[0040/0312], Lr: 6.275445e+00, Loss: 1.7266 (1.7639), Avg Acc: 0.6061
+2022-03-23 20:59:45,878 MASTER_LOG Epoch[018/090], Step[0060/0312], Lr: 6.273209e+00, Loss: 1.7479 (1.7669), Avg Acc: 0.6056
+2022-03-23 21:00:33,339 MASTER_LOG Epoch[018/090], Step[0080/0312], Lr: 6.270955e+00, Loss: 1.7447 (1.7705), Avg Acc: 0.6051
+2022-03-23 21:01:21,499 MASTER_LOG Epoch[018/090], Step[0100/0312], Lr: 6.268680e+00, Loss: 1.7214 (1.7703), Avg Acc: 0.6045
+2022-03-23 21:02:08,502 MASTER_LOG Epoch[018/090], Step[0120/0312], Lr: 6.266387e+00, Loss: 1.7730 (1.7729), Avg Acc: 0.6041
+2022-03-23 21:02:55,601 MASTER_LOG Epoch[018/090], Step[0140/0312], Lr: 6.264074e+00, Loss: 1.8123 (1.7747), Avg Acc: 0.6036
+2022-03-23 21:03:43,512 MASTER_LOG Epoch[018/090], Step[0160/0312], Lr: 6.261741e+00, Loss: 1.7212 (1.7777), Avg Acc: 0.6032
+2022-03-23 21:04:30,655 MASTER_LOG Epoch[018/090], Step[0180/0312], Lr: 6.259389e+00, Loss: 1.7605 (1.7795), Avg Acc: 0.6030
+2022-03-23 21:05:19,697 MASTER_LOG Epoch[018/090], Step[0200/0312], Lr: 6.257018e+00, Loss: 1.8141 (1.7811), Avg Acc: 0.6027
+2022-03-23 21:06:06,914 MASTER_LOG Epoch[018/090], Step[0220/0312], Lr: 6.254627e+00, Loss: 1.7788 (1.7819), Avg Acc: 0.6025
+2022-03-23 21:06:53,574 MASTER_LOG Epoch[018/090], Step[0240/0312], Lr: 6.252217e+00, Loss: 1.8607 (1.7849), Avg Acc: 0.6020
+2022-03-23 21:07:41,553 MASTER_LOG Epoch[018/090], Step[0260/0312], Lr: 6.249788e+00, Loss: 1.8300 (1.7862), Avg Acc: 0.6018
+2022-03-23 21:08:30,166 MASTER_LOG Epoch[018/090], Step[0280/0312], Lr: 6.247339e+00, Loss: 1.8413 (1.7872), Avg Acc: 0.6015
+2022-03-23 21:09:18,029 MASTER_LOG Epoch[018/090], Step[0300/0312], Lr: 6.244871e+00, Loss: 1.8008 (1.7882), Avg Acc: 0.6014
+2022-03-23 21:09:43,913 MASTER_LOG Epoch[018/090], Step[0311/0312], Lr: 6.243878e+00, Loss: 1.7580 (1.7884), Avg Acc: 0.6013
+2022-03-23 21:09:45,436 MASTER_LOG ----- Epoch[018/090], Lr: 6.243878e+00, time: 752.29Train Loss: 1.7884, Train Acc: 0.6013
+2022-03-23 21:09:45,436 MASTER_LOG ----- Validation after Epoch: 18
+2022-03-23 21:09:55,445 MASTER_LOG Step[0000/0013], Avg Loss: 1.1849, Avg Acc@1: 0.7065, Avg Acc@5: 0.8992
+2022-03-23 21:10:30,310 MASTER_LOG ----- Epoch[018/090], Validation Loss: 1.5351, Validation Acc@1: 0.6352, Validation Acc@5: 0.8516, time: 44.87
+2022-03-23 21:10:30,310 MASTER_LOG Train epoch 19. LR=6.243878e+00
+2022-03-23 21:10:37,855 MASTER_LOG Epoch[019/090], Step[0000/0312], Lr: 6.243381e+00, Loss: 1.7262 (1.7262), Avg Acc: 0.6021
+2022-03-23 21:11:26,190 MASTER_LOG Epoch[019/090], Step[0020/0312], Lr: 6.240882e+00, Loss: 1.7865 (1.7703), Avg Acc: 0.6051
+2022-03-23 21:12:14,151 MASTER_LOG Epoch[019/090], Step[0040/0312], Lr: 6.238364e+00, Loss: 1.7861 (1.7658), Avg Acc: 0.6051
+2022-03-23 21:13:01,570 MASTER_LOG Epoch[019/090], Step[0060/0312], Lr: 6.235826e+00, Loss: 1.7493 (1.7688), Avg Acc: 0.6052
+2022-03-23 21:13:48,928 MASTER_LOG Epoch[019/090], Step[0080/0312], Lr: 6.233270e+00, Loss: 1.7581 (1.7731), Avg Acc: 0.6046
+2022-03-23 21:14:35,522 MASTER_LOG Epoch[019/090], Step[0100/0312], Lr: 6.230694e+00, Loss: 1.7146 (1.7733), Avg Acc: 0.6044
+2022-03-23 21:15:23,033 MASTER_LOG Epoch[019/090], Step[0120/0312], Lr: 6.228099e+00, Loss: 1.7870 (1.7720), Avg Acc: 0.6045
+2022-03-23 21:16:09,798 MASTER_LOG Epoch[019/090], Step[0140/0312], Lr: 6.225485e+00, Loss: 1.7459 (1.7732), Avg Acc: 0.6043
+2022-03-23 21:16:57,441 MASTER_LOG Epoch[019/090], Step[0160/0312], Lr: 6.222851e+00, Loss: 1.7619 (1.7737), Avg Acc: 0.6043
+2022-03-23 21:17:44,866 MASTER_LOG Epoch[019/090], Step[0180/0312], Lr: 6.220199e+00, Loss: 1.7876 (1.7759), Avg Acc: 0.6040
+2022-03-23 21:18:32,692 MASTER_LOG Epoch[019/090], Step[0200/0312], Lr: 6.217527e+00, Loss: 1.7669 (1.7762), Avg Acc: 0.6039
+2022-03-23 21:19:21,075 MASTER_LOG Epoch[019/090], Step[0220/0312], Lr: 6.214836e+00, Loss: 1.7890 (1.7766), Avg Acc: 0.6036
+2022-03-23 21:20:07,694 MASTER_LOG Epoch[019/090], Step[0240/0312], Lr: 6.212126e+00, Loss: 1.8286 (1.7769), Avg Acc: 0.6036
+2022-03-23 21:20:55,738 MASTER_LOG Epoch[019/090], Step[0260/0312], Lr: 6.209397e+00, Loss: 1.7573 (1.7770), Avg Acc: 0.6036
+2022-03-23 21:21:44,257 MASTER_LOG Epoch[019/090], Step[0280/0312], Lr: 6.206649e+00, Loss: 1.8009 (1.7770), Avg Acc: 0.6035
+2022-03-23 21:22:31,710 MASTER_LOG Epoch[019/090], Step[0300/0312], Lr: 6.203882e+00, Loss: 1.8052 (1.7789), Avg Acc: 0.6032
+2022-03-23 21:22:56,359 MASTER_LOG Epoch[019/090], Step[0311/0312], Lr: 6.202770e+00, Loss: 1.8000 (1.7794), Avg Acc: 0.6031
+2022-03-23 21:22:57,972 MASTER_LOG ----- Epoch[019/090], Lr: 6.202770e+00, time: 747.55Train Loss: 1.7794, Train Acc: 0.6031
+2022-03-23 21:22:57,972 MASTER_LOG ----- Validation after Epoch: 19
+2022-03-23 21:23:07,949 MASTER_LOG Step[0000/0013], Avg Loss: 1.1821, Avg Acc@1: 0.6997, Avg Acc@5: 0.9026
+2022-03-23 21:23:42,546 MASTER_LOG ----- Epoch[019/090], Validation Loss: 1.5277, Validation Acc@1: 0.6365, Validation Acc@5: 0.8527, time: 44.57
+2022-03-23 21:23:42,546 MASTER_LOG Train epoch 20. LR=6.202770e+00
+2022-03-23 21:23:49,732 MASTER_LOG Epoch[020/090], Step[0000/0312], Lr: 6.202212e+00, Loss: 1.7179 (1.7179), Avg Acc: 0.6152
+2022-03-23 21:24:39,154 MASTER_LOG Epoch[020/090], Step[0020/0312], Lr: 6.199415e+00, Loss: 1.7531 (1.7315), Avg Acc: 0.6126
+2022-03-23 21:25:26,873 MASTER_LOG Epoch[020/090], Step[0040/0312], Lr: 6.196598e+00, Loss: 1.7739 (1.7403), Avg Acc: 0.6102
+2022-03-23 21:26:13,571 MASTER_LOG Epoch[020/090], Step[0060/0312], Lr: 6.193762e+00, Loss: 1.8049 (1.7522), Avg Acc: 0.6085
+2022-03-23 21:27:00,649 MASTER_LOG Epoch[020/090], Step[0080/0312], Lr: 6.190908e+00, Loss: 1.8493 (1.7610), Avg Acc: 0.6072
+2022-03-23 21:27:48,199 MASTER_LOG Epoch[020/090], Step[0100/0312], Lr: 6.188034e+00, Loss: 1.8441 (1.7624), Avg Acc: 0.6069
+2022-03-23 21:28:36,566 MASTER_LOG Epoch[020/090], Step[0120/0312], Lr: 6.185142e+00, Loss: 1.7810 (1.7633), Avg Acc: 0.6067
+2022-03-23 21:29:24,384 MASTER_LOG Epoch[020/090], Step[0140/0312], Lr: 6.182231e+00, Loss: 1.8142 (1.7632), Avg Acc: 0.6065
+2022-03-23 21:30:12,682 MASTER_LOG Epoch[020/090], Step[0160/0312], Lr: 6.179300e+00, Loss: 1.7346 (1.7651), Avg Acc: 0.6061
+2022-03-23 21:31:00,656 MASTER_LOG Epoch[020/090], Step[0180/0312], Lr: 6.176351e+00, Loss: 1.7095 (1.7650), Avg Acc: 0.6061
+2022-03-23 21:31:48,723 MASTER_LOG Epoch[020/090], Step[0200/0312], Lr: 6.173383e+00, Loss: 1.7334 (1.7659), Avg Acc: 0.6057
+2022-03-23 21:32:36,348 MASTER_LOG Epoch[020/090], Step[0220/0312], Lr: 6.170396e+00, Loss: 1.7154 (1.7667), Avg Acc: 0.6054
+2022-03-23 21:33:23,279 MASTER_LOG Epoch[020/090], Step[0240/0312], Lr: 6.167391e+00, Loss: 1.7712 (1.7686), Avg Acc: 0.6051
+2022-03-23 21:34:09,377 MASTER_LOG Epoch[020/090], Step[0260/0312], Lr: 6.164366e+00, Loss: 1.8693 (1.7704), Avg Acc: 0.6051
+2022-03-23 21:34:57,062 MASTER_LOG Epoch[020/090], Step[0280/0312], Lr: 6.161323e+00, Loss: 1.7692 (1.7699), Avg Acc: 0.6052
+2022-03-23 21:35:45,022 MASTER_LOG Epoch[020/090], Step[0300/0312], Lr: 6.158261e+00, Loss: 1.7622 (1.7697), Avg Acc: 0.6052
+2022-03-23 21:36:10,158 MASTER_LOG Epoch[020/090], Step[0311/0312], Lr: 6.157031e+00, Loss: 1.8980 (1.7706), Avg Acc: 0.6052
+2022-03-23 21:36:12,200 MASTER_LOG ----- Epoch[020/090], Lr: 6.157031e+00, time: 749.60Train Loss: 1.7706, Train Acc: 0.6052
+2022-03-23 21:36:12,200 MASTER_LOG ----- Validation after Epoch: 20
+2022-03-23 21:36:21,624 MASTER_LOG Step[0000/0013], Avg Loss: 1.1718, Avg Acc@1: 0.7034, Avg Acc@5: 0.8975
+2022-03-23 21:36:58,563 MASTER_LOG ----- Epoch[020/090], Validation Loss: 1.5293, Validation Acc@1: 0.6387, Validation Acc@5: 0.8520, time: 46.36
+2022-03-23 21:36:59,485 MASTER_LOG ----- Save model: ./output/linearprobe-20220323-17-11/LINEARPROBE-Epoch-20-Loss-1.5293068207359315.pdparams
+2022-03-23 21:36:59,485 MASTER_LOG Train epoch 21. LR=6.157031e+00
+2022-03-23 21:37:06,196 MASTER_LOG Epoch[021/090], Step[0000/0312], Lr: 6.156415e+00, Loss: 1.7359 (1.7359), Avg Acc: 0.6189
+2022-03-23 21:37:55,066 MASTER_LOG Epoch[021/090], Step[0020/0312], Lr: 6.153322e+00, Loss: 1.7529 (1.7473), Avg Acc: 0.6093
+2022-03-23 21:38:42,580 MASTER_LOG Epoch[021/090], Step[0040/0312], Lr: 6.150212e+00, Loss: 1.8073 (1.7423), Avg Acc: 0.6095
+2022-03-23 21:39:31,084 MASTER_LOG Epoch[021/090], Step[0060/0312], Lr: 6.147082e+00, Loss: 1.7614 (1.7431), Avg Acc: 0.6092
+2022-03-23 21:40:18,476 MASTER_LOG Epoch[021/090], Step[0080/0312], Lr: 6.143934e+00, Loss: 1.7751 (1.7444), Avg Acc: 0.6090
+2022-03-23 21:41:08,304 MASTER_LOG Epoch[021/090], Step[0100/0312], Lr: 6.140767e+00, Loss: 1.7964 (1.7510), Avg Acc: 0.6081
+2022-03-23 21:41:56,179 MASTER_LOG Epoch[021/090], Step[0120/0312], Lr: 6.137582e+00, Loss: 1.8092 (1.7529), Avg Acc: 0.6077
+2022-03-23 21:42:44,980 MASTER_LOG Epoch[021/090], Step[0140/0312], Lr: 6.134378e+00, Loss: 1.7442 (1.7531), Avg Acc: 0.6075
+2022-03-23 21:43:32,982 MASTER_LOG Epoch[021/090], Step[0160/0312], Lr: 6.131155e+00, Loss: 1.7009 (1.7544), Avg Acc: 0.6073
+2022-03-23 21:44:21,348 MASTER_LOG Epoch[021/090], Step[0180/0312], Lr: 6.127914e+00, Loss: 1.7572 (1.7551), Avg Acc: 0.6071
+2022-03-23 21:45:08,563 MASTER_LOG Epoch[021/090], Step[0200/0312], Lr: 6.124655e+00, Loss: 1.7200 (1.7577), Avg Acc: 0.6069
+2022-03-23 21:45:55,625 MASTER_LOG Epoch[021/090], Step[0220/0312], Lr: 6.121376e+00, Loss: 1.8005 (1.7588), Avg Acc: 0.6068
+2022-03-23 21:46:43,128 MASTER_LOG Epoch[021/090], Step[0240/0312], Lr: 6.118080e+00, Loss: 1.7958 (1.7601), Avg Acc: 0.6066
+2022-03-23 21:47:30,203 MASTER_LOG Epoch[021/090], Step[0260/0312], Lr: 6.114764e+00, Loss: 1.8398 (1.7616), Avg Acc: 0.6062
+2022-03-23 21:48:17,876 MASTER_LOG Epoch[021/090], Step[0280/0312], Lr: 6.111431e+00, Loss: 1.7314 (1.7621), Avg Acc: 0.6061
+2022-03-23 21:49:05,604 MASTER_LOG Epoch[021/090], Step[0300/0312], Lr: 6.108078e+00, Loss: 1.7352 (1.7623), Avg Acc: 0.6061
+2022-03-23 21:49:34,137 MASTER_LOG Epoch[021/090], Step[0311/0312], Lr: 6.106732e+00, Loss: 1.8182 (1.7625), Avg Acc: 0.6062
+2022-03-23 21:49:36,654 MASTER_LOG ----- Epoch[021/090], Lr: 6.106732e+00, time: 757.05Train Loss: 1.7625, Train Acc: 0.6062
+2022-03-23 21:49:36,654 MASTER_LOG ----- Validation after Epoch: 21
+2022-03-23 21:49:46,059 MASTER_LOG Step[0000/0013], Avg Loss: 1.1663, Avg Acc@1: 0.7026, Avg Acc@5: 0.9021
+2022-03-23 21:50:21,707 MASTER_LOG ----- Epoch[021/090], Validation Loss: 1.5158, Validation Acc@1: 0.6399, Validation Acc@5: 0.8543, time: 45.05
+2022-03-23 21:50:21,708 MASTER_LOG Train epoch 22. LR=6.106732e+00
+2022-03-23 21:50:29,050 MASTER_LOG Epoch[022/090], Step[0000/0312], Lr: 6.106058e+00, Loss: 1.6650 (1.6650), Avg Acc: 0.6182
+2022-03-23 21:51:17,473 MASTER_LOG Epoch[022/090], Step[0020/0312], Lr: 6.102676e+00, Loss: 1.7864 (1.7523), Avg Acc: 0.6099
+2022-03-23 21:52:05,548 MASTER_LOG Epoch[022/090], Step[0040/0312], Lr: 6.099276e+00, Loss: 1.7083 (1.7444), Avg Acc: 0.6105
+2022-03-23 21:52:54,137 MASTER_LOG Epoch[022/090], Step[0060/0312], Lr: 6.095858e+00, Loss: 1.7003 (1.7483), Avg Acc: 0.6097
+2022-03-23 21:53:41,301 MASTER_LOG Epoch[022/090], Step[0080/0312], Lr: 6.092421e+00, Loss: 1.7755 (1.7491), Avg Acc: 0.6095
+2022-03-23 21:54:28,534 MASTER_LOG Epoch[022/090], Step[0100/0312], Lr: 6.088966e+00, Loss: 1.7836 (1.7510), Avg Acc: 0.6091
+2022-03-23 21:55:15,544 MASTER_LOG Epoch[022/090], Step[0120/0312], Lr: 6.085493e+00, Loss: 1.8172 (1.7524), Avg Acc: 0.6088
+2022-03-23 21:56:02,372 MASTER_LOG Epoch[022/090], Step[0140/0312], Lr: 6.082001e+00, Loss: 1.8078 (1.7533), Avg Acc: 0.6086
+2022-03-23 21:56:49,631 MASTER_LOG Epoch[022/090], Step[0160/0312], Lr: 6.078491e+00, Loss: 1.7503 (1.7545), Avg Acc: 0.6082
+2022-03-23 21:57:36,759 MASTER_LOG Epoch[022/090], Step[0180/0312], Lr: 6.074963e+00, Loss: 1.7682 (1.7561), Avg Acc: 0.6080
+2022-03-23 21:58:23,616 MASTER_LOG Epoch[022/090], Step[0200/0312], Lr: 6.071416e+00, Loss: 1.6749 (1.7561), Avg Acc: 0.6079
+2022-03-23 21:59:11,700 MASTER_LOG Epoch[022/090], Step[0220/0312], Lr: 6.067852e+00, Loss: 1.6856 (1.7578), Avg Acc: 0.6077
+2022-03-23 21:59:59,243 MASTER_LOG Epoch[022/090], Step[0240/0312], Lr: 6.064269e+00, Loss: 1.6498 (1.7570), Avg Acc: 0.6078
+2022-03-23 22:00:46,193 MASTER_LOG Epoch[022/090], Step[0260/0312], Lr: 6.060668e+00, Loss: 1.6393 (1.7559), Avg Acc: 0.6080
+2022-03-23 22:01:33,113 MASTER_LOG Epoch[022/090], Step[0280/0312], Lr: 6.057049e+00, Loss: 1.7817 (1.7563), Avg Acc: 0.6080
+2022-03-23 22:02:19,313 MASTER_LOG Epoch[022/090], Step[0300/0312], Lr: 6.053412e+00, Loss: 1.7514 (1.7565), Avg Acc: 0.6079
+2022-03-23 22:02:43,113 MASTER_LOG Epoch[022/090], Step[0311/0312], Lr: 6.051952e+00, Loss: 1.7274 (1.7567), Avg Acc: 0.6079
+2022-03-23 22:02:44,536 MASTER_LOG ----- Epoch[022/090], Lr: 6.051952e+00, time: 742.76Train Loss: 1.7567, Train Acc: 0.6079
+2022-03-23 22:02:44,536 MASTER_LOG ----- Validation after Epoch: 22
+2022-03-23 22:02:53,949 MASTER_LOG Step[0000/0013], Avg Loss: 1.1730, Avg Acc@1: 0.7012, Avg Acc@5: 0.9028
+2022-03-23 22:03:27,869 MASTER_LOG ----- Epoch[022/090], Validation Loss: 1.5087, Validation Acc@1: 0.6450, Validation Acc@5: 0.8553, time: 43.33
+2022-03-23 22:03:27,869 MASTER_LOG Train epoch 23. LR=6.051952e+00
+2022-03-23 22:03:35,672 MASTER_LOG Epoch[023/090], Step[0000/0312], Lr: 6.051221e+00, Loss: 1.6996 (1.6996), Avg Acc: 0.6177
+2022-03-23 22:04:24,975 MASTER_LOG Epoch[023/090], Step[0020/0312], Lr: 6.047555e+00, Loss: 1.7716 (1.7292), Avg Acc: 0.6129
+2022-03-23 22:05:13,528 MASTER_LOG Epoch[023/090], Step[0040/0312], Lr: 6.043871e+00, Loss: 1.7322 (1.7258), Avg Acc: 0.6134
+2022-03-23 22:06:01,375 MASTER_LOG Epoch[023/090], Step[0060/0312], Lr: 6.040168e+00, Loss: 1.7378 (1.7269), Avg Acc: 0.6128
+2022-03-23 22:06:49,569 MASTER_LOG Epoch[023/090], Step[0080/0312], Lr: 6.036448e+00, Loss: 1.7751 (1.7292), Avg Acc: 0.6118
+2022-03-23 22:07:36,929 MASTER_LOG Epoch[023/090], Step[0100/0312], Lr: 6.032710e+00, Loss: 1.7473 (1.7328), Avg Acc: 0.6111
+2022-03-23 22:08:23,681 MASTER_LOG Epoch[023/090], Step[0120/0312], Lr: 6.028954e+00, Loss: 1.7562 (1.7354), Avg Acc: 0.6108
+2022-03-23 22:09:10,485 MASTER_LOG Epoch[023/090], Step[0140/0312], Lr: 6.025180e+00, Loss: 1.7505 (1.7353), Avg Acc: 0.6106
+2022-03-23 22:09:57,172 MASTER_LOG Epoch[023/090], Step[0160/0312], Lr: 6.021388e+00, Loss: 1.7875 (1.7380), Avg Acc: 0.6104
+/usr/local/lib/python3.7/site-packages/paddlenlp/transformers/funnel/modeling.py:30: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated, and in 3.8 it will stop working
+ from collections import Iterable
+Compose(
+)
+----- Imagenet2012 train_list.txt len = 1281167
+----- Imagenet2012 val_list.txt len = 50000
+2022-03-23 22:10:37,296 MASTER_LOG ----- world_size = 8, local_rank = 0
+----- AMP: True
+BASE: ['']
+DATA:
+ BATCH_SIZE: 2
+ BATCH_SIZE_EVAL: 2
+ CROP_PCT: 0.875
+ DATASET: imagenet2012
+ DATA_PATH: /dataset/imagenet
+ IMAGENET_MEAN: [0.485, 0.456, 0.406]
+ IMAGENET_STD: [0.229, 0.224, 0.225]
+ IMAGE_CHANNELS: 3
+ IMAGE_SIZE: 224
+ NUM_WORKERS: 2
+EVAL: False
+MODEL:
+ ATTENTION_DROPOUT: 0.0
+ DECODER:
+ DEPTH: 8
+ EMBED_DIM: 512
+ NUM_HEADS: 16
+ DROPOUT: 0.0
+ DROPPATH: 0.1
+ ENCODER:
+ DEPTH: 12
+ EMBED_DIM: 768
+ NUM_HEADS: 12
+ GLOBAL_POOL: True
+ MASK_RATIO: 0.75
+ MLP_RATIO: 4.0
+ NAME: vit_base_patch16_224
+ NORM_PIX_LOSS: True
+ NUM_CLASSES: 1000
+ PATCH_SIZE: 16
+ PRETRAINED: ./mae_pretrain_vit_base.pdparams
+ QKV_BIAS: True
+ RESUME: None
+ TYPE: FINETUNE
+REPORT_FREQ: 100
+SAVE: ./output/finetune-20220323-22-10
+SAVE_FREQ: 10
+SEED: 0
+TRAIN:
+ ACCUM_ITER: 4
+ AUTO_AUGMENT: False
+ BASE_LR: 0.0005
+ COLOR_JITTER: 0.4
+ CUTMIX_ALPHA: 1.0
+ CUTMIX_MINMAX: None
+ END_LR: 1e-06
+ GRAD_CLIP: None
+ LAST_EPOCH: 0
+ LAYER_DECAY: 0.65
+ LINEAR_SCALED_LR: 256
+ MIXUP_ALPHA: 0.8
+ MIXUP_MODE: batch
+ MIXUP_PROB: 1.0
+ MIXUP_SWITCH_PROB: 0.5
+ NUM_EPOCHS: 100
+ OPTIMIZER:
+ BETAS: (0.9, 0.999)
+ EPS: 1e-08
+ NAME: AdamWDL
+ RANDOM_ERASE_COUNT: 1
+ RANDOM_ERASE_MODE: pixel
+ RANDOM_ERASE_PROB: 0.25
+ RANDOM_ERASE_SPLIT: False
+ RAND_AUGMENT: True
+ RAND_AUGMENT_LAYERS: 2
+ RAND_AUGMENT_MAGNITUDE: 9
+ SMOOTHING: 0.1
+ WARMUP_EPOCHS: 5
+ WARMUP_START_LR: 0.0
+ WEIGHT_DECAY: 0.05
+VALIDATE_FREQ: 1
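The per-GPU batch size in this dump (2) is combined with `ACCUM_ITER` and the world size before the learning rate is scaled. As a rough sketch only — assuming the usual MAE-style linear scaling rule `lr = BASE_LR * effective_batch_size / LINEAR_SCALED_LR`, which the actual training script may implement differently — the numbers above would combine like this:

```python
# Hypothetical illustration of how the finetune LR settings above are typically combined.
base_lr = 0.0005          # TRAIN.BASE_LR
linear_scaled_lr = 256    # TRAIN.LINEAR_SCALED_LR
batch_size = 2            # DATA.BATCH_SIZE (per GPU)
accum_iter = 4            # TRAIN.ACCUM_ITER
world_size = 8            # number of GPUs reported in the log

effective_batch_size = batch_size * accum_iter * world_size    # 64
lr = base_lr * effective_batch_size / linear_scaled_lr          # 0.000125
print(effective_batch_size, lr)
```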
diff --git a/self_supervised_learning/MAE/pos_embed.py b/self_supervised_learning/MAE/pos_embed.py
new file mode 100644
index 00000000..c6c95853
--- /dev/null
+++ b/self_supervised_learning/MAE/pos_embed.py
@@ -0,0 +1,56 @@
+import numpy as np
+
+# --------------------------------------------------------
+# 2D sine-cosine position embedding
+# References:
+# Transformer: https://github.com/tensorflow/models/blob/master/official/nlp/transformer/model_utils.py
+# MoCo v3: https://github.com/facebookresearch/moco-v3
+# --------------------------------------------------------
+def get_2d_sincos_pos_embed(embed_dim, grid_size, cls_token=False):
+ """
+ grid_size: int of the grid height and width
+ return:
+ pos_embed: [grid_size*grid_size, embed_dim] or [1+grid_size*grid_size, embed_dim] (w/ or w/o cls_token)
+ """
+ grid_h = np.arange(grid_size, dtype=np.float32)
+ grid_w = np.arange(grid_size, dtype=np.float32)
+ grid = np.meshgrid(grid_w, grid_h) # here w goes first
+ grid = np.stack(grid, axis=0)
+
+ grid = grid.reshape([2, 1, grid_size, grid_size])
+ pos_embed = get_2d_sincos_pos_embed_from_grid(embed_dim, grid)
+ if cls_token:
+ pos_embed = np.concatenate([np.zeros([1, embed_dim]), pos_embed], axis=0)
+ return pos_embed
+
+
+def get_2d_sincos_pos_embed_from_grid(embed_dim, grid):
+ assert embed_dim % 2 == 0
+
+ # use half of dimensions to encode grid_h
+ emb_h = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[0]) # (H*W, D/2)
+ emb_w = get_1d_sincos_pos_embed_from_grid(embed_dim // 2, grid[1]) # (H*W, D/2)
+
+ emb = np.concatenate([emb_h, emb_w], axis=1) # (H*W, D)
+ return emb
+
+
+def get_1d_sincos_pos_embed_from_grid(embed_dim, pos):
+ """
+ embed_dim: output dimension for each position
+ pos: a list of positions to be encoded: size (M,)
+ out: (M, D)
+ """
+ assert embed_dim % 2 == 0
+    omega = np.arange(embed_dim // 2, dtype=float)  # np.float alias is deprecated/removed in newer NumPy
+ omega /= embed_dim / 2.
+ omega = 1. / 10000**omega # (D/2,)
+
+ pos = pos.reshape(-1) # (M,)
+ out = np.einsum('m,d->md', pos, omega) # (M, D/2), outer product
+
+ emb_sin = np.sin(out) # (M, D/2)
+ emb_cos = np.cos(out) # (M, D/2)
+
+ emb = np.concatenate([emb_sin, emb_cos], axis=1) # (M, D)
+ return emb
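A quick usage sketch of the helpers defined in `pos_embed.py` above: building the fixed sin-cos position embedding for a ViT-B/16 encoder at 224x224 input, i.e. a 14x14 patch grid with a prepended class token (each frequency is `1 / 10000**(2k/d)` as in the code).

```python
import numpy as np
from pos_embed import get_2d_sincos_pos_embed

# ViT-B/16 at 224x224: 224 / 16 = 14 patches per side, encoder embed_dim = 768
pos_embed = get_2d_sincos_pos_embed(embed_dim=768, grid_size=14, cls_token=True)
print(pos_embed.shape)  # (197, 768) = (1 cls token + 14*14 patches, embed_dim)

# In MAE the result is usually converted to a paddle tensor and copied into the
# model's (non-trainable) position embedding parameter.
```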
diff --git a/self_supervised_learning/MAE/random_erasing.py b/self_supervised_learning/MAE/random_erasing.py
new file mode 100644
index 00000000..31eea465
--- /dev/null
+++ b/self_supervised_learning/MAE/random_erasing.py
@@ -0,0 +1,118 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""Random Erasing for image tensor"""
+
+import random
+import math
+import paddle
+
+
+def _get_pixels(per_pixel, rand_color, patch_size, dtype="float32"):
+ if per_pixel:
+ return paddle.normal(shape=patch_size).astype(dtype)
+ if rand_color:
+ return paddle.normal(shape=(patch_size[0], 1, 1)).astype(dtype)
+ return paddle.zeros((patch_size[0], 1, 1)).astype(dtype)
+
+
+class RandomErasing(object):
+ """
+ Args:
+ prob: probability of performing random erasing
+ min_area: Minimum percentage of erased area wrt input image area
+ max_area: Maximum percentage of erased area wrt input image area
+ min_aspect: Minimum aspect ratio of erased area
+ max_aspect: Maximum aspect ratio of erased area
+ mode: pixel color mode, in ['const', 'rand', 'pixel']
+ 'const' - erase block is constant valued 0 for all channels
+ 'rand' - erase block is a random color (same per channel)
+ 'pixel' - erase block is a random color per pixel
+ min_count: Minimum # of erasing blocks per image.
+ max_count: Maximum # of erasing blocks per image. Area per box is scaled by count;
+ per-image count is randomly chosen between min_count and max_count
+ """
+ def __init__(self, prob=0.5, min_area=0.02, max_area=1/3, min_aspect=0.3, max_aspect=None,
+ mode='const', min_count=1, max_count=None, num_splits=0):
+ self.prob = prob
+ self.min_area = min_area
+ self.max_area = max_area
+ max_aspect = max_aspect or 1 / min_aspect
+ self.log_aspect_ratio = (math.log(min_aspect), math.log(max_aspect))
+ self.min_count = min_count
+ self.max_count = max_count or min_count
+ self.num_splits = num_splits
+ mode = mode.lower()
+ self.rand_color = False
+ self.per_pixel = False
+ if mode == "rand":
+ self.rand_color = True
+ elif mode == "pixel":
+ self.per_pixel = True
+ else:
+ assert not mode or mode == "const"
+
+ def _erase(self, img, chan, img_h, img_w, dtype):
+ if random.random() > self.prob:
+ return
+ area = img_h * img_w
+ count = self.min_count if self.min_count == self.max_count else \
+ random.randint(self.min_count, self.max_count)
+ for _ in range(count):
+ for attempt in range(10):
+ target_area = random.uniform(self.min_area, self.max_area) * area / count
+ aspect_ratio = math.exp(random.uniform(*self.log_aspect_ratio))
+ h = int(round(math.sqrt(target_area * aspect_ratio)))
+ w = int(round(math.sqrt(target_area / aspect_ratio)))
+ if w < img_w and h < img_h:
+ top = random.randint(0, img_h - h)
+ left = random.randint(0, img_w - w)
+ img[:, top:top+h, left:left+w] = _get_pixels(
+ self.per_pixel, self.rand_color, (chan, h, w),
+ dtype=dtype)
+ break
+
+ def __call__(self, input):
+ if len(input.shape) == 3:
+ self._erase(input, *input.shape, input.dtype)
+ else:
+ batch_size, chan, img_h, img_w = input.shape
+ batch_start = batch_size // self.num_splits if self.num_splits > 1 else 0
+ for i in range(batch_start, batch_size):
+ self._erase(input[i], chan, img_h, img_w, input.dtype)
+ return input
+
+
+
+#def main():
+# re = RandomErasing(prob=1.0, min_area=0.2, max_area=0.6, mode='rand')
+# #re = RandomErasing(prob=1.0, min_area=0.2, max_area=0.6, mode='const')
+# #re = RandomErasing(prob=1.0, min_area=0.2, max_area=0.6, mode='pixel')
+# import PIL.Image as Image
+# import numpy as np
+# paddle.set_device('cpu')
+# img = paddle.to_tensor(np.asarray(Image.open('./lenna.png'))).astype('float32')
+# img = img / 255.0
+# img = paddle.transpose(img, [2, 0, 1])
+# new_img = re(img)
+# new_img = new_img * 255.0
+# new_img = paddle.transpose(new_img, [1, 2, 0])
+# new_img = new_img.cpu().numpy()
+# new_img = Image.fromarray(new_img.astype('uint8'))
+# new_img.save('./res.png')
+#
+#
+#
+#if __name__ == "__main__":
+# main()
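+
+
+# Another minimal sketch (illustrative only): RandomErasing operates on a CHW
+# paddle.Tensor, so in a training pipeline it would typically be placed after
+# ToTensor, e.g.:
+#
+# from paddle.vision import transforms
+# train_transforms = transforms.Compose([
+#     transforms.RandomResizedCrop(224),
+#     transforms.ToTensor(),
+#     RandomErasing(prob=0.25, mode='pixel'),
+# ])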
diff --git a/self_supervised_learning/MAE/run_finetune_vit_b.sh b/self_supervised_learning/MAE/run_finetune_vit_b.sh
new file mode 100644
index 00000000..ecc1fc10
--- /dev/null
+++ b/self_supervised_learning/MAE/run_finetune_vit_b.sh
@@ -0,0 +1,7 @@
+python3 -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" main_multi_gpu_finetune.py \
+-cfg='./configs/vit_base_patch16_224_finetune.yaml' \
+-dataset='imagenet2012' \
+-batch_size=32 \
+-data_path='/dataset/imagenet' \
+-pretrained='./mae_pretrain_vit_base.pdparams' \
+-amp \
diff --git a/self_supervised_learning/MAE/run_finetune_vit_b_1node.sh b/self_supervised_learning/MAE/run_finetune_vit_b_1node.sh
new file mode 100644
index 00000000..d3fd78ad
--- /dev/null
+++ b/self_supervised_learning/MAE/run_finetune_vit_b_1node.sh
@@ -0,0 +1,8 @@
+python3 -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" main_multi_gpu_finetune.py \
+-cfg='./configs/vit_base_patch16_224_finetune.yaml' \
+-dataset='imagenet2012' \
+-batch_size=2 \
+-data_path='/dataset/imagenet' \
+-pretrained='./mae_pretrain_vit_base.pdparams' \
+-accum_iter=4 \
+-amp \
diff --git a/self_supervised_learning/MAE/run_linearprobe_vit_b.sh b/self_supervised_learning/MAE/run_linearprobe_vit_b.sh
new file mode 100644
index 00000000..6b07f906
--- /dev/null
+++ b/self_supervised_learning/MAE/run_linearprobe_vit_b.sh
@@ -0,0 +1,9 @@
+#CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+#python main_multi_gpu_linearprobe.py \
+GLOG_v=0 python3 -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" main_multi_gpu_linearprobe.py \
+-cfg='./configs/vit_base_patch16_224_linearprobe_single_node.yaml' \
+-dataset='imagenet2012' \
+-batch_size=512 \
+-data_path='/dataset/imagenet' \
+-pretrained='./mae_pretrain_vit_base.pdparams' \
+-amp \
diff --git a/self_supervised_learning/MAE/run_linearprobe_vit_b_1node.sh b/self_supervised_learning/MAE/run_linearprobe_vit_b_1node.sh
new file mode 100644
index 00000000..6b07f906
--- /dev/null
+++ b/self_supervised_learning/MAE/run_linearprobe_vit_b_1node.sh
@@ -0,0 +1,9 @@
+#CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+#python main_multi_gpu_linearprobe.py \
+GLOG_v=0 python3 -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" main_multi_gpu_linearprobe.py \
+-cfg='./configs/vit_base_patch16_224_linearprobe_single_node.yaml' \
+-dataset='imagenet2012' \
+-batch_size=512 \
+-data_path='/dataset/imagenet' \
+-pretrained='./mae_pretrain_vit_base.pdparams' \
+-amp \
diff --git a/self_supervised_learning/MAE/run_pretrain_vit_b.sh b/self_supervised_learning/MAE/run_pretrain_vit_b.sh
new file mode 100644
index 00000000..a053f1fd
--- /dev/null
+++ b/self_supervised_learning/MAE/run_pretrain_vit_b.sh
@@ -0,0 +1,8 @@
+#CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 \
+#python main_multi_gpu_pretrain.py \
+GLOG_v=0 python3 -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7" main_multi_gpu_pretrain.py \
+-cfg='./configs/vit_base_patch16_224_pretrain.yaml' \
+-dataset='imagenet2012' \
+-batch_size=64 \
+-data_path='/dataset/imagenet' \
+-amp \
diff --git a/image_classification/MAE/transformer.py b/self_supervised_learning/MAE/transformer.py
similarity index 55%
rename from image_classification/MAE/transformer.py
rename to self_supervised_learning/MAE/transformer.py
index 62704ed8..1286bc77 100644
--- a/image_classification/MAE/transformer.py
+++ b/self_supervised_learning/MAE/transformer.py
@@ -24,18 +24,7 @@
import paddle.nn.functional as F
from droppath import DropPath
from config import get_config
-
-
-def get_position_encoding(seq_len, embed_dim):
- """ sinusoid position encoding table"""
- def get_position_angle_vec(embed_dim, position):
- return [position / np.power(10000, 2 * (hid_j // 2) / embed_dim) for hid_j in range(embed_dim)]
-
- sinusoid_table = np.array([get_position_angle_vec(embed_dim, pos_i) for pos_i in range(seq_len)])
- sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i
- sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1
- position_embedding = paddle.to_tensor([sinusoid_table])
- return position_embedding
+from pos_embed import get_2d_sincos_pos_embed
class Identity(nn.Layer):
@@ -44,41 +33,24 @@ class Identity(nn.Layer):
Use this layer to avoid using 'if' condition in forward methods
"""
def __init__(self):
- super().__init__()
+ super(Identity, self).__init__()
def forward(self, x):
return x
-class PositionalEmbedding(nn.Layer):
- """Position Embedding
-
- Apply positional embedding on input images.
-
- Attributes:
- position_embedding: sine-cosine version positional embedding
+def get_position_encoding(seq_len, embed_dim):
+ """ sinusoid position encoding table
+ Note: not used in MAE, use get_2d_sincos_pos_embed instead
"""
- def __init__(self, embed_dim, seq_len=197):
- """ Sinusoid position encoding table """
- super().__init__()
- self.seq_len = seq_len
-
- def get_position_angle_vec(embed_dim, position):
- return [position / np.power(10000, 2 * (hid_j // 2) / embed_dim) for hid_j in range(embed_dim)]
-
- sinusoid_table = np.array([get_position_angle_vec(
- embed_dim, pos_i) for pos_i in range(seq_len)])
- sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i
- sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1
- position_embedding = paddle.to_tensor([sinusoid_table])
-
- self.register_buffer('position_embedding',
- position_embedding)
+ def get_position_angle_vec(embed_dim, position):
+ return [position / np.power(10000, 2 * (hid_j // 2) / embed_dim) for hid_j in range(embed_dim)]
- def get_positional_embedding(self, seq_length=None):
- if seq_length is None:
- seq_length = self.seq_len
- return self.position_embedding[:, :seq_length, :]
+ sinusoid_table = np.array([get_position_angle_vec(embed_dim, pos_i) for pos_i in range(seq_len)])
+ sinusoid_table[:, 0::2] = np.sin(sinusoid_table[:, 0::2]) # dim 2i
+ sinusoid_table[:, 1::2] = np.cos(sinusoid_table[:, 1::2]) # dim 2i+1
+ position_embedding = paddle.to_tensor([sinusoid_table])
+ return position_embedding
class PatchEmbedding(nn.Layer):
@@ -98,29 +70,29 @@ def __init__(self,
embed_dim=768,
dropout=0.):
super().__init__()
- n_patches = (image_size // patch_size) * (image_size // patch_size)
-
+ self.n_patches = (image_size // patch_size) * (image_size // patch_size)
+ w_attr_1, b_attr_1 = self._init_weights()
self.patch_embedding = nn.Conv2D(in_channels=in_channels,
out_channels=embed_dim,
kernel_size=patch_size,
- stride=patch_size)
-
- self.cls_token = paddle.create_parameter(
- shape=[1, 1, embed_dim],
- dtype='float32',
- default_initializer=paddle.nn.initializer.Constant(0))
-
+ stride=patch_size,
+ weight_attr=w_attr_1,
+ bias_attr=b_attr_1)
self.dropout = nn.Dropout(dropout)
+ def _init_weights(self):
+ weight_attr = paddle.ParamAttr(
+ initializer=nn.initializer.XavierUniform()) # MAE
+ bias_attr = paddle.ParamAttr(
+ initializer=nn.initializer.Constant(0.0))
+ return weight_attr, bias_attr
+
def forward(self, x):
- cls_tokens = self.cls_token.expand(
- (x.shape[0], -1, -1))
x = self.patch_embedding(x)
x = x.flatten(2)
x = x.transpose([0, 2, 1])
- x = paddle.concat((cls_tokens, x), axis=1)
- embeddings = self.dropout(x)
- return embeddings
+ x = self.dropout(x)
+ return x
class Attention(nn.Layer):
@@ -140,6 +112,7 @@ class Attention(nn.Layer):
proj_dropout: final dropout before output
softmax: softmax op for attention
"""
+
def __init__(self,
embed_dim,
num_heads,
@@ -171,7 +144,7 @@ def __init__(self,
def _init_weights(self):
weight_attr = paddle.ParamAttr(
- initializer=nn.initializer.TruncatedNormal(std=.02))
+ initializer=nn.initializer.XavierUniform()) # MAE
bias_attr = paddle.ParamAttr(
initializer=nn.initializer.Constant(0.0))
return weight_attr, bias_attr
@@ -186,8 +159,8 @@ def forward(self, x):
qkv = self.qkv(x).chunk(3, axis=-1)
q, k, v = map(self.transpose_multihead, qkv)
+ q = q * self.scales
attn = paddle.matmul(q, k, transpose_y=True)
- attn = attn * self.scales
attn = self.softmax(attn)
attn = self.attn_dropout(attn)
@@ -211,9 +184,9 @@ class Mlp(nn.Layer):
fc1: nn.Linear
fc2: nn.Linear
act: GELU
- dropout1: dropout after fc1
- dropout2: dropout after fc2
+ dropout: dropout applied after fc1 and fc2
"""
+
def __init__(self,
embed_dim,
mlp_ratio,
@@ -231,12 +204,11 @@ def __init__(self,
weight_attr=w_attr_2,
bias_attr=b_attr_2)
self.act = nn.GELU()
- self.dropout1 = nn.Dropout(dropout)
- self.dropout2 = nn.Dropout(dropout)
+ self.dropout = nn.Dropout(dropout)
def _init_weights(self):
weight_attr = paddle.ParamAttr(
- initializer=paddle.nn.initializer.TruncatedNormal(std=.02))
+ initializer=paddle.nn.initializer.XavierUniform()) # MAE
bias_attr = paddle.ParamAttr(
initializer=paddle.nn.initializer.Constant(0.0))
return weight_attr, bias_attr
@@ -244,9 +216,9 @@ def _init_weights(self):
def forward(self, x):
x = self.fc1(x)
x = self.act(x)
- x = self.dropout1(x)
+ x = self.dropout(x)
x = self.fc2(x)
- x = self.dropout2(x)
+ x = self.dropout(x)
return x
@@ -262,6 +234,7 @@ class TransformerLayer(nn.Layer):
mlp: mlp modual
attn: attention modual
"""
+
def __init__(self,
embed_dim,
num_heads,
@@ -271,26 +244,22 @@ def __init__(self,
attention_dropout=0.,
droppath=0.):
super().__init__()
-
w_attr_1, b_attr_1 = self._init_weights()
self.attn_norm = nn.LayerNorm(embed_dim,
weight_attr=w_attr_1,
bias_attr=b_attr_1,
epsilon=1e-6)
-
self.attn = Attention(embed_dim,
num_heads,
qkv_bias,
dropout,
attention_dropout)
self.drop_path = DropPath(droppath) if droppath > 0. else Identity()
-
w_attr_2, b_attr_2 = self._init_weights()
self.mlp_norm = nn.LayerNorm(embed_dim,
weight_attr=w_attr_2,
bias_attr=b_attr_2,
epsilon=1e-6)
-
self.mlp = Mlp(embed_dim, mlp_ratio, dropout)
def _init_weights(self):
@@ -321,8 +290,9 @@ class Encoder(nn.Layer):
Attributes:
layers: nn.LayerList contains multiple TransformerLayers
- encoder_norm: nn.LayerNorm which is applied after last encoder layer
+ norm: nn.LayerNorm which is applied after last encoder layer
"""
+
def __init__(self,
embed_dim,
num_heads,
@@ -331,28 +301,30 @@ def __init__(self,
mlp_ratio=4.0,
dropout=0.,
attention_dropout=0.,
- droppath=0.):
- super().__init__()
+ droppath=0.,
+ has_norm=True):
+ super(Encoder, self).__init__()
# stochastic depth decay
depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
layer_list = []
for i in range(depth):
layer_list.append(TransformerLayer(embed_dim,
- num_heads,
- qkv_bias,
- mlp_ratio,
- dropout,
- attention_dropout,
- droppath=depth_decay[i]))
- # new paddle version fix this, deepcopy is no longer needed
- # layer_list.append(copy.deepcopy(encoder_layer))
+ num_heads,
+ qkv_bias,
+ mlp_ratio,
+ dropout,
+ attention_dropout,
+ droppath=depth_decay[i]))
self.layers = nn.LayerList(layer_list)
- w_attr, b_attr = self._init_weights()
- self.encoder_norm = nn.LayerNorm(embed_dim,
- weight_attr=w_attr,
- bias_attr=b_attr,
- epsilon=1e-6)
+ # the final norm is optional so it can be moved to the upper level for global_pool (no cls_token) settings
+ self.has_norm = has_norm
+ if has_norm:
+ w_attr, b_attr = self._init_weights()
+ self.norm = nn.LayerNorm(embed_dim,
+ weight_attr=w_attr,
+ bias_attr=b_attr,
+ epsilon=1e-6)
def _init_weights(self):
weight_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(1.0))
@@ -362,8 +334,10 @@ def _init_weights(self):
def forward(self, x):
for layer in self.layers:
x = layer(x)
- out = self.encoder_norm(x)
- return out
+
+ if self.has_norm:
+ x = self.norm(x)
+ return x
class Decoder(nn.Layer):
@@ -373,7 +347,7 @@ class Decoder(nn.Layer):
Attributes:
layers: nn.LayerList contains multiple TransformerLayers
- decoder_norm: nn.LayerNorm which is applied after last encoder layer
+ norm: nn.LayerNorm which is applied after the last decoder layer
"""
def __init__(self,
@@ -385,7 +359,7 @@ def __init__(self,
dropout=0.,
attention_dropout=0.,
droppath=0.):
- super().__init__()
+ super(Decoder, self).__init__()
# stochastic depth decay
depth_decay = [x.item() for x in paddle.linspace(0, droppath, depth)]
@@ -398,29 +372,23 @@ def __init__(self,
dropout,
attention_dropout,
droppath=depth_decay[i]))
- # new paddle version fix this, deepcopy is no longer needed
- # layer_list.append(copy.deepcopy(encoder_layer))
self.layers = nn.LayerList(layer_list)
w_attr, b_attr = self._init_weights()
- self.decoder_norm = nn.LayerNorm(embed_dim,
- weight_attr=w_attr,
- bias_attr=b_attr,
- epsilon=1e-6)
+ self.norm = nn.LayerNorm(embed_dim,
+ weight_attr=w_attr,
+ bias_attr=b_attr,
+ epsilon=1e-6)
def _init_weights(self):
weight_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(1.0))
bias_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(0.0))
return weight_attr, bias_attr
- def forward(self, x, mask_len=0):
+ def forward(self, x):
for layer in self.layers:
x = layer(x)
- if mask_len > 0:
- # only sustain masked patches
- out = self.decoder_norm(x[:, -mask_len:])
- else:
- out = self.decoder_norm(x)
+ out = self.norm(x)
return out
@@ -456,102 +424,215 @@ def __init__(self,
encoder_depth=12,
decoder_depth=8,
encoder_num_heads=12,
- decoder_num_heads=8,
+ decoder_num_heads=16,
mlp_ratio=4,
qkv_bias=True,
dropout=0.,
attention_dropout=0.,
- droppath=0.):
+ droppath=0.,
+ norm_pix_loss=False):
super().__init__()
- self.patch_size = patch_size
self.num_patches = (image_size // patch_size) * (image_size // patch_size)
- self.mask_token = paddle.create_parameter(
- shape=[1, 1, decoder_embed_dim],
+ self.patch_size = patch_size
+ # -------------------- Encoder --------------------
+ self.patch_embedding = PatchEmbedding(
+ image_size,
+ patch_size,
+ in_channels,
+ encoder_embed_dim,
+ dropout)
+
+ self.cls_token = paddle.create_parameter(
+ shape=[1, 1, encoder_embed_dim],
dtype='float32',
- default_initializer=paddle.nn.initializer.TruncatedNormal(std=.02))
- self.perm = None
- self.mask_num = None
- # create positional embedding
- self.encoder_position_embedding = get_position_encoding(seq_len=1 + self.num_patches,
- embed_dim=encoder_embed_dim)
- self.decoder_position_embedding = get_position_encoding(seq_len=1 + self.num_patches,
- embed_dim=decoder_embed_dim)
- # create patch embedding with positional embedding
- self.patch_embedding = PatchEmbedding(image_size,
- patch_size,
- in_channels,
- encoder_embed_dim,
- dropout)
- # create multi head self-attention encoder
- self.encoder = Encoder(encoder_embed_dim,
- encoder_num_heads,
- encoder_depth,
- qkv_bias,
- mlp_ratio,
- dropout,
- attention_dropout,
- droppath)
+ default_initializer=paddle.nn.initializer.TruncatedNormal(std=.02)) #MAE
+
+ pos_embed = get_2d_sincos_pos_embed(embed_dim=encoder_embed_dim,
+ grid_size=int(self.num_patches ** 0.5),
+ cls_token=True)
+ self.encoder_position_embedding = paddle.create_parameter(
+ shape=[1, 1 + self.num_patches, encoder_embed_dim],
+ dtype='float32',
+ default_initializer=paddle.nn.initializer.Assign(
+ paddle.to_tensor(pos_embed, dtype='float32').unsqueeze(0)
+ )
+ )
+ self.encoder_position_embedding.stop_gradient = True
+
+ self.encoder = Encoder(
+ encoder_embed_dim,
+ encoder_num_heads,
+ encoder_depth,
+ qkv_bias,
+ mlp_ratio,
+ dropout,
+ attention_dropout,
+ droppath)
+
+ # -------------------- Decoder --------------------
# the embed_dim is different in encoder and decoder, so add a linear layer
w_attr_1, b_attr_1 = self._init_weights()
- self.linear_projection = nn.Linear(encoder_embed_dim,
- decoder_embed_dim,
- weight_attr=w_attr_1,
- bias_attr=b_attr_1)
- # create multi head self-attention decoder
- self.decoder = Decoder(decoder_embed_dim,
- decoder_num_heads,
- decoder_depth,
- qkv_bias,
- mlp_ratio,
- dropout,
- attention_dropout,
- droppath)
+ self.linear_projection = nn.Linear(
+ encoder_embed_dim,
+ decoder_embed_dim,
+ weight_attr=w_attr_1,
+ bias_attr=b_attr_1)
+
+ self.mask_token = paddle.create_parameter(
+ shape=[1, 1, decoder_embed_dim],
+ dtype='float32',
+ default_initializer=paddle.nn.initializer.TruncatedNormal(std=.02)) #MAE
+
+ pos_embed = get_2d_sincos_pos_embed(embed_dim=decoder_embed_dim,
+ grid_size=int(self.num_patches ** 0.5),
+ cls_token=True)
+ self.decoder_position_embedding = paddle.create_parameter(
+ shape=[1, 1 + self.num_patches, decoder_embed_dim],
+ dtype='float32',
+ default_initializer=paddle.nn.initializer.Assign(
+ paddle.to_tensor(pos_embed, dtype='float32').unsqueeze(0)
+ )
+ )
+ self.decoder_position_embedding.stop_gradient = True
+
+ self.decoder = Decoder(
+ decoder_embed_dim,
+ decoder_num_heads,
+ decoder_depth,
+ qkv_bias,
+ mlp_ratio,
+ dropout,
+ attention_dropout,
+ droppath)
+
# create reconstruction layer
w_attr_2, b_attr_2 = self._init_weights()
- self.reconstruction_layer = nn.Linear(decoder_embed_dim,
- in_channels * patch_size * patch_size,
- weight_attr=w_attr_2,
- bias_attr=b_attr_2)
+ self.decoder_pred = nn.Linear(
+ decoder_embed_dim,
+ in_channels * patch_size * patch_size,
+ weight_attr=w_attr_2,
+ bias_attr=b_attr_2)
+
+ self.norm_pix_loss = norm_pix_loss
def _init_weights(self):
weight_attr = paddle.ParamAttr(
- initializer=paddle.nn.initializer.TruncatedNormal(std=.02))
+ initializer=nn.initializer.XavierUniform()) # MAE
bias_attr = paddle.ParamAttr(
- initializer=paddle.nn.initializer.Constant(0.0))
+ initializer=nn.initializer.Constant(0.0))
return weight_attr, bias_attr
- def forward(self, x, masks):
- # x: [B, C, H, W]
- x = self.patch_embedding(x)
- # x: [B, num_patches, embed_dim]
- B, N, C = x.shape # B: batch_size, N: num_patches, C: embed_dim
- # mask: [B, num_patches], visible set to 0, masked set to 1
-
- # add pos embed
- x += self.encoder_position_embedding.clone().detach()
- # get no mask patches
- no_mask_x = x[~masks] # [B*0.25*L, embed_dim]
- # index slicing needs reshape back in paddle: [B, 0.25L, embed_dim]
- no_mask_x = no_mask_x.reshape([B, -1, C])
- # encoder
- enc_out = self.encoder(no_mask_x)
- # encoder to decoder linear proj
- enc_out = self.linear_projection(enc_out)
- # shuffle the position embedding is equivalent to unshuffling tokens
- expand_pos_embed = self.decoder_position_embedding.expand([B, -1, -1]).clone().detach()
- pos_embed_no_mask = expand_pos_embed[~masks].reshape([B, -1, enc_out.shape[-1]])
- pos_embed_mask = expand_pos_embed[masks].reshape([B, -1, enc_out.shape[-1]])
- # dec in put, here use broadcasting for mask_token
- dec_in = paddle.concat([enc_out + pos_embed_no_mask, self.mask_token + pos_embed_mask], axis=1)
- # decoder
- mask_len = pos_embed_mask.shape[1]
- dec_out = self.decoder(dec_in, mask_len)
- # reconstruct patches
- output = self.reconstruction_layer(dec_out)
- return output
-
-
-class MAEFinetuneTransformer(nn.Layer):
+ def patchify(self, images):
+ n_patches = images.shape[2] // self.patch_size
+ x = images.reshape([images.shape[0], # N
+ images.shape[1], # C
+ n_patches, # h
+ self.patch_size, # p
+ n_patches, # w
+ self.patch_size]) # p
+ x = x.transpose([0, 2, 4, 3, 5, 1])
+ x = x.reshape([images.shape[0], n_patches * n_patches, -1])
+ return x
+
+ def unpatchify(self, x):
+ n_patches = int(x.shape[1]**.5)
+
+ x = x.reshape([x.shape[0], # N
+ n_patches, # h
+ n_patches, # w
+ self.patch_size, # p
+ self.patch_size, # p
+ -1]) # C
+ x = x.transpose([0, 5, 1, 3, 2, 4])
+ x = x.reshape([x.shape[0], -1, n_patches * self.patch_size, n_patches * self.patch_size])
+ return x
+
+ def random_masking(self, x, mask_ratio, rand_probs=None):
+ """
+ Shuffle x then mask the last few tokens according to mask ratio.
+ Args:
+ x: tensor of [batch, seq_len, encoder_embed_dim]
+ mask_ratio: float, masking ratio
+ Returns:
+ x_masked: tensor of [batch, keep_len, encoder_embed_dim], the kept (visible) tokens
+ mask: tensor of [batch, seq_len], 0 for kept tokens and 1 for masked tokens
+ restore_ids: tensor of [batch, seq_len], indices that restore the original token order
+ """
+ batch_size, seq_len, embed_dim = x.shape
+ keep_len = int(seq_len * (1 - mask_ratio))
+ # rand_probs can be passed in (e.g. for debugging) to make the masking deterministic
+ rand_probs = rand_probs if rand_probs is not None else paddle.rand([batch_size, seq_len])
+ shuffle_ids = paddle.argsort(rand_probs, axis=-1)
+ restore_ids = paddle.argsort(shuffle_ids, axis=-1)
+
+ keep_ids = shuffle_ids[:, :keep_len]
+ ids = keep_ids + (paddle.arange(batch_size) * seq_len).unsqueeze(-1).expand([batch_size, -1])
+ x_masked = paddle.gather(x.flatten(0, 1), index=ids.flatten(), axis=0).reshape([batch_size, keep_len, -1])
+
+ mask = paddle.ones([batch_size, seq_len])
+ mask[:, :keep_len] = 0
+
+ restore_ids_expand = restore_ids + (paddle.arange(batch_size) * seq_len).unsqueeze(-1).expand([batch_size, -1])
+ mask = paddle.gather(mask.flatten(), index=restore_ids_expand.flatten()).reshape([batch_size, seq_len])
+ return x_masked, mask, restore_ids
+
+ def forward_encoder(self, images, mask_ratio, rand_probs=None):
+ x = self.patch_embedding(images)
+ # add pos embed w/o cls token
+ x = x + self.encoder_position_embedding[:, 1:, :]
+ # masking
+ x, mask, ids_restore = self.random_masking(x, mask_ratio, rand_probs)
+ # append cls token
+ cls_token = self.cls_token + self.encoder_position_embedding[:, :1, :]
+ cls_tokens = cls_token.expand((x.shape[0], -1, -1))
+ x = paddle.concat((cls_tokens, x), axis=1)
+ x = self.encoder(x)
+ return x, mask, ids_restore
+
+ def forward_decoder(self, x, ids_restore):
+ x = self.linear_projection(x) # [batch, keep_len+1(cls_token), decoder_embed_dim]
+ # self.mask_token: [1, 1, decoder_embed_dim]
+ # ids_restore: [batch, num_patches]
+ # mask_tokens: [batch, masked_len, decoder_embed_dim]
+ mask_tokens = self.mask_token.expand([x.shape[0], ids_restore.shape[1] + 1 - x.shape[1], -1])
+ # x_: [batch, num_patches, decoder_embed_dim]
+ x_ = paddle.concat([x[:, 1:, :], mask_tokens], axis=1) # no cls token
+ x_shape = x_.shape
+ batch_size = x_shape[0]
+ seq_len = x_shape[1]
+
+ # The following ops ensure the paddle gather_nd op behaves like the pytorch gather op.
+ ids_restore_expand = ids_restore + (paddle.arange(batch_size) * seq_len).unsqueeze(-1).expand([batch_size, -1])
+ x_ = paddle.gather_nd(x_.flatten(0, 1), index=ids_restore_expand.flatten().unsqueeze(-1))
+ x_ = x_.reshape(x_shape)
+
+ x = paddle.concat([x[:, :1, :], x_], axis=1) # append cls token
+
+ x = x + self.decoder_position_embedding
+ x = self.decoder(x)
+ x = self.decoder_pred(x)
+ x = x[:, 1:, :]
+
+ return x
+
+ def forward_loss(self, images, pred, mask):
+ target = self.patchify(images)
+ if self.norm_pix_loss:
+ mean = target.mean(axis=-1, keepdim=True)
+ var = target.var(axis=-1, keepdim=True)
+ target = (target - mean) / (var + 1.e-6) ** 0.5
+ loss = (pred - target) ** 2
+ loss = loss.mean(axis=-1) # mean loss per patch
+ loss = (loss * mask).sum() / mask.sum() # mean loss on removed patches
+ return loss
+
+ def forward(self, images, mask_ratio=0.75, rand_probs=None):
+ encoder_out, mask, restore_ids = self.forward_encoder(images, mask_ratio, rand_probs)
+ decoder_out = self.forward_decoder(encoder_out, restore_ids)
+ loss = self.forward_loss(images, decoder_out, mask)
+ return loss, decoder_out, mask
+
+
+class MAETransformer(nn.Layer):
"""ViT transformer
ViT Transformer, classifier is a single Linear layer for finetune,
@@ -583,20 +664,32 @@ def __init__(self,
num_heads=12,
mlp_ratio=4,
qkv_bias=True,
+ global_pool=False,
dropout=0.,
attention_dropout=0.,
droppath=0.):
super().__init__()
- self.num_patches = (image_size // patch_size) * (image_size // patch_size)
- # create positional embedding
- self.encoder_position_embedding = get_position_encoding(seq_len=1 + self.num_patches,
- embed_dim=embed_dim)
+ self.global_pool = global_pool
# create patch embedding with positional embedding
self.patch_embedding = PatchEmbedding(image_size,
patch_size,
in_channels,
embed_dim,
dropout)
+ # create positional embedding
+ self.encoder_position_embedding = paddle.create_parameter(
+ shape=[1, 1 + self.patch_embedding.n_patches, embed_dim],
+ dtype='float32',
+ default_initializer=paddle.nn.initializer.Assign(
+ get_position_encoding(seq_len=1 + self.patch_embedding.n_patches,
+ embed_dim=embed_dim)
+ )
+ )
+ # create class token
+ self.cls_token = paddle.create_parameter(
+ shape=[1, 1, embed_dim],
+ dtype='float32',
+ default_initializer=paddle.nn.initializer.Constant(0))
# create multi head self-attention encoder
self.encoder = Encoder(embed_dim,
num_heads,
@@ -605,57 +698,95 @@ def __init__(self,
mlp_ratio,
dropout,
attention_dropout,
- droppath)
+ droppath,
+ has_norm=not global_pool)
+ # define encoder norm here so it is applied after global pooling, without the cls_token (when global_pool is True)
+ if global_pool:
+ w_attr, b_attr = self._init_weights_norm()
+ self.encoder_norm = nn.LayerNorm(embed_dim,
+ weight_attr=w_attr,
+ bias_attr=b_attr,
+ epsilon=1e-6)
# classifier head (for finetuning)
- w_attr_1, b_attr_1 = self._init_weights()
+ w_attr_1, b_attr_1 = self._init_weights_classifier()
self.classifier = nn.Linear(embed_dim,
num_classes,
weight_attr=w_attr_1,
bias_attr=b_attr_1)
- def forward(self, x):
+
+ def forward_features(self, x):
x = self.patch_embedding(x)
- # add pos embed
- x += self.encoder_position_embedding.clone().detach()
+ cls_tokens = self.cls_token.expand((x.shape[0], -1, -1))
+ x = paddle.concat((cls_tokens, x), axis=1)
+ x = x + self.encoder_position_embedding
x = self.encoder(x)
- logits = self.classifier(x[:, 0]) # take only cls_token as classifier
+
+ if self.global_pool:
+ x = x[:, 1:, :].mean(axis=1) # global pool w/o cls_token
+ out = self.encoder_norm(x)
+ else:
+ # norm is applied in encoder
+ out = x[:, 0] # return cls_token only
+
+ return out
+
+ def forward(self, x):
+ x = self.forward_features(x)
+ logits = self.classifier(x)
+
return logits
- def _init_weights(self):
- weight_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.TruncatedNormal(std=.02))
+ def _init_weights_norm(self):
+ weight_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(1.0))
+ bias_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(0.0))
+ return weight_attr, bias_attr
+
+ def _init_weights_linear(self):
+ weight_attr = paddle.ParamAttr(initializer=nn.initializer.TruncatedNormal(std=0.02)) # MAE linearprobe
+ bias_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(0))
+ return weight_attr, bias_attr
+
+ def _init_weights_classifier(self):
+ weight_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.TruncatedNormal(std=0.01))
bias_attr = paddle.ParamAttr(initializer=paddle.nn.initializer.Constant(0))
return weight_attr, bias_attr
def build_mae_pretrain(config):
+ """ build MAE vit model for pretraining"""
model = MAEPretrainTransformer(image_size=config.DATA.IMAGE_SIZE,
- patch_size=config.MODEL.TRANS.PATCH_SIZE,
- in_channels=3,
- encoder_embed_dim=config.MODEL.TRANS.ENCODER.EMBED_DIM,
- decoder_embed_dim=config.MODEL.TRANS.DECODER.EMBED_DIM,
- encoder_depth=config.MODEL.TRANS.ENCODER.DEPTH,
- decoder_depth=config.MODEL.TRANS.DECODER.DEPTH,
- encoder_num_heads=config.MODEL.TRANS.ENCODER.NUM_HEADS,
- decoder_num_heads=config.MODEL.TRANS.DECODER.NUM_HEADS,
- mlp_ratio=config.MODEL.TRANS.MLP_RATIO,
- qkv_bias=config.MODEL.TRANS.QKV_BIAS,
+ patch_size=config.MODEL.PATCH_SIZE,
+ in_channels=config.DATA.IMAGE_CHANNELS,
+ encoder_embed_dim=config.MODEL.ENCODER.EMBED_DIM,
+ decoder_embed_dim=config.MODEL.DECODER.EMBED_DIM,
+ encoder_depth=config.MODEL.ENCODER.DEPTH,
+ decoder_depth=config.MODEL.DECODER.DEPTH,
+ encoder_num_heads=config.MODEL.ENCODER.NUM_HEADS,
+ decoder_num_heads=config.MODEL.DECODER.NUM_HEADS,
+ mlp_ratio=config.MODEL.MLP_RATIO,
+ qkv_bias=config.MODEL.QKV_BIAS,
dropout=config.MODEL.DROPOUT,
attention_dropout=config.MODEL.ATTENTION_DROPOUT,
- droppath=config.MODEL.DROPPATH)
+ droppath=config.MODEL.DROPPATH,
+ norm_pix_loss=config.MODEL.NORM_PIX_LOSS)
return model
-def build_mae_finetune(config):
- model = MAEFinetuneTransformer(image_size=config.DATA.IMAGE_SIZE,
- patch_size=config.MODEL.TRANS.PATCH_SIZE,
- in_channels=3,
- embed_dim=config.MODEL.TRANS.ENCODER.EMBED_DIM,
- depth=config.MODEL.TRANS.ENCODER.DEPTH,
- num_heads=config.MODEL.TRANS.ENCODER.NUM_HEADS,
- mlp_ratio=config.MODEL.TRANS.MLP_RATIO,
- qkv_bias=config.MODEL.TRANS.QKV_BIAS,
- dropout=config.MODEL.DROPOUT,
- attention_dropout=config.MODEL.ATTENTION_DROPOUT,
- droppath=config.MODEL.DROPPATH)
+def build_transformer(config):
+ """ build vit model for finetuning and linear probing"""
+ model = MAETransformer(image_size=config.DATA.IMAGE_SIZE,
+ patch_size=config.MODEL.PATCH_SIZE,
+ in_channels=config.DATA.IMAGE_CHANNELS,
+ num_classes=config.MODEL.NUM_CLASSES,
+ embed_dim=config.MODEL.ENCODER.EMBED_DIM,
+ depth=config.MODEL.ENCODER.DEPTH,
+ num_heads=config.MODEL.ENCODER.NUM_HEADS,
+ mlp_ratio=config.MODEL.MLP_RATIO,
+ qkv_bias=config.MODEL.QKV_BIAS,
+ global_pool=config.MODEL.GLOBAL_POOL,
+ dropout=config.MODEL.DROPOUT,
+ attention_dropout=config.MODEL.ATTENTION_DROPOUT,
+ droppath=config.MODEL.DROPPATH)
return model
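+
+
+# A minimal end-to-end sketch (illustrative only, the hyper-parameters below are examples):
+# build the pretraining model directly and run one forward pass; as defined above,
+# forward returns (loss, reconstructed_patches, mask).
+#
+# model = MAEPretrainTransformer(image_size=224,
+#                                patch_size=16,
+#                                in_channels=3,
+#                                encoder_embed_dim=768,
+#                                decoder_embed_dim=512,
+#                                encoder_depth=12,
+#                                decoder_depth=8,
+#                                encoder_num_heads=12,
+#                                decoder_num_heads=16,
+#                                norm_pix_loss=True)
+# images = paddle.randn([2, 3, 224, 224])
+# loss, pred, mask = model(images, mask_ratio=0.75)
+# print(loss.item(), pred.shape, mask.shape)  # pred: [2, 196, 768], mask: [2, 196]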
diff --git a/self_supervised_learning/MAE/utils.py b/self_supervised_learning/MAE/utils.py
new file mode 100644
index 00000000..3636b954
--- /dev/null
+++ b/self_supervised_learning/MAE/utils.py
@@ -0,0 +1,288 @@
+# Copyright (c) 2021 PPViT Authors. All Rights Reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""utils for ViT
+
+Contains AverageMeter for monitoring, get_exclude_from_decay_fn for training
+and WarmupCosineScheduler for training
+
+"""
+
+import logging
+import sys
+import os
+import math
+import numpy as np
+import paddle
+from paddle.optimizer.lr import LRScheduler
+
+
+def get_logger(file_path):
+ """Set logging file and format, logs are written in 2 loggers, one local_logger records
+ the information on its own gpu/process, one master_logger records the overall/average
+ information over all gpus/processes.
+ Args:
+ file_path: str, folder path of the logger files to write
+ Return:
+ local_logger: python logger for each process
+ master_logger: python logger for overall processes (on node 0)
+ """
+ local_rank = paddle.distributed.get_rank()
+ filename = os.path.join(file_path, 'log_all.txt')
+ log_format = "%(asctime)s %(message)s"
+ logging.basicConfig(filename=filename, level=logging.INFO,
+ format=log_format, datefmt="%m%d %I:%M:%S %p")
+
+ # local_logger for each process/GPU
+ local_logger = logging.getLogger(f'local_{local_rank}')
+ filename = os.path.join(file_path, f'log_{local_rank}.txt')
+ fh = logging.FileHandler(filename)
+ fh.setFormatter(logging.Formatter(log_format))
+ local_logger.addHandler(fh)
+
+ # master_logger records avg performance and general message
+ if local_rank == 0:
+ master_logger = logging.getLogger('master')
+ # log.txt
+ filename = os.path.join(file_path, 'log.txt')
+ fh = logging.FileHandler(filename)
+ fh.setFormatter(logging.Formatter(log_format))
+ master_logger.addHandler(fh)
+ # console (stdout)
+ sh_1 = logging.StreamHandler(sys.stdout)
+ sh_1.setFormatter(logging.Formatter(log_format))
+ master_logger.addHandler(sh_1)
+ # console (stderr)
+ sh_2 = logging.StreamHandler(sys.stderr)
+ sh_2.setFormatter(logging.Formatter(log_format))
+ master_logger.addHandler(sh_2)
+ else:
+ master_logger = None
+ return local_logger, master_logger
+
+
+def write_log(local_logger, master_logger, msg_local, msg_master=None, level='info'):
+ """Write messages in loggers
+ Args:
+ local_logger: python logger, logs information on single gpu
+ master_logger: python logger, logs information over all gpus
+ msg_local: str, message to log on local_logger
+ msg_master: str, message to log on master_logger, if None, use msg_local, default: None
+ level: str, log level, in ['info', 'warning', 'fatal'], default: 'info'
+ """
+ # write log to local logger
+ if local_logger:
+ if level == 'info':
+ local_logger.info(msg_local)
+ elif level == 'warning':
+ local_logger.warning(msg_local)
+ elif level == 'fatal':
+ local_logger.fatal(msg_local)
+ else:
+ raise ValueError("level must in ['info', 'warning', 'fatal']")
+ # write log to master logger on node 0
+ if master_logger and paddle.distributed.get_rank() == 0:
+ if msg_master is None:
+ msg_master = msg_local
+ if level == 'info':
+ master_logger.info("MASTER_LOG " + msg_master)
+ elif level == 'warning':
+ master_logger.warning("MASTER_LOG " + msg_master)
+ elif level == 'fatal':
+ master_logger.fatal("MASTER_LOG " + msg_master)
+ else:
+ raise ValueError("level must in ['info', 'warning', 'fatal']")
+
+
+def all_reduce_mean(x):
+ """perform all_reduce on Tensor for gathering results from multi-gpus"""
+ world_size = paddle.distributed.get_world_size()
+ if world_size > 1:
+ x_reduce = paddle.to_tensor(x)
+ paddle.distributed.all_reduce(x_reduce)
+ x_reduce = x_reduce / world_size
+ return x_reduce.item()
+ return x
+
+
+def get_params_groups(model, weight_decay=0.01):
+ regularized = []
+ not_regularized = []
+ for name, param in model.named_parameters():
+ if param.stop_gradient:
+ continue
+ # do not regularize biases and norm params
+ if name.endswith(".bias") or len(param.shape) == 1:
+ not_regularized.append(param)
+ else:
+ regularized.append(param)
+ return [{'params': regularized, 'weight_decay': weight_decay}, {'params': not_regularized, 'weight_decay': 0.}]
+
+
+def cosine_scheduler(base_value,
+ final_value,
+ epochs,
+ num_iters_per_epoch,
+ warmup_epochs=0,
+ start_warmup_value=0):
+ warmup_schedule = np.array([])
+ warmup_iters = warmup_epochs * num_iters_per_epoch
+ if warmup_epochs > 0:
+ # linear schedule for warmup epochs
+ warmup_schedule = np.linspace(start_warmup_value, base_value, warmup_iters)
+
+ iters = np.arange(epochs * num_iters_per_epoch - warmup_iters)
+ schedule = final_value + 0.5 * (base_value - final_value) * (1 + np.cos(np.pi * iters / len(iters)))
+ schedule = np.concatenate((warmup_schedule, schedule))
+ assert len(schedule) == epochs * num_iters_per_epoch
+ return schedule
+
+
+def adjust_learning_rate(optimizer,
+ base_lr,
+ min_lr,
+ cur_epoch,
+ warmup_epochs,
+ total_epochs):
+ if cur_epoch < warmup_epochs:
+ lr = base_lr * cur_epoch / warmup_epochs
+ else:
+ lr = min_lr + (base_lr - min_lr) * 0.5 * (
+ 1. + math.cos(math.pi * (cur_epoch - warmup_epochs) / (total_epochs - warmup_epochs)))
+ optimizer.set_lr(lr)
+ return lr
+
+
+def interpolate_pos_embed(model, state_dict, key_name='encoder_position_embedding'):
+ if key_name in state_dict:
+ pos_embed_w = state_dict[key_name]
+ embed_dim = pos_embed_w.shape[-1]
+ n_patches = model.patch_embedding.n_patches
+ n_extra_tokens = getattr(model, key_name).shape[-2] - n_patches
+ orig_size = int((pos_embed_w.shape[-2] - n_extra_tokens) ** 0.5)
+ new_size = int(n_patches ** 0.5)
+ if orig_size != new_size:
+ extra_tokens = pos_embed_w[:, :n_extra_tokens]
+ pos_tokens = pos_embed_w[:, n_extra_tokens:]
+ pos_tokens = pos_tokens.reshape([-1, orig_size, orig_size, embed_dim])
+ pos_tokens = pos_tokens.transpose([0, 3, 1, 2])
+ pos_tokens = paddle.nn.functional.interpolate(
+ pos_tokens, size=(new_size, new_size), mode='bicubic', align_corners=False)
+ pos_tokens = pos_tokens.transpose([0, 2, 3, 1])
+ pos_tokens = pos_tokens.flatten(1, 2)
+ new_pos_embed = paddle.concat([extra_tokens, pos_tokens], axis=1)
+ state_dict[key_name] = new_pos_embed
+
+
+class AverageMeter():
+ """ Meter for monitoring losses"""
+ def __init__(self):
+ self.avg = 0
+ self.sum = 0
+ self.cnt = 0
+ self.reset()
+
+ def reset(self):
+ """reset all values to zeros"""
+ self.avg = 0
+ self.sum = 0
+ self.cnt = 0
+
+ def update(self, val, n=1):
+ """update avg by val and n, where val is the avg of n values"""
+ self.sum += val * n
+ self.cnt += n
+ self.avg = self.sum / self.cnt
+
+
+def skip_weight_decay_fn(model, skip_list=[], filter_bias_and_bn=True):
+ """ Set params with no weight decay during the training
+
+ For certain params, e.g., positional encoding in ViT, weight decay
+ may not be needed during training; this method is used to find
+ these params.
+
+ Args:
+ model: nn.Layer, model
+ skip_list: list, a list of param names which need to be excluded
+ from weight decay, default: []
+ filter_bias_and_bn: bool, set True to exclude bias and bn in model, default: True
+ Returns:
+ exclude_from_weight_decay_fn: a function returns True if param
+ will be excluded from weight decay
+ """
+ if len(skip_list) == 0 and not filter_bias_and_bn:
+ exclude_from_weight_decay_fn = None
+ else:
+ skip_list_all = []
+ for name, param in model.named_parameters():
+ if param.stop_gradient:
+ continue
+ if len(param.shape) == 1 or name.endswith('.bias') or name in skip_list:
+ skip_list_all.append(name)
+
+ def exclude_fn(param):
+ for name in skip_list_all:
+ if param == name:
+ return False
+ return True
+ exclude_from_weight_decay_fn = exclude_fn
+ return exclude_from_weight_decay_fn
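+
+# Usage sketch (illustrative only, assuming Paddle's AdamW `apply_decay_param_fun`
+# argument): weight decay is then applied only to params for which the returned
+# function is True.
+#
+# decay_fn = skip_weight_decay_fn(model, skip_list=['cls_token', 'encoder_position_embedding'])
+# optimizer = paddle.optimizer.AdamW(parameters=model.parameters(),
+#                                    weight_decay=0.05,
+#                                    apply_decay_param_fun=decay_fn)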
+
+
+class WarmupCosineScheduler(LRScheduler):
+ """Warmup Cosine Scheduler
+
+ First apply linear warmup, then apply cosine decay schedule.
+ Linearly increase the learning rate from "warmup_start_lr" to "start_lr" over "warmup_epochs",
+ then decrease the learning rate from "start_lr" to "end_lr" with a cosine schedule over the
+ remaining "total_epochs - warmup_epochs"
+
+ Attributes:
+ learning_rate: the starting learning rate (without warmup), not used here!
+ warmup_start_lr: warmup starting learning rate
+ start_lr: the starting learning rate (without warmup)
+ end_lr: the ending learning rate after whole loop
+ warmup_epochs: # of epochs for warmup
+ total_epochs: # of total epochs (include warmup)
+ """
+ def __init__(self,
+ learning_rate,
+ warmup_start_lr,
+ start_lr,
+ end_lr,
+ warmup_epochs,
+ total_epochs,
+ cycles=0.5,
+ last_epoch=-1,
+ verbose=False):
+ """init WarmupCosineScheduler """
+ self.warmup_epochs = warmup_epochs
+ self.total_epochs = total_epochs
+ self.warmup_start_lr = warmup_start_lr
+ self.start_lr = start_lr
+ self.end_lr = end_lr
+ self.cycles = cycles
+ super(WarmupCosineScheduler, self).__init__(learning_rate, last_epoch, verbose)
+
+ def get_lr(self):
+ """ return lr value """
+ if self.last_epoch < self.warmup_epochs:
+ val = (self.start_lr - self.warmup_start_lr) * float(
+ self.last_epoch)/float(self.warmup_epochs) + self.warmup_start_lr
+ return val
+
+ progress = float(self.last_epoch - self.warmup_epochs) / float(
+ max(1, self.total_epochs - self.warmup_epochs))
+ val = max(0.0, 0.5 * (1. + math.cos(math.pi * float(self.cycles) * 2.0 * progress)))
+ val = max(0.0, val * (self.start_lr - self.end_lr) + self.end_lr)
+ return val
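+
+
+# A minimal usage sketch (illustrative only): drive a Paddle optimizer with the
+# scheduler above and step it once per epoch.
+#
+# scheduler = WarmupCosineScheduler(learning_rate=1e-3,
+#                                   warmup_start_lr=1e-6,
+#                                   start_lr=1e-3,
+#                                   end_lr=1e-6,
+#                                   warmup_epochs=5,
+#                                   total_epochs=100)
+# optimizer = paddle.optimizer.AdamW(learning_rate=scheduler,
+#                                    parameters=model.parameters(),
+#                                    weight_decay=0.05)
+# for epoch in range(100):
+#     ...  # run one training epoch, calling optimizer.step() per iteration
+#     scheduler.step()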