No inf checks were recorded for this optimizer. #7

johnny0234 · 2024-05-07T05:23:15Z

(botsort) PS C:\Users\user\AICUP_Baseline_BoT-SORT>


UP\bagtricks_R50-ibn.yml MODEL.DEVICE "cuda:0"
Command Line Args: Namespace(config_file='C:\Users\user\AICUP_Baseline_BoT-SORT\fast_reid\configs\AICUP\bagtricks_R50-ibn.yml', resume=False, eval_only=False, num_gpus=1, num_machines=1, machine_rank=0, dist_url='tcp://127.0.0.1:49153', opts=['MODEL.DEVICE', 'cuda:0'])
[05/07 13:18:32 fastreid]: Rank of current process: 0. World size: 1
[05/07 13:18:34 fastreid]: Environment info:

sys.platform win32
Python 3.9.19 (main, Mar 21 2024, 17:21:27) [MSC v.1916 64 bit (AMD64)]
numpy 1.26.4
fastreid failed to import
FASTREID_ENV_MODULE
PyTorch 2.3.0+cu118 @C:\Users\user\anaconda3\envs\botsort\lib\site-packages\torch
PyTorch debug build False
GPU available True
GPU 0 NVIDIA GeForce GTX 1650
CUDA_HOME C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.4
Pillow 10.3.0
torchvision 0.18.0+cpu @C:\Users\user\anaconda3\envs\botsort\lib\site-packages\torchvision
torchvision arch flags C:\Users\user\anaconda3\envs\botsort\lib\site-packages\torchvision\_C.pyd; cannot find cuobjdump
cv2 4.9.0

PyTorch built with:

C++ Version: 201703
MSVC 192930151
Intel(R) oneAPI Math Kernel Library Version 2021.4-Product Build 20210904 for Intel(R) 64 architecture applications
Intel(R) MKL-DNN v3.3.6 (Git Hash 86e6af5974177e513fd3fee58425e1063e7f1361)
OpenMP 2019
LAPACK is enabled (usually provided by MKL)
CPU capability usage: AVX2
CUDA Runtime 11.8
NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90;-gencode;arch=compute_37,code=compute_37
CuDNN 8.7
Magma 2.5.4
Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.8, CUDNN_VERSION=8.7.0, CXX_COMPILER=C:/actions-runner/_work/pytorch/pytorch/builder/windows/tmp_bin/sccache-cl.exe, CXX_FLAGS=/DWIN32 /D_WINDOWS /GR /EHsc /Zc:__cplusplus /bigobj /FS /utf-8 -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI
-DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE /wd4624 /wd4068 /wd4067 /wd4267 /wd4661 /wd4717 /wd4244 /wd4804 /wd4273, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_VERSION=2.3.0, USE_CUDA=ON, USE_CUDNN=ON, USE_CUSPARSELT=OFF, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=OFF, USE_OPENMP=ON, USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF,

[05/07 13:18:34 fastreid]: Command line arguments: Namespace(config_file='C:\Users\user\AICUP_Baseline_BoT-SORT\fast_reid\configs\AICUP\bagtricks_R50-ibn.yml', resume=False, eval_only=False, num_gpus=1, num_machines=1, machine_rank=0, dist_url='tcp://127.0.0.1:49153', opts=['MODEL.DEVICE', 'cuda:0'])
[05/07 13:18:34 fastreid]: Contents of args.config_file=C:\Users\user\AICUP_Baseline_BoT-SORT\fast_reid\configs\AICUP\bagtricks_R50-ibn.yml:
_BASE_: ../Base-bagtricks.yml

INPUT:
  SIZE_TRAIN: [256, 256]
  SIZE_TEST: [256, 256]

MODEL:
  BACKBONE:
    WITH_IBN: True
  HEADS:
    POOL_LAYER: GeneralizedMeanPooling

  LOSSES:
    TRI:
      HARD_MINING: False
      MARGIN: 0.0

DATASETS:
  NAMES: ("AICUP",)
  TESTS: ("AICUP",)

SOLVER:
  BIAS_LR_FACTOR: 1.

  IMS_PER_BATCH: 32
  MAX_EPOCH: 60
  STEPS: [30, 50]
  WARMUP_ITERS: 2000

  CHECKPOINT_PERIOD: 1

TEST:
  EVAL_PERIOD: 60 # We didn't provide eval dataset
  IMS_PER_BATCH: 256

OUTPUT_DIR: logs/AICUP_115/bagtricks_R50-ibn

[05/07 13:18:34 fastreid]: Running with full config:
CUDNN_BENCHMARK: True
DATALOADER:
  NUM_INSTANCE: 4
  NUM_WORKERS: 8
  SAMPLER_TRAIN: NaiveIdentitySampler
  SET_WEIGHT: []
DATASETS:
  COMBINEALL: False
  NAMES: ('AICUP',)
  TESTS: ('AICUP',)
INPUT:
  AFFINE:
    ENABLED: False
  AUGMIX:
    ENABLED: False
    PROB: 0.0
  AUTOAUG:
    ENABLED: False
    PROB: 0.0
  CJ:
    BRIGHTNESS: 0.15
    CONTRAST: 0.15
    ENABLED: False
    HUE: 0.1
    PROB: 0.5
    SATURATION: 0.1
  CROP:
    ENABLED: False
    RATIO: [0.75, 1.3333333333333333]
    SCALE: [0.16, 1]
    SIZE: [224, 224]
  FLIP:
    ENABLED: True
    PROB: 0.5
  PADDING:
    ENABLED: True
    MODE: constant
    SIZE: 10
  REA:
    ENABLED: True
    PROB: 0.5
    VALUE: [123.675, 116.28, 103.53]
  RPT:
    ENABLED: False
    PROB: 0.5
  SIZE_TEST: [256, 256]
  SIZE_TRAIN: [256, 256]
KD:
  EMA:
    ENABLED: False
    MOMENTUM: 0.999
  MODEL_CONFIG: []
  MODEL_WEIGHTS: []
MODEL:
  BACKBONE:
    ATT_DROP_RATE: 0.0
    DEPTH: 50x
    DROP_PATH_RATIO: 0.1
    DROP_RATIO: 0.0
    FEAT_DIM: 2048
    LAST_STRIDE: 1
    NAME: build_resnet_backbone
    NORM: BN
    PRETRAIN: True
    PRETRAIN_PATH:
    SIE_COE: 3.0
    STRIDE_SIZE: (16, 16)
    WITH_IBN: True
    WITH_NL: False
    WITH_SE: False
  DEVICE: cuda:0
  FREEZE_LAYERS: []
  HEADS:
    CLS_LAYER: Linear
    EMBEDDING_DIM: 0
    MARGIN: 0.0
    NAME: EmbeddingHead
    NECK_FEAT: before
    NORM: BN
    NUM_CLASSES: 0
    POOL_LAYER: GeneralizedMeanPooling
    SCALE: 1
    WITH_BNNECK: True
  LOSSES:
    CE:
      ALPHA: 0.2
      EPSILON: 0.1
      SCALE: 1.0
    CIRCLE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    COSFACE:
      GAMMA: 128
      MARGIN: 0.25
      SCALE: 1.0
    FL:
      ALPHA: 0.25
      GAMMA: 2
      SCALE: 1.0
    NAME: ('CrossEntropyLoss', 'TripletLoss')
    TRI:
      HARD_MINING: False
      MARGIN: 0.0
      NORM_FEAT: False
      SCALE: 1.0
  META_ARCHITECTURE: Baseline
  PIXEL_MEAN: [123.675, 116.28, 103.53]
  PIXEL_STD: [58.395, 57.120000000000005, 57.375]
  QUEUE_SIZE: 8192
  WEIGHTS:
OUTPUT_DIR: logs/AICUP_115/bagtricks_R50-ibn
SOLVER:
  AMP:
    ENABLED: True
  BASE_LR: 0.00035
  BIAS_LR_FACTOR: 1.0
  CHECKPOINT_PERIOD: 1
  CLIP_GRADIENTS:
    CLIP_TYPE: norm
    CLIP_VALUE: 5.0
    ENABLED: False
    NORM_TYPE: 2.0
  DELAY_EPOCHS: 0
  ETA_MIN_LR: 1e-07
  FREEZE_ITERS: 0
  GAMMA: 0.1
  HEADS_LR_FACTOR: 1.0
  IMS_PER_BATCH: 32
  MAX_EPOCH: 60
  MOMENTUM: 0.9
  NESTEROV: False
  OPT: Adam
  SCHED: MultiStepLR
  STEPS: [30, 50]
  WARMUP_FACTOR: 0.1
  WARMUP_ITERS: 2000
  WARMUP_METHOD: linear
  WEIGHT_DECAY: 0.0005
  WEIGHT_DECAY_BIAS: 0.0005
  WEIGHT_DECAY_NORM: 0.0005
TEST:
  AQE:
    ALPHA: 3.0
    ENABLED: False
    QE_K: 5
    QE_TIME: 1
  EVAL_PERIOD: 60
  FLIP:
    ENABLED: False
  IMS_PER_BATCH: 256
  METRIC: cosine
  PRECISE_BN:
    DATASET: Market1501
    ENABLED: False
    NUM_ITER: 300
  RERANK:
    ENABLED: False
    K1: 20
    K2: 6
    LAMBDA: 0.3
  ROC:
    ENABLED: False
[05/07 13:18:34 fastreid]: Full config saved to C:\Users\user\AICUP_Baseline_BoT-SORT\logs\AICUP_115\bagtricks_R50-ibn\config.yaml
C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\data\transforms\functional.py:46: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly. To access UntypedStorage directly, use tensor.untyped_storage() instead of tensor.storage()
  img = torch.ByteTensor(torch.ByteStorage.from_buffer(pic.tobytes()))
start epoch 0
Exception during training:
Traceback (most recent call last):
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\engine\train_loop.py", line 146, in train
    self.run_step()
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\engine\defaults.py", line 359, in run_step
    self._trainer.run_step()
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\engine\train_loop.py", line 354, in run_step
    self.grad_scaler.step(self.optimizer)
  File "C:\Users\user\anaconda3\envs\botsort\lib\site-packages\torch\amp\grad_scaler.py", line 449, in step
    assert (
AssertionError: No inf checks were recorded for this optimizer.
Traceback (most recent call last):
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\fast_reid\tools\train_net.py", line 54, in <module>
    launch(
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\engine\launch.py", line 71, in launch
    main_func(*args)
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\fast_reid\tools\train_net.py", line 47, in main
    return trainer.train()
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\engine\defaults.py", line 350, in train
    super().train(self.start_epoch, self.max_epoch, self.iters_per_epoch)
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\engine\train_loop.py", line 146, in train
    self.run_step()
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\engine\defaults.py", line 359, in run_step
    self._trainer.run_step()
  File "C:\Users\user\AICUP_Baseline_BoT-SORT\.\fast_reid\fastreid\engine\train_loop.py", line 354, in run_step
    self.grad_scaler.step(self.optimizer)
  File "C:\Users\user\anaconda3\envs\botsort\lib\site-packages\torch\amp\grad_scaler.py", line 449, in step
    assert (
AssertionError: No inf checks were recorded for this optimizer.

"No inf checks were recorded for this optimizer." How can I solve this?

MuennL · 2024-05-09T01:16:58Z

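Are you running it by following the baseline tutorial, or did you change something in the configs? It looks like, when you use some pretrained model, its optimizer is incompatible with the train loop in fast reid. Fast reid's default trainer has quite a few preset training conditions that aren't explained clearly and are only written in code comments, so that could be a direction to look into. Just for your reference.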
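One way to test this direction directly: as far as I can tell from the `torch.amp.GradScaler` source, the "inf checks" are recorded while unscaling the gradients of the parameters held by the optimizer, and any parameter whose `.grad` is `None` is skipped. If none of the optimizer's parameters received a gradient, nothing gets recorded and `step()` raises exactly this `AssertionError`. The helper below (`params_without_grad` is a hypothetical name, not part of fastreid or PyTorch) is a minimal sketch that lists the optimizer parameters lacking a gradient; it runs on CPU and does not need AMP to show the symptom.

```python
import torch
from torch import nn


def params_without_grad(optimizer: torch.optim.Optimizer) -> list:
    """Return (group_index, param_index, shape) for each optimizer param whose .grad is None.

    Hypothetical diagnostic: if *every* parameter is listed right before
    grad_scaler.step(optimizer), the scaler has nothing to unscale.
    """
    missing = []
    for g_idx, group in enumerate(optimizer.param_groups):
        for p_idx, param in enumerate(group["params"]):
            if param.grad is None:
                missing.append((g_idx, p_idx, tuple(param.shape)))
    return missing


model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4)

# Case 1: the loss does not depend on the optimizer's parameters,
# so backward() leaves the model's .grad fields as None.
other = torch.randn(3, 4, requires_grad=True)
other.sum().backward()
print(len(params_without_grad(optimizer)))  # 2 (weight and bias)

# Case 2: the loss flows through the model, so every parameter gets a gradient.
model(torch.randn(3, 4)).sum().backward()
print(len(params_without_grad(optimizer)))  # 0
```

In the fastreid trainer, the analogous check would be printing `params_without_grad(self.optimizer)` just before `self.grad_scaler.step(self.optimizer)`; if every parameter is listed, the gradients are not reaching the tensors the optimizer holds.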

ricky-696 · 2024-05-09T07:19:09Z

@johnny0234 Hi~

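The error you're getting is an optimizer problem inside the torch package. I noticed that you're using the latest torch 2.3.x. I think that version differs too much from what fast_reid was written for, so some functions it relies on aren't implemented, which is why the AssertionError appears.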
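As one concrete, verifiable example of behavior that changed across these versions: since PyTorch 2.0, `Optimizer.zero_grad()` defaults to `set_to_none=True`, so gradients are reset to `None` instead of being zero-filled. Whether this particular change is what breaks fast_reid's train loop is an assumption I have not confirmed. The sketch below only demonstrates the difference; `grad_state_after_zero_grad` is a hypothetical helper, while the `set_to_none` argument itself is a real `zero_grad()` parameter on both 1.x and 2.x.

```python
import torch
from torch import nn


def grad_state_after_zero_grad(model, optimizer, set_to_none=None):
    """Run one backward pass, call zero_grad(), and describe the resulting .grad values.

    Returns "none" if every gradient was reset to None, "zeros" if every gradient
    is a zero-filled tensor, and "mixed" otherwise. set_to_none=None means
    "call zero_grad() with the installed torch's default".
    """
    model(torch.randn(3, 4)).sum().backward()
    if set_to_none is None:
        optimizer.zero_grad()
    else:
        optimizer.zero_grad(set_to_none=set_to_none)
    grads = [p.grad for p in model.parameters()]
    if all(g is None for g in grads):
        return "none"
    if all(g is not None and torch.count_nonzero(g).item() == 0 for g in grads):
        return "zeros"
    return "mixed"


model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=3.5e-4)

print(grad_state_after_zero_grad(model, optimizer, set_to_none=False))  # zeros
print(grad_state_after_zero_grad(model, optimizer, set_to_none=True))   # none
# With the default: "none" on torch>=2.0, "zeros" on torch 1.x.
print(torch.__version__, grad_state_after_zero_grad(model, optimizer))
```

Code written against 1.x that assumes `.grad` tensors stay allocated after `zero_grad()` would see `None` on 2.x unless it passes `set_to_none=False` explicitly.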

The original author's versions are torch 1.11.0+cu113 and torchvision 0.12.0 (mentioned here).

I installed torch 1.13.1+cu117 and torchvision 0.14.1 myself, for your reference.
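To compare an environment against the two setups mentioned in this thread, a small version check can help. This is a hypothetical sketch: the function names and the `KNOWN_GOOD` table are my own, and the table contains only the two pairs quoted above. It also flags differing local build tags, such as the `torch 2.3.0+cu118` / `torchvision 0.18.0+cpu` pair in the log of this issue (torchvision there is a CPU-only build); that mismatch is not necessarily related to this particular error.

```python
from typing import List, Optional, Tuple

# (torch, torchvision) pairs reported to work in this thread; anything else is "untested here".
KNOWN_GOOD = {
    ("1.11.0", "0.12.0"),  # original author's setup (torch built with cu113)
    ("1.13.1", "0.14.1"),  # ricky-696's setup (torch built with cu117)
}


def split_version(version: str) -> Tuple[str, Optional[str]]:
    """Split '2.3.0+cu118' into ('2.3.0', 'cu118'); a version without '+' has no local tag."""
    public, _, local = version.partition("+")
    return public, (local or None)


def check_versions(torch_version: str, torchvision_version: str) -> List[str]:
    """Return human-readable warnings for a torch/torchvision version pair."""
    issues = []
    t_pub, t_local = split_version(torch_version)
    v_pub, v_local = split_version(torchvision_version)
    if (t_pub, v_pub) not in KNOWN_GOOD:
        issues.append(f"torch {t_pub} / torchvision {v_pub} is not one of the pairs reported to work")
    if t_local != v_local:
        issues.append(f"build tags differ: torch {t_local!r} vs torchvision {v_local!r}")
    return issues


# The environment from the log in this issue: two warnings.
for issue in check_versions("2.3.0+cu118", "0.18.0+cpu"):
    print("WARNING:", issue)

# ricky-696's setup, assuming both packages are the +cu117 builds: no warnings.
print(check_versions("1.13.1+cu117", "0.14.1+cu117"))  # []
```

In a live environment the inputs would be `torch.__version__` and `torchvision.__version__`, both of which are plain version strings.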
