[Docs] Add the document for the transition between IterBasedTraining and EpochBasedTraining #926
Codecov Report
@@ Coverage Diff @@
## main #926 +/- ##
=======================================
Coverage ? 76.66%
=======================================
Files ? 138
Lines ? 10827
Branches ? 2162
=======================================
Hits ? 8301
Misses ? 2168
Partials ? 358
LGTM!!!
Hi @HAOCHENYE, have you tested those steps in MMDet (EpochBased to IterBased) and MMSeg (IterBased to EpochBased)?
In MMDet, if I want to train ATSS by iteration, the iter-based configuration will be:
_base_ = './atss_r18_fpn_8xb8-amp-lsj-200e_coco.py'
train_cfg = dict(
_delete_=True,
by_epoch=False,
max_iters=10000,
val_interval=2000
)
default_hooks = dict(
logger=dict(type='LoggerHook', log_metric_by_epoch=False),
checkpoint=dict(type='CheckpointHook', by_epoch=False, interval=2000),
)
param_scheduler = [dict(
type='MultiStepLR',
milestones=[6000, 8000],
by_epoch=False,
)]
log_processor = dict(
by_epoch=False
)
Besides the preserved field
02/21 00:06:02 - mmengine - INFO - Checkpoints will be saved to /home/yehaochen/codebase/mmdetection/work_dirs/atss_r18_fpn_iter_based.
02/21 00:06:12 - mmengine - INFO - Iter(train) [ 50/10000] lr: 4.0000e-02 eta: 0:34:04 time: 0.2055 data_time: 0.0109 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2836 loss_centerness: 4.9563
02/21 00:06:20 - mmengine - INFO - Iter(train) [ 100/10000] lr: 4.0000e-02 eta: 0:30:12 time: 0.1606 data_time: 0.0109 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2733 loss_centerness: 5.1688
02/21 00:06:28 - mmengine - INFO - Iter(train) [ 150/10000] lr: 4.0000e-02 eta: 0:28:47 time: 0.1602 data_time: 0.0112 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2755 loss_centerness: 4.9817
02/21 00:06:36 - mmengine - INFO - Iter(train) [ 200/10000] lr: 4.0000e-02 eta: 0:28:02 time: 0.1605 data_time: 0.0111 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2887 loss_centerness: 4.8633
02/21 00:06:44 - mmengine - INFO - Iter(train) [ 250/10000] lr: 4.0000e-02 eta: 0:27:29 time: 0.1591 data_time: 0.0109 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2874 loss_centerness: 5.0570
02/21 00:06:52 - mmengine - INFO - Iter(train) [ 300/10000] lr: 4.0000e-02 eta: 0:27:05 time: 0.1594 data_time: 0.0108 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2888 loss_centerness: 5.0962
02/21 00:07:00 - mmengine - INFO - Iter(train) [ 350/10000] lr: 4.0000e-02 eta: 0:26:46 time: 0.1601 data_time: 0.0111 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2692 loss_centerness: 4.9344
02/21 00:07:08 - mmengine - INFO - Iter(train) [ 400/10000] lr: 4.0000e-02 eta: 0:26:29 time: 0.1592 data_time: 0.0109 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2807 loss_centerness: 5.0980
02/21 00:07:16 - mmengine - INFO - Iter(train) [ 450/10000] lr: 4.0000e-02 eta: 0:26:13 time: 0.1583 data_time: 0.0110 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2753 loss_centerness: 5.0901
02/21 00:07:24 - mmengine - INFO - Iter(train) [ 500/10000] lr: 4.0000e-02 eta: 0:26:00 time: 0.1597 data_time: 0.0117 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2667 loss_centerness: 5.0081
02/21 00:07:32 - mmengine - INFO - Iter(train) [ 550/10000] lr: 4.0000e-02 eta: 0:25:48 time: 0.1603 data_time: 0.0118 memory: 4732 loss: nan loss_cls: nan loss_bbox: 1.2556 loss_centerness: 4.9613
In MMSeg, the epoch-based config will be:
_base_ = './danet_r101-d8_4xb4-160k_ade20k-512x512.py'
param_scheduler = [dict(
type='MultiStepLR',
milestones=[6, 8]
)]
default_hooks = dict(
logger=dict(type='LoggerHook'),
checkpoint=dict(type='CheckpointHook', interval=2, by_epoch=True),
)
train_cfg = dict(
_delete_=True,
by_epoch=True,
max_epochs=10,
val_interval=2
)
log_processor = dict(
by_epoch=True
)
train_dataloader = dict(
sampler=None
)
Besides the preserved field
02/21 00:56:12 - mmengine - INFO - Epoch(train) [1][ 50/5053] lr: 1.0000e-02 eta: 8:10:48 time: 0.3117 data_time: 0.0054 memory: 43885 loss: 8.9950 decode.pam_cam.loss_ce: 2.8637 decode.pam_cam.acc_seg: 17.8066 decode.pam.loss_ce: 2.5388 decode.pam.acc_seg: 26.3407 decode.cam.loss_ce: 2.5416 decode.cam.acc_seg: 26.0042 aux.loss_ce: 1.0509 aux.acc_seg: 25.4099
02/21 00:56:28 - mmengine - INFO - Epoch(train) [1][ 100/5053] lr: 1.0000e-02 eta: 6:17:16 time: 0.3150 data_time: 0.0054 memory: 12439 loss: 7.6784 decode.pam_cam.loss_ce: 2.2451 decode.pam_cam.acc_seg: 15.1995 decode.pam.loss_ce: 2.2386 decode.pam.acc_seg: 35.4270 decode.cam.loss_ce: 2.2463 decode.cam.acc_seg: 31.9748 aux.loss_ce: 0.9484 aux.acc_seg: 34.2752
02/21 00:56:43 - mmengine - INFO - Epoch(train) [1][ 150/5053] lr: 1.0000e-02 eta: 5:39:09 time: 0.3150 data_time: 0.0048 memory: 12439 loss: 13.0326 decode.pam_cam.loss_ce: 3.9024 decode.pam_cam.acc_seg: 0.0000 decode.pam.loss_ce: 3.8114 decode.pam.acc_seg: 22.9497 decode.cam.loss_ce: 3.7982 decode.cam.acc_seg: 22.5214 aux.loss_ce: 1.5206 aux.acc_seg: 16.3180
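For reference when comparing the two schedules: the MMSeg log above reports 5053 iterations per epoch, so the epoch-based values in that config can be translated to iteration counts with simple arithmetic. The following is only a minimal sketch. It assumes the 5053 figure from this particular run (it depends on dataset size, batch size and GPU count), and it assumes that `160k` in the base config name means 160000 iterations. The helper names `epochs_to_iters` and `iters_to_epochs` are hypothetical and are not MMEngine APIs.

```python
import math


def epochs_to_iters(epochs, iters_per_epoch):
    """Convert a count expressed in epochs to iterations."""
    return epochs * iters_per_epoch


def iters_to_epochs(iters, iters_per_epoch):
    """Convert iterations to epochs, rounding up so no iteration is dropped."""
    return math.ceil(iters / iters_per_epoch)


# Taken from the log line "Epoch(train) [1][ 50/5053]"; run-specific.
ITERS_PER_EPOCH = 5053

# Values from the epoch-based MMSeg config in this comment.
max_iters = epochs_to_iters(10, ITERS_PER_EPOCH)                         # 50530
milestones = [epochs_to_iters(m, ITERS_PER_EPOCH) for m in (6, 8)]       # [30318, 40424]
val_interval = epochs_to_iters(2, ITERS_PER_EPOCH)                       # 10106

# Assumption: the original schedule is 160000 iterations ("160k").
original_epochs = iters_to_epochs(160_000, ITERS_PER_EPOCH)              # 32

print(max_iters, milestones, val_interval, original_epochs)
```

Read this way, the 10-epoch test run covers roughly a third of the original 160k schedule, which is worth keeping in mind when comparing its results with the base config.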
Co-authored-by: Zaida Zhou <[email protected]>
)
```

If you want to train the model by iter, you need to make the following changes:
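Suggested change:
- If you want to train the model by iter, you need to make the following changes:
+ If you want to train the model in the IterBased way, you need to make the following changes: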
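To make "the following changes" concrete, the fields overridden in the MMDet comment earlier in this thread can be collected into one helper. This is only a sketch built from plain Python dicts. The function name `iter_based_overrides` and its parameters are hypothetical, the field names simply mirror the config posted above, and the result is not validated against MMEngine here.

```python
def iter_based_overrides(max_iters, val_interval, milestones, ckpt_interval=None):
    """Build the config overrides that switch a run to iteration-based training.

    The structure mirrors the MMDet example in this thread; it is a plain
    dict and would still need to be merged into a real config file.
    """
    if ckpt_interval is None:
        # Assumption: save checkpoints as often as validation runs.
        ckpt_interval = val_interval
    return dict(
        # _delete_=True discards the inherited epoch-based train_cfg.
        train_cfg=dict(
            _delete_=True,
            by_epoch=False,
            max_iters=max_iters,
            val_interval=val_interval,
        ),
        default_hooks=dict(
            logger=dict(type='LoggerHook', log_metric_by_epoch=False),
            checkpoint=dict(
                type='CheckpointHook', by_epoch=False, interval=ckpt_interval),
        ),
        # Milestones are expressed in iterations once by_epoch is False.
        param_scheduler=[
            dict(type='MultiStepLR', milestones=list(milestones), by_epoch=False),
        ],
        log_processor=dict(by_epoch=False),
    )


overrides = iter_based_overrides(10000, 2000, [6000, 8000])
for key, value in overrides.items():
    print(f'{key} = {value!r}')
```

Every component that carries a `by_epoch` flag is switched together here, which matches the four places changed in the MMDet example.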