This repository has been archived by the owner on Oct 31, 2023. It is now read-only.

cosine annealing lr scheduler #864

Open
CoinCheung wants to merge 1 commit into main

Conversation

@CoinCheung (Contributor) commented Jun 5, 2019

Personally, I prefer lr schedulers with smooth shapes, such as this cosine lr scheduler, because I no longer need to decide at which plateau positions to drop the learning rate. (Put differently, this annealing method lets us avoid tuning the milestone hyper-parameters, which makes it simpler to decide the training configuration.)

The effectiveness of the cosine lr scheduler has been verified both for classification (paper is here) and for object detection (paper is here). So I think it would not be improper to add this feature to the repository, and other users may need it as well.

If my implementation is not clean enough, please tell me and I will be happy to improve it :)
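For readers who want a concrete picture of the idea being proposed, below is a minimal PyTorch sketch of a cosine-annealing scheduler with linear warmup, written in the same spirit as the repository's existing WarmupMultiStepLR. It is not the exact code in this commit; the class name, constructor arguments, and default values here are illustrative assumptions.

```python
import math

import torch


class WarmupCosineAnnealingLR(torch.optim.lr_scheduler._LRScheduler):
    """Illustrative sketch (not the exact code in this PR).

    Linear warmup for the first `warmup_iters` iterations, then cosine
    annealing from the base lr down to `eta_min` over the remaining
    iterations, following
        lr = eta_min + 0.5 * (base_lr - eta_min) * (1 + cos(pi * t / T)).
    """

    def __init__(self, optimizer, max_iters, warmup_factor=1.0 / 3,
                 warmup_iters=500, eta_min=0.0, last_epoch=-1):
        self.max_iters = max_iters
        self.warmup_factor = warmup_factor
        self.warmup_iters = warmup_iters
        self.eta_min = eta_min
        super().__init__(optimizer, last_epoch)

    def get_lr(self):
        if self.last_epoch < self.warmup_iters:
            # linear warmup, in the same spirit as WarmupMultiStepLR
            alpha = self.last_epoch / self.warmup_iters
            factor = self.warmup_factor * (1 - alpha) + alpha
            return [base_lr * factor for base_lr in self.base_lrs]
        # cosine annealing from base_lr down to eta_min after warmup
        t = self.last_epoch - self.warmup_iters
        total = max(1, self.max_iters - self.warmup_iters)
        cos_factor = 0.5 * (1.0 + math.cos(math.pi * t / total))
        return [self.eta_min + (base_lr - self.eta_min) * cos_factor
                for base_lr in self.base_lrs]
```

A scheduler like this would plug into the training loop in place of WarmupMultiStepLR, with scheduler.step() called once per iteration; how it is exposed in the solver config is up to the PR.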

@facebook-github-bot added the CLA Signed label Jun 5, 2019
@botcs added the enhancement label Sep 17, 2019
@botcs (Contributor) commented Sep 17, 2019

Hi @CoinCheung

Really nice extension!
Before merging it, could you run some benchmarks to see whether the performance improves accordingly?

@Jacobew (Contributor) commented Sep 18, 2019

@CoinCheung Hi, any update here? Just curious about the performance when using cosine annealing lr

@CoinCheung (Contributor, Author)

Hi,

I did not test it on the COCO dataset since I do not have enough GPUs, but I did test it on our own dataset, which consists of around 30k images. There I observed an improvement of around 0.05 in mAP50, which suggests that the cosine annealing learning-rate curve does no worse than its step-shaped counterpart. If you feel it is better to post benchmark results on COCO, I will try to train a model, but I am afraid it will take some days.

@Jacobew (Contributor) commented Sep 18, 2019

Thanks for your reply. I think benchmark results are needed before merging this PR, to confirm whether it really improves performance on COCO.

@CoinCheung (Contributor, Author)

I have tested this PR on the COCO dataset. Sadly, the cosine lr schedule turns out to be no better than the multi-step lr scheduler, with mAP of 40.7 and 39.5. I used an FBNet-based Faster R-CNN with the default configuration, except that I doubled the number of images per GPU and trained on 4 GPUs in fp16 mode. Training logs can be found at: multi-step and cosine.

I think the reason behind this performance gap is that the milestones of the multi-step lr schedule are carefully picked, and many of the other hyper-parameters were probably tuned on the basis of that lr curve rather than a cosine-shaped one. The gap also varies case by case. On our own dataset, where the default configuration tuned for COCO might not be optimal, the cosine lr performs on par with its multi-step counterpart. I have also tested it on CIFAR-10, where, with a careful choice of the stopping lr, the cosine-shaped schedule can outperform the multi-step scheduler. So I think the cosine lr scheduler still makes sense and can be a meaningful option for general use.
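To make the comparison above more concrete, here is a small self-contained script that prints the learning rate of a multi-step schedule versus a cosine schedule at a few points in training. The base lr, milestones, and iteration counts are made-up illustrative values, not taken from the training logs linked above. It shows where the two curves differ: the multi-step schedule keeps a high lr until its milestones and then drops sharply, while the cosine schedule decays smoothly toward its final ("stopping") lr, so both the milestone positions and the stopping lr can matter.

```python
import math

# Illustrative values only; not the PR's actual training configuration.
base_lr = 0.01
max_iter = 90000
milestones = (60000, 80000)   # where the multi-step schedule drops the lr
gamma = 0.1                   # multiplicative drop at each milestone
eta_min = 0.0                 # final ("stopping") lr of the cosine schedule


def multistep_lr(it):
    # lr is multiplied by gamma at every milestone that has been passed
    drops = sum(1 for m in milestones if it >= m)
    return base_lr * gamma ** drops


def cosine_lr(it):
    # smooth decay from base_lr down to eta_min over max_iter iterations
    return eta_min + 0.5 * (base_lr - eta_min) * (1 + math.cos(math.pi * it / max_iter))


for it in (0, 30000, 60000, 80000, 89999):
    print(f"iter {it:>6}: multi-step {multistep_lr(it):.6f}   cosine {cosine_lr(it):.6f}")
```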

@gaussiangit

@CoinCheung Did you combine different models from the cosine annealing run (ensembling), or did you just evaluate the final model?

@CoinCheung (Contributor, Author)

@gaussiangit No, I didn't. I simply evaluated the final model. What would be a good ensembling strategy? Could you be more specific?
