
The provided lr scheduler OneCycleLR doesn't follow PyTorch's LRScheduler API #4

Closed
gioivuathoi opened this issue Oct 13, 2023 · 2 comments

Comments


gioivuathoi commented Oct 13, 2023

Thank you for your great work!
I am trying to use Multilingual-CLIP to train clip4str for Vietnamese (with a charset containing 229 tokens) on Google Colab.
I have changed the charset, the code in strhub/models/vl_str/systems.py, and other files so that I can use the text encoder from Multilingual-CLIP for Vietnamese.
Now I am getting the following error from the learning rate scheduler:

The dimension of the visual decoder is 768.
Len of Tokenizer 232
Done creating model!
| Name | Type | Params
--------------------------------
0 | clip_model | CLIP | 427 M
1 | clip_model.visual | VisionTransformer | 303 M
2 | clip_model.transformer | Transformer | 85.1 M
3 | clip_model.token_embedding | Embedding | 37.9 M
4 | clip_model.ln_final | LayerNorm | 1.5 K
5 | M_clip_model | MultilingualCLIP | 560 M
6 | M_clip_model.transformer | XLMRobertaModel | 559 M
7 | M_clip_model.LinearTransformation | Linear | 787 K
8 | visual_decoder | Decoder | 9.8 M
9 | visual_decoder.layers | ModuleList | 9.5 M
10 | visual_decoder.text_embed | TokenEmbedding | 178 K
11 | visual_decoder.norm | LayerNorm | 1.5 K
12 | visual_decoder.dropout | Dropout | 0
13 | visual_decoder.head | Linear | 176 K
14 | cross_decoder | Decoder | 9.8 M
15 | cross_decoder.layers | ModuleList | 9.5 M
16 | cross_decoder.text_embed | TokenEmbedding | 178 K
17 | cross_decoder.norm | LayerNorm | 1.5 K
18 | cross_decoder.dropout | Dropout | 0
19 | cross_decoder.head | Linear | 176 K

675 M Trainable params
332 M Non-trainable params
1.0 B Total params
4,031.815 Total estimated model params size (MB)
[dataset] mean (0.48145466, 0.4578275, 0.40821073), std (0.26862954, 0.26130258, 0.27577711)
Using 16bit native Automatic Mixed Precision (AMP)
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/configuration_validator.py:117: UserWarning: When using Trainer(accumulate_grad_batches != 1) and overriding LightningModule.optimizer_{step,zero_grad}, the hooks will not be called on every batch (rather, they are called on every optimization step).
rank_zero_warn(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
[VL4STR] The length of encoder params with and without weight decay is 259 and 479, respectively.
[VL4STR] The length of decoder params with and without weight decay is 14 and 38, respectively.
Loading train_dataloader to estimate number of stepping batches.
dataset root: /content/drive/MyDrive/clip4str/dataset/str_dataset/train/real
lmdb: ArT num samples: 34984
lmdb: The number of training samples is 34984
/usr/local/lib/python3.10/dist-packages/torch/utils/data/dataloader.py:560: UserWarning: This DataLoader will create 8 worker processes in total. Our suggested max number of worker in current system is 2, which is smaller than what this DataLoader is going to create. Please be aware that excessive worker creation might get DataLoader running slow or even freeze, lower the worker number to avoid potential slowness/freeze if necessary.
warnings.warn(_create_warning_msg(
Error executing job with overrides: []
Traceback (most recent call last):
File "/content/drive/MyDrive/clip4str/code/clip4str/train.py", line 145, in
main()
File "/usr/local/lib/python3.10/dist-packages/hydra/main.py", line 90, in decorated_main
_run_hydra(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 389, in _run_hydra
_run_app(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 452, in _run_app
run_and_report(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 216, in run_and_report
raise ex
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 213, in run_and_report
return func()
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/utils.py", line 453, in
lambda: hydra.run(
File "/usr/local/lib/python3.10/dist-packages/hydra/_internal/hydra.py", line 132, in run
_ = ret.return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 260, in return_value
raise self._return_value
File "/usr/local/lib/python3.10/dist-packages/hydra/core/utils.py", line 186, in run_job
ret.return_value = task_function(task_cfg)
File "/content/drive/MyDrive/clip4str/code/clip4str/train.py", line 104, in main
trainer.fit(model, datamodule=datamodule, ckpt_path=config.ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 770, in fit
self._call_and_handle_interrupt(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 723, in _call_and_handle_interrupt
return trainer_fn(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 811, in _fit_impl
results = self._run(model, ckpt_path=self.ckpt_path)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/trainer/trainer.py", line 1217, in _run
self.strategy.setup(self)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/single_device.py", line 72, in setup
super().setup(trainer)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 139, in setup
self.setup_optimizers(trainer)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/strategies/strategy.py", line 128, in setup_optimizers
self.optimizers, self.lr_scheduler_configs, self.optimizer_frequencies = _init_optimizers_and_lr_schedulers(
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 195, in _init_optimizers_and_lr_schedulers
_validate_scheduler_api(lr_scheduler_configs, model)
File "/usr/local/lib/python3.10/dist-packages/pytorch_lightning/core/optimizer.py", line 350, in _validate_scheduler_api
raise MisconfigurationException(
pytorch_lightning.utilities.exceptions.MisconfigurationException: The provided lr scheduler OneCycleLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.

I cannot see any problem with OneCycleLR. Do you have any suggestions for this issue? Could it be a package version problem?
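
For context, the scheduler setup that Lightning validates here typically looks like the sketch below (illustrative class name and hyperparameters, not the exact clip4str code):

```python
import torch
import pytorch_lightning as pl


class STRSystem(pl.LightningModule):
    # Illustrative LightningModule; the real clip4str system defines its own model.
    def configure_optimizers(self):
        optimizer = torch.optim.AdamW(self.parameters(), lr=8.4e-5)
        # OneCycleLR is a built-in torch scheduler, stepped once per optimization step.
        scheduler = torch.optim.lr_scheduler.OneCycleLR(
            optimizer,
            max_lr=8.4e-5,
            total_steps=self.trainer.estimated_stepping_batches,
        )
        return {
            "optimizer": optimizer,
            "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
        }
```

Lightning inspects the returned scheduler in `_validate_scheduler_api` during `trainer.fit`, which is where the exception in the traceback above is raised.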

gioivuathoi (Author) commented

Found this: "Error in lr scheduler after upgrade torch 2".
I think the reason is that I'm using torch 2.0.1, so the error can be fixed either by downgrading torch to <2.0 or by patching the PyTorch Lightning 1.6.5 source code as described in the link.
P.S.: this bug has been fixed in newer versions of PyTorch Lightning.
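
For reference, the failing check in Lightning 1.6.5 is an isinstance test against torch.optim.lr_scheduler._LRScheduler; in torch 2.x the built-in schedulers such as OneCycleLR inherit from the new LRScheduler base class instead, so the test rejects them even though they are standard schedulers. The error message also points to the other escape hatch: overriding LightningModule.lr_scheduler_step. A minimal sketch of that workaround, assuming Lightning 1.6.x (the hook signature drops optimizer_idx in Lightning 2.x):

```python
import pytorch_lightning as pl


class STRSystem(pl.LightningModule):
    # Illustrative class name; in practice this override goes on the project's LightningModule.
    # Overriding this hook makes Lightning 1.6.5 skip the strict isinstance check
    # that rejects torch 2.x schedulers; the body just steps the scheduler as usual.
    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        if metric is None:
            scheduler.step()
        else:
            scheduler.step(metric)
```

Alternatively, pinning torch below 2.0 or upgrading pytorch-lightning beyond 1.6.5 avoids the check entirely, as noted above.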

TruongNoDame commented

Hi @gioivuathoi, I am also trying to use Multilingual-CLIP to train clip4str for Vietnamese (with a charset containing 226 tokens) on a server GPU, but I don't know which version of Multilingual-CLIP to use. If possible, could you send me the link or the name of that version?
I would also appreciate it if you could tell me what I should change to match that version, for example the charset size and the code in strhub/models/vl_str/systems.py and the other files you mentioned.
I hope you can help.
Thanks, have a nice day!
