This repository has been archived by the owner on Dec 10, 2024. It is now read-only.

Question about the Retrieval-Augmented Generator part #4

Closed
FreshPrincePerseus opened this issue Apr 26, 2024 · 10 comments

Comments

@FreshPrincePerseus

Hello, while trying to reproduce the "train a vanilla Transformer model" step of the Retrieval-Augmented Generator part, the code raised `lightning_lite.utilities.exceptions.MisconfigurationException: The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.` I haven't managed to resolve this error. Have you run into this situation? From searching online, the suggestion is to override the `lr_scheduler_step` method in the `LightningModule` and add my own logic.
In addition, after commenting out the code that raised the error (code inside the Python package) and re-running the command, I found that the process never exits after training and keeps occupying GPU memory. Once the vanilla Transformer model has finished training, should the process terminate on its own?
Finally, in what form is the model saved after training? My results folder (SelfMemory/results/generator/bigpatent/bart_large) only contains multiple versions of lightning_logs. Could it be that training of the vanilla Transformer model did not finish, which is why only the logs are there?
[screenshot of the results folder]
I hope you can help clarify these questions, thank you!
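(Based on the online suggestion above, a minimal sketch of what overriding the hook might look like; the class name is hypothetical and the hook signature below matches pytorch-lightning 1.8 and may differ in other versions:)

```python
import pytorch_lightning as pl

class GeneratorModule(pl.LightningModule):  # hypothetical module name, for illustration only
    # Override the hook so Lightning steps the LambdaLR scheduler itself
    # instead of rejecting it for not matching its LRScheduler API check.
    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        if metric is None:
            scheduler.step()        # plain per-step schedulers such as LambdaLR
        else:
            scheduler.step(metric)  # metric-driven schedulers such as ReduceLROnPlateau
```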

@Hannibal046
Owner

Hi, is your lightning version pytorch-lightning==1.8.0.post1?

@FreshPrincePerseus
Author

> Hi, is your lightning version pytorch-lightning==1.8.0.post1?

Yes, all my package versions follow the requirements in your code.

@Hannibal046
Owner

Hi, could you double-check your torch and lightning versions? This is the current scheduler definition, and it satisfies PyTorch's LRScheduler API:

```python
import torch
import math
from torch.optim.lr_scheduler import LambdaLR

def get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=-1):
    """
    Create a schedule with a learning rate that decreases linearly from the initial lr set in the optimizer to 0,
    after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
    """
    def lr_lambda(current_step: int):
        if current_step < num_warmup_steps:
            return float(current_step) / float(max(1, num_warmup_steps))
        return max(
            0.0, float(num_training_steps - current_step) / float(max(1, num_training_steps - num_warmup_steps))
        )

    return LambdaLR(optimizer, lr_lambda, last_epoch)
```
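(For context, a minimal sketch of how such a scheduler is typically returned from `configure_optimizers` so that Lightning steps it every batch; the optimizer choice and step counts below are illustrative assumptions, not necessarily what this repo uses:)

```python
import torch

def configure_optimizers(self):
    # Illustrative optimizer and step counts; the repo's actual values may differ.
    optimizer = torch.optim.AdamW(self.parameters(), lr=3e-5)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=1000, num_training_steps=100_000
    )
    # "interval": "step" tells Lightning to call scheduler.step() after every batch.
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
    }
```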

@FreshPrincePerseus
Author

> Hi, could you double-check your torch and lightning versions? This is the current scheduler definition, and it satisfies PyTorch's LRScheduler API: […]


Yes, I checked both of these: my CUDA version is 12.1, and my pytorch version is torch 2.1.1, which matches CUDA 12.1. The lightning version is pytorch-lightning==1.8.0.post1. Because I saw a warning, `DEPRECATION: lightning-lite 1.8.0.post1 has a non-standard dependency specifier torch>=1.9.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of lightning-lite or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063`, I also downgraded pip to 23.3.1. But when I run `python train_generator.py --config config/bigpatent/train_generator.yaml --precision 16`, I still get the error `lightning_lite.utilities.exceptions.MisconfigurationException: The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.` May I ask which torch version you used?
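(When comparing environments, a quick sanity-check snippet run in the same environment as `train_generator.py` confirms which versions the training process actually sees; this is just for checking, not part of the repo:)

```python
import torch
import pytorch_lightning as pl

# Print the versions resolved at import time in this environment.
print("torch:", torch.__version__)
print("pytorch-lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())
```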

@Hannibal046
Owner

Please try this version: torch==1.8.1+cu111

@FreshPrincePerseus
Author

> Please try this version: torch==1.8.1+cu111

After installing this version, running `pip install pytorch-lightning==1.8.0.post1` automatically installs torch 2.3.0 to replace the existing torch (possibly the CPU build). I suspect the automatic upgrade happens because torch==1.8.1+cu111 is a bit too old?
Then, running

python train_generator.py \
    --config config/bigpatent/train_generator.yaml \
    --precision 16

again raises the same error:
`lightning_lite.utilities.exceptions.MisconfigurationException: The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.`

@Hannibal046
Owner

You can find the commands for installing pytorch 1.x here: https://pytorch.org/get-started/previous-versions/

@FreshPrincePerseus
Author

FreshPrincePerseus commented May 1, 2024

> You can find the commands for installing pytorch 1.x here: https://pytorch.org/get-started/previous-versions/

Yes, my original pytorch 1.8 was installed from there. But after running `pip install pytorch-lightning==1.8.0.post1`, torch automatically changes to version 2.3.0. I tried two fresh conda environments and got the same result.

@FreshPrincePerseus
Author

After some investigation, I believe the cause of this problem is the torch version. Once I downgraded pytorch to below 2.0.0, it ran successfully. The answer was based on Lightning-AI/pytorch-lightning#15912.

@FreshPrincePerseus
Author

> After some investigation, I believe the cause of this problem is the torch version. Once I downgraded pytorch to below 2.0.0, it ran successfully. The answer was based on Lightning-AI/pytorch-lightning#15912.

The torch==1.8.1+cu111 mentioned earlier probably isn't supported by pytorch-lightning==1.8.0.post1, which is why running `pip install pytorch-lightning==1.8.0.post1` automatically upgrades the pytorch version. I finally installed pytorch==1.13.1. 👍
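(For anyone hitting the same issue, a pinned install along the lines of the combination reported to work here; the exact CUDA build and index URL depend on your local setup and are not specified in this thread:)

```bash
# Pin torch below 2.0 together with the required Lightning version so pip
# does not replace torch when resolving pytorch-lightning's dependencies.
pip install "torch==1.13.1" "pytorch-lightning==1.8.0.post1"
```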
