This repository has been archived by the owner on Dec 10, 2024. It is now read-only.

Question about the Retrieval-Augmented Generator part #4

Closed
FreshPrincePerseus opened this issue Apr 26, 2024 · 10 comments

Comments

@FreshPrincePerseus

Hello, while trying to reproduce the "train a vanilla Transformer model" step of the Retrieval-Augmented Generator part, the code raised `lightning_lite.utilities.exceptions.MisconfigurationException: The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.` I haven't managed to resolve this error. Have you run into this situation? From searching online, the suggestion is to override the `lr_scheduler_step` method in the `LightningModule` and add my own logic.
In addition, after commenting out the code that raised the error (code inside the Python package) and re-running the command, I found that the process never exits after training and keeps occupying GPU memory. Once the vanilla Transformer model has finished training, should the process terminate on its own?
Finally, in what form is the model saved after training? My results folder (SelfMemory/results/generator/bigpatent/bart_large) only contains multiple versions of lightning_logs. Could it be that training of the vanilla Transformer model did not finish, which is why only the logs are there?
[screenshot of the results folder]
I hope you can help clarify these questions, thank you!
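(Based on the online suggestion above, a minimal sketch of what overriding the hook might look like; the class name is hypothetical and the hook signature below matches pytorch-lightning 1.8 and may differ in other versions:)

```python
import pytorch_lightning as pl

class GeneratorModule(pl.LightningModule):  # hypothetical module name, for illustration only
    # Override the hook so Lightning steps the LambdaLR scheduler itself
    # instead of rejecting it for not matching its LRScheduler API check.
    def lr_scheduler_step(self, scheduler, optimizer_idx, metric):
        if metric is None:
            scheduler.step()        # plain per-step schedulers such as LambdaLR
        else:
            scheduler.step(metric)  # metric-driven schedulers such as ReduceLROnPlateau
```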

@Hannibal046
Owner

Hi, is your lightning version pytorch-lightning==1.8.0.post1?

@FreshPrincePerseus
Author

> Hi, is your lightning version pytorch-lightning==1.8.0.post1?

Yes, all my package versions follow the requirements in your code.

@Hannibal046
Owner

Hi, could you double-check your torch and lightning versions? This is the current scheduler definition, and it satisfies PyTorch's LRScheduler API:

```python
import torch
import math
from torch.optim.lr_scheduler import LambdaLR

def get_linear_schedule_with_warmup(optimizer, num_warmup_steps, num_training_steps, last_epoch=-1):
    """
    Create a schedule with a learning rate that decreases linearly from the initial lr set in the optimizer to 0,
    after a warmup period during which it increases linearly from 0 to the initial lr set in the optimizer.
    """
    def lr_lambda(current_step: int):
        if current_step < num_warmup_steps:
            return float(current_step) / float(max(1, num_warmup_steps))
        return max(
            0.0, float(num_training_steps - current_step) / float(max(1, num_training_steps - num_warmup_steps))
        )

    return LambdaLR(optimizer, lr_lambda, last_epoch)
```
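(For context, a minimal sketch of how such a scheduler is typically returned from `configure_optimizers` so that Lightning steps it every batch; the optimizer choice and step counts below are illustrative assumptions, not necessarily what this repo uses:)

```python
import torch

def configure_optimizers(self):
    # Illustrative optimizer and step counts; the repo's actual values may differ.
    optimizer = torch.optim.AdamW(self.parameters(), lr=3e-5)
    scheduler = get_linear_schedule_with_warmup(
        optimizer, num_warmup_steps=1000, num_training_steps=100_000
    )
    # "interval": "step" tells Lightning to call scheduler.step() after every batch.
    return {
        "optimizer": optimizer,
        "lr_scheduler": {"scheduler": scheduler, "interval": "step"},
    }
```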

@FreshPrincePerseus
Author

> Hi, could you double-check your torch and lightning versions? This is the current scheduler definition, and it satisfies PyTorch's LRScheduler API: […]


Yes, I checked both of these: my CUDA version is 12.1, and my pytorch version is torch 2.1.1, which matches CUDA 12.1. The lightning version is pytorch-lightning==1.8.0.post1. Because I saw a warning, `DEPRECATION: lightning-lite 1.8.0.post1 has a non-standard dependency specifier torch>=1.9.*. pip 24.0 will enforce this behaviour change. A possible replacement is to upgrade to a newer version of lightning-lite or contact the author to suggest that they release a version with a conforming dependency specifiers. Discussion can be found at https://github.com/pypa/pip/issues/12063`, I also downgraded pip to 23.3.1. But when I run `python train_generator.py --config config/bigpatent/train_generator.yaml --precision 16`, I still get the error `lightning_lite.utilities.exceptions.MisconfigurationException: The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.` May I ask which torch version you used?
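(When comparing environments, a quick sanity-check snippet run in the same environment as `train_generator.py` confirms which versions the training process actually sees; this is just for checking, not part of the repo:)

```python
import torch
import pytorch_lightning as pl

# Print the versions resolved at import time in this environment.
print("torch:", torch.__version__)
print("pytorch-lightning:", pl.__version__)
print("CUDA available:", torch.cuda.is_available())
```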

@Hannibal046
Owner

Please try this version: torch==1.8.1+cu111

@FreshPrincePerseus
Author

> Please try this version: torch==1.8.1+cu111

After installing this version, running `pip install pytorch-lightning==1.8.0.post1` automatically installs torch 2.3.0 to replace the existing torch (possibly the CPU build). I suspect the automatic upgrade happens because torch==1.8.1+cu111 is a bit too old?
Then, running

python train_generator.py \
    --config config/bigpatent/train_generator.yaml \
    --precision 16

again raises the same error:
`lightning_lite.utilities.exceptions.MisconfigurationException: The provided lr scheduler LambdaLR doesn't follow PyTorch's LRScheduler API. You should override the LightningModule.lr_scheduler_step hook with your own logic if you are using a custom LR scheduler.`

@Hannibal046
Owner

You can find the commands for installing pytorch 1.x here: https://pytorch.org/get-started/previous-versions/

@FreshPrincePerseus
Author

FreshPrincePerseus commented May 1, 2024

> You can find the commands for installing pytorch 1.x here: https://pytorch.org/get-started/previous-versions/

Yes, my original pytorch 1.8 was installed from there. But after running `pip install pytorch-lightning==1.8.0.post1`, torch automatically changes to version 2.3.0. I tried two fresh conda environments and got the same result.

@FreshPrincePerseus
Author

After some investigation, I believe the cause of this problem is the torch version. Once I downgraded pytorch to below 2.0.0, it ran successfully. The answer was based on Lightning-AI/pytorch-lightning#15912.

@FreshPrincePerseus
Author

> After some investigation, I believe the cause of this problem is the torch version. Once I downgraded pytorch to below 2.0.0, it ran successfully. The answer was based on Lightning-AI/pytorch-lightning#15912.

The torch==1.8.1+cu111 mentioned earlier probably isn't supported by pytorch-lightning==1.8.0.post1, which is why running `pip install pytorch-lightning==1.8.0.post1` automatically upgrades the pytorch version. I finally installed pytorch==1.13.1. 👍
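(For anyone hitting the same issue, a pinned install along the lines of the combination reported to work here; the exact CUDA build and index URL depend on your local setup and are not specified in this thread:)

```bash
# Pin torch below 2.0 together with the required Lightning version so pip
# does not replace torch when resolving pytorch-lightning's dependencies.
pip install "torch==1.13.1" "pytorch-lightning==1.8.0.post1"
```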
