pytorch basics #145210
The error you're encountering occurs because PyTorch's autograd (automatic differentiation) system frees the intermediate buffers of the computation graph as soon as you call `.backward()`. This is done to save memory, and if you attempt to backward through the same graph a second time without retaining it, you get the error you saw. The solution is either to rebuild the graph on every iteration (the normal pattern in a training loop) or, if you genuinely need a second backward pass through the same graph, to keep `retain_graph=True`.

**Fixing the issue**

You can fix this by removing `retain_graph=True` from the `loss.backward()` call in your training loop. Two other details matter: `params` needs its gradients cleared before each backward pass so they don't accumulate, and the data tensors `t_c` and `t_u` should not be created with `requires_grad=True`. Only the parameters need gradients; if `t_u` requires grad, the graph node produced by `t_un = t_u / 10` outside the loop is reused on every iteration, and backing through it a second time is exactly what this error complains about. Here's the corrected code:

```python
import torch

# Data: plain tensors -- only the parameters need gradients, not the data
t_c = [0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0]
t_u = [35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4]
t_c = torch.tensor(t_c)
t_u = torch.tensor(t_u)

# Model definition: a linear model w * t_u + b
def model(w, b, t_u):
    return w * t_u + b

# Loss function: mean squared error
def loss_fn(t_p, t_c):
    return ((t_p - t_c) ** 2).mean()

# Normalized input
t_un = t_u / 10

# Training loop
def training_loop(n_epochs, learning_rate, params, t_un, t_c):
    for epoch in range(n_epochs + 1):
        # Clear the gradients left over from the previous iteration
        if params.grad is not None:
            params.grad.zero_()
        # Calculate the loss (this builds a fresh graph every iteration)
        loss = loss_fn(model(params[0], params[1], t_un), t_c)
        # Perform the backward pass -- no retain_graph needed
        loss.backward()
        # Update the parameters manually, outside of autograd tracking
        with torch.no_grad():
            params -= learning_rate * params.grad
        # Print the loss every 100 epochs
        if epoch % 100 == 0:
            print("epoch:", epoch, "loss:", loss.item())
    return params

# Learning rate and training
learning_rate = 1e-2
params = training_loop(5000, learning_rate, torch.tensor([1.0, 0.0], requires_grad=True), t_un, t_c)
print(params)
```

**Key changes**

- `retain_graph=True` is gone from `loss.backward()`; the graph is rebuilt on every forward pass, so there is nothing to retain.
- `t_c` and `t_u` no longer have `requires_grad=True`, so no graph node is shared across iterations.
- `params.grad` is zeroed at the start of each epoch so gradients don't accumulate.
- The parameter update runs inside `torch.no_grad()` so the update itself is not tracked by autograd.
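If it helps to see the failure in isolation, here is a minimal sketch (with made-up names `w`, `t`, and `target`, not your code) that reproduces the error by calling `.backward()` twice on the same graph, and then shows the pattern used above: recompute the loss every iteration so each backward pass gets a fresh graph.

```python
import torch

t = torch.tensor([1.0, 2.0, 3.0])           # data: no gradients needed
target = torch.tensor([2.0, 4.0, 6.0])
w = torch.tensor(1.0, requires_grad=True)   # a single trainable parameter

loss = ((w * t - target) ** 2).mean()       # the graph is built once here
loss.backward()                             # works; intermediate buffers are freed afterwards

try:
    loss.backward()                         # second backward through the SAME graph
except RuntimeError as err:
    print(err)                              # "Trying to backward through the graph a second time ..."

# The fix used in the training loop above: recompute the loss every iteration,
# so each backward pass runs through a freshly built graph.
for _ in range(3):
    if w.grad is not None:
        w.grad.zero_()
    loss = ((w * t - target) ** 2).mean()   # new graph each time
    loss.backward()                         # no retain_graph needed
print(w.grad)
```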
These changes should resolve the error and allow your training loop to proceed without issues.
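As a side note (not required to fix the error), the same loop is often written with `torch.optim.SGD`, which takes care of zeroing the gradients and applying the update without autograd tracking. A sketch of that variant, using the same data, model, and loss as above:

```python
import torch

# Same data and normalization as the corrected code above
t_c = torch.tensor([0.5, 14.0, 15.0, 28.0, 11.0, 8.0, 3.0, -4.0, 6.0, 13.0, 21.0])
t_u = torch.tensor([35.7, 55.9, 58.2, 81.9, 56.3, 48.9, 33.9, 21.8, 48.4, 60.4, 68.4])
t_un = t_u / 10

params = torch.tensor([1.0, 0.0], requires_grad=True)
optimizer = torch.optim.SGD([params], lr=1e-2)

for epoch in range(5001):
    optimizer.zero_grad()                     # replaces the manual params.grad.zero_()
    t_p = params[0] * t_un + params[1]        # same linear model as above
    loss = ((t_p - t_c) ** 2).mean()          # same mean-squared-error loss
    loss.backward()                           # a fresh graph is built every iteration
    optimizer.step()                          # replaces the manual no_grad update
    if epoch % 1000 == 0:
        print("epoch:", epoch, "loss:", loss.item())

print(params)
```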
Original question (body)
When I delete retain_graph=True, the IDE raises an error. The message is: "RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward."
The code: