[PaddlePaddle Hackathon 3] Add adaptive loss-weighting to PaddleScience #142
Conversation
Thanks for your contribution!
✅ This PR's description meets the template requirements!
Please add test code, and feel free to reach out if you run into any problems.

Hi, GradNorm involves differentiation. How should the result be verified with numpy?
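One framework-independent way to check a differentiated quantity against numpy is a central finite-difference comparison. The sketch below is illustrative only and is not part of this PR: `f` stands for any scalar loss of a flat parameter vector `x0`, and the tolerances are placeholders.

import numpy as np

def numeric_grad(f, x0, eps=1e-6):
    # central finite-difference gradient of a scalar function f at x0
    x0 = np.asarray(x0, dtype=np.float64)
    grad = np.zeros_like(x0)
    for i in range(x0.size):
        step = np.zeros_like(x0)
        step.flat[i] = eps
        grad.flat[i] = (f(x0 + step) - f(x0 - step)) / (2 * eps)
    return grad

# usage (hypothetical): compare the framework's analytic gradient with the numeric one
# np.testing.assert_allclose(analytic_grad, numeric_grad(f, x0), rtol=1e-5, atol=1e-6)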
@pytest.mark.api_network_GradNorm
def test_GradNorm0():
    xy_data = np.array([[0.1, 0.5, 0.3, 0.4, 0.2]])
    u = np.array([1.138526], dtype=np.float32)
These reference values were computed with PyTorch using the same logic as this repository, under identical initialization, random seed, and inputs. The code is as follows:
import torch
import torch.nn as nn
from functools import partial
import numpy as np
from torch.nn.init import constant_
class FCNet(nn.Module):
    def __init__(self,
                 num_ins,
                 num_outs,
                 num_layers,
                 hidden_size,
                 activation='tanh',
                 n_loss=1):
        super(FCNet, self).__init__()
        self.num_ins = num_ins
        self.num_outs = num_outs
        self.num_layers = num_layers
        self.hidden_size = hidden_size
        # one learnable weight per loss term
        self.weights = nn.Parameter(torch.ones(n_loss).float())
        # self.weights = nn.Parameter(torch.tensor([1.0, 2.0, 3.0]).float())
        if activation == 'sigmoid':
            self.activation = torch.sigmoid
        elif activation == 'tanh':
            self.activation = torch.tanh
        else:
            assert 0, "Unsupported activation type."
        w = []
        self.num_layers = num_layers
        for i in range(num_layers):
            if i == 0:
                lsize = num_ins
                rsize = hidden_size
            elif i == (num_layers - 1):
                lsize = hidden_size
                rsize = num_outs
            else:
                lsize = hidden_size
                rsize = hidden_size
            w.append(nn.Linear(lsize, rsize, bias=False))
        self.fc = nn.ModuleList(w)
        self._init_weights()

    def _init_weights(self):
        # constant initialization so results are reproducible across frameworks
        for i in self.fc:
            if isinstance(i, nn.Linear):
                constant_(i.weight, 1)

    def forward(self, inp):
        u = inp
        for i in range(self.num_layers - 1):
            u = self.fc[i](u)
            u = self.activation(u)
        return self.fc[-1](u)
loss_func = [torch.sum, torch.mean, partial(torch.norm, p=2), partial(torch.norm, p=3)]
def cal_gradnorm(ins,
                 num_ins,
                 num_outs,
                 num_layers,
                 hidden_size,
                 n_loss,
                 alpha,
                 activation='tanh',
                 weight_attr=None):
    net = FCNet(
        num_ins=num_ins,
        num_outs=num_outs,
        num_layers=num_layers,
        hidden_size=hidden_size,
        activation=activation,
        n_loss=n_loss)
    res = net(ins)
    print(res)
    losses = []
    for idx in range(n_loss):
        losses.append(loss_func[idx](res))
    losses = torch.stack(losses)
    weighted_loss = losses * net.weights
    loss = torch.sum(weighted_loss)
    loss.backward(retain_graph=True)
    initial_task_loss = losses.detach().numpy()
    net.weights.grad.data = net.weights.grad.data * 0.0
    W = net.fc[-1]
    norms = []
    for i in range(n_loss):
        # get the gradient of this task loss with respect to the shared parameters
        gygw = torch.autograd.grad(losses[i], W.parameters(), retain_graph=True)
        # compute the norm
        norms.append(torch.norm(torch.mul(net.weights[i], gygw[0])))
    norms = torch.stack(norms)
    print("norms: ", norms)
    if torch.cuda.is_available():
        loss_ratio = losses.data.cpu().numpy() / initial_task_loss
    else:
        loss_ratio = losses.data.numpy() / initial_task_loss
    inverse_train_rate = loss_ratio / np.mean(loss_ratio)
    print("inverse_train_rate: ", inverse_train_rate)
    if torch.cuda.is_available():
        mean_norm = np.mean(norms.data.cpu().numpy())
    else:
        mean_norm = np.mean(norms.data.numpy())
    # target gradient norm: mean norm scaled by the relative inverse training rate ** alpha
    constant_term = torch.tensor(mean_norm * (inverse_train_rate ** alpha), requires_grad=False)
    print("constant_term: ", constant_term)
    if torch.cuda.is_available():
        constant_term = constant_term.cuda()
    grad_norm_loss = torch.sum(torch.abs(norms - constant_term))
    net.weights.grad = torch.autograd.grad(grad_norm_loss, net.weights)[0]
    print(net.weights.grad)
    return grad_norm_loss
def randtool(dtype, low, high, shape):
    """
    np random tools
    """
    if dtype == "int":
        return np.random.randint(low, high, shape)
    elif dtype == "float":
        return low + (high - low) * np.random.random(shape)
if __name__ == '__main__':
    np.random.seed(22)
    xy_data = randtool("float", 0, 10, (9, 2))
    print(xy_data)
    # xy_data = torch.tensor(np.array([[0.1, 0.5, 0.2, 0.4]]), dtype=torch.float32)
    # xy_data = torch.tensor(np.array([[0.1, 0.5, 0.3, 0.4, 0.2]]), dtype=torch.float32)
    # res = cal_gradnorm(xy_data, 4, 3, 5, 20, activation='sigmoid', n_loss=3, alpha=0.5)
    res = cal_gradnorm(torch.tensor(xy_data, dtype=torch.float32), 2, 3, 2, 1, activation='tanh', n_loss=4, alpha=0.5)
    print(res.item())
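The values printed by this script become the hard-coded expectations in the pytest cases. A minimal sketch of such a check is given below; `paddle_result` and `expected` are placeholder names (the 1.138526 value is simply taken from the snippet quoted earlier in this thread), and the tolerance mirrors the roughly 1e-7 relative error reported later in the conversation.

import numpy as np

expected = np.array([1.138526], dtype=np.float32)       # reference from the PyTorch script
paddle_result = np.array([1.138526], dtype=np.float32)  # placeholder for the Paddle output
np.testing.assert_allclose(paddle_result, expected, rtol=1e-7)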
It runs correctly on Paddle 2.3 but fails on the develop branch; I'm working on a fix.
@rightpeach Hi, CI has passed.

@rightpeach Hi, is it ready for review?
Could you explain the technical approach of the original paper and provide code comments and documentation? Also, have you run the original paper's code? If so, please report the difference in results between the Paddle reimplementation and the original code. Please also list which APIs were replaced and any problems encountered or still open.
@rightpeach The original paper only provides a very simple example, which I modified slightly; the code is above. There is no significant difference in the results: the relative error of the grad norm loss is within about 1e-7.
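For context, the GradNorm loss implemented in the script above can be written compactly as follows (my notation, summarizing Chen et al., 2018, "GradNorm: Gradient Normalization for Adaptive Loss Balancing in Deep Multitask Learning"):

G_W^{(i)} = \lVert w_i \, \nabla_W L_i \rVert_2, \qquad
\bar{G}_W = \frac{1}{n}\sum_i G_W^{(i)}, \qquad
r_i = \frac{L_i / L_i(0)}{\frac{1}{n}\sum_j L_j / L_j(0)}

L_{\mathrm{grad}} = \sum_i \left| \, G_W^{(i)} - \bar{G}_W \, r_i^{\alpha} \, \right|

Here W is the last shared layer (net.fc[-1] in the script), the target \bar{G}_W r_i^{\alpha} is treated as a constant so no gradient flows through it, and the gradient of L_grad with respect to the loss weights w_i replaces weights.grad. In the single-step script above, L_i(0) = L_i, so r_i = 1.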
Please get the CI to pass.

@rightpeach It has passed.
@Asthestarsfalll Hi, can this API now be called directly from ppsci? I see there is currently a GradNorm API, but it doesn't seem to support an alpha parameter: https://paddlescience-docs.readthedocs.io/zh-cn/latest/zh/api/loss/mtl/#ppsci.loss.mtl.GradNorm
It should correspond to the momentum parameter.

Thanks. Looking at the source, that implementation appears to be a simplified GradNorm; your implementation is closer to the original paper. Also, have you ever run into training hanging? With both my own implementation and ppsci's GradNorm, multi-GPU training hangs at backward on my side.

I haven't tested multi-GPU.
PR types
New features
PR changes
APIs
Describe
Add GradNorm to balance multiple losses. Test code has not been added yet; how to test it still needs to be worked out. See the sketch below for how the adaptive weights enter the total loss.
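As a rough illustration of what "balancing multiple losses" means here (a sketch based on the reference script above, not the PR's actual PaddleScience API; all names are illustrative), the learnable per-loss weights multiply the individual losses before summation, and GradNorm adjusts those weights through their gradient:

import torch

# illustrative task losses and learnable per-loss weights (cf. net.weights above)
losses = torch.stack([torch.tensor(2.0), torch.tensor(0.5)])
weights = torch.nn.Parameter(torch.ones(2))
opt = torch.optim.SGD([weights], lr=0.01)

total = torch.sum(weights * losses)  # weighted total loss used for training
total.backward()
# in GradNorm, weights.grad is overwritten with the gradient of L_grad
# (see cal_gradnorm above) before this step, steering the weights so that
# every task's gradient norm approaches its target
opt.step()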