Reward stops changing after training SAC for a while with Paddle Custom NPU #1106

Open
USTCKAY opened this issue Jul 4, 2023 · 0 comments

USTCKAY commented Jul 4, 2023

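Hello, I recently ran into the problem described in the title while running SAC on an NPU; the reward information is shown in the image below. Could the PARL team suggest what might be causing this?
image
I tried both the GPU and CPU versions of Paddle, and the model trains normally on both, which suggests the algorithm itself is fine. I then listed the paddle operators that SAC uses and found only add, clip, full_, matmul, relu, scale, tanh, and uniform, so I tried falling back each of these operators to the CPU one at a time. The same problem still occurred in every case except when matmul was taken off the NPU. With matmul falling back to the CPU, however, training runs for a while and then fails with the following error:
image
I have run out of ideas for locating the problem, so I would really appreciate it if the PARL team could take a look. Many thanks!
P.S. Both paddle and PARL are the latest develop versions.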
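
A complementary check to the one-at-a-time fallback described above would be to compare each operator's output on the device directly against a CPU reference on identical inputs. The snippet below is only a minimal sketch of that idea in plain Python, not the actual PARL or Paddle code: `find_mismatched_ops`, `max_abs_diff`, and both operator tables are hypothetical names, and the "device" tanh is deliberately rounded to simulate a faulty kernel (the choice of tanh is arbitrary and says nothing about the real NPU). In practice each table entry would presumably wrap the same paddle call run once on CPU and once on the NPU.

```python
import math
import random


def max_abs_diff(ref, out):
    """Largest elementwise gap; a NaN or inf gap anywhere counts as infinite."""
    worst = 0.0
    for r, o in zip(ref, out):
        gap = abs(r - o)
        if not math.isfinite(gap):
            return math.inf
        worst = max(worst, gap)
    return worst


def find_mismatched_ops(reference_ops, device_ops, inputs, tol=1e-5):
    """Return {op_name: gap} for every operator whose two backends disagree."""
    mismatches = {}
    for name, ref_fn in reference_ops.items():
        gap = max_abs_diff(ref_fn(inputs), device_ops[name](inputs))
        if gap > tol:
            mismatches[name] = gap
    return mismatches


# Reference ("CPU") versions of a few elementwise operators from the list above.
reference_ops = {
    "relu": lambda xs: [max(x, 0.0) for x in xs],
    "tanh": lambda xs: [math.tanh(x) for x in xs],
    "clip": lambda xs: [min(max(x, -1.0), 1.0) for x in xs],
    "scale": lambda xs: [2.0 * x for x in xs],
}

# Hypothetical "device" versions: identical to the reference, except that tanh
# is rounded to two decimals to simulate a faulty kernel (assumption for demo).
device_ops = dict(reference_ops)
device_ops["tanh"] = lambda xs: [round(math.tanh(x), 2) for x in xs]

rng = random.Random(0)  # fixed seed so both backends see the same inputs
inputs = [rng.uniform(-3.0, 3.0) for _ in range(1000)]
mismatches = find_mismatched_ops(reference_ops, device_ops, inputs)
print(sorted(mismatches))
```

Treating any non-finite gap as infinite means an operator that starts emitting NaN is always flagged, regardless of the tolerance.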
