
Move regularization and error clipping to parameter server side. #7432

Closed
typhoonzero opened this issue Jan 11, 2018 · 4 comments


typhoonzero commented Jan 11, 2018

Move regularization and error clipping to parameter server side in the distributed transpiler.


helinwang commented Jan 11, 2018

  • I think that if the normalization is done on the trainer side, making it part of the loss function, then the gradient update (the parameter server's job) doesn't need to know about the normalization, so the update is always element-wise, and we don't need to do anything special when a parameter is sharded across multiple parameter servers (see the sketch after this list). I briefly looked into the code a few days ago, and it seems that the normalization is already part of the loss function (@lcy-seso, if you happen to be familiar with the L1/L2 normalization implementation, could you confirm? Thank you! If not, don't worry about it; I will double-check). I would prefer that we treat normalization (e.g., L1, L2) as part of the loss function, which is the most general approach.

  • Gradient clipping we can do on the parameter server side; it doesn't seem very hard.
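
A minimal sketch of the first point, in plain NumPy rather than the actual Fluid API (all function names here are hypothetical): if the trainer folds the regularization term into the gradient it sends, the parameter server's update stays purely element-wise, so a sharded parameter needs no extra coordination.

```python
# Hypothetical sketch, not the real Fluid implementation.
import numpy as np

def trainer_gradient(param_shard, data_grad, l2_coeff=1e-4):
    # Gradient of loss + (l2_coeff / 2) * ||w||^2 with respect to w:
    # the data gradient plus the L2 term, computed on the trainer side.
    return data_grad + l2_coeff * param_shard

def pserver_update(param_shard, grad_shard, lr=0.01):
    # Element-wise SGD step; the server never needs to know whether
    # or how regularization was applied.
    return param_shard - lr * grad_shard

# One shard of a larger parameter, updated independently of the others.
w = np.random.randn(4)
g = np.random.randn(4)  # stand-in for the data gradient of this shard
w = pserver_update(w, trainer_gradient(w, g))
```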

typhoonzero changed the title from "Move normalization and error clipping to parameter server side." to "Move regularization and error clipping to parameter server side." on Jan 11, 2018
typhoonzero commented Jan 11, 2018

Sorry, my fault: I meant regularization and clipping. See: https://github.com/PaddlePaddle/Paddle/blob/develop/python/paddle/v2/fluid/optimizer.py#L203

Thanks very much for your comment.


helinwang commented Jan 11, 2018

Thanks, that's very important information.

typhoonzero commented Jan 11, 2018

I think a simple version that moves weight decay and gradient clipping to the parameter server side should cover the basic functionality.
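
A hedged sketch of that simple version, again in plain NumPy with hypothetical names: the parameter server itself clips the incoming gradient and applies weight decay before the SGD step.

```python
# Hypothetical server-side update: clip, decay, then step.
import numpy as np

def pserver_step(param, grad, lr=0.01, weight_decay=1e-4, clip_value=1.0):
    grad = np.clip(grad, -clip_value, clip_value)  # clip each element to [-clip_value, clip_value]
    grad = grad + weight_decay * param             # L2 weight decay folded into the update
    return param - lr * grad                       # plain element-wise SGD step
```

One trade-off to note: per-element value clipping works shard-locally, while norm-based clipping would require aggregating norms across all parameter servers first.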
