[Wait for #2574] [ Context ] Add loss scale in Context & using mse loss #2580

jijoongmoon · 2024-05-11T05:04:26Z

In this PR

This PR adds a loss scale parameter to RunLayerContext and uses it to
update the MSE loss.

. Add Loss Scale Parameter in RunLayerContext Constructor
. Add applyLossScale func to update return derivative in Loss Layer
. Change MSE Loss Layer to apply the loss scale to return derivative
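
As a rough illustration of the three changes above (not the code in this PR), the sketch below computes the MSE derivative and multiplies it by a loss scale. The free function `mseDerivative` and the `applyLossScale` signature shown here are assumptions made for the example; the actual implementation works on NNTrainer tensors inside the loss layer and reads the scale from RunLayerContext.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the loss-scale step; not NNTrainer's actual code.
// Multiplies every element of the returned derivative by loss_scale in place.
void applyLossScale(std::vector<float> &deriv, float loss_scale) {
  for (float &d : deriv)
    d *= loss_scale;
}

// Derivative of the mean squared error with respect to the prediction:
//   dL/dy_i = 2 * (y_i - t_i) / N
// followed by the loss scale, so small gradients survive a later cast to FP16.
std::vector<float> mseDerivative(const std::vector<float> &pred,
                                 const std::vector<float> &label,
                                 float loss_scale) {
  if (pred.size() != label.size())
    throw std::invalid_argument("pred and label must have the same size");

  std::vector<float> deriv(pred.size());
  const float n = static_cast<float>(pred.size());
  for (std::size_t i = 0; i < pred.size(); ++i)
    deriv[i] = 2.0f * (pred[i] - label[i]) / n;

  applyLossScale(deriv, loss_scale);
  return deriv;
}
```

Because the derivative is scaled here, every gradient computed downstream carries the same factor and has to be divided by the loss scale again before the weights are updated.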

Self evaluation:

Build test: [X]Passed [ ]Failed [ ]Skipped
Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon [email protected]

We add a Var32 tensor if the variable weight is not full precision (FP32). This enables the weight update to run in full precision, and only the apply-gradient process uses this tensor. Therefore, the lifespan of this tensor should be "ApplyGradient".

. Modify TensorPool to generate weights with mixed precision in mind.

**Self evaluation:**

1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
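
To see why a full-precision copy is worth its memory, here is a minimal sketch under stated assumptions. `MixedWeight`, `applyGradient`, and `roundToHalfPrecision` are hypothetical names for this example only. `roundToHalfPrecision` merely approximates FP16 by keeping 11 significant bits; it ignores FP16's exponent range and does not use round-to-nearest-even, so it is a stand-in for a real half-precision type.

```cpp
#include <cmath>

// Approximates casting to FP16 by keeping 11 significant bits.
// Assumption: exponent range (overflow, subnormals) is ignored on purpose.
float roundToHalfPrecision(float x) {
  if (x == 0.0f || !std::isfinite(x))
    return x;
  int exp = 0;
  float m = std::frexp(x, &exp);          // x = m * 2^exp, 0.5 <= |m| < 1
  m = std::round(m * 2048.0f) / 2048.0f;  // 2^11 steps across the mantissa
  return std::ldexp(m, exp);
}

// Hypothetical weight with a low-precision working copy and an FP32 master.
struct MixedWeight {
  float var;   // value the layers see, restricted to the FP16-like grid
  float var32; // full-precision copy, used only while applying gradients
};

// Update the FP32 master, then refresh the low-precision copy from it.
void applyGradient(MixedWeight &w, float grad, float lr) {
  w.var32 -= lr * grad;
  w.var = roundToHalfPrecision(w.var32);
}

// For comparison: update the low-precision value directly, with no master.
float applyGradientLowPrecisionOnly(float var, float grad, float lr) {
  return roundToHalfPrecision(var - lr * grad);
}
```

With a weight of 1.0 and a step of 1e-4, the low-precision-only update rounds back to 1.0 every time, while the master copy accumulates the steps until they become visible in the working copy.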

This PR creates the variable FP32 tensor when the Weight and Optimizer Weight are created.

. Update the manager to create Weights with the var32 tensor requested from the weight pool.
. Update the weight requests with the Weight Spec and the var, grad, and var32 tensors that have already been created.
. Add a clone function for Tensor with a specific type in tensor.h

Resolves:

**Self evaluation:**

1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR enables FP16 support for the layers below:

. input layer
. mse loss layer

Resolves:

**Self evaluation:**

1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This PR includes the mixed precision test case.

. Input - FC - MSE : "batch_size=2", "model_tensor_type=FP16-FP16", "loss_scale=128"

**Self evaluation:**

1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

This commit modifies apply gradient in the optimizer. The optimizer variables do not need to be stored in the weight type: only the optimizer uses them, and the weight should be updated in full precision to maintain accuracy. Therefore, the var32 tensors for optimizer variables are removed.

Resolves:

**Self evaluation:**

1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
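
One way to read the commit above: optimizer state lives in FP32 from the start, so it needs no extra var32 shadow, and the weight update happens on the FP32 master weight. The sketch below uses SGD with momentum as an example. `MomentumState` and `sgdMomentumStep` are hypothetical names, and dividing the gradient by the loss scale at this point is an assumption based on how loss scaling is normally undone, not something this commit message states.

```cpp
// Hypothetical optimizer variable, kept in FP32 regardless of the weight type.
struct MomentumState {
  float velocity = 0.0f;
};

// One SGD-with-momentum step on the FP32 master weight.
// scaled_grad is the gradient as produced by the backward pass, still
// multiplied by loss_scale; it is unscaled before touching any state.
float sgdMomentumStep(float var32, float scaled_grad, float loss_scale,
                      float lr, float momentum, MomentumState &state) {
  const float grad = scaled_grad / loss_scale; // undo the loss scaling
  state.velocity = momentum * state.velocity + grad;
  return var32 - lr * state.velocity;          // full-precision update
}
```

The returned value would then be cast down to the weight's own type for the next forward pass.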

This PR adds an is_NaN function to check whether a tensor contains a NaN value. It is used to check for NaN during mixed precision training.

**Self evaluation:**

1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>
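
A minimal sketch of such a check over a flat buffer follows. The name `hasNaN` and the `std::vector<float>` argument are assumptions for the example; the is_NaN function added here operates on NNTrainer tensors and its exact signature is not shown in this conversation.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Returns true if any element of the buffer is NaN.
// Infinity is not NaN, so a buffer holding only +/-inf returns false.
bool hasNaN(const std::vector<float> &data) {
  return std::any_of(data.begin(), data.end(),
                     [](float v) { return std::isnan(v); });
}
```

In mixed precision training a check like this is typically run on the gradients before they are applied, so that an iteration that produced NaN can be detected rather than written into the weights.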

taos-ci · 2024-05-11T05:04:29Z

📝 TAOS-CI Version: 1.5.20200925. Thank you for submitting PR #2580. Please follow the 1commit/1PR (one commit per PR) policy to get comments quickly from reviewers. Your PR must pass all verification processes of cibot before reviewers start the review process. If you are a new member joining this project, please read the manuals in the documentation folder and the wiki page. To monitor the progress of your PR in more detail, visit http://ci.nnstreamer.ai/.

taos-ci

@jijoongmoon, 💯 All CI checkers are successfully verified. Thanks.

This PR adds a loss scale parameter to RunLayerContext and uses it to update the MSE loss.

. Add Loss Scale Parameter in RunLayerContext Constructor
. Add applyLossScale func to update return derivative in Loss Layer
. Change MSE Loss Layer to apply the loss scale to return derivative

**Self evaluation:**

1. Build test: [X]Passed [ ]Failed [ ]Skipped
2. Run test: [X]Passed [ ]Failed [ ]Skipped

Signed-off-by: jijoong.moon <[email protected]>

taos-ci

@jijoongmoon, 💯 All CI checkers are successfully verified. Thanks.

jijoongmoon · 2024-11-11T07:06:44Z

closed by #2663

jijoongmoon added 6 commits May 7, 2024 13:38

jijoongmoon requested review from myungjoo, again4you, jaeyun-jung, leemgs, wooksong, helloahn, kparichay, gichan-jang, anyj0527, zhoonit, lhs8928, songgot, jihochu, DonghakPark, SeoHyungjun, baek2sm, skykongkong8, djeong20, EunjuYang and a team as code owners May 11, 2024 05:04

github-actions bot added the Need Review label May 11, 2024

taos-ci approved these changes May 11, 2024

jijoongmoon force-pushed the loss_scale branch from bb2bb45 to adc2f2a Compare May 11, 2024 05:41

taos-ci approved these changes May 11, 2024

DonghakPark mentioned this pull request Oct 30, 2024

[Wait for #2615] Enable Mixed Precision Training in NNTrainer @open sesame 11/09 15:18 #2663

Merged

jijoongmoon closed this Nov 11, 2024
