[Wait for #2663][Mixed Precision] Fix gradient clipping logic #2749
base: main
Commits on Aug 29, 2024
Signed-off-by: Jiho Chu <[email protected]>
Commit: a5d16a4
Signed-off-by: Jiho Chu <[email protected]>
Commit: 23e40da
[SWAP] Modify cache for inference mode
This patch supports inference mode for the swap device. It re-enables the mmap feature, but the write timing is controlled manually because of the inference-mode handling.
Self evaluation: build test passed; run test passed.
Signed-off-by: Jiho Chu <[email protected]>
Commit: ba67088
[ hgemm ] Add hgemm experimental kernel
- According to a recent paper, accumulating up to 64 ~ 128 elements along the K-direction is acceptable.
- Since both the conventional error metric and the newly introduced metric (max component relative error) remain acceptable, introduce an experimental kernel.
- The build option -Dhgemm-precision-level=low enables this kernel in the Android build.
Self evaluation: build test passed; run test passed.
Signed-off-by: skykongkong8 <[email protected]>
Commit: a568c93
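The point about bounded half-precision accumulation can be sketched in plain C++. The snippet below is illustrative only (not the actual NEON hgemm kernel) and assumes a compiler providing the `_Float16` type: partial sums are kept in FP16 for at most 128 K-steps and then folded into an FP32 accumulator.

```cpp
// Illustrative blocked-K half-precision accumulation, not the nntrainer
// hgemm kernel. Assumes a toolchain that provides _Float16 (recent GCC/Clang).
#include <cstddef>

// C[m][n] += sum_k A[m][k] * B[k][n], accumulating in FP16 only within
// K-blocks of at most k_block elements, then folding each partial sum into a
// float accumulator to bound the half-precision rounding error.
void hgemm_lowprec_block(const _Float16 *A, const _Float16 *B, float *C,
                         std::size_t M, std::size_t N, std::size_t K,
                         std::size_t k_block = 128) {
  for (std::size_t m = 0; m < M; ++m) {
    for (std::size_t n = 0; n < N; ++n) {
      float acc = 0.0f;                    // full-precision running sum
      for (std::size_t k0 = 0; k0 < K; k0 += k_block) {
        _Float16 partial = 0;              // FP16 partial sum, <= k_block terms
        std::size_t k_end = (k0 + k_block < K) ? k0 + k_block : K;
        for (std::size_t k = k0; k < k_end; ++k)
          partial += A[m * K + k] * B[k * N + n];
        acc += static_cast<float>(partial); // fold the block into FP32
      }
      C[m * N + n] += acc;
    }
  }
}
```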
[ Weight ] Add Var32 Tensor in Weight.
We add a Var32 tensor when the variable weight is not full precision (FP32). This enables the weight update to run in full precision; only the apply-gradient process uses this tensor, so its lifespan should be "ApplyGradient".
- Modify TensorPool to generate the weight considering mixed precision.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 19da71d
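A minimal sketch of the var32 idea, with hypothetical names (`MixedWeight`, `applyGradient`) rather than nntrainer's Weight/TensorPool API, assuming `_Float16` support: the FP32 master copy exists only for the apply-gradient step and is rounded back into the FP16 weight.

```cpp
// Minimal sketch of an FP32 "var32" master copy kept beside a half-precision
// weight. Names are illustrative, not nntrainer's API.
#include <cstddef>
#include <vector>

struct MixedWeight {
  std::vector<_Float16> var; // the FP16 weight the layers actually use
  std::vector<float> var32;  // FP32 master, needed only while applying gradients

  explicit MixedWeight(std::size_t n) : var(n), var32(n, 0.0f) {}

  // Update the FP32 master first, then round back into the FP16 weight,
  // so small updates are not swallowed by half-precision rounding.
  void applyGradient(const std::vector<float> &grad, float lr) {
    for (std::size_t i = 0; i < var32.size(); ++i) {
      var32[i] -= lr * grad[i];
      var[i] = static_cast<_Float16>(var32[i]);
    }
  }
};
```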
[ Mixed ] Create weight with var32 tensor
This PR creates the variable FP32 tensor when we create the weight and the optimizer weight.
- Update the manager to create a weight with a var32 tensor requested from the weight pool.
- Update the weight requests with the weight spec and the already-created var, grad, and var32 tensors.
- Add cloning a Tensor with a specific type in tensor.h.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 5406860
[ Layers ] Update Layers to support FP16
This PR enables FP16 support for the layers below:
- input layer
- MSE loss layer
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: b4c663e
[ Test ] Mixed Precision Test Case
This PR includes a mixed precision test case.
- Input - FC - MSE: "batch_size=2", "model_tensor_type=FP16-FP16", "loss_scale=128"
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 5444fa0
[ Optimizer ] Update Optimizer / Adam to support Mixed training
This commit modifies apply-gradient in the optimizer. We do not need to store optimizer variables in the weight type: only the optimizer needs them, and the weight should be updated in full precision to maintain accuracy. Therefore, remove the var32 tensors for the optimizer variables.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 292eb71
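The following sketch shows an Adam step consistent with the description above: the optimizer moments stay FP32 and the update is applied to an FP32 weight. It is a simplified stand-in, not nntrainer's Optimizer/Adam interface.

```cpp
// Sketch of an Adam step under mixed precision: moments and the applied
// update stay in FP32; no extra var32 copies are kept for optimizer variables.
#include <cmath>
#include <cstddef>
#include <vector>

struct AdamState {
  std::vector<float> m, v; // FP32 optimizer variables
  float beta1 = 0.9f, beta2 = 0.999f, eps = 1e-8f;
  std::size_t t = 0;

  explicit AdamState(std::size_t n) : m(n, 0.0f), v(n, 0.0f) {}

  void step(std::vector<float> &weight_fp32, const std::vector<float> &grad,
            float lr) {
    ++t;
    for (std::size_t i = 0; i < weight_fp32.size(); ++i) {
      m[i] = beta1 * m[i] + (1.0f - beta1) * grad[i];
      v[i] = beta2 * v[i] + (1.0f - beta2) * grad[i] * grad[i];
      float m_hat = m[i] / (1.0f - std::pow(beta1, static_cast<float>(t)));
      float v_hat = v[i] / (1.0f - std::pow(beta2, static_cast<float>(t)));
      weight_fp32[i] -= lr * m_hat / (std::sqrt(v_hat) + eps); // FP32 update
    }
  }
};
```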
[ Tensor ] add is_NaN check in Tensor
This PR adds an is_NaN function to check whether a tensor contains NaN values. It is used to detect NaN during mixed precision training.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: ae868ef
[ Context ] Add loss scale in Context and use it in MSE loss
This PR adds a loss scale parameter to the run context and uses it to update the MSE loss.
- Add a loss scale parameter to the RunLayerContext constructor.
- Add an applyLossScale function to update the returned derivative in the loss layer.
- Change the MSE loss layer to apply the loss scale to the returned derivative.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: e0596ef
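A small sketch of the loss-scale application described above; the function name mirrors applyLossScale, but the signature is hypothetical, not the RunLayerContext API.

```cpp
// Multiply the derivative the loss layer hands back to the graph by the loss
// scale, so that small FP16 gradients do not underflow to zero downstream.
#include <vector>

void applyLossScale(std::vector<float> &return_derivative, float loss_scale) {
  if (loss_scale == 1.0f)
    return; // scaling disabled
  for (float &d : return_derivative)
    d *= loss_scale;
}
```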
[ Mixed Precision ] Enable Mixed Precision
This PR enables mixed precision training. For now only FP16-FP32 is considered; additional test cases will be added.
- Add getSortedLayerIdx to set the graph order for forwarding.
- Change clip_weights to lazy_apply_weights so both cases are covered.
- Add forwarding_op to re-run forwarding from the layer whose gradient contains NaN.
- Add a while loop to re-run backwarding after resetting the loss scale.
- Add setLossScale to RunLayerContext.
- Add a gradient check when mixed precision is enabled.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 4b7a3ba
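The retry loop can be summarized as below. This is a hand-written sketch of dynamic loss scaling, with hypothetical callables standing in for the model's forwarding, backwarding, and apply-gradient steps; it is not the nntrainer implementation.

```cpp
// Sketch of the dynamic loss-scaling retry loop: if any gradient is NaN/Inf,
// halve the scale and redo the iteration instead of applying the update.
#include <cmath>
#include <vector>

static bool gradients_valid(const std::vector<float> &grads) {
  for (float g : grads)
    if (!std::isfinite(g)) // catches both NaN and +/-Inf
      return false;
  return true;
}

template <typename ForwardFn, typename BackwardFn, typename ApplyFn>
void train_step(ForwardFn forward, BackwardFn backward, ApplyFn apply,
                float &loss_scale) {
  while (true) {
    forward();
    std::vector<float> grads = backward(loss_scale); // scaled gradients
    if (!gradients_valid(grads)) {
      loss_scale *= 0.5f; // reduce scale and retry this iteration
      continue;
    }
    for (float &g : grads)
      g /= loss_scale;    // unscale before the weight update
    apply(grads);
    break;
  }
}
```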
[ Tensor ] Add infinity check in Tensor
This PR adds an infinity value check on Tensor data.
- Rename hasNaN to isValid.
- Add an infinity check to the isValid function so that it now checks both NaN and Inf.
- Modify the blas_avx and blas_neon checks accordingly.
- Modify the graph and model to check is_valid rather than has_nan.
- Add a unit test for the isValid function.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 76efb04
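An illustrative element-wise validity check in the spirit of the renamed isValid (both NaN and Inf rejected); this is not the Tensor implementation and assumes `_Float16` support in the compiler.

```cpp
// Each half-precision element is promoted to float and rejected if it is NaN
// or infinite. Formerly only the NaN check existed (hasNaN).
#include <cmath>
#include <cstddef>

bool is_valid_fp16(const _Float16 *data, std::size_t len) {
  for (std::size_t i = 0; i < len; ++i) {
    float v = static_cast<float>(data[i]);
    if (std::isnan(v) || std::isinf(v))
      return false;
  }
  return true;
}
```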
[ MSE ] Fix for better MSE loss precision
This PR changes the loss computation to use full precision rather than half precision to maintain accuracy.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 376e67a
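A sketch of computing the MSE reduction in full precision over half-precision buffers; illustrative only, assuming `_Float16` support, not the actual loss-layer code.

```cpp
// Widen each FP16 element to FP32 before the squared-error accumulation, so
// the reduction does not lose precision in half floats.
#include <cstddef>

float mse_full_precision(const _Float16 *pred, const _Float16 *label,
                         std::size_t len) {
  double sum = 0.0; // accumulate the reduction in wide precision
  for (std::size_t i = 0; i < len; ++i) {
    float diff = static_cast<float>(pred[i]) - static_cast<float>(label[i]);
    sum += static_cast<double>(diff) * diff;
  }
  return static_cast<float>(sum / static_cast<double>(len));
}
```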
[ TEST ] Add Torch Mixed Precision Model Test
This PR enables the mixed precision unit test against a Torch model.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: d9242f1
[ TEST ] add torch input and output test data for mixed precision
This PR adds Torch mixed precision golden data generation and input/output data for the test, plus some fixes to the test.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 7d664ff
[ TEST ] Add more unit tests and fixes for mixed precision
This PR includes more unit tests and fixes for mixed precision.
- Model unit test: two FC layers that generate NaN or Inf gradients from Torch, with MSE loss, checking the whole procedure of mixed precision training. Even though the FC model has only one weight, it is good enough to validate mixed precision, and the Torch model works in a similar way to NNTrainer.
- Some fixes to the execution order of apply-gradient when mixed precision is on.
- Update SGD to support mixed precision training.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: afc2757
[ Layer ] Update Conv2D to support Mixed Precision
This PR updates the Conv2D layer to support mixed precision (FP16). It is based on PR nnstreamer#2579.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 8c39c64
[ Layer ] enable Mixed Precision in LSTM Layer
This commit enables mixed precision support for the LSTM layer.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: f38b831
[ Model ] Add Execution Mode in Compile
This PR adds an execution mode parameter to compile. The default is ml::train::ExecutionMode::TRAIN. Currently we do not support compiler optimizations for inference mode, such as batch normalization fusing, but more optimizations will be added depending on the execution mode.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: d47102a
[ Layer ] Mixed Precision support for BN Layer
This PR includes mixed precision support for the batch normalization layer. During training the BN layer should run in full precision even with FP16 weight data, so reading the FP16 data and converting the current weight and activation are required. For inference we need compiler optimizations such as BN fusing, so this also includes execution mode parameters for compile. Because of the complicated data conversion in the BN layer, test case generation also needs updating: it takes the FP16 input/output tensors and weights, converts the weights to FP32 for computation, and converts FP32 back to FP16 for verification.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 07b48ec
[layer] enable mixed precision - reshape_layer
Enable mixed precision on the reshape layer. The reshape layer only changes dimensions, so change the dimensions and check the data type.
Self evaluation: build test passed; run test passed.
Signed-off-by: Donghak PARK <[email protected]>
Commit: b201807
[Layer] Enable mixed precision - pooling2d_layer
Enable mixed precision on the Pooling2D layer. The layer is modified to cast properly in the FP16 case so that mixed precision can be activated on the existing pooling 2D layer.
Self evaluation: build test passed; run test passed.
Signed-off-by: Donghak PARK <[email protected]>
Commit: 825f07c
[ Model ] Fix the gradient clipping for the FP16 or Low bit Gradient
In this PR, when computing the L2 norm of a gradient tensor for gradient clipping, the tensor is first converted to full precision and the L2 norm is computed on the FP32 values.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 2184452
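A minimal sketch of the norm computation described above: each FP16 gradient element is widened to FP32 before squaring and summing. Names are illustrative, not the nntrainer code; assumes `_Float16` support.

```cpp
// L2 norm of a half-precision gradient, accumulated in full precision so the
// squared sum neither overflows nor loses precision in FP16.
#include <cmath>
#include <cstddef>

float l2norm_fp32(const _Float16 *grad, std::size_t len) {
  float sum_sq = 0.0f;
  for (std::size_t i = 0; i < len; ++i) {
    float g = static_cast<float>(grad[i]); // convert to full precision first
    sum_sq += g * g;
  }
  return std::sqrt(sum_sq);
}
```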
[ Layer ] Add mu and var backup tensors
This PR adds the mu and var backup tensors (mu_b, var_b) to restore the previous moving mean and moving variance for mixed precision training.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 81fd9cb
[ Layer ] Prevent randomization when restoring data
To restore the previous iteration's data, this PR disables randomization of the mask when previous data needs to be restored.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 003a5ce
[ Context ] Add a check for whether previous data needs to be restored
This PR enables checking whether previous data needs to be restored. By doing this, we can remove NaN or Inf data from tensors during mixed precision training.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 57a03ab
[ Tensor ] remove sscal to set zero.
We need to clear NaN or Inf values in a tensor by calling setZero(). However, if setZero() uses sscal, NaN or Inf values remain, because multiplying them by zero still yields NaN. This PR changes the sscal call to memset.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: b540642
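A short, self-contained demonstration of why a scale-by-zero (sscal-style) pass cannot clear NaN/Inf while memset can; this illustrates the reasoning only and is not the Tensor setZero code.

```cpp
// 0 * NaN stays NaN and 0 * Inf is NaN, so multiplying by zero cannot clear a
// poisoned buffer; memset really zeroes the bytes.
#include <cassert>
#include <cmath>
#include <cstring>
#include <limits>

int main() {
  float buf[4] = {1.0f, std::numeric_limits<float>::quiet_NaN(),
                  std::numeric_limits<float>::infinity(), -2.0f};

  // "sscal-style" zeroing: multiply every element by 0.
  for (float &v : buf)
    v *= 0.0f;
  assert(std::isnan(buf[1])); // NaN survives
  assert(std::isnan(buf[2])); // Inf becomes NaN, not 0

  // memset-style zeroing clears the bit pattern unconditionally.
  std::memset(buf, 0, sizeof(buf));
  assert(buf[1] == 0.0f && buf[2] == 0.0f);
  return 0;
}
```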
[ Mixed ] Set gradient initialization in layers and bug fixes
This PR fixes some bugs that occur when running mixed precision training.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 16e3a55
[ Mixed Training ] add is_mixed variable in weight
Add an is_mixed variable to check whether training runs in mixed precision, i.e., whether the model's weight type is not full precision.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: aedb11c
[ BUG FIX ] Fix bug for mixed precision
The mixed precision computation of the BN layer had a bug related to the FP32 computation, and the Adam update had a bug as well.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: cbefcd9
[ hgemm ] Use aligned memory allocation in transpose / padding gemm
- Using unaligned memory may cause a SIGSEGV.
Self evaluation: build test passed; run test passed.
Signed-off-by: skykongkong8 <[email protected]>
Commit: d39f9d8
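A sketch of an aligned scratch-buffer allocator of the kind this change implies, using POSIX posix_memalign; the helper name and the 64-byte default alignment are assumptions, not the actual hgemm allocation code.

```cpp
// Allocate an aligned float buffer so aligned SIMD loads in the transpose /
// padding paths cannot fault on a misaligned pointer. Caller frees with free().
#include <cstddef>
#include <cstdlib>
#include <new>

static float *alloc_aligned(std::size_t count, std::size_t alignment = 64) {
  void *ptr = nullptr;
  // Pad the byte size up to a multiple of the alignment.
  std::size_t bytes =
    ((count * sizeof(float) + alignment - 1) / alignment) * alignment;
  if (posix_memalign(&ptr, alignment, bytes) != 0)
    throw std::bad_alloc();
  return static_cast<float *>(ptr);
}
```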
[TEST] using builddir/android_build_result to build test
This PR includes changes to Android.mk to use builddir/android_build_result. To use it, a soft link to the android_build_result directory is needed in the upper directory (../): `ln -s ../../builddir/android_build_result ../nntrainer`
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 57e2759
[Mixed Precision] Fix mixed precision to use TensorV2
This PR includes fixes to use TensorV2.
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]>
Commit: 592253f
Commits on Oct 7, 2024
temporary code for layer initialization
- Temporary code for layer initialization.
Signed-off-by: hyeonseok lee <[email protected]> (cherry picked from commit fd0b6c3)
Commit: 913a7fe
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]> (cherry picked from commit 846e2c0)
Commit: 1860cd2
[ NNStreamer ] disable nnstreamer trainer
Self evaluation: build test passed; run test passed.
Signed-off-by: jijoong.moon <[email protected]> (cherry picked from commit 80c9855)
Commit: df4baa4
[Tizen7.0] Tizen7.0 Backporting
This commit adds some updates for Tizen7.0 backporting:
- A type mismatch bug is fixed.
- An unused variable is removed.
- Missing header files are added to the spec file.
- The spec file is updated.
Self evaluation: build test passed; run test passed.
Signed-off-by: Eunju Yang <[email protected]> (cherry picked from commit 6d6e924)
Commit: faddcdd
This PR resolves Coverity issues in the ShortTensor class. Replace the max_abs() implementation with maxValue(), since the maximum absolute value of an unsigned int equals its maximum value.
Self evaluation: build test passed; run test passed.
Signed-off-by: Donghyeon Jeong <[email protected]> (cherry picked from commit 656a231)
Commit: ec06943
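A tiny illustration of the simplification: for unsigned element data, |x| == x for every element, so a max-absolute-value query can forward to the plain maximum. The helpers below are hypothetical, not the ShortTensor API.

```cpp
// For unsigned data the maximum absolute value and the maximum value coincide,
// so max_abs() needs no separate pass. Assumes non-empty input.
#include <algorithm>
#include <cstdint>
#include <vector>

uint16_t maxValue(const std::vector<uint16_t> &data) {
  return *std::max_element(data.begin(), data.end());
}

uint16_t max_abs(const std::vector<uint16_t> &data) { return maxValue(data); }
```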
[ Tizen7.0 ] Include neuralnet.h in -dev header
- Update the code to include `neuralnet.h` in the -dev package.
- Some applications, e.g., ReinforcementLearning, use `forwarding` and `backwarding` directly; to support this, the header is added to the dev package.
Self evaluation: build test passed; run test passed.
Signed-off-by: Eunju Yang <[email protected]> (cherry picked from commit 77e56f1)
Commit: 145a726
[CI] Fix meson ubuntu ci build
Fix build bug:
- Currently, there is a bug in the CI matrix where the first Meson build succeeds but subsequent builds fail because a 'build' folder already exists.
- Before running the Meson build, ensure any existing 'build' folder is deleted.
- Fix the gcc version to 13.
Resolves: nnstreamer#2715
Self evaluation: build test passed; run test passed.
Signed-off-by: Donghak PARK <[email protected]>
Co-authored-by: hyeonseok <[email protected]>
Co-authored-by: Donghyeon Jeong <[email protected]> (cherry picked from commit b6fdba0)
Commit: e6830cc
[Tizen7.0] Tizen7.0 Backporting
This commit adds some updates for Tizen7.0 backporting:
- A type mismatch bug is fixed.
- An unused variable is removed.
- Missing header files are added to the spec file.
- The spec file is updated.
Self evaluation: build test passed; run test passed.
Signed-off-by: Eunju Yang <[email protected]> (cherry picked from commit e025352)
Commit: 19d7dc1
[ Tizen7.0 ] Include some headers in -dev header for neuralnet.h
- In the previous PR (77e56f1), neuralnet.h was included in the dev package.
- However, some headers used in neuralnet.h were missing.
- This PR adds the headers that neuralnet.h depends on.
- This PR was tested to confirm it supports the ReinforcementLearning app on Tizen7.0.
Self evaluation: build test passed; run test passed.
Signed-off-by: Eunju Yang <[email protected]> (cherry picked from commit 9eb4c85)
Commit: 15cb248
[enhance] Registering OpenCL kernels at cl_context
Register custom kernels as well as in-house kernels at cl_context initialization.
Signed-off-by: Debadri Samaddar <[email protected]> (cherry picked from commit b34fd53)
Commit: 2ab0371
[enhance/gpu] Removing layer_context dependency
Removed the layer_context dependency from the BLAS OpenCL kernels. Temporarily commented out cl_layers to avoid build failure.
Signed-off-by: Debadri Samaddar <[email protected]> (cherry picked from commit 7ee9294)
Commit: 1f6a2c0
[ FC ] update incremental_forwarding to support LoRA and multi-batch
- This commit adds code to support LoRA in incremental_forwarding.
- This commit updates incremental_forwarding to support multi-batch input. However, this is not the desirable approach since it cannot be parallelized across the batch axis; the issue is noted in a comment.
Self evaluation: build test passed; run test passed.
Signed-off-by: Eunju Yang <[email protected]> (cherry picked from commit 4ae1477)
Commit: a182f84
[ LORA ] Bugfix in LoRA support in FC Layer
- In the previous code, LoRA did not work when batch_size > 1.
- Tensors used in LoRA-related computation were not updated when the batch size was updated.
- The `setBatch()` function is implemented for `FullyConnectedLayer`.
- Bug fix in the lifespan of the loraTmp tensor: FORWARD_DERIV_LIFESPAN -> FORWARD_GRAD_LIFESPAN.
Self evaluation: build test passed; run test passed.
Signed-off-by: Eunju Yang <[email protected]> (cherry picked from commit 8104cbe)
Commit: 710a160
[bugfix] Fix memcheck in CacheLoader unit tests
This pull request fixes failing continuous integration. The new patch checks for nullptr after a flush operation on the cache pool, which is expected to fix the failures previously seen in CI.
Self evaluation: build test passed; run test passed.
Signed-off-by: Donghyeon Jeong <[email protected]> (cherry picked from commit 119c60e)
Commit: e0f5d3f
[gpu/enhance] Utility for registering Blas kernels during initialization
Register default BLAS kernels during cl_context initialization and remove the RunLayerContext dependency from unit tests.
Signed-off-by: Debadri Samaddar <[email protected]> (cherry picked from commit 79a7c25)
Commit: c43937c
[ App ] Multi-Input Example Update
- This commit is related to issue nnstreamer#2660.
- When using multi-inputs, users must feed the data in reverse order due to a known bug that needs fixing. In the current version the input must be provided in reverse order, which was not apparent in the previous example because it used random data of the same dimensions.
- To provide a more accurate example to NNTrainer users, this example is temporarily updated.
- Once the issue is handled, further updates will be necessary.
Signed-off-by: Eunju Yang <[email protected]> (cherry picked from commit 2807f69)
Commit: f54120f
[ Print ] Update print result of model summary
- This commit updates the model summary print for a layer with multiple inputs.
[AS-IS] concat0  concat  1:1:14:2  input0 / 1:1:4:2  input1 / 1:1:8:2  input2
[TO-BE] concat0  concat  1:1:14:2  input0 input1 input2
Signed-off-by: Eunju Yang <[email protected]> (cherry picked from commit f222ecf)
Commit: 0f043a4
Commits on Oct 8, 2024
[Mixed Precision] Fix gradient clipping logic
Update the mixed precision gradient clipping logic: when clipping gradients, the gradient should be unscaled before computing the L2 norm.
Self evaluation: build test passed; run test passed.
Signed-off-by: Donghak PARK <[email protected]>
Commit: 2d4e347
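A sketch of the corrected ordering this PR describes: unscale first, then compute the global L2 norm, then clip. The function and its signature are illustrative, not the actual nntrainer implementation.

```cpp
// Clip-by-global-norm on loss-scaled gradients: remove the loss-scale factor
// before measuring the norm, so the clip threshold is compared against the
// true gradient magnitude.
#include <cmath>
#include <vector>

void clip_by_global_norm(std::vector<std::vector<float>> &grads,
                         float loss_scale, float max_norm) {
  // 1) Unscale: divide out the loss scale before measuring the norm.
  for (auto &g : grads)
    for (float &v : g)
      v /= loss_scale;

  // 2) Global L2 norm over all gradients, accumulated in full precision.
  float sum_sq = 0.0f;
  for (const auto &g : grads)
    for (float v : g)
      sum_sq += v * v;
  float norm = std::sqrt(sum_sq);

  // 3) Clip only if the unscaled norm exceeds the threshold.
  if (norm > max_norm) {
    float ratio = max_norm / norm;
    for (auto &g : grads)
      for (float &v : g)
        v *= ratio;
  }
}
```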