[Wait for #2663][Mixed Precision] Fix gradient clipping logic #2749

Open: wants to merge 52 commits into base: main

Commits (52):
a5d16a4  [SWAP] Add swap mode property (jihochu, Aug 31, 2023)
23e40da  [SWAP] Add inference mode (jihochu, Aug 31, 2023)
ba67088  [SWAP] Modify cache for inference mode (jihochu, Aug 31, 2023)
a568c93  [ hgemm ] Add hgemm experimental kernel (skykongkong8, Aug 1, 2024)
19da71d  [ Weight ] Add Var32 Tensor in Weight. (jijoongmoon, May 2, 2024)
5406860  [ Mixed ] Create weight with var32 tensor (jijoongmoon, May 7, 2024)
b4c663e  [ Layers ] Update Layers to support FP16 (jijoongmoon, May 7, 2024)
5444fa0  [ Test ] Mixed Precision Test Case (jijoongmoon, May 7, 2024)
292eb71  [ Optimizer ] Update Optimizer / Adam to support Mixed training (jijoongmoon, May 9, 2024)
ae868ef  [ Tensor ] add is_NaN check in Tensor (jijoongmoon, May 8, 2024)
e0596ef  [ Context ] Add loss scale in Context & using mse loss (jijoongmoon, May 11, 2024)
4b7a3ba  [ Mixed Precision ] Enable Mixed Precision (jijoongmoon, May 13, 2024)
76efb04  [ Tensor ] Add inifinity check in Tensor (jijoongmoon, May 14, 2024)
376e67a  [ MSE ] Fix for better MSE loss precision (jijoongmoon, May 17, 2024)
d9242f1  [ TEST ] Add Torch Mixed Precision Model Test (jijoongmoon, May 17, 2024)
7d664ff  [ TEST ] add torch input and output test data for mixed precision (jijoongmoon, May 20, 2024)
afc2757  [ TEST ] Add more unittest and fixes for mixed precsion (jijoongmoon, May 24, 2024)
8c39c64  [ Layer ] Update Conv2D to support Mixed Precision (jijoongmoon, May 29, 2024)
f38b831  [ Layer ] enable Mixed Precision in LSTM Layer (jijoongmoon, May 30, 2024)
d47102a  [ Model ] Add Execution Mode in Compile (jijoongmoon, May 31, 2024)
07b48ec  [ Layer ] Mixed Precision support for BN Layer (jijoongmoon, Jun 3, 2024)
b201807  [layer] enable mixed precision - reshape_layer (DonghakPark, May 30, 2024)
825f07c  [Layer] Enable mixed precision - pooling2d_layer (DonghakPark, Jun 3, 2024)
2184452  [ Model ] Fix the gradient clipping for the FP16 or Low bit Gradient (jijoongmoon, Jun 9, 2024)
81fd9cb  [ Layer ] Add mu and var backup up tensor. (jijoongmoon, Jun 9, 2024)
003a5ce  [ Layer ] prevent randomize when it restore the data (jijoongmoon, Jun 9, 2024)
57a03ab  [ Context ] add check if it needs restore previous data (jijoongmoon, Jun 9, 2024)
b540642  [ Tensor ] remove sscal to set zero. (jijoongmoon, Jun 9, 2024)
16e3a55  [ Mixed ] set initialize gradient in layers and bugfixes (jijoongmoon, Jun 10, 2024)
aedb11c  [ Mixed Training ] add is_mixed variable in weight (jijoongmoon, Jun 19, 2024)
cbefcd9  [ BUG FIX ] Fix bug for mixed precision (jijoongmoon, Jun 20, 2024)
d39f9d8  [ hgemm ] Use aligned memory allocation in transpose / padding gemm (skykongkong8, Jun 20, 2024)
57e2759  [TEST] using builddir/android_build_result to build test (jijoongmoon, Jul 2, 2024)
592253f  [Mixed Precision] Fix mixed precsion to use Tensor V2 (jijoongmoon, Jul 29, 2024)
913a7fe  temporary code for layer initialization (lhs8928, Mar 21, 2024)
1860cd2  [ SPEC ] chagne fp16 (jijoongmoon, Apr 22, 2024)
df4baa4  [ NNStreamer ] disable nnstreamer trainer (jijoongmoon, Apr 22, 2024)
faddcdd  [Tizen7.0] Tizen7.0 Backporting (EunjuYang, Aug 26, 2024)
ec06943  [bugfix] fix coverity issues (djeong20, Aug 29, 2024)
145a726  [ Tizen7.0 ] Include neuralnet.h in -dev header (EunjuYang, Sep 5, 2024)
e6830cc  [CI] Fix meson ubuntu ci build (DonghakPark, Aug 23, 2024)
19d7dc1  [Tizen7.0] Tizen7.0 Backporting (EunjuYang, Aug 26, 2024)
15cb248  [ Tizen7.0 ] Include some headers in -dev header for neuralnet.h (EunjuYang, Sep 11, 2024)
2ab0371  [enhance] Registering OpenCL kernels at cl_context (s-debadri, Sep 11, 2024)
1f6a2c0  [enhance/gpu] Removing layer_context dependency (s-debadri, Sep 13, 2024)
a182f84  [ FC ] update incremental_forwarding to support LoRA and multi-batch (EunjuYang, Sep 2, 2024)
710a160  [ LORA ] Bugfix in LoRA support in FC Layer (EunjuYang, Sep 5, 2024)
e0f5d3f  [bugfix] Fix memcheck in CacheLoader unit tests (djeong20, Sep 30, 2024)
c43937c  [gpu/enhance] Utility for registering Blas kernels during initialization (s-debadri, Sep 24, 2024)
f54120f  [ App ] Multi-Input Example Update (EunjuYang, Oct 2, 2024)
0f043a4  [ Print ] Update print result of model summary (EunjuYang, Oct 2, 2024)
2d4e347  [Mixed Precision] Fix gradient clipping logic (DonghakPark, Oct 8, 2024)
13 changes: 10 additions & 3 deletions .github/workflows/ubuntu_clean_meson_build.yml
@@ -25,11 +25,18 @@ jobs:
run: sudo apt-get update && sudo apt-get install -y gcc g++ pkg-config libopenblas-dev libiniparser-dev libjsoncpp-dev libcurl3-dev tensorflow2-lite-dev nnstreamer-dev libglib2.0-dev libgstreamer1.0-dev libgtest-dev ml-api-common-dev flatbuffers-compiler ml-inference-api-dev libunwind-dev
- name: install additional packages for features
run: sudo apt-get install -y python3-dev python3-numpy python3
- name: gcc version change
run: |
sudo add-apt-repository ppa:ubuntu-toolchain-r/test
sudo apt-get install build-essential
sudo apt update
sudo apt install -y gcc-13
sudo apt install -y g++-13
sudo update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-13 1000
sudo update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-13 1000
sudo update-alternatives --set gcc /usr/bin/gcc-13
- name: install build systems
run: sudo apt install meson ninja-build
- run: meson setup build/
env:
CC: gcc
- run: |
meson \
--buildtype=plain \
2 changes: 1 addition & 1 deletion Applications/KNN/jni/meson.build
@@ -15,4 +15,4 @@ e = executable('knn_sample',
install_dir: application_install_dir
)

test('app_knn', e, args: [nntr_app_resdir / 'KNN'])
test('app_knn', e, args: [nntr_app_resdir / 'KNN/'])
27 changes: 27 additions & 0 deletions Applications/Multi_input/Readme.md
@@ -0,0 +1,27 @@
# Multi_Input example

- This example demonstrates how to use the `multi_input` layer.
- NNTrainer supports networks that take multiple tensors as inputs.
- Users can create multiple `input` layers for the network with their own names and build the network accordingly.
- This code includes an example of training with the following network structure:

```
+-----------+
| output |
+-----------+
|
+---------------------------------------------------+
| flatten |
+---------------------------------------------------+
|
+---------------------------------------------------+
| concat0 |
+---------------------------------------------------+
| | |
+-----------+ +-----------+ +-----------+
| input 2 | | input 1 | | input 0 |
+-----------+ +-----------+ +-----------+

```

- **[Note]** Users should feed the multiple inputs in reverse order because the model is structured in a reversed manner internally (see the sketch below). This is a known issue, and we plan to address it soon.
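To make the reverse-order requirement concrete, here is a minimal user-side sketch (an illustration only, not code from this PR) that fills the per-input buffers by walking the declared input shapes backwards. The shapes mirror those used in `main.cpp` below (input0 = 1:2:2, input1 = 1:4:2, input2 = 1:8:2); the `fill_inputs` helper and the constant fill value are assumptions made for the sketch.

```cpp
// Illustrative sketch only: feed buffers in reverse order of the declared
// input layers (input0 -> 1:2:2, input1 -> 1:4:2, input2 -> 1:8:2).
#include <array>
#include <cstddef>

// Feature lengths (C * H * W) of input0, input1, input2, in declaration order.
constexpr std::array<std::size_t, 3> feature_lens = {1 * 2 * 2, 1 * 4 * 2,
                                                     1 * 8 * 2};

void fill_inputs(float **input, float value) {
  const std::size_t num_inputs = feature_lens.size();
  for (std::size_t i = 0; i < num_inputs; ++i) {
    // Buffer 0 receives input2's data, buffer 1 receives input1's, and
    // buffer 2 receives input0's, matching the reversed internal ordering.
    const std::size_t len = feature_lens[num_inputs - 1 - i];
    for (std::size_t j = 0; j < len; ++j)
      input[i][j] = value;
  }
}
```

The `MultiDataLoader::next()` change in this PR follows the same idea: it advances the buffer pointer forward while indexing `input_shapes` from the back.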
18 changes: 11 additions & 7 deletions Applications/Multi_input/jni/main.cpp
@@ -63,14 +63,18 @@ ModelHandle createMultiInputModel() {
layers.push_back(createLayer(
"input", {withKey("name", "input0"), withKey("input_shape", "1:2:2")}));
layers.push_back(createLayer(
"input", {withKey("name", "input1"), withKey("input_shape", "1:2:2")}));
"input", {withKey("name", "input1"), withKey("input_shape", "1:4:2")}));
layers.push_back(createLayer(
"input", {withKey("name", "input2"), withKey("input_shape", "1:2:2")}));
"input", {withKey("name", "input2"), withKey("input_shape", "1:8:2")}));

layers.push_back(
createLayer("concat", {withKey("name", "concat0"), withKey("axis", "3"),
createLayer("concat", {withKey("name", "concat0"), withKey("axis", "2"),
withKey("input_layers", "input0, input1, input2")}));

layers.push_back(
createLayer("flatten", {withKey("name", "flatten0"),
withKey("input_layers", "concat0")}));

layers.push_back(createLayer(
"fully_connected", {withKey("unit", 5), withKey("activation", "softmax")}));

@@ -123,16 +123,16 @@ std::array<UserDataType, 1>
createFakeMultiDataGenerator(unsigned int batch_size,
unsigned int simulated_data_size) {
UserDataType train_data(new nntrainer::util::MultiDataLoader(
{{batch_size, 1, 2, 2}, {batch_size, 1, 2, 2}, {batch_size, 1, 2, 2}},
{{batch_size, 1, 2, 2}, {batch_size, 1, 4, 2}, {batch_size, 1, 8, 2}},
{{batch_size, 1, 1, 5}}, simulated_data_size));

return {std::move(train_data)};
}

int main(int argc, char *argv[]) {
unsigned int total_data_size = 16;
unsigned int batch_size = 2;
unsigned int epoch = 2;
unsigned int total_data_size = 32;
unsigned int batch_size = 4;
unsigned int epoch = 10;

std::array<UserDataType, 1> user_datas;

7 changes: 4 additions & 3 deletions Applications/Multi_input/jni/multi_loader.cpp
@@ -78,10 +78,11 @@ void MultiDataLoader::next(float **input, float **label, bool *last) {
};

float **cur_input_tensor = input;
const auto num_input = input_shapes.size() - 1;
for (unsigned int i = 0; i < input_shapes.size(); ++i) {
fill_input(*cur_input_tensor, input_shapes.at(i).getFeatureLen(),
indicies[count]);
cur_input_tensor++;
fill_input(*cur_input_tensor,
input_shapes.at(num_input - i).getFeatureLen(), indicies[count]);
++cur_input_tensor;
}

float **cur_label_tensor = label;
8 changes: 7 additions & 1 deletion api/ccapi/include/layer.h
@@ -131,6 +131,11 @@ class Layer {
*/
virtual const std::string getType() const = 0;

/**
* @brief Initialize layer
*/
virtual void initialize() = 0;

/**
* @brief Default allowed properties
* - input shape : string
@@ -261,7 +266,8 @@ createLayer(const LayerType &type,
*/
std::unique_ptr<Layer>
createLayer(const std::string &type,
const std::vector<std::string> &properties = {});
const std::vector<std::string> &properties = {},
const LayerComputeEngine &compute_engine = LayerComputeEngine::CPU);

/**
* @brief General Layer Factory function to register Layer
17 changes: 9 additions & 8 deletions api/ccapi/include/model.h
@@ -136,7 +136,7 @@ class Model {
* @retval #ML_ERROR_NONE Successful.
* @retval #ML_ERROR_INVALID_PARAMETER invalid parameter.
*/
virtual int compile() = 0;
virtual int compile(ExecutionMode exec_mode_ = ExecutionMode::TRAIN) = 0;

/**
* @brief Initialize Network. This should be called after setting the
@@ -188,13 +188,14 @@
* @details This function accepts vector of properties in the format -
* { std::string property_name, void * property_val, ...}
*/
virtual int train(const std::vector<std::string> &values = {},
std::function<bool(void *)> stop_cb =
[](void *stop_user_data) { return false; },
void *stop_user_data = nullptr,
std::function<void(void *)> epoch_complete_cb =
[](void *epoch_user_data) { return false; },
void *epoch_user_data = nullptr) = 0;
virtual int train(
const std::vector<std::string> &values = {},
std::function<bool(void *)> stop_cb =
[](void *stop_user_data) { return false; },
void *stop_user_data = nullptr,
std::function<void(void *)> epoch_complete_cb =
[](void *epoch_user_data) { return false; },
void *epoch_user_data = nullptr) = 0;

/**
* @brief Run Model train with callback function by user
5 changes: 3 additions & 2 deletions api/ccapi/src/factory.cpp
@@ -40,8 +40,9 @@ std::unique_ptr<Layer> createLayer(const LayerType &type,
* @brief Factory creator with constructor for layer
*/
std::unique_ptr<Layer> createLayer(const std::string &type,
const std::vector<std::string> &properties) {
return nntrainer::createLayerNode(type, properties);
const std::vector<std::string> &properties,
const LayerComputeEngine &compute_engine) {
return nntrainer::createLayerNode(type, properties, compute_engine);
}

std::unique_ptr<Optimizer>
26 changes: 26 additions & 0 deletions debian/nntrainer-dev.install
@@ -17,13 +17,16 @@
/usr/include/nntrainer/blas_interface.h
/usr/include/nntrainer/var_grad.h
/usr/include/nntrainer/weight.h
/usr/include/nntrainer/blas_avx.h
# todo: update dataset headers
/usr/include/nntrainer/databuffer.h
/usr/include/nntrainer/databuffer_factory.h
# layer headers
/usr/include/nntrainer/layer_context.h
/usr/include/nntrainer/layer_devel.h
/usr/include/nntrainer/layer_impl.h
/usr/include/nntrainer/loss_layer.h
/usr/include/nntrainer/acti_func.h
# custom layer kits
/usr/include/nntrainer/app_context.h
# logger
@@ -41,3 +44,26 @@
/usr/include/nntrainer/util_func.h
/usr/include/nntrainer/fp16.h
/usr/include/nntrainer/util_simd.h
# model
/usr/include/nntrainer/neuralnet.h
## neuralnet.h : forwarding() / backwarding() support
/usr/include/nntrainer/compiler_fwd.h
/usr/include/nntrainer/dynamic_training_optimization.h
/usr/include/nntrainer/layer_node.h
/usr/include/nntrainer/graph_node.h
/usr/include/nntrainer/model_common_properties.h
/usr/include/nntrainer/network_graph.h
/usr/include/nntrainer/graph_core.h
/usr/include/nntrainer/graph_node.h
/usr/include/nntrainer/manager.h
/usr/include/nntrainer/basic_planner.h
/usr/include/nntrainer/memory_planner.h
/usr/include/nntrainer/tensor_pool.h
/usr/include/nntrainer/cache_loader.h
/usr/include/nntrainer/task.h
/usr/include/nntrainer/task_executor.h
/usr/include/nntrainer/cache_pool.h
/usr/include/nntrainer/cache_elem.h
/usr/include/nntrainer/memory_pool.h
/usr/include/nntrainer/swap_device.h
/usr/include/nntrainer/optimizer_wrapped.h
18 changes: 12 additions & 6 deletions meson.build
@@ -68,9 +68,19 @@ warning_c_flags = [
'-Wno-error=varargs'
]

arch = host_machine.cpu_family()

if get_option('enable-avx')
extra_defines += '-DUSE_AVX=1'
if get_option('platform') == 'tizen'
add_project_arguments(['-mavx2'], language: ['c','cpp'])
else
add_project_arguments(['-march=native'], language: ['c','cpp'])
endif
message('-march=native added for AVX hardware acceleration.')
endif

if get_option('enable-fp16')
arch = host_machine.cpu_family()
if get_option('platform') == 'android'
add_project_arguments('-mfp16-format=ieee', language: ['c', 'cpp'])
extra_defines += '-DENABLE_FP16=1'
@@ -88,6 +98,7 @@ if get_option('enable-fp16')
# comaptible with armv8.0 machines.
if cxx.has_argument('-mfp16-format=ieee')
add_project_arguments('-mfp16-format=ieee', language: ['c', 'cpp'])
add_project_arguments('-march=armv8.2-a+fp16', language: ['c', 'cpp'])
else
message ('The compiler does not support -mfp16-format=ieee. However, according to https://gcc.gnu.org/onlinedocs/gcc-9.1.0/gcc/Half-Precision.html, gcc may use IEEE fp16 anyway. Thus, we will proceed without the option for FP16 support.')
endif
@@ -109,11 +120,6 @@ if get_option('enable-fp16')
if cc.version().version_compare('>=12.1.0')
message ('Float16 for x86_64 enabled. Modern gcc-x64 generally supports float16 with _Float16.')
extra_defines += '-DENABLE_FP16=1'
if get_option('enable-avx')
extra_defines += '-DUSE_AVX=1'
add_project_arguments(['-march=native'], language: ['c','cpp'])
message('-march=native added for AVX hardware acceleration.')
endif
else
warning ('Float16 for x86_64 enabled. However, software emulation is applied for fp16, making it slower and inconsistent. Use GCC 12+ for FP16 support. This build will probably fail unless you bring a compiler that supports fp16 for x64.')
endif
2 changes: 1 addition & 1 deletion meson_options.txt
@@ -41,7 +41,7 @@ option('enable-fp16', type: 'boolean', value: false)
option('enable-cublas', type: 'boolean', value: false)
option('enable-openmp', type: 'boolean', value: true)
option('enable-neon', type: 'boolean', value: false)
option('enable-avx', type: 'boolean', value: false)
option('enable-avx', type: 'boolean', value: true)
option('enable-opencl', type: 'boolean', value: false)

# ml-api dependency (to enable, install capi-inference from github.com/nnstreamer/api )
2 changes: 1 addition & 1 deletion nnstreamer/meson.build
@@ -3,5 +3,5 @@ if get_option('enable-nnstreamer-tensor-filter').enabled()
subdir('tensor_filter')
endif
if get_option('enable-nnstreamer-tensor-trainer').enabled()
subdir('tensor_trainer')
subdir('tensor_trainer')
endif