[Wait for #2615] Enable Mixed Precision Training in NNTrainer @open sesame 11/09 15:18 #2663

Merged: 32 commits, Nov 11, 2024. Changes from all commits are shown below.

Commits:
- b3bb95a [SWAP] Add swap mode property (jihochu, Aug 31, 2023)
- 9f168a5 [SWAP] Add inference mode (jihochu, Aug 31, 2023)
- 7e74984 [SWAP] Modify cache for inference mode (jihochu, Aug 31, 2023)
- 239ca4e [ Weight ] Add Var32 Tensor in Weight. (jijoongmoon, May 2, 2024)
- 649c92c [ Mixed ] Create weight with var32 tensor (jijoongmoon, May 7, 2024)
- 4e37e89 [ Layers ] Update Layers to support FP16 (jijoongmoon, May 7, 2024)
- b2c2e11 [ Test ] Mixed Precision Test Case (jijoongmoon, May 7, 2024)
- f669054 [ Optimizer ] Update Optimizer / Adam to support Mixed training (jijoongmoon, May 9, 2024)
- 122d86c [ Tensor ] add is_NaN check in Tensor (jijoongmoon, May 8, 2024)
- 8afa85b [ Context ] Add loss scale in Context & using mse loss (jijoongmoon, May 11, 2024)
- 757cea7 [ Mixed Precision ] Enable Mixed Precision (jijoongmoon, May 13, 2024)
- 40cf748 [ Tensor ] Add inifinity check in Tensor (jijoongmoon, May 14, 2024)
- e104427 [ MSE ] Fix for better MSE loss precision (jijoongmoon, May 17, 2024)
- 6cf1a09 [ TEST ] Add Torch Mixed Precision Model Test (jijoongmoon, May 17, 2024)
- a4dada0 [ TEST ] add torch input and output test data for mixed precision (jijoongmoon, May 20, 2024)
- 1e40557 [ TEST ] Add more unittest and fixes for mixed precsion (jijoongmoon, May 24, 2024)
- ae24fa3 [ Layer ] Update Conv2D to support Mixed Precision (jijoongmoon, May 29, 2024)
- e319919 [ Layer ] enable Mixed Precision in LSTM Layer (jijoongmoon, May 30, 2024)
- 6f1e370 [ Model ] Add Execution Mode in Compile (jijoongmoon, May 31, 2024)
- 58bdb58 [ Layer ] Mixed Precision support for BN Layer (jijoongmoon, Jun 3, 2024)
- 766481d [layer] enable mixed precision - reshape_layer (DonghakPark, May 30, 2024)
- 1b3f3af [Layer] Enable mixed precision - pooling2d_layer (DonghakPark, Jun 3, 2024)
- e050e83 [ Model ] Fix the gradient clipping for the FP16 or Low bit Gradient (jijoongmoon, Jun 9, 2024)
- 224b3e5 [ Layer ] Add mu and var backup up tensor. (jijoongmoon, Jun 9, 2024)
- 13a6d1e [ Layer ] prevent randomize when it restore the data (jijoongmoon, Jun 9, 2024)
- b64a8c1 [ Context ] add check if it needs restore previous data (jijoongmoon, Jun 9, 2024)
- bd5ff2d [ Tensor ] remove sscal to set zero. (jijoongmoon, Jun 9, 2024)
- c4fd54f [ Mixed ] set initialize gradient in layers and bugfixes (jijoongmoon, Jun 10, 2024)
- 59ad4fc [ Mixed Training ] add is_mixed variable in weight (jijoongmoon, Jun 19, 2024)
- 51fe049 [ BUG FIX ] Fix bug for mixed precision (jijoongmoon, Jun 20, 2024)
- f024585 [TEST] using builddir/android_build_result to build test (jijoongmoon, Jul 2, 2024)
- ea4dd22 [Mixed Precision] Fix mixed precsion to use Tensor V2 (jijoongmoon, Jul 29, 2024)
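Taken together, the commits follow the standard mixed-precision training recipe: keep an FP32 master copy of each FP16 weight (the Var32 tensor), scale the loss before backpropagation, check gradients for NaN/inf, and skip the optimizer step on overflow. The following is a minimal, framework-free C++ sketch of that control flow; it is not NNTrainer's API, and every name in it is hypothetical.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a mixed-precision weight: an FP16 copy is used
// for forward/backward, while this FP32 master copy (the role of the Var32
// tensor in this PR) receives the optimizer updates.
struct MixedWeight {
  std::vector<float> fp32_master;
};

// One loss-scaled step. `grad` holds gradients of a loss that was multiplied
// by loss_scale before backward. Returns false when the step is skipped
// because a gradient overflowed, mirroring the is_NaN / infinity checks.
bool train_step(MixedWeight &w, const std::vector<float> &grad,
                float &loss_scale, float lr) {
  for (float g : grad) {
    if (std::isnan(g) || std::isinf(g)) {
      loss_scale *= 0.5f; // overflow: shrink the scale and skip this update
      return false;
    }
  }
  for (std::size_t i = 0; i < grad.size(); ++i)
    w.fp32_master[i] -= lr * (grad[i] / loss_scale); // unscale, then apply
  return true; // the FP16 copy would be refreshed from fp32_master afterwards
}
```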
Applications/KNN/jni/meson.build (1 addition, 1 deletion)

```diff
@@ -15,4 +15,4 @@ e = executable('knn_sample',
   install_dir: application_install_dir
 )

-test('app_knn', e, args: [nntr_app_resdir / 'KNN'])
+test('app_knn', e, args: [nntr_app_resdir / 'KNN/'])
```
api/ccapi/include/model.h (9 additions, 8 deletions)

```diff
@@ -136,7 +136,7 @@ class Model {
    * @retval #ML_ERROR_NONE Successful.
    * @retval #ML_ERROR_INVALID_PARAMETER invalid parameter.
    */
-  virtual int compile() = 0;
+  virtual int compile(ExecutionMode exec_mode_ = ExecutionMode::TRAIN) = 0;

   /**
    * @brief Initialize Network. This should be called after setting the
@@ -188,13 +188,14 @@ class Model {
    * @details This function accepts vector of properties in the format -
    * { std::string property_name, void * property_val, ...}
    */
-  virtual int train(const std::vector<std::string> &values = {},
-                    std::function<bool(void *)> stop_cb =
-                      [](void *stop_user_data) { return false; },
-                    void *stop_user_data = nullptr,
-                    std::function<void(void *)> epoch_complete_cb =
-                      [](void *epoch_user_data) { return false; },
-                    void *epoch_user_data = nullptr) = 0;
+  virtual int train(
+    const std::vector<std::string> &values = {},
+    std::function<bool(void *)> stop_cb =
+      [](void *stop_user_data) { return false; },
+    void *stop_user_data = nullptr,
+    std::function<void(void *)> epoch_complete_cb =
+      [](void *epoch_user_data) { return false; },
+    void *epoch_user_data = nullptr) = 0;

   /**
    * @brief Run Model train with callback function by user
```
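With this change a caller selects the execution mode when compiling instead of always building the training graph. Below is a hedged usage sketch against the ccapi; `createModel` and the `ExecutionMode::TRAIN` default come from the header above, while the `INFERENCE` value, the include path, and the omitted model setup are assumptions.

```cpp
#include <model.h>

int main() {
  auto model = ml::train::createModel(ml::train::ModelType::NEURAL_NET);
  // ... add layers or load a model description here ...

  // New in this PR: pick the execution mode at compile time.
  model->compile(ml::train::ExecutionMode::TRAIN);
  model->initialize();

  // The callback defaults are unchanged, so a stop callback stays optional:
  model->train({}, [](void *) { return false; } /* stop_cb */);
  return 0;
}
```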
debian/nntrainer-dev.install (2 additions)

```diff
@@ -17,6 +17,7 @@
 /usr/include/nntrainer/blas_interface.h
 /usr/include/nntrainer/var_grad.h
 /usr/include/nntrainer/weight.h
+/usr/include/nntrainer/blas_avx.h
 # todo: update dataset headers
 /usr/include/nntrainer/databuffer.h
 /usr/include/nntrainer/databuffer_factory.h
@@ -26,6 +27,7 @@
 /usr/include/nntrainer/layer_impl.h
 /usr/include/nntrainer/operation_layer.h
 /usr/include/nntrainer/acti_func.h
+/usr/include/nntrainer/loss_layer.h
 # custom layer kits
 /usr/include/nntrainer/app_context.h
 # logger
```
meson.build (15 additions, 6 deletions)

```diff
@@ -68,9 +68,23 @@ warning_c_flags = [
   '-Wno-error=varargs'
 ]

+arch = host_machine.cpu_family()
+
+target = target_machine.cpu_family()
+
+if get_option('enable-avx')
+  if get_option('platform') != 'android'
+    if target == 'x86_64' or target == 'x86'
+      extra_defines += '-DUSE_AVX=1'
+      add_project_arguments(['-march=native'], language: ['c','cpp'])
+      add_project_arguments(['-mavx2'], language: ['c','cpp'])
+      message('-march=native added for AVX hardware acceleration.')
+    endif
+    message('This arch does not support avx2')
+  endif
+endif
+
 if get_option('enable-fp16')
-  arch = host_machine.cpu_family()
   if get_option('platform') == 'android'
     add_project_arguments('-mfp16-format=ieee', language: ['c', 'cpp'])
     extra_defines += '-DENABLE_FP16=1'
@@ -110,11 +124,6 @@ if get_option('enable-fp16')
     if cc.version().version_compare('>=12.1.0')
       message ('Float16 for x86_64 enabled. Modern gcc-x64 generally supports float16 with _Float16.')
       extra_defines += '-DENABLE_FP16=1'
-      if get_option('enable-avx')
-        extra_defines += '-DUSE_AVX=1'
-        add_project_arguments(['-march=native'], language: ['c','cpp'])
-        message('-march=native added for AVX hardware acceleration.')
-      endif
     else
       warning ('Float16 for x86_64 enabled. However, software emulation is applied for fp16, making it slower and inconsistent. Use GCC 12+ for FP16 support. This build will probably fail unless you bring a compiler that supports fp16 for x64.')
     endif
```
meson_options.txt (1 addition, 1 deletion)

```diff
@@ -41,7 +41,7 @@ option('enable-fp16', type: 'boolean', value: false)
 option('enable-cublas', type: 'boolean', value: false)
 option('enable-openmp', type: 'boolean', value: true)
 option('enable-neon', type: 'boolean', value: false)
-option('enable-avx', type: 'boolean', value: false)
+option('enable-avx', type: 'boolean', value: true)
 option('enable-opencl', type: 'boolean', value: false)

 # ml-api dependency (to enable, install capi-inference from github.com/nnstreamer/api )
```

Review thread on this line:

> **Member:** Pure question: is enabling AVX going to be the default from now on?
>
> **Collaborator (author):** I will modify the meson build so AVX is used only if it is available.
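The author's reply points at gating AVX on actual compiler support rather than assuming it. A possible meson sketch of that guard, shown only as an illustration of the compiler-probe pattern, not the code that eventually landed:

```meson
# Honor enable-avx only when the C compiler actually accepts -mavx2.
cc = meson.get_compiler('c')
if get_option('enable-avx') and cc.has_argument('-mavx2')
  extra_defines += '-DUSE_AVX=1'
  add_project_arguments(['-mavx2'], language: ['c', 'cpp'])
else
  message('AVX2 unavailable; building without AVX acceleration.')
endif
```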
nntrainer/app_context.cpp (3 additions, 1 deletion)

```diff
@@ -559,6 +559,7 @@ AppContext::registerPluggableFromDirectory(const std::string &base_path) {
   struct dirent *entry;

   std::vector<int> keys;
+
   while ((entry = readdir(dir)) != NULL) {
     if (endswith(entry->d_name, solib_suffix)) {
       if (endswith(entry->d_name, layerlib_suffix)) {
@@ -581,7 +582,8 @@
     }
   }

-  closedir(dir);
+  if (dir != NULL)
+    closedir(dir);

   return keys;
 }
```
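The guard brings the function in line with the usual POSIX dirent pattern: opendir may return NULL, and only a successfully opened handle should be passed to closedir. A minimal self-contained version of that pattern (names are illustrative, not NNTrainer's):

```cpp
#include <cstdio>
#include <dirent.h>

void list_dir(const char *path) {
  DIR *dir = opendir(path);
  if (dir == NULL)
    return; // open failed: nothing to read, nothing to close

  struct dirent *entry;
  while ((entry = readdir(dir)) != NULL)
    std::printf("%s\n", entry->d_name);

  closedir(dir); // dir is known to be non-NULL here
}
```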
nntrainer/graph/graph_core.cpp (9 additions)

```diff
@@ -35,6 +35,10 @@ GraphCore::getSortedNode(unsigned int ith) const {
   return Sorted.at(ith);
 }

+const unsigned int GraphCore::getSortedNodeIdx(const std::string &name) const {
+  return sorted_node_map.at(name);
+}
+
 void GraphCore::makeAdjacencyList(
   std::vector<std::list<std::shared_ptr<GraphNode>>> &adj) {
   /** initialize the adj list */
@@ -93,6 +97,11 @@ void GraphCore::topologicalSort() {

   if (Sorted.size() != node_list.size())
     throw std::runtime_error("Internal error in topologicalSort");
+  unsigned int idx = 0;
+  for (auto &n : Sorted) {
+    sorted_node_map[n->getName()] = idx;
+    idx++;
+  }
 }

 const std::shared_ptr<GraphNode> &
```
nntrainer/graph/graph_core.h (11 additions, 3 deletions)

```diff
@@ -91,6 +91,13 @@ class GraphCore {
    */
   const std::shared_ptr<GraphNode> &getSortedNode(unsigned int ith) const;

+  /**
+   * @brief getter of Sorted GraphNode index with name
+   * @param[in] layer name
+   * @ret index
+   */
+  const unsigned int getSortedNodeIdx(const std::string &name) const;
+
   /**
    * @brief getter of GraphNode with node name
    * @param[in] node name
@@ -249,9 +256,10 @@ class GraphCore {
 private:
   std::vector<std::shared_ptr<GraphNode>> input_list;
   std::vector<std::shared_ptr<GraphNode>> output_list;
-  std::vector<std::shared_ptr<GraphNode>>
-    node_list; /**< Unordered Node List */
-  std::unordered_map<std::string, int> node_map; /**< Unordered Node map */
+  std::vector<std::shared_ptr<GraphNode>> node_list; /**< Unordered Node List */
+  std::unordered_map<std::string, int> node_map; /**< Unordered Node map */
+  std::unordered_map<std::string, int>
+    sorted_node_map; /**< Unordered Node map */
   std::vector<std::shared_ptr<GraphNode>> Sorted; /**< Ordered Node List */
   bool sorted; /** if the node_list is sorted */
```

Review thread on sorted_node_map:

> **Contributor:** Quick question: what does this sorted_node_map do? Is it simply the sorted version of the existing node_map?
>
> **Collaborator (author):** Actually, we have two node lists at compile time: the node list defined by the user and the node list ordered by topological sort. Until compilation finishes, we do not know whether the user-defined node list will be used, so we keep both lists, and therefore we need two node-list maps as well.
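As the author explains, the graph keeps a name-to-index map per ordering. A self-contained sketch of that design, with only `getSortedNodeIdx` taken from the diff above and everything else simplified for illustration:

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Simplified analog of GraphCore's two orderings: the user-defined list and
// the topologically sorted list, each with its own name -> index map.
struct TinyGraph {
  std::vector<std::string> node_list;                   // user-defined order
  std::unordered_map<std::string, int> node_map;        // name -> user index
  std::vector<std::string> sorted;                      // topological order
  std::unordered_map<std::string, int> sorted_node_map; // name -> sorted index

  // Mirrors GraphCore::getSortedNodeIdx: O(1) lookup of a node's position
  // in execution order, by name.
  unsigned int getSortedNodeIdx(const std::string &name) const {
    return sorted_node_map.at(name);
  }
};
```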
