TinyMaix is a tiny neural network inference library designed specifically for microcontrollers (TinyML).
Highlights
- Core code is under 400 lines (tm_layers.c + tm_model.c + arch_cpu.h), and the code .text section is under 3KB
- Low RAM consumption: even an Arduino ATmega328 (32KB Flash, 2KB RAM) can run mnist with TinyMaix~
- Supports INT8/FP32/FP16 models, with experimental FP8 support; models are converted from keras h5 or tflite
- Supports acceleration on multiple architectures: ARM SIMD/NEON/MVEI, RV32P, RV64V~
- User-friendly interface: just load and run models~
- Supports fully static memory configuration
- MaixHub online model training support coming soon~
Run mnist demo on Arduino ATmega328
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
000000000077AFF9500000000000
000000000AFFFFFFD10000000000
00000000AFFFD8BFF70000000000
00000003FFD2000CF80000000000
00000004FD10007FF40000000000
00000000110000DFF40000000000
00000000000007FFC00000000000
0000000000004FFE300000000000
0000000000008FF9000000000000
00000000000BFF90000000000000
00000000001EFE20000000000000
0000000000CFF800000000000000
0000000004FFB000000000000000
000000001CFF8000000000000000
000000008FFA0000000000000000
00000000FFF10000000000000000
00000000FFF21111000112999900
00000000FFFFFFFFA8AFFFFFFF70
00000000AFFFFFFFFFFFFFFA7730
0000000007777AFFF97720000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
0000000000000000000000000000
===use 49912us
0: 0
1: 0
2: 89
3: 0
4: 1
5: 6
6: 1
7: 0
8: 0
9: 0
### Predict output is: Number 2, prob=89
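The last line of the output follows from the ten class scores printed above it: the predicted digit is the index of the largest score. A minimal sketch of that selection step is shown here; the function name `argmax_u8` and the use of `uint8_t` scores are assumptions made for illustration and are not taken from the TinyMaix sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: return the index of the largest score.
 * Ties resolve to the lowest index. n must be at least 1. */
size_t argmax_u8(const uint8_t *scores, size_t n)
{
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        if (scores[i] > scores[best]) {
            best = i;
        }
    }
    return best;
}
```

Applied to the scores printed above (0, 0, 89, 0, 1, 6, 1, 0, 0, 0), the helper selects index 2, whose score is 89.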
- Optimize tm_layers.c into tm_layers_O1.c, aiming for a 1.4~2.0X speedup
- Train good backbones for 64KB/128KB/256KB/512KB RAM limits
- Add examples: Detector, KWS, HAR, Gesture, OCR, ...
- ...
Do you want to take part in the development of TinyMaix, or chat with other TinyML hobbyists?
Join our telegram group: https://t.me/tinymaix
TinyMaix is designed for running AI neural network models on resource-limited MCUs, which is usually called TinyML.
There are already many TinyML inference libraries, such as TFLite micro, microTVM, and NNoM, so why do we need TinyMaix?
TinyMaix is a weekend hackathon project, so it is simple enough to read through in 30 minutes, and it helps TinyML newcomers understand how inference actually runs.
TinyMaix aims to be a simple TinyML inference library; it abandons many newer features and doesn't use libraries like CMSIS-NN.
Following this design goal, TinyMaix now needs only 5 files to compile~
We hope TinyMaix can help any MCU run AI neural network models, and that everyone can port it to their own hardware platform~
Note: Although TinyMaix supports acceleration on multiple architectures, it still needs more work to balance size and speed.
- Supports backbones up to mobilenet v1 and RepVGG
  - they are the most commonly used, efficient structures for MCUs
  - Basic Conv2d, dwConv2d, FC, Relu/Relu6/Softmax, GAP, Reshape
  - MaxPool, AvgPool (now use stride instead)
- FP32 model, INT8 quant model, FP16 model (NEW)
- Convert tmdl from keras h5 or tflite
  - the models are simple enough to train with keras/tf
  - tflite already has quantization functions
- Model statistics functions in C
  - optional, to reduce code size
- INT16 quant model
  - Advantages:
    - more accurate
    - friendly to SIMD/RV32P acceleration
  - Disadvantages:
    - increases FLASH/RAM consumption 2X
- Concat OP
  - Advantages:
    - supports mobilenet v2, more accurate
  - Disadvantages:
    - increases RAM consumption 2X
    - concatenating mats takes a lot of time and slows down model inference
    - needs more work to convert the model into a flat structure (in script)
- Winograd Convolution Optimization
  - Advantages:
    - may speed up Conv computing
  - Disadvantages:
    - increases RAM consumption and uses more memory bandwidth
    - increases code (.text) size
    - needs many transforms, which may take a lot of time on a weak MCU
- BF16 model
  - most MCUs don't have BF16 computing ability
  - accuracy is not much better than INT16
  - increases FLASH/RAM consumption 2X
- AVX/vulkan acceleration
  - TinyMaix is for MCUs, not for powerful PCs/mobile phones
- other misc OPs
  - TinyMaix lets MCUs run basic models with minimal resource consumption; if you want more OPs, switch to TFlite-micro/TVM/NCNN...
MNIST is a handwritten digit recognition task; it is simple enough even for an 8-bit MCU like the ATmega328.
Try it on PC:
cd examples/mnist
mkdir build
cd build
cmake ..
make
./mnist
mbnet (mobilenet v1) is a simple classification model for mobile devices, but it is still a little heavy for MCUs.
The model in the demo is mobilenet v1 0.25; it takes a 128x128x3 RGB image as input and outputs predictions for 1000 classes.
It needs at least 128KB SRAM and 512KB Flash; the STM32F411 is the typical minimum configuration for this model.
Try running mobilenet:
cd examples/mbnet
mkdir build
cd build
cmake ..
make
./mbnet
tm_err_t tm_load (tm_mdl_t* mdl, const uint8_t* bin, uint8_t* buf, tm_cb_t cb, tm_mat_t* in);
mdl: model handle;
bin: model binary buffer;
buf: main buffer for intermediate outputs; if NULL, the main buffer is malloc'ed automatically; otherwise, your static buffer is used.
cb: layer callback;
in: returns the input mat, including its buffer address; //you can ignore it if you use a static buf
void tm_unload(tm_mdl_t* mdl);
tm_err_t tm_preprocess(tm_mdl_t* mdl, tm_pp_t pp_type, tm_mat_t* in, tm_mat_t* out);
TMPP_FP2INT //user own fp buf -> int input buf
TMPP_UINT2INT //int8: cvt in place; int16: can't cvt in place
TMPP_UINT2FP01 // u8/255.0
TMPP_UINT2FPN11 // (u8-128)/128
tm_err_t tm_run (tm_mdl_t* mdl, tm_mat_t* in, tm_mat_t* out);
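The two float conversions listed for tm_preprocess are plain per-element formulas, so they can be checked in isolation. The sketch below only reproduces the arithmetic given in the comments (u8/255.0 and (u8-128)/128); the function names `u8_to_fp01` and `u8_to_fpn11` are hypothetical, and the code operates on raw arrays rather than on tm_mat_t, so it is not the library's implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper mirroring the TMPP_UINT2FP01 formula:
 * u8 / 255.0, which maps 0..255 onto [0, 1]. */
void u8_to_fp01(const uint8_t *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = (float)src[i] / 255.0f;
    }
}

/* Hypothetical helper mirroring the TMPP_UINT2FPN11 formula:
 * (u8 - 128) / 128, which maps 0..255 onto [-1, 127/128]. */
void u8_to_fpn11(const uint8_t *src, float *dst, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = ((float)src[i] - 128.0f) / 128.0f;
    }
}
```

Which of the two mappings is appropriate depends on how the model's inputs were scaled during training.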
The core consists of these 5 files: tm_model.c, tm_layers.c, tinymaix.h, tm_port.h, arch_xxx.h
If you are using an ordinary MCU without any acceleration instructions, choose arch_cpu.h; otherwise choose the header for your architecture.
Then edit tm_port.h to fill in your desired configuration; every config macro is followed by a comment explaining it.
Note that TM_MAX_CSIZE, TM_MAX_KSIZE, and TM_MAX_KCSIZE each reserve static buffer space.
Now just put the files into your project and compile~
The training scripts in examples/mnist show how to train simple mnist models.
Note: you need to install TensorFlow (>=2.7) first.
After training and saving h5 models, you can use the scripts in tools to convert them to tmdl or c header files.
- h5_to_tflite.py
convert h5 model to float or int8 quant tflite files
python3 h5_to_tflite.py h5/mnist.h5 tflite/mnist_f.tflite 0
python3 h5_to_tflite.py h5/mnist.h5 tflite/mnist_q.tflite 1 quant_img_mnist/ 0to1
- tflite2tmdl.py
convert tflite file to tmdl or c header files.
python3 tflite2tmdl.py tflite/mnist_q.tflite tmdl/mnist_q.tmdl int8 1 28,28,1 10
================ pack model head ================
mdl_type =0
out_deq =1
input_cnt =1
output_cnt =1
layer_cnt =6
buf_size =1464
sub_size =0
in_dims = [3, 28, 28, 1]
out_dims = [1, 1, 1, 10]
================ pack layers ================
CONV_2D
[3, 28, 28, 1] [3, 13, 13, 4]
in_oft:0, size:784; out_oft:784, size:680
padding valid
layer_size=152
CONV_2D
[3, 13, 13, 4] [3, 6, 6, 8]
in_oft:784, size:680; out_oft:0, size:288
padding valid
layer_size=432
CONV_2D
[3, 6, 6, 8] [3, 2, 2, 16]
in_oft:0, size:288; out_oft:1400, size:64
padding valid
layer_size=1360
MEAN
[3, 2, 2, 16] [1, 1, 1, 16]
in_oft:1400, size:64; out_oft:0, size:16
layer_size=48
FULLY_CONNECTED
[1, 1, 1, 16] [1, 1, 1, 10]
in_oft:0, size:16; out_oft:1448, size:16
layer_size=304
SOFTMAX
[1, 1, 1, 10] [1, 1, 1, 10]
OUTPUT!
in_oft:1448, size:16; out_oft:0, size:56
layer_size=48
================ pack done! ================
model size 2.4KB (2408 B) FLASH
buffer size 1.4KB (1464 B) RAM
single layer mode subbuff size 1.4KB (64+1360=1424 B) RAM
Saved to tmdl/mnist_q.tmdl, tmdl/mnist_q.h
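The three sizes in the summary can be recomputed from the per-layer lines of the log, which is a useful sanity check when sizing Flash and RAM. The sketch below copies the numbers from the log above. The `layer_info_t` struct, the function names, and the 64-byte header size are assumptions: the header size is inferred from the "64+1360" term in the log, not taken from the tmdl format definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: one row per layer, copied from the converter log. */
typedef struct {
    uint32_t in_oft, in_size;   /* input offset and size in the main buffer  */
    uint32_t out_oft, out_size; /* output offset and size in the main buffer */
    uint32_t layer_size;        /* bytes the layer occupies in the model     */
} layer_info_t;

static const layer_info_t LAYERS[] = {
    {   0, 784,  784, 680,  152},  /* CONV_2D         */
    { 784, 680,    0, 288,  432},  /* CONV_2D         */
    {   0, 288, 1400,  64, 1360},  /* CONV_2D         */
    {1400,  64,    0,  16,   48},  /* MEAN            */
    {   0,  16, 1448,  16,  304},  /* FULLY_CONNECTED */
    {1448,  16,    0,  56,   48},  /* SOFTMAX         */
};
#define LAYER_CNT (sizeof(LAYERS) / sizeof(LAYERS[0]))
#define HEAD_SIZE 64u /* assumption: inferred from "64+1360=1424" in the log */

/* Flash: header plus the sum of all layer sizes. */
uint32_t model_bytes(void)
{
    uint32_t total = HEAD_SIZE;
    for (size_t i = 0; i < LAYER_CNT; i++) total += LAYERS[i].layer_size;
    return total;
}

/* RAM: the highest offset+size touched by any layer's input or output. */
uint32_t buffer_bytes(void)
{
    uint32_t peak = 0;
    for (size_t i = 0; i < LAYER_CNT; i++) {
        uint32_t in_end  = LAYERS[i].in_oft  + LAYERS[i].in_size;
        uint32_t out_end = LAYERS[i].out_oft + LAYERS[i].out_size;
        if (in_end  > peak) peak = in_end;
        if (out_end > peak) peak = out_end;
    }
    return peak;
}

/* Single layer mode: header plus the largest single layer. */
uint32_t single_layer_bytes(void)
{
    uint32_t largest = 0;
    for (size_t i = 0; i < LAYER_CNT; i++) {
        if (LAYERS[i].layer_size > largest) largest = LAYERS[i].layer_size;
    }
    return HEAD_SIZE + largest;
}
```

With the values from this log the three functions give 2408, 1464, and 1424 bytes, matching the summary lines. The offsets also show the layers alternating between the low and high ends of the main buffer, so each layer's input and output do not overlap.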
Now that you have the tmdl or c header file, put it into your project and use it~
TODO
TinyMaix uses a basic dot_product function to accelerate Conv computing.
You just need to add arch_xxx_yyy.h in the src dir and implement your platform's dot_product function:
TM_INLINE void tm_dot_prod(mtype_t* sptr, mtype_t* kptr, uint32_t size, sumtype_t* result);
...
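As a starting point for a new port, a plain C version of the prototype above can serve as the reference that an accelerated version is compared against. This is a minimal sketch, not the code from arch_cpu.h: the typedefs assume an INT8 build (`mtype_t` as `int8_t`, `sumtype_t` as `int32_t`), and `TM_INLINE` is defined locally as `static inline`; in a real port these come from the TinyMaix headers.

```c
#include <stdint.h>

/* Assumptions for a self-contained INT8 example; in a real port these
 * definitions are provided by the TinyMaix headers. */
typedef int8_t  mtype_t;
typedef int32_t sumtype_t;
#define TM_INLINE static inline

/* Reference dot product: accumulate sptr[i] * kptr[i] over `size`
 * elements in the wider sum type and store the total in *result. */
TM_INLINE void tm_dot_prod(mtype_t* sptr, mtype_t* kptr, uint32_t size, sumtype_t* result)
{
    sumtype_t sum = 0;
    for (uint32_t i = 0; i < size; i++) {
        sum += (sumtype_t)sptr[i] * (sumtype_t)kptr[i];
    }
    *result = sum;
}
```

A SIMD or vector version would replace the loop body with platform intrinsics while keeping the same signature, so its output can be checked against this scalar loop on the same inputs.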
If you want to contribute functions to TinyMaix, please read the "TinyMaix Design" sections first; we only accept functions listed under "Features in design" and "Features maybe added".
If you want to submit your port's test results, please add them to benchmark.md. You are welcome to port TinyMaix to your chips/boards; it will show how easy it is to run deep learning models on MCUs with TinyMaix~
If you have questions about TinyMaix usage or porting, please open an issue in this repo.
If you need consulting for a business project or have private questions, you can send mail to [email protected] or [email protected] (Caesar Wu).