faq
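- Open QQ → tap group search → search for group number 637093648 → enter the answer to the question: 卷卷卷卷卷 → join the group → get ready for the Turing test (just kidding)
- Search QQ for the Pocky group: 677104663 (lots of experts). Answer to the question: multi level intermediate representation
- nihui's bilibili live room: 水竹院落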
-
git clone --recursive https://github.com/Tencent/ncnn/
or
-
See https://github.com/Tencent/ncnn/wiki/how-to-build
-
The submodules were not downloaded! Please update submodules with "git submodule update --init" and try again
As above, download the complete source. Or follow the hint and run: git submodule update --init
-
sudo apt-get install libprotobuf-dev protobuf-compiler
-
Could not find a package configuration file provided by "OpenCV" with any of the following names: OpenCVConfig.cmake opencv-config.cmake
sudo apt-get install libopencv-dev
Or build and install OpenCV yourself, then set(OpenCV_DIR {directory containing OpenCVConfig.cmake})
-
Could not find a package configuration file provided by "ncnn" with any of the following names: ncnnConfig.cmake ncnn-config.cmake
set(ncnn_DIR {directory containing ncnnConfig.cmake})
-
CMake 3.10 or newer; older versions do not ship FindVulkan.cmake
android-api >= 24
On macOS, run the install script first
-
undefined reference to __kmpc_for_static_init_4 __kmpc_for_static_fini __kmpc_fork_call ...
You need to link the OpenMP library.
undefined reference to vkEnumerateInstanceExtensionProperties vkGetInstanceProcAddr vkQueueSubmit ...
You need vulkan-1.lib
undefined reference to glslang::InitializeProcess() glslang::TShader::TShader(EShLanguage) ...
You need glslang.lib OGLCompiler.lib SPIRV.lib OSDependent.lib
undefined reference to AAssetManager_fromJava AAssetManager_open AAsset_seek ...
Add android to find_library and target_link_libraries
find_package(ncnn)
-
opencv rtti -> opencv-mobile
-
Upgrade the compiler / libgcc_s libgcc
-
Upgrade gcc
-
See https://github.com/Tencent/ncnn/wiki/build-for-android.zh and also the page on how to build a smaller ncnn library
-
Run ncnnoptimize first and add the custom layers afterwards, because ncnnoptimize cannot handle saving custom layers.
-
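The cause is a conflict between libraries in the project that were built with different settings. Decide from your own situation whether RTTI and exceptions need to be enabled or disabled. ncnn defaults to ON (enabled); add the following two options when rebuilding ncnn:
- Enable: -DNCNN_DISABLE_RTTI=OFF -DNCNN_DISABLE_EXCEPTION=OFF
- Disable: -DNCNN_DISABLE_RTTI=ON -DNCNN_DISABLE_EXCEPTION=ON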
-
Possible causes:
- Try upgrading the NDK version used by Android Studio
wget https://github.com/Kitware/CMake/releases/download/v3.18.2/cmake-3.18.2-Linux-x86_64.tar.gz
tar zxvf cmake-3.18.2-Linux-x86_64.tar.gz
mv cmake-3.18.2-Linux-x86_64 /opt/cmake-3.18.2
ln -sf /opt/cmake-3.18.2/bin/* /usr/bin/
Build ncnn and run make install. On Linux/Windows, set/export ncnn_DIR so that it points to the directory under the install directory that contains ncnnConfig.cmake
-
./caffe2ncnn caffe.prototxt caffe.caffemodel ncnn.param ncnn.bin
-
./mxnet2ncnn mxnet-symbol.json mxnet.params ncnn.param ncnn.bin
-
https://github.com/MarsTechHAN/keras2ncnn @MarsTechHAN
-
onnx-simplifier with a static shape
-
Input 0=w 1=h 2=c
-
ncnnoptimize model.param model.bin yolov5s-opt.param yolov5s-opt.bin 65536
-
Interp Reshape
-
ncnn2mem
-
Yes, it works on all platforms
-
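Detection:
See the author's article https://zhuanlan.zhihu.com/p/128974102. Step three is to remove the post-processing and then export to ONNX. Removing the post-processing can simply mean taking the result with the later steps dropped when testing inside the project.
Method 1:
ONNX_ATEN_FALLBACK. For a fully custom op, first change it into something that can be exported (such as concat or slice), then edit the param after converting to ncnn
Method 2:
You can try PNNX; the following articles give a rough explanation: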
-
Please upgrade your GPU driver first if you encounter this crash or error. Driver download pages are provided here for several vendors: Intel, AMD, Nvidia
-
python setup.py develop
-
Check the file path and the working dir
File not found or not readable. Make sure that XYZ.param/XYZ.bin is accessible.
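A relative path is resolved against the process working directory, which is often not the directory that holds the executable. The sketch below uses only the C++17 standard library to print the working directory and check that both files can be opened. The names is_readable_file and check_model_files are hypothetical helpers for illustration, not ncnn APIs.

```cpp
#include <cstdio>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

// Returns true if `path` names a regular file that can be opened for reading.
// Relative paths are resolved against the current working directory.
bool is_readable_file(const std::string& path)
{
    std::error_code ec;
    if (!std::filesystem::is_regular_file(path, ec))
        return false;
    std::ifstream f(path, std::ios::binary);
    return f.is_open();
}

// Prints the working directory and the status of both model files.
// Hypothetical helper, not part of ncnn.
bool check_model_files(const std::string& param_path, const std::string& bin_path)
{
    std::error_code ec;
    const std::filesystem::path cwd = std::filesystem::current_path(ec);
    std::fprintf(stderr, "working dir: %s\n", cwd.string().c_str());

    const bool param_ok = is_readable_file(param_path);
    const bool bin_ok = is_readable_file(bin_path);
    std::fprintf(stderr, "%s: %s\n", param_path.c_str(), param_ok ? "ok" : "not found or not readable");
    std::fprintf(stderr, "%s: %s\n", bin_path.c_str(), bin_ok ? "ok" : "not found or not readable");
    return param_ok && bin_ok;
}
```

Calling check_model_files("XYZ.param", "XYZ.bin") right before loading the model shows whether the problem is the path or the working directory.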
-
layer name vs blob name
With param.bin you should use the enums from xxx.id.h
-
The model itself is broken.
Your model file is in the old format, converted by an old caffe2ncnn tool.
Checkout the latest ncnn code, build it and regenerate param and model binary files, and that should work.
Make sure that your param file starts with the magic number 7767517.
You may find more info in use-ncnn-with-alexnet.
When adding the softmax layer yourself, you need to add 1=1
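The magic number check above can be done before handing the file to ncnn. A minimal sketch, assuming a text .param file whose first token is the magic number; param_has_magic and kParamMagic are hypothetical names, not ncnn APIs.

```cpp
#include <fstream>
#include <string>

// Magic number that a text .param file is expected to start with.
const int kParamMagic = 7767517;

// Returns true if the first whitespace-separated token of the file parses as
// the magic number. Returns false for a missing file, a non-numeric first
// token, or a different number (for example an old-format file).
bool param_has_magic(const std::string& param_path)
{
    std::ifstream f(param_path);
    if (!f)
        return false;

    int magic = 0;
    if (!(f >> magic))
        return false;

    return magic == kParamMagic;
}
```

If param_has_magic returns false for your file, regenerate the param and bin files with a current converter as described above.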
-
You should set net.opt.use_vulkan_compute = true; before calling load_param / load_model
-
Call ex.input() and ex.extract() multiple times:
ex.input("data1", in_1);
ex.input("data2", in_2);
ex.extract("output1", out_1);
ex.extract("output2", out_2);
-
No
-
cmake -DNCNN_BENCHMARK=ON ..
-
from_pixels to_pixels
-
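First, memory you allocate yourself must be managed by you; ncnn::Mat will not automatically free the float data you pass in.
std::vector<float> testData(60, 1.0); // let std::vector<float> manage allocation and release of the memory
ncnn::Mat in1 = ncnn::Mat(60, (void*)testData.data()).reshape(4, 5, 3); // just pass the float pointer as void*; you can even set the dimensions (the author recommends reshape to handle the channel gap)
float* a = new float[60]; // memory you new yourself must be released by you later
ncnn::Mat in2 = ncnn::Mat(60, (void*)a).reshape(4, 5, 3).clone(); // same usage as above; clone() to transfer data owner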
-
mat.fill(0.f);
-
It is printed when running cmake
c_api.h ncnn_version()
Or assemble it yourself: 1.0+yyyymmdd
-
yuv420sp2rgb yuv420sp2rgb_nv12
-
get_affine_transform
warpaffine_bilinear_c3
// compute the transform matrix and its inverse
int type = 0; // 0 -> fill outside the region with v[0], v[1], v[2]; -233 -> leave outside the region untouched
unsigned int v = 0;
float tm[6];
float tm_inv[6];
// coordinates and size of the face region in the original image
float src_x = target->det.rect.x / target->det.w * pIveImageU8C3->u32Width;
float src_y = target->det.rect.y / target->det.h * pIveImageU8C3->u32Height;
float src_w = target->det.rect.w / target->det.w * pIveImageU8C3->u32Width;
float src_h = target->det.rect.h / target->det.h * pIveImageU8C3->u32Height;
float point_src[10] = {
src_x + src_w * target->attr.land[0][0], src_y + src_h * target->attr.land[0][1],
src_x + src_w * target->attr.land[1][0], src_y + src_h * target->attr.land[1][1],
src_x + src_w * target->attr.land[2][0], src_y + src_h * target->attr.land[2][1],
src_x + src_w * target->attr.land[3][0], src_y + src_h * target->attr.land[3][1],
src_x + src_w * target->attr.land[4][0], src_y + src_h * target->attr.land[4][1],
};
float point_dst[10] = { // +8 because we process 112*112 images
30.2946f + 8.0f, 51.6963f,
65.5318f + 8.0f, 51.5014f,
48.0252f + 8.0f, 71.7366f,
33.5493f + 8.0f, 92.3655f,
62.7299f + 8.0f, 92.2041f,
};
// Option 1: compute the transform, then invert it
AffineTrans::get_affine_transform(point_src, point_dst, 5, tm);
AffineTrans::invert_affine_transform(tm, tm_inv);
// Option 2: get the inverse directly
// AffineTrans::get_affine_transform(point_dst, point_src, 5, tm_inv);
// the RGB planes are separate, so each one is processed on its own
for(int c = 0; c < 3; c++)
{
unsigned char* pSrc = malloc(xxx);
unsigned char* pDst = malloc(xxx);
ncnn::warpaffine_bilinear_c1(pSrc, SrcWidth, SrcHeight, SrcStride[c], pDst, DstWidth, DstHeight, DstStride[c], tm_inv, type, v);
}
// packed RGB can be processed in a single call
ncnn::warpaffine_bilinear_c3(pSrc, SrcWidth, SrcHeight, SrcStride, pDst, DstWidth, DstHeight, DstStride, tm_inv, type, v);
-
ncnn::Mat output;
ex.extract("your_blob_name", output);
-
Windows 10 Task Manager - Performance tab - GPU - click the drop-down arrow at the top-left of one of the graphs and switch to Compute_0 / Compute_1 / Cuda
You can also install GPU-Z
-
Your network contains some operations that are not implemented in ncnn.
You may implement them as custom layers by following how-to-implement-custom-layer-step-by-step.
Or you can simply register them as no-ops if you are sure those operations are not needed.
class Noop : public ncnn::Layer {};
DEFINE_LAYER_CREATOR(Noop)
net.register_custom_layer("LinearRegressionOutput", Noop_layer_creator);
net.register_custom_layer("MAERegressionOutput", Noop_layer_creator);
-
You shall call Net::load_param() first, then Net::load_model().
This error may also happen when Net::load_param() failed and the failure was not handled properly.
For more information about the ncnn model load api, see ncnn-load-model
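The load order and the error handling described above can be sketched as follows. The function is a template only so that it compiles without ncnn; with ncnn you would pass an ncnn::Net, whose load_param() and load_model() return 0 on success. The name load_checked is a hypothetical helper, not an ncnn API.

```cpp
#include <cstdio>

// Loads the param file first and the model file second, and stops at the
// first failure so that load_model() is never called on a net whose
// load_param() failed.
template<typename Net>
bool load_checked(Net& net, const char* param_path, const char* bin_path)
{
    if (net.load_param(param_path) != 0)
    {
        std::fprintf(stderr, "load_param failed: %s\n", param_path);
        return false;
    }

    if (net.load_model(bin_path) != 0)
    {
        std::fprintf(stderr, "load_model failed: %s\n", bin_path);
        return false;
    }

    return true;
}
```

Checking the return value of load_checked before creating an extractor makes a failed load visible at the place where it happens.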
-
The pointer passed to Net::load_param() or Net::load_model() is not 32bit aligned.
In practice, the head pointer of std::vector is not guaranteed to be 32bit aligned.
You can store your binary buffer in an ncnn::Mat structure; its internal memory is aligned.
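To see whether a pointer is the problem, you can test its alignment before passing it on. The sketch below uses only the standard library and swaps in a std::vector<std::uint32_t> as aligned storage instead of ncnn::Mat, so it runs without ncnn. is_aligned and copy_to_aligned are hypothetical helper names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True if p is a multiple of `alignment` bytes. `alignment` must be a power of two.
bool is_aligned(const void* p, std::size_t alignment)
{
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

// Copies a byte buffer into storage made of 32-bit words, so that data() is
// 32-bit aligned. Any trailing padding bytes are zero.
std::vector<std::uint32_t> copy_to_aligned(const unsigned char* src, std::size_t size)
{
    std::vector<std::uint32_t> dst((size + 3) / 4, 0u);
    if (size > 0)
        std::memcpy(dst.data(), src, size);
    return dst;
}
```

If is_aligned(ptr, 4) is false for your buffer, copy it with copy_to_aligned (or into an ncnn::Mat as suggested above) and pass the new pointer instead.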
-
This usually happens if you bundle multiple shared library with openmp linked
It is actually an issue of the android ndk https://github.com/android/ndk/issues/1028
On old android ndk, modify the link flags as
-Wl,-Bstatic -lomp -Wl,-Bdynamic
For recent ndk >= 21
-fstatic-openmp
-
Newer android ndk defaults to dynamic openmp runtime
modify the link flags as
-fstatic-openmp -fopenmp
-
For optimal performance, the OpenMP thread pool spin-waits for about a second before shutting down, in case more work becomes available.
If you unload a dynamic library while it is still spin-waiting, it will crash in the manner you see (most of the time).
Set OMP_WAIT_POLICY=passive in your environment before calling LoadLibrary, or wait a few seconds before calling FreeLibrary.
You can also use the following method to set environment variables in your code:
for msvc++:
SetEnvironmentVariable(_T("OMP_WAIT_POLICY"), _T("passive"));
for g++:
setenv("OMP_WAIT_POLICY", "passive", 1)
reference: https://stackoverflow.com/questions/34439956/vc-crash-when-freeing-a-dll-built-with-openmp
void pretty_print(const ncnn::Mat& m)
{
for (int q=0; q<m.c; q++)
{
const float* ptr = m.channel(q);
for (int y=0; y<m.h; y++)
{
for (int x=0; x<m.w; x++)
{
printf("%f ", ptr[x]);
}
ptr += m.w;
printf("\n");
}
printf("------------------------\n");
}
}
In Android Studio, printf will not work; you can use __android_log_print instead. Example:
#include <android/log.h> // Don't forget this
void pretty_print(const ncnn::Mat& m)
{
for (int q=0; q<m.c; q++)
{
for (int y=0; y<m.h; y++)
{
for (int x=0; x<m.w; x++)
{
__android_log_print(ANDROID_LOG_DEBUG,"LOG_TAG","ncnn Mat is : %f", m.channel(q).row(y)[x]);
}
}
}
}
void visualize(const char* title, const ncnn::Mat& m)
{
std::vector<cv::Mat> normed_feats(m.c);
for (int i=0; i<m.c; i++)
{
cv::Mat tmp(m.h, m.w, CV_32FC1, (void*)(const float*)m.channel(i));
cv::normalize(tmp, normed_feats[i], 0, 255, cv::NORM_MINMAX, CV_8U);
cv::cvtColor(normed_feats[i], normed_feats[i], cv::COLOR_GRAY2BGR);
// check NaN
for (int y=0; y<m.h; y++)
{
const float* tp = tmp.ptr<float>(y);
uchar* sp = normed_feats[i].ptr<uchar>(y);
for (int x=0; x<m.w; x++)
{
float v = tp[x];
if (v != v)
{
sp[0] = 0;
sp[1] = 0;
sp[2] = 255;
}
sp += 3;
}
}
}
int tw = m.w < 10 ? 32 : m.w < 20 ? 16 : m.w < 40 ? 8 : m.w < 80 ? 4 : m.w < 160 ? 2 : 1;
int th = (m.c - 1) / tw + 1;
cv::Mat show_map(m.h * th, m.w * tw, CV_8UC3);
show_map = cv::Scalar(127);
// tile
for (int i=0; i<m.c; i++)
{
int ty = i / tw;
int tx = i % tw;
normed_feats[i].copyTo(show_map(cv::Rect(tx * m.w, ty * m.h, m.w, m.h)));
}
cv::resize(show_map, show_map, cv::Size(0,0), 2, 2, cv::INTER_NEAREST);
cv::imshow(title, show_map);
}
-
Reuse the Extractor?!
-
net.opt.use_fp16_packed = false;
net.opt.use_fp16_storage = false;
net.opt.use_fp16_arithmetic = false;
-
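ncnn::set_cpu_powersave(int) binds to the big cores or the little cores. Note that core binding is not supported on Windows.
ncnn supports running different models on different cores. Suppose the hardware has 2 big cores and 4 little cores, and you want netA to run on the big cores and netB on the little cores. You can create two threads with std::thread or pthread and run the code below.
0: all cores 1: little cores 2: big cores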
void thread_1()
{
ncnn::set_cpu_powersave(2); // bind to big cores
netA.opt.num_threads = 2;
}
void thread_2()
{
ncnn::set_cpu_powersave(1); // bind to little cores
netB.opt.num_threads = 4;
}
-
get_cpu_count
get_gpu_count
-
Usage 1:
- ./ncnnoptimize ncnn.param ncnn.bin new.param new.bin flag
Here flag selects fp32 or fp16: 0 means fp32, 1 means fp16
Usage 2:
- ./ncnnoptimize ncnn.param ncnn.bin new.param new.bin flag cutstartname cutendname
cutstartname: the point where the extracted part of the model starts
cutendname: the point where the extracted part of the model ends
- ./ncnnoptimize ncnn.param ncnn.bin new.param new.bin flag
-
opt.num_threads
-
net.opt.openmp_blocktime = 0;
OMP_WAIT_POLICY=passive
int max_batch_size = vkdev->info.compute_queue_count;
ncnn::Mat inputs[1000];
ncnn::Mat outputs[1000];
#pragma omp parallel for num_threads(max_batch_size)
for (int i=0; i<1000; i++)
{
ncnn::Extractor ex = net1.create_extractor();
ex.input("data", inputs[i]);
ex.extract("prob", outputs[i]);
}
-
Extract the classification first, check it, then extract the bbox
net.opt.use_packing_layout = true;
net.opt.use_bf16_storage = true;
A53
-
Effect on memory consumption
-
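NVIDIA GPUs (Intel and AMD probably do this too) automatically enter power-saving mode when they consider themselves idle, and both memory and core clocks drop. In short, if your compute workload is not continuous, the measured times can look very uneven: when an idle gap occurs and the GPU enters power-saving mode, the next cold start can take several times longer than normal, as the following log shows:
// start playback
Total: 162ms, Diff: 0ms, GLTex2Mat: 7ms, calc: 152ms, Mat2GLTex: 3ms
Total: 43ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 35ms, Mat2GLTex: 2ms
Total: 45ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 37ms, Mat2GLTex: 3ms
Total: 40ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 32ms, Mat2GLTex: 4ms
// pause for 3 seconds
// resume playback
Total: 190ms, Diff: 0ms, GLTex2Mat: 9ms, calc: 177ms, Mat2GLTex: 3ms
Total: 134ms, Diff: 0ms, GLTex2Mat: 5ms, calc: 110ms, Mat2GLTex: 18ms
Total: 40ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 34ms, Mat2GLTex: 2ms
Total: 42ms, Diff: 0ms, GLTex2Mat: 3ms, calc: 36ms, Mat2GLTex: 2ms
Total: 47ms, Diff: 0ms, GLTex2Mat: 5ms, calc: 38ms, Mat2GLTex: 3ms
...
For projects that are not time-sensitive this is no big deal and can be ignored. But some use cases must accurately estimate the upload, compute, and render time of the next frame and the frames after it, and there this behavior causes developers some trouble.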
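- Contact the GPU vendor and ask them to update the driver so that your application is on the whitelist that is exempt from power-saving mode.
- Pro: you do not have to change anything. Con: communication is hard, and the vendor will most likely ignore you.
- [GPU control panel] - [Manage 3D settings] - [Power management mode], change it to [Prefer maximum performance].
- Pro: no code changes. Con: if the deployment target is a novice user, you need to write a manual that walks them through it.
- While idle (paused), periodically feed in some heartbeat compute tasks (1x1 small images) to keep the GPU in its high-performance state.
- Pro: it only needs a code change on your side. Con: it wastes power and is not eco-friendly.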
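The heartbeat option can be sketched with a small background thread. This is only an illustration using the standard library: the Heartbeat class is hypothetical, and the task you pass in is assumed to run one tiny inference (for example on a 1x1 input) with your own net.

```cpp
#include <atomic>
#include <chrono>
#include <functional>
#include <thread>
#include <utility>

// Runs `task` once per `interval` on a background thread until stop() is
// called or the object is destroyed.
class Heartbeat
{
public:
    Heartbeat(std::function<void()> task, std::chrono::milliseconds interval)
        : task_(std::move(task)), interval_(interval), running_(true),
          worker_([this] { run(); })
    {
    }

    ~Heartbeat() { stop(); }

    // Asks the worker to finish and waits for it; may block for up to one interval.
    void stop()
    {
        running_ = false;
        if (worker_.joinable())
            worker_.join();
    }

private:
    void run()
    {
        while (running_)
        {
            task_(); // e.g. a tiny dummy inference that keeps the GPU busy
            std::this_thread::sleep_for(interval_);
        }
    }

    std::function<void()> task_;
    std::chrono::milliseconds interval_;
    std::atomic<bool> running_;
    std::thread worker_; // declared last so the other members exist before it starts
};
```

Create a Heartbeat when playback pauses and call stop() when real work resumes, so the dummy task does not compete with actual inference.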
-
Software type: Software name
System: Fedora
Desktop environment: KDE
Editor: Kate
Sketching: kolourpaint
Function plotting: kmplot
bilibili streaming: OBS