add elu vulkan operator #4280

Yoh-Z · 2022-10-17T14:18:03Z

add elu vulkan operator
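
For context, ELU is the element-wise activation that leaves positive inputs unchanged and maps the rest to `alpha * (exp(x) - 1)`. The sketch below is a plain CPU reference for that formula, not the shader from this PR. The names `elu_ref` and `elu_inplace` are hypothetical, and the `alpha` values are whatever the caller supplies. A reference like this could be used to compare against GPU output.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Scalar ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise.
// `elu_ref` is an illustrative name, not a function from the PR.
inline float elu_ref(float x, float alpha)
{
    return x > 0.f ? x : alpha * (std::exp(x) - 1.f);
}

// Hypothetical helper: apply ELU in place over a flat buffer, one element
// at a time, which is the per-element work an element-wise shader performs.
inline void elu_inplace(std::vector<float>& data, float alpha)
{
    assert(std::isfinite(alpha)); // guard against a garbage parameter
    for (float& v : data)
        v = elu_ref(v, alpha);
}
```

Positive values pass through untouched, so only the non-positive branch pays for the `exp` call.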

codecov-commenter · 2022-10-17T14:29:22Z

Codecov Report

Merging #4280 (12593f1) into master (f80c274) will decrease coverage by 1.82%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master    #4280      +/-   ##
==========================================
- Coverage   94.60%   92.77%   -1.83%     
==========================================
  Files         781      639     -142     
  Lines      184880   133550   -51330     
==========================================
- Hits       174910   123907   -51003     
+ Misses       9970     9643     -327

Impacted Files	Coverage Δ
src/layer/x86/convolution_2x2_pack8.h	`2.75% <0.00%> (-97.25%)`	⬇️
src/layer/x86/deconvolution_pack8.h	`10.76% <0.00%> (-89.24%)`	⬇️
src/layer/x86/convolution_pack8.h	`34.42% <0.00%> (-65.58%)`	⬇️
src/layer/x86/convolution_pack4to8.h	`42.85% <0.00%> (-55.11%)`	⬇️
...c/layer/x86/convolution_winograd_transform_pack8.h	`54.90% <0.00%> (-45.10%)`	⬇️
src/layer/x86/convolution_3x3_pack1to8.h	`39.95% <0.00%> (-40.04%)`	⬇️
src/layer/arm/pixelshuffle_arm.cpp	`59.41% <0.00%> (-39.92%)`	⬇️
src/layer/x86/convolution_winograd_dot_pack8.h	`60.24% <0.00%> (-39.16%)`	⬇️
src/layer/x86/convolution_1x1_pack8.h	`66.66% <0.00%> (-33.34%)`	⬇️
src/layer/x86/convolution_1x1_pack4to8.h	`66.66% <0.00%> (-33.34%)`	⬇️
... and 352 more

* remove duplicated newline (Tencent#4187)
* remove duplicated newline (Tencent#4188)
* optmize softmax arm neon (Tencent#4171)
* [docs] Fix typo (Tencent#4201)
* [Prelu x86] Finish intrinsic with elempack merged (Tencent#4177)
* changed size of images for pretty formatting of page (Tencent#4193)
* [Gelu x86] Finish intrinsic with elempack merged(fast version) (Tencent#4144)
  * Finish the gelu x86 intrinsics
  * Finish the fast tanh x86 simd impl
* Ignore .xmake directory (Tencent#4212)
* Bump pypa/cibuildwheel from 2.9.0 to 2.10.1 (Tencent#4207)
  Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.9.0 to 2.10.1.
  - [Release notes](https://github.com/pypa/cibuildwheel/releases)
  - [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
  - [Commits](pypa/cibuildwheel@v2.9.0...v2.10.1)
  ---
  updated-dependencies:
  - dependency-name: pypa/cibuildwheel
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* style: space alignment (Tencent#4217)
* Ignore CMakeSettings.json, the Visual Studio CMake schema file (Tencent#4228)
* RVV: use new interface for segment load/store & change word_type to size_t&add clang ci (part Tencent#4100) (Tencent#4118)
  * RVV: use size_t for vl
  * RVV: replace vsseg.v tuple type by using regex ----- search: vsseg([1-9])e(8|16|32)_v_(f|i|u)\2m(1|2|4|8)x\1$([ -~]+), vcreate_\3\2m\4x\1\(([ -~]+)$, vl\); substitute by: vsseg$1e$2_v_$3$2m$4($5, $6, vl);
  * RVV: replace vssseg.v tuple types by using regex --- search: vssseg([1-9])e(8|16|32)_v_f\2m1x\1$([ -~]+), vcreate_f\2m1x\1\(([ -~]+)$, vl\); substitute by: vssseg$1e$2_v_f$2m1($3, $4, vl);
  * RVV: replace vlseg.v tuple types in load/store
  * RVV: replace vloxseg2ei32.v tuple types
  * RVV: add a wrapper for old compilers
  * RVV: add segment load/store wrapper in pakcing
  * RVV: fix cmake test
  * RVV: make clang happy by dropping VLAs in sgemm
  * RVV: add clang cmake toolchain configure
  * RVV: add clang ci, riscv64-unknown-linux-gnu
  Co-authored-by: thelastlin <[email protected]>
  Co-authored-by: nihui <[email protected]>
* Bump pypa/cibuildwheel from 2.10.1 to 2.10.2 (Tencent#4220)
  Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.10.1 to 2.10.2.
  - [Release notes](https://github.com/pypa/cibuildwheel/releases)
  - [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
  - [Commits](pypa/cibuildwheel@v2.10.1...v2.10.2)
  ---
  updated-dependencies:
  - dependency-name: pypa/cibuildwheel
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* add c906 build ci (Tencent#4232)
* Add benchmark result of T-Head TH1520 (Tencent#4240)
  `cpuinfo`:
  ```
  isa : rv64imafdcvsu
  mmu : sv39
  cpu-freq : 1.848Ghz
  cpu-icache : 64KB
  cpu-dcache : 64KB
  cpu-l2cache : 1MB
  cpu-tlb : 1024 4-ways
  cpu-cacheline : 64Bytes
  cpu-vector : 0.7.1
  ```
  Compiled with `-DCMAKE_TOOLCHAIN_FILE=../toolchains/c910-v240.toolchain.cmake -DCMAKE_BUILD_TYPE=release -DNCNN_OPENMP=OFF -DNCNN_THREADS=OFF -DNCNN_RUNTIME_CPU=OFF -DNCNN_RVV=ON -DNCNN_SIMPLEOCV=ON -DNCNN_BUILD_EXAMPLES=ON`
  Seems much worse than expected 🤔
* fix param parsing issue when layer/blob name exceeds 255 (Tencent#4236)
  * fix param parsing issue when layer/blob name exceeds 255
  * apply code-format changes
  Co-authored-by: ZhangGe6 <[email protected]>
* Memory Pool Improvement For Variadic Sized Inputs (Tencent#4190)
  * Simple miss count for better space efficiency
  * Simple double ended greedy;
  * Add size drop threshold setter;
  * set workspace allocator cr to zero as we had some sort of recylcing capability :P
  Co-authored-by: LinHeLurking <[email protected]>
  Co-authored-by: nihuini <[email protected]>
* docs: disable fp16 when wrong results encountered caused by overflow (Tencent#4248)
* pnnx math operation (Tencent#4251)
* more stricter armv7 fp16 and armv84 bf16 compiler check, fix Tencent#4147 fix Tencent#4222 (Tencent#4247)
* modified the param axes of expanddims in modelwriter (Tencent#4259)
* Add TH1520 (4*C910V) toolchain support. (Tencent#4267)
* implement lstm proj_size (Tencent#4263)
* Optimize x86 DeformableConv2D (Tencent#4128)
* fix compile warning with gcc 9.1.0 including simplestl.h file (Tencent#4274)
  * fix compile warning with gcc 9.1.0 including simplestl.h file
  * apply code-format changes
  Co-authored-by: veahow <[email protected]>
* add benchmark for rk3588 on rock5b (Tencent#4275)
* linux-x64-cpu-gcc on tencent ci
* implement layer feature disabled bit (Tencent#4278)
* add elu vulkan operator (Tencent#4280)
* fix tencent ci (Tencent#4277)
* implement GLU and pnnx conversion (Tencent#4283)
* Bump pypa/cibuildwheel from 2.10.2 to 2.11.1 (Tencent#4271)
  Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.10.2 to 2.11.1.
  - [Release notes](https://github.com/pypa/cibuildwheel/releases)
  - [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
  - [Commits](pypa/cibuildwheel@v2.10.2...v2.11.1)
  ---
  updated-dependencies:
  - dependency-name: pypa/cibuildwheel
    dependency-type: direct:production
    update-type: version-update:semver-minor
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* fix pnnx softmax/normalize/slice negative axis conversion to ncnn (Tencent#4284)
* pnnx glu batchindex aware conversion (Tencent#4285)
* 1. Fix typo in readme (Tencent#4287)
* x86 sse2/avx2 optimization for convolution sgemm/winograd int8 family (Tencent#4286)
* pnnx skip dynamic size evaluation (Tencent#4291)
* Fix linux build error(Tencent#4265) (Tencent#4294)
  Co-authored-by: wangyu <[email protected]>
* general cpu feature detection on macos/ios, enable bf16 and i8mm on a15 a16 and m2 (Tencent#4300)
* x86 unified fc fp32/fp16s (Tencent#4303)
  * more fma
  * more transpose utility function
* Bump pypa/cibuildwheel from 2.11.1 to 2.11.2 (Tencent#4308)
  Bumps [pypa/cibuildwheel](https://github.com/pypa/cibuildwheel) from 2.11.1 to 2.11.2.
  - [Release notes](https://github.com/pypa/cibuildwheel/releases)
  - [Changelog](https://github.com/pypa/cibuildwheel/blob/main/docs/changelog.md)
  - [Commits](pypa/cibuildwheel@v2.11.1...v2.11.2)
  ---
  updated-dependencies:
  - dependency-name: pypa/cibuildwheel
    dependency-type: direct:production
    update-type: version-update:semver-patch
  ...
  Signed-off-by: dependabot[bot] <[email protected]>
  Signed-off-by: dependabot[bot] <[email protected]>
  Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* pnnx pytorch 1.13 (Tencent#4314)
* fix Tencent#4315 (Tencent#4316)
* get_physical_cpu_count api family (Tencent#4302)
  * get_physical_cpu_count api family
  * set default to physical big cpu
  * always treat smt core as big core
  * is_smt_cpu
  * get max freq mhz on windows
  * windows thread affinity
* groupnorm 1d/2d/4d (Tencent#4312)
* fix slice end index, fix fp16 model weight alignment (Tencent#4317)
* tencent ci test-coverage pnnx (Tencent#4305)
* RVV: BatchNorm with fp16s(a) support (Tencent#4075)
* RVV: InstanceNorm with fp16s(a) support (Tencent#4078)
* fix ci pnnx build
* fold new_full and full_like (Tencent#4323)
* pnnx convert nn.Softmax2d (Tencent#4324)
* pnnx convert fold unfold (Tencent#4325)
* support yolov5 6.2 (Tencent#4328)
* implement ncnn fold and unfold (Tencent#4326)
* pnnx load gpu torchscript and reset device (Tencent#4330)
* fix:pnnx-softmax (Tencent#4333)
* pnnx save onnx zero (Tencent#4077)
* save foldable constants in file for reducing memory usage (Tencent#4337)
* match inplace slice copy pattern, rewrite copy uses (Tencent#4338)
* add vector optimization for loongarch64 (Tencent#4242)
* ci loongarch64 lsx (Tencent#4344)
* gridsample op support (Tencent#4288)
  Co-authored-by: LRY89757 <[email protected]>
  Co-authored-by: nihuini <[email protected]>
  Co-authored-by: nihui <[email protected]>
* squeeze and expanddims 4d (Tencent#4346)
* implement MultiheadAttention kdim vdim (Tencent#4347)
* pnnx convert torch bitwise left_shift right_shift (Tencent#4349)
* pnnx fp16 option for ncnn and onnx weight type (Tencent#4350)
* pnnx fuse more function to module (Tencent#4351)
  * pnnx fuse more function to module
  * rename some pass name
  * fuse adjacent reshape, fuse pad conv2d
  * fuse pad conv1d
* split tests (Tencent#4354)
* Support mat.numpy() in Python (Tencent#4356)
* Fix typo in stb_image.h (Tencent#4358)
  exitting -> exiting
* Fix windows-arm64 build for non-neon case (Tencent#4227)
* update release ci (Tencent#4359)
  * update release ci
  * find modern glslang
  * parallel jobs on windows
* Fix c api allocator (Tencent#4360)
  * add some c_api interfaces related to allocator setup.
  * fix errors in allocator parameters in c_api.
  * test c api allocator
  Co-authored-by: zhangtongshe <[email protected]>
* update glslang (Tencent#4361)
* disable out-of-line atomics since ndk23+ for resolving linking issue with old ndk (Tencent#4362)
* I added one more project to the list of examples. (Tencent#4205)
  * Dedicated to coloring black and white photographs.
* add example project link (Tencent#4365)
* fix(pybind11): build error (Tencent#4368)
* fix openmp affinity abort when cpu goes offline (Tencent#4370)
* Update release-python.yml
* small fixes
* unpack list input
* Remove LSTM2
* fix LSTM

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: Molly Sophia <[email protected]>
Co-authored-by: Menci <[email protected]>
Co-authored-by: luqiang guo <[email protected]>
Co-authored-by: Lry89757 <[email protected]>
Co-authored-by: magicse <[email protected]>
Co-authored-by: Zhuo Zhang <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Co-authored-by: 汤圆奶昔 <[email protected]>
Co-authored-by: Xavier Hsinyuan <[email protected]>
Co-authored-by: thelastlin <[email protected]>
Co-authored-by: nihui <[email protected]>
Co-authored-by: 柚木鉉 <[email protected]>
Co-authored-by: Zhang Ge <[email protected]>
Co-authored-by: ZhangGe6 <[email protected]>
Co-authored-by: LinHe <[email protected]>
Co-authored-by: LinHeLurking <[email protected]>
Co-authored-by: nihuini <[email protected]>
Co-authored-by: MisakaBit <[email protected]>
Co-authored-by: LiuYi-Up <[email protected]>
Co-authored-by: 陸言 <[email protected]>
Co-authored-by: miemie2013 <[email protected]>
Co-authored-by: Eahow Chen <[email protected]>
Co-authored-by: veahow <[email protected]>
Co-authored-by: li mengyang <[email protected]>
Co-authored-by: Yoh <[email protected]>
Co-authored-by: Caize Wu <[email protected]>
Co-authored-by: bestpower <[email protected]>
Co-authored-by: wangyu <[email protected]>
Co-authored-by: shaoshengsong <[email protected]>
Co-authored-by: WuJinxuan <[email protected]>
Co-authored-by: junchao-loongson <[email protected]>
Co-authored-by: LRY89757 <[email protected]>
Co-authored-by: Ikko Ashimine <[email protected]>
Co-authored-by: zhangtongshe <[email protected]>
Co-authored-by: tpoisonooo <[email protected]>

Yoh-Z and others added 2 commits October 17, 2022 22:16

add elu vulkan operator

4143734

apply code-format changes

7377e98

Yoh-Z added 2 commits October 18, 2022 17:33

optimize elu shader

2f29e06

Merge branch 'elu_vk_op' of github.com:Yoh-Z/ncnn into elu_vk_op

12593f1

nihui approved these changes Oct 18, 2022

nihui merged commit bb660d0 into Tencent:master Oct 18, 2022

nihui mentioned this pull request Oct 18, 2022

ELU support on gpu #4238

Closed

nihui mentioned this pull request Jan 29, 2023

ELU Layer not supported on Vulkan #4122

Closed
