C port of core encoder and decoder #33

Merged: 48 commits, Nov 25, 2024

drowe67
Owner

@drowe67 drowe67 commented Nov 6, 2024

General strategy is to follow the fine work from DRED/RDOVAE. Scope for this PR is x86 float (int8 and other architectures later). Vanilla float might actually be fast enough, given the slow frame rate.

Summary:

  • C port of RADE core encoder and decoder
  • ability to compile in model weights or load from binary blob, tool for writing blobs
  • surprising result that (after C port) core enc/dec CPU load is small, even with non-SIMD "vanilla" float
  • Tx stack CPU load also small
  • CPU is dominated by Rx Python DSP 🤔
  • Some small optimisations to Python dsp.py

TODO

  • export_rade_weights.py doing something sensible
  • code core encoder C function
  • build system for C core encoder, to either Python or C top level
  • model05 first, as it can be tested without DSP (model19_check3 requires DSP)
  • model05 unit test passing
  • model19_check3 unit test passing
  • way of handling bottleneck in core encoder
  • unit test for C version of core encoder, that measures loss compared to Python core encoder
  • Windows support (e.g. RADE_EXPORT stuff). What are the requirements here? Core enc & dec might be buried and not callable by Windows code.
  • run C encoder with radae_txe.py, for ctests
  • core decoder C test program test_rade_dec.c
  • loading model twice with C encoder
  • future work: deal with other arch. Do we need to? Not now - vanilla float (arch=0) is currently fast enough
  • future work: load binary blobs for other models
  • Time API calls with C core enc/dec
  • Less verbose output with C core enc/dec
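
The encoder unit test in the list above compares the output of the C core encoder with that of the Python core encoder. A minimal sketch of such a comparison, assuming both outputs are available as equal-length float arrays and that mean squared error is an acceptable loss measure. The names `mean_squared_error` and `outputs_match` and the threshold parameter are hypothetical, not taken from the RADE source:

```c
#include <stddef.h>

/* Hypothetical helper: mean squared error between two equal-length
   float vectors, e.g. outputs of the C and Python core encoders. */
double mean_squared_error(const float *a, const float *b, size_t n) {
    double acc = 0.0;
    if (n == 0) return 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)a[i] - (double)b[i];
        acc += d * d;
    }
    return acc / (double)n;
}

/* Returns 1 if the loss between the two vectors is at or below the
   caller-supplied threshold, 0 otherwise. */
int outputs_match(const float *a, const float *b, size_t n, double threshold) {
    return mean_squared_error(a, b, n) <= threshold;
}
```

The accumulation is done in double so that rounding in the comparison itself stays well below the float differences being measured.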

CPU Load Characterisation of Rx DSP + Core Dec

Profiling radae_rxe.py, which includes the Rx DSP plus the core decoder, using all.wav (49 seconds) on a modern desktop Intel(R) Core(TM) i5-7600 CPU @ 3.50GHz.

The lines below, in order:

  • generate test data
  • Python Rx DSP + C core decoder
  • Python DSP + Python core decoder
  • Just Python DSP

ctest -V -R radae_rx_profile
\time -o log.txt -f '%e' cat rx.f32 | PYTHONPATH='.' build/src/radae_rx > features_c.f32
\time -o log.txt -f '%e' cat rx.f32 | python3 -m cProfile -s time radae_rxe.py --no_stdout | head -n20
\time -o log.txt -f '%e' cat rx.f32 | python3 -m cProfile -s time radae_rxe.py --bypass_dec  --no_stdout | head -n20

To measure just the C core dec:

cat rx.f32 | python3 radae_rxe.py --bypass_dec > z_hat.f32
time cat z_hat.f32 | ./build/src/test_rade_dec 1 > /dev/null

Results (elapsed time in seconds):

            DSP + core dec    DSP      core dec
C           15.35             -        0.11
Python      30.3              15.03    -

These results suggest the CPU load of the C core decoder is small (almost zero) compared to the DSP (currently coded in Python), which is quite a surprising result. Note the core enc/dec is just using vanilla float (no SIMD).

This suggests (i) the CPU load for the RADE enc/dec is quite low (ii) the next target for optimisation should be the Rx DSP (iii) the overall CPU load will be dominated by the FARGAN decoder (iv) further optimisation of the RADE core enc/dec (e.g. SIMD) may not be warranted/is low priority.
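
The arithmetic behind these conclusions can be sketched as follows, using the elapsed times reported above. The function names are hypothetical, and the Python core decoder time is inferred by subtraction rather than measured directly in this PR:

```c
/* Share of a total run time taken by one stage, as a percentage. */
double stage_share_percent(double stage_s, double total_s) {
    return 100.0 * stage_s / total_s;
}

/* Time of one stage inferred by subtracting the other stage
   from the combined measurement. */
double inferred_stage_time(double total_s, double other_stage_s) {
    return total_s - other_stage_s;
}
```

With the measured values, stage_share_percent(0.11, 15.35) is about 0.7%, so the C core decoder is a tiny part of the C run, and inferred_stage_time(30.3, 15.03) gives roughly 15.27 s for the Python core decoder.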

Re (ii) - we should not leap into DSP optimisation for RADE V1 (especially a C port) without careful thought. The DSP will change for RADE V2 so this may be wasted effort. Most of the effort is spent in DSP coding and having the DSP in Python makes this much easier and greatly speeds our development. There may be some simple algorithmic changes we can make in the Python code to speed up performance (i.e. without a full C port).

CPU Load of complete Rx Stack (including FARGAN)

As measured by:

ctest -V -R radae_rx_stack_py
ctest -V -R radae_rx_stack_c

          CPU %
Python    58
C         33

These results also suggest most of the CPU is in the Rx DSP, i.e. Rx DSP >> FARGAN (at least currently). Another test to demonstrate the breakdown between the RADE Rx and FARGAN synthesis (using rx.f32 from all.wav):

\time -o log.txt -f '%e' cat rx.f32 | PYTHONPATH='.' build/src/radae_rx > features_c.f32
\time -o log.txt -f '%e' cat features_c.f32 | ./build/src/lpcnet_demo -fargan-synthesis - /dev/null

               Time (s)
radae_rx       16.15
lpcnet_demo    0.33
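
These times can be related to the CPU percentages reported earlier by dividing by the duration of the audio. A minimal sketch, assuming rx.f32 has the same 49 second duration as all.wav (an assumption, since the length of rx.f32 is not stated); the function name is hypothetical:

```c
/* CPU load as a percentage of real time: processing time divided by
   the duration of the audio being processed. */
double realtime_load_percent(double elapsed_s, double audio_s) {
    return 100.0 * elapsed_s / audio_s;
}
```

Under that assumption, realtime_load_percent(16.15, 49.0) is about 33%, in line with the CPU figure for the C Rx stack, while realtime_load_percent(0.33, 49.0) is under 1% for FARGAN synthesis.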

CPU Load of complete Tx Stack (including feature extraction)

As measured by:

ctest -V -R radae_tx_stack_c

     CPU %
C    4.5

@drowe67
Owner Author

drowe67 commented Nov 8, 2024

@tmiw - is there a reason we are using -fvisibility=hidden and this sort of stuff:

#if IS_BUILDING_RADE_API
#if _WIN32
#define RADE_EXPORT __declspec(dllexport) __stdcall
#else
#define RADE_EXPORT __attribute__((visibility("default")))
#endif // _WIN32
#else
#if _WIN32
#define RADE_EXPORT __declspec(dllimport) __stdcall
#else
#define RADE_EXPORT 
#endif // _WIN32
#endif // IS_BUILDING_RADE_API

Maybe keeping the namespace as clean as possible?
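
For context, a macro of this kind is normally placed on each public declaration, so that with -fvisibility=hidden only the annotated functions are exported from the shared library. A minimal sketch for GCC/Clang on non-Windows targets; `DEMO_EXPORT`, `demo_internal_helper` and `demo_api_version` are hypothetical names, not part of the RADE API:

```c
/* Hypothetical stand-in for RADE_EXPORT on GCC/Clang (non-Windows). */
#if defined(__GNUC__)
#define DEMO_EXPORT __attribute__((visibility("default")))
#else
#define DEMO_EXPORT
#endif

/* Not annotated: hidden when compiled with -fvisibility=hidden, so it
   cannot be referenced from outside the shared library. */
int demo_internal_helper(int x) {
    return 2 * x;
}

/* Annotated: stays visible to users of the shared library. */
DEMO_EXPORT int demo_api_version(void) {
    return demo_internal_helper(1);
}
```

Code outside the library that tries to call an unannotated function like demo_internal_helper() fails at link time, which is the kind of situation that arises when internal functions of a library built this way are needed from outside it.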

@drowe67
Owner Author

drowe67 commented Nov 8, 2024

Oh Ok it's a libopus convention. Hmm, I need to use some "hidden" function calls inside libopus 🤔

@tmiw
Collaborator

tmiw commented Nov 8, 2024

I actually included the above code because I was running into linking issues when trying to integrate it with freedv-gui, but it does make sense that forcing -fvisibility=hidden would cause that to be needed. What hidden functions are you trying to use?

@drowe67
Owner Author

drowe67 commented Nov 19, 2024

@tmiw - when convenient - could you pls review this PR? In particular the rade_api.h changes and src/CMakeLists.txt with a view to using on Windows.

target_include_directories(radae_rx PRIVATE
"$<TARGET_PROPERTY:Python3::NumPy,INTERFACE_INCLUDE_DIRECTORIES>")

add_library(radecore rade_enc.c rade_dec.c rade_enc_data.c rade_dec_data.c)
Collaborator


Does it make sense to have a separate radecore library just for the C port? There'd have to be modifications on the freedv-gui side to also make sure libradecore.dll got properly packaged along with librade.dll.

Owner Author


Sure, now the port is done I can reconcile into one library.

Collaborator


This should really be handled by the opus project (i.e. through some sort of ./configure flag) as major code changes on their end could require changes to the patch.

Owner Author


Maybe, but changes to Opus are outside the scope of this review, i.e. we are unlikely to get them to accept a patch in the near term. This code (nnet.h) has been pretty stable for several years and it's a small patch, so this approach should be workable for us in the near term as RADE evolves.

@@ -71,6 +71,12 @@ extern "C" {
#define RADE_MODEM_SAMPLE_RATE 8000 // modem waveform sample rate
#define RADE_SPEECH_SAMPLE_RATE 16000 // speech sample rate

// init rade_open() flags
#define RADE_USE_C_ENCODER 0x1
Collaborator


Question: is there a scenario where one would actually want to use Python for RX or TX (and C for the opposite)? If not, should these two be consolidated into one flag?

Owner Author


For testing, e.g. C encoder driving Python decoder. If you swapped both to C and a test failed you wouldn't know which one had bugs.
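
A minimal sketch of how two independent flag bits allow such mixed configurations. Only `RADE_USE_C_ENCODER` (0x1) appears in the diff under review; the `DEMO_USE_C_DECODER` name and value and both helper functions are assumptions made for illustration:

```c
#define RADE_USE_C_ENCODER 0x1   /* from the diff under review */
#define DEMO_USE_C_DECODER 0x2   /* hypothetical second flag bit */

/* Each bit is tested independently, so the encoder and decoder
   implementations can be selected separately. */
int uses_c_encoder(int flags) {
    return (flags & RADE_USE_C_ENCODER) != 0;
}

int uses_c_decoder(int flags) {
    return (flags & DEMO_USE_C_DECODER) != 0;
}
```

Passing only RADE_USE_C_ENCODER selects the C encoder while leaving the decoder in Python, which is the mixed setup that lets a failing test point at one side rather than both.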

@tmiw
Collaborator

tmiw commented Nov 21, 2024

Question--where's config.h coming from? I tried using this branch in freedv-gui and ended up with the following error:

/Users/mooneer/devel/freedv-gui/build_osx/rade_src/src/rade_enc.c:31:10: fatal error: 'config.h' file not found
   31 | #include "config.h"
      |          ^~~~~~~~~~
/Users/mooneer/devel/freedv-gui/build_osx/rade_src/src/rade_dec.c:29:10: fatal error: 'config.h' file not found
   29 | #include "config.h"
      |          ^~~~~~~~~~
1 error generated.
make[5]: *** [src/CMakeFiles/rade.dir/rade_enc.c.o] Error 1
make[5]: *** Waiting for unfinished jobs....
1 error generated.
make[5]: *** [src/CMakeFiles/rade.dir/rade_dec.c.o] Error 1
/Users/mooneer/devel/freedv-gui/build_osx/rade_src/src/rade_api.c:48:10: fatal error: 'config.h' file not found
   48 | #include "config.h"
      |          ^~~~~~~~~~
/Users/mooneer/devel/freedv-gui/build_osx/rade_src/src/rade_enc_data.c:5:10: fatal error: 'config.h' file not found
    5 | #include "config.h"
      |          ^~~~~~~~~~
1 error generated.
make[5]: *** [src/CMakeFiles/rade.dir/rade_api.c.o] Error 1

@drowe67
Owner Author

drowe67 commented Nov 21, 2024

> Question--where's config.h coming from? I tried using this branch in freedv-gui and ended up with the following error:

It's the opus config.h. Seems to build Ok on my two Ubuntu machines and GitHub.

@tmiw
Collaborator

tmiw commented Nov 22, 2024

> > Question--where's config.h coming from? I tried using this branch in freedv-gui and ended up with the following error:
>
> It's the opus config.h. Seems to build Ok on my two Ubuntu machines and GitHub.

I think it was something specific to macOS. I was able to get it to build on its own, but the warning below came back:

In file included from /Users/mooneer/devel/freedv-gui/build_osx/rade_src/src/test_rade_dec.c:14:
In file included from /Users/mooneer/devel/freedv-gui/build_osx/rade_src/src/rade_core.h:35:
/Users/mooneer/devel/freedv-gui/build_osx/rade_build/build_opus_arm-prefix/src/build_opus_arm/dnn/nnet.h:159:2: warning: "Only SSE and SSE2 are available. On newer machines, enable SSSE3/AVX/AVX2 using -march= to get better performance" [-W#warnings]
  159 | #warning "Only SSE and SSE2 are available. On newer machines, enable SSSE3/AVX/AVX2 using -march= to get better performance"
      |  ^

Trying to build into freedv-gui now.

@tmiw
Collaborator

tmiw commented Nov 22, 2024

OK, I'm now able to build freedv-gui on both macOS and Windows with this branch. Will need to test actual execution when I have a chance.

@tmiw
Collaborator

tmiw commented Nov 22, 2024

OTOH, now freedv-gui won't build on Linux:

[ 91%] Linking CXX executable freedv
cd /home/runner/work/freedv-gui/freedv-gui/build_linux/src && /usr/local/bin/cmake -E cmake_link_script CMakeFiles/freedv.dir/link.txt --verbose=1
/usr/bin/ld: freedv: hidden symbol `linear_init' in ../build_opus-prefix/src/build_opus/.libs/libopus.a(parse_lpcnet_weights.o) is referenced by DSO
/usr/bin/ld: final link failed: bad value
collect2: error: ld returned 1 exit status

I may need to think about this some more.

@tmiw
Collaborator

tmiw commented Nov 22, 2024

I think the issue is that we're trying to do something different than what Opus does (i.e. make symbols public in our case when they're private in the original library). Maybe we can copy in the implementations of the various functions being used in the C port to avoid needing to link Opus in at all (or only for the tools and not the library).

Alternatively, maybe we can build Opus as a dynamic library here instead of a static one. Then the build would produce both a libopus.dll/so/dylib as well as a librade.dll/so/dylib that freedv-gui would need to bring in. Since we're focused mainly on Windows (and macOS to a lesser extent) right now, this wouldn't necessarily be bad (but could be once we get to a place where we can think about packaging).

@drowe67
Owner Author

drowe67 commented Nov 23, 2024

> I think the issue is that we're trying to do something different than what Opus does (i.e. make symbols public in our case when they're private in the original library).

So why is it a problem to expose a few library functions? It seems to work just fine for me on Linux.

@tmiw
Collaborator

tmiw commented Nov 23, 2024

> > I think the issue is that we're trying to do something different than what Opus does (i.e. make symbols public in our case when they're private in the original library).
>
> So why is it a problem to expose a few library functions? It seems to work just fine for me on Linux.

It's a problem mainly with integrating the library with others. I worked around my issue for now by having the freedv-gui build use the version of opus built by RADE (i.e. it no longer builds its own copy). I am getting a lot of warnings like this one, though:

In file included from /home/mooneer/freedv-gui/src/freedv_interface.cpp:23:
In file included from /home/mooneer/freedv-gui/src/main.h:88:
In file included from /home/mooneer/freedv-gui/src/freedv_interface.h:49:
In file included from /home/mooneer/freedv-gui/build_win_unsigned/rade_build/build_opus-prefix/src/build_opus/dnn/fargan.h:31:
In file included from /home/mooneer/freedv-gui/build_win_unsigned/rade_build/build_opus-prefix/src/build_opus/dnn/fargan_data.h:5:
/home/mooneer/freedv-gui/build_win_unsigned/rade_build/build_opus-prefix/src/build_opus/dnn/nnet.h:31:9: warning: 'RADE_EXPORT' macro redefined [-Wmacro-redefined]
#define RADE_EXPORT __attribute__((visibility("default")))
        ^
/home/mooneer/freedv-gui/build_win_unsigned/rade_src/src/rade_api.h:50:9: note: previous definition is here
#define RADE_EXPORT __declspec(dllimport) __stdcall
        ^
1 warning generated.

But AFAICT they don't seem to affect execution (only tested with Windows so far).

@tmiw
Collaborator

tmiw commented Nov 23, 2024

Warnings look like they're gone now with the latest commit. (BTW I'm working in drowe67/freedv-gui#774 right now for testing.)

@drowe67
Owner Author

drowe67 commented Nov 23, 2024

> It's a problem mainly with integrating the library with others. I worked around my issue for now by having the freedv-gui build use the version of opus built by RADE (i.e. it no longer builds its own copy).

Yep 👍 That's the correct way to build the overall project. It's not a workaround - we are patching libopus so of course we need to use only the patched version.

@drowe67
Owner Author

drowe67 commented Nov 25, 2024

@tmiw - I'll merge this now, just open another PR if there are any more tweaks you need.

drowe67 merged commit d05ed99 into main on Nov 25, 2024
1 check passed