Std sin in sinosc and cpu with unison #1858

Closed
baconpaul opened this issue May 11, 2020 · 9 comments · Fixed by #1989
Labels
DSP Issues and feature requests related to sound generation in the synth
Milestone
1.7 beta 1

Comments

@baconpaul
Collaborator

Now that we've added unison to sin, you can run the CPU pretty hot with 16 voices calling std::sin on every voice at every sample. This is a well-solved problem, so use one of those solutions.

@baconpaul baconpaul added the DSP Issues and feature requests related to sound generation in the synth label May 11, 2020
@baconpaul baconpaul added this to the 1.7 beta 1 milestone May 11, 2020
@mkruselj
Collaborator

Hopefully the approximation won't be adding any harmonics? This should be checked, too.

@baconpaul
Collaborator Author

Nah, there's super good literature on how to do it. Don't worry, it's not something I'm figuring out solo.

@baconpaul
Collaborator Author

std::sinf is indeed about 2x faster than std::sin on Mac and Linux, but if the argument is explicitly a float, the float overload of std::sin gets auto-selected anyway. On Windows it makes no difference. Also, the Windows implementation of sin is about 3x slower than the Mac and Linux ones in my quick test.

@baconpaul
Collaborator Author

```cpp
#include <chrono>
#include <cmath>
#include <iostream>

int main()
{
    using Clock = std::chrono::steady_clock; // monotonic, suited to timing
    const int tm = 100000000;
    for (int it = 0; it < 10; ++it)
    {
        auto start = Clock::now();
        // Double path: 0.00001 is a double literal, so the double overloads run.
        double d = 0;
        for (int i = 0; i < tm; ++i)
        {
            d += std::sin(0.00001 * i) + std::cos(0.00001 * i);
        }
        auto dbtime = Clock::now();
        // Float path: a float argument selects the float overloads (same as
        // sinf / cosf). std::sinf itself is missing from older libstdc++.
        float f = 0;
        for (int i = 0; i < tm; ++i)
        {
            f += std::sin(0.00001f * i) + std::cos(0.00001f * i);
        }
        auto fltime = Clock::now();
        // Printing the sums keeps the compiler from optimizing the loops away.
        std::cout << "D : " << d << " "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(dbtime - start).count()
                  << " ms" << std::endl;
        std::cout << "F : " << f << " "
                  << std::chrono::duration_cast<std::chrono::milliseconds>(fltime - dbtime).count()
                  << " ms" << std::endl;
    }
}
```

Code I used to make that assertion, btw.

@baconpaul
Collaborator Author

So I did some research on this last night.

The approximation I was considering is a better approximation built from polynomials: the Padé approximant (a ratio of two polynomials). JUCE even has it in their library: https://github.com/juce-framework/JUCE/blob/02bbe31c0d2fb59ed32fb725b56ad25536c7ed75/modules/juce_dsp/maths/juce_FastMathApproximations.h#L159

So I'm going to try that one out as a sin approximation (with the appropriate phase bounds in place).

@baconpaul
Collaborator Author

Yeah, the max error of that Padé approximation in the range -pi to pi on floats is 1e-5 and the average error is 1e-7, basically float precision. On my Mac it is twice as fast as std::sin and 30% faster than std::sinf. On Linux it is more drastic, being about 10x faster than std::sin. On Windows it is about 8x as fast as std::sin.

So that's definitely the approx to use.

baconpaul added a commit to baconpaul/surge that referenced this issue May 30, 2020
Taking the Pade approximation of sin from JUCE and Wikipedia
(https://en.wikipedia.org/wiki/Padé_approximant and JUCE6 dsp fastmath
classes), introduce a fast approximation of sin and cos which is
valid in -PI,PI; Use it in the sin oscillator and ring modulator
to reduce CPU in high unison counts.

Closes surge-synthesizer#1858
baconpaul added a commit that referenced this issue May 30, 2020
Taking the Pade approximation of sin from JUCE and Wikipedia
(https://en.wikipedia.org/wiki/Padé_approximant and JUCE6 dsp fastmath
classes), introduce a fast approximation of sin and cos which is
valid in -PI,PI; Use it in the sin oscillator and ring modulator
to reduce CPU in high unison counts.

Closes #1858
@baconpaul
Collaborator Author

From slack:

Pushing the introduction of a sin approximant for the sin osc and the ring modulator.
11:11
After quite a bit of research I settled on the same approximation that JUCE has, namely the Pade fraction https://en.wikipedia.org/wiki/Padé_approximant
11:12
Using the same constants as JUCE6, Surge::dsp::fastsin and fastcos have an average error over the period of 5e-7 (basically float precision) and a max error of 1e-5.
11:12
If you use these, note that they are valid only in the range -PI to PI, so I adjusted the oscillators accordingly to stay in that range (as opposed to the prior range of 0 to 2PI).
11:12
Applied it to the sin oscillator and the ring modulator
11:13
Punchline: this approximation runs about 2.5x faster than std::sin( double ) and 30% faster than std::sinf( float ) on a Mac; about 5x faster than std::sin( double ) and about 2.5x faster than std::sinf( float ) on Ubuntu 20; and about 10x faster than std::sin on Windows 10 / VS2019, where I found the sin implementation was the slowest.
11:13
Tests appreciated
11:13
I tested pretty extensively
11:14
But I'd especially like Windows users to try something like this: all 6 oscillators set to 16-voice unison sines, with a ring modulator in each scene whose carrier is set to 16-voice unison. That should now be doable, whereas before it would have started straining the CPU.
11:15
Finally: turns out the field of “approximate sin and cos well for floating point math” is a very big one.
11:15
There are faster approximations which are worse, and there are better approaches which are slower (that’s basically what the stdlib uses; note that on Mac, where the std lib is pretty well optimized for math and vector stuff, it wasn’t that big a speedup).
11:16
But it’s an interesting problem.
11:17
Oh, finally: I don’t have reliable audio out on Windows / Linux, so if it sounds wrong to you, change Surge::dsp::fastsin back to std::sin in Oscillator.cpp and RingModulator.cpp and recompile, just to make sure you aren’t hearing gremlins.
11:17
The above has now been merged

@VincyZed
Collaborator

Just tested this. It sounds fine on my end!

Here are my results when playing a 3-note chord with 16-voice unison on 1 SINE OSC:

Before: 11-12% CPU usage
After: 6-7% CPU usage

For reference, a 3-note chord on one 16-voice SAW OSC takes around 1-2% on my machine.

All in all, noticeable improvement!

Setup:
Windows 10
FL Studio 20.7
256 samples buffer size

@baconpaul
Collaborator Author

Thanks!
And yeah, one day when I’m feeling ambitious I could do SSE across unison voices (feedback makes it hard to do across block time), which should give more improvement. But that is tricky.


3 participants