
Initial istft implementation. #85

Open
wants to merge 1 commit into master from jfs/istft
Conversation

@jfsantos (Member)

I wrote an istft function to bring a real signal from its per-frame STFT representation back to a time-domain signal using overlap-add. This version is unoptimized, as I am having some trouble creating an FFTW.Plan for a backward RFFT. I can add my non-working code to this branch in case you want to take a look.
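
Roughly, the idea looks like this (a sketch with illustrative names only: winc for the hop, win for the window, nfft for the frame length; not the actual PR code):

function istft_sketch(S, nfft, winc, win)
    nframes = size(S, 2)
    out  = zeros(nfft + (nframes - 1) * winc)
    wsum = zeros(length(out))
    for k = 1:nframes
        frame = irfft(S[:, k], nfft)          # per-frame inverse real FFT (unoptimized)
        ix = (k - 1) * winc
        out[ix+1:ix+nfft]  += frame .* win    # weighted overlap-add
        wsum[ix+1:ix+nfft] += win .^ 2
    end
    for i = 1:length(wsum)                    # normalize by accumulated window energy
        wsum[i] != 0 && (out[i] /= wsum[i])
    end
    out
end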

I understand the PR is in rough shape right now, but I need some help moving forward, and I figured it would be better to submit a PR than to try to solve this via e-mail.

Any suggestions for improvement or tips on how I can use the FFTW.Plan interface to do what I want here?

@coveralls

Coverage Status

Coverage remained the same when pulling a29dbbb on jfs/istft into 4019eac on master.

@gummif (Member) commented Nov 27, 2014

Here is a working example

backward_plan{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE, FFTW.NO_TIMELIMIT).plan

x=randn(8)
y=rfft(x)  # works also for fft(x)
tmp1=similar(y)
tmp2=similar(x)

plan = backward_plan(tmp1,tmp2)

copy!(tmp1,y)
copy!(tmp2,x)
FFTW.execute(plan, tmp1, tmp2)  # warning: destroys tmp1
scale!(tmp2, FFTW.normalization(tmp2))
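
A quick round-trip check after the normalization (assuming the snippet above was run as-is):

maximum(abs(tmp2 - x))  # should be ~0, up to floating-point error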

I suggest you copy into tmp1 from S directly instead of from S[:,k]. Also, this is probably better:

for i = 1:length(wsum)
    @inbounds wsum[i] != 0 && (out[i] /= wsum[i])
end
# instead of
pos = wsum .!= 0
out[pos] ./= wsum[pos]

@jfsantos (Member Author)

Thanks! How did you implement backward_plan? I tried the following implementation:

backward_plan{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(Y, X, 1, FFTW.BACKWARD, FFTW.ESTIMATE, FFTW.NO_TIMELIMIT).plan

but then if I try to call it I get:

x=randn(8)
y=rfft(x)
tmp1=similar(y)
tmp2=similar(x)

plan = backward_plan(tmp2,tmp1)

ERROR: `Plan{T<:Union(Float64,Float32,Complex{Float32},Complex{Float64})}` has no method matching Plan{T<:Union(Float64,Float32,Complex{Float32},Complex{Float64})}(::Array{Float64,1}, ::Array{Complex{Float64},1}, ::Int64, ::Int32, ::Uint32, ::Float64)
 in backward_plan at none:1

EDIT: I see you posted your backward plan now. So you do not need to use FFTW.BACKWARD in the arguments?

@gummif (Member) commented Nov 27, 2014

The plan is exactly how it is done in the FFTW module, so FFTW.BACKWARD is apparently not needed.

@jfsantos (Member Author)

Right, it seems to work perfectly. I'll just test this with different windows and lengths and add tests and documentation, but the function seems to be working well now. Thanks a lot!

@gummif (Member) commented Nov 27, 2014

To avoid allocation (if you have not done so already):

copy!(tmp1, 1, S, 1+(k-1)*size(S,1), length(tmp1))

@jfsantos (Member Author)

That's exactly what I did in my last commit. However, execution time and allocation seem to be the same for both implementations, at least for the size of slices I'm using (length 128).

@gummif (Member) commented Nov 27, 2014

I meant the copy!(tmp1, S[:,k]) line. There is also an allocation for tmp2.*win, which could be avoided with a loop.
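
For example, something like this (assuming tmp2 holds the inverse-FFT output and win is the window, as in the snippets above):

for i = 1:length(tmp2)
    @inbounds tmp2[i] *= win[i]
end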

@jfsantos (Member Author)

You are right. I fixed both lines and memory allocation was reduced to 75% of the previous implementation. Execution times are more or less the same (again, maybe because I am using small window sizes).

@gummif (Member) commented Nov 27, 2014

OK good. A small bug:

1+(k-1)*winc:((k-1)*winc+n)
# should be
ix + n
# with
ix = (k-1)*winc

@r9y9 (Contributor) commented Nov 28, 2014

Using the PRESERVE_INPUT flag in FFTW.Plan and removing the copy! in the istft loop makes it much faster. So the backward plan would be:

backward_plan{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE | FFTW.PRESERVE_INPUT, FFTW.NO_TIMELIMIT).plan

@gummif (Member) commented Nov 28, 2014

@r9y9 That's a good point. It would however require a bit of pointer hacking like

tmp1 = pointer_to_array(pointer(S, 1+(k-1)*size(S,1)), size(S,1), false)
# replaces
copy!(tmp1, 1, S, 1+(k-1)*size(S,1), length(tmp1))

@r9y9 (Contributor) commented Nov 28, 2014

@gummif Thanks for your comment. I was thinking about replacing:

copy!(tmp1, 1, S, 1+(k-1)*size(S,1), length(tmp1))
FFTW.execute(p, tmp1, tmp2)

with

FFTW.execute(p, sub(S,1:size(S,1), k), tmp2)

Though this requires small allocations in sub, I've confirmed that my implementation is faster for relatively short signals. Unfortunately, it may be slower for long signals (depending on window length, overlap, window type, etc.). Below is the testing code:

using DSP, Base.Test

wlen = 2048
overlap = div(wlen, 2)

x1 = rand(2^12)
println("testing with short signal (length: $(length(x1)))")
X1 = stft(x1, wlen, overlap)

# force JIT-compilation
istft(X1, wlen, overlap)
istft2(X1, wlen, overlap)

@time y1 = istft(X1, wlen, overlap)
@time y2 = istft2(X1, wlen, overlap)
@test_approx_eq x1 y1
@test_approx_eq y1 y2

x1 = rand(2^22)
println("testing with long signal (length: $(length(x1)))")

X1 = stft(x1, wlen, overlap)
@time y1 = istft(X1, wlen, overlap)
@time y2 = istft2(X1, wlen, overlap)
@test_approx_eq x1 y1
@test_approx_eq y1 y2

The results using Julia v0.3.2:

testing with short signal (length: 4096)
elapsed time: 0.000865683 seconds (131456 bytes allocated)
elapsed time: 0.000283067 seconds (102768 bytes allocated)
testing with long signal (length: 4194304)
elapsed time: 0.106112303 seconds (67160936 bytes allocated)
elapsed time: 0.167759533 seconds (69568720 bytes allocated)

complete code for istft2: https://gist.github.com/r9y9/93d1645275f7bea9569a

I agree with keeping the current implementation. I think we can add benchmarks and improve performance later.

@jfsantos (Member Author)

I tested both the suggested version with pointer_to_array and the one with copy!. Surprisingly, the one using copy! uses slightly less memory (maybe because a new pointer wrapper is allocated at each iteration?). I only averaged over 100 executions, but the version using copy! was also slightly faster.

Since the @time macro measures the time of a single execution of the function, it may not be accurate. For this reason, I usually run the same expression 100-1000 times and average the result of the @elapsed expression instead:

t = zeros(100)
for i = 1:100
    t[i] = @elapsed y1 = istft(X1, wlen, overlap)
end
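
and then look at, e.g.,

mean(t)  # average elapsed time over the 100 runs, more stable than a single @time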

When trying both @r9y9's approach and the pointer_to_array approach, I get segfaults from an FFTW function call on my machine:

signal (11): Segmentation fault: 11
r2cb_16 at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
apply at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
apply at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
apply_dif_dft at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
apply_hc2r at /usr/local/Cellar/fftw/3.3.4/lib/libfftw3.3.dylib (unknown line)
istft2 at /Users/jfsantos/.julia/v0.3/DSP/src/util.jl:150
julia_istft2;20180 at  (unknown line)
jlcall_istft2;20180 at  (unknown line)
jl_apply at /private/tmp/julia-MRinr5/src/./julia.h:980
jl_apply at /private/tmp/julia-MRinr5/src/interpreter.c:59
eval at /private/tmp/julia-MRinr5/src/interpreter.c:207
eval at /private/tmp/julia-MRinr5/src/interpreter.c:215
eval_body at /private/tmp/julia-MRinr5/src/interpreter.c:541
jl_interpret_toplevel_thunk_with at /private/tmp/julia-MRinr5/src/interpreter.c:569
jl_toplevel_eval_flex at /private/tmp/julia-MRinr5/src/toplevel.c:511
jl_f_top_eval at /private/tmp/julia-MRinr5/src/builtins.c:399
eval_user_input at REPL.jl:54
jlcall_eval_user_input;19917 at  (unknown line)
jl_apply at /private/tmp/julia-MRinr5/src/./julia.h:980
anonymous at task.jl:96
jl_apply at /private/tmp/julia-MRinr5/src/task.c:427
julia_trampoline at /private/tmp/julia-MRinr5/src/init.c:1007
Segmentation fault: 11

With the istft2 implementation, it segfaults on the first call. Using pointer_to_array, it segfaults after some calls. I am not sure what's causing this behavior.

@r9y9 (Contributor) commented Nov 28, 2014

Oh, I had the same segfaults. Could you modify backward_plan as follows and try again?

backward_plan2{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE | FFTW.PRESERVE_INPUT, FFTW.NO_TIMELIMIT).plan

to

backward_plan2{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE | FFTW.PRESERVE_INPUT, FFTW.NO_TIMELIMIT)

and also

FFTW.execute(p, tmp1, tmp2)

to

FFTW.execute(p.plan, tmp1, tmp2)

I am not sure why the segfaults happen, but this change seems to fix them in my environment. It's weird...

@gummif (Member) commented Nov 28, 2014

I'm not able to produce a significant difference, so the version with copy! might be safer.

julia> @time for i = 1:nn
           y2 = istft2(X1, wlen, overlap)
       end
elapsed time: 0.148178763 seconds (110615752 bytes allocated, 34.14% gc time)

julia> @time for i = 1:nn
           y1 = istft(X1, wlen, overlap)
       end
elapsed time: 0.159816237 seconds (110062040 bytes allocated, 46.83% gc time)

julia> @time for i = 1:nn
           y2 = istft2(X1, wlen, overlap)
       end
elapsed time: 0.148254777 seconds (110615736 bytes allocated, 32.35% gc time)

julia> @time for i = 1:nn
           y1 = istft(X1, wlen, overlap)
       end
elapsed time: 0.13468912 seconds (110061368 bytes allocated, 37.36% gc time)

@r9y9 (Contributor) commented Nov 29, 2014

Sorry for my insufficient benchmarks. I agree with @gummif

@simonster (Member)

This is a late response, but the segfaults may come from applying a plan that expects an array with a certain alignment to an array with a different alignment, which would be the case for a SubArray with an odd offset. You can use the UNALIGNED flag to tell FFTW not to assume aligned input, but the last time I looked at this, an unaligned FFT was measurably slower than an aligned one, and like @gummif I found no substantial overhead to using copy!.
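
If you did want to go the unaligned route, the plan would presumably be constructed like the ones above, just with the extra flag (an untested sketch):

backward_plan_unaligned{T<:Union(Float32, Float64)}(X::AbstractArray{Complex{T}}, Y::AbstractArray{T}) =
    FFTW.Plan(X, Y, 1, FFTW.ESTIMATE | FFTW.UNALIGNED, FFTW.NO_TIMELIMIT).plan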

@jfsantos (Member Author) commented Dec 8, 2014

Alright, so there is no reason not to use copy! then. I just need to add some more tests and docs to this.

@jfsantos (Member Author)

Oh, it seems I did something terrible while rebasing. I thought I had squashed everything, but instead I made this mess. Any tips on how to fix this?

@coveralls

Coverage Status

Coverage decreased (-0.35%) to 97.06% when pulling e5a0d01 on jfs/istft into bcca68b on master.

nframes = size(S,2)-1
outlen = nfft + nframes*winc
out = zeros(T, outlen)
tmp1 = similar(S[:,1])
Member

This should be Array(eltype(S), size(S, 1)) to avoid making an unnecessary copy and to ensure that the array is actually an Array (which is not necessarily the case if S is an AbstractMatrix).

Member Author

Thanks, will fix!

@coveralls

Coverage Status

Coverage increased (+0.07%) to 97.48% when pulling 3349186 on jfs/istft into bcca68b on master.

@simonster (Member)

Not entirely sure what's going on with the history here, but the diff looks right. I guess the nuclear option is:

git diff master > patch.diff
git reset --hard master
patch -p1 < patch.diff
rm patch.diff

and then commit and force-push to this branch.

@jfsantos (Member Author) commented Jun 6, 2015

This should do the trick, thanks @simonster! Let's wait until the tests pass and then finally merge this?

@jfsantos (Member Author) commented Jul 2, 2015

Any more comments on this? @simonster @gummif @r9y9

@standarddeviant commented Oct 31, 2017

Whatever happened to this? It came up in a Discourse thread re: istft in DSP.jl:
https://discourse.julialang.org/t/should-there-be-an-inverse-stft-in-dsp-jl/6661/2

@jfsantos (Member Author)

@standarddeviant I never managed to finish this PR; in the meantime, Julia changed a lot and I was no longer using it on a daily basis. Please feel free to do whatever is needed to get this working with newer versions of Julia and submit another PR.

@davidavdav (Contributor) commented Jan 19, 2018

I started to write an istft as well today, until I found this PR and the MusicProcessing reference mentioned above. The latter turned out to have an alternative implementation of MFCC as well.

It would be useful, indeed, to have this istft functionality added to DSP.

While implementing my own little istft, I noticed that DSP.arraysplit() seems to only keep "whole" segments, dropping the last samples of the source array. If we want istft to allow perfect reconstruction, arraysplit should probably have an option to augment (pad) the input.

While we're at it, it would also be great to have an option for arraysplit such that the overlap is symmetric on the left and right of each frame, even for the first and last frame. It may be best to illustrate this with an example. Suppose I have 1000 samples, and I want frame shifts of 100 samples and an analysis window of 300 samples. Then the first frame would be samples -100..200, the second frame would be samples 0..300, etc., until the last frame 800..1100 (give or take a Julia base-1 index offset). Negative and >size indexes would refer to augmented data, e.g., zeros.

The advantage is that such a scheme would produce 1000/100 = 10 frames, so the frames are easier to align with the original samples (the first frame aligned with samples 0..100, the middle of its analysis window, etc.). The analysis would be symmetric w.r.t. the start and end of the signal, and of the frames themselves.

In Kaldi, a modern speech recognition toolkit, this is known as the analysis option snip-edges=false.
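
Concretely, the index arithmetic for the example above would be something like this (an illustrative sketch with 0-based sample indices, not an existing arraysplit option):

nsamples, shift, wlen = 1000, 100, 300
nframes = div(nsamples, shift)            # 10 frames
for k = 0:nframes-1
    center = k*shift + div(shift, 2)      # frame k is aligned with samples k*shift .. (k+1)*shift
    istart = center - div(wlen, 2)        # may be negative: refers to augmented data, e.g. zeros
    istop  = istart + wlen - 1            # may run past nsamples-1 at the end
    # frame k covers samples istart..istop: -100..199 for k = 0, ..., 800..1099 for k = 9
end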

@galenlynch (Member)

It seems a little bit unnatural to me to circularly wrap a signal by default. When I apply STFTs to data, it's usually because the data is non-stationary, and the first and last frames end up being quite different from each other.

@davidavdav, would you have any interest in continuing this PR?

@bafonso commented Aug 17, 2020

gist here
Hi, I'd like to help get istft implemented in DSP. I am trying to bring back MusicProcessing, I need a working istft, and I see others have shown interest in having this in the DSP library. I am trying to familiarise myself with the DSP.jl code, and I am sharing code inspired by this PR with some changes so it plays nicely with the current DSP.jl code. I tried to follow the same approach, but I am sure I am missing several details.. help please? :-)

Right now it appears to work OK using a hanning window, but it is wrongly scaled at the beginning when the window is "nothing". I am not sure how to properly address this; perhaps the scaling I see in older code?

@test x1 ≈ Y1

fails even with a hanning window, I believe because of discrepancies at the first sample. Once we have the algorithm working, I'd be happy to work on the tests and documentation and do a proper PR, but I'm happy to hear your thoughts.

@davidavdav (Contributor)

It seems a little bit unnatural to me to circularly wrap a signal by default. When I apply STFTs to data, it's usually because the data is non-stationary, and the first and last frames end up being quite different from each other.

I don't think signals under an STFT should be circularly wrapped, either. Where is that suggested and/or happening?

@davidavdav, would you have any interest in continuing this PR?

I think that if we have an stft, there should be an istft that at least potentially allows perfect reconstruction.
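
For the weighted overlap-add scheme discussed above, a convenient sufficient condition is that the squared windows sum to a constant across frames (otherwise you need the per-sample wsum normalization from earlier in the thread). A quick check of that condition, as a sketch using hanning from DSP and illustrative sizes:

using DSP
wlen, hop = 256, 128
win = hanning(wlen)
nframes = 8
acc = zeros(wlen + (nframes - 1) * hop)
for k = 0:nframes-1
    acc[k*hop+1 : k*hop+wlen] += win .^ 2
end
# In the fully overlapped middle region, acc should be (nearly) constant.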

@galenlynch (Member) commented Aug 17, 2020

I don't think signals under an STFT should be circularly wrapped, either. Where is that suggested and/or happening?

Good question, I don't know where I got that from.
