
amcerbu/Delay-Embedding-and-Subspace-Learning

Audio Visualization via Delay Embedding & Subspace Learning

This is the code repository accompanying a submission to DAFx 2024. Included are two Jupyter notebooks and a C++ application.

  • Figures.ipynb: The notebook used to generate the figures in the Methods section of our paper, and to generate all videos in this repository.
  • Frequency.ipynb: A self-contained notebook demonstrating the frequency-detection algorithm described in Appendix B.
  • Scope: Source code for a realtime C++ visualization application. A demonstration video is available here. The project depends on SDL, PortAudio, Eigen3, and RtMidi.

Interpreting the videos

Each video embedded in this document has an accompanying description. Each clip in this section visualizes a single sound, rendered simultaneously with the parameters of Figures 1, 3, 4, and 5 from the manuscript. The shading of the curve indicates its distance to the plane of projection (lighter is closer). The colored dots (or, when $N$ is large, what may appear to be a multicolored curve) are the rows of the learned matrix $A$, whose columns are orthonormal and span a subspace approximating the trajectory in $\mathbf R^N$.
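The following is a minimal sketch of the pipeline described above. It assumes a plain delay embedding $x_t = (s_t, s_{t-1}, \dots, s_{t-N+1})$ and uses batch PCA (orthogonal iteration on the uncentered covariance) as a stand-in for the paper's subspace-learning step, which may well differ. The helper names `delay_embed`, `learn_subspace`, and `project` are hypothetical. The matrix $A$ is returned as $k$ orthonormal columns; its rows correspond to the colored dots, and the residual $\|x_t - AA^\top x_t\|$ corresponds to the shading.

```python
import math


def delay_embed(signal, n):
    """Delay vectors x_t = (s[t], s[t-1], ..., s[t-n+1]) for every t >= n-1."""
    return [[signal[t - j] for j in range(n)] for t in range(n - 1, len(signal))]


def orthonormalize(cols):
    """Classical Gram-Schmidt on a list of (assumed independent) column vectors."""
    basis = []
    for v in cols:
        w = list(v)
        for b in basis:
            d = sum(wi * bi for wi, bi in zip(w, b))
            w = [wi - d * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        basis.append([wi / norm for wi in w])
    return basis


def learn_subspace(vectors, k=2, iters=50):
    """Stand-in for subspace learning: top-k principal subspace of the delay
    vectors via orthogonal iteration. Returns the k orthonormal columns of A."""
    n = len(vectors[0])
    # Uncentered covariance C = sum_t x_t x_t^T.
    c = [[sum(x[i] * x[j] for x in vectors) for j in range(n)] for i in range(n)]
    # Deterministic, generic starting columns.
    cols = [[math.cos(0.7 * (i + 1) * (m + 1)) for i in range(n)] for m in range(k)]
    for _ in range(iters):
        cols = [[sum(c[i][j] * v[j] for j in range(n)) for i in range(n)] for v in cols]
        cols = orthonormalize(cols)
    return cols


def project(x, cols):
    """Plane coordinates A^T x and the distance ||x - A A^T x|| (used for shading)."""
    coords = [sum(xi * ai for xi, ai in zip(x, a)) for a in cols]
    resid = list(x)
    for cval, a in zip(coords, cols):
        resid = [ri - cval * ai for ri, ai in zip(resid, a)]
    return coords, math.sqrt(sum(r * r for r in resid))


# Example: a 220 Hz sine at fs = 8 kHz, embedding dimension N = 16, k = 2.
fs, f0, n = 8000.0, 220.0, 16
sig = [math.sin(2 * math.pi * f0 * t / fs) for t in range(400)]
vecs = delay_embed(sig, n)
A = learn_subspace(vecs, k=2)
rows = list(zip(*A))  # the N "colored dots": one point in the plane per row of A
coords, dist = project(vecs[10], A)
```

For a pure sinusoid every delay vector lies in a two-dimensional subspace, so `dist` is essentially zero; noisy or multi-partial sounds leave a nonzero residual, which is what the shading displays.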

Notice that the positions of the colored dots (the learned subspace) tend to depend only on the pitch. In the highest-dimensional case, the learned matrix often acts as a bandpass filterbank that selects the high-amplitude frequencies present. In that case the multicolored curve appears as an arc of a circle, and the angle through which this arc turns corresponds to the detected frequency. This is only a heuristic; for exact recovery of frequency information using our methods, see Frequency.ipynb.
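To make the arc heuristic concrete, here is a small sketch, again assuming batch PCA of a plain delay embedding as a stand-in for the learned subspace; it is not meant to reproduce the algorithm in Frequency.ipynb, and the function names are hypothetical. For a pure sinusoid $\sin(\omega t)$, each delay vector lies in the span of $(\cos \omega j)_j$ and $(\sin \omega j)_j$, so row $j$ of $A$ sits near angle $\pm\omega j$ on a circle and consecutive rows turn by about $\omega$ radians. The estimate is exact when those two vectors are orthogonal with equal norm (for example when $N\omega$ is a multiple of $\pi$) and only approximate otherwise.

```python
import math


def delay_embed(signal, n):
    """Delay vectors x_t = (s[t], s[t-1], ..., s[t-n+1])."""
    return [[signal[t - j] for j in range(n)] for t in range(n - 1, len(signal))]


def top2_subspace(vectors, iters=50):
    """Two orthonormal columns spanning the dominant 2D subspace (orthogonal iteration)."""
    n = len(vectors[0])
    c = [[sum(x[i] * x[j] for x in vectors) for j in range(n)] for i in range(n)]
    a = [math.cos(0.7 * (i + 1)) for i in range(n)]
    b = [math.cos(1.4 * (i + 1)) for i in range(n)]
    for _ in range(iters):
        a = [sum(c[i][j] * a[j] for j in range(n)) for i in range(n)]
        b = [sum(c[i][j] * b[j] for j in range(n)) for i in range(n)]
        # Gram-Schmidt: normalize a, remove its component from b, normalize b.
        na = math.sqrt(sum(v * v for v in a))
        a = [v / na for v in a]
        d = sum(p * q for p, q in zip(a, b))
        b = [q - d * p for p, q in zip(a, b)]
        nb = math.sqrt(sum(v * v for v in b))
        b = [v / nb for v in b]
    return a, b


def arc_frequency(a, b):
    """Mean absolute turning angle (radians/sample) between consecutive rows
    (a[j], b[j]) of A, read as points in the plane."""
    rows = list(zip(a, b))
    turns = [abs(math.atan2(x0 * y1 - y0 * x1, x0 * x1 + y0 * y1))
             for (x0, y0), (x1, y1) in zip(rows, rows[1:])]
    return sum(turns) / len(turns)


# Example: omega = pi/4 rad/sample (1 kHz at fs = 8 kHz) with N = 16, so N * omega = 4 * pi.
fs, f0, n = 8000.0, 1000.0, 16
sig = [math.sin(2 * math.pi * f0 * t / fs) for t in range(400)]
a, b = top2_subspace(delay_embed(sig, n))
omega_hat = arc_frequency(a, b)
f_hat = omega_hat * fs / (2 * math.pi)  # recovers 1000 Hz in this exact case
```

The absolute value makes the estimate insensitive to whether the learned basis is a rotation or a reflection of the ideal one, since either preserves the magnitude of the angle between rows.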

Visualizations of OrchideaSOL

We have visualized a selection of orchestral samples from the OrchideaSOL database, covering several instruments, each playing several pitches at three different dynamics. These can be found in the Orch directory; a few are shown as examples below. They were produced with the same four parameter choices as above. As with the other visualizations in this repository, the shading of the gray curve records the distance to the projection plane, and the colorful dots record the choice of projection.

An accordion playing $C3$.

Acc.C3.mf.mp4

A bassoon playing $C3$.

Bn.C3.ff.mp4

An alto saxophone playing $D4$. Notice how the high partials appear only after the fundamental is at resonance.

ASax.D4.mf.mp4

A flute playing $C4$. Note the overall consistency of the shape despite the presence of air noise. Note also that the visualizations made with larger $N$ filter out the noise (their curves are smoother).

Fl.C4.ff.mp4
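A back-of-the-envelope argument for why larger $N$ suppresses the noise, under simplifying assumptions made only for this sketch: model the tone as a sinusoid of amplitude $a$ plus white noise of variance $\sigma^2$ per sample, and treat the learned $N \times 2$ matrix $A$ (orthonormal columns) as fixed, with the sinusoid's delay vectors $s_t$ lying in its column span. For the noise part $w_t$ of each delay vector,

$$
\mathbb E\,\|A^\top w_t\|^2 = \operatorname{tr}\!\left(A^\top (\sigma^2 I_N) A\right) = 2\sigma^2,
\qquad
\|A^\top s_t\|^2 = \|s_t\|^2 \approx \frac{N a^2}{2},
$$

so the projected curve has a signal-to-noise ratio of roughly $\frac{N a^2/2}{2\sigma^2} = \frac{N}{2}\cdot\frac{a^2}{2\sigma^2}$, about $N/2$ times the per-sample ratio $a^2/(2\sigma^2)$. Real air noise is neither white nor independent of $A$, so this only indicates the trend.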

A viola playing $D4$. Note the sensitivity of the image to the slight fluctuations in bow pressure and speed.

Va.D4.mf.mp4

A contrabass playing $G2$. Note that toward the end, without the phase-locking provided by the bow, the string vibrates inharmonically.

Cb.G2.mf.mp4
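The example clips above appear to follow the pattern `Instrument.Pitch.Dynamic.mp4` (e.g. `Bn.C3.ff.mp4`); this is inferred only from the filenames shown here. A hypothetical helper that splits such a name and converts the pitch to an equal-tempered fundamental, assuming scientific pitch notation (C4 = MIDI 60, A4 = 440 Hz), could look like this:

```python
import re

# Semitone offsets of natural note names within an octave (C = 0).
NOTE_OFFSETS = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}
ACCIDENTALS = {"": 0, "#": 1, "b": -1}

# Hypothetical pattern inferred from names such as "ASax.D4.mf.mp4".
CLIP_NAME = re.compile(
    r"^(?P<inst>[A-Za-z]+)\.(?P<note>[A-G])(?P<acc>[#b]?)(?P<octave>-?\d)\.(?P<dyn>[a-z]+)\.mp4$"
)


def pitch_to_hz(note, accidental, octave, a4=440.0):
    """Equal-tempered frequency of a pitch in scientific notation (C4 = MIDI 60)."""
    midi = 12 * (octave + 1) + NOTE_OFFSETS[note] + ACCIDENTALS[accidental]
    return a4 * 2.0 ** ((midi - 69) / 12)


def parse_clip_name(name):
    """Split 'Inst.Pitch.Dyn.mp4' into (instrument, pitch, dynamic, fundamental in Hz)."""
    m = CLIP_NAME.match(name)
    if m is None:
        raise ValueError(f"unrecognized clip name: {name!r}")
    hz = pitch_to_hz(m["note"], m["acc"], int(m["octave"]))
    return m["inst"], m["note"] + m["acc"] + m["octave"], m["dyn"], hz


inst, pitch, dyn, hz = parse_clip_name("Cb.G2.mf.mp4")  # hz is about 98.0
```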

Synthetic sounds

Below is a synthesized tone created by passing white noise through a harmonically tuned collection of two-pole bandpass filters. The resonance of those filters is adjusted over the course of the video. Notice how the projection information stabilizes when the sound is pitched.

noisy.mp4
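The noisy tone above can be sketched as follows. The exact filter design, gain normalization, and resonance schedule used for the video are not given here, so this assumes a standard two-pole resonator $y[n] = (1-r)\,x[n] + 2r\cos\theta\,y[n-1] - r^2\,y[n-2]$ with $\theta = 2\pi k f_0/f_s$ for each harmonic $k$, and lets the pole radius $r$ (the "resonance") vary over time through a caller-supplied function. The function names are hypothetical.

```python
import math
import random


def resonant_noise(f0, fs, n_samples, n_harmonics, r_of_t, seed=0):
    """White noise through a bank of two-pole resonators tuned to k * f0.

    r_of_t(t) returns the pole radius (0 <= r < 1) at sample t; values near 1
    give narrow, strongly pitched resonances, smaller values give broadband noise.
    """
    rng = random.Random(seed)
    # Keep only harmonics below Nyquist.
    thetas = [2 * math.pi * k * f0 / fs for k in range(1, n_harmonics + 1) if k * f0 < fs / 2]
    state = [[0.0, 0.0] for _ in thetas]  # per filter: [y[n-1], y[n-2]]
    out = []
    for t in range(n_samples):
        x = rng.gauss(0.0, 1.0)
        r = r_of_t(t)
        total = 0.0
        for s, theta in zip(state, thetas):
            y = (1 - r) * x + 2 * r * math.cos(theta) * s[0] - r * r * s[1]
            s[1], s[0] = s[0], y
            total += y
        out.append(total / len(thetas))
    return out


def lag_correlation(sig, lag):
    """Normalized correlation between sig[n] and sig[n + lag]; close to 1 for a
    signal that is (nearly) periodic with that lag."""
    a, b = sig[:-lag], sig[lag:]
    num = sum(p * q for p, q in zip(a, b))
    den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
    return num / den


# Example: f0 = 200 Hz at fs = 8 kHz (period 40 samples), fixed resonance.
pitched = resonant_noise(200.0, 8000.0, 8000, 5, lambda t: 0.999)
breathy = resonant_noise(200.0, 8000.0, 8000, 5, lambda t: 0.5)
```

With `r` near 1 the output is close to periodic at the fundamental's period, which is the regime in which the projection information stabilizes; sweeping `r_of_t` from small to large values moves between the two cases.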

Below is a synthesized glissando. Notice the changes in shape, caused by aliasing, in the projections of the low-dimensional embeddings (the first row). Notice also how the shapes of the colorful curves change as the pitch changes.

gliss.mp4
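A glissando like this can be synthesized by accumulating phase sample by sample, which keeps the waveform continuous while the frequency moves; evaluating $\sin(2\pi f(t)\,t)$ directly would not follow the intended pitch trajectory, because its instantaneous frequency is $f(t) + t f'(t)$. The exponential sweep (constant rate in octaves per second), its endpoints, and the function names are assumptions for illustration; how `gliss.mp4` was generated is not specified.

```python
import math


def glissando(f_start, f_end, fs, duration):
    """Exponential sine sweep from f_start toward f_end (Hz) via phase accumulation."""
    n_samples = int(round(fs * duration))
    ratio = f_end / f_start
    phase = 0.0
    out = []
    for n in range(n_samples):
        out.append(math.sin(phase))
        f = f_start * ratio ** (n / n_samples)  # instantaneous frequency in Hz
        phase += 2 * math.pi * f / fs
    return out


def upward_zero_crossings(sig):
    """Count sign changes from negative to non-negative (about one per cycle)."""
    return sum(1 for a, b in zip(sig, sig[1:]) if a < 0 <= b)


# Two octaves up, 110 Hz -> 440 Hz over one second at 8 kHz.
sweep = glissando(110.0, 440.0, 8000.0, 1.0)
# Expected number of cycles: the integral of f(t), f_start * T * (ratio - 1) / ln(ratio).
expected = 110.0 * 1.0 * (4.0 - 1.0) / math.log(4.0)
```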

Below is a synthesized sum of sinusoids whose harmonicity is modulated.

inharmonic.mp4
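One possible parameterization of "harmonicity", assumed here purely for illustration (which one was used for the video is not stated): partial $k$ has frequency $f_0 k^{1+\beta}$, so $\beta = 0$ gives a harmonic, periodic tone and $\beta \ne 0$ stretches or compresses the partials. Each partial accumulates its own phase so that $\beta$ can change over time without discontinuities. The function names are hypothetical.

```python
import math


def stretched_tone(f0, fs, n_samples, n_partials, beta_of_t):
    """Sum of sinusoids with partial k at f0 * k**(1 + beta) and amplitude 1/k.

    beta_of_t(t) sets the inharmonicity at sample t; beta = 0 is harmonic.
    """
    phases = [0.0] * n_partials
    norm = sum(1.0 / k for k in range(1, n_partials + 1))
    out = []
    for t in range(n_samples):
        out.append(sum(math.sin(p) / (k + 1) for k, p in enumerate(phases)) / norm)
        beta = beta_of_t(t)
        for k in range(n_partials):
            freq = f0 * (k + 1) ** (1.0 + beta)
            phases[k] += 2 * math.pi * freq / fs
    return out


def max_period_mismatch(sig, period):
    """Largest |sig[n] - sig[n + period]|; zero for an exactly periodic signal."""
    return max(abs(a - b) for a, b in zip(sig, sig[period:]))


# 200 Hz at 8 kHz: a harmonic tone repeats every 40 samples, a stretched one does not.
harmonic = stretched_tone(200.0, 8000.0, 2000, 6, lambda t: 0.0)
stretched = stretched_tone(200.0, 8000.0, 2000, 6, lambda t: 0.05)
```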

Below is a synthesized sum of sinusoids whose brightness is modulated.

bright.mp4
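Similarly, one simple way to modulate brightness, assumed here for illustration, is a geometric rolloff $a_k = \rho^{k-1}$ over harmonic partials with $\rho$ varying in time: small $\rho$ leaves mostly the fundamental, while $\rho$ near 1 spreads energy into the upper partials, raising the spectral centroid without changing the pitch. The function names are hypothetical.

```python
import math


def partial_amplitudes(rho, n_partials):
    """Geometric rolloff a_k = rho**(k - 1) for k = 1..n_partials."""
    return [rho ** (k - 1) for k in range(1, n_partials + 1)]


def spectral_centroid(amps, f0):
    """Amplitude-weighted mean frequency of harmonic partials at k * f0."""
    return sum(a * k * f0 for k, a in enumerate(amps, start=1)) / sum(amps)


def bright_tone(f0, fs, n_samples, n_partials, rho_of_t):
    """Harmonic tone whose rolloff rho_of_t(t) (0 <= rho <= 1) varies over time.
    Each sample is normalized by the sum of amplitudes, so |sample| <= 1."""
    phases = [0.0] * n_partials
    out = []
    for t in range(n_samples):
        amps = partial_amplitudes(rho_of_t(t), n_partials)
        out.append(sum(a * math.sin(p) for a, p in zip(amps, phases)) / sum(amps))
        for k in range(n_partials):
            phases[k] += 2 * math.pi * (k + 1) * f0 / fs
    return out


# Brightness sweeps from dull (rho = 0.1) to bright (rho = 0.9) over one second;
# 10 partials of 200 Hz stay below the 4 kHz Nyquist limit.
fs = 8000.0
tone = bright_tone(200.0, fs, 8000, 10, lambda t: 0.1 + 0.8 * t / 8000)
```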

Miscellaneous

We include below a handful of miscellaneous visualizations with other parameter choices.

A clarinet playing $B3$, with embedding dimension $N = 48$ and analysis dimension $k = 4$. In order to visualize the learned four-dimensional curve, we draw its projections to the six two-dimensional coordinate planes in $\mathbf R^4$.

2024.04.09.02.57.24.mp4
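The count of six is $\binom{4}{2} = 6$: one plane for each pair of coordinate axes. A small sketch of that layout step, with a hypothetical function name and a toy curve standing in for the learned four-dimensional trajectory:

```python
from itertools import combinations


def coordinate_plane_views(curve, k):
    """Project a curve in R^k (a sequence of k-tuples) onto each coordinate plane.

    Returns {(i, j): [(p[i], p[j]), ...]} with one entry per pair i < j,
    i.e. C(k, 2) views in total.
    """
    return {(i, j): [(p[i], p[j]) for p in curve] for i, j in combinations(range(k), 2)}


# A toy 4D curve standing in for the learned k = 4 coordinates.
curve = [(t, t * t, -t, 2 * t) for t in range(5)]
views = coordinate_plane_views(curve, 4)
print(sorted(views))  # [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
```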

A short bass clarinet melody, visualized with $N = 15$ and again $k = 4$.

2024.04.05.19.28.41.mp4

About

Audio visualization algorithms
