Add docs to pitch detectors
siefkenj committed May 20, 2021
1 parent beda9dc commit a8f3da9
Showing 11 changed files with 158 additions and 2 deletions.
3 changes: 3 additions & 0 deletions .cargo/config.toml
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
[build]
# Including this header allows for math to be rendered in the docs.
rustdocflags = ["--html-in-header", "./src/docs-header.html"]
4 changes: 4 additions & 0 deletions Cargo.toml
@@ -19,6 +19,10 @@ rustfft = { version = "5.0.1", default-features = false }
criterion = "0.3"
hound = { version = "3.4.0" }

[package.metadata.docs.rs]
# Including this header allows for math to be rendered in the docs.
rustdoc-args = ["--html-in-header", "./src/docs-header.html"]

[[bench]]
name = "utils_benchmark"
harness = false
9 changes: 9 additions & 0 deletions README.md
@@ -34,3 +34,12 @@ fn main() {
## Live Demo
[![Demo Page](https://raw.githubusercontent.com/alesgenova/pitch-detection-app/master/demo.png)](https://alesgenova.github.io/pitch-detection-app/)
[Source](https://github.com/alesgenova/pitch-detection-app)

## Documentation
LaTeX formulas can be used in documentation. This is enabled by a method outlined in [rust-latex-doc-minimal-example](https://github.com/victe/rust-latex-doc-minimal-example). To build the docs, use
```
cargo doc --no-deps
```
The `--no-deps` flag is needed because a special header is included to auto-process the math in the documentation. This
header is specified using a relative path, so an error is produced if `cargo` tries to generate documentation for
dependencies.
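
As a rough illustration of what this enables, a doc comment can embed math between `$` (inline) or `$$` (display) delimiters, which the auto-render script in `src/docs-header.html` is configured to pick up. The `rms` function below is a hypothetical example written for this note and is not part of the crate.

```rust
/// Root mean square of a signal $S=(s_0,\ldots,s_{N-1})$:
/// $$ \mathrm{rms}(S) = \sqrt{\tfrac{1}{N}\sum_{i=0}^{N-1} s_i^2}. $$
/// Returns `0.0` for an empty signal.
pub fn rms(signal: &[f64]) -> f64 {
    if signal.is_empty() {
        return 0.0;
    }
    // Sum of squared samples, then divide by the sample count and take the root.
    let sum_sq: f64 = signal.iter().map(|s| s * s).sum();
    (sum_sq / signal.len() as f64).sqrt()
}

fn main() {
    println!("rms = {}", rms(&[3.0, 4.0]));
}
```

When built with `cargo doc --no-deps`, the formula in the comment should render as display math, assuming the header is picked up as configured above.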
16 changes: 16 additions & 0 deletions src/detector/autocorrelation.rs
@@ -1,3 +1,19 @@
//! Autocorrelation is one of the most basic forms of pitch detection. Let $S=(s_0,s_1,\ldots,s_N)$
//! be a discrete signal. Then, the autocorrelation function of $S$ at time $t$ is
//! $$ A_t(S) = \sum_{i=0}^{N-t} s_i s_{i+t}. $$
//! The autocorrelation function is largest when $t=0$. Subsequent peaks indicate when the signal
//! is particularly well aligned with itself. Thus, peaks of $A_t(S)$ when $t>0$ are good candidates
//! for the period of $S$, which is the reciprocal of its fundamental frequency.
//!
//! Unfortunately, autocorrelation-based pitch detection is prone to octave errors, since a signal
//! may "line up" with itself better when shifted by amounts larger than the fundamental period.
//! Further, autocorrelation is a bad choice for situations where the fundamental frequency may not
//! be the loudest frequency (which is common in telephone speech and for certain types of instruments).
//!
//! ## Implementation
//! Rather than compute the autocorrelation function directly, an [FFT](https://en.wikipedia.org/wiki/Fast_Fourier_transform)
//! is used, providing a dramatic speed increase for large buffers.

use crate::detector::internals::pitch_from_peaks;
use crate::detector::internals::DetectorInternals;
use crate::detector::internals::Pitch;
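
The autocorrelation definition in the module docs above can be sketched directly, without the FFT the crate uses. This is a minimal O(N²) illustration, not the crate's implementation: `autocorrelation` and `best_lag` are hypothetical names, and the peak-picking rule (largest value after the function first goes negative) is an assumption made for this sketch.

```rust
/// Direct autocorrelation: A_t = sum over i of s_i * s_{i+t},
/// where i runs over every index for which i + t is still inside the buffer.
fn autocorrelation(signal: &[f64]) -> Vec<f64> {
    let n = signal.len();
    (0..n)
        .map(|t| (0..n - t).map(|i| signal[i] * signal[i + t]).sum::<f64>())
        .collect()
}

/// Lag of the largest autocorrelation value found after the function first
/// becomes negative. Skipping the initial positive lobe avoids the trivial
/// maximum at t = 0. (Assumed heuristic, for illustration only.)
fn best_lag(acf: &[f64]) -> Option<usize> {
    let start = acf.iter().position(|&a| a < 0.0)?;
    acf.iter()
        .enumerate()
        .skip(start)
        .max_by(|a, b| a.1.total_cmp(b.1))
        .map(|(lag, _)| lag)
}

fn main() {
    const SAMPLE_RATE: f64 = 44100.0;
    // A 300 Hz sine, as in the crate-level example.
    let signal: Vec<f64> = (0..1024)
        .map(|i| (2.0 * std::f64::consts::PI * 300.0 * i as f64 / SAMPLE_RATE).sin())
        .collect();

    let acf = autocorrelation(&signal);
    if let Some(lag) = best_lag(&acf) {
        // A lag in samples converts to a frequency via sample_rate / lag.
        println!("lag = {} samples, about {:.1} Hz", lag, SAMPLE_RATE / lag as f64);
    }
}
```

Because the number of summed terms shrinks as the lag grows, peaks at multiples of the period get smaller in this form, which is part of why the first strong peak is the one to look for.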
2 changes: 2 additions & 0 deletions src/detector/internals.rs
@@ -9,6 +9,8 @@ use crate::utils::peak::detect_peaks;
use crate::utils::peak::PeakCorrection;
use crate::{float::Float, utils::buffer::modulus_squared};

/// A pitch's `frequency` as well as `clarity`, which is a measure
/// of confidence in the pitch detection.
pub struct Pitch<T>
where
T: Float,
21 changes: 21 additions & 0 deletions src/detector/mcleod.rs
@@ -1,3 +1,24 @@
//! The McLeod pitch detection algorithm is based on the algorithm from the paper
//! *[A Smarter Way To Find Pitch](https://www.researchgate.net/publication/230554927_A_smarter_way_to_find_pitch)*.
//! It is efficient and offers an improvement over basic autocorrelation.
//!
//! The algorithm is based on finding peaks of the *normalized square difference* function. Let $S=(s_0,s_1,\ldots,s_N)$
//! be a discrete signal. The *square difference function* at time $t$ is defined by
//! $$ d\'(t) = \sum_{i=0}^{N-t} (s_i-s_{i+t})^2. $$
//! This function is close to zero when the signal "lines up" with itself. However, *close* is a relative term,
//! and the value of $d\'(t)$ depends on volume, which should not affect the pitch of the signal. For this
//! reason, the *normalized square difference function*, $n\'(t)$, is computed.
//! $$ n\'(t) = \frac{d\'(t)}{\sum_{i=0}^{N-t} (s_i^2+s_{i+t}^2) } $$
//! The algorithm then searches for the first local minimum of $n\'(t)$ below a given threshold, called the
//! *clarity threshold*.
//!
//! ## Implementation
//! As outlined in *A Smarter Way To Find Pitch*,
//! an [FFT](https://en.wikipedia.org/wiki/Fast_Fourier_transform) is used to greatly speed up the computation of
//! the normalized square difference function. Further, the algorithm applies some algebraic tricks and actually
//! searches for the *peaks* of $1-n\'(t)$, rather than minimums of $n\'(t)$.
//!
//! After a peak is found, quadratic interpolation is applied to further refine the estimate.
use crate::detector::internals::normalized_square_difference;
use crate::detector::internals::pitch_from_peaks;
use crate::detector::internals::DetectorInternals;
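
The normalized square difference function described in the module docs above can be written out directly. The following is a minimal O(N²) sketch under stated assumptions, not the crate's FFT-based code: `nsdf_direct` and `first_minimum_below` are hypothetical names, and returning `1.0` for a silent window is a choice made only for this sketch.

```rust
/// n'(t) = d'(t) / m'(t), where d'(t) = sum (s_i - s_{i+t})^2 and
/// m'(t) = sum (s_i^2 + s_{i+t}^2), for lags 0..max_lag.
fn nsdf_direct(signal: &[f64], max_lag: usize) -> Vec<f64> {
    let n = signal.len();
    (0..max_lag.min(n))
        .map(|t| {
            let mut d = 0.0; // square difference d'(t)
            let mut m = 0.0; // normalization term m'(t)
            for i in 0..n - t {
                let (a, b) = (signal[i], signal[i + t]);
                d += (a - b) * (a - b);
                m += a * a + b * b;
            }
            // A silent window carries no information; report "no match" (sketch-only choice).
            if m > 0.0 { d / m } else { 1.0 }
        })
        .collect()
}

/// First lag t >= 1 that is a local minimum of `values` and lies below `threshold`.
fn first_minimum_below(values: &[f64], threshold: f64) -> Option<usize> {
    (1..values.len().saturating_sub(1)).find(|&t| {
        values[t] < threshold && values[t] <= values[t - 1] && values[t] <= values[t + 1]
    })
}

fn main() {
    const SAMPLE_RATE: f64 = 44100.0;
    let signal: Vec<f64> = (0..1024)
        .map(|i| (2.0 * std::f64::consts::PI * 300.0 * i as f64 / SAMPLE_RATE).sin())
        .collect();

    let nsdf = nsdf_direct(&signal, 512);
    // Searching for minima of n'(t) below 0.3 corresponds to peaks of 1 - n'(t) above 0.7.
    if let Some(lag) = first_minimum_below(&nsdf, 0.3) {
        let clarity = 1.0 - nsdf[lag];
        println!("lag = {}, clarity = {:.3}, about {:.1} Hz", lag, clarity, SAMPLE_RATE / lag as f64);
    }
}
```

Since both $d\'(t)$ and the denominator scale with the square of the amplitude, multiplying the signal by a constant leaves $n\'(t)$ unchanged, which is the volume independence the docs describe.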
17 changes: 17 additions & 0 deletions src/detector/mod.rs
@@ -1,15 +1,32 @@
//! # Pitch Detectors
//! Each detector implements a different pitch-detection algorithm.
//! Every detector implements the standard [PitchDetector] trait.

use crate::detector::internals::Pitch;
use crate::float::Float;

pub mod autocorrelation;
#[doc(hidden)]
pub mod internals;
pub mod mcleod;
pub mod yin;

/// A uniform interface to all pitch-detection algorithms.
pub trait PitchDetector<T>
where
T: Float,
{
/// Get an estimate of the [Pitch] of the sound sample stored in `signal`.
///
/// Arguments:
///
/// * `signal`: The signal to be analyzed
/// * `sample_rate`: The number of samples per second contained in the signal.
/// * `power_threshold`: If the signal has a power below this threshold, no
/// attempt is made to find its pitch and `None` is returned.
/// * `clarity_threshold`: A number between 0 and 1 reflecting the confidence
/// the algorithm has in its estimate of the frequency. Higher `clarity_threshold`s
/// correspond to higher confidence.
fn get_pitch(
&mut self,
signal: &[T],
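
The roles of `power_threshold` and `clarity_threshold` documented above can be pictured with a small self-contained sketch. Everything here is hypothetical: `Estimate` stands in for the crate's `Pitch`, `gate` is not a crate function, and taking "power" to be the sum of squared samples is an assumption, since this diff does not show how the crate defines it.

```rust
/// Hypothetical stand-in for the crate's `Pitch` type.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Estimate {
    frequency: f64,
    clarity: f64,
}

/// Signal power, assumed here to be the sum of squared samples.
fn power(signal: &[f64]) -> f64 {
    signal.iter().map(|s| s * s).sum()
}

/// Pass `candidate` through only if the signal is loud enough and the
/// estimate is clear enough; otherwise report no pitch.
fn gate(
    signal: &[f64],
    power_threshold: f64,
    clarity_threshold: f64,
    candidate: Estimate,
) -> Option<Estimate> {
    if power(signal) < power_threshold {
        return None; // too quiet: do not report a pitch
    }
    if candidate.clarity < clarity_threshold {
        return None; // the estimate is not confident enough
    }
    Some(candidate)
}

fn main() {
    let loud = vec![1.0, -1.0, 1.0, -1.0, 1.0, -1.0];
    let quiet = vec![0.01, -0.01, 0.01, -0.01];
    let candidate = Estimate { frequency: 300.0, clarity: 0.9 };

    println!("{:?}", gate(&loud, 5.0, 0.7, candidate));
    println!("{:?}", gate(&quiet, 5.0, 0.7, candidate));
}
```

Under these assumptions, raising `clarity_threshold` trades missed detections for fewer doubtful ones, while `power_threshold` simply filters out near-silent buffers.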
26 changes: 24 additions & 2 deletions src/detector/yin.rs
@@ -1,3 +1,27 @@
//! The YIN pitch detection algorithm is based on the algorithm from the paper
//! *[YIN, a fundamental frequency estimator for speech and music](http://recherche.ircam.fr/equipes/pcm/cheveign/ps/2002_JASA_YIN_proof.pdf)*.
//! It is efficient and offers an improvement over basic autocorrelation.
//!
//! The YIN pitch detection algorithm is similar to the [McLeod][crate::detector::mcleod] algorithm, but it is based on
//! a different normalization of the *mean square difference function*.
//!
//! Let $S=(s_0,s_1,\ldots,s_N)$ be a discrete signal. The *mean square difference function* at time $t$
//! is defined by
//! $$ d(t) = \sum_{i=0}^{N-t} (s_i-s_{i+t})^2. $$
//! This function is close to zero when the signal "lines up" with itself. However, *close* is a relative term,
//! and the value of $d(t)$ depends on volume, which should not affect the pitch of the signal. For this
//! reason, the function is normalized. The YIN algorithm computes the *cumulative mean normalized difference function*,
//! $$ d\'(t) = \begin{cases}1&\text{if }t=0\\\\ d(t) / \left[ \tfrac{1}{t}\sum_{i=0}^t d(i) \right] & \text{otherwise}\end{cases}. $$
//! Then, it searches for the first local minimum of $d\'(t)$ below a given threshold.
//!
//! ## Implementation
//! Rather than compute the cumulative mean normalized difference function directly,
//! an [FFT](https://en.wikipedia.org/wiki/Fast_Fourier_transform) is used, providing a dramatic speed increase for large buffers.
//!
//! After a candidate frequency is found, quadratic interpolation is applied to further refine the estimate.
//!
//! The current implementation does not perform *Step 6* of the algorithm specified in the YIN paper.

use crate::detector::internals::pitch_from_peaks;
use crate::detector::internals::Pitch;
use crate::detector::PitchDetector;
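
The cumulative mean normalized difference function from the module docs above can be sketched directly. This is a minimal O(N²) illustration rather than the crate's FFT-based code; `difference`, `cumulative_mean_normalized`, and `first_minimum_below` are hypothetical names, and the threshold of `0.1` in `main` is an arbitrary value picked for the demo.

```rust
/// Difference function d(t) = sum (s_i - s_{i+t})^2 for lags 0..max_lag.
fn difference(signal: &[f64], max_lag: usize) -> Vec<f64> {
    let n = signal.len();
    (0..max_lag.min(n))
        .map(|t| {
            (0..n - t)
                .map(|i| (signal[i] - signal[i + t]).powi(2))
                .sum::<f64>()
        })
        .collect()
}

/// d'(t): 1 at t = 0, otherwise d(t) divided by the mean of d up to lag t.
/// Since d(0) = 0, summing from lag 1 matches the sum from lag 0 in the docs.
fn cumulative_mean_normalized(d: &[f64]) -> Vec<f64> {
    let mut running_sum = 0.0;
    d.iter()
        .enumerate()
        .map(|(t, &dt)| {
            if t == 0 {
                return 1.0;
            }
            running_sum += dt;
            if running_sum > 0.0 {
                dt * t as f64 / running_sum
            } else {
                1.0 // silent signal: no information (sketch-only choice)
            }
        })
        .collect()
}

/// First lag t >= 1 that is a local minimum of `values` and lies below `threshold`.
fn first_minimum_below(values: &[f64], threshold: f64) -> Option<usize> {
    (1..values.len().saturating_sub(1)).find(|&t| {
        values[t] < threshold && values[t] <= values[t - 1] && values[t] <= values[t + 1]
    })
}

fn main() {
    const SAMPLE_RATE: f64 = 44100.0;
    let signal: Vec<f64> = (0..1024)
        .map(|i| (2.0 * std::f64::consts::PI * 300.0 * i as f64 / SAMPLE_RATE).sin())
        .collect();

    let cmnd = cumulative_mean_normalized(&difference(&signal, 512));
    if let Some(lag) = first_minimum_below(&cmnd, 0.1) {
        println!("lag = {} samples, about {:.1} Hz", lag, SAMPLE_RATE / lag as f64);
    }
}
```

One effect of this normalization is that $d\'(t)$ starts at 1 rather than 0, so small lags are not mistaken for a match even though $d(t)$ itself is tiny there.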
@@ -7,8 +31,6 @@ use crate::utils::peak::PeakCorrection;

use super::internals::{windowed_square_error, yin_normalize_square_error, DetectorInternals};

/// Pitch detection based on the YIN algorithm. See http://recherche.ircam.fr/equipes/pcm/cheveign/ps/2002_JASA_YIN_proof.pdf

pub struct YINDetector<T>
where
T: Float + std::iter::Sum,
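
Both the McLeod and YIN module docs mention refining a candidate with quadratic interpolation. A common way to do this is to fit a parabola through the extremum and its two neighbours and take the vertex; the sketch below shows that idea. `parabolic_vertex` is a hypothetical helper, and whether the crate's `PeakCorrection` uses exactly this formula is not confirmed by this diff.

```rust
/// Fit a parabola through (k-1, y[k-1]), (k, y[k]), (k+1, y[k+1]) and return
/// the x-position of its vertex. Works for both peaks and troughs.
/// Returns `None` if `k` has no neighbour on one side.
fn parabolic_vertex(y: &[f64], k: usize) -> Option<f64> {
    if k == 0 || k + 1 >= y.len() {
        return None;
    }
    let (a, b, c) = (y[k - 1], y[k], y[k + 1]);
    let denom = a - 2.0 * b + c;
    if denom == 0.0 {
        // The three points are collinear, so there is no vertex to move towards.
        return Some(k as f64);
    }
    Some(k as f64 + 0.5 * (a - c) / denom)
}

fn main() {
    // Samples of a curve whose true peak lies between integer positions.
    let y: Vec<f64> = (0..6).map(|x| -(x as f64 - 2.3).powi(2)).collect();
    let k = 2; // integer index of the largest sample
    if let Some(refined) = parabolic_vertex(&y, k) {
        // With a refined lag, the frequency would be sample_rate / refined.
        println!("integer peak at {}, refined peak at {:.3}", k, refined);
    }
}
```

The refinement matters most at high frequencies, where one sample of lag error corresponds to a large change in the reported frequency.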
16 changes: 16 additions & 0 deletions src/docs-header.html
@@ -0,0 +1,16 @@
<link rel="stylesheet" href="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.css" integrity="sha384-9eLZqc9ds8eNjO3TmqPeYcDj8n+Qfa4nuSiGYa6DjLNcv9BtN69ZIulL9+8CqC9Y" crossorigin="anonymous">
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/katex.min.js" integrity="sha384-K3vbOmF2BtaVai+Qk37uypf7VrgBubhQreNQe9aGsz9lB63dIFiQVlJbr92dw2Lx" crossorigin="anonymous"></script>
<script src="https://cdn.jsdelivr.net/npm/[email protected]/dist/contrib/auto-render.min.js" integrity="sha384-kmZOZB5ObwgQnS/DuDg6TScgOiWWBiVt0plIRkZCmE6rDZGrEOQeHM5PcHi+nyqe" crossorigin="anonymous"></script>
<script>
    document.addEventListener("DOMContentLoaded", function() {
        renderMathInElement(document.body, {
            delimiters: [
                {left: "$$", right: "$$", display: true},
                {left: "\\(", right: "\\)", display: false},
                {left: "$", right: "$", display: false},
                {left: "\\[", right: "\\]", display: true}
            ]
        });
    });
</script>

2 changes: 2 additions & 0 deletions src/float/mod.rs
@@ -1,7 +1,9 @@
//! Generic [Float] type which acts as a stand-in for `f32` or `f64`.
use rustfft::num_traits::float::FloatCore as NumFloatCore;
use rustfft::FftNum;
use std::fmt::{Debug, Display};

/// Signals are processed as arrays of [Float]s. A [Float] is normally `f32` or `f64`.
pub trait Float: Display + Debug + NumFloatCore + FftNum {}

impl Float for f64 {}
44 changes: 44 additions & 0 deletions src/lib.rs
@@ -1,3 +1,47 @@
//! # Pitch Detection
//! *pitch_detection* implements several algorithms for estimating the
//! fundamental frequency of a sound wave stored in a buffer. It is designed
//! to be usable in a WASM environment.
//!
//! # Detectors
//! A *detector* is an implementation of a pitch detection algorithm. Each detector's tolerance
//! for noise and polyphonic sounds varies.
//!
//! * [AutocorrelationDetector][detector::autocorrelation]
//! * [McLeodDetector][detector::mcleod]
//! * [YINDetector][detector::yin]
//!
//! # Examples
//! ```
//! use pitch_detection::detector::mcleod::McLeodDetector;
//! use pitch_detection::detector::PitchDetector;
//!
//! fn main() {
//!     const SAMPLE_RATE: usize = 44100;
//!     const SIZE: usize = 1024;
//!     const PADDING: usize = SIZE / 2;
//!     const POWER_THRESHOLD: f64 = 5.0;
//!     const CLARITY_THRESHOLD: f64 = 0.7;
//!
//!     // Signal coming from some source (microphone, generated, etc...)
//!     let dt = 1.0 / SAMPLE_RATE as f64;
//!     let freq = 300.0;
//!     let signal: Vec<f64> = (0..SIZE)
//!         .map(|x| (2.0 * std::f64::consts::PI * x as f64 * dt * freq).sin())
//!         .collect();
//!
//!     let mut detector = McLeodDetector::new(SIZE, PADDING);
//!
//!     let pitch = detector
//!         .get_pitch(&signal, SAMPLE_RATE, POWER_THRESHOLD, CLARITY_THRESHOLD)
//!         .unwrap();
//!
//!     println!("Frequency: {}, Clarity: {}", pitch.frequency, pitch.clarity);
//! }
//! ```

pub use detector::internals::Pitch;

pub mod detector;
pub mod float;
pub mod utils;
