binsegRcpp Efficient implementation of the binary segmentation heuristic algorithm for changepoint detection, using C++ std::multiset. Also contains functions for comparing empirical time complexity to best/worst case.
tests | |
coverage |
install.packages("binsegRcpp")
##OR
if(require("remotes"))install.packages("remotes")
remotes::install_github("tdhock/binsegRcpp")
The main function is binseg
for which you must at least specify the
first two arguments:
distribution.str
specifies the loss function to minimize.data.vec
is a numeric vector of data to segment.
> x <- c(0.1, 0, 1, 1.1, 0.1, 0)
> (models.dt <- binsegRcpp::binseg("mean_norm", x))
binary segmentation model:
segments end loss validation.loss
<int> <int> <num> <num>
1: 1 6 1.348333e+00 0
2: 2 4 1.015000e+00 0
3: 3 2 1.500000e-02 0
4: 4 3 1.000000e-02 0
5: 5 5 5.000000e-03 0
6: 6 1 -3.339343e-16 0
The result above summarizes the data that are computed during the binary segmentation algorithm. It has a special class with dedicated methods:
> class(models.dt)
[1] "binsegRcpp" "list"
> methods(class="binsegRcpp")
[1] coef plot print
see '?methods' for accessing help and source code
The coef methods returns a data table of segment means:
> coef(models.dt, segments=2:3)
segments start end start.pos end.pos mean
<int> <int> <int> <num> <num> <num>
1: 2 1 4 0.5 4.5 0.55
2: 2 5 6 4.5 6.5 0.05
3: 3 1 2 0.5 2.5 0.05
4: 3 3 4 2.5 4.5 1.05
5: 3 5 6 4.5 6.5 0.05
Demo of poisson loss and non-uniform weights:
> data.vec <- c(3,4,10,20)
> (fit1 <- binsegRcpp::binseg("poisson", data.vec, weight.vec=c(1,1,1,10)))
binary segmentation model:
segments end loss validation.loss
<int> <int> <num> <num>
1: 1 4 -393.8437 0
2: 2 3 -411.6347 0
3: 3 2 -413.9416 0
4: 4 1 -414.0133 0
Demo of change in mean and variance for normal distribution:
> sim <- function(mu,sigma)rnorm(10000,mu,sigma)
> set.seed(1)
> data.vec <- c(sim(5,1), sim(0, 5))
> fit <- binsegRcpp::binseg("meanvar_norm", data.vec)
> coef(fit, 2L)
segments start end start.pos end.pos mean var
<int> <int> <int> <num> <num> <num> <num>
1: 2 1 10000 0.5 10000.5 4.99346296 1.024763
2: 2 10001 20000 10000.5 20000.5 -0.02095033 24.538556
Other implementations of binary segmentation include changepoint::cpt.mean(method=”BinSeg”) (quadratic storage in max number of segments), BinSeg::BinSegModel() (same linear storage as binsegRcpp), and ruptures.Binseg() (unknown storage). Figures comparing the timings.
This version uses the Rcpp/.Call interface whereas the binseg package uses the .C interface.
See branches for variations of the interface to use as test cases in RcppDeepState development.