Merge pull request #124 from TeoGiane/bayesmixr
BayesMixR - an R interface to BayesMix
Showing 29 changed files with 1,046 additions and 5 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -34,3 +34,5 @@ src/hierarchies/updaters/.old/ | |
examples/gamma_hierarchy/.old/ | ||
# .env file | ||
.env | ||
# R stuff | ||
.Rproj.user |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,10 @@ | ||
# Files | ||
.Rbuildignore | ||
*.Rproj | ||
*.Rhistory | ||
*.Rdata | ||
*.nb.html | ||
|
||
# Folders | ||
build/ | ||
.Rproj.user/ |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
# BayesMixR: an R interface to BayesMix | ||
|
||
## Installation | ||
|
||
The simplest way to install `bayesmixr` on all platforms is via the [`devtools`](https://cran.r-project.org/web/packages/devtools/index.html) package in `R`. After you have cloned the `bayesmix` GitHub repository, open `R`, navigate to the `R/` sub-folder and install `bayesmixr` via: | ||
|
||
```r | ||
# Install devtools if it is not already present | ||
install.packages("devtools") | ||
|
||
# Locally install bayesmixr and clean files created at installation time | ||
devtools::install("bayesmixr/", quick = TRUE, args = "--clean") | ||
``` | ||
|
||
## Usage | ||
|
||
`bayesmixr` provides two main functions: `build_bayesmix` and `run_mcmc`. The first one installs `bayesmix` and its executables for you, while the second one calls the executable that runs the MCMC sampler from `R`. | ||
|
||
### Building bayesmix | ||
|
||
To build `bayesmix`, write the following in an R/RStudio session or script: | ||
|
||
```r | ||
# load library | ||
library("bayesmixr") | ||
|
||
# Set number of processors for parallel build (it defaults to half of your cores) | ||
n_proc = 4 | ||
|
||
# Build bayesmix on your system | ||
build_bayesmix(n_proc) | ||
``` | ||
|
||
This will print out the full installation log. | ||
|
||
### Running bayesmix | ||
|
||
To call `run_mcmc`, you must define the model and the algorithm in configuration files or text strings. See the documentation for more details. | ||
|
||
For instance, to fit a Dirichlet Process Mixture on univariate data with a Normal-NormalInverseGamma hierarchy and Neal's Algorithm 3, we use the following: | ||
|
||
```r | ||
out = run_mcmc("NNIG", "DP", data, nnig_params, dp_params, algo_params, dens_grid) | ||
``` | ||
|
||
where `data` is a numeric vector of data points, `dens_grid` is a numeric vector of points at which to evaluate the density, and `nnig_params`, `dp_params` and `algo_params` are defined as follows. | ||
|
||
```r | ||
nnig_params = | ||
" | ||
ngg_prior { | ||
mean_prior { | ||
mean: 5.5 | ||
var: 2.25 | ||
} | ||
var_scaling_prior { | ||
shape: 0.2 | ||
rate: 0.6 | ||
} | ||
shape: 1.5 | ||
scale_prior { | ||
shape: 4.0 | ||
rate: 2.0 | ||
} | ||
} | ||
" | ||
``` | ||
|
||
This specifies that the base (centering) measure is a Normal-InverseGamma with parameters $(\mu_0, \lambda_0, a_0, b_0)$. Moreover, $\mu_0 \sim \mathcal{N}(5.5, 2.25)$, $\lambda_0 \sim \mathcal{G}(0.2, 0.6)$, $a_0 = 1.5$ and $b_0 \sim \mathcal{G}(4.0, 2.0)$. See the messages `NNIGPrior` and `NNIGPrior::NGGPrior` in the file [hierarchy_prior.proto](https://github.com/bayesmix-dev/bayesmix/blob/master/src/proto/hierarchy_prior.proto) for further reference. | ||
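To get a feel for these hyperpriors, the following minimal sketch draws from them in base R and compares the Monte Carlo means with the exact ones. It assumes that `var` is a variance (so the standard deviation is its square root) and that the Gamma distributions use the shape/rate parameterization, as the field names suggest; the variable names are purely illustrative and not part of `bayesmixr`.

```r
# Illustrative only: simulate the hyperpriors described above with base R.
set.seed(1)
n_draws <- 10000

# mu_0 ~ N(5.5, 2.25): rnorm() takes a standard deviation, so use sqrt(var)
mu0 <- rnorm(n_draws, mean = 5.5, sd = sqrt(2.25))

# lambda_0 ~ Gamma(shape = 0.2, rate = 0.6); exact mean is 0.2 / 0.6 = 1/3
lambda0 <- rgamma(n_draws, shape = 0.2, rate = 0.6)

# a_0 is fixed
a0 <- 1.5

# b_0 ~ Gamma(shape = 4, rate = 2); exact mean is 4 / 2 = 2
b0 <- rgamma(n_draws, shape = 4.0, rate = 2.0)

# Monte Carlo means, to be compared with 5.5, 1/3 and 2
prior_means <- c(mu0 = mean(mu0), lambda0 = mean(lambda0), b0 = mean(b0))
print(round(prior_means, 3))
```
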
|
||
```r | ||
dp_params = | ||
" | ||
gamma_prior { | ||
totalmass_prior { | ||
shape: 4.0 | ||
rate: 2.0 | ||
} | ||
} | ||
" | ||
``` | ||
|
||
This specifies that the concentration parameter of the DP has a Gamma hyperprior with shape 4 and rate 2. Finally, we specify the parameters of the algorithm as follows: | ||
|
||
```r | ||
algo_params = | ||
" | ||
algo_id: 'Neal3' | ||
rng_seed: 20201124 | ||
iterations: 2000 | ||
burnin: 1000 | ||
init_num_clusters: 3 | ||
" | ||
``` | ||
|
||
See the notebook in `notebooks/gaussian_mix_uni.Rmd` for a concrete usage example. |
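The README leaves `data` and `dens_grid` undefined. As a minimal sketch, they could be prepared as below; the toy two-component mixture and the grid bounds are assumptions made for illustration, not values taken from the repository. The `run_mcmc` call is left commented out because it needs a built `bayesmix` executable and the three parameter strings defined above.

```r
# Illustrative inputs for run_mcmc(); the data-generating choices are hypothetical.
set.seed(20201124)

# Toy univariate data: an equal mixture of N(3, 1) and N(8, 1)
data <- c(rnorm(100, mean = 3, sd = 1), rnorm(100, mean = 8, sd = 1))

# Points at which the density is evaluated, padded slightly beyond the data range
dens_grid <- seq(min(data) - 1, max(data) + 1, length.out = 200)

# With bayesmixr installed and build_bayesmix() completed, the call from the README is:
# out = run_mcmc("NNIG", "DP", data, nnig_params, dp_params, algo_params, dens_grid)
```
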
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,19 @@ | ||
Package: bayesmixr | ||
Title: An R interface to BayesMix | ||
Version: 0.1.3 | ||
Author: Matteo Gianella | ||
Maintainer: Matteo Gianella <[email protected]> | ||
Description: This package provides a light-weight R interface for the BayesMix C++ library. | ||
License: BSD_3_clause + file LICENSE | ||
Encoding: UTF-8 | ||
Roxygen: list(markdown = TRUE) | ||
RoxygenNote: 7.2.3 | ||
Suggests: | ||
devtools (>= 2.4.5), | ||
testthat (>= 3.1.5) | ||
Config/testthat/edition: 3 | ||
Imports: | ||
bitops (>= 1.0.7), | ||
RProtoBuf (>= 0.4.20), | ||
utils (>= 4.3.1), | ||
withr (>= 2.5.0) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
YEAR: 2020 | ||
COPYRIGHT HOLDER: bayesmix-dev | ||
ORGANIZATION: bayesmix |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,6 @@ | ||
# Generated by roxygen2: do not edit by hand | ||
|
||
export(build_bayesmix) | ||
export(import_protobuf_messages) | ||
export(read_many_proto_from_file) | ||
export(run_mcmc) |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,69 @@ | ||
#' Builds the BayesMix executable | ||
#' | ||
#' After the build, if no error has occurred, it saves the path of the executable into the \code{BAYESMIX_EXE} environment variable. | ||
#' This variable is defined only while this package is loaded in the R session. | ||
#' | ||
#' @param nproc Number of processes to use for parallel compilation. It defaults to half of the available cores, | ||
#' as reported by the \code{\link[parallel]{detectCores}} function of the \code{parallel} package. | ||
#' @param build_subdir Name of the sub-directory of the \code{bayesmix/} folder in which configuration and compilation happen. | ||
#' Default value is \code{build}. | ||
#' @return No return value if the build is successful; an error is raised otherwise. | ||
#' | ||
#' @export | ||
build_bayesmix <- function(nproc = ceiling(parallel::detectCores()/2), build_subdir = "build") { | ||
|
||
# Check input types | ||
if(!is.numeric(nproc)) { stop("nproc must be a number") } | ||
if(!is.character(build_subdir)) { stop("build_subdir must be a string") } | ||
|
||
# Get .Renviron file from package | ||
renviron = system.file("bayesmixr.Renviron", package = "bayesmixr") | ||
|
||
# Set bayesmix_home folder from BAYESMIXR_HOME | ||
readRenviron(renviron) | ||
home_dir = Sys.getenv("BAYESMIXR_HOME") | ||
if(home_dir == ""){ | ||
stop("BAYESMIXR_HOME environment variable is not set") | ||
} | ||
bayesmix_home = dirname(dirname(home_dir)) | ||
|
||
# Create build/ subdirectory | ||
build_dir = sprintf("%s/%s", bayesmix_home, build_subdir) | ||
dir.create(build_dir, showWarnings = FALSE) | ||
|
||
# Configure bayesmix | ||
cat("*** Configuring BayesMix ***\n") | ||
flags = '-DDISABLE_TESTS=TRUE -DDISABLE_PLOTS=TRUE -DCMAKE_BUILD_TYPE=Release' | ||
CONFIGURE = sprintf('cmake .. -G "Unix Makefiles" %s', flags) | ||
errlog <- withr::with_dir(build_dir, system(CONFIGURE, ignore.stderr = TRUE)) | ||
if(errlog != 0L){ | ||
errmsg <- "Something went wrong during configure: command '%s' exited with status %d" | ||
stop(sprintf(errmsg, CONFIGURE, errlog)) | ||
} | ||
cat("\n") | ||
|
||
# Build bayesmix::run_mcmc executable | ||
cat("*** Building BayesMix executable ***\n") | ||
BUILD = sprintf('make run_mcmc -j%d', nproc) | ||
errlog <- withr::with_dir(build_dir, system(BUILD)) | ||
if (errlog != 0L) { | ||
errmsg <- "Something went wrong during build: command '%s' exited with status %d" | ||
stop(sprintf(errmsg, BUILD, errlog)) | ||
} | ||
cat("\n") | ||
|
||
# Set BAYESMIX_EXE environment variable | ||
cat("*** Setting BAYESMIX_EXE environment variable ***\n") | ||
write(x = sprintf('BAYESMIX_EXE=%s/run_mcmc', build_dir), file = renviron, append = TRUE) | ||
cat("\n") | ||
|
||
# Set TBB_PATH environment variable | ||
cat("*** Setting TBB_PATH environment variable ***\n") | ||
tbb_path = sprintf('%s/lib/_deps/math-src/lib/tbb', bayesmix_home) | ||
write(x = sprintf('TBB_PATH=%s', tbb_path), file = renviron, append = TRUE) | ||
cat("\n") | ||
|
||
# Parse .Renviron file to get environment variables | ||
readRenviron(renviron) | ||
cat("Successfully installed BayesMix\n") | ||
} |
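The build step above formats `nproc` with `%d`, which in R fails for non-integer values such as `2.5`. The following sketch shows one way the command string could be assembled with a stricter check; `make_build_command` is a hypothetical helper written for illustration and is not part of this commit.

```r
# Hypothetical helper: assemble the make command used for the build step,
# rejecting values of nproc that sprintf("%d") could not format.
make_build_command <- function(nproc = 4, target = "run_mcmc") {
  valid <- is.numeric(nproc) && length(nproc) == 1 && !is.na(nproc) &&
    nproc >= 1 && nproc == round(nproc)
  if (!valid) {
    stop("nproc must be a single positive whole number")
  }
  sprintf("make %s -j%d", target, as.integer(nproc))
}

cmd <- make_build_command(4)
print(cmd)  # "make run_mcmc -j4"
```

A fractional value such as `make_build_command(2.5)` stops with an explicit message instead of a `sprintf` formatting error.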
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,39 @@ | ||
#' Return a decoder for a basic varint value (does not include tag). | ||
#' | ||
#' Decoded values will be bitwise-anded with the given mask before being | ||
#' returned, e.g. to limit them to 32 bits. The returned decoder does not take | ||
#' the usual "end" parameter -- the caller is expected to do bounds checking | ||
#' after the fact (often the caller can defer such checking until later). The | ||
#' decoder returns a list with elements \code{result} and \code{pos}. | ||
#' | ||
#' @keywords internal | ||
VarintDecoder = function(mask, result_type) { | ||
|
||
# Define DecodeVarint function | ||
DecodeVarint <- function(buffer, pos) { | ||
result = 0 | ||
shift = 0 | ||
while (TRUE) { | ||
b = as.numeric(buffer[pos]) | ||
result = bitops::bitOr(result, bitops::bitShiftL(bitops::bitAnd(b, 0x7f), shift)) | ||
pos = pos + 1 | ||
if (!bitops::bitAnd(b, 0x80)) { | ||
result <- bitops::bitAnd(result, mask) | ||
result <- result_type(result) | ||
return(list(result = result, pos = as.integer(pos))) | ||
} | ||
shift <- shift + 7 | ||
if (shift >= 64) { | ||
stop('Too many bytes when decoding varint.') | ||
} | ||
} | ||
} | ||
|
||
# Return the decoder as result | ||
return(DecodeVarint) | ||
} | ||
|
||
#' Use this decoder version for values which must be limited to 32 bits. | ||
#' | ||
#' @keywords internal | ||
DecodeVarint32 = VarintDecoder(2^32 - 1, as.integer) |
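As a worked example of the varint scheme, the sketch below decodes the two bytes `0xAC 0x02`, which encode the value 300. It is a simplified stand-alone variant that uses base R's `bitwAnd`, `bitwOr` and `bitwShiftL` instead of `bitops`, and it handles at most four bytes so that everything fits in R's 32-bit integers; `decode_varint` is a hypothetical name, not a function of the package.

```r
# Simplified varint decoder using only base R (illustrative, limited to 28 bits).
decode_varint <- function(buffer, pos = 1L) {
  result <- 0L
  shift <- 0L
  repeat {
    b <- as.integer(buffer[pos])
    # The low 7 bits of each byte carry payload, least-significant group first
    result <- bitwOr(result, bitwShiftL(bitwAnd(b, 0x7fL), shift))
    pos <- pos + 1L
    # A cleared high bit marks the last byte of the varint
    if (bitwAnd(b, 0x80L) == 0L) {
      return(list(result = result, pos = pos))
    }
    shift <- shift + 7L
    if (shift >= 28L) {
      stop("Value too large for this 32-bit sketch.")
    }
  }
}

# 0xAC = 1010 1100 -> payload 44, continuation bit set
# 0x02 = 0000 0010 -> payload 2, shifted left by 7 gives 256; 256 + 44 = 300
msg <- as.raw(c(0xac, 0x02, 0x08))
out <- decode_varint(msg, 1L)
print(out$result)  # 300
print(out$pos)     # 3, the index of the first byte after the varint
```
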