Merge pull request #124 from TeoGiane/bayesmixr
BayesMixR - an R interface to BayesMix
TeoGiane authored Oct 10, 2023
2 parents 10e8330 + c6d1b45 commit e74eaba
Showing 29 changed files with 1,046 additions and 5 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -34,3 +34,5 @@ src/hierarchies/updaters/.old/
examples/gamma_hierarchy/.old/
# .env file
.env
# R stuff
.Rproj.user
6 changes: 4 additions & 2 deletions INSTALL.md
@@ -22,7 +22,9 @@ brew install git g++ make cmake pkg-config

## Requirements - Windows

First of all, install `git` via the [Git for Windows](https://gitforwindows.org/) project. Download the [installer](https://github.com/git-for-windows/git/releases/latest) and complete the prompts leaving default choices to install. The Git BASH that comes with this program is the shell we suggest to compile and run `bayesmix`.
First of all, install `git` via the [Git for Windows](https://gitforwindows.org/) project. Download the [installer](https://github.com/git-for-windows/git/releases/latest) and complete the prompts leaving default choices to install.

<!-- The Git BASH that comes with this program is the shell we suggest to compile and run `bayesmix`. -->

On Windows, we also need a proper C++ toolchain to install the other required packages. `bayesmix` can be successfully compiled and installed with the RTools40, RTools42 and RTools43 toolchains. This choice simplifies the development of a lightweight `R` interface that works on all platforms.

@@ -76,7 +78,7 @@ pacman -Sy mingw-w64-ucrt-x86_64-pkgconf

### Important remarks :

- Use the Git BASH shell available with Git for Windows to execute these commands. If `PATH` environment variable has been configured correctly, all requirements will be satisfied.
- Use the native Windows Command Prompt (or PowerShell) to execute these commands. If the `PATH` environment variable has been configured correctly, all requirements will be satisfied.
- In order for `bayesmix` to be properly linked to Intel's TBB library, the absolute path to `tbb` must be added to the User `PATH` variable. This is done automatically during the build, but to make the change effective the user needs to close the Git BASH shell and open a new one.

## Build `bayesmix`
10 changes: 10 additions & 0 deletions R/.gitignore
@@ -0,0 +1,10 @@
# Files
.Rbuildignore
*.Rproj
*.Rhistory
*.Rdata
*.nb.html

# Folders
build/
.Rproj.user/
96 changes: 96 additions & 0 deletions R/README.md
@@ -0,0 +1,96 @@
# BayesMixR: an R interface to BayesMix

## Installation

The simplest way to install `bayesmixr` on all platforms is via the [`devtools`](https://cran.r-project.org/web/packages/devtools/index.html) package in `R`. After you have cloned the `bayesmix` GitHub repository, open `R`, navigate to the `R/` sub-folder and install `bayesmixr` via:

```r
# Install devtools if it is not already present
if (!requireNamespace("devtools", quietly = TRUE)) install.packages("devtools")

# Locally install bayesmixr and clean files created at installation time
devtools::install("bayesmixr/", quick = TRUE, args = "--clean")
```

## Usage

`bayesmixr` provides two main functions: `build_bayesmix` and `run_mcmc`. The first one installs `bayesmix` and its executables for you, while the second one calls the executable that runs the MCMC sampler from `R`.

### Building bayesmix

To build `bayesmix`, run the following in an R/RStudio session or script:

```r
# load library
library("bayesmixr")

# Set number of processors for parallel build (it defaults to half of your cores)
n_proc = 4

# Build bayesmix on your system
build_bayesmix(n_proc)
```

This will print out the full installation log.

### Running bayesmix

To call `run_mcmc`, you must define the model and the algorithm either in configuration files or as text strings. See the documentation for more details.

For instance, to fit a Dirichlet Process Mixture on univariate data with a Normal-NormalInverseGamma hierarchy and Neal's Algorithm 3, we use the following call:

```r
out = run_mcmc("NNIG", "DP", data, nnig_params, dp_params, algo_params, dens_grid)
```

where `data` is a numeric vector of data points, `dens_grid` is a numeric vector of points at which to evaluate the density, and `nnig_params`, `dp_params` and `algo_params` are defined as follows.

```r
nnig_params =
"
ngg_prior {
mean_prior {
mean: 5.5
var: 2.25
}
var_scaling_prior {
shape: 0.2
rate: 0.6
}
shape: 1.5
scale_prior {
shape: 4.0
rate: 2.0
}
}
"
```

This specifies that the base (centering) measure is a Normal-InverseGamma with parameters $(\mu_0, \lambda_0, a_0, b_0)$. Moreover, $\mu_0 \sim \mathcal{N}(5.5, 2.25)$, $\lambda_0 \sim \mathcal{G}(0.2, 0.6)$, $a_0 = 1.5$ and $b_0 \sim \mathcal{G}(4.0, 2.0)$. See the messages `NNIGPrior` and `NNIGPrior::NGGPrior` in the file [hierarchy_prior.proto](https://github.com/bayesmix-dev/bayesmix/blob/master/src/proto/hierarchy_prior.proto) for further reference.
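
Written out in full, the specification above can be read as follows. This is a sketch that assumes the common Normal-InverseGamma convention in which $\lambda_0$ scales the variance of the mean; that convention is an assumption here, so check it against the `.proto` file linked above. The second argument of $\mathcal{N}$ is a variance and the arguments of $\mathcal{G}$ are shape and rate, matching the field names `var`, `shape` and `rate`.

$$
\begin{aligned}
\mu \mid \sigma^2 &\sim \mathcal{N}(\mu_0, \sigma^2 / \lambda_0), & \sigma^2 &\sim \mathcal{IG}(a_0, b_0), \\
\mu_0 &\sim \mathcal{N}(5.5, 2.25), & \lambda_0 &\sim \mathcal{G}(0.2, 0.6), \\
a_0 &= 1.5, & b_0 &\sim \mathcal{G}(4.0, 2.0).
\end{aligned}
$$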

```r
dp_params =
"
gamma_prior {
totalmass_prior {
shape: 4.0
rate: 2.0
}
}
"
```

This specifies that the concentration parameter of the DP has a Gamma hyperprior with shape 4 and rate 2. Finally, we specify the parameters of the algorithm as follows:

```r
algo_params =
"
algo_id: 'Neal3'
rng_seed: 20201124
iterations: 2000
burnin: 1000
init_num_clusters: 3
"
```

See the notebook in `notebooks/gaussian_mix_uni.Rmd` for a concrete usage example.
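
As a rough sketch of how the `data` and `dens_grid` inputs to `run_mcmc` might be prepared, the snippet below simulates a two-component Gaussian mixture and builds an evaluation grid around it. The simulated values, the sample sizes and the grid resolution are illustrative assumptions, not part of `bayesmixr`.

```r
# Simulated univariate data (illustrative assumption): two Gaussian clusters
set.seed(1)
data = c(rnorm(100, mean = 3, sd = 1), rnorm(100, mean = 8, sd = 1))

# Grid of points at which the density will be evaluated
dens_grid = seq(min(data) - 1, max(data) + 1, length.out = 200)

# With bayesmixr built and the three parameter strings defined as above:
# out = run_mcmc("NNIG", "DP", data, nnig_params, dp_params, algo_params, dens_grid)
```

Extending the grid slightly beyond the range of the data keeps the tails of the estimated density visible when plotting.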
19 changes: 19 additions & 0 deletions R/bayesmixr/DESCRIPTION
@@ -0,0 +1,19 @@
Package: bayesmixr
Title: An R interface to BayesMix
Version: 0.1.3
Author: Matteo Gianella
Maintainer: Matteo Gianella <[email protected]>
Description: This package provides a lightweight R interface to the BayesMix C++ library.
License: BSD_3_clause + file LICENSE
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Suggests:
devtools (>= 2.4.5),
testthat (>= 3.1.5)
Config/testthat/edition: 3
Imports:
bitops (>= 1.0.7),
RProtoBuf (>= 0.4.20),
utils (>= 4.3.1),
withr (>= 2.5.0)
3 changes: 3 additions & 0 deletions R/bayesmixr/LICENSE
@@ -0,0 +1,3 @@
YEAR: 2020
COPYRIGHT HOLDER: bayesmix-dev
ORGANIZATION: bayesmix
6 changes: 6 additions & 0 deletions R/bayesmixr/NAMESPACE
@@ -0,0 +1,6 @@
# Generated by roxygen2: do not edit by hand

export(build_bayesmix)
export(import_protobuf_messages)
export(read_many_proto_from_file)
export(run_mcmc)
69 changes: 69 additions & 0 deletions R/bayesmixr/R/build_bayesmix.R
@@ -0,0 +1,69 @@
#' Builds the BayesMix executable
#'
#' After the build, if no error has occurred, it saves the executable's path in the \code{BAYESMIX_EXE} environment variable.
#' This variable is defined only while this package is loaded in the R session.
#'
#' @param nproc Number of processes to use for parallel compilation. Defaults to half of the
#' available cores, as reported by \code{\link[parallel]{detectCores}}.
#' @param build_subdir Name of the sub-directory of the \code{bayesmix/} folder in which configuration and compilation happen.
#' Default value is \code{build}.
#' @return No return value if the build is successful; raises an error otherwise.
#'
#' @export
build_bayesmix <- function(nproc = ceiling(parallel::detectCores()/2), build_subdir = "build") {

  # Check input types
  if(!is.numeric(nproc)) { stop("nproc must be a number") }
  if(!is.character(build_subdir)) { stop("build_subdir must be a string") }

  # Get .Renviron file from package
  renviron = system.file("bayesmixr.Renviron", package = "bayesmixr")

  # Set bayesmix_home folder from BAYESMIXR_HOME
  readRenviron(renviron)
  home_dir = Sys.getenv("BAYESMIXR_HOME")
  if(home_dir == ""){
    stop("BAYESMIXR_HOME environment variable is not set")
  }
  bayesmix_home = dirname(dirname(home_dir))

  # Create build/ subdirectory
  build_dir = sprintf("%s/%s", bayesmix_home, build_subdir)
  dir.create(build_dir, showWarnings = FALSE)

  # Configure bayesmix
  cat("*** Configuring BayesMix ***\n")
  flags = '-DDISABLE_TESTS=TRUE -DDISABLE_PLOTS=TRUE -DCMAKE_BUILD_TYPE=Release'
  CONFIGURE = sprintf('cmake .. -G "Unix Makefiles" %s', flags)
  errlog <- withr::with_dir(build_dir, system(CONFIGURE, ignore.stderr = TRUE))
  if(errlog != 0L){
    errmsg <- "Something went wrong during configure: command '%s' exit with status %d"
    stop(sprintf(errmsg, CONFIGURE, errlog))
  }
  cat("\n")

  # Build bayesmix::run_mcmc executable
  cat("*** Building BayesMix executable ***\n")
  BUILD = sprintf('make run_mcmc -j%d', nproc)
  errlog <- withr::with_dir(build_dir, system(BUILD))
  if (errlog != 0L) {
    errmsg <- "Something went wrong during build: command '%s' exit with status %d"
    stop(sprintf(errmsg, BUILD, errlog))
  }
  cat("\n")

  # Set BAYESMIX_EXE environment variable
  cat("*** Setting BAYESMIX_EXE environment variable ***\n")
  write(x = sprintf('BAYESMIX_EXE=%s/run_mcmc', build_dir), file = renviron, append = TRUE)
  cat("\n")

  # Set TBB_PATH environment variable
  cat("*** Setting TBB_PATH environment variable ***\n")
  tbb_path = sprintf('%s/lib/_deps/math-src/lib/tbb', bayesmix_home)
  write(x = sprintf('TBB_PATH=%s', tbb_path), file = renviron, append = TRUE)
  cat("\n")

  # Parse .Renviron file to get environment variables
  readRenviron(renviron)
  cat("Successfully installed BayesMix\n")
}
39 changes: 39 additions & 0 deletions R/bayesmixr/R/decoder.R
@@ -0,0 +1,39 @@
#' Return a decoder for a basic varint value (does not include tag).
#'
#' Decoded values will be bitwise-anded with the given mask before being
#' returned, e.g. to limit them to 32 bits. The returned decoder does not take
#' the usual "end" parameter -- the caller is expected to do bounds checking
#' after the fact (often the caller can defer such checking until later). The
#' decoder returns a list with the decoded value (\code{result}) and the new position (\code{pos}).
#'
#' @keywords internal
VarintDecoder = function(mask, result_type) {

  # Define DecodeVarint function
  DecodeVarint <- function(buffer, pos) {
    result = 0
    shift = 0
    while (TRUE) {
      b = as.numeric(buffer[pos])
      result = bitops::bitOr(result, bitops::bitShiftL(bitops::bitAnd(b, 0x7f), shift))
      pos = pos + 1
      if (!bitops::bitAnd(b, 0x80)) {
        result <- bitops::bitAnd(result, mask)
        result <- result_type(result)
        return(list(result = result, pos = as.integer(pos)))
      }
      shift <- shift + 7
      if (shift >= 64) {
        stop('Too many bytes when decoding varint.')
      }
    }
  }

  # Return the decoder as result
  return(DecodeVarint)
}

#' Use this decoder version for values which must be limited to 32 bits.
#'
#' @keywords internal
DecodeVarint32 = VarintDecoder(2^32 - 1, as.integer)
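
To illustrate the varint scheme that the decoder above implements, here is a minimal standalone sketch. It is not the package's code: the function name `decode_varint` is hypothetical, it uses base R's `bitwAnd`, `bitwOr` and `bitwShiftL` instead of `bitops`, and it is deliberately limited to values encoded in at most four bytes so that everything fits in R's signed 32-bit integers.

```r
# Hypothetical standalone varint decoder (sketch, base R only).
# Each byte carries 7 payload bits; the high bit (0x80) flags "more bytes follow".
decode_varint <- function(buffer, pos = 1L) {
  result <- 0L
  shift <- 0L
  repeat {
    if (pos > length(buffer)) stop("Truncated varint.")
    b <- as.integer(buffer[pos])
    # Add the low 7 bits of this byte at the current bit offset
    result <- bitwOr(result, bitwShiftL(bitwAnd(b, 0x7fL), shift))
    pos <- pos + 1L
    # A cleared high bit marks the last byte of the varint
    if (bitwAnd(b, 0x80L) == 0L) {
      return(list(result = result, pos = pos))
    }
    shift <- shift + 7L
    if (shift >= 28L) stop("Value too large for this sketch.")
  }
}

# The two bytes 0xAC 0x02 encode 300: 0x2C + (0x02 << 7) = 44 + 256
out <- decode_varint(as.raw(c(0xAC, 0x02)))
print(out$result)  # 300
print(out$pos)     # 3, the position of the next unread byte
```

Returning the new position alongside the value lets a caller decode consecutive length-prefixed messages from a single buffer.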