MCMC Monitor enables tracking and visualization of MCMC processes executed with Stan in local or remote web browsers. When you run a sampler, you can configure Stan to generate output to a directory on your computer. MCMC Monitor reads this output and displays it in the web app, with real-time updates. As you track the progress of the run, MCMC Monitor provides diagnostic plots and statistics.
- Run the mcmc-monitor service on your local computer (see below), configuring it to monitor an output directory.
- Run a Stan program, configuring it to write output to a subdirectory of the output directory.
- Open the web app to monitor and visualize the Stan run.
You can optionally configure MCMC Monitor to enable monitoring from a remote computer (see below).
- Running the monitoring service
- Running a Stan program
- Opening the web app
- Enabling remote access
- Installing cmdstan and cmdstanpy
- Including and excluding parameters
To see MCMC Monitor in action against sample data, follow this link to a demo version.
This tool is under active development. We welcome bug reports and feature requests. Please feel free to submit a GitHub issue if you spot a way MCMC Monitor can be improved.
First, make sure you have a recent version of NodeJS installed. Our tests use version 16. (Node version 16.0.0 is considered the minimum supported version for the project.)
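As a quick way to confirm the prerequisite, the following is a minimal sketch that compares the output of `node --version` against the 16.0.0 minimum stated above. The helper names are hypothetical and are not part of mcmc-monitor.

```python
import re
import subprocess

MIN_NODE_VERSION = (16, 0, 0)  # minimum supported version stated in this README


def parse_node_version(text):
    """Parse a string such as 'v16.14.2' into a tuple of integers."""
    match = re.match(r"v?(\d+)\.(\d+)\.(\d+)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized Node version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def node_is_supported(version_text):
    """Return True if the version string meets the stated minimum."""
    return parse_node_version(version_text) >= MIN_NODE_VERSION


def installed_node_version():
    """Return the output of `node --version`, or None if node is unavailable."""
    try:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()
```

Calling `node_is_supported(installed_node_version())` after checking that the version is not `None` gives a yes/no answer before you try the `npx` command.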
Note: there is no need to clone this repo. You can use the npx command below.
Install node-pre-gyp globally:
# prerequisite
npm install -g @mapbox/node-pre-gyp
Start the monitor:
npx mcmc-monitor@latest start --dir /path/to/parent/output/directory/of/sampler --verbose
# The server will start listening for requests
# Keep this terminal open
# Open the web app linked above
# Optionally use the --enable-remote-access flag
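If you prefer to launch the service from a Python script, the command above could be assembled as in this sketch. It uses only the flags shown in this README (`--dir`, `--verbose`, `--enable-remote-access`); the function names are hypothetical.

```python
import subprocess


def build_monitor_command(output_dir, verbose=True, enable_remote_access=False):
    """Build the npx command line shown above as a list of arguments."""
    command = ["npx", "mcmc-monitor@latest", "start", "--dir", str(output_dir)]
    if verbose:
        command.append("--verbose")
    if enable_remote_access:
        command.append("--enable-remote-access")
    return command


def start_monitor(output_dir, **options):
    """Launch the service as a child process and return the process handle.

    The service keeps running until the child process is terminated, which
    mirrors the "keep this terminal open" note above.
    """
    return subprocess.Popen(build_monitor_command(output_dir, **options))
```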
The --dir flag tells mcmc-monitor which directory to monitor for output, and should be the parent of the directories where samplers (e.g. Stan) write their output files. The service expects each subdirectory of --dir to contain the output CSV files from one execution of a sampler program (a "run"). These runs are the links displayed on the MCMC Monitor home page.
For example, suppose we run mcmc-monitor with --dir /home/user/examples/, and that this directory contains two subdirectories, multi-normal-1 and multi-normal-2, each containing the output of a Stan run (completed or still in progress). Then the monitor home page will list two runs, linked as multi-normal-1 and multi-normal-2.
If another process subsequently writes an analysis to /home/user/examples/analysis-3, then the monitor will begin displaying a run called analysis-3 as well.
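The directory layout described above can be inspected with a short script. This is only a sketch of the convention (one subdirectory per run, CSV files inside); it is not how the service itself is implemented, and `list_runs` is a hypothetical helper.

```python
from pathlib import Path


def list_runs(monitored_dir):
    """Map each subdirectory name (a "run") to the CSV files it contains."""
    runs = {}
    for entry in sorted(Path(monitored_dir).iterdir()):
        if entry.is_dir():
            runs[entry.name] = sorted(p.name for p in entry.glob("*.csv"))
    return runs
```

For the example above, `list_runs("/home/user/examples/")` would return a dictionary keyed by `multi-normal-1` and `multi-normal-2`, and would gain an `analysis-3` key once that subdirectory appears.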
Note that there is not currently a way to change the monitored directory while the monitor is running.
Some tools (such as CmdStanPy and CmdStanR) use tmp directories by default; to monitor the output from these programs, you must either tell MCMC Monitor to monitor the tmp directory, or use an output_dir argument to send program output to specified subdirectories under the monitored directory.
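With CmdStanPy, the second option might look like the sketch below, which passes `output_dir` to `CmdStanModel.sample`. The helper names, the run name, and the paths are placeholders chosen for illustration; the sketch assumes cmdstan and cmdstanpy are installed.

```python
from pathlib import Path


def run_output_dir(monitored_dir, run_name):
    """Create (if needed) and return the subdirectory for one run."""
    run_dir = Path(monitored_dir) / run_name
    run_dir.mkdir(parents=True, exist_ok=True)
    return run_dir


def sample_into_monitored_dir(stan_file, data, monitored_dir, run_name):
    """Run a Stan model so that its CSV output lands under the monitored directory."""
    # Imported lazily so the path helper above works without cmdstanpy installed.
    from cmdstanpy import CmdStanModel

    model = CmdStanModel(stan_file=stan_file)
    return model.sample(
        data=data,
        output_dir=str(run_output_dir(monitored_dir, run_name)),
    )
```

If the service was started with `--dir /home/user/examples/`, then calling `sample_into_monitored_dir(..., "/home/user/examples/", "multi-normal-1")` should make the run appear under the name `multi-normal-1`.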
To enable remote access (i.e., to access this monitor service from a different computer), follow the instructions in the section below.
Some users have experienced an issue in which the npx mcmc-monitor@latest ... command above is not successful. In these cases, npx asks for permission to install the new version, and appears to install, but does not actually install or launch mcmc-monitor. In this case, it may be necessary to install mcmc-monitor manually:
npm install mcmc-monitor@latest
After manually installing, invoking the program with npx as above should work. However, you will need to rerun the manual installation step when new versions are released. Also note that installing a node package without the -g (global) flag will cause npm to create a node_modules folder in the current working directory, so you may wish to create a specific subdirectory first if you don't want this added to your home directory.
Install cmdstan, and optionally cmdstanpy or cmdstanr (see installation instructions).
There are some example scripts in the examples directory of this repo. Try
cd examples
python test_multi_normal.py
# you can monitor this run using mcmc-monitor
python test_finite_mixture.py
# this one executes too quickly to monitor
# but you can view the output using mcmc-monitor
For the above examples, you should monitor the examples/example-output directory.
The web app is hosted here.
By default, the web app will attempt to connect to your monitoring service on port 61542 of localhost, but you can configure the GUI to point to other services, including remote services via our proxy server. See enabling remote access.
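If the web app cannot connect, it can help to check whether anything is listening on the default port. The sketch below only tests for an open TCP port on localhost; it makes no assumption about the service's HTTP API, and the function name is hypothetical.

```python
import socket

DEFAULT_PORT = 61542  # default port mentioned above


def service_is_listening(host="localhost", port=DEFAULT_PORT, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A `False` result suggests the monitoring service is not running (or is using a different port); a `True` result only shows that some process has the port open.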
Monitoring a local service will not currently work in Safari due to an issue with non-user-configurable security settings. Firefox and Chrome are known to work fine for monitoring local services, including on Mac environments; affected users should use one of these browsers.
Safari can still monitor remote services, as well as services delivered over proxy, without a problem.
Regardless of browser, mobile users will likely need to use landscape orientation.
To allow remote computers to access your monitoring service, do the following:
- Use the --enable-remote-access flag when starting the service.
- Follow the link printed in the console output.
How does this work? We provide a proxy server that allows remote machines to access your monitor service. In order to avoid excessive bandwidth usage on our server, the system establishes a WebRTC connection so that traffic flows directly between computers, bypassing our proxy. However, since it is not always possible to establish a WebRTC connection (due to firewall configurations), you may need to disable WebRTC by changing webrtc=1 to webrtc=0 in the query parameters of the URL. Note that in the case of webrtc=0, our proxy server may limit the amount of data that is served. Please try to keep webrtc=1 whenever that is working. You can also host your own proxy server.
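Editing the URL by hand is usually enough, but the change can also be scripted. The sketch below rewrites the `webrtc` query parameter using the standard library. It assumes the parameter sits in the ordinary query string of the link; the example URL in the usage note is a made-up placeholder, not the real address.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def set_webrtc(url, enabled):
    """Return url with its webrtc query parameter set to 1 (enabled) or 0."""
    parts = urlsplit(url)
    # Keep every other parameter in its original order and drop any old webrtc value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "webrtc"
    ]
    query.append(("webrtc", "1" if enabled else "0"))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

For a placeholder link such as `https://example.org/app?s=abc&webrtc=1`, calling `set_webrtc(link, False)` gives the same link ending in `webrtc=0`.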
To install cmdstan and cmdstanpy (within your conda environment):
conda install -c conda-forge cmdstan cmdstanpy
Important: You must reopen your terminal after running this install command, or reactivate the conda environment. This is necessary so that the $CMDSTAN environment variable is set properly.
For more information see the cmdstanpy documentation.
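To confirm that the environment variable mentioned above is visible after reopening the terminal, a check along these lines may help. The function name is hypothetical; it only reads `CMDSTAN` from the environment and does not verify the installation itself.

```python
import os
from pathlib import Path


def cmdstan_location(environ=os.environ):
    """Return the CMDSTAN path from the environment, or None if it is unset."""
    value = environ.get("CMDSTAN")
    if not value:
        return None
    return Path(value)
```

If `cmdstan_location()` returns `None` inside your conda environment, the terminal probably still needs to be reopened or the environment reactivated.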
By default, MCMC Monitor will not necessarily monitor all model parameters. It will always monitor system diagnostic variables (those ending in __ such as lp__), scalar parameters, and vectors, matrices and tensors with up to 100 elements. By default, MCMC Monitor will not monitor variables with more than 100 elements. You can override this by creating an mcmc-run.yaml file in the output directory for the run and then including the following content, for example:
# mcmc-run.yaml
includeVariables: [y.1, y.2, y.3]
This will force monitoring of the variables included.
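For long lists of elements, the file can be generated rather than typed. The sketch below writes an `mcmc-run.yaml` in the same form as the example above. It assumes that vector elements are named with 1-based `name.index` suffixes, as in `y.1, y.2, y.3`; naming for matrices or tensors is not covered here, and both helper names are hypothetical.

```python
from pathlib import Path


def element_names(variable, count):
    """Return names such as y.1, y.2, ... for the first `count` elements."""
    return [f"{variable}.{i}" for i in range(1, count + 1)]


def write_run_config(run_dir, variables):
    """Write mcmc-run.yaml into the run's output directory and return its path."""
    path = Path(run_dir) / "mcmc-run.yaml"
    path.write_text("includeVariables: [" + ", ".join(variables) + "]\n")
    return path
```

For instance, `write_run_config(run_dir, element_names("y", 3))` reproduces the `includeVariables` line shown above.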
Markov Chain Monte Carlo (MCMC) sampling is a method of sampling from a probability distribution, such as a posterior distribution, in order to approximate the distribution. This is accomplished by running a Markov chain with the desired distribution as its equilibrium distribution. Each step of the Markov chain produces one draw, and successive draws are generally correlated. After a sufficient number of steps, the collected draws will approximate the desired distribution.
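To make the idea concrete, here is a minimal random-walk Metropolis sampler targeting a standard normal distribution. It is a teaching sketch only: Stan uses more sophisticated Hamiltonian Monte Carlo methods, and the target, step size, and function names here are illustrative choices.

```python
import math
import random


def log_target(x):
    """Log density of a standard normal, up to an additive constant."""
    return -0.5 * x * x


def metropolis(n_draws, step=1.0, start=0.0, seed=0):
    """Run a random-walk Metropolis chain and return the list of draws."""
    rng = random.Random(seed)
    x = start
    draws = []
    for _ in range(n_draws):
        proposal = x + rng.gauss(0.0, step)
        log_accept = log_target(proposal) - log_target(x)
        # Accept with probability min(1, target(proposal) / target(x)).
        if rng.random() < math.exp(min(0.0, log_accept)):
            x = proposal
        # The current state is recorded whether or not the proposal was accepted.
        draws.append(x)
    return draws
```

With enough draws, the sample mean and variance of the chain should be close to 0 and 1, the mean and variance of the target distribution.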
Stan is a statistical software package designed for Bayesian inference. To use Stan, the user must first write a Stan program, which contains a probabilistic model specification and a set of data-generating parameters. The program is then compiled and run on the Stan platform, which uses MCMC sampling to generate posterior distributions for the parameters in the model. With the posterior distributions, the user can then make inferences about the data and make predictions.
Monitoring a running Stan program provides insight into the progress of the run and the results of the sampling. By tracking the progress of the MCMC sampling, it is possible to detect and diagnose problems with the program and observe whether the iterations are converging to the equilibrium distribution. Additionally, monitoring the results of the sampling allows the user to gain a better understanding of the posterior distributions, even before the program completes, which can inform decisions and predictions.
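One widely used convergence check compares variation between chains with variation within chains (the potential scale reduction factor, often written R-hat). The sketch below implements the classic, non-split form of that statistic for equal-length chains; it is offered as an illustration of the idea and is not a description of the exact diagnostics MCMC Monitor computes.

```python
import math
from statistics import mean, variance


def r_hat(chains):
    """Classic potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    within = mean(variance(chain) for chain in chains)
    if within == 0:
        raise ValueError("within-chain variance is zero")
    between = n * variance([mean(chain) for chain in chains])
    # Pooled estimate of the posterior variance from both sources of variation.
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)
```

Values close to 1 suggest the chains agree with one another; values well above 1 suggest they have not yet mixed.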
Apache-2.0
Jeremy Magland and Jeff Soules, Center for Computational Mathematics, Flatiron Institute
Thanks also to
- Brian Ward
- Bob Carpenter