BIDMach's Architecture
jcanny edited this page May 1, 2014 · 12 revisions
BIDMach has a modular design intended to make it very easy to create new models, to run diverse datasources, and to tailor the performance measures that are optimized in training. A graphic of BIDMach's architecture appears below:
The elements of the architecture are:
- Datasources support a "next" method which produces a minibatch of data, i.e. a block of samples of a specified size. The datasource itself may be backed by an array in memory, a collection of files on disk, or an HDFS source. Datasources in general output multiple matrices in response to the next method: for instance, a datasource for training regression models outputs a block of k samples as a sparse matrix and a block of k class membership vectors as a dense matrix. Some datasources also support a "putBack" method, which allows data to be pushed back into the datasource. Such sources are therefore both sources and sinks. For instance, a datasource for regression prediction has two "output" matrices: one contains the data instances to predict from, and the second contains the predictions.
- Learner classes manage data movement, training and synchronization of distributed models. The simplest learner runs one thread and simply calls the model update, mixin and updater methods. There are two parallel learner classes. One uses a single datasource and distributes data blocks to multiple threads, as shown in the figure above; the other has one datasource attached to each model thread.
- Models implement particular learning algorithms in a minibatch framework. Models support an update method, which often produces a model gradient from the current model and input data block, and a likelihood method, which returns the likelihood of a data block given the current model. Models are represented with one or more matrices, which are stored in model.modelmats. Not all updates are gradients: for example, multiplicative updates involve two matrices whose ratio defines the final model.
- Mixins are additional terms in the loss function that the learner minimizes. Mixins include L1 and L2 regularizers, but also richer terms such as within-topic entropy, cross-topic divergence, and distributional skew (borrowed from ICA and used to improve topic independence). Any number of mixins may be used with a particular learner. Each mixin also includes an individual loss term that is evaluated and saved during learning. They are natural "KPIs" (Key Performance Indicators), and are very useful for model tuning.
- Updaters implement particular optimization strategies. They include simple batch updates and many kinds of minibatch update. Minibatch updaters use dynamic windows and decreasing weights to integrate gradient updates with an evolving model.
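
To make the division of labor concrete, here is a minimal single-threaded sketch in plain Python of how the five pieces could fit together. It is not BIDMach's API (BIDMach itself is written in Scala): every class, method and parameter name below is hypothetical, the model is a toy linear regressor, the `likelihood` score is a stand-in (negative mean squared error), and the updater's 1/sqrt(t) step decay is just one possible example of "decreasing weights".

```python
# Illustrative sketch only: all names are hypothetical, not BIDMach's API.

class ArrayDataSource:
    """In-memory datasource; next() returns one (features, targets) minibatch."""
    def __init__(self, xs, ys, batch_size):
        self.xs, self.ys, self.batch_size = xs, ys, batch_size
        self.pos = 0
        self.preds = [None] * len(xs)      # filled in by put_back

    def reset(self):
        self.pos = 0

    def has_next(self):
        return self.pos < len(self.xs)

    def next(self):
        lo, hi = self.pos, min(self.pos + self.batch_size, len(self.xs))
        self.pos = hi
        return self.xs[lo:hi], self.ys[lo:hi]

    def put_back(self, preds):
        # Store predictions for the block most recently returned by next().
        lo = self.pos - len(preds)
        self.preds[lo:self.pos] = preds


class LinearModel:
    """Toy model whose state lives in modelmats (here, one weight vector)."""
    def __init__(self, dim):
        self.modelmats = [[0.0] * dim]

    def predict(self, xs):
        w = self.modelmats[0]
        return [sum(wi * xi for wi, xi in zip(w, x)) for x in xs]

    def update(self, xs, ys):
        # Gradient of the mean squared error on this minibatch.
        grad = [0.0] * len(self.modelmats[0])
        for x, y, p in zip(xs, ys, self.predict(xs)):
            for j, xj in enumerate(x):
                grad[j] += 2.0 * (p - y) * xj / len(xs)
        return grad

    def likelihood(self, xs, ys):
        # Stand-in score: negative mean squared error (higher is better).
        preds = self.predict(xs)
        return -sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


class L2Mixin:
    """Extra loss term: strength * ||w||^2, added to the model gradient."""
    def __init__(self, strength):
        self.strength = strength

    def compute(self, model, grad):
        for j, wj in enumerate(model.modelmats[0]):
            grad[j] += 2.0 * self.strength * wj

    def score(self, model):
        # The mixin's own loss term, usable as a tuning indicator.
        return self.strength * sum(wj * wj for wj in model.modelmats[0])


class DecayingStepUpdater:
    """Minibatch updater with a step size that shrinks as 1/sqrt(t)."""
    def __init__(self, lr):
        self.lr, self.step = lr, 0

    def update(self, model, grad):
        self.step += 1
        rate = self.lr / self.step ** 0.5
        w = model.modelmats[0]
        for j in range(len(w)):
            w[j] -= rate * grad[j]


def train(source, model, mixins, updater, passes):
    """Simplest learner: one thread calling model, mixins, then updater."""
    for _ in range(passes):
        source.reset()
        while source.has_next():
            xs, ys = source.next()
            grad = model.update(xs, ys)
            for mixin in mixins:
                mixin.compute(model, grad)
            updater.update(model, grad)
    return model


if __name__ == "__main__":
    src = ArrayDataSource([[1.0], [2.0], [3.0], [4.0]], [2.0, 4.0, 6.0, 8.0], 2)
    model = train(src, LinearModel(1), [L2Mixin(0.001)], DecayingStepUpdater(0.05), 200)
    print(model.modelmats[0])
```

In this sketch the learner never inspects the model's internals: swapping in a different model, mixin list or updater only requires that each object expose the same small set of methods, which is the kind of modularity the list above describes.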