documentation.tex

\documentclass[12pt,pdflatex]{article} 
\usepackage{color}
\usepackage[usenames,dvipsnames]{xcolor}
\usepackage{colortbl} 
\usepackage{graphicx} 
\usepackage{colordvi} 
\usepackage{amssymb} 
\usepackage{mathtools}
\usepackage{ulem} 
\usepackage{hyperref}

\newtheorem{listing}{Listing}

\begin{document}
\tableofcontents

\newpage 

\section{Introduction}

GooFit\footnote{Named in homage to RooFit, with the `G' standing for `GPU'.}
is a framework for creating arbitrary probability density functions (PDFs) 
and evaluating them over large datasets using nVidia Graphics Processing Units (GPUs). 
New PDFs are written partly in nVidia's CUDA programming language and
partly in C++; however, no expertise in CUDA is required to get started, 
because the already-existing PDFs can be put together in plain C++. 

Aside from the mass of unenlightened hominids who have not yet discovered
their need for a massively-parallel fitting framework, there are three kinds
of GooFit users:
\begin{itemize}
\item Initiates, who write ``user-level code'' - that is, code which 
instantiates existing PDF classes in some combination. No knowledge of
CUDA is required for this level.
If your data can be described
by a combination of not-too-esoteric functions, even if the combination is 
complicated, then user code is sufficient. Section \ref{sec:usercode}
gives an example of how to write a simple fit.
\item Acolytes, or advanced users, who have grasped the art of creating new PDF classes.
This involves some use of CUDA, but is mainly a question of understanding
the variable-index organisation that GooFit PDFs use. Section \ref{sec:newpdfs}
considers this organisation in some depth.
\item Ascended Masters, or architects, who by extended meditation have 
acquired a full understanding of the core engine of GooFit, and can modify it
to their desire\footnote{Although, if they are \emph{Buddhist} masters,
they don't even though they can, since they have transcended desire - and suffering 
with it.}. Section \ref{sec:engine} gives a detailed narrative of the
progress of a PDF evaluation through the engine core, thus elucidating its
mysteries. It should only rarely be necessary to acquire this level of 
mastery; in principle only the developers of GooFit need to know its 
internal details. 
\end{itemize}

Aside from considerations of the user's understanding, GooFit does require
a CUDA-capable graphics card to run on, with compute capability at least 2.1.
Further, you will need nVidia's CUDA SDK, in particular the \texttt{nvcc} compiler.
Aside from this, GooFit is known to compile and run on Fedora 14, Ubuntu 12.04,
and OSX 10.8.4. It has been tested on the Tesla, Fermi, and Kepler
generations of nVidia GPUs. 

\subsection{Getting started}

You will need to have a CUDA-capable device and to have the development
environment (also known as the software development kit or SDK) 
set up, with access to the compiler \texttt{nvcc} and its libraries.
If you have the hardware, you can get the SDK 
from \href{https://developer.nvidia.com/gpu-computing-sdk}{nVidia's website}.

With your CUDA environment set up, you can install GooFit thus: 
\begin{itemize}
\item Clone from the GitHub repository:
\begin{verbatim}
git clone git://github.com/GooFit/GooFit.git
cd GooFit
\end{verbatim}
\item If necessary, edit the Makefile so the variable \texttt{CUDALOCATION}
points to your local CUDA install.
\item Compile GooFit with \texttt{gmake} or \texttt{make}. Do not be alarmed
by warning messages saying that such-and-such a function's stack size could
not be statically determined; this is an unavoidable (so far) side effect 
of the function-pointer implementation discussed in section \ref{sec:engine}.
\item Compile and run the `simpleFitExample' program, which generates
three distributions, fits them, and plots the results:
\begin{verbatim}
cd examples/simpleFit
gmake
./simpleFitExample
\end{verbatim}
The expected output is a MINUIT log for three different fits, 
and three image files. 
\item Compile and run the Dalitz-plot tutorial, which fits a 
text file containing toy Monte Carlo data to a coherent sum of
Breit-Wigner resonances: 
\begin{verbatim}
cd examples/dalitz
export CUDALOCATION=/usr/local/cuda/
gmake
./dalitz dalitz_toyMC_000.txt
\end{verbatim}
\end{itemize}

Quick troubleshooting: If your shell doesn't like \texttt{export},
try instead \verb|setenv CUDALOCATION /usr/local/cuda/|. Check
that \verb|/usr/local/cuda/| exists and contains, eg, \verb|bin/nvcc| - 
otherwise, track down the directory that does and set \verb|CUDALOCATION|
to point to that instead. Some installs have \verb|make| in place of \verb|gmake|. 

The text file contains information about simulated decays of the
$D^0$ particle to $\pi^+\pi^-\pi^0$; in particular, in each line, the second and
third numbers are the Dalitz-plot coordinates $m^2(pi^+\pi^0)$ and
$m^2(pi^-\pi^0)$. The \texttt{dalitz} program creates a PDF describing the distribution of
these two variables in terms of Breit-Wigner resonances, 
reads the data, sends it to the GPU, and fits the PDF to the data
 - the floating parameters are the complex coefficients of the resonances.
The expected output is a MINUIT fit log showing that the fit
converged, with such-and-such values for the real and imaginary
parts of the resonance coefficients. 

\section{User-level code}
\label{sec:usercode}

From the outside, GooFit code should look like ordinary,
object-oriented C++ code: The CUDA parts are hidden away
inside the engine, invisible to the user. Thus, to construct
a simple Gaussian fit, merely declare three \texttt{Variable}
objects and a \texttt{GaussianPdf} object that uses them,
and create an appropriate \texttt{UnbinnedDataSet} to fit to:
\begin{listing}
\label{listing:gaussfit}
Simple Gaussian fit.

\begin{verbatim}
int main (int argc, char** argv) {
  // Create an object to represent the observable, 
  // the number we have measured. Give it a name,
  // upper and lower bounds, and a number of bins
  // to use in numerical integration. 
  Variable* xvar = new Variable("xvar", -5, 5); 
  xvar->numbins = 1000; 

  // A data set to store our observations in.
  UnbinnedDataSet data(xvar);

  // "Observe" ten thousand events and add them
  // to the data set, throwing out any events outside
  // the allowed range. In a real fit this step would
  // involve reading a previously created file of data
  // from an _actual_ experiment. 
  TRandom donram(42); 
  for (int i = 0; i < 10000; ++i) {
    fptype val = donram.Gaus(0.2, 1.1);
    if (fabs(val) > 5) {--i; continue;} 
    data.addEvent(val); 
  }

  // Variables to represent the mean and standard deviation
  // of the Gaussian PDF we're going to fit to the data.
  // They take a name, starting value, optional initial 
  // step size and upper and lower bounds. Notice that
  // here only the mean is given a step size; the sigma
  // will use the default step of one-tenth of its range.
  Variable* mean = new Variable("mean", 0, 1, -10, 10);
  Variable* sigm = new Variable("sigm", 1, 0.5, 1.5); 

  // The actual PDF. The Gaussian takes a name, an independent
  // (ie observed) variable, and a mean and width. 
  GaussianPdf gauss("gauss", xvar, mean, sigm); 

  // Copy the data to the GPU. 
  gauss.setData(&data);

  // A class that talks to MINUIT and GooFit. It needs
  // to know what PDF it should set up in MINUIT. 
  FitManager fitter(&gauss); 

  // The actual fit. 
  fitter.fit(); 
  return 0;
}
\end{verbatim}
\end{listing} 

Notice that, behind the scenes, GooFit assumes that there
will be exactly one top-level PDF and data set; it is not
advised to break this assumption unless you know what you
are doing and exactly how you are getting around it. 

\subsection{Data sets}

To create a data set with several dimensions, supply a \texttt{vector}
of \texttt{Variables}:
\begin{verbatim}
vector<Variable*> vars;
Variable* xvar = new Variable("xvar", -10, 10);
Variable* yvar = new Variable("yvar", -10, 10);
vars.push_back(xvar);
vars.push_back(yvar);
UnbinnedDataSet data(vars);
\end{verbatim}
In this case, to fill the data set, set the \texttt{Variable}
values and call the \texttt{addEvent} method without arguments:
\begin{verbatim}
xvar->value = 3;
yvar->value = -2;
data.addEvent();
\end{verbatim}
This will add an event with the current values of the \texttt{Variable}
list to the data set. In general, where an unknown number of arguments
are wanted, GooFit prefers to use a \texttt{vector} of pointers. 

\subsection{Fit types} 

By default, GooFit will do an unbinned maximum-likelihood fit, where
the goodness-of-fit metric that is minimised\footnote{For historical reasons, 
MINUIT always minimises rather than maximising.} is 
the negative sum of logarithms of 
probabilities, which is equivalent to maximising the joint
overall probability: 
\begin{eqnarray}
{\cal P} &=& -2\sum\limits_{events} \log(P_i)
\end{eqnarray}
where $P_i$ is the PDF value for event $i$. 

To get a binned fit, you should create a \texttt{BinnedDataSet} instead of the
\texttt{UnbinnedDataSet}; the procedure is otherwise the same. Notice that
the \texttt{BinnedDataSet} will use the number of bins that its constituent
\texttt{Variable}s have at the moment of its creation. Supplying a \texttt{BinnedDataSet}
%to a GooPdf (which is the base class of all the \texttt{FooPdf}
to a \texttt{GooPdf} (which is the base class of all the \texttt{FooPdf}
classes such as \texttt{GaussianPdf}) will, by default, make it
do a binned negative-log-likelihood fit, in which the goodness-of-fit criterion
is the sum of the logarithms of the Poisson probability of each bin:
\begin{eqnarray}
{\cal P} &=& -2*\sum\limits_{bins}(N * \log(E) - E)
\end{eqnarray}
where $E$ is the expected number of events in a bin and $N$ is
the observed number. 

There are two non-default variants of binned fits: A chisquare
fit where the error on a bin entry is taken as the square root
of the number of observed entries in it (or 1 if the bin is empty),
and a ``bin error'' fit where the error on each bin is supplied
by the \texttt{BinnedDataSet}. To do such a fit, in addition to
supplying the \texttt{BinnedDataSet} (and providing the errors
through the \texttt{setBinError} method in the case of the bin error fit),
you should create a suitable \texttt{FitControl} object and send
it to the top-level \texttt{GooPdf}:
\begin{verbatim}
decayTime = new Variable("decayTime", 100, 0, 10); 
BinnedDataSet* ratioData = new BinnedDataSet(decayTime); 
for (int i = 0; i < 100; ++i) {
  ratioData->SetBinContent(getRatio(i));
  ratioData->SetBinError(getError(i));
}

vector<Variable*> weights;
weights.push_back(new Variable("constaCoef", 0.03, 0.01, -1, 1));
weights.push_back(new Variable("linearCoef", 0, 0.01, -1, 1));
weights.push_back(new Variable("secondCoef", 0, 0.01, -1, 1));

PolynomialPdf* poly;
poly = new PolynomialPdf("poly", decayTime, weights); 
poly->setFitControl(new BinnedErrorFit()); 
poly->setData(ratioData); 
\end{verbatim}
The \texttt{FitControl} classes are \texttt{UnbinnedNLLFit} (the default), 
\texttt{BinnedNLLFit} (the default for binned fits), \texttt{BinnedErrorFit}
and \texttt{BinnedChisqFit}. 

\section{Creating new PDF classes}
\label{sec:newpdfs}

The simplest way to create a new PDF is to take the existing 
\texttt{GaussianPdf} class as a template. The existence of
a \texttt{FooPdf.cu} file in the \texttt{FPOINTER} directory
is, because of Makefile magic, sufficient to get the \texttt{Foo} PDF
included in the GooFit library. However, a certain amount of boilerplate
is necessary to make the PDF actually work. First of all, it needs a 
device-side function with a particular signature:
\begin{listing}
\label{listing:fsign}
Signature of evaluation functions.

\begin{verbatim}
__device__ fptype device_Gaussian (fptype* evt, 
                                   fptype* p, 
                                   unsigned int* indices); 
\end{verbatim}
\end{listing}

Notice that this is a standalone function, not part of any class; \texttt{nvcc}
does not play entirely nicely with device-side polymorphism, which is why we organise the
code using a table of function pointers - a poor man's implementation of the
virtual-function lookups built into C++. Second, we need a pointer to the 
evaluation function:
\begin{verbatim}
__device__ device_function_ptr ptr_to_Gaussian = device_Gaussian; 
\end{verbatim}
where \texttt{device\_function\_ptr} is defined (using \texttt{typedef}) as a pointer
to a function with the signature shown in listing \ref{listing:fsign}:
\begin{verbatim}
typedef fptype (*device_function_ptr) (fptype*, 
                                       fptype*, 
                                       unsigned int*);
\end{verbatim}
This pointer\footnote{You might ask, why not copy the
function directly? The reason is that \texttt{cudaMemcpy} doesn't
like to get the address of a function, but \texttt{nvcc} is perfectly
happy to statically initialise a pointer. It's a workaround, in other
words.} 
will be copied into the \texttt{device\_function\_table} array,
and its index in that array is the PDF's internal representation of ``my evaluation
function''. 

Finally, the new PDF needs a bog-standard C++ class definition, extending
the \texttt{GooPdf} superclass, which will allow it to be instantiated
and passed around in user-level code. Section \ref{subsec:indexarray} discusses
what should happen in the constructor; otherwise the class may have any supporting
paraphernalia that are necessary or useful to its evaluation - caches, 
lists of components, pointers to device-side constants, whatever. 

\subsection{The indices array}
\label{subsec:indexarray}

The heart of a PDF's organisation is its index array, which 
appears in the arguments to its device-side evaluation function
as \verb|unsigned int* indices|. The index array stores the position
of the parameters of the PDF within the global parameter array;
this allows different PDFs to share the same parameters, as in two
Gaussians with a common mean. It also stores the position of the
event variables, sometimes called observables, within the event array
passed to the evaluation function; this is the argument \verb|fptype* evt|.

The index array is created by the constructor of a PDF class; in particular, 
the constructor should call \verb|registerParameter| 
so as to obtain the global indices of its parameters, store these
numbers in a \verb|vector<unsigned int>| (conventionally called \verb|pindices|),
and pass this \verb|vector| to \verb|initialise|. The PDF constructor should
also call \verb|registerObservable| on each of the event variables it depends
on. 

The \verb|initialise| method constructs the array that is used on the GPU side,
which consists of four parts. First is stored the number of parameters,
which is equal to the size of the \verb|pindices vector|. Next come the indices
of the parameters, in the order they were put into \verb|pindices|. Then comes
the number of observables, and finally the indices of the observables, again
in the order they were registered. 

An example may be useful at this point. Consider the simple Gaussian PDF constructor:
\begin{verbatim}
GaussianPdf::GaussianPdf (std::string n, 
                          Variable* _x, 
                          Variable* mean, 
                          Variable* sigma) 
  : GooPdf(_x, n) 
{
  std::vector<unsigned int> pindices;
  pindices.push_back(registerParameter(mean));
  pindices.push_back(registerParameter(sigma));
  MEMCPY_FROM_SYMBOL((void**) &host_fcn_ptr, 
                       ptr_to_Gaussian, 
                       sizeof(void*));
  initialise(pindices); 
}
\end{verbatim}
This is almost the simplest possible PDF: Two parameters, one observable, 
no messing about! Notice that the call to \verb|registerObservable| is done
in the parent \verb|GooPdf| constructor - this saves some boilerplate
in the constructors of one-observable PDFs. For the second and subsequent
observables the calls should be done manually. The device-side index array
for the Gaussian, assuming it is the only PDF in the system, looks like this:
\begin{verbatim}
index  0 1 2 3 4
value  2 0 1 1 0
\end{verbatim}
Here the initial 2 is the number of parameters - mean and sigma. Then come their
respective indices; since by assumption the Gaussian is the only PDF we're constructing,
these will simply be 0 and 1. Then comes the number of observables, which is 1, and
finally the index of the observable - which, as it is the only observable registered,
must be 0. Now we can consider how the device-side code makes use of this:
\begin{verbatim}
__device__ fptype device_Gaussian (fptype* evt, 
                                   fptype* p, 
                                   unsigned int* indices) {
  fptype x = evt[indices[2 + indices[0]]]; 
  fptype mean = p[indices[1]];
  fptype sigma = p[indices[2]];

  fptype ret = EXP(-0.5*(x-mean)*(x-mean)/(sigma*sigma));
  return ret; 
}
\end{verbatim}
The calculation of the Gaussian is straightforward enough, but let's look
at where the numbers \verb|mean, sigma| and especially \verb|x| come from. 
The function is passed a pointer to the particular event it is to calculate
the value for, a global parameter array, and the index array. The parameter
array, in the case of a single Gaussian, consists simply of the values for
the mean and sigma in the current MINUIT iteration. Let us replace the index
lookups in those lines with their values from above:
\begin{verbatim}
  fptype mean = p[0];
  fptype sigma = p[1]; 
\end{verbatim}
which is exactly what we want. The fetching of \verb|x| appears a little
more formidable, with its double \verb|indices| lookup; it calls for
some explanation. First, \verb|indices[0]| is the number of parameters
of the function; we want to skip ahead by this number to get to the `event'
part of the array. In the Gaussian, this is known at compile-time to be
2, but not every PDF writer is so fortunate; a polynomial PDF, for example,
could have an arbitrary number of parameters. (Or it might specify a maximum
number, say 10, and in most cases leave seven or eight of them fixed at zero - 
but then there would be a lot of wasted multiplications-by-zero and additions-of-zero.) 
Thus, as a convention, lookups of event variables should always use 
\verb|indices[0]| even if the coder knows what that number is going to be.
Then, 2 must be added to this number to account for the space taken
by the number-of-parameters and number-of-observables entries in the array.
So, replacing the first level of lookup by the values, we have:
\begin{verbatim}
  fptype x = evt[indices[4]]; 
\end{verbatim}
and \verb|indices[4]| is just 0; so in other words, \verb|x| is the first
observable in the event. In the case of the single Gaussian, it is
also the \emph{only} observable, so we've done quite a bit of work to
arrive at a zero that we knew from the start; but in more complex fits this would not be true. 
The \verb|x| variable could be observable number 5, for all we know
to the contrary in the general case. Likewise the mean and sigma
could be stored at positions 80 and 101 of the global parameter array. 

\subsection{Constants}

There are two ways of storing constants, or three if we count
registering a \texttt{Variable} as a parameter and telling MINUIT to keep
it fixed. For integer constants, we may simply store them in the
index array; since it is up to the programmer to interpret the
indices, there is no rule that says it absolutely must be taken
as an offset into the global parameter array! An index can also
store integers for any other purpose - the maximum degree of a 
polynomial, flagging the use of an optional parameter, or anything
else you can think of. Indeed, this is exactly
what the framework does in enforcing the convention that the first
number in the index array is the number of parameters. 

However, this will not serve for non-integer-valued constants. They must either
go through MINUIT as fixed parameters, or else go into the \texttt{functorConstants}
array. \verb|functorConstants| works just like the global parameters
array, except that it does not update on every MINUIT iteration since
it is meant for storing constants. To use it, you should first reserve
some space in it using the \verb|registerConstants| method, which takes
the number of constants you want as an argument and returns the index
of the first one. Usually you will want to put that index in the \verb|pindices|
array. For example, suppose I want to store $\sqrt{2\pi}$ as a constant
for use in the Gaussian. Then I would modify the constructor thus:
\begin{verbatim}
__host__ GaussianPdf::GaussianPdf (std::string n, 
                                   Variable* _x, 
                                   Variable* mean, 
                                   Variable* sigma) 
  : GooPdf(_x, n) 
{
  std::vector<unsigned int> pindices;
  pindices.push_back(registerParameter(mean));
  pindices.push_back(registerParameter(sigma));

  pindices.push_back(registerConstants(1)); 
  fptype sqrt2pi = SQRT(2*M_PI);
  MEMCPY_TO_SYMBOL(functorConstants, &sqrt2pi, sizeof(fptype), 
                     cIndex*sizeof(fptype), cudaMemcpyHostToDevice); 

  MEMCPY_FROM_SYMBOL((void**) &host_fcn_ptr, ptr_to_Gaussian, sizeof(void*));
  initialise(pindices); 
}
\end{verbatim}
Notice the member variable \verb|cIndex|, which is set (and returned) by \verb|registerConstants|; it is the
index of the first constant belonging to this object. To extract my constant for
use in the device function, I look it up as though it were a parameter, but the
target array is \verb|functorConstants| instead of the passed-in \verb|p|:
\begin{verbatim}
__device__ fptype device_Gaussian (fptype* evt, 
                                   fptype* p, 
                                   unsigned int* indices) {
  fptype x = evt[indices[2 + indices[0]]]; 
  fptype mean = p[indices[1]];
  fptype sigma = p[indices[2]];
  fptype sqrt2pi = functorConstants[indices[3]];

  fptype ret = EXP(-0.5*(x-mean)*(x-mean)/(sigma*sigma));
  ret /= sqrt2pi; 
  return ret; 
}
\end{verbatim}

If I had registered two constants instead of one, the second one would be looked
up by \verb|functorConstants[indices[3] + 1]|, not the \texttt{functorConstants[indices[4]]}
one might naively expect. This is because the constant is stored next to the first one
registered, but its \emph{index} is not stored at all; it has to be calculated from the
index of the first constant. Thus the \verb|+1| must go outside the indices lookup,
not inside it! Keeping the levels of indirection
straight when constructing this sort of code calls for some care and attention. 

Note that \verb|functorConstants[0]| is reserved for the number of events
in the fit. 

\section{Program flow} 
\label{sec:engine}

This section narrates the course of a fit after it is created, passing through
MINUIT and the core GooFit engine. In particular, we will consider the example
Gaussian fit shown in listing \ref{listing:gaussfit} and look at what happens
in these innocent-looking lines:
\begin{listing}
\label{listing:actualfit}
Data transfer and fit invocation.

\begin{verbatim}
  gauss.setData(&data);
  FitManager fitter(&gauss); 
  fitter.fit(); 
\end{verbatim}
\end{listing} 

\subsection{Copying data} 

The \texttt{setData} method copies the contents of the supplied \texttt{DataSet}
to the GPU:
\begin{listing}
\label{listing:setData}
Internals of the setData method.

\begin{verbatim}
setIndices();
int dimensions = observables.size();
numEntries = data->getNumEvents(); 
numEvents = numEntries; 

fptype* host_array = new fptype[numEntries*dimensions];
for (int i = 0; i < numEntries; ++i) {
  for (obsIter v = obsBegin(); v != obsEnd(); ++v) {
    fptype currVal = data->getValue((*v), i);
    host_array[i*dimensions + (*v)->index] = currVal; 
  }
}

gooMalloc((void**) &dev_event_array, dimensions*numEntries*sizeof(fptype)); 
cudaMemcpy(dev_event_array, host_array, dimensions*numEntries*sizeof(fptype), cudaMemcpyHostToDevice);
MEMCPY_TO_SYMBOL(functorConstants, &numEvents, sizeof(fptype), 0, cudaMemcpyHostToDevice); 
delete[] host_array; 
\end{verbatim}
\end{listing} 
Notice the call to \texttt{setIndices}; this is where the indices
of observables passed to the PDF are decided and copied into the indices
array. This step cannot be done before all the subcomponents of the 
PDF have had a chance to register their observables. Hence \texttt{setData}
should be called only after the creation of all PDF components, and only
on the top-level PDF. 

The array thus created has the simple structure \verb|x1 y1 z1 x2 y2 z2 ... xN yN zN|,
that is, the events are laid out contiguously in memory, each event consisting simply
of the observables, in the same order every time. Notice that if the \texttt{DataSet} contains \texttt{Variable}s
that have not been registered as observables, they are ignored. 
If \texttt{setData}
is called with an \texttt{BinnedDataSet} object, the procedure is similar
except that each `event' consists of the coordinates of the bin center, the
number of events in the bin, and either the bin error or the bin size. We will
see later how the engine uses the \texttt{dev\_event\_array} either as a list
of events or a list of bins. 

\subsection{MINUIT setup} 

Having copied the data to the GPU, the next task is to create the MINUIT object
that will do the actual fit; this is done by creating a \texttt{FitManager} object,
with the top-level PDF as its argument, and calling its \texttt{fit} method. 
The \texttt{fit} method does two things: First it calls the \texttt{getParameters}
method of the supplied PDF, which recursively gets the registered parameters of
all the component PDFs, and from the resulting list of \texttt{Variable}s it creates
MINUIT parameters by calling \texttt{DefineParameter}. Second, it sets the method
\texttt{FitFun} to be MINUIT's function-to-minimise, and calls MINUIT's \texttt{mnmigr}
method. 

A few variants on the above procedure exist. Most obviously, ROOT contains three 
implementations of the MINUIT algorithm, named \texttt{TMinuit}, \texttt{TMinuit2}, 
and \texttt{TVirtualFitter}\footnote{These are, respectively, ancient FORTRAN code
translated line-by-line into C++, almost literally by the addition of semicolons; 
someone's obsessively-detailed object-oriented
implementation of the same algorithm, with the same spaghetti logic chopped into
classes instead of lines of code; and what seems to be intended as a common interface
for a large number of possible fitting backends, which falls a little flat since it
has only the MINUIT backend to talk to. You pays your money and takes your choice.}. 
One can switch between these by setting the constant 
\texttt{MINUIT\_VERSION} in FitManager.hh to, respectively, 1, 2, and 3. The interfaces differ,
but the essential procedure is the one described above: Define parameters, set
function-to-minimise, run MIGRAD. (NB: As of v0.2, GooFit has not recently been
tested with \texttt{MINUIT\_VERSION} set to 2 or 3.) In the case of \texttt{TMinuit},
one can call \texttt{setMaxCalls} to override the usual MINUIT limitation on the
number of iterations, although my experience is that this is not usually helpful
because running into the iteration limit tends to indicate a deeper problem with
the fit. Finally, the underlying \texttt{TMinuit} object is available through the
\texttt{getMinuitObject} method, allowing fine-grained control of what MINUIT does, 
for example by calling \texttt{mnhess} in place of \texttt{mnmigr}.

\subsection{PDF evaluation}

We have copied the data to the GPU, set up MINUIT, and invoked \texttt{mnmigr}. 
Program flow now passes to MINUIT, which for purposes of this documentation
is a black box, for some time; it returns to GooFit by calling the \texttt{FitFun}
method with a list of parameters for which MINUIT would like us to evaluate the NLL. 
\texttt{FitFun} translates MINUIT indices into GooFit indices, and calls \texttt{copyParams}, 
which eponymously copies the parameter array to \texttt{cudaArray} on the GPU. 
\texttt{FitFun} then returns the value from \texttt{GooPdf::calculateNLL}
to MINUIT, which absorbs the number into its inner workings and eventually comes
back with another set of parameters to be evaluated. Control continues to pass back
and forth in this way until MINUIT converges or gives up, or until GooFit crashes. 

The \texttt{calculateNLL} method does two things: First it calls the \texttt{normalise}
function of the PDF, which in turn will usually recursively normalise the components;
the results of the \texttt{normalise} call are copied into the \texttt{normalisationFactors}
array on the GPU. Next it calls \texttt{sumOfNll} and returns the resulting value.
Particular PDF implementations may override \texttt{sumOfNll}; most notably
\texttt{AddPdf} does so in order to have the option of returning
an `extended' likelihood, with a term for the Poisson probability of the observed
number of events in addition to the event probabilities. 

The \texttt{normalise} method, by default, simply evaluates the PDF at a grid of points,
returning the sum of all the values multiplied by the grid fineness - a primitive algorithm
for numerical integration, but one which takes advantage of the GPU's massive parallelisation.
The fineness of the grid usually depends on the \texttt{numbins} member of the observables; in
the case of the example Gaussian fit in listing \ref{listing:gaussfit}, the PDF will be evaluated at 1000 points, evenly
spaced between -5 and 5. However, this behaviour can be overridden by calling the 
\texttt{setIntegrationFineness} method of the PDF object, in which case the number of bins
(in each observable) will be equal to the supplied fineness. 

Stripped of complications, the essential part of the \texttt{normalise} function
is a call to \texttt{transform\_reduce}:
\begin{listing}
\label{listing:normalisation}
Normalisation code.

\begin{verbatim}
  fptype dummy = 0; 
  static plus<fptype> cudaPlus;
  constant_iterator<fptype*> arrayAddress(normRanges); 
  constant_iterator<int> eventSize(observables.size());
  counting_iterator<int> binIndex(0); 

  fptype sum = transform_reduce(make_zip_iterator(
                                 make_tuple(binIndex, 
                                            eventSize, 
                                            arrayAddress)),
                                make_zip_iterator(
                                 make_tuple(binIndex + totalBins, 
                                            eventSize, 
                                            arrayAddress)),
				*logger, dummy, cudaPlus); 
\end{verbatim}
\end{listing}
Here \texttt{normRanges} is an array of triplets \verb|lower, upper, bins| for each observable, created
by the \texttt{generateNormRanges} method. The member \texttt{logger} points to an instance of the
\texttt{MetricTaker} class, which has an operator method that Thrust will invoke on each bin index between
the initial value of zero and the final value of \texttt{totalBins$-$1}. This operator method, which is invoked once per
thread with a separate (global) bin number for each invocation, calculates
the bin center and returns the value of the PDF at that point.
The \texttt{dummy} and \texttt{cudaPlus} variables
merely indicate that Thrust should add (rather than, say, multiply) all the returned
values, and that it should start the sum at zero. The \texttt{normalisation} method
returns this sum, but stores its inverse in the \texttt{host\_normalisation} array
that will eventually be copied to \texttt{normalisationFactors} on the GPU; this is to allow the micro-optimisation of 
multiplying by the inverse rather than dividing in every thread. 

PDF implementations may override the \texttt{normalisation} method, and among the default
PDFs, both \texttt{AddPdf} and \texttt{ProdPdf} do so to
ensure that their components are correctly normalised. Among the more specialised
implementations, \texttt{TddpPdf} overrides \texttt{normalise} so that
it may cache the slowly-changing Breit-Wigner calculations, and also because its time dependence
is analytically integrable and it is a good optimisation to do only the Dalitz-plot
part numerically. This points to a more general rule, that once a PDF depends on three
or four observables, the relatively primitive numerical integration outlined above may
become unmanageable because of the number of points it creates. Finally, note that PDFs
may, without overriding \texttt{normalise}, advertise an analytical integral by overriding
\texttt{GooPdf}'s \texttt{hasAnalyticIntegral} method to return \texttt{true}, 
and then implementing an \texttt{integrate} method to be evaluated on the CPU. 

The \texttt{logger} object will appear again in the actual PDF evaluation,
performing a very similar function, so it is worth taking a moment to 
consider in detail exactly what the \texttt{transform\_reduce} call does. 
The first two parameters (involving \texttt{make\_tuple} calls) define 
the range of evaluation: In this case, global bins\footnote{
A global bin ranges from 0 to $n_1n_2\ldots n_N-1$ where $n_j$ is the number
of bins in the $j$th variable and $N$ is the number of variables. In two dimensions,
with three bins in each of $x$ and $y$, the global bin is given by $3b_y+b_x$,
where $b_{x,y}$ is the bin number in $x$ or $y$ respectively, as shown here:
\begin{displaymath}
\begin{array}{l|ccc}
2 & 6 & 7 & 8 \\
1 & 3 & 4 & 5 \\
0 & 0 & 1 & 2 \\
\hline
  & 0 & 1 & 2 
\end{array} 
\end{displaymath}
where the leftmost column and bottom row indicate the $y$ and $x$ bin number. 
}
 0 through $N-1$. They also specify which \texttt{operator} method of 
\texttt{MetricTaker} should be called: It is the one which takes as arguments
two integers (the bin index and event size) and an \texttt{fptype} array (holding the \texttt{normRanges} values), in 
that order. Conceptually, Thrust will create one thread for each unique value
of the iterator range thus created - that is, one per global bin - and have each
thread invoke the indicated \texttt{operator} method. As a matter of organisation
on the physical chip, it is likely that Thrust will actually create a thousand or
so threads and have each thread evaluate as many bins as needed; but at any rate,
the \verb|operator(int, int, fptype*)| method will be called once per global bin.
The last two arguments indicate that the return value should be calculated as 
the sum of the return values from each \texttt{operator} invocation, and that the sum
should start at zero. Finally, the \verb|*logger| argument indicates the specific
\texttt{MetricTaker} object to use, which is important because this is where the
function-pointer and parameter indices are stored. 

The \texttt{operator} does two things: First it calculates the bin centers,
in each observable, of the global bin:
\begin{listing}
\label{listing:bincenter}
Bin-center calculation.

\begin{verbatim}
MEM_SHARED fptype binCenters[1024*MAX_NUM_OBSERVABLES];

// To convert global bin number to (x,y,z...) coordinates: 
// For each dimension, take the mod with the number of bins 
// in that dimension. Then divide by the number of bins, in 
// effect collapsing so the grid has one fewer dimension. 
// Rinse and repeat. 

int offset = threadIdx.x*MAX_NUM_OBSERVABLES;
unsigned int* indices = paramIndices + parameters;
for (int i = 0; i < evtSize; ++i) {
  fptype lowerBound = thrust::get<2>(t)[3*i+0];
  fptype upperBound = thrust::get<2>(t)[3*i+1];
  int numBins    = (int) FLOOR(thrust::get<2>(t)[3*i+2] + 0.5); 
  int localBin = binNumber % numBins;

  fptype x = upperBound - lowerBound; 
  x /= numBins;
  x *= (localBin + 0.5); 
  x += lowerBound;
  binCenters[indices[indices[0] + 2 + i]+offset] = x; 
  binNumber /= numBins;
}
\end{verbatim}
\end{listing}
in the straightforward way, and stores the bin centers in a \emph{fake event}.
Since events are just lists of observables, all that's necessary is to keep track
of which part of the \verb|MEM_SHARED|\footnote{That is, \texttt|\_\_shared\_\_| for the default CUDA target.} 
\texttt{binCenters} array is owned by this
thread, look up the index-within-events of each observable, and set the entries
of the locally-owned part of \texttt{binCenters} accordingly. This fake event is then
sent to the PDF for evaluation:
\begin{verbatim}
fptype ret = callFunction(binCenters+offset, 
                          functionIdx, 
                          parameters); 
\end{verbatim}
where \texttt{callFunction} is just a wrapper for looking up the function referred to
by \texttt{functionIdx} and calling it with the right part of the parameter array:
\begin{listing}
\label{listing:callfunction}
Code to call device-side PDF implementations (some lines
broken up for clarity). 

\begin{verbatim}
__device__ fptype callFunction (fptype* eventAddress, 
                                unsigned int functionIdx, 
                                unsigned int paramIdx) {
  void* rawPtr = device_function_table[functionIdx];
  device_function_ptr fcn;
  fcn = reinterpret_cast<device_function_ptr>(rawPtr);
  return (*fcn)(eventAddress, 
                cudaArray, 
                paramIndices + paramIdx);
}
\end{verbatim}
\end{listing}
This, finally, is where the \verb|__device__| function from the PDF definitions
in section \ref{sec:newpdfs} is called; we have now connected all this engine
code with the evaluation code for the Gaussian, Breit-Wigner, polynomial, sum of
functions, or whatever calculation we happen to be doing today. 

Having found the integral of the PDF, either using fake events as outlined above
or with an analytic calculation, we are now ready to find the actual NLL, or sum of
chi-squares, or other goodness-of-fit metric, using the actual, observed events that
we copied across in \texttt{setData}. The procedure is similar to that for the normalisation:
\begin{listing}
\label{listing:nlleval}
Goodness-of-fit evaluation.

\begin{verbatim}
transform_reduce(make_zip_iterator(make_tuple(eventIndex, 
                                              arrayAddress, 
                                              eventSize)),
                 make_zip_iterator(make_tuple(eventIndex + numEntries, 
                                              arrayAddress, 
                                              eventSize)),
                 *logger, dummy, cudaPlus);   
\end{verbatim}
\end{listing}

Here the \verb|*logger|, \verb|dummy|, and \verb|cudaPlus| arguments are doing
the same jobs as before. The tuple arguments, however, differ: In particular,
they are now indicating the range 0 to $N-1$ in \emph{events}, not bins, and
\verb|arrayAddress| this time points to the array of events, not to a set of 
normalisation triplets from which bin centers can be calculated. Since the order
of the arguments differs - it is now \verb|int, fptype*, int| - a different 
\texttt{operator} method is called: 
\begin{listing}
\label{listing:maineval} 
Main evaluation operator (some lines broken up for clarity). 

\begin{verbatim}
__device__ fptype MetricTaker::operator () 
  (thrust::tuple<int, fptype*, int> t) const {
  // Calculate event offset for this thread. 
  int eventIndex = thrust::get<0>(t);
  int eventSize  = thrust::get<2>(t);
  fptype* eventAddress = thrust::get<1>(t);
  eventAddress += (eventIndex * abs(eventSize)); 

  // Causes stack size to be statically undeterminable.
  fptype ret = callFunction(eventAddress, functionIdx, parameters);

  // Notice assumption here! For unbinned fits the 
  // eventAddress pointer won't be used in the metric, 
  // so it doesn't matter what it is. For binned fits it 
  // is assumed that the structure of the event is 
  // (obs1 obs2... binentry binvolume), so that the array
  // passed to the metric consists of (binentry binvolume). 
  void* fcnAddr = device_function_table[metricIndex];
  device_metric_ptr fcnPtr;
  fcnPtr = reinterpret_cast<device_metric_ptr>(fcnAddr);
  eventAddress += abs(eventSize)-2;
  ret = (*fcnPtr)(ret, eventAddress, parameters);
  return ret; 
}
\end{verbatim}
\end{listing}

Observe that, for binned events, \texttt{eventSize} is negative; in this case
the event array looks like \verb|x1 y1 n1 v1 x2 y2 n2 v2 ... xN yN nN vN| where
\texttt{x} and \texttt{y} are bin centers, \texttt{n} is the number of entries, 
and \texttt{v} is the bin volume or error. This does not matter for the PDF evaluation
invoked by \texttt{callFunction}, which will just get a pointer to the start of
the event and read off the bin centers as event variables; hence the \verb|abs(eventSize)|
in the calculation of the event address allows binned and unbinned PDFs to be 
treated the same. However, it very much does matter for the goodness-of-fit metric.
Suppose the fit is the default NLL: Then all the operator needs to do at this
point is take the logarithm of what the PDF returned, multiply by -2, and be
on its way. But if it is a chi-square fit, then it must calculate the expected
number of hits in the bin, which depends on the PDF value, the bin volume,
and the total number of events\footnote{This is why \texttt{functorConstants[0]} is reserved for that value!},
subtract the observed number, square, and divide by the observed
number. Hence there is a second function-pointer lookup, but now the \verb|void*|
stored in \verb|device_function_table| is to be interpreted as a different kind
of function - a ``take the metric'' function rather than a ``calculate the PDF''
function. The \verb|metricIndex| member of \verb|MetricTaker| is set by the \texttt{FitControl}
object of the PDF; it points to one of the \texttt{calculateFoo} functions:
\begin{listing}
\label{listing:metrics}
Metric-taking functions.

\begin{verbatim}
__device__ fptype calculateEval (fptype rawPdf, 
                                 fptype* evtVal, 
                                 unsigned int par) {
  // Just return the raw PDF value, for use 
  // in (eg) normalisation. 
  return rawPdf; 
}

__device__ fptype calculateNLL (fptype rawPdf, 
                                 fptype* evtVal, 
                                 unsigned int par) {
  rawPdf *= normalisationFactors[par];
  return rawPdf > 0 ? -LOG(rawPdf) : 0; 
}

__device__ fptype calculateProb (fptype rawPdf, 
                                 fptype* evtVal, 
                                 unsigned int par) {
  // Return probability, ie normalised PDF value.
  return rawPdf * normalisationFactors[par];
}

__device__ fptype calculateBinAvg (fptype rawPdf, 
                                 fptype* evtVal, 
                                 unsigned int par) {
  rawPdf *= normalisationFactors[par];
  rawPdf *= evtVal[1]; // Bin volume 
  // Log-likelihood of numEvents with expectation of exp 
  // is (-exp + numEvents*ln(exp) - ln(numEvents!)). 
  // The last is constant, so we drop it; and then multiply 
  // by minus one to get the negative log-likelihood. 
  if (rawPdf > 0) {
    fptype expEvents = functorConstants[0]*rawPdf;
    return (expEvents - evtVal[0]*log(expEvents)); 
  }
  return 0; 
}

__device__ fptype calculateBinWithError (fptype rawPdf, 
                                 fptype* evtVal, 
                                 unsigned int par) {
  // In this case interpret the rawPdf as just a number, 
  // not a number of events. Do not divide by integral over 
  // phase space, do not multiply by bin volume, and do not 
  // collect 200 dollars. evtVal should have the structure 
  // (bin entry, bin error). 
  rawPdf -= evtVal[0]; // Subtract observed value.
  rawPdf /= evtVal[1]; // Divide by error.
  rawPdf *= rawPdf; 
  return rawPdf; 
}

__device__ fptype calculateChisq (fptype rawPdf, 
                                 fptype* evtVal, 
                                 unsigned int par) {
  rawPdf *= normalisationFactors[par];
  rawPdf *= evtVal[1]; // Bin volume 

  fptype ret = pow(rawPdf * functorConstants[0] - evtVal[0], 2);
  ret /= (evtVal[0] > 1 ? evtVal[0] : 1); 
  return ret;
}
\end{verbatim}
\end{listing}
Notice the use of \verb|normalisationFactors| in most of the metric functions,
and the special cases when the PDF or the observed number of events is zero. 

It is worth noting that the PDF evaluation function may itself call other functions,
either using \verb|callFunction| or manually casting a function index into other
kinds of functions, as in the metric calculation of listing~\ref{listing:maineval}.
For example, in \verb|DalitzPlotPdf|, each resonance may be parametrised
by a relativistic Breit-Wigner, a Gaussian, a Flatte function, or more esoteric forms;
so the main function is supplied with a list of function indices and parameter indices
for them, and interprets the \verb|void| pointer from \verb|device_function_table| as a specialised
function type taking Dalitz-plot location (rather than a generic event) as its argument. 
More prosaically, \verb|AddPdf| simply carries a list of PDF function indices
and indices of weights to assign them, and invokes \verb|callFunction| several times,
multiplying the results by its weight parameters and returning the sum. 

We have now calculated the function value that we ask MINUIT to minimise, for 
a single set of parameters; this value is passed back to MINUIT, which does its
thing and comes up with another set of parameters for us, completing the loop. 
Ends here the saga of the fit iteration; you now know the entire essential
functionality of GooFit's core engine. 

\section{Existing PDF classes}

The GooFit PDFs, like ancient Gaul, are roughly divisible into three:
\begin{itemize}
\item Basic functions, written because they are (expected to be) frequently used,
such as the Gaussian and polynomial PDFs.
\item Combiners, functions that take other functions as arguments and
spit out some combination of the inputs, for example sums and products.
\item Specialised PDFs, written for the $D^0\to\pi\pi\pi^0$ mixing analysis
that is the driving test case for GooFit's capabilities. 
\end{itemize}

In the lists below, note that all the constructors
take pointers to \texttt{Variable} objects; rather than 
repetitively repeat ``\texttt{Variable} pointer''
in a redundantly recurring manner, we just say \texttt{Variable}. 
Additionally, the first argument in every constructor is the name
of the object being created; again this is not mentioned in every
item. By convention, constructors take observables first, then parameters. 

\subsection{Basic PDFs}

Basic PDFs are relatively straightforward: They take one or more observables
and one or more parameters, and implement operations that are in some sense
`atomic' - they do not combine several functions in any way. Usually they have
a reasonably well-known given name, for example ``the threshold function'' or ``a polynomial''. 
The canonical example is the Gaussian PDF. 

\begin{itemize}
\item \texttt{ArgusPdf}: Implements a threshold function
\begin{eqnarray}
P(x;m_0,a,p) &=& \left\{ \begin{matrix}
0 & x \le m_0 \\
x\left(\frac{x^2-m_0^2}{m_0^2}\right)^p e^{a\frac{x^2-m_0^2}{m_0^2}} & x > m_0 \\
\end{matrix}
\right. 
\end{eqnarray}
where the power $p$ is, by default, fixed at 0.5. The constructor takes \texttt{Variable}s
representing $x$, $m_0$,
and $a$, followed by a boolean indicating whether the threshold is an upper or lower
bound. The equation above shows the PDF for a lower bound; for upper bounds, $x^2-m_0^2$
becomes instead $m_0^2-x^2$, and the value is zero above rather than below $m_0$. 
The constructor also takes an optional \texttt{Variable} representing the power $p$;
if not given, a default parameter with value 0.5 is created. 
\item \texttt{BifurGaussPdf}: A two-sided Gaussian, with a $\sigma$ that
varies depending on which side of the mean you are on: 
\begin{eqnarray}
P(x;m,\sigma_L,\sigma_R) &=& \left\{ \begin{matrix}
e^{-\frac{(x-m)^2}{2\sigma_L^2}} & x \le m \\
e^{-\frac{(x-m)^2}{2\sigma_R^2}} & x > m. \\
\end{matrix}
\right. 
\end{eqnarray}
The constructor takes the observable $x$, mean $m$, and left and right sigmas
$\sigma_{L,R}$.
\item \texttt{BWPdf}: A non-relativistic Breit-Wigner function,
sometimes called a Cauchy function:
\begin{eqnarray}
P(x;m,\Gamma) &=& \frac{1}{2\sqrt{\pi}}\frac{\Gamma}{(x-m)^2 + \Gamma^2/4}
\end{eqnarray}
The constructor takes the observable $x$, mean $m$, and width $\Gamma$. 
\item \texttt{CorrGaussianPdf}: A correlated Gaussian - that is, 
a function of two variables $x$ and $y$, each described by a Gaussian distribution, 
but the width of the $y$ distribution depends on $x$:
\begin{eqnarray}
P(x,y;\bar x,\sigma_x,\bar y, \sigma_y, k) &=& 
e^{-\frac{(x-\bar x)^2}{2\sigma_x^2}}e^{-\frac{(y-\bar y)^2}{2(1 + k(\frac{x-\bar x}{\sigma_x})^2)\sigma_y^2}}
\end{eqnarray}
In other words, the effective $\sigma_y$ grows quadratically
in the normalised distance from the mean of $x$, with the quadratic
term having coefficient $k$.
The constructor takes observables $x$ and $y$, means and widths $\bar x$, $\sigma_x$,
$\bar y$ and $\sigma_y$, and coefficient $k$. Notice that if $k$ is zero, 
the function reduces to a product of two Gaussians, 
$P(x,y;\bar x,\sigma_x,\bar y, \sigma_y) = G(x;\bar x, \sigma_x)G(y;\bar y, \sigma_y)$. 
\item \texttt{CrystalBallPdf}: A Gaussian with a power-law tail on one side:
\begin{eqnarray}
P(x;m,\sigma,\alpha,p) &=& \left\{ \begin{matrix}
e^{-\frac{(x-m)^2}{2\sigma^2}} & \mathrm{sg}(\alpha)\frac{x - m}{\sigma} \le \mathrm{sg}(\alpha)\alpha \\
e^{-\alpha^2/2}\left(\frac{p/\alpha}{p/\alpha - \alpha + \frac{x-m}{\sigma}}\right)^p
& \mathrm{otherwise } (\alpha\ne 0). \\
\end{matrix}
\right. 
\end{eqnarray}
The constructor takes the observable $x$, the mean $m$, width $\sigma$, cutoff $\alpha$,
and power $p$. Note that if $\alpha$ is negative, the power-law tail is on the right;
if positive, on the left. For $\alpha=0$, the function reduces to a simple Gaussian
in order to avoid $p/\alpha$ blowing up. 
\item \texttt{ExpGausPdf}: An exponential decay convolved with a Gaussian
resolution:
\begin{eqnarray}
P(t;m,\sigma,\tau) &=& e^{-t/\tau} \otimes e^{-\frac{(t-m)^2}{2\sigma^2}} \\
&=& (\tau/2)e^{(\tau/2)(2m+\tau\sigma^2-2t}\mathrm{erfc}\left(\frac{m+\tau\sigma^2-t}{\sigma\sqrt{2}}\right)
\end{eqnarray}
where $\mathrm{erfc}$ is the complementary error function. The constructor
takes the observed time $t$, mean $m$ and width $\sigma$ of the resolution,
and lifetime $\tau$. Note that the original decay function is zero for $t<0$. 
\item \texttt{ExpPdf}: A plain exponential,
\begin{eqnarray}
P(x;\alpha, x_0) &=& e^{\alpha(x-x_0)}
\end{eqnarray}
taking the observable $x$, exponential constant $\alpha$, and optional offset $x_0$.
If $x_0$ is not specified it defaults to zero. A variant constructor takes, in place
of $\alpha$, a \texttt{vector} of coefficients (in the order $\alpha_0$ to $\alpha_n$) 
to form a polynomial in the exponent:
\begin{eqnarray}
P(x;\alpha_0, \alpha_1, \ldots \alpha_n, x_0) &=& e^{\alpha_0 + \alpha_1(x-x_0) + \alpha_2(x-x_0)^2 + \ldots + \alpha_n(x-x_0)^n}
\end{eqnarray}
The offset $x_0$ is again optional and defaults to zero. 
\item \texttt{GaussianPdf}: What can I say? It's a normal distribution, the potato
of PDFs. Kind of bland, but goes with anything. National cuisines have been based on it.  
\begin{eqnarray}
P(x;m,\sigma) &=& e^-\frac{(x-m)^2}{2\sigma^2}
\end{eqnarray}
The constructor takes the observable $x$, mean $m$, and width $\sigma$. 
\item \texttt{InterHistPdf}: An interpolating histogram; in one dimension:
\begin{eqnarray}
P(x) &=& \frac{f(x, b(x))H[b(x)] + f(x, 1 + b(x))H[b(x) + 1]}{f(x, b(x)) + f(x, 1 + b(x))}
\end{eqnarray}
where $H$ is a histogram, $H[n]$ is the content of its bin with index $n$, 
$b(x)$ is a function that returns the bin number that $x$ falls into,
and $f(x, n)$ is the distance between $x$ and the center of bin $n$. 
In other words, it does linear interpolation between bins. However,
there are two complicating factors. First, the histogram may have
up to ten\footnote{On the grounds that ten dimensions should be enough for anyone!} dimensions.
Second, the dimensions may be either observables or fit parameters. 
So, for example, suppose we want to fit for the width $\sigma$ of a Gaussian
distribution, without using the potato of PDFs. We can do this by making
a two-dimensional histogram: The $x$ dimension is the observable, the $y$
is $\sigma$. Fill the histogram with the value of the Gaussian\footnote{Oops, there's
that potato after all. It's a contrived example.} at each $x$ given the $\sigma$
in that bin. Now when the fit asks the PDF, ``What is your value at $x$ given this $\sigma$?'',
the PDF responds by interpolating linearly between four bins - ones that were precalculated
with $\sigma$ values close to what the fit is asking about. For the Gaussian this is
rather un-necessary, but may save some time for computationally expensive functions. 

The constructor takes a \texttt{BinnedDataSet} representing the underlying histogram, 
a \texttt{vector} of fit parameters, and a \texttt{vector} of observables. 
\item \texttt{JohnsonSUPdf}: Another modified Gaussian. You can eat potatoes
a lot of different ways:
\begin{eqnarray}
P(x;m,\sigma,\gamma,\delta) &=&
\frac{\delta}{\sigma\sqrt{2\pi(1+\frac{(x-m)^2}{\sigma^2})}}
e^{-\frac{1}{2}\left(\gamma + \delta\log(\frac{x-m}{\sigma}+\sqrt{1+\frac{(x-m)^2}{\sigma^2}})\right)^2}
\end{eqnarray}
The constructor takes the observable $x$, mean $m$, width $\sigma$, 
scale parameter $\gamma$, and shape parameter $\delta$. 
\item \texttt{KinLimitBWPdf}: A relativistic Breit-Wigner function modified 
by a factor accounting for limited phase space\footnote{If this seems complicated, spare a thought for the hapless undergrad who
had to code the original CPU version.}; for example, in the decay $D^{*+}\to D^0\pi^+$,
the difference between the $D^*$ and $D^0$ masses is only slightly more than the pion mass.
Consequently, the distribution of $\Delta m = m(D^*) - m(D^0)$ is slightly asymmetric: The left
side of the peak, where the phase space narrows rapidly, is less likely than the right side.
\begin{eqnarray}
P(x;x_0,\Gamma,M,m) &=& \left\{ \begin{matrix}
0 & \lambda(x_0,M,m) \le 0 \\
\frac{S(x,x_0,M,m)x_0'\Gamma^2}{\left(x_0'-x'^2\right)^2 + x_0'\Gamma^2S^2(x,x_0,M,m)} & \mathrm{otherwise.}
\end{matrix}
\right. 
\end{eqnarray}
Here priming indicates addition of $M$, so that
$x'=x+M$, $x_0'=x_0+M$; the phase-space function $S$ and its supporting characters $\lambda$, $p$, and $b_W$ are given by
\begin{eqnarray}
S(x,x_0,M,m)   &=& \left(\frac{p(x,M,m)}{p(x_0,M,m)}\right)^3\left(\frac{b_W(x,M,m)}{b_W(x_0,M,m)}\right)^2 \\
b_W(x,M,m)     &=& \frac{1}{\sqrt{1 + r^2p^2(x,M,m)}}\\
p(x,M,m)       &=& \sqrt{\lambda(x,M,m)/(2x)}\\
\lambda(x,M,m) &=& \left(x'^2-(M-m)^2\right)\left(x'^2-(M+m)^2\right).
\end{eqnarray}
The radius $r$ that appears in $b_W$ (which does not stand for Breit-Wigner,
but Blatt-Weisskopf!) is hardcoded to be 1.6. 

The constructor takes the observable $x$, mean $x_0$, and width $\Gamma$. The
large and small masses $M$ and $m$, which determine the phase space, are by default
1.8645 (the $D^0$ mass) and 0.13957 (mass of a charged pion), but can be set with
a call to \texttt{setMasses}. Note that they are constants, not fit parameters. 
\item \texttt{LandauPdf}: A shape with a long right-hand tail - so long, in
fact, that its moments are not defined. If the most probable value (note that this is not
a mean) and the width are taken as 0 and 1, the PDF is
\begin{eqnarray}
P(x) &=& \frac{1}{\pi}\int_0^\infty e^{-t\log t - xt}\sin(t\pi)\mathrm{d}t
\end{eqnarray}
but the GooFit implementation
is a lookup table stolen from CERNLIB. 
The constructor takes the observable $x$, most probable value $\mu$
(which shifts the above expression) and the width $\sigma$ (which scales it). 
\item \texttt{NovosibirskPdf}: A custom shape with a long tail:
\begin{eqnarray}
P(x;m,\sigma,t) &=& 
e^{-\frac{1}{2}\left(\log^2(1+t\frac{x-m}{\sigma}\frac{\sinh(t\sqrt{\log(4)})}{\sqrt{\log(4)}})/t + t^2\right)}
\end{eqnarray}
The constructor takes the observable $x$, mean $m$, width $\sigma$, 
and tail factor $t$. If $t$ is less than $10^{-7}$, the function
returns a simple Gaussian, which probably indicates that it approximates
a Gaussian for small tail parameters, but I'd hate to have to show such
a thing. 
\item \texttt{PolynomialPdf}: If the Gaussian is the potato,
what is the polynomial? Bread? Milk? Nothing exotic, at any rate. 
The GooFit version does have some subtleties, to allow for polynomials
over an arbitrary number\footnote{Although being honest,
just supporting the special cases of one and two would likely have sufficed.}
 of dimensions:
\begin{eqnarray}
P(\vec x; \vec a, \vec x_0, N) &=&
\sum\limits_{p_1+p_2+\ldots+p_n \le N} a_{p_1p_2\ldots p_n} \prod\limits_{i=1}^n (\vec x - \vec x_0)_i^{p_i}
\end{eqnarray}
where $N$ is the highest degree of the polynomial and $n$ is the number of dimensions.
The constructor takes a \texttt{vector} of observables, denoted $\vec x$ above;
a \texttt{vector} of coefficients, $\vec a$, a \texttt{vector} of optional offsets $\vec x_0$
(if not specified, these default to zero), and the maximum degree $N$. 
The coefficients are in the order 
$a_{p_0p_0\ldots p_0}, a_{p_1p_0\ldots p_0}, \ldots a_{p_Np_0\ldots p_0}, a_{p_0p_1\ldots p_0}, a_{p_1p_1\ldots p_0}, 
\ldots a_{p_0p_0\ldots p_N}$. In other words, start at the index for the constant
term, and increment the power of the leftmost observable. Every time the sum of 
the powers reaches $N$, reset the leftmost power to zero and increment the next-leftmost.
When the next-leftmost reaches $N$, reset it to zero and increment the third-leftmost,
and so on. An example may be helpful; for two dimensions $x$ and $y$, and a maximum power
of 3, the order is $a_{00}, a_{10}, a_{20}, a_{30}, a_{01}, a_{11}, a_{21}, a_{02}, a_{12}, a_{03}$. 
This can be visualised as picking boxes out of a matrix and discarding the ones where 
the powers exceed the maximum:
\begin{displaymath}
\begin{array}{cccc}
9: x^0y^3 &    -      &    -      &    -      \\
7: x^0y^2 & 8: x^1y^2 &    -      &    -      \\
4: x^0y^1 & 5: x^1y^1 & 6: x^2y^1 &    -      \\
0: x^0y^0 & 1: x^1y^0 & 2: x^2y^0 & 3: x^3y^0 \\
\end{array} 
\end{displaymath}
starting in the lower-lefthand corner and going right, then up.

There is also a simpler version of the constructor for the case of a polynomial
with only one dimension; it takes the observable, a \texttt{vector} of coefficients,
an optional offset, and the lowest (not highest) degree of the polynomial; the latter
two both default to zero. In this case the order of the coefficients is from lowest to
highest power. 
\item \texttt{ScaledGaussianPdf}: Another Gaussian variant. This one 
moves its mean by a bias $b$ and scales its width by a scale factor $\epsilon$:
\begin{eqnarray}
P(x;m,\sigma,b,\epsilon) &=& e^{-\frac{(x+b-m)^2}{2(\sigma(1+\epsilon))^2}}.
\end{eqnarray}
This has a somewhat specialised function: It allows fitting Monte Carlo
to, for example, a sum of two Gaussians, whose means and widths are then
frozen. Then real data can be fit for a common bias and $\epsilon$.

The constructor takes the observable $x$, mean $m$, width $\sigma$, bias $b$ and
scale factor $\epsilon$. 
\item \texttt{SmoothHistogramPdf}: Another histogram, but this one does
smoothing in place of interpolation. That is, suppose the event falls in bin $N$ of
a one-dimensional histogram; then the returned value is a weighted average of bins
$N-1$, $N$, and $N+1$. For multidimensional cases the weighted average is over all
the neighbouring bins, including diagonals:
\begin{eqnarray}
P(\vec x;s;H) &=& \frac{H(\mathrm{bin}(\vec x)) + s\sum\limits_{i=\mathrm{neighbours}}\delta{i}H(i)}{1 + s\sum\limits_{i=\mathrm{neighbours}}\delta{i}}
\end{eqnarray}
where $\delta_i$ is zero for bins that fall outside the histogram limits,
and one otherwise. The constructor takes the underlying histogram $H$ (which also
defines the event vector $\vec x$) and the smoothing
factor $s$; notice that if $s$ is zero, the PDF reduces to a simple histogram lookup. 
The \texttt{BinnedDataSet} representing $H$ may be empty; in that case the lookup table should
be set later using the \texttt{copyHistogramToDevice} method. 
\item \texttt{StepPdf}: Also known as the Heaviside function. Zero up to a point, 
then 1 after that point:
\begin{eqnarray}
P(x;x_0) &=& \left\{
\begin{matrix}
0 & x \le x_0 \\ 
1 & x > x_0 
\end{matrix}
\right.
\end{eqnarray}
The constructor takes the observable $x$ and threshold $x_0$. 
\item \texttt{VoigtianPdf}: A convolution of a classical Breit-Wigner
and a Gaussian resolution:
\begin{eqnarray}
P(x;m,\sigma,\Gamma) &=& \int\limits_{-\infty}^\infty\frac{\Gamma}{(t-m)^2-\Gamma^2/4} e^{-\frac{(t-x)^2}{2\sigma^2}}\mathrm{d}t. 
\end{eqnarray}
The actual implementation is a horrible lookup-table-interpolation;
had Lovecraft been aware of this sort of thing, he would not have piffled
about writing about mere incomprehensible horrors from the depths of time.
The constructor takes the observable $x$, mean $m$, Gaussian resolution width
$\sigma$, and Breit-Wigner width $\Gamma$. 
\end{itemize}

\subsection{Combination PDFs}

These are the tools that allow GooFit to be more than a collection
of special cases. The most obvious example is a sum of PDFs - without
a class for this, you'd have to write a new PDF every time you added a Gaussian
to your fit. 

\begin{itemize}
\item \texttt{AddPdf}: A weighted sum of two or more PDFs. There are two variants,
`extended' and `unextended'. In the extended version the weights are interpreted as numbers
of events, and $N$ PDFs have $N$ weights; in the unextended version the weights are probabilities
(i.e., between 0 and 1) and $N$ PDFs have $N-1$ weights, with the probability of the last PDF
being 1 minus the sum of the weights of the others. 
\begin{eqnarray}
P(F_1,\ldots, F_n,w_1,\ldots,w_n) &=& w_1F_1 + \ldots + w_nF_n \\
P(F_1,\ldots, F_n,w_1,\ldots,w_{n-1}) &=& 
w_1F_1 + \ldots + w_{n-1}F_{n-1}\\
&&+ (1 - w_1 - \ldots - w_{n-1})F_n.
\end{eqnarray}
The constructor takes a \texttt{vector} of weights $w_i$ and a \texttt{vector}
of components $F_i$. If the two \texttt{vector}s are of equal length the extended
version is used; if there is one more component than weight, the unextended version;
anything else is an error. There is also a special-case constructor taking a single
weight and two components, to save creating the \texttt{vector}s in this common case.

Note that this PDF overrides the \texttt{sumOfNll} method; if an extended \texttt{AddPdf}
is used as a top-level PDF (that is, sent to \texttt{FitManager} for fitting), an additional
term for the number of events will be added to the NLL.

Also note that if the \texttt{AddPdf}'s options mask (set by calling
\texttt{setSpecialMask}) includes \texttt{ForceCommonNorm}, the normalisation changes.
By default the components are normalised separately, so that
\begin{eqnarray}
P(x;\vec F, \vec w) &=& \sum\limits_i \frac{w_iF_i(x)}{\int F_i(x) \mathrm{d}x},
\end{eqnarray}
but with \texttt{ForceCommonNorm} set, the integral is instead taken at
the level of the sum:
\begin{eqnarray}
P(x;\vec F, \vec w) &=& \frac{\sum\limits_i w_iF_i(x)}{\int\sum\limits_i w_iF_i(x)\mathrm{d}x}.
\end{eqnarray}
The difference is subtle but sometimes important. 
\item \texttt{BinTransformPdf}: Returns the global bin of its argument; in one dimension:
\begin{eqnarray}
P(x;l,s) &=& \mathrm{floor}\left(\frac{x-l}{s}\right)
\end{eqnarray}
where $l$ is the lower limit and $s$ is the bin size. The utility of
this is perhaps not immediately obvious; one application is as an intermediate
step in a \texttt{MappedPdf}. For example, suppose I want to 
model a $y$ distribution with a different set of parameters in five slices
of $x$; then I would use a \texttt{BinTransformPdf} to calculate
which slice each event is in. 

The constructor
takes \texttt{vector}s of the observables $\vec x$, lower bounds $\vec l$,
bin sizes $\vec b$, and number of bins $\vec n$. The last is used for converting
local (i.e. one-dimensional) bins into global bins in the case of multiple dimensions. 
\item \texttt{CompositePdf}: A chained function, 
\begin{eqnarray}
P(x) &=& h(g(x)).
\end{eqnarray}
The constructor takes the kernel function $g$ and the shell function $h$. 
Note that only one-dimensional composites are supported - $h$ cannot take more
than one argument. The core function $g$ can take any number. 
\item \texttt{ConvolutionPdf}: Numerically calculates a convolution integral
\begin{eqnarray}
P(x;f,g) &=& f\otimes g = \int\limits_{-\infty}^\infty f(t) g(x-t) \mathrm{d}t.
\end{eqnarray}
The constructor takes the observable $x$, model function $f$, and resolution function $g$.

The implementation of this function is a little complicated and relies on caching.
There is a variant constructor for cases where several convolutions may run at the same
time, eg a \texttt{MappedPdf} where all the targets are convolutions. This
variant does cooperative loading of the caches, which is a \emph{really neat} optimisation
and ought to work a lot better than it, actually, does. Its constructor takes the observable, 
model, and resolution as before, and an integer indicating how many other convolutions are going
to be using the same cache space. 
\item \texttt{EventWeightedAddPdf}: A variant of \texttt{AddPdf}, 
in which the weights are not fit parameters but rather observables. It otherwise
works the same way as \texttt{AddPdf}; the constructor takes \texttt{vector}s
of the weights and components, and it has extended and non-extended variants. Note that
you should not mix-and-match; the weights must be either all observables or all fit parameters.
\item \texttt{MappedPdf}: A function having the form
\begin{eqnarray}
F(x) &=& \left\{ \begin{matrix}
F_1(x)   & x_0 \le x \le x_1 \\
F_2(x)   & x_1 < x \le x_2 \\
(\ldots) & (\ldots)        \\
F_n(x)   & x_{n-1} < x \le x_n \\
\end{matrix}
\right. 
\end{eqnarray}
The constructor takes a \emph{mapping function} $m$, which returns an index;
and a \texttt{vector} of evaluation functions $\vec F$, so that if $m$ is zero, 
the PDF returns $F_0$, and so on. Notice that $m$ does not strictly need to 
return an integer - in fact the constraints of GooFit force it to return a floating-point
number - since \texttt{MappedPdf} will round the result to the nearest
whole number. The canonical example of a mapping function is \texttt{BinTransformPdf}.
\item \texttt{ProdPdf}: A product of two or more PDFs:
\begin{eqnarray}
P(x; \vec F) &=& \prod\limits_i F_i(x).
\end{eqnarray}
The constructor just takes a \texttt{vector} of the functions to be multiplied. 

\texttt{ProdPdf} does allow variable overlaps, that is, the components
may depend on the same variable, eg $P(x) = A(x)B(x)$. If this happens, the entire
\texttt{ProdPdf} object will be normalised together, since in general
$\int A(x)B(x) \mathrm{d}x \ne \int A(x) \mathrm{d}x \int B(x) \mathrm{d}x$.
However, if any of the components have the flag \texttt{ForceSeparateNorm} set,
as well as in the default case that the components depend on separate observables, 
each component will be normalised individually. Some care is indicated
when using the \texttt{ForceSeparateNorm} flag, and possibly a rethink of why there
is a product of two PDFs depending on the same variable in the first place. 
\end{itemize}

\subsection{Specialised amplitude-analysis functions}

These functions exist mainly for use in a specific
physics analysis, mixing in $D^0\to\pi\pi\pi^0$. Nonetheless, 
if you are doing a Dalitz-plot analysis, you may find them, and
conceivably even this documentation, helpful. 

\begin{itemize}
\item \texttt{DalitzPlotPdf}: A time-independent description
of the Dalitz plot as a coherent sum of resonances:
\begin{eqnarray}
P(m^2_{12},m^2_{13};\vec\alpha) &=& \left|\sum\limits_i \alpha_i B_i(m^2_{12},m^2_{13})\right|^2\epsilon(m^2_{12},m^2_{13})
\end{eqnarray}
where $\alpha_i$ is a complex coefficient, $B_i$ is a resonance parametrisation (see \texttt{ResonancePdf}, below),
and $\epsilon$ is a real-valued efficiency function. The constructor takes the squared-mass variables $m_{12}$ and
$m_{13}$, an event index (this is used in caching), a \texttt{DecayInfo} object which contains a \texttt{vector}
of \texttt{ResonancePdf}s as well as some global information like the mother and daughter masses, 
and the efficiency function. 
\item \texttt{DalitzVetoPdf}: Tests whether a point is in a particular 
region of the Dalitz plot, and returns zero if so, one otherwise. Intended for
use as part of an efficiency function, excluding particular regions - canonically
the one containing the  $K^0\to\pi\pi$ decay, as a large source of backgrounds that
proved hard to model. The constructor takes the squared-mass variables $m_{12}$ and
$m_{13}$, the masses (contained in \texttt{Variable}s) of the mother and three daughter
particles involved in the decay, and a \texttt{vector} of \texttt{VetoInfo} objects.
The \texttt{VetoInfo} objects just contain a cyclic index (either \texttt{PAIR\_12},
\texttt{PAIR\_13}, or \texttt{PAIR\_23}) and the lower and upper bounds of the veto region.
\item \texttt{IncoherentSumPdf}: Similar to \texttt{DalitzPlotPdf}, 
but the resonances are added incoherently:
\begin{eqnarray}
P(m^2_{12},m^2_{13};\vec\alpha) &=& \sum\limits_i \left|\alpha_i B_i(m^2_{12},m^2_{13})\right|^2\epsilon(m^2_{12},m^2_{13})
\end{eqnarray}
The constructor is the same, but note that the \texttt{amp\_imag} member of \texttt{ResonancePdf}
is not used, so the $\alpha$ are in effect interpreted as real numbers. 
\item \texttt{MixingTimeResolution\_Aux}: The abstract base class of \texttt{TruthResolution\_Aux}
and \texttt{ThreeGaussResolution\_Aux}. Represents a parametrisation of the time resolution. 
\item \texttt{ResonancePdf}: Represents a resonance-shape parametrisation, the $B_i$
that appear in the equations for \texttt{DalitzPlotPdf}, \texttt{IncoherentSumPdf},
and \texttt{TddpPdf}. Canonically a relativistic Breit-Wigner. The constructor takes
the real and imaginary parts of the coefficient $\alpha$ (note that this is actually used by
the containing function), and additional parameters depending on which function the resonance
is modelled by:
\begin{itemize}
\item Relativistic Breit-Wigner: Mass, width, spin, and cyclic index. The two last are integer constants.
Only spins 0, 1, and 2 are supported. 
\item Gounaris-Sakurai parametrisation: Spin, mass, width, and cyclic index. Notice that this
is the same list as for the relativistic BW, just a different order.
\item Nonresonant component (ie, constant across the Dalitz plot): Nothing additional.
\item Gaussian: Mean and width of the Gaussian, cyclic index. Notice that the Gaussian
takes the mass $m_{12,13,23}$ as its argument, not the squared mass $m^2_{12,13,23}$
like the other parametrisations. 
\end{itemize}
\item \texttt{TddpPdf}: If the Gaussian is a potato, this is a five-course banquet
dinner involving entire roasted animals stuffed with other animals, large dance troupes 
performing between the courses, an orchestra playing in the background, and lengthy
speeches. There will not be a vegetarian option. Without going too deeply into the physics, 
the function models a decay, eg $D^0\to\pi\pi\pi^0$, that can happen either directly or
through a mixed path $D^0\to \overline{D^0}\to\pi\pi\pi^0$. (Although developed for the
$\pi\pi\pi^0$ case, it should be useful for any decay where the final state is its own anti-state.)
The probability of the mixing
path depends on the decay time, and quantum-mechanically interferes with the direct path.
Consequently the full Time-Dependent Dalitz-Plot (Tddp) amplitude is (suppressing
the dependence on squared masses, for clarity):
\begin{eqnarray}
\label{eq:fullmix}
P(m^2_{12}, m^2_{13}, t, \sigma_t;x,y,\tau,\vec\alpha) &=&
e^{-t/\tau}\Big(|A+B|^2\cosh(yt/\tau)\\
&& + |A-B|^2\cos(xt/\tau)\\
&& - 2\Re(AB^*)\sinh(yt/\tau)\\
&& - 2\Im(AB^*)\sin(xt/\tau)\Big)
\end{eqnarray}
where (notice the reversed masses in the $B$ calculation)
\begin{eqnarray}
A &=& \sum\limits_i \alpha_iB_i(m^2_{12}, m^2_{13}) \\
B &=& \sum\limits_i \alpha_iB_i(m^2_{13}, m^2_{12}), 
\end{eqnarray}
\emph{convolved with} a time-resolution function and \emph{multiplied by} an efficiency. 
The implementation involves a large
amount of caching of the intermediate $B_i$ values, because these are expected
to change slowly relative to the coefficients $\alpha$ (in many cases, not at all, 
since masses and widths are often held constant) and are relatively expensive to calculate. 

The constructor takes the measured decay time $t$, error on decay time $\sigma_t$, 
squared masses $m^2_{12}$ and $m^2_{13}$, event number, decay information (the same
class as in \texttt{DalitzPlotPdf}; it also holds the mixing parameters
$x$ and $y$ and lifetime $\tau$), time-resolution function, efficiency, and optionally
a mistag fraction. A variant constructor takes, instead of a single time-resolution
function, a \texttt{vector} of functions and an additional observable $m_{D^0}$;
in this case the resolution function used depends on which bin of $m_{D^0}$ the event
is in, and the number of bins is taken as equal to the number of resolution functions
supplied. 

It is not suggested to try to use this thing from scratch. Start with a working
example and modify it gradually. 
\item \texttt{ThreeGaussResolution\_Aux}: A resolution function
consisting of a sum of three Gaussians, referred to as the `core', 
`tail', and `outlier' components. The constructor takes the core and
tail fractions (the outlier fraction is 1 minus the other two), core
mean and width, tail mean and width, and outlier mean and width. Notice
that this is a resolution function, so the full probability is found by
convolving Gaussians with Equation \ref{eq:fullmix}, and this runs to a page
or so of algebra involving error functions. It is beyond the scope of this
documentation. 
\item \texttt{TrigThresholdPdf}: Intended as part of an efficiency
function, modelling a gradual fall-off near the edges of phase space:
\begin{eqnarray}
P(x;a,b,t) &=& \left\{\begin{matrix}
1 & d > 1/2 \\
a + (1-a) \sin(d\pi) & \mathrm{otherwise}
\end{matrix}
\right. 
\end{eqnarray}
where $d=b(x-t)$ or $d=b(t-x)$ depending on whether the function is modelling
a lower or upper threshold. The constructor takes the observable $x$ 
(which will be either $m^2_{12}$ or $m^2_{13}$), 
threshold value $t$, trig constant $b$, linear constant $a$, and a boolean
which if true indicates an upper threshold. A variant
constructor, for modelling a threshold in the ``third'' Dalitz-plot dimension
$m^2_{23}$, takes both $m^2_{12}$ and $m^2_{13}$, and an additional mass constant $m$;
it then forms $x = m - m^2_{12} - m^2_{13}$, and otherwise does the same calculation. 
\item \texttt{TruthResolution\_Aux}: The simplest possible resolution
function, a simple delta spike at zero - i.e., time is always measured
perfectly. The constructor takes no arguments at all! 
\end{itemize}


\end{document}