# gnm: The MCMC Jagger

Rock and Rolling Monte Carlo Sampler

### Abstract

gnm is a stable, well-tested Python implementation of the affine-invariant Markov chain Monte Carlo (MCMC) sampler that uses the Gauss-Newton-Metropolis (GNM) algorithm. The GNM algorithm specializes in sampling probability distribution functions of the form $e^{-||f(x)||^2/2}$, where $f$ is a (non-linear) differentiable function. Rock on!

Feel free to contact me with any questions, comments, issues, or anything else. Feedback is greatly appreciated!

gnm is open-sourced under the MIT License.

## Installation

### Basic Installation

The easiest way to install gnm is with pip:

```
$ pip install gnm
```

Alternatively, if pip is not installed, you can use easy_install:

```
$ easy_install gnm
```

### Manual Installation

```
$ git clone https://github.com/mugurbil/gnm.git
```

Then you can install manually by going into the gnm directory and running setup.py:

```
$ cd gnm
$ python setup.py install
```

### Test the Installation

To check that the package is working, you can run a quick test:

```
$ python -c 'import gnm; gnm.test()'
```

There should be some information printed to the terminal. If matplotlib is installed, a plot like Figure 1 should pop up.

### Requirements

- numpy: needs to be installed!!
- matplotlib: for the examples and plotting the results.
- acor: for the autocorrelation feature of the sampler.

From the default packages, one will need os, setuptools (or distutils), re, sys, copy, and json. These packages likely come with any Python installation.

## Introduction

### The Problem

The GNM algorithm is for sampling generalizations of probability densities of the form $p(x)\propto e^{-||f(x)||^2/2}$. Here $x\in \mathbb{R}^n$ is a "parameter" vector, $f:\mathbb{R}^n\to\mathbb{R}^m$ is a "model" function, and $||f(x)||^2=\sum_{k=1}^m f_k^2(x)$. Typically $m>n$. GNM is a powerful algorithm for ill-conditioned problems because it is affine invariant.

### Least Squares

This is related to the nonlinear least squares problem $\min_x ||f(x)||^2$ that arises in statistics. For example, if $m(x)$ predicts the outcome of an experiment with parameter $x$, and $y_k$ are measurements, then the goodness of fit is $\sum_{k=1}^m\frac{(m_k(x)-y_k)^2}{2\sigma_k^2}$. If $f_k(x)=\frac{m_k(x)-y_k}{\sigma_k}$, then this sum has the form $||f(x)||^2/2$.

### Generalizations

One generalization of the probability density function $p(x)$ is adding a Gaussian prior probability distribution on the parameters, $\pi:\mathbb{R}^n\to\mathbb{R}$, such that $\pi(x)=\frac{1}{Z}e^{-(x-m)^T H(x-m)/2}$, where $m$ is the mean and $H$ is the precision matrix, the inverse of the covariance. Another generalization is having an indicator function $\chi:\mathbb{R}^n\to \{0,1\}$ to limit the domain of the model function.

### User Code

The user of the sampler must write Python code to evaluate $\chi(x)$, $f(x)$, and $\nabla f(x)$, the Jacobian. Optionally, the user can add prior information: a mean vector $m\in\mathbb{R}^n$ and a precision matrix $H\in\mathbb{R}^{n\times n}$ (inverse of the covariance matrix).

### Example

Sampling from a 2D Gaussian could be done as follows:

```python
import gnm

x_0 = [0, 0]  # initial guess

def model(x, args):
    return 1, x, [[1, 0], [0, 1]]

jagger = gnm.sampler(x_0, model, {})
jagger.sample(1000)
```

Here, we import the package, then set up an initial guess, a model, and arguments for the model (empty in this case), and feed these to the sampler. Finally, we sample. The model has to take in the input and arguments, and in turn return a flag, a function value, and the derivative (Jacobian) of the function.

## User Function

### Model

The user needs to write Python code that contains the model information. This Python function, which we will call the user function, should have two inputs and three outputs. This code will be passed to the sampler.

```python
# User-defined function outline
def model(x, args):
    ...
    return chi(x), f(x), J(x)
```

### Inputs

The first input should be the parameter vector, $x\in\mathbb{R}^n$. The second input should be the arguments that the function takes, which may include the data.

### Outputs

The first output is the constraint $\chi(x)$. It tells where the model function is defined. Thus, it has value $1$ (or True) when the function and its derivative are defined, and $0$ (or False) otherwise. If $D$ is the domain of $f$ and $\nabla f$, then

$$\chi(x)=\left\{ \begin{array}{ll} 1 & x\in D \\ 0 & x\not\in D \end{array} \right.$$

The second output is an array of size $m$ containing $f$ evaluated at $x$, $f(x)\in\mathbb{R}^m$ (range); it should be convertible to an np.array.

The third output is an array of size $m$ by $n$ that is the Jacobian matrix, $\nabla f$, evaluated at $x$, $J(x)\in\mathbb{R}^{m\times n}$. This also needs to be convertible to an np.array.
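As an illustration, here is a minimal sketch of a user function with a nontrivial constraint, for $f(x)=\sqrt{x_1}$ (so $n=m=1$), which is only differentiable for $x_1>0$. The function name sqrt_model is ours, and what gnm expects for $f$ and $J$ when $\chi(x)=0$ is our assumption, not something specified here:

```python
import numpy as np

# Hypothetical user function for f(x) = sqrt(x); a sketch, not part of gnm.
def sqrt_model(x, args):
    x = np.asarray(x, dtype=float)
    if x[0] <= 0:
        # chi(x) = 0: f or its derivative is undefined here.
        # The placeholder values below are our assumption; they should
        # never be used by the sampler when chi(x) = 0.
        return 0, [0.0], [[0.0]]
    f = [np.sqrt(x[0])]          # f(x), array of size m = 1
    J = [[0.5 / np.sqrt(x[0])]]  # Jacobian, m-by-n = 1-by-1
    return 1, f, J               # chi(x) = 1: inside the domain
```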
## Sampler

### Initialization

The sampler needs the user function (model), the arguments the user function takes (args), and the initial guess of the sampler (x_0). This information is provided to the sampler during initialization. The function will be evaluated at the initial guess to check that it is defined at that point.

```python
# Sampler initialization
jagger = gnm.sampler(x_0, model, args)
```

### Prior

The prior information is optional. For a Gaussian prior distribution, the mean vector $m$ (m) and precision matrix $H$ (H) need to be provided.

```python
# Setting the prior
jagger.prior(m, H)
```

### Jacobian Test

The Jtest method checks whether the derivative information and the model function given in the user function match, by employing numerical differentiation. It has two required inputs, six optional inputs, and one output.

The domain, $D_J$, for Jtest should be a rectangle given by two vectors, the top right and the bottom left corners of the rectangle, $\textbf{x}_{\text{max}}$ and $\textbf{x}_{\text{min}}$ respectively. These vectors are the two required inputs. Note that they should have the same dimensions, and all values of $\textbf{x}_{\text{min}}$ should be less than the corresponding values of $\textbf{x}_{\text{max}}$ so that the domain is nonempty.

$$D_J=\{{\bf x}: \forall i, (\textbf{x}_{\text{min}})_i<{\bf x}_i<(\textbf{x}_{\text{max}})_i\}$$

The method has one output, error, that tells whether the numerical Jacobian, $D f$, converged to the Jacobian supplied by the user function, $\nabla f$. The output is 0 if the convergence occurred. Otherwise, it is the norm of the difference of the numerical and provided Jacobians, $||D f - \nabla f||$, at the point where it failed to converge.

```python
# Jtest usage
error = jagger.Jtest(x_min, x_max)
assert error == 0  # Is the Jacobian info correct?
```

### Jtest Details

The six optional inputs of Jtest and their default values are given below. We then explain how the method works and where these variables come into use.

Table 1: Jtest optional inputs

| SYMBOL | IN CODE | DEFAULT VALUE |
| --- | --- | --- |
| $dx$ | dx | $2\cdot10^{-4}$ |
| $N$ | N | $1000$ |
| $\epsilon_{\text{max}}$ | eps_max | $1\cdot10^{-4}$ |
| $p$ | p | $2$ |
| $l_{\text{max}}$ | l_max | $50$ |
| $r$ | r | $0.5$ |

Jtest chooses an i.i.d. uniform random point, $\textbf{x}_k$, in the specified domain. Then it calculates the numerical Jacobian at that point by the symmetric difference quotient. This operation is done for all entries in the Jacobian, for $j=1$ to $n$ and, for each $j$, $i=1$ to $m$. Note that there is no $i$ in the code since the operations are done on vectors.

$$(D^{(l)}\textbf{f}(\textbf{x}_k))_{ij}=\dfrac{f_i(\textbf{x}_k+\delta \textbf{x}_j^{(l)})-f_i(\textbf{x}_k-\delta\textbf{x}_j^{(l)})}{2\delta_j^{(l)}}\approx\dfrac{\partial f_i}{\partial x_j}(\textbf{x}_k)$$

Here $\delta\textbf{x}_j^{(l)} = \delta_j^{(l)}\,\textbf{e}_j$, where $\textbf{e}_j$ has 0s everywhere but a 1 at the $j$th entry, and $\delta_j^{(l)}$ is the magnitude of the perturbation in the $\textbf{e}_j$ direction at the $l$th stage. The initial perturbation magnitude, $\delta_j^{(0)}$, depends on the domain and $dx$ by the equation $\delta_j^{(0)} = (\textbf{x}_{\text{max}}-\textbf{x}_{\text{min}})_j\cdot dx$. The numerical Jacobian is compared under the $p$-norm to the Jacobian given by the user function evaluated at $\textbf{x}_k$: $\epsilon_k^{(l)} = ||D^{(l)}\textbf{f}(\textbf{x}_k)-\nabla\textbf{f}(\textbf{x}_k)||_p$.

If this error, $\epsilon_k^{(l)}$, is smaller than $\epsilon_{\text{max}}$, the point passes the differentiation test. Otherwise, the magnitudes of the perturbations are reduced by multiplication with $r$: $\delta_j^{(l)} = \delta_j^{(l-1)}\cdot r=\delta_j^{(0)}\cdot r^{l}$, $\forall j$. This is repeated for the same point at most $l_{\text{max}}$ times, so that $l \leq l_{\text{max}}$. If the point still generates an error greater than $\epsilon_{\text{max}}$, the Jacobian is declared incorrect, because the numerical Jacobian does not converge to the provided Jacobian as we decrease $\delta\textbf{x}$ repeatedly, $\delta\textbf{x}\rightarrow {\bf0}$. In this case, the method returns $\epsilon^{(l_{\text{max}})}_k$, the final error at this point. This procedure is repeated for $N$ different points. If all the points pass the differentiation test, then the method returns 0.

```python
# Advanced Jtest usage
error = jagger.Jtest(x_min, x_max, N=5, dx=0.001)
```
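To make the procedure concrete, here is a minimal numpy sketch of the convergence test described above, for a single point. The function check_jacobian, its signature, and the entrywise reading of the $p$-norm are our assumptions; the real Jtest also sweeps over $N$ random points and is vectorized:

```python
import numpy as np

# A sketch of the symmetric-difference-quotient test for one point x_k.
# `check_jacobian` and its signature are ours, not part of gnm's API.
# Assumes chi = 1 throughout the test domain.
def check_jacobian(model, args, x_k, delta_0, eps_max=1e-4, p=2, l_max=50, r=0.5):
    x_k = np.asarray(x_k, dtype=float)
    _, f, J = model(x_k, args)
    J = np.asarray(J, dtype=float)       # user-supplied Jacobian, m by n
    n = len(x_k)
    delta = np.asarray(delta_0, dtype=float)
    for l in range(l_max + 1):
        D = np.empty_like(J)
        for j in range(n):               # perturb one coordinate at a time
            dx_j = np.zeros(n)
            dx_j[j] = delta[j]
            _, f_plus, _ = model(x_k + dx_j, args)
            _, f_minus, _ = model(x_k - dx_j, args)
            D[:, j] = (np.asarray(f_plus) - np.asarray(f_minus)) / (2 * delta[j])
        # Entrywise p-norm of the difference (one plausible reading of ||.||_p)
        eps = np.linalg.norm((D - J).ravel(), ord=p)
        if eps < eps_max:
            return 0.0                   # converged: the Jacobian passes at x_k
        delta = delta * r                # shrink the perturbation and retry
    return eps                           # did not converge: return the final error
```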
### Back-Off

There are currently two types of back-off strategies: static and dynamic. The static way fixes the maximum number of back-offs as well as the back-off dilation factor. The dynamic way fixes the maximum number of back-offs but changes the dilation factor based on the current position of the sampler.

```python
# Setting back-off
jagger.static(5, 0.2)
# OR
jagger.dynamic(3)
```

### Sampling

Once all the information is given to the sampler, all that remains is to call the sample method with the number of samples wanted (n_samples). The sampling process can be divided into sessions: if you run the sample method again, the sampler will remember where it left off and continue sampling from there. You can also divide the sampling of n_samples into n_divs divisions by setting the option divs=n_divs. When run in a terminal with n_divs $>1$, the visual option can be set to True to see the progress of the sampling; it prints the percentage of the sampling completed. The sampling can be done in safe mode by setting safe=True. This causes the sampler to save its progress at every division, and once the sampling is completed.

```python
# Sampling
jagger.sample(n_samples, divs=n_divs, visual=True, safe=True)
```

The initial guess might be far away from the region where the chain is supposed to be, causing a delay before the chain converges to the desired distribution. After the sampling, these undesired initial samples can be burned (discarded) by calling burn and providing the number of samples to be burned (n_burned).

```python
# Burning samples
jagger.burn(n_burned)
```

### Outputs

Outputs of the sampler can be accessed after the sampling as properties. These outputs are listed in Table 2. The main output is the chain, which contains the Markov chain of the samples. This is an np.array of shape (n_samples, $n$): it has length equal to the number of samples, and each sample is a vector of size $n$ containing the position of that sample.

```python
# Information access
chain = jagger.chain
```

Table 2: Sampler outputs

| CODE | EXPLANATION |
| --- | --- |
| chain | The Markov chain |
| n_samples | Number of elements sampled |
| n_accepted | Number of samples accepted |
| accept_rate | Acceptance rate of the algorithm |
| call_count | Number of function calls |
| step_count | Array of size max_steps indicating how many samples got accepted at each back-off step |
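Putting the pieces together, a full session might look like the sketch below. The model, bounds, and numbers are ours, chosen only for illustration; only the gnm calls shown in the sections above are used:

```python
import gnm
import numpy as np

# A sketch of a full session with made-up settings.
x_0 = [0.5]                                  # initial guess

def model(x, args):                          # a simple 1D model: f(x) = x
    return 1, [x[0]], [[1.0]]

jagger = gnm.sampler(x_0, model, {})
jagger.prior(np.array([0.0]), np.array([[1.0]]))  # mean m, precision H

error = jagger.Jtest(np.array([-2.0]), np.array([2.0]))
assert error == 0                            # Jacobian matches the model

jagger.static(5, 0.2)                        # static back-off
jagger.sample(10000, divs=10, visual=True)   # sample in 10 divisions
jagger.burn(1000)                            # discard burn-in samples

print(jagger.accept_rate)                    # acceptance rate of the algorithm
chain = jagger.chain                         # np.array of shape (n_samples, 1)
```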

## Examples

### Quickstart

The quickstart.py file in the /examples directory contains a simple example.

```
$ python quickstart.py
```

Running this file should create a plot of the results.

In this example, $n=m=1$, $f(x)=\frac{x^2-y}{\sigma}$, and $f'(x)=\frac{2x}{\sigma}$, where $y$ and $\sigma$ are given data. The prior has mean $\mu=0$ and precision matrix $H=[1]$, giving $\pi(x)=\frac{1}{Z}e^{-x^2/2}$. Therefore, the probability distribution that we want to sample is: $$p(x)\propto\pi(x)e^{-|f(x)|^2/2}=\frac{1}{Z} e^{-x^2/2-(x^2-y)^2/(2\sigma^2)}$$
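For concreteness, a user function for this density might look like the sketch below; the parameter values (y = 1, sigma = 0.5) and names are our guesses, not necessarily those used in quickstart.py:

```python
import gnm
import numpy as np

# A sketch of a quickstart-style setup; the y and sigma values are made up.
def model(x, args):
    y, sigma = args['y'], args['sigma']
    f = [(x[0]**2 - y) / sigma]   # f(x) = (x^2 - y) / sigma
    J = [[2 * x[0] / sigma]]      # f'(x) = 2x / sigma
    return 1, f, J                # defined everywhere, so chi(x) = 1

args = {'y': 1.0, 'sigma': 0.5}
jagger = gnm.sampler([0.0], model, args)
jagger.prior(np.zeros(1), np.eye(1))  # mean 0, precision H = [1]
jagger.sample(10000)
```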

### Well

The problem in quickstart can be made harder by deepening the well, that is, by taking $y$ to be bigger. This example is in the /well folder. The following line will give all the options for the file that you can play around with.