Approximate a complex posterior distribution using a multivariate normal distribution

To fit a normal probability density function (PDF) to an unnormalized posterior distribution, you can follow these steps:

  1. Transform the parameters so they are all unconstrained. The normal distribution is defined for all real values, so we want all our parameters to live on the real line as well (for example, optimize $\log \sigma$ instead of the constrained $\sigma > 0$).
  2. Find the peak (local maximum) of the unnormalized log posterior distribution. This point (the peak/mode) has the highest probability density, and the contours around it are approximately elliptical, like those of a Gaussian.
    1. We generally work with the log of the distribution since 1) it is much better behaved numerically and 2) the peak of the log of a function is at the same location as the peak of the function itself.
    2. In practice, we usually find the local minimum of the unnormalized negative log posterior and take the optimizer's solution $\hat{\boldsymbol{\theta}}$ as the mode (the MAP estimate).
  3. Compute all second-order partial derivatives of the unnormalized log posterior at the peak (the full Hessian, which makes this a full-rank approximation, analogous to full-rank variational inference). This Hessian equals the negative inverse covariance matrix of the Gaussian we will fit: $$\frac{\partial^{2}}{\partial \theta_{i}\, \partial \theta_{j}}\log p(\mathbf{x}, \boldsymbol{\theta})\Big|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}=-H=-\boldsymbol{\Sigma}^{-1}$$
  4. Fit the [[multivariate normal distribution]] with $\boldsymbol{\mu}=\hat{\boldsymbol{\theta}}$ and $\boldsymbol{\Sigma} = H^{-1}$, where $H$ is the negative [[Hessian matrix]] from the previous step. (A minimal end-to-end sketch in Python follows the Usage section.)

# Usage

- The Laplace approximation is accurate when the true posterior is approximately Gaussian.
- **Limited adaptability**: it may not perform well for non-Gaussian or multimodal posterior distributions. In that case, ADVI can be better because it provides more choice of variational distribution $q(\boldsymbol{\theta})$.
- **Sensitivity to the mode:** the estimate of the model evidence depends heavily on the MAP point $\hat{\boldsymbol{\theta}}$, so its accuracy depends on how well that single point summarizes the true posterior. For example, if the true posterior is bimodal, the Laplace approximation can badly misestimate the log unnormalized posterior $\log p(\mathbf{x}, \boldsymbol{\theta})$.
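
As a rough end-to-end sketch of the four steps above (the two-parameter model, the data, and names like `log_joint` are all made up for illustration; only standard NumPy/SciPy calls are used):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal

# Made-up model: x_i ~ N(mu, sigma^2) with N(0, 1) priors on mu and
# log(sigma); optimizing log(sigma) is the unconstrained transform (step 1).
x = np.array([1.2, 0.7, 2.1, 1.5, 0.9])

def log_joint(theta):
    """Unnormalized log posterior log p(x, theta), up to additive constants."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)
    log_prior = -0.5 * (mu**2 + log_sigma**2)
    log_lik = np.sum(-0.5 * ((x - mu) / sigma) ** 2 - np.log(sigma))
    return log_prior + log_lik

# Step 2: minimize the negative log posterior to find the mode.
opt = minimize(lambda t: -log_joint(t), x0=np.zeros(2), method="BFGS")
theta_hat = opt.x

# Step 3: H = -Hessian of log p at the mode, via central finite differences.
def neg_hessian_log_joint(t, eps=1e-4):
    d = len(t)
    H = np.zeros((d, d))
    for i in range(d):
        for j in range(d):
            ei = np.zeros(d); ei[i] = eps
            ej = np.zeros(d); ej[j] = eps
            H[i, j] = -(log_joint(t + ei + ej) - log_joint(t + ei - ej)
                        - log_joint(t - ei + ej) + log_joint(t - ei - ej)) / (4 * eps**2)
    return H

H = neg_hessian_log_joint(theta_hat)
Sigma = np.linalg.inv(H)

# Step 4: the fitted Gaussian approximation q(theta) = N(theta_hat, H^-1).
q = multivariate_normal(mean=theta_hat, cov=Sigma)
print(theta_hat, Sigma, q.logpdf(theta_hat))
```

For `method="BFGS"`, SciPy's result also carries `opt.hess_inv`, an approximation to $\Sigma = H^{-1}$, but an explicit finite-difference Hessian evaluated at the mode is typically more reliable.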

# Derivation

We start with [[Bayes' Theorem]]:

$$p(\boldsymbol{\theta}\mid \mathbf{x})=\frac{p(\mathbf{x}\mid \boldsymbol{\theta})\,p(\boldsymbol{\theta})}{p(\mathbf{x})}=\frac{p(\mathbf{x}, \boldsymbol{\theta})}{\int_{\boldsymbol{\theta}} p(\mathbf{x}, \boldsymbol{\theta})\,d \boldsymbol{\theta}}$$

Define the normalizing constant $Z$ by

$$\frac{1}{Z} = \int_{\boldsymbol{\theta}} p(\mathbf{x}, \boldsymbol{\theta})\,d \boldsymbol{\theta}$$

so that

$$p(\boldsymbol{\theta}\mid \mathbf{x})=Z \times p(\mathbf{x}, \boldsymbol{\theta})$$

We then approximate the rescaled joint with a normalized Gaussian $q$:

$$p(\boldsymbol{\theta}\mid \mathbf{x})=Z\, p(\mathbf{x}, \boldsymbol{\theta}) \approx q(\boldsymbol{\theta})$$

where

$$q(\boldsymbol{\theta})=\mathcal{N}(\boldsymbol{\theta}\mid \boldsymbol{\mu}=\hat{\boldsymbol{\theta}},\,\boldsymbol{\Sigma}=H^{-1})$$

with

$$\hat{\boldsymbol{\theta}}=\arg\max_{\boldsymbol{\theta}} \; \log p(\mathbf{x}, \boldsymbol{\theta})$$

and

$$H= -\nabla_{\boldsymbol{\theta}}^{2} \, \log p(\mathbf{x}, \boldsymbol{\theta})\,\Big|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}$$

To see why this is the right Gaussian, take a second-order Taylor expansion of the unnormalized log posterior around the mode:

$$\log p(\mathbf{x}, \boldsymbol{\theta}) \approx \log p(\mathbf{x}, \hat{\boldsymbol{\theta}}) + (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top \nabla_{\boldsymbol{\theta}} \log p(\mathbf{x}, \hat{\boldsymbol{\theta}}) - \frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})$$

Since the gradient vanishes at the mode, the linear term drops out:

$$\log p(\mathbf{x}, \boldsymbol{\theta}) \approx \log p(\mathbf{x}, \hat{\boldsymbol{\theta}}) - \frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})$$

which is, up to an additive constant, the log density of $\mathcal{N}(\hat{\boldsymbol{\theta}}, H^{-1})$.
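
A quick numerical illustration of this quadratic approximation (a made-up 1-D example: the unnormalized $\text{Gamma}(3,1)$ log density $\log p(\mathbf{x}, \theta) = 2\log\theta - \theta$, which has its mode at $\hat{\theta}=2$):

```python
import numpy as np

def log_p(t):
    # Made-up 1-D unnormalized log posterior: 2*log(t) - t (Gamma(3,1) shape).
    return 2 * np.log(t) - t

theta_hat = 2.0               # solves d/dt log_p = 2/t - 1 = 0
H = 2 / theta_hat**2          # H = -d^2/dt^2 log_p at the mode = 0.5

def quad(t):
    # Laplace's quadratic approximation around the mode.
    return log_p(theta_hat) - 0.5 * H * (t - theta_hat) ** 2

for t in [1.5, 2.0, 2.5, 4.0]:
    print(f"theta={t}: exact={log_p(t):.4f}, quadratic={quad(t):.4f}")
```

The two agree closely near $\hat{\theta}$ and drift apart in the tails, which is exactly where the Laplace approximation loses accuracy.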

# Model evidence

Calculate the [[model evidence]] using this [[Laplace approximation]]:

$$\begin{align*}
p(\mathbf{x})=\int_{\boldsymbol{\theta}} p(\mathbf{x}, \boldsymbol{\theta})\,d \boldsymbol{\theta} &= \int_{\boldsymbol{\theta}} \exp \left[ \log p(\mathbf{x}, \boldsymbol{\theta})\right] d \boldsymbol{\theta}\\
&\approx \int p(\mathbf{x}, \hat{\boldsymbol{\theta}}) \exp\left[-\frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})\right] \, d\boldsymbol{\theta}
\end{align*}$$

The constant factor $p(\mathbf{x}, \hat{\boldsymbol{\theta}})$ comes out of the integral:

$$p(\mathbf{x}) \approx p(\mathbf{x}, \hat{\boldsymbol{\theta}}) \int \exp\left[-\frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})\right] \, d\boldsymbol{\theta}$$

and the remaining standard Gaussian integral evaluates to $\left((2\pi)^d/\lvert H \rvert\right)^{1/2}$, where $d$ is the dimension of $\boldsymbol{\theta}$:

$$p(\mathbf{x}) \approx p(\mathbf{x}, \hat{\boldsymbol{\theta}}) \left(\frac{(2\pi)^d}{\lvert H \rvert}\right)^{1/2}$$
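
One way to sanity-check this formula (a made-up conjugate example, not part of the derivation): for $x_i \sim \mathcal{N}(\theta, \sigma^2)$ with prior $\theta \sim \mathcal{N}(0, \tau^2)$, the log joint is exactly quadratic in $\theta$, so the Laplace estimate of the evidence should match the closed-form answer to numerical precision:

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

# Made-up conjugate model: x_i ~ N(theta, sigma^2), theta ~ N(0, tau^2).
rng = np.random.default_rng(0)
sigma, tau, n = 1.0, 2.0, 10
x = rng.normal(0.5, sigma, size=n)

def log_joint(theta):
    return norm.logpdf(theta, 0.0, tau) + norm.logpdf(x, theta, sigma).sum()

# Mode and negative second derivative are known in closed form here.
H = 1 / tau**2 + n / sigma**2            # H = -d^2/dtheta^2 log p (constant)
theta_hat = (x.sum() / sigma**2) / H     # posterior mean = MAP

d = 1  # dimension of theta
log_Z_laplace = log_joint(theta_hat) + 0.5 * (d * np.log(2 * np.pi) - np.log(H))

# Exact evidence: marginally x ~ N(0, sigma^2 * I + tau^2 * ones).
cov = sigma**2 * np.eye(n) + tau**2 * np.ones((n, n))
log_Z_exact = multivariate_normal(mean=np.zeros(n), cov=cov).logpdf(x)

print(log_Z_laplace, log_Z_exact)  # agree to floating-point precision
```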