Approximating a complex posterior distribution with a multivariate normal distribution
To fit a normal probability density function (PDF) to an unnormalized posterior distribution, you can follow these steps:
- Transform the parameters so they are all unconstrained. The normal distribution is defined for all real values, so we want all our parameters to be unconstrained as well (for example, take the log of a positive parameter or the logit of a probability).
- Find the peak (local maximum) of the unnormalized log posterior distribution. This point (the peak, or mode) has the highest probability density, and contours …
- We generally work with the log of the distribution because 1) it is much better behaved numerically and 2) the logarithm is monotonically increasing, so the peak of the log of a function is at the same location as the peak of the function itself.
- In practice, we usually find the local minimum of the unnormalized negative log posterior and take the point where it is attained as the mode $\hat{\boldsymbol{\theta}}$.
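- Full-rank covariance: compute all second-order partial derivatives (the Hessian) of the unnormalized log posterior at the peak. The Hessian of the log of a multivariate normal PDF is $-\boldsymbol{\Sigma}^{-1}$ at every point, so we set the negative inverse covariance matrix equal to the Hessian of the target distribution at the peak:
$$ \frac{\partial^{2}}{\partial \theta_{i}\,\partial \theta_{j}}\log p(\mathbf{x}, \boldsymbol{\theta})\Big|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}}=-H_{ij}=-\left(\boldsymbol{\Sigma}^{-1}\right)_{ij} $$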
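The three steps above can be sketched in one dimension. This is a minimal example under assumed conditions: a hypothetical model with a uniform prior on a success probability $p$ and a binomial likelihood with 7 successes in 10 trials, moved to the unconstrained scale $\theta = \operatorname{logit}(p)$ (which adds the log-Jacobian $\log p(1-p)$). The mode is found with Newton's method on the negative log posterior, and both derivatives come from central finite differences; the helper names and step sizes are illustrative choices, not a prescribed implementation.

```python
import math

K, N = 7, 10  # hypothetical data: 7 successes in 10 trials


def neg_log_post(theta):
    """Negative unnormalized log posterior on the unconstrained scale theta = logit(p).

    Assumed model: uniform prior on p in (0, 1) and a binomial likelihood.
    The log-Jacobian log(p * (1 - p)) of the transform raises each exponent by 1.
    """
    log_p = -math.log1p(math.exp(-theta))    # log sigmoid(theta)
    log_1mp = -math.log1p(math.exp(theta))   # log(1 - sigmoid(theta))
    return -((K + 1) * log_p + (N - K + 1) * log_1mp)


def d1(f, x, h=1e-5):
    """Central finite-difference first derivative."""
    return (f(x + h) - f(x - h)) / (2 * h)


def d2(f, x, h=1e-4):
    """Central finite-difference second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2


def laplace_1d(f, x0=0.0, tol=1e-8, max_iter=100):
    """Minimize f (a negative log density) with Newton's method.

    Returns the mode and H = f''(mode), the curvature of the negative log posterior.
    """
    x = x0
    for _ in range(max_iter):
        step = d1(f, x) / d2(f, x)
        x -= step
        if abs(step) < tol:
            break
    return x, d2(f, x)


theta_hat, H = laplace_1d(neg_log_post)
sigma = 1.0 / math.sqrt(H)  # in one dimension, Sigma = H^{-1} is a variance
print(f"theta_hat={theta_hat:.6f}  H={H:.6f}  sd={sigma:.6f}")
```

For this particular model the mode and curvature also have closed forms, $\hat{\theta} = \log 2$ and $H = 8/3$, so the numerical values can be checked against them.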
$$ p(\boldsymbol{\theta}\mid\mathbf{x})=\frac{p(\mathbf{x}\mid \boldsymbol{\theta})\,p(\boldsymbol{\theta})}{p(\mathbf{x})}=\frac{p(\mathbf{x}, \boldsymbol{\theta})}{\int_{\boldsymbol{\theta}} p(\mathbf{x}, \boldsymbol{\theta})\,d \boldsymbol{\theta}} $$
$$ \frac{1}{Z} = \int_{\boldsymbol{\theta}} p(\mathbf{x}, \boldsymbol{\theta})\,d \boldsymbol{\theta} $$
$$ p(\boldsymbol{\theta}\mid \mathbf{x})=Z \times p(\mathbf{x}, \boldsymbol{\theta}) $$
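A quick numerical check of these identities, under an assumed toy model: a uniform prior on $\theta \in (0, 1)$ and a binomial likelihood with 7 successes in 10 trials, for which the evidence is known to be exactly $1/(n + 1)$. The integral is approximated with composite Simpson's rule; `joint`, `integrate`, and `posterior` are hypothetical helper names.

```python
import math

K, N = 7, 10  # hypothetical data: 7 successes in 10 trials


def joint(theta):
    """p(x, theta) = p(x | theta) p(theta) with an assumed uniform prior on (0, 1)."""
    return math.comb(N, K) * theta**K * (1 - theta) ** (N - K)


def integrate(f, a=0.0, b=1.0, m=10_000):
    """Composite Simpson's rule on [a, b] with m (even) subintervals."""
    h = (b - a) / m
    total = f(a) + f(b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


evidence = integrate(joint)  # p(x), i.e. 1 / Z
Z = 1.0 / evidence


def posterior(theta):
    """p(theta | x) = Z * p(x, theta)."""
    return Z * joint(theta)


print(f"p(x) = {evidence:.10f}  (exact 1/(N+1) = {1 / (N + 1):.10f})")
print(f"integral of the posterior = {integrate(posterior):.10f}")
```

Grid-based integration like this scales poorly with the number of parameters, which is one reason to approximate the posterior with a normal distribution instead.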
$$ p(\boldsymbol{\theta}\mid \mathbf{x})=Z\, p(\mathbf{x}, \boldsymbol{\theta}) \approx q(\boldsymbol{\theta}) $$
$$ q(\boldsymbol{\theta})=\mathcal{N}\left(\boldsymbol{\theta}\mid \boldsymbol{\mu}=\hat{\boldsymbol{\theta}},\ \boldsymbol{\Sigma}=H^{-1}\right) $$
$$ \hat{\boldsymbol{\theta}}=\arg\max_{\boldsymbol{\theta}}\; \log p(\mathbf{x}, \boldsymbol{\theta}) $$
$$ H= -\nabla_{\boldsymbol{\theta}}^{2}\, \log p(\mathbf{x}, \boldsymbol{\theta})\Big|_{\boldsymbol{\theta}=\hat{\boldsymbol{\theta}}} $$
$$ \log p(\mathbf{x}, \boldsymbol{\theta})\approx \log q(\boldsymbol{\theta}) + \text{const},\quad q(\boldsymbol{\theta})=\mathcal{N}\left(\boldsymbol{\theta}\mid \boldsymbol{\mu}=\hat{\boldsymbol{\theta}},\ \boldsymbol{\Sigma}=H^{-1}\right) $$
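A two-parameter sketch of computing $\hat{\boldsymbol{\theta}}$, $H$, and $\boldsymbol{\Sigma} = H^{-1}$. The model is an assumption for illustration only: logistic regression with an intercept and a slope, an independent $\mathcal{N}(0, 1)$ prior on each, and five made-up data points. $\hat{\boldsymbol{\theta}}$ is found by Newton's method with step halving on the negative log posterior, and $H$ is the analytic Hessian of the negative log posterior at $\hat{\boldsymbol{\theta}}$, matching the definition of $H$ above.

```python
import math

# Hypothetical data and prior: logistic regression with theta = (intercept, slope)
# and an independent N(0, 1) prior on each component.
XS = [-2.0, -1.0, 0.0, 1.0, 2.0]
YS = [0, 0, 1, 0, 1]


def neg_log_post(t):
    """-log p(x, theta), dropping additive constants."""
    a, b = t
    total = 0.5 * (a * a + b * b)                 # N(0, 1) priors
    for x, y in zip(XS, YS):
        z = a + b * x
        total += math.log1p(math.exp(z)) - y * z  # -log Bernoulli(y | sigmoid(z))
    return total


def grad_hess(t):
    """Gradient and Hessian of neg_log_post; the Hessian at the mode is H."""
    a, b = t
    g = [a, b]                                    # prior contribution
    h = [[1.0, 0.0], [0.0, 1.0]]
    for x, y in zip(XS, YS):
        p = 1.0 / (1.0 + math.exp(-(a + b * x)))
        w = p * (1.0 - p)
        g[0] += p - y
        g[1] += (p - y) * x
        h[0][0] += w
        h[0][1] += w * x
        h[1][0] += w * x
        h[1][1] += w * x * x
    return g, h


def inv2(m):
    """Inverse of a 2x2 matrix."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


def find_mode(t=(0.0, 0.0), tol=1e-10, max_iter=50):
    """Newton's method with step halving on the negative log posterior."""
    t = list(t)
    for _ in range(max_iter):
        g, h = grad_hess(t)
        h_inv = inv2(h)
        step = [h_inv[0][0] * g[0] + h_inv[0][1] * g[1],
                h_inv[1][0] * g[0] + h_inv[1][1] * g[1]]
        f0, scale = neg_log_post(t), 1.0
        # Halve the step until the negative log posterior does not increase.
        while scale > 1e-8 and neg_log_post([t[0] - scale * step[0], t[1] - scale * step[1]]) > f0:
            scale *= 0.5
        t = [t[0] - scale * step[0], t[1] - scale * step[1]]
        if scale * max(abs(s) for s in step) < tol:
            break
    return t


theta_hat = find_mode()
_, H = grad_hess(theta_hat)
Sigma = inv2(H)  # covariance of q(theta)
print("theta_hat =", theta_hat)
print("Sigma =", Sigma)
```

Because the prior makes this negative log posterior strictly convex, Newton steps with halving are a reasonable choice here; for multimodal targets the result depends on the starting point.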
$$ \log p(\mathbf{x}, \boldsymbol{\theta}) \approx \log p(\mathbf{x}, \hat{\boldsymbol{\theta}}) + (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top \nabla_{\boldsymbol{\theta}} \log p(\mathbf{x}, \hat{\boldsymbol{\theta}}) - \frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}) $$
Because $\hat{\boldsymbol{\theta}}$ is a maximum, the gradient term vanishes:
$$ \log p(\mathbf{x}, \boldsymbol{\theta}) \approx \log p(\mathbf{x}, \hat{\boldsymbol{\theta}}) - \frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}}) $$
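To see how good the quadratic expansion is, this sketch compares it with the exact log joint for an assumed one-parameter model (uniform prior on a probability $p$, 7 successes in 10 trials, $\theta = \operatorname{logit}(p)$ with the Jacobian included). The closed-form mode and $H$ in the code come from setting the first derivative $(K+1) - (N+2)\,\mathrm{sigmoid}(\theta)$ to zero; the constants and helper names are hypothetical.

```python
import math

K, N = 7, 10  # hypothetical data: 7 successes in 10 trials, uniform prior on p


def log_joint(theta):
    """log p(x, theta) up to a constant, with theta = logit(p) (Jacobian included)."""
    return -(K + 1) * math.log1p(math.exp(-theta)) - (N - K + 1) * math.log1p(math.exp(theta))


# Closed-form mode and curvature for this model.
theta_hat = math.log((K + 1) / (N - K + 1))
H = (K + 1) * (N - K + 1) / (N + 2)  # H = -d^2/dtheta^2 log p(x, theta) at theta_hat


def taylor2(theta):
    """Second-order expansion around the mode; the gradient term is zero there."""
    d = theta - theta_hat
    return log_joint(theta_hat) - 0.5 * H * d * d


for d in (0.1, 0.5, 1.0, 2.0):
    exact, approx = log_joint(theta_hat + d), taylor2(theta_hat + d)
    print(f"offset {d:.1f}: exact {exact:.4f}  quadratic {approx:.4f}  error {exact - approx:+.4f}")
```

The printed error stays small close to $\hat{\theta}$ and grows with the offset, which is where a normal approximation to a skewed posterior starts to break down.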
$$ \begin{align*} p(\mathbf{x})=\int_{\boldsymbol{\theta}} p(\mathbf{x}, \boldsymbol{\theta})\,d \boldsymbol{\theta} &= \int_{\boldsymbol{\theta}} \exp \left[ \log p(\mathbf{x}, \boldsymbol{\theta})\right] d \boldsymbol{\theta}\\ &\approx \int p(\mathbf{x}, \hat{\boldsymbol{\theta}}) \exp\left[-\frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})\right] d\boldsymbol{\theta} \end{align*} $$
$$ p(\mathbf{x}) \approx p(\mathbf{x}, \hat{\boldsymbol{\theta}}) \int \exp\left[-\frac{1}{2} (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})^\top H (\boldsymbol{\theta} - \hat{\boldsymbol{\theta}})\right] d\boldsymbol{\theta} $$
The remaining integral is the normalizing constant of a $d$-dimensional normal distribution with covariance $H^{-1}$, where $d$ is the dimension of $\boldsymbol{\theta}$:
$$ p(\mathbf{x}) \approx p(\mathbf{x}, \hat{\boldsymbol{\theta}}) \left(\frac{(2\pi)^d}{\lvert H \rvert}\right)^{1/2} $$
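The evidence approximation can be checked against a case with a known answer. Under an assumed model (uniform prior on a probability $p$, binomial likelihood with $k$ successes in $n$ trials, $\theta = \operatorname{logit}(p)$ so $d = 1$), the exact evidence is $1/(n + 1)$. `laplace_evidence` is a hypothetical helper that plugs the closed-form mode and $H$ for this model into the formula above.

```python
import math


def laplace_evidence(k, n):
    """Laplace estimate of p(x) for an assumed binomial model with a uniform prior.

    Works in theta = logit(p), where log p(x, theta) =
    log C(n, k) + (k + 1) log p + (n - k + 1) log(1 - p)  (Jacobian included).
    """
    a, b = k + 1, n - k + 1
    p_hat = a / (a + b)                 # mode in p; theta_hat = logit(p_hat)
    log_joint_hat = math.log(math.comb(n, k)) + a * math.log(p_hat) + b * math.log1p(-p_hat)
    H = (a + b) * p_hat * (1.0 - p_hat)  # H = -d^2/dtheta^2 log p(x, theta) at theta_hat
    d = 1                                # one parameter
    return math.exp(log_joint_hat) * ((2 * math.pi) ** d / H) ** 0.5


for k, n in [(7, 10), (70, 100)]:
    approx = laplace_evidence(k, n)
    exact = 1.0 / (n + 1)               # exact evidence under a uniform prior
    print(f"k={k}, n={n}: Laplace {approx:.6f}, exact {exact:.6f}, ratio {approx / exact:.4f}")
```

With the same success rate and ten times as many trials, the ratio should move closer to 1, consistent with the approximation improving as the posterior becomes more concentrated around its mode.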