Using Bayes’ Theorem to infer the posterior distribution of unknown parameters from data

# Approach

# Maximum likelihood estimation

$$
\begin{align*}
&P(\text{param} = \theta|\text{data} = d) \\
&= \frac{P(\text{data} = d|\text{param} = \theta)\,P(\text{param} = \theta)}{P(\text{data} = d)}
\end{align*}
$$

To get a 99% interval that contains the true value of the parameter $p$ (assuming the model is correct), we compute a 99% interval of the [[posterior distribution]] $P(\text{param} = \theta|\text{data} = d)$.

# Approximate by sampling

- Define the [[prior distribution]] $P(\text{param} = \theta)$
- Define the [[likelihood function|likelihood function]]
- Build the [[prior-predictive distribution]] (all done before seeing current data)
- Let the model produce the [[posterior distribution]] and generate samples from it. If our model and sampling are good, the resulting samples will have the same proportions as the exact posterior [[probability density function|PDF]]
- Let the model produce the [[posterior-predictive distribution]] and generate samples from it.

> [!important]- Sample code
>
> ```python
> import pymc as pm
> import numpy as np
> import scipy.stats as sts
> import matplotlib.pyplot as plt
>
> data = {
>     'treatment': {
>         'improved': 107,
>         'patients': 141},
>     'control': {
>         'improved': 57,
>         'patients': 121}}
>
> with pm.Model() as medical_model:
>     # Prior as Beta
>     p = pm.Beta('p', alpha=1, beta=1)
>
>     # Likelihood as Binomial
>     x = pm.Binomial(
>         'x', n=data['treatment']['patients'], p=p,
>         observed=data['treatment']['improved'])
>
> with medical_model:
>     inference = pm.sample()
>
> # Posterior samples are stored in inference.posterior;
> # access the variable 'p' and flatten across chains and draws
> all_p_samples = inference.posterior.p.values.flatten()
>
> # Plot the samples from the approximate posterior as well as the exact solution
> plt.figure(figsize=(8, 4))
>
> plt.hist(
>     all_p_samples, bins=20, density=True,
>     edgecolor='white', label='approximation')
>
> # Exact posterior from the conjugate update:
> # Beta(1 + 107, 1 + 141 - 107) = Beta(108, 35)
> xs = np.linspace(0.5, 0.9, 500)
> ys = sts.beta.pdf(xs, 108, 35)
> plt.plot(xs, ys, label='exact')
>
> plt.title('Comparison of the sampling approximation and the exact posterior')
> plt.xlabel('p')
> plt.ylabel('probability density')
> plt.legend()
> plt.show()
> ```

## Sampling to summarize

### Intervals of defined boundaries

How much posterior probability lies below some parameter value (e.g. 0.5)?

```python
np.sum(samples < 0.5) / len(samples)
```

### Intervals of defined probability mass

Intervals of posterior probability are usually called *credible intervals*.

Which parameter value marks the lower 5% of the posterior probability?

```python
np.quantile(samples, 0.05)  # equivalently: np.percentile(samples, 5)
```

Which range of parameter values contains 80% of the posterior probability? Calculate the [[highest posterior density]]:

```python
import arviz as az
az.hdi(samples, hdi_prob=0.8)
```

### Point estimate

Which parameter value has the highest posterior probability? That value is the [[maximum a posteriori|MAP]] estimate.

## Sampling to simulate prediction

# Approximate by fitting a function

We approximate a complicated function (the posterior) with a simple function. Here, "simple" means it should be easy to generate samples from the approximation and easy to evaluate its PDF (e.g. a normal distribution).
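As a concrete illustration of this idea (a hypothetical sketch, separate from the PyMC workflow below): the exact posterior in the running example is Beta(108, 35), and a Laplace approximation fits a normal distribution to it by matching the mode and the curvature of the log-density there.

```python
# Hypothetical sketch: Laplace approximation of the Beta(108, 35) posterior.
import numpy as np
import scipy.stats as sts

a, b = 108, 35  # exact posterior from the conjugate Beta-Binomial update
mode = (a - 1) / (a + b - 2)  # mode of a Beta distribution

# Negative second derivative of the Beta log-density at the mode:
# d^2/dp^2 [(a-1) log p + (b-1) log(1-p)] = -(a-1)/p^2 - (b-1)/(1-p)^2
curvature = (a - 1) / mode**2 + (b - 1) / (1 - mode) ** 2
sigma = 1 / np.sqrt(curvature)

# The "simple" approximating function: cheap to sample and to evaluate
approx = sts.norm(loc=mode, scale=sigma)

# Compare the approximation against the exact posterior near the mode
print(approx.pdf(mode), sts.beta.pdf(mode, a, b))
```

With this much data the posterior is nearly Gaussian, so the simple function tracks the exact one closely; variational fitting (next) automates the same idea for models without a closed-form posterior.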
> [!important]- Sample code
>
> ```python
> import pymc as pm
> import numpy as np
> import scipy.stats as sts
> import matplotlib.pyplot as plt
>
> data = {
>     'treatment': {
>         'improved': 107,
>         'patients': 141},
>     'control': {
>         'improved': 57,
>         'patients': 121}}
>
> with pm.Model() as medical_model:
>     # Prior as Beta
>     p = pm.Beta('p', alpha=1, beta=1)
>
>     # Likelihood as Binomial
>     x = pm.Binomial(
>         'x', n=data['treatment']['patients'], p=p,
>         observed=data['treatment']['improved'])
>
> with medical_model:
>     # Fit a simple approximating distribution, then sample from it
>     approx = pm.fit()
>     inference = approx.sample(4000)
>
> # Access the variable 'p' in the posterior
> all_p_samples = inference.posterior.p.values.flatten()
>
> # Plot the samples from the approximate posterior as well as the exact solution
> plt.figure(figsize=(8, 4))
>
> plt.hist(
>     all_p_samples, bins=20, density=True,
>     edgecolor='white', label='approximation')
>
> xs = np.linspace(0.5, 0.9, 500)
> ys = sts.beta.pdf(xs, 108, 35)
> plt.plot(xs, ys, label='exact')
>
> plt.title('Comparison of the sampling approximation and the exact posterior')
> plt.xlabel('p')
> plt.ylabel('probability density')
> plt.legend()
> plt.show()
> ```
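Either way of drawing samples also answers the point-estimate question from *Sampling to summarize*: the [[maximum a posteriori|MAP]] can be read off as the mode of a kernel density estimate fitted to the draws. A minimal sketch (the `samples` array below is a synthetic stand-in for real posterior draws):

```python
import numpy as np
import scipy.stats as sts

# Synthetic stand-in for posterior draws, taken from the exact Beta(108, 35)
rng = np.random.default_rng(0)
samples = sts.beta.rvs(108, 35, size=4000, random_state=rng)

# Smooth the samples with a KDE and take the argmax of the fitted density
kde = sts.gaussian_kde(samples)
grid = np.linspace(samples.min(), samples.max(), 500)
map_estimate = grid[np.argmax(kde(grid))]

print(map_estimate)  # close to the exact mode 107/141 ≈ 0.759
```

The KDE step matters because raw samples only approximate probability mass, not density; the grid resolution and bandwidth trade off smoothness against bias in the recovered mode.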