Using Bayes’ Theorem to infer the posterior distribution of unknown parameters from data

# Approach

# Maximum likelihood estimation

$$
\begin{align*}
&P(\text{param} = \theta|\text{data} = d) \\
&= \frac{P(\text{data} = d|\text{param} = \theta)\,P(\text{param} = \theta)}{P(\text{data} = d)}
\end{align*}
$$

To get a 99% interval that contains the true value of the parameter $p$ (assuming the model is correct), we compute a 99% interval of the [[posterior distribution]] $P(\text{param} = \theta|\text{data} = d)$.

# Approximate by sampling

- Define the [[prior distribution]] $P(\text{param} = \theta)$
- Define the [[likelihood function|likelihood function]]
- Build the [[prior-predictive distribution]] (all done before seeing current data)
- Let the model produce the [[posterior distribution]] and generate samples from it. If our model and sampling are good, the resulting samples will have the same proportions as the exact posterior [[probability density function|PDF]]
- Let the model produce the [[posterior-predictive distribution]] and generate samples from it.

> [!important]- Sample code
>
> ```python
> import pymc as pm
> import numpy as np
> import scipy.stats as sts
> import matplotlib.pyplot as plt
>
> data = {
>     'treatment': {
>         'improved': 107,
>         'patients': 141},
>     'control': {
>         'improved': 57,
>         'patients': 121}}
>
> with pm.Model() as medical_model:
>     # Prior as Beta
>     p = pm.Beta('p', alpha=1, beta=1)
>
>     # Likelihood as Binomial
>     x = pm.Binomial(
>         'x', n=data['treatment']['patients'], p=p,
>         observed=data['treatment']['improved'])
>
> with medical_model:
>     inference = pm.sample()
>
> # Posterior samples are stored in inference.posterior;
> # access the variable 'p' and flatten across chains and draws
> all_p_samples = inference.posterior.p.values.flatten()
>
> # Plot the samples from the approximate posterior as well as the exact solution
> plt.figure(figsize=(8, 4))
>
> plt.hist(
>     all_p_samples, bins=20, density=True,
>     edgecolor='white', label='approximation')
>
> # Exact posterior from the conjugate update:
> # Beta(1 + 107, 1 + 141 - 107) = Beta(108, 35)
> xs = np.linspace(0.5, 0.9, 500)
> ys = sts.beta.pdf(xs, 108, 35)
> plt.plot(xs, ys, label='exact')
>
> plt.title('Comparison of the sampling approximation and the exact posterior')
> plt.xlabel('p')
> plt.ylabel('probability density')
> plt.legend()
> plt.show()
> ```

## Sampling to summarize

### Intervals of defined boundaries

How much posterior probability lies below some parameter value (e.g. 0.5)?

```python
np.sum(samples < 0.5) / len(samples)
```

### Intervals of defined probability mass

Intervals of posterior probability are usually called *credible intervals*.

Which parameter value marks the lower 5% of the posterior probability?

```python
np.quantile(samples, 0.05)  # equivalently: np.percentile(samples, 5)
```

Which range of parameter values contains 80% of the posterior probability? Calculate the [[highest posterior density]]:

```python
import arviz as az
az.hdi(samples, hdi_prob=0.8)
```

### Point estimate

Which parameter value has the highest posterior probability? That value is the [[maximum a posteriori|MAP]] estimate.

## Sampling to simulate prediction

# Approximate by fitting a function

We approximate a complicated function (the posterior) with a simple function. Here, "simple" means it should be easy to generate samples from the approximation and easy to evaluate its PDF (e.g. a normal distribution).
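As a concrete illustration of this idea (a hypothetical sketch, separate from the PyMC workflow below): the exact posterior in the running example is Beta(108, 35), and a Laplace approximation fits a normal distribution to it by matching the mode and the curvature of the log-density there.

```python
# Hypothetical sketch: Laplace approximation of the Beta(108, 35) posterior.
import numpy as np
import scipy.stats as sts

a, b = 108, 35  # exact posterior from the conjugate Beta-Binomial update
mode = (a - 1) / (a + b - 2)  # mode of a Beta distribution

# Negative second derivative of the Beta log-density at the mode:
# d^2/dp^2 [(a-1) log p + (b-1) log(1-p)] = -(a-1)/p^2 - (b-1)/(1-p)^2
curvature = (a - 1) / mode**2 + (b - 1) / (1 - mode) ** 2
sigma = 1 / np.sqrt(curvature)

# The "simple" approximating function: cheap to sample and to evaluate
approx = sts.norm(loc=mode, scale=sigma)

# Compare the approximation against the exact posterior near the mode
print(approx.pdf(mode), sts.beta.pdf(mode, a, b))
```

With this much data the posterior is nearly Gaussian, so the simple function tracks the exact one closely; variational fitting (next) automates the same idea for models without a closed-form posterior.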
> [!important]- Sample code
>
> ```python
> import pymc as pm
> import numpy as np
> import scipy.stats as sts
> import matplotlib.pyplot as plt
>
> data = {
>     'treatment': {
>         'improved': 107,
>         'patients': 141},
>     'control': {
>         'improved': 57,
>         'patients': 121}}
>
> with pm.Model() as medical_model:
>     # Prior as Beta
>     p = pm.Beta('p', alpha=1, beta=1)
>
>     # Likelihood as Binomial
>     x = pm.Binomial(
>         'x', n=data['treatment']['patients'], p=p,
>         observed=data['treatment']['improved'])
>
> with medical_model:
>     # Fit a simple approximating distribution, then sample from it
>     approx = pm.fit()
>     inference = approx.sample(4000)
>
> # Access the variable 'p' in the posterior
> all_p_samples = inference.posterior.p.values.flatten()
>
> # Plot the samples from the approximate posterior as well as the exact solution
> plt.figure(figsize=(8, 4))
>
> plt.hist(
>     all_p_samples, bins=20, density=True,
>     edgecolor='white', label='approximation')
>
> xs = np.linspace(0.5, 0.9, 500)
> ys = sts.beta.pdf(xs, 108, 35)
> plt.plot(xs, ys, label='exact')
>
> plt.title('Comparison of the sampling approximation and the exact posterior')
> plt.xlabel('p')
> plt.ylabel('probability density')
> plt.legend()
> plt.show()
> ```
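Either way of drawing samples also answers the point-estimate question from *Sampling to summarize*: the [[maximum a posteriori|MAP]] can be read off as the mode of a kernel density estimate fitted to the draws. A minimal sketch (the `samples` array below is a synthetic stand-in for real posterior draws):

```python
import numpy as np
import scipy.stats as sts

# Synthetic stand-in for posterior draws, taken from the exact Beta(108, 35)
rng = np.random.default_rng(0)
samples = sts.beta.rvs(108, 35, size=4000, random_state=rng)

# Smooth the samples with a KDE and take the argmax of the fitted density
kde = sts.gaussian_kde(samples)
grid = np.linspace(samples.min(), samples.max(), 500)
map_estimate = grid[np.argmax(kde(grid))]

print(map_estimate)  # close to the exact mode 107/141 ≈ 0.759
```

The KDE step matters because raw samples only approximate probability mass, not density; the grid resolution and bandwidth trade off smoothness against bias in the recovered mode.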