Using Bayes’ Theorem to compute the posterior probability of each model



Discrete

Process

  • Write down the two (or more) models you want to compare.
  • Write down the prior probability of each model. The prior probabilities of the models are often equal unless you have some reason to prefer one model over the other (before taking the data into consideration).
  • Compute the likelihood of each model, $p(D \mid M_i)$.
    • We assume the data were generated independently given the model, so the likelihood of the entire data set is the product of the likelihoods of each datum $d_j$: $$p(D \mid \theta, M_i) = \prod\limits_{j} p(d_j \mid \theta, M_i)$$
    • We need to sum (or integrate) over all parameter values $\theta$. From the [[law of total probability]], $$p(D \mid M_i) = \sum\limits_{\theta} p(D \mid \theta, M_i)\, p(\theta \mid M_i)$$
  • Use Bayes’ equation to compute the posterior probability of each model (see the sketch after this list).
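
A minimal sketch of this process in Python, assuming a hypothetical coin-flip example: $M_1$ is a fair coin ($\theta = 0.5$) and $M_2$ is a coin with unknown bias $\theta$ under a uniform prior, discretized on a grid so the sum in step 3 is literal. The data and grid size are illustrative, not from the source.

```python
import numpy as np

# Hypothetical data: 8 heads in 10 flips (1 = heads, 0 = tails).
data = np.array([1, 1, 1, 0, 1, 1, 0, 1, 1, 1])
n_heads, n_flips = int(data.sum()), data.size

# Steps 1-2: two models, equal prior probabilities.
prior_m1 = prior_m2 = 0.5

# Step 3a: likelihood of M1 (fair coin). Independence makes p(D|M1)
# a product, here 0.5 for every flip.
lik_m1 = 0.5 ** n_flips

# Step 3b: likelihood of M2 (unknown bias). Sum over a grid of theta values,
# p(D|M2) = sum_theta p(D|theta, M2) p(theta|M2), with a uniform prior on theta.
theta = np.linspace(0.01, 0.99, 99)
p_theta = np.full(theta.size, 1.0 / theta.size)
lik_m2 = np.sum(theta**n_heads * (1 - theta)**(n_flips - n_heads) * p_theta)

# Step 4: Bayes' equation; the denominator p(D) sums over both models.
evidence = lik_m1 * prior_m1 + lik_m2 * prior_m2
post_m1 = lik_m1 * prior_m1 / evidence
post_m2 = lik_m2 * prior_m2 / evidence

print(f"p(M1|D) = {post_m1:.3f}, p(M2|D) = {post_m2:.3f}")
```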

If we use Bayes’ Theorem (odds), assuming equal priors $p(M_1) = p(M_2)$, the ratio of the posterior probabilities is equal to the ratio of the likelihoods.
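
One line of algebra makes this explicit: dividing Bayes’ equation for $M_1$ by the same equation for $M_2$, the shared denominator $p(D)$ cancels, and with equal priors the prior terms cancel as well:

$$\frac{p(M_1 \mid D)}{p(M_2 \mid D)} = \frac{p(D \mid M_1)\, p(M_1)}{p(D \mid M_2)\, p(M_2)} = \frac{p(D \mid M_1)}{p(D \mid M_2)}$$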

Pros & Cons

  • Pro: Bayesian model selection is conceptually simple and it addresses the question we’re interested in, namely: what is the posterior probability of each of the models, $M_1$ and $M_2$, given the data?
  • Pro: We can easily extend this to more than 2 models by computing the posterior probability of each model using the general form of Bayes’ Theorem (spelled out after this list).
  • Con: For any somewhat complicated model, the posteriors can be computationally expensive to calculate; in particular, the denominator of Bayes’ equation is expensive to evaluate.
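
The general form referenced above, for models $M_1, \dots, M_N$:

$$p(M_i \mid D) = \frac{p(D \mid M_i)\, p(M_i)}{\sum\limits_{k=1}^{N} p(D \mid M_k)\, p(M_k)}$$

The denominator is the expensive part: each term $p(D \mid M_k)$ may itself require a sum or integral over that model’s parameters.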

Continuous

Given a dataset $D$, we want to estimate the value of an unknown parameter $\theta$ using Bayes’ Theorem:

$$p(\theta \mid D) = \frac{p(D \mid \theta)\, p(\theta)}{p(D)}$$

where the denominator is calculated using the [[law of total probability]]:

$$p(D) = \int p(D \mid \theta)\, p(\theta)\, d\theta$$
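
A minimal sketch of the continuous case in Python, assuming the same hypothetical coin-flip data and a uniform prior on the bias $\theta$; the integral in the denominator is approximated by a Riemann sum on a grid.

```python
import numpy as np

# Hypothetical data: 8 heads in 10 flips.
n_heads, n_flips = 8, 10

# Grid approximation of the continuous parameter theta in (0, 1).
theta = np.linspace(0.001, 0.999, 999)
d_theta = theta[1] - theta[0]

prior = np.ones_like(theta)                     # uniform prior p(theta) = 1
likelihood = theta**n_heads * (1 - theta)**(n_flips - n_heads)

# Denominator p(D) = integral of p(D|theta) p(theta) d(theta), as a Riemann sum.
evidence = np.sum(likelihood * prior) * d_theta

# Posterior density p(theta|D) evaluated on the grid.
posterior = likelihood * prior / evidence

print(f"posterior mode: theta = {theta[np.argmax(posterior)]:.3f}")   # near 0.8
print(f"posterior integrates to {np.sum(posterior) * d_theta:.3f}")   # ~1.0
```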