Introduction to Bayesian Inference
Source: Lecture 1: Introduction and Bernoulli data (BDA Ch. 1, 2.1–2.4).
The Main Question
Bayesian inference starts from a practical problem:
How should our uncertainty change after we see data?
The answer has three pieces:
- A model for how data are generated.
- A prior distribution for what we believed before seeing the data.
- A posterior distribution for what we believe after seeing the data.
The chapter builds these pieces in the simplest useful setting: Bernoulli data, where each observation is either success or failure.
Step 1: Write Down the Sampling Model
Suppose we observe binary data
$$y_1, y_2, \dots, y_n, \qquad y_i \in \{0, 1\}.$$
Let $y_i = 1$ mean success and $y_i = 0$ mean failure. The unknown success probability is $\theta$.
The Bernoulli model is
$$y_i \mid \theta \sim \text{Bernoulli}(\theta), \qquad i = 1, \dots, n.$$
This means:
- each observation has probability $\theta$ of being 1;
- each observation has probability $1 - \theta$ of being 0;
- the observations are conditionally independent given $\theta$.
Let
$$s = \sum_{i=1}^{n} y_i, \qquad f = n - s.$$
Here $s$ is the number of successes and $f$ is the number of failures.
Step 2: Build the Likelihood
What Is It For?
The likelihood tells us which parameter values make the observed data more or less plausible.
For Bernoulli data, multiplying the probabilities of all observations gives
$$p(y_1, \dots, y_n \mid \theta) = \prod_{i=1}^{n} \theta^{y_i} (1 - \theta)^{1 - y_i} = \theta^{s} (1 - \theta)^{f}.$$
As a function of $\theta$, this is the likelihood function.
How to Read the Formula
The term
$$\theta^{s}$$
rewards values of $\theta$ that make the observed successes plausible.
The term
$$(1 - \theta)^{f}$$
rewards values of $\theta$ that make the observed failures plausible.
If the data contain many successes, the likelihood is larger for larger $\theta$. If the data contain many failures, the likelihood is larger for smaller $\theta$.
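The reading above can be checked numerically. The following is a minimal sketch, assuming a small made-up data vector `y` and writing the success probability as `theta`; the function name `bernoulli_likelihood` is a hypothetical helper, not something from the lecture.

```python
def bernoulli_likelihood(theta, s, f):
    """Likelihood theta^s * (1 - theta)^f for s successes and f failures."""
    return theta**s * (1.0 - theta) ** f

# Hypothetical data: 1 = success, 0 = failure.
y = [1, 0, 1, 1, 0, 1, 1, 1]
s = sum(y)        # number of successes
f = len(y) - s    # number of failures

# Evaluate the likelihood on a grid of candidate theta values.
grid = [i / 100 for i in range(101)]
values = [bernoulli_likelihood(t, s, f) for t in grid]

# The grid point with the largest likelihood.
theta_hat = grid[values.index(max(values))]
print(s, f, theta_hat)
```

With mostly successes in `y`, the largest likelihood value sits at a grid point well above 0.5, which matches the verbal description.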
A Common Confusion
The likelihood is not a probability distribution over $\theta$.
In the sampling model, $\theta$ is fixed and the data are random. In the likelihood, the data are fixed and $\theta$ is varied. The same expression is used, but the interpretation changes.
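One way to see that the likelihood is not a density over the success probability is to integrate it over the unit interval. The sketch below assumes hypothetical counts of 6 successes and 2 failures and uses a simple midpoint rule; the exact value of the integral is the Beta function B(s + 1, f + 1), which is generally not 1.

```python
import math

def likelihood(theta, s, f):
    """Bernoulli likelihood theta^s * (1 - theta)^f."""
    return theta**s * (1.0 - theta) ** f

def integrate(func, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    width = (b - a) / steps
    return width * sum(func(a + (k + 0.5) * width) for k in range(steps))

s, f = 6, 2  # hypothetical counts, chosen only for illustration

area = integrate(lambda t: likelihood(t, s, f), 0.0, 1.0)

# Exact value: B(s + 1, f + 1) = s! f! / (s + f + 1)!
exact = math.factorial(s) * math.factorial(f) / math.factorial(s + f + 1)
print(area, exact)
```

Dividing the likelihood by this area would turn it into a density, which is what a flat prior followed by normalization does.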
Step 3: Represent Prior Uncertainty
Bayesian probability is allowed to describe uncertainty about fixed but unknown quantities.
For example,
$$\Pr(\theta < 0.6 \mid y_1, \dots, y_n)$$
is meaningful in Bayesian inference. It means:
After seeing the data, how much posterior probability lies on values of $\theta$ below 0.6?
The prior distribution represents uncertainty before observing the current data. It may come from previous studies, expert knowledge, or a deliberately weak starting point.
The important rule is that the prior must be stated before it is combined with the current likelihood.
Step 4: Use Bayes’ Theorem
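The Formula
Bayes’ theorem updates the prior by the likelihood. Writing $y = (y_1, \dots, y_n)$ for the observed data,
$$p(\theta \mid y) = \frac{p(y \mid \theta)\, p(\theta)}{p(y)}.$$
The denominator is
$$p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta.$$
It makes the posterior integrate to one.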
The Working Version
For many calculations, we first use the proportional form:
$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta).$$
In words:
$$\text{posterior} \propto \text{likelihood} \times \text{prior}.$$
This is the central update rule of Bayesian inference.
What Each Piece Does
- The prior says which parameter values were plausible before the data.
- The likelihood says which parameter values are supported by the data.
- The posterior combines both sources of information.
- The normalizing constant makes the result a valid probability distribution.
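The four pieces can be put together in a grid approximation. This is a rough sketch rather than a recommended method: it assumes hypothetical counts (6 successes, 2 failures), a flat prior, and a hypothetical helper `grid_posterior` that replaces the normalizing integral with a sum over grid points.

```python
def grid_posterior(prior, likelihood, grid):
    """Normalize prior * likelihood over a discrete grid of parameter values."""
    unnormalized = [prior(t) * likelihood(t) for t in grid]
    total = sum(unnormalized)  # discrete stand-in for the normalizing constant
    return [u / total for u in unnormalized]

s, f = 6, 2  # hypothetical counts
grid = [(k + 0.5) / 1000 for k in range(1000)]  # midpoints of 1000 bins in (0, 1)

posterior = grid_posterior(
    prior=lambda t: 1.0,  # flat prior, assumed here only for illustration
    likelihood=lambda t: t**s * (1.0 - t) ** f,
    grid=grid,
)

# Posterior probability that the success probability is below 0.6.
prob_below = sum(p for t, p in zip(grid, posterior) if t < 0.6)
print(sum(posterior), prob_below)
```

The grid weights sum to one by construction, so statements such as "posterior probability below 0.6" become simple sums.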
Example: A Rare Disease Test
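Let $D$ be the event that a person has a very rare disease, with a small prior probability $\Pr(D)$.
Let $T$ be the event that the test is positive, with true-positive rate $\Pr(T \mid D)$ and false-positive rate $\Pr(T \mid D^c)$.
Bayes’ theorem gives
$$\Pr(D \mid T) = \frac{\Pr(T \mid D)\, \Pr(D)}{\Pr(T \mid D)\, \Pr(D) + \Pr(T \mid D^c)\, \Pr(D^c)}.$$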
Substituting the numbers,
The positive test matters: the probability increased by a factor of about 18. But the disease is so rare that the posterior probability is still small.
This example shows why both ingredients matter. A strong likelihood signal can still lead to a small posterior probability when the prior probability is extremely low.
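The same calculation can be sketched in code. The prevalence, sensitivity, and false-positive rate below are hypothetical values picked for illustration and should not be read as the lecture's numbers; `posterior_probability` is likewise an assumed helper name.

```python
def posterior_probability(prior, sensitivity, false_positive_rate):
    """P(disease | positive test) by Bayes' theorem for two hypotheses."""
    true_positives = sensitivity * prior
    false_positives = false_positive_rate * (1.0 - prior)
    return true_positives / (true_positives + false_positives)

# Hypothetical inputs, not taken from the lecture.
prior = 0.001
posterior = posterior_probability(
    prior, sensitivity=0.95, false_positive_rate=0.05
)

# The ratio shows how much the positive test moved the probability.
print(posterior, posterior / prior)
```

Under these assumed inputs the posterior is many times larger than the prior yet still only a few percent, which is the pattern the example describes.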
The Beta Prior for a Bernoulli Probability
Why Use a Beta Distribution?
The success probability $\theta$ must lie between 0 and 1. The Beta distribution is a flexible distribution on this interval.
Write
$$\theta \sim \text{Beta}(\alpha, \beta).$$
Its density is
$$p(\theta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\, \Gamma(\beta)}\, \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}, \qquad 0 < \theta < 1.$$
The parameters $\alpha$ and $\beta$ control the prior shape. Roughly, larger $\alpha$ supports more prior successes, and larger $\beta$ supports more prior failures.
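To get a feel for how the two Beta parameters shape the prior, the density can be evaluated directly. This is a small sketch using only the standard library; the parameter pairs are arbitrary illustrations, and `beta_pdf` is a hypothetical helper that works on the log scale via `math.lgamma` to avoid overflow.

```python
import math

def beta_pdf(theta, alpha, beta):
    """Beta(alpha, beta) density at theta, for theta strictly inside (0, 1)."""
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_kernel = (alpha - 1) * math.log(theta) + (beta - 1) * math.log(1 - theta)
    return math.exp(log_norm + log_kernel)

grid = [(k + 0.5) / 2000 for k in range(2000)]  # midpoints in (0, 1)

# Arbitrary parameter pairs: flat, leaning toward failure, leaning toward success.
for alpha, beta in [(1, 1), (2, 5), (5, 2)]:
    area = sum(beta_pdf(t, alpha, beta) for t in grid) / len(grid)
    mean = sum(t * beta_pdf(t, alpha, beta) for t in grid) / len(grid)
    print(alpha, beta, round(area, 4), round(mean, 4))
```

Each density integrates to approximately one, and the numerical mean moves toward 1 when the first parameter is larger and toward 0 when the second is larger.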
The Bernoulli-Beta Update
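Ingredient 1: Likelihood
For $s$ successes and $f$ failures,
$$p(y \mid \theta) = \theta^{s} (1 - \theta)^{f}.$$
Ingredient 2: Prior
The Beta prior has the kernel
$$p(\theta) \propto \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}.$$
Step 3: Multiply
Bayes’ theorem gives
$$p(\theta \mid y) \propto \theta^{s} (1 - \theta)^{f} \cdot \theta^{\alpha - 1} (1 - \theta)^{\beta - 1}.$$
Collecting powers,
$$p(\theta \mid y) \propto \theta^{\alpha + s - 1} (1 - \theta)^{\beta + f - 1}.$$
This is the kernel of a Beta distribution, so
$$\theta \mid y \sim \text{Beta}(\alpha + s,\ \beta + f).$$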
How to Remember It
The update is:
Add observed successes to the first Beta parameter. Add observed failures to the second Beta parameter.
The Beta prior is conjugate to the Bernoulli likelihood because the posterior is again a Beta distribution.
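Because the posterior stays in the Beta family, the whole update reduces to two additions. A minimal sketch, with a hypothetical Beta(2, 2) prior and made-up counts; the function names are illustrative assumptions.

```python
def beta_update(alpha, beta, s, f):
    """Posterior Beta parameters after s successes and f failures."""
    return alpha + s, beta + f

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

# Hypothetical prior and data.
alpha_post, beta_post = beta_update(2.0, 2.0, s=6, f=2)
print(alpha_post, beta_post, beta_mean(alpha_post, beta_post))
```

No integration is needed: the normalizing constant is already known from the Beta density.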
Example: Spam Email
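George examines 4601 emails. Of these, 1813 are spam and 2788 are not spam.
Let $y_i = 1$ if email $i$ is spam. The model is
$$y_i \mid \theta \sim \text{Bernoulli}(\theta), \qquad i = 1, \dots, 4601.$$
With prior
$$\theta \sim \text{Beta}(\alpha, \beta),$$
the posterior is
$$\theta \mid y \sim \text{Beta}(\alpha + 1813,\ \beta + 2788).$$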
What This Example Teaches
When the sample size $n$ is small, different reasonable priors can noticeably change the posterior.
When $n$ is moderate, the prior still matters, but less.
When $n$ is large, the likelihood is so concentrated that reasonable priors give very similar posteriors.
The practical lesson is not that priors are irrelevant. It is that prior influence depends on how much information the likelihood contains.
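The dependence on sample size can be illustrated by comparing posterior means under several priors. In this sketch the three priors and the small sample of 10 emails are hypothetical; only the counts 1813 and 2788 come from the example.

```python
def posterior_mean(alpha, beta, s, f):
    """Mean of the Beta(alpha + s, beta + f) posterior."""
    return (alpha + s) / (alpha + beta + s + f)

# Hypothetical priors, chosen only to differ from one another.
priors = {"flat": (1, 1), "spam-heavy": (8, 2), "spam-light": (2, 8)}

# The full data set from the example, plus a hypothetical small sample.
datasets = {"n=4601": (1813, 2788), "n=10": (4, 6)}

# Spread = largest minus smallest posterior mean across the priors.
spread = {}
for label, (s, f) in datasets.items():
    means = [posterior_mean(a, b, s, f) for a, b in priors.values()]
    spread[label] = max(means) - min(means)
    print(label, [round(m, 3) for m in means], round(spread[label], 4))
```

For the small sample the posterior means disagree substantially across priors, while for the full data set they agree to roughly two decimal places.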
The Likelihood Principle
The Principle
The likelihood principle says:
Once the data have been observed, all evidence about $\theta$ contained in the data is in the likelihood function.
This means that two experiments with proportional likelihoods should lead to the same inference about $\theta$ if the same prior is used.
Three Ways to Observe Bernoulli Data
Suppose the observed data contain $s$ successes and $f$ failures, with $n = s + f$ trials in total.
In a fixed-order Bernoulli experiment, the likelihood is
$$\theta^{s} (1 - \theta)^{f}.$$
In a binomial experiment with fixed $n$, the likelihood is
$$\binom{n}{s}\, \theta^{s} (1 - \theta)^{f}.$$
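In a negative binomial experiment with fixed $s$, where sampling continues until the $s$-th success and $n = s + f$ is the resulting number of trials, the likelihood is
$$\binom{n - 1}{s - 1}\, \theta^{s} (1 - \theta)^{f}.$$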
The binomial coefficients do not depend on $\theta$. Therefore all three likelihoods are proportional as functions of $\theta$.
Why the Bayesian Posterior Is the Same
Bayes’ theorem multiplies the likelihood by the prior and then normalizes. Constants that do not depend on $\theta$ cancel during normalization.
So, with the same prior, all three sampling schemes produce the same posterior:
$$p(\theta \mid y) \propto \theta^{s} (1 - \theta)^{f}\, p(\theta).$$
This is why Bayesian inference respects the likelihood principle in this setting.
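The cancellation can be verified on a grid. The sketch below assumes hypothetical counts, a Beta(2, 2) prior kernel, and a negative binomial design that stops at the last observed success; these choices are illustrative, and the constants are the usual binomial coefficients for each design.

```python
import math

def normalize(values):
    """Scale a list of non-negative numbers so that it sums to one."""
    total = sum(values)
    return [v / total for v in values]

s, f = 6, 2  # hypothetical counts
n = s + f
grid = [(k + 0.5) / 200 for k in range(200)]

def kernel(t):
    """Shared part of all three likelihoods."""
    return t**s * (1.0 - t) ** f

def prior(t):
    """Beta(2, 2) kernel, an illustrative choice of prior."""
    return t * (1.0 - t)

# The likelihoods differ only by constants that do not involve theta.
# The negative binomial constant assumes sampling stops at the s-th success.
likelihoods = {
    "ordered": [kernel(t) for t in grid],
    "binomial": [math.comb(n, s) * kernel(t) for t in grid],
    "negative binomial": [math.comb(n - 1, s - 1) * kernel(t) for t in grid],
}

posteriors = {
    name: normalize([lik * prior(t) for lik, t in zip(values, grid)])
    for name, values in likelihoods.items()
}

reference = posteriors["ordered"]
for name, post in posteriors.items():
    print(name, max(abs(p - r) for p, r in zip(post, reference)))
```

The printed differences are zero up to floating-point rounding, since each constant multiplies both the numerator and the normalizing sum.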
Study Questions
- In the Bernoulli model, what information from the data enters the likelihood?
- Why is the likelihood not itself a probability distribution over $\theta$?
- What does the normalizing constant in Bayes’ theorem do?
- In the Beta-Bernoulli update, why do successes add to $\alpha$ and failures add to $\beta$?
- Why do proportional likelihoods lead to the same Bayesian posterior when the prior is the same?
Chapter Summary
Bayesian inference updates uncertainty by combining a prior distribution with a likelihood. The likelihood measures how well different parameter values explain the observed data, while the prior represents uncertainty before the data. Bayes’ theorem turns these into the posterior distribution. In the Bernoulli-Beta model, the update is especially simple: add observed successes and failures to the two Beta parameters. This example also shows the likelihood principle: Bayesian inference depends on the likelihood as a function of the parameter, not on sampling details that only multiply the likelihood by constants.