Normal and Poisson Models
Source: Lecture 2 — Normal and Poisson data. Prior elicitation (BDA Ch. 2).
The Goal of This Chapter
The Bernoulli model handled binary data. This chapter adds two common data types:
- Continuous measurements, using the Normal model.
- Counts, using the Poisson model.
The Bayesian workflow is the same:
- Choose a likelihood.
- Choose a prior.
- Multiply prior and likelihood.
- Read the posterior distribution.
The new idea is that different likelihoods have different convenient conjugate priors.
Normal Data with Known Variance
What Problem Are We Solving?
Suppose the observations $y_1, \dots, y_n$ are continuous and centered around an unknown mean $\theta$. The observation variance $\sigma^2$ is assumed known.
The model is
$$y_1, \dots, y_n \mid \theta \overset{\text{iid}}{\sim} N(\theta, \sigma^2).$$
The parameter of interest is the mean $\theta$.
Start With the Likelihood
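For the Normal model, the data enter the likelihood through the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
As a function of $\theta$, the likelihood is proportional to
$$p(y_1, \dots, y_n \mid \theta) \propto \exp\left(-\frac{n}{2\sigma^2}(\theta - \bar{y})^2\right).$$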
How to Read the Formula
This is the kernel of a Normal density in $\theta$.
It is centered at
$$\bar{y}$$
and has variance
$$\frac{\sigma^2}{n}.$$
So the likelihood favors values of $\theta$ near the sample mean. As $n$ increases, the likelihood becomes narrower.
Case 1: Flat Prior for the Normal Mean
A flat prior is written
$$p(\theta) \propto 1.$$
It gives equal prior weight to all values of $\theta$. This is an improper prior, but in this simple model it leads to a proper posterior when data are observed.
Posterior
Because the prior is constant, the posterior has the same shape as the likelihood:
$$\theta \mid y_1, \dots, y_n \sim N\left(\bar{y}, \frac{\sigma^2}{n}\right).$$
Interpretation
With a flat prior, the posterior mean is the sample mean $\bar{y}$ and the posterior standard deviation is
$$\frac{\sigma}{\sqrt{n}}.$$
This is the usual standard error, now interpreted as posterior uncertainty about $\theta$.
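As a minimal sketch of the flat-prior result, the snippet below computes the posterior mean (the sample mean) and the posterior standard deviation (the known standard deviation divided by the square root of the sample size). The function name `flat_prior_posterior`, the measurements, and the assumed known variance of 4.0 are all hypothetical and chosen only for illustration.

```python
import math
import statistics

def flat_prior_posterior(y, sigma_sq):
    """Posterior mean and sd of a Normal mean under a flat prior, known variance."""
    n = len(y)
    post_mean = statistics.fmean(y)      # posterior mean equals the sample mean
    post_sd = math.sqrt(sigma_sq / n)    # posterior sd equals sigma / sqrt(n)
    return post_mean, post_sd

# Hypothetical measurements; the variance 4.0 is assumed known, not estimated.
y = [9.8, 10.4, 11.1, 9.5, 10.2, 10.6, 9.9, 10.5]
post_mean, post_sd = flat_prior_posterior(y, sigma_sq=4.0)
print(f"posterior mean = {post_mean:.3f}, posterior sd = {post_sd:.3f}")
```

Quadrupling the number of observations would halve the posterior standard deviation, which is the narrowing described above.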
Case 2: Normal Prior for the Normal Mean
Why This Prior?
A Normal prior is natural when prior knowledge says that $\theta$ is probably near some value $\mu_0$, with uncertainty measured by the prior standard deviation $\tau_0$.
Write
$$\theta \sim N(\mu_0, \tau_0^2).$$
This prior is conjugate for the Normal likelihood with known variance.
The Posterior Formula
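The posterior is Normal:
$$\theta \mid y_1, \dots, y_n \sim N(\mu_n, \tau_n^2).$$
The posterior precision is
$$\frac{1}{\tau_n^2} = \frac{n}{\sigma^2} + \frac{1}{\tau_0^2}.$$
The posterior mean is
$$\mu_n = w\,\bar{y} + (1 - w)\,\mu_0,$$
where
$$w = \frac{n/\sigma^2}{n/\sigma^2 + 1/\tau_0^2}.$$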
How to Read the Formula
Precision means inverse variance. The posterior precision is:
$$\text{posterior precision} = \text{data precision} + \text{prior precision}.$$
The posterior mean is a weighted average:
- the sample mean $\bar{y}$ receives more weight when the data are precise or $n$ is large;
- the prior mean $\mu_0$ receives more weight when the prior is precise.
This is the same Bayesian update as before, now expressed on the precision scale.
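The precision-scale update can be sketched in a few lines. The function below adds the data precision and the prior precision, then forms the posterior mean as a precision-weighted average of the sample mean and the prior mean. The name `normal_update`, its argument names, and the numbers in the example are hypothetical.

```python
def normal_update(mu0, tau0_sq, ybar, n, sigma_sq):
    """Normal prior + Normal likelihood (known variance) -> Normal posterior."""
    data_prec = n / sigma_sq          # precision contributed by the data
    prior_prec = 1.0 / tau0_sq        # precision contributed by the prior
    post_prec = data_prec + prior_prec
    w = data_prec / post_prec         # weight on the sample mean
    post_mean = w * ybar + (1.0 - w) * mu0
    post_var = 1.0 / post_prec
    return post_mean, post_var, w

# Hypothetical numbers: prior N(0, 1); four observations with mean 2.0 and
# known variance 4.0, so data precision and prior precision are both 1.
post_mean, post_var, w = normal_update(mu0=0.0, tau0_sq=1.0, ybar=2.0, n=4, sigma_sq=4.0)
print(post_mean, post_var, w)
```

Because the two precisions are equal in this example, the posterior mean lands exactly halfway between the prior mean and the sample mean.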
Sequential Updating
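Bayesian updating can be done one observation at a time.
For three observations that are independent given $\theta$,
$$p(\theta \mid y_1, y_2, y_3) \propto p(y_3 \mid \theta)\,p(y_2 \mid \theta)\,p(y_1 \mid \theta)\,p(\theta) \propto p(y_3 \mid \theta)\,p(\theta \mid y_1, y_2).$$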
The posterior after the first two observations becomes the prior before the third observation.
This is not a different method. It is the same Bayes’ theorem applied repeatedly.
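A small numerical check makes the equivalence concrete. The sketch below updates a Normal prior one observation at a time, then repeats the update in a single batch, and the two posteriors agree. The helper `update_one`, the three observations, the prior N(0, 10), and the known variance 1.0 are hypothetical.

```python
def update_one(mu, tau_sq, y, sigma_sq):
    """Update a N(mu, tau_sq) prior with a single observation y ~ N(theta, sigma_sq)."""
    post_prec = 1.0 / tau_sq + 1.0 / sigma_sq
    post_mean = (mu / tau_sq + y / sigma_sq) / post_prec
    return post_mean, 1.0 / post_prec

y = [1.2, 0.7, 1.9]        # hypothetical observations
sigma_sq = 1.0             # assumed known variance
mu0, tau0_sq = 0.0, 10.0   # hypothetical prior

# Sequential: each posterior becomes the prior for the next observation.
mu, tau_sq = mu0, tau0_sq
for obs in y:
    mu, tau_sq = update_one(mu, tau_sq, obs, sigma_sq)
seq_mean, seq_var = mu, tau_sq

# Batch: use all observations at once through n and the sample mean.
n = len(y)
ybar = sum(y) / n
batch_prec = 1.0 / tau0_sq + n / sigma_sq
batch_mean = (mu0 / tau0_sq + n * ybar / sigma_sq) / batch_prec
batch_var = 1.0 / batch_prec

print(seq_mean, batch_mean)
print(seq_var, batch_var)
```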
Example: Canadian Log Wages
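Suppose log-wages are modeled as
$$y_1, \dots, y_n \mid \theta \overset{\text{iid}}{\sim} N(\theta, \sigma^2), \qquad n = 205.$$
Use the vague prior
$$\theta \sim N(\mu_0, \tau_0^2), \qquad \tau_0^2 \text{ large}.$$
The data precision is
$$\frac{n}{\sigma^2} = \frac{205}{\sigma^2}.$$
The prior precision is
$$\frac{1}{\tau_0^2}.$$
Therefore
$$w = \frac{205/\sigma^2}{205/\sigma^2 + 1/\tau_0^2} \approx 1.$$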
The posterior mean is almost entirely determined by the sample mean. The prior is weak compared with the information in 205 observations.
Poisson Data
What Problem Are We Solving?
The Poisson model is used for counts: number of bids, number of calls, number of defects, or number of events in a fixed exposure period.
Let $\theta > 0$ be the event rate. The model is
$$y_1, \dots, y_n \mid \theta \overset{\text{iid}}{\sim} \mathrm{Poisson}(\theta).$$
For one observation,
$$p(y_i \mid \theta) = \frac{\theta^{y_i} e^{-\theta}}{y_i!}.$$
The Poisson Likelihood
Multiplying over observations gives
$$p(y_1, \dots, y_n \mid \theta) \propto \theta^{\sum_{i=1}^{n} y_i}\, e^{-n\theta}.$$
How to Read the Formula
The data enter through two quantities:
- the total count $\sum_{i=1}^{n} y_i$;
- the number of observations $n$.
The total count pulls $\theta$ upward. The exposure $n$ appears in $e^{-n\theta}$ and keeps the rate scaled per observation.
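The claim that only the total count and the number of observations matter can be checked numerically. In the sketch below, two hypothetical datasets share the same total and the same length, and their log-likelihood differences between two candidate rates coincide with the difference computed from the kernel alone. The function names and data are illustrative assumptions.

```python
import math

def poisson_loglik(theta, y):
    """Full Poisson log-likelihood, including the factorial terms."""
    return sum(yi * math.log(theta) - theta - math.lgamma(yi + 1) for yi in y)

def poisson_loglik_kernel(theta, total, n):
    """Log of the kernel theta**total * exp(-n * theta)."""
    return total * math.log(theta) - n * theta

# Two hypothetical datasets with the same total (16) and the same n (4).
y_a = [2, 5, 3, 6]
y_b = [4, 4, 4, 4]

# Differences between two candidate rates cancel the factorial constants.
diff_a = poisson_loglik(2.0, y_a) - poisson_loglik(5.0, y_a)
diff_b = poisson_loglik(2.0, y_b) - poisson_loglik(5.0, y_b)
diff_kernel = poisson_loglik_kernel(2.0, 16, 4) - poisson_loglik_kernel(5.0, 16, 4)
print(diff_a, diff_b, diff_kernel)
```

The factorial terms differ between the two datasets, but they do not involve the rate, so they drop out of any comparison between rate values.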
Gamma Prior for the Poisson Rate
Why This Prior?
The rate $\theta$ must be positive. The Gamma distribution is flexible on positive values and is conjugate to the Poisson likelihood.
Use the shape-rate parameterization:
$$\theta \sim \mathrm{Gamma}(\alpha, \beta),$$
with kernel
$$p(\theta) \propto \theta^{\alpha - 1} e^{-\beta\theta}.$$
The prior mean is
$$E(\theta) = \frac{\alpha}{\beta}.$$
The Poisson-Gamma Update
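Multiply likelihood and prior:
$$p(\theta \mid y_1, \dots, y_n) \propto \theta^{\sum_{i=1}^{n} y_i}\, e^{-n\theta} \cdot \theta^{\alpha - 1} e^{-\beta\theta}.$$
Collect terms:
$$p(\theta \mid y_1, \dots, y_n) \propto \theta^{\alpha + \sum_{i=1}^{n} y_i - 1}\, e^{-(\beta + n)\theta}.$$
Therefore
$$\theta \mid y_1, \dots, y_n \sim \mathrm{Gamma}\left(\alpha + \sum_{i=1}^{n} y_i,\; \beta + n\right).$$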
How to Remember It
The Gamma-Poisson update is:
$$\alpha \;\to\; \alpha + \sum_{i=1}^{n} y_i, \qquad \beta \;\to\; \beta + n.$$
Observed counts add to the shape. Observed exposure adds to the rate.
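The update rule is short enough to write as a two-line function. The sketch below uses the shape-rate parameterization; the name `gamma_poisson_update`, the prior values, and the five counts are hypothetical.

```python
def gamma_poisson_update(alpha, beta, y):
    """Gamma(shape, rate) prior + Poisson counts y -> Gamma(shape, rate) posterior."""
    return alpha + sum(y), beta + len(y)

# Hypothetical prior Gamma(2, 0.5) with mean 4, and five hypothetical counts.
alpha_n, beta_n = gamma_poisson_update(2.0, 0.5, [3, 0, 5, 2, 4])
post_mean = alpha_n / beta_n   # posterior mean is shape / rate
print(alpha_n, beta_n, post_mean)
```

With only five observations the posterior mean sits between the prior mean of 4 and the sample mean of 2.8; more data would pull it further toward the sample mean.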
Example: eBay Auction Bids
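Suppose $n = 1000$ coin auctions are observed, with total bid count
$$\sum_{i=1}^{n} y_i.$$
Use the prior
$$\theta \sim \mathrm{Gamma}(\alpha, \beta), \qquad \frac{\alpha}{\beta} = 4.$$
This prior has mean 4. After observing the data,
$$\theta \mid y_1, \dots, y_n \sim \mathrm{Gamma}\left(\alpha + \sum_{i=1}^{n} y_i,\; \beta + 1000\right).$$
The posterior mean is
$$\frac{\alpha + \sum_{i=1}^{n} y_i}{\beta + 1000}.$$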
The posterior is concentrated because the dataset is large. The prior suggested a rate near 4, but 1000 auctions pull the posterior toward the observed average.
If auctions are split by reservation price, the two groups can have very different posterior means. That is a model-building lesson: a single Poisson rate may be too simple when important predictors are ignored.
Posterior Intervals
What Are They For?
A posterior distribution is often summarized by an interval containing most of its probability.
A 95% credible interval $[a, b]$ satisfies
$$P(a \le \theta \le b \mid y_1, \dots, y_n) = 0.95.$$
Equal-Tail Interval
An equal-tail 95% interval uses the 2.5% and 97.5% posterior quantiles.
It leaves 2.5% posterior probability below the interval and 2.5% above it.
Highest Posterior Density Interval
An HPD interval is the shortest interval containing 95% posterior probability.
For symmetric unimodal posteriors, equal-tail and HPD intervals are often the same or very close. For skewed posteriors, such as some Gamma posteriors, they can differ.
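The difference is easy to see by simulation. The sketch below draws from a skewed Gamma posterior and computes both intervals from the draws: the equal-tail interval from sample quantiles, and an approximate HPD interval as the shortest window of sorted draws containing 95% of them. The Gamma(3, 1.5) posterior, the seed, the number of draws, and the function names are hypothetical, and the sample-based HPD is a Monte Carlo approximation that assumes a unimodal posterior.

```python
import random
import statistics

def equal_tail_95(draws):
    """2.5% and 97.5% sample quantiles of the posterior draws."""
    cuts = statistics.quantiles(draws, n=40)  # cut points at 2.5%, 5%, ..., 97.5%
    return cuts[0], cuts[-1]

def hpd_95(draws):
    """Shortest window of sorted draws that contains 95% of them."""
    s = sorted(draws)
    k = round(0.95 * len(s))                  # number of draws inside the interval
    start = min(range(len(s) - k + 1), key=lambda i: s[i + k - 1] - s[i])
    return s[start], s[start + k - 1]

rng = random.Random(1)                        # fixed seed for reproducibility
alpha, beta = 3.0, 1.5                        # hypothetical shape and rate
# random.gammavariate takes shape and SCALE, so pass 1 / rate.
draws = [rng.gammavariate(alpha, 1.0 / beta) for _ in range(20000)]

et_lo, et_hi = equal_tail_95(draws)
hpd_lo, hpd_hi = hpd_95(draws)
print(f"equal-tail: ({et_lo:.3f}, {et_hi:.3f}), width {et_hi - et_lo:.3f}")
print(f"HPD:        ({hpd_lo:.3f}, {hpd_hi:.3f}), width {hpd_hi - hpd_lo:.3f}")
```

For this right-skewed posterior the HPD interval shifts toward zero, where the density is higher, and is shorter than the equal-tail interval.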
Quick Normal Approximation
When the posterior is approximately Normal,
$$E(\theta \mid y) \pm 1.96 \cdot \mathrm{SD}(\theta \mid y)$$
gives an approximate 95% credible interval.
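As a rough illustration, the sketch below applies the Normal approximation to a Gamma posterior in the shape-rate parameterization, whose mean is shape divided by rate and whose standard deviation is the square root of the shape divided by the rate. The posterior Gamma(400, 100) and the function name are hypothetical.

```python
import math
from statistics import NormalDist

def normal_approx_interval(post_mean, post_sd, level=0.95):
    """Approximate credible interval: mean +/- z * sd."""
    z = NormalDist().inv_cdf(0.5 + level / 2)   # about 1.96 when level = 0.95
    return post_mean - z * post_sd, post_mean + z * post_sd

# Hypothetical Gamma(alpha_n, beta_n) posterior, shape-rate parameterization.
alpha_n, beta_n = 400.0, 100.0
post_mean = alpha_n / beta_n                    # 4.0
post_sd = math.sqrt(alpha_n) / beta_n           # 0.2
lo, hi = normal_approx_interval(post_mean, post_sd)
print(f"approximate 95% interval: ({lo:.3f}, {hi:.3f})")
```

The approximation is reasonable here because a Gamma distribution with a large shape parameter is close to symmetric; with a small shape it would be a poor substitute for exact quantiles.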
Conjugate Priors
Definition
A prior family is conjugate for a likelihood family if the posterior stays in the same family as the prior.
The main examples so far are:
| Likelihood | Conjugate prior | Posterior |
|---|---|---|
| Bernoulli | Beta | Beta |
| Normal mean, known variance | Normal | Normal |
| Poisson | Gamma | Gamma |
| Multinomial | Dirichlet | Dirichlet |
Why Conjugacy Helps
Conjugacy gives closed-form updates. This is useful for learning the structure of Bayesian inference and for simple applied models.
But conjugacy is not required for Bayesian inference. When the posterior is not available in closed form, simulation methods can still be used.
Prior Elicitation
What Is the Task?
Prior elicitation means turning domain knowledge into a probability distribution.
The expert may not think in terms of parameters. The statistician’s job is to ask questions about meaningful quantities and translate the answers into a prior.
Useful questions include:
- What value seems most plausible?
- What range would contain the quantity with high probability?
- Is it plausible that the quantity is below a particular threshold?
- How surprising would very large or very small values be?
Practical Warning
People are affected by anchoring and overconfidence. It helps to show the implied prior distribution back to the expert and check whether its consequences make sense.
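One simple way to do this, sketched below, is to convert an expert's stated 95% range into a Normal prior and then report a few implied probabilities back for checking. This assumes the expert's beliefs are roughly symmetric, which is itself something to verify. The function name, the range from 8 to 12, and the threshold 9 are hypothetical.

```python
from statistics import NormalDist

def normal_prior_from_interval(lower, upper, level=0.95):
    """Match a Normal prior to a central interval stated by an expert."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    mu0 = (lower + upper) / 2          # centre of the stated range
    tau0 = (upper - lower) / (2 * z)   # sd that puts `level` mass in the range
    return mu0, tau0

# Hypothetical answer: "I am 95% sure the quantity lies between 8 and 12."
mu0, tau0 = normal_prior_from_interval(8.0, 12.0)
prior = NormalDist(mu0, tau0)

# Feed implied consequences back to the expert as a sanity check.
p_below_9 = prior.cdf(9.0)
print(f"prior: N({mu0:.2f}, {tau0:.3f}^2)")
print(f"implied P(quantity < 9) = {p_below_9:.3f}")
```

If the expert finds the implied probability below the threshold too high or too low, the stated range or the Normal shape should be revisited.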
Jeffreys’ Prior
What Problem Is It Trying to Solve?
Sometimes we want an automatic prior that is less tied to a specific parameterization.
Jeffreys’ prior uses the Fisher information:
$$I(\theta) = -E_{y \mid \theta}\left[\frac{\partial^2 \log p(y \mid \theta)}{\partial \theta^2}\right].$$
For one parameter,
$$p(\theta) \propto \sqrt{I(\theta)}.$$
Jeffreys’ prior is invariant under one-to-one transformations of the parameter.
Example: Bernoulli Jeffreys’ Prior
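For $n$ Bernoulli observations with success probability $\theta$,
$$I(\theta) = \frac{n}{\theta(1 - \theta)}.$$
Therefore
$$p(\theta) \propto \theta^{-1/2}(1 - \theta)^{-1/2}.$$
This is the kernel of
$$\mathrm{Beta}\left(\tfrac{1}{2}, \tfrac{1}{2}\right).$$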
The prior puts more density near 0 and 1 than a uniform prior does.
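For readers who want the intermediate steps, the Bernoulli Fisher information can be worked out as follows. This is a sketch for a single observation $y \in \{0, 1\}$ with success probability $\theta$; the notation is an assumption chosen for this derivation.

```latex
\begin{aligned}
\log p(y \mid \theta) &= y \log \theta + (1 - y)\log(1 - \theta), \\
\frac{\partial^2}{\partial \theta^2} \log p(y \mid \theta)
  &= -\frac{y}{\theta^2} - \frac{1 - y}{(1 - \theta)^2}, \\
I(\theta) &= \frac{E(y)}{\theta^2} + \frac{1 - E(y)}{(1 - \theta)^2}
  = \frac{1}{\theta} + \frac{1}{1 - \theta}
  = \frac{1}{\theta(1 - \theta)}, \\
p(\theta) &\propto \sqrt{I(\theta)} = \theta^{-1/2}(1 - \theta)^{-1/2}.
\end{aligned}
```

With $n$ independent observations the information is multiplied by $n$, which does not change the shape of the prior.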
A Caution About Jeffreys’ Prior
Jeffreys’ prior depends on the Fisher information, and the Fisher information can depend on the sampling scheme.
For example, Bernoulli sampling and negative-binomial sampling can give different Jeffreys’ priors even when the likelihoods are proportional after the data are observed.
This is a warning: automatic priors are not free of modeling choices. In multiparameter problems, Jeffreys’ prior can also become difficult or inappropriate.
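The negative-binomial case can be worked out explicitly. The sketch below assumes one particular scheme: sampling continues until a fixed number $r$ of successes is observed, so the total number of trials $N$ is random with $E(N) = r/\theta$. The symbols $r$ and $N$ are introduced here for illustration.

```latex
\begin{aligned}
\log p(N \mid \theta) &= r \log \theta + (N - r)\log(1 - \theta) + \text{const}, \\
I(\theta) &= \frac{r}{\theta^2} + \frac{E(N) - r}{(1 - \theta)^2}
  = \frac{r}{\theta^2} + \frac{r}{\theta(1 - \theta)}
  = \frac{r}{\theta^2(1 - \theta)}, \\
p(\theta) &\propto \sqrt{I(\theta)} \propto \theta^{-1}(1 - \theta)^{-1/2}.
\end{aligned}
```

Under this scheme the Jeffreys' prior is $\theta^{-1}(1 - \theta)^{-1/2}$ rather than $\theta^{-1/2}(1 - \theta)^{-1/2}$, even though both likelihoods are proportional to $\theta^{r}(1 - \theta)^{N - r}$ once the data are observed.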
Types of Prior Information
Bayesian analyses commonly use several kinds of prior information:
- Expert information from previous studies or domain experience.
- Weak or vague priors that regularize without dominating the likelihood.
- Smoothness or shrinkage priors that stabilize complex models.
The right prior depends on the purpose of the analysis. Estimation, prediction, and model comparison can place different demands on the prior.
Study Questions
- In the Normal model with known variance, why does the likelihood become narrower as $n$ grows?
- How is the Normal posterior mean a weighted average of prior information and data information?
- In the Poisson-Gamma update, why does $n$ add to the Gamma rate parameter?
- What is the difference between an equal-tail credible interval and an HPD interval?
- Why should Jeffreys’ prior be treated with caution?
Chapter Summary
Normal and Poisson models extend the basic Bayesian update to continuous and count data. With known Normal variance, a Normal prior gives a Normal posterior whose precision is the sum of prior and data precision. For Poisson counts, a Gamma prior gives a Gamma posterior by adding observed counts and exposure. Credible intervals summarize posterior uncertainty, conjugate priors provide closed-form updates, and prior elicitation connects the mathematics to real information. Jeffreys’ prior is useful as an invariant automatic construction, but it still reflects modeling choices.