Normal and Poisson Models

Source: Lecture 2 — Normal and Poisson data. Prior elicitation (BDA Ch. 2).

The Goal of This Chapter

The Bernoulli model handled binary data. This chapter adds two common data types:

Continuous measurements, using the Normal model.
Counts, using the Poisson model.

The Bayesian workflow is the same:

Choose a likelihood.
Choose a prior.
Multiply prior and likelihood.
Read the posterior distribution.

The new idea is that different likelihoods have different convenient conjugate priors.

Normal Data with Known Variance

What Problem Are We Solving?

Suppose the observations are continuous and centered around an unknown mean $\theta$. The observation variance $\sigma^2$ is assumed known.

The model is

X_1,\ldots,X_n \mid \theta,\sigma^2 \overset{\mathrm{iid}}{\sim} N(\theta,\sigma^2).

The parameter of interest is the mean $\theta$.

Start With the Likelihood

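For the Normal model with known variance, the data enter the likelihood only through the sample mean $\bar{x}$. The reason is the decomposition $\sum_i (x_i-\theta)^2 = \sum_i (x_i-\bar{x})^2 + n(\theta-\bar{x})^2$, whose first term does not involve $\theta$.

As a function of $\theta$, the likelihood is therefore proportional to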

p(x_1,\ldots,x_n \mid \theta,\sigma^2) \propto \exp\left[ -\frac{1}{2(\sigma^2/n)}(\theta-\bar{x})^2 \right].

How to Read the Formula

This is the kernel of a Normal density in $\theta$ .

It is centered at $\bar{x}$ and has variance $\sigma^2/n$.

So the likelihood favors values of $\theta$ near the sample mean. As $n$ increases, the likelihood becomes narrower, because its standard deviation $\sigma/\sqrt{n}$ shrinks.
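A quick numerical check of the claim that only $\bar{x}$ matters: the sketch below, using made-up data and an assumed known $\sigma^2 = 0.5$, computes the full Normal log-likelihood and the log of the kernel above at several values of $\theta$. Their difference should be the same constant every time. The function names are illustrative, not from any library.

```python
import math

def normal_loglik(theta, xs, sigma2):
    """Full log-likelihood of iid N(theta, sigma2) observations."""
    n = len(xs)
    ss = sum((x - theta) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def kernel_loglik(theta, xs, sigma2):
    """Log of the kernel exp[-(theta - xbar)^2 / (2 sigma2 / n)]."""
    n = len(xs)
    xbar = sum(xs) / n
    return -((theta - xbar) ** 2) / (2 * sigma2 / n)

xs = [1.2, 0.7, 1.9, 1.4, 0.8]   # hypothetical data
sigma2 = 0.5                      # assumed known variance

# The gap between the two should not depend on theta
diffs = [normal_loglik(t, xs, sigma2) - kernel_loglik(t, xs, sigma2)
         for t in (-1.0, 0.0, 1.2, 3.0)]
print(diffs)

# Width of the likelihood, sigma / sqrt(n), for growing n
for n in (5, 50, 500):
    print(n, math.sqrt(sigma2 / n))
```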

Case 1: Flat Prior for the Normal Mean

Prior

A flat prior sets

p(\theta) \propto c.

It gives equal prior weight to all values of $\theta$. This prior is improper because it does not integrate to one, but in this model the posterior is proper as soon as at least one observation is available.

Posterior

Because the prior is constant, the posterior has the same shape as the likelihood:

\theta \mid x_1,\ldots,x_n \sim N\left(\bar{x},\frac{\sigma^2}{n}\right).

Interpretation

With a flat prior, the posterior mean is the sample mean and the posterior standard deviation is $\sigma/\sqrt{n}$.

This is the usual standard error, now interpreted as posterior uncertainty about $\theta$ .
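A minimal sketch of the flat-prior result, using Python's standard `statistics.NormalDist` to represent the posterior. The data and $\sigma^2$ are hypothetical, and `flat_prior_posterior` is an illustrative helper.

```python
import math
from statistics import NormalDist, fmean

def flat_prior_posterior(xs, sigma2):
    """Posterior N(xbar, sigma2 / n) for a Normal mean under a flat prior."""
    n = len(xs)
    return NormalDist(mu=fmean(xs), sigma=math.sqrt(sigma2 / n))

post = flat_prior_posterior([1.2, 0.7, 1.9, 1.4, 0.8], sigma2=0.5)
print(post.mean, post.stdev)                      # sample mean and sigma / sqrt(n)
print(post.inv_cdf(0.025), post.inv_cdf(0.975))   # central 95% posterior interval
```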

Case 2: Normal Prior for the Normal Mean

Why This Prior?

A Normal prior is natural when prior knowledge says that $\theta$ is probably near some value $\mu_0$, with uncertainty measured by the prior variance $\tau_0^2$.

Write

\theta \sim N(\mu_0,\tau_0^2).

This prior is conjugate for the Normal likelihood with known variance.

The Posterior Formula

The posterior is Normal:

\theta \mid x \sim N(\mu_n,\tau_n^2).

The posterior precision is

\frac{1}{\tau_n^2} = \frac{n}{\sigma^2} + \frac{1}{\tau_0^2}.

The posterior mean is

\mu_n = w\bar{x}+(1-w)\mu_0,

where

w = \frac{n/\sigma^2}{n/\sigma^2+1/\tau_0^2}.

How to Read the Formula

Precision means inverse variance. The posterior precision is:

\text{posterior precision} = \text{data precision} + \text{prior precision}.

The posterior mean is a weighted average:

$\bar{x}$ receives more weight when the data are precise or $n$ is large;
$\mu_0$ receives more weight when the prior is precise.

This is the same Bayesian update as before, now expressed on the precision scale.
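The precision-scale update translates directly into a few lines of Python. This is a sketch with a hypothetical helper name `normal_normal_update` and made-up inputs; it follows the formulas for $1/\tau_n^2$, $w$, and $\mu_n$ above.

```python
def normal_normal_update(xbar, n, sigma2, mu0, tau0_2):
    """Posterior (mu_n, tau_n^2) for a Normal mean with known variance sigma2
    under the conjugate prior N(mu0, tau0_2)."""
    data_prec = n / sigma2          # n / sigma^2
    prior_prec = 1.0 / tau0_2       # 1 / tau_0^2
    post_prec = data_prec + prior_prec
    w = data_prec / post_prec       # weight on the sample mean
    mu_n = w * xbar + (1 - w) * mu0
    return mu_n, 1.0 / post_prec

# Hypothetical inputs: 10 observations with mean 5, sigma^2 = 4, prior N(0, 1)
mu_n, tau_n2 = normal_normal_update(xbar=5.0, n=10, sigma2=4.0, mu0=0.0, tau0_2=1.0)
print(mu_n, tau_n2)
```

With these inputs the data precision is $10/4 = 2.5$ and the prior precision is $1$, so $w = 2.5/3.5 \approx 0.71$ and the posterior mean is pulled from $5$ toward $0$.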

Sequential Updating

Bayesian updating can be done one observation at a time.

For three observations,

p(\theta \mid x_1,x_2,x_3) \propto p(x_3 \mid \theta)p(\theta \mid x_1,x_2).

The posterior after the first two observations becomes the prior before the third observation.

This is not a different method. It is the same Bayes’ theorem applied repeatedly.
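To see that sequential updating is the same Bayes’ theorem, the sketch below (hypothetical data, hypothetical helper `update_one`) feeds three observations in one at a time and compares the result with the all-at-once posterior computed from $\bar{x}$ and $n$. The two should agree up to floating-point rounding.

```python
def update_one(mu, tau2, x, sigma2):
    """Update a N(mu, tau2) prior with a single N(theta, sigma2) observation x."""
    prec = 1.0 / tau2 + 1.0 / sigma2
    mean = (mu / tau2 + x / sigma2) / prec
    return mean, 1.0 / prec

xs = [4.1, 5.3, 4.8]              # hypothetical observations
sigma2, mu0, tau0_2 = 1.0, 0.0, 10.0

# Sequential: each posterior becomes the prior for the next observation
mu, tau2 = mu0, tau0_2
for x in xs:
    mu, tau2 = update_one(mu, tau2, x, sigma2)

# Batch: all observations at once, through xbar and n
n = len(xs)
xbar = sum(xs) / n
prec_batch = n / sigma2 + 1.0 / tau0_2
mu_batch = (n * xbar / sigma2 + mu0 / tau0_2) / prec_batch
print(mu, mu_batch, tau2, 1.0 / prec_batch)
```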

Example: Canadian Log Wages

Suppose $n=205$ log-wages are modeled as

X_i \mid \theta \sim N(\theta,0.4).

Use the vague prior

\theta \sim N(12,100).

The data precision is

\frac{n}{\sigma^2} = \frac{205}{0.4} = 512.5.

The prior precision is

\frac{1}{100} = 0.01.

Therefore

w = \frac{512.5}{512.5+0.01} \approx 0.99998.

The posterior mean is almost entirely determined by the sample mean. The prior is weak compared with the information in 205 observations.
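The arithmetic of the log-wage example can be reproduced as follows. The sample mean of the 205 log-wages is not given here, so the sketch stops at the weight $w$ and the posterior standard deviation $\tau_n = (512.5 + 0.01)^{-1/2}$.

```python
import math

n, sigma2 = 205, 0.4          # from the example
mu0, tau0_2 = 12.0, 100.0     # vague prior N(12, 100)

data_prec = n / sigma2        # n / sigma^2
prior_prec = 1.0 / tau0_2     # 1 / tau_0^2
w = data_prec / (data_prec + prior_prec)
post_sd = math.sqrt(1.0 / (data_prec + prior_prec))
print(data_prec, prior_prec, round(w, 5), round(post_sd, 4))
```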

Poisson Data

What Problem Are We Solving?

The Poisson model is used for counts: number of bids, number of calls, number of defects, or number of events in a fixed exposure period.

Let $\theta$ be the event rate. The model is

Y_1,\ldots,Y_n \mid \theta \overset{\mathrm{iid}}{\sim} \mathrm{Pois}(\theta).

For one observation,

p(y_i \mid \theta) = \frac{\theta^{y_i}e^{-\theta}}{y_i!}.

The Poisson Likelihood

Multiplying over observations gives

p(y \mid \theta) \propto \theta^{\sum_{i=1}^n y_i}\exp(-n\theta).

How to Read the Formula

The data enter through two quantities:

the total count $\sum_i y_i$;
the number of observations $n$.

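The total count pulls $\theta$ upward. The exposure $n$ appears in $\exp(-n\theta)$ and keeps the rate scaled per observation: the log-likelihood $\left(\sum_i y_i\right)\log\theta - n\theta$ is maximized at $\theta = \sum_i y_i / n$, the average count per observation.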
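A small check, with made-up counts, that the Poisson log-likelihood depends on the data only through $\sum_i y_i$ and $n$ (up to the constant $-\sum_i \log y_i!$) and peaks at the average count. The grid search is only for illustration; the maximizer can be found analytically.

```python
import math

def poisson_loglik(theta, ys):
    """Full Poisson log-likelihood, including the -log(y!) constants."""
    return sum(y * math.log(theta) - theta - math.lgamma(y + 1) for y in ys)

def poisson_kernel(theta, total, n):
    """Log of theta^total * exp(-n * theta): depends only on total and n."""
    return total * math.log(theta) - n * theta

ys = [2, 0, 3, 1, 4]              # hypothetical counts
total, n = sum(ys), len(ys)

# The two differ by a constant that does not involve theta
gaps = [poisson_loglik(t, ys) - poisson_kernel(t, total, n) for t in (0.5, 1.0, 2.0, 5.0)]

# A grid search puts the peak at the sample mean total / n
grid = [i / 1000 for i in range(1, 10001)]
theta_hat = max(grid, key=lambda t: poisson_kernel(t, total, n))
print(theta_hat, total / n)
```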

Gamma Prior for the Poisson Rate

Why This Prior?

The rate $\theta$ must be positive. The Gamma distribution is flexible on positive values and is conjugate to the Poisson likelihood.

Use the shape-rate parameterization:

\theta \sim \mathrm{Gamma}(\alpha,\beta),

with kernel

p(\theta) \propto \theta^{\alpha-1}\exp(-\beta\theta).

The prior mean is

E(\theta)=\frac{\alpha}{\beta}.

The Poisson-Gamma Update

Multiply likelihood and prior:

p(\theta \mid y) \propto \theta^{\sum_i y_i}\exp(-n\theta) \theta^{\alpha-1}\exp(-\beta\theta).

Collect terms:

p(\theta \mid y) \propto \theta^{\alpha+\sum_i y_i-1} \exp[-(\beta+n)\theta].

Therefore

\theta \mid y \sim \mathrm{Gamma}\left(\alpha+\sum_{i=1}^n y_i,\beta+n\right).

How to Remember It

The Gamma-Poisson update is:

(\alpha,\beta) \quad \longrightarrow \quad \left(\alpha+\sum_i y_i,\beta+n\right).

Observed counts add to the shape. Observed exposure adds to the rate.
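The update rule is short enough to state as code. `gamma_poisson_update` is a hypothetical helper using the shape-rate parameterization above; the example shows that updating in two batches gives the same posterior as updating once, which is the Poisson version of sequential updating.

```python
def gamma_poisson_update(alpha, beta, ys):
    """Conjugate update of a Gamma(alpha, beta) prior (shape-rate) on a Poisson rate.

    Observed counts add to the shape; the number of observations adds to the rate.
    """
    return alpha + sum(ys), beta + len(ys)

# Two batches versus one batch, with hypothetical counts
a1, b1 = gamma_poisson_update(2.0, 0.5, [3, 5])
a2, b2 = gamma_poisson_update(a1, b1, [4, 2, 6])
a_all, b_all = gamma_poisson_update(2.0, 0.5, [3, 5, 4, 2, 6])
print((a2, b2), (a_all, b_all))
```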

Example: eBay Auction Bids

Suppose $n=1000$ coin auctions have total bid count

\sum_{i=1}^{1000} y_i = 3635.

Use the prior

\theta \sim \mathrm{Gamma}(2,1/2).

This prior has mean 4. After observing the data,

\theta \mid y \sim \mathrm{Gamma}(3637,1000.5).

The posterior mean is

E(\theta \mid y) = \frac{3637}{1000.5} \approx 3.635.

The posterior is concentrated because the dataset is large. The prior suggested a rate near 4, but 1000 auctions pull the posterior toward the observed average of $3635/1000 = 3.635$ bids per auction.
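The eBay numbers can be checked directly. The sketch also computes the posterior standard deviation, using the standard Gamma result $\mathrm{SD} = \sqrt{\alpha}/\beta$ in the shape-rate parameterization, which makes "concentrated" concrete.

```python
import math

alpha0, beta0 = 2.0, 0.5      # prior Gamma(2, 1/2), mean alpha0 / beta0 = 4
total, n = 3635, 1000         # total bids over 1000 auctions

alpha_n, beta_n = alpha0 + total, beta0 + n    # posterior Gamma(3637, 1000.5)
post_mean = alpha_n / beta_n                    # shape / rate
post_sd = math.sqrt(alpha_n) / beta_n           # sqrt(shape) / rate
print(alpha_n, beta_n, round(post_mean, 3), round(post_sd, 3))
```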

If auctions are split by reservation price, the two groups can have very different posterior means. That is a model-building lesson: a single Poisson rate may be too simple when important predictors are ignored.

Posterior Intervals

What Are They For?

A posterior distribution is often summarized by an interval containing most of its probability.

A 95% credible interval $[a,b]$ satisfies

\Pr(a \le \theta \le b \mid y)=0.95.

Equal-Tail Interval

An equal-tail 95% interval uses the 2.5% and 97.5% posterior quantiles.

It leaves 2.5% posterior probability below the interval and 2.5% above it.
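When the posterior can be simulated, an equal-tail interval can be read off sorted draws. This sketch uses the eBay posterior $\mathrm{Gamma}(3637, 1000.5)$ and Python's `random.gammavariate`, which is parameterized by shape and scale (not rate), so the rate must be inverted. The helper name, Monte Carlo sample size, and seed are arbitrary choices.

```python
import random

def equal_tail_interval(draws, level=0.95):
    """Equal-tail interval from posterior draws: drop (1 - level)/2 of the draws in each tail."""
    s = sorted(draws)
    tail = (1 - level) / 2
    lo = s[int(tail * len(s))]
    hi = s[int((1 - tail) * len(s)) - 1]
    return lo, hi

rng = random.Random(1)
# gammavariate(shape, scale): scale = 1 / rate
draws = [rng.gammavariate(3637, 1 / 1000.5) for _ in range(20000)]
lo, hi = equal_tail_interval(draws)

# Fraction of draws falling below and above the interval (about 2.5% each)
below = sum(d < lo for d in draws) / len(draws)
above = sum(d > hi for d in draws) / len(draws)
print(lo, hi, below, above)
```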

Highest Posterior Density Interval

An HPD interval is the shortest interval containing 95% posterior probability.

For symmetric unimodal posteriors, the equal-tail and HPD intervals coincide. For skewed posteriors, such as Gamma posteriors with a small shape parameter, they can differ noticeably.
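A sample-based HPD interval can be found by sliding a window that covers 95% of the sorted draws and keeping the shortest one. The sketch below (hypothetical helper `hpd_interval`, arbitrary seed) applies this to a right-skewed $\mathrm{Gamma}(2,1)$ posterior and compares it with the equal-tail interval built from the same number of draws. This window search assumes a unimodal posterior; for multimodal posteriors the HPD region need not be a single interval.

```python
import random

def hpd_interval(draws, level=0.95):
    """Shortest interval containing `level` of the draws (sample-based HPD, unimodal case)."""
    s = sorted(draws)
    k = int(level * len(s))                    # number of draws the interval must cover
    widths = [s[i + k - 1] - s[i] for i in range(len(s) - k + 1)]
    i = min(range(len(widths)), key=widths.__getitem__)
    return s[i], s[i + k - 1]

rng = random.Random(2)
# Skewed posterior Gamma(shape 2, rate 1); gammavariate takes scale = 1 / rate = 1
draws = sorted(rng.gammavariate(2.0, 1.0) for _ in range(20000))
hpd = hpd_interval(draws)

k = int(0.95 * len(draws))
lo_i = int(0.025 * len(draws))
equal_tail = (draws[lo_i], draws[lo_i + k - 1])   # same number of draws as the HPD window
print("HPD:", hpd, "width", hpd[1] - hpd[0])
print("equal-tail:", equal_tail, "width", equal_tail[1] - equal_tail[0])
```

For this right-skewed posterior the HPD interval sits further left and is shorter than the equal-tail interval.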

Quick Normal Approximation

When the posterior is approximately Normal,

E(\theta \mid y) \pm 1.96\,\mathrm{SD}(\theta \mid y)

gives an approximate 95% credible interval.
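For the eBay posterior, which has a large shape parameter and is therefore close to Normal, the quick approximation can be compared with quantiles from simulated draws. The sketch assumes the shape-rate values $\alpha_n = 3637$ and $\beta_n = 1000.5$ from the example; the two intervals should agree closely.

```python
import math
import random

alpha_n, beta_n = 3637, 1000.5            # eBay posterior, shape-rate
mean = alpha_n / beta_n
sd = math.sqrt(alpha_n) / beta_n
approx = (mean - 1.96 * sd, mean + 1.96 * sd)

# Monte Carlo quantiles for comparison (gammavariate takes scale = 1 / rate)
rng = random.Random(3)
draws = sorted(rng.gammavariate(alpha_n, 1 / beta_n) for _ in range(20000))
mc = (draws[int(0.025 * len(draws))], draws[int(0.975 * len(draws))])
print(approx, mc)
```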

Conjugate Priors

Definition

A prior family is conjugate for a likelihood family if the posterior stays in the same family as the prior.

The main examples so far are:

Likelihood	Conjugate prior	Posterior
Bernoulli	Beta	Beta
Normal mean, known variance	Normal	Normal
Poisson	Gamma	Gamma
Multinomial	Dirichlet	Dirichlet

Why Conjugacy Helps

Conjugacy gives closed-form updates. This is useful for learning the structure of Bayesian inference and for simple applied models.

But conjugacy is not required for Bayesian inference. When the posterior is not available in closed form, simulation methods such as Markov chain Monte Carlo can still draw samples from it.
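As one illustration of a simulation method (chosen here, not prescribed by the text), the sketch below runs a random-walk Metropolis sampler for a Poisson rate with a log-Normal prior, which is not conjugate. The counts, prior settings, step size, and seeds are hypothetical. As a sanity check, the same sampler with the conjugate $\mathrm{Gamma}(2, 0.5)$ prior should reproduce the closed-form posterior mean $22/5.5 = 4$.

```python
import math
import random

def log_post(theta, ys, log_prior):
    """Unnormalized log posterior for a Poisson rate; -inf outside theta > 0."""
    if theta <= 0:
        return -math.inf
    return sum(ys) * math.log(theta) - len(ys) * theta + log_prior(theta)

def metropolis(ys, log_prior, init=1.0, step=1.5, iters=20000, burn=2000, seed=0):
    """Random-walk Metropolis with a symmetric Normal proposal; returns post-burn-in draws."""
    rng = random.Random(seed)
    theta = init
    lp = log_post(theta, ys, log_prior)
    draws = []
    for it in range(iters):
        prop = theta + rng.gauss(0.0, step)
        lp_prop = log_post(prop, ys, log_prior)
        # Accept with probability min(1, exp(lp_prop - lp)); theta <= 0 is always rejected
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            theta, lp = prop, lp_prop
        if it >= burn:
            draws.append(theta)
    return draws

def lognormal_log_prior(theta, mu=math.log(4.0), s=0.5):
    """Log density, up to a constant, of log(theta) ~ N(mu, s^2): a non-conjugate prior."""
    return -math.log(theta) - (math.log(theta) - mu) ** 2 / (2 * s * s)

def gamma_log_prior(theta, alpha=2.0, beta=0.5):
    """Log kernel of the conjugate Gamma(alpha, beta) prior, used only as a check."""
    return (alpha - 1) * math.log(theta) - beta * theta

ys = [3, 5, 4, 2, 6]   # hypothetical counts

draws = metropolis(ys, lognormal_log_prior)
print("log-Normal prior, posterior mean approx.", sum(draws) / len(draws))

check = metropolis(ys, gamma_log_prior, seed=1)
print("Gamma prior, sampler:", sum(check) / len(check),
      "closed form:", (2.0 + sum(ys)) / (0.5 + len(ys)))
```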

Prior Elicitation

What Is the Task?

Prior elicitation means turning domain knowledge into a probability distribution.

The expert may not think in terms of parameters. The statistician’s job is to ask questions about meaningful quantities and translate the answers into a prior.

Useful questions include:

What value seems most plausible?
What range would contain the quantity with high probability?
Is it plausible that the quantity is below a particular threshold?
How surprising would very large or very small values be?

Practical Warning

People are affected by anchoring and overconfidence. It helps to show the implied prior distribution back to the expert and check whether its consequences make sense.
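One simple way to turn answers about a plausible value and a 95% range into a prior, and then show its consequences back to the expert, is sketched below. It assumes a symmetric Normal prior is adequate and that the stated range is a central 95% interval; the expert's numbers and the helper `normal_prior_from_range` are hypothetical.

```python
from statistics import NormalDist

def normal_prior_from_range(lo, hi, prob=0.95):
    """Normal prior whose central `prob` interval is [lo, hi] (assumes a symmetric range)."""
    z = NormalDist().inv_cdf(0.5 + prob / 2)   # about 1.96 for prob = 0.95
    return NormalDist(mu=(lo + hi) / 2, sigma=(hi - lo) / (2 * z))

# Hypothetical answers: "most likely about 12, 95% sure it is between 10 and 14"
prior = normal_prior_from_range(10.0, 14.0)

# Implied consequences to show back to the expert
print("Pr(theta < 11) =", round(prior.cdf(11.0), 3))
print("central 50% range:", round(prior.inv_cdf(0.25), 2), "to", round(prior.inv_cdf(0.75), 2))
```

If the expert finds the implied 50% range too narrow or too wide, the stated range can be revised and the prior rebuilt.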

Jeffreys’ Prior

What Problem Is It Trying to Solve?

Sometimes we want an automatic prior that is less tied to a specific parameterization.

Jeffreys’ prior uses the Fisher information:

p(\theta) \propto |I(\theta)|^{1/2}.

For one parameter,

I(\theta) = -E_{x \mid \theta} \left[ \frac{\partial^2}{\partial \theta^2} \log p(x \mid \theta) \right].

Jeffreys’ prior is invariant under one-to-one transformations of the parameter.

Example: Bernoulli Jeffreys’ Prior

For $n$ Bernoulli observations, the Fisher information of the whole sample is

I(\theta) = \frac{n}{\theta(1-\theta)}.

Therefore, dropping the constant factor $\sqrt{n}$,

p(\theta) \propto \theta^{-1/2}(1-\theta)^{-1/2}.

This is the kernel of

\mathrm{Beta}(1/2,1/2).

The prior puts more density near 0 and 1 than a uniform prior does.
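The Fisher information can also be checked numerically from its definition. The sketch approximates the second derivative of the Bernoulli log-likelihood by central finite differences, averages it over $x \in \{0,1\}$, and multiplies by a hypothetical sample size $n=10$. If the derivation above is right, $\sqrt{I_n(\theta)}$ divided by the $\mathrm{Beta}(1/2,1/2)$ kernel is the constant $\sqrt{n}$.

```python
import math

def bernoulli_loglik(x, theta):
    """Log-likelihood of one Bernoulli observation x in {0, 1}."""
    return x * math.log(theta) + (1 - x) * math.log(1 - theta)

def fisher_info_bernoulli(theta, h=1e-4):
    """Per-observation Fisher information, -E[d^2/dtheta^2 log p(x | theta)].

    The second derivative is approximated by a central finite difference with step h.
    """
    info = 0.0
    for x, prob in ((1, theta), (0, 1 - theta)):   # expectation over x given theta
        d2 = (bernoulli_loglik(x, theta + h) - 2 * bernoulli_loglik(x, theta)
              + bernoulli_loglik(x, theta - h)) / h ** 2
        info -= prob * d2
    return info

n = 10  # hypothetical sample size; I_n(theta) = n * I_1(theta)
ratios = []
for theta in (0.1, 0.3, 0.5, 0.9):
    jeffreys = math.sqrt(n * fisher_info_bernoulli(theta))   # |I_n(theta)|^(1/2)
    beta_kernel = theta ** -0.5 * (1 - theta) ** -0.5         # Beta(1/2, 1/2) kernel
    ratios.append(jeffreys / beta_kernel)
print(ratios)   # each entry should be close to sqrt(n)
```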

A Caution About Jeffreys’ Prior

Jeffreys’ prior depends on the Fisher information, and the Fisher information can depend on the sampling scheme.

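For example, binomial sampling (a fixed number of trials) and negative-binomial sampling (trials continue until a fixed number of successes) give different Jeffreys’ priors, $\mathrm{Beta}(1/2,1/2)$ versus $p(\theta) \propto \theta^{-1}(1-\theta)^{-1/2}$, even though the likelihoods are proportional after the data are observed.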
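The sampling-scheme dependence can be seen numerically. The sketch computes the Fisher information for $Y$, the number of failures before the $r$-th success (a negative-binomial design, with $r=3$ chosen for illustration), by summing the expected second derivative over a truncated range of $y$, and compares it with the closed form $r/(\theta^2(1-\theta))$. Dividing its square root by the binomial Jeffreys kernel $\theta^{-1/2}(1-\theta)^{-1/2}$ gives a ratio that changes with $\theta$, so the two Jeffreys’ priors have different shapes.

```python
import math

def nb_fisher_info(theta, r, y_max=5000):
    """Fisher information about theta from Y = number of failures before the r-th success.

    Computes E[-d^2/dtheta^2 log p(Y | theta)], where
    -d^2/dtheta^2 log p(y | theta) = r / theta^2 + y / (1 - theta)^2,
    summing pmf-weighted terms over y = 0, ..., y_max (the tail beyond is negligible here).
    """
    info = 0.0
    for y in range(y_max + 1):
        log_pmf = (math.lgamma(y + r) - math.lgamma(y + 1) - math.lgamma(r)
                   + r * math.log(theta) + y * math.log(1 - theta))
        info += math.exp(log_pmf) * (r / theta ** 2 + y / (1 - theta) ** 2)
    return info

r = 3  # number of successes to wait for (illustrative)
ratios = {}
for theta in (0.2, 0.5, 0.8):
    info = nb_fisher_info(theta, r)
    closed_form = r / (theta ** 2 * (1 - theta))
    binomial_kernel = theta ** -0.5 * (1 - theta) ** -0.5
    ratios[theta] = math.sqrt(info) / binomial_kernel   # equals sqrt(r / theta)
    print(theta, info, closed_form, ratios[theta])
```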

This is a warning: automatic priors are not free of modeling choices. In multiparameter problems, Jeffreys’ prior can also become difficult or inappropriate.

Types of Prior Information

Bayesian analyses commonly use several kinds of prior information:

Expert information from previous studies or domain experience.
Weak or vague priors that regularize without dominating the likelihood.
Smoothness or shrinkage priors that stabilize complex models.

The right prior depends on the purpose of the analysis. Estimation, prediction, and model comparison can place different demands on the prior.

Study Questions

In the Normal model with known variance, why does the likelihood become narrower as $n$ grows?
How is the Normal posterior mean a weighted average of prior information and data information?
In the Poisson-Gamma update, why does $n$ add to the Gamma rate parameter?
What is the difference between an equal-tail credible interval and an HPD interval?
Why should Jeffreys’ prior be treated with caution?

Chapter Summary

Normal and Poisson models extend the basic Bayesian update to continuous and count data. With known Normal variance, a Normal prior gives a Normal posterior whose precision is the sum of prior and data precision. For Poisson counts, a Gamma prior gives a Gamma posterior by adding observed counts and exposure. Credible intervals summarize posterior uncertainty, conjugate priors provide closed-form updates, and prior elicitation connects the mathematics to real information. Jeffreys’ prior is useful as an invariant automatic construction, but it still reflects modeling choices.