Multiparameter Models
Source: Lecture 3 — Multiparameter models (BDA Ch. 3).
The Main Change
Earlier chapters mostly used a single unknown parameter, a scalar $\theta$.
Many real models have several unknown quantities. A Normal model may have both a mean and a variance. A multinomial model has several category probabilities. A multivariate Normal model has a vector mean.
The main Bayesian idea does not change:
$$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta).$$
What changes is that $\theta$ may now be a vector.
Joint and Marginal Posteriors
Step 1: Keep the Parameters Together
Suppose the parameter has two components, $\theta = (\theta_1, \theta_2)$.
The Bayesian update gives a joint posterior:
$$p(\theta_1, \theta_2 \mid y) \propto p(y \mid \theta_1, \theta_2)\, p(\theta_1, \theta_2).$$
This joint distribution describes uncertainty about both parameters and their dependence after seeing the data.
Step 2: Focus on One Parameter
Often one parameter is the main target and the other is a nuisance parameter.
If $\theta_1$ is the parameter of interest, remove the nuisance parameter $\theta_2$ by integration:
$$p(\theta_1 \mid y) = \int p(\theta_1, \theta_2 \mid y)\, d\theta_2.$$
This is called marginalization.
Step 3: Read the Conditional Form
The same marginal posterior can be written as
$$p(\theta_1 \mid y) = \int p(\theta_1 \mid \theta_2, y)\, p(\theta_2 \mid y)\, d\theta_2.$$
This says:
- condition on a possible value of $\theta_2$;
- describe uncertainty about $\theta_1$ at that value;
- average over posterior uncertainty in $\theta_2$.
This interpretation is central to Bayesian computation.
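A small discrete sketch can make the two forms concrete. The joint table below is hypothetical (made-up probabilities on a 3-by-2 grid, not from the lecture); it only illustrates that summing the joint over $\theta_2$ gives the same marginal as averaging the conditionals with weights $p(\theta_2 \mid y)$.

```python
import numpy as np

# Hypothetical joint posterior p(theta1, theta2 | y) on a small grid:
# rows index 3 values of theta1, columns index 2 values of theta2.
joint = np.array([
    [0.10, 0.05],
    [0.30, 0.15],
    [0.20, 0.20],
])

# Marginalization: sum (the discrete analogue of integrating) theta2 out.
marginal_theta1 = joint.sum(axis=1)

# Conditional form: p(theta1 | theta2, y) averaged with weights p(theta2 | y).
marginal_theta2 = joint.sum(axis=0)
conditional = joint / marginal_theta2            # each column sums to 1
mixture = (conditional * marginal_theta2).sum(axis=1)

print(marginal_theta1)
print(np.allclose(marginal_theta1, mixture))
```

In continuous problems the sums become integrals, and in simulation the same averaging happens implicitly when $\theta_2$ is drawn first and $\theta_1$ is then drawn given that value.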
Normal Model with Unknown Variance
What Problem Are We Solving?
Previously, the Normal variance was known. Now both the mean and the variance are unknown:
$$y_1, \dots, y_n \mid \mu, \sigma^2 \overset{\text{iid}}{\sim} N(\mu, \sigma^2).$$
The two unknowns are:
- $\mu$: the population mean;
- $\sigma^2$: the population variance.
A Standard Non-Informative Prior
A common prior for this model is
$$p(\mu, \sigma^2) \propto (\sigma^2)^{-1}.$$
This prior is improper, but it leads to a proper posterior when there is enough data.
It can be understood as flat in $\mu$ and flat in $\log \sigma$.
Conditional Posterior for the Mean
If $\sigma^2$ were known, the posterior for $\mu$ would be Normal. The same conditional result appears here:
$$\mu \mid \sigma^2, y \sim N\!\left(\bar{y}, \frac{\sigma^2}{n}\right).$$
How to Read It
For each possible value of $\sigma^2$, uncertainty about $\mu$ is centered at the sample mean $\bar{y}$.
Larger $\sigma^2$ gives a wider conditional posterior for $\mu$. Smaller $\sigma^2$ gives a narrower one.
Posterior for the Variance
Let
$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2.$$
Under the standard non-informative prior,
$$\sigma^2 \mid y \sim \text{Inv-}\chi^2(n-1,\, s^2).$$
The degrees of freedom are $n-1$, matching the usual sample variance calculation.
The Simulation Algorithm
Purpose
The joint posterior can be simulated in two simple steps. This is often easier than trying to manipulate the full joint density directly.
Algorithm
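Repeat the following, where $\bar{y}$ is the sample mean and $s^2$ is the sample variance:
- Draw $X \sim \chi^2_{n-1}$.
- Set $\sigma^2 = (n-1)s^2 / X$.
- Draw $\mu \mid \sigma^2, y \sim N(\bar{y}, \sigma^2/n)$.
The paired draws $(\mu, \sigma^2)$ are draws from the joint posterior.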
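The steps above can be sketched with NumPy as follows. This is a minimal illustration, not the lecture's own code: the function name `simulate_normal_posterior`, the seed, and the six data values are assumptions made up for the example.

```python
import numpy as np

def simulate_normal_posterior(y, n_draws=10_000, seed=0):
    """Draw (mu, sigma2) pairs from the joint posterior under the
    non-informative prior p(mu, sigma2) proportional to 1 / sigma2."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    s2 = y.var(ddof=1)                               # sample variance, divisor n - 1

    x = rng.chisquare(df=n - 1, size=n_draws)        # X ~ chi^2 with n - 1 df
    sigma2 = (n - 1) * s2 / x                        # scaled inverse chi^2 draw
    mu = rng.normal(loc=ybar, scale=np.sqrt(sigma2 / n))  # mu given sigma2 and y
    return mu, sigma2

y = [9.8, 10.4, 10.1, 9.5, 10.7, 10.0]               # made-up data for illustration
mu, sigma2 = simulate_normal_posterior(y)
print("posterior mean of mu:", mu.mean())
print("95% interval for mu:", np.percentile(mu, [2.5, 97.5]))
```

Keeping the draws paired matters: row `i` of `mu` was generated using row `i` of `sigma2`, so summaries of either array alone are summaries of the corresponding marginal posterior.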
Marginal Posterior for the Mean
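After averaging over uncertainty in $\sigma^2$, the marginal posterior for $\mu$ is a Student $t$ distribution with $n-1$ degrees of freedom, location $\bar{y}$, and squared scale $s^2/n$:
$$\mu \mid y \sim t_{n-1}\!\left(\bar{y}, \frac{s^2}{n}\right).$$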
Why a t Distribution Appears
If $\sigma^2$ were known, the posterior for $\mu$ would be Normal.
When $\sigma^2$ is unknown, extra uncertainty remains. The $t$ distribution has heavier tails than the Normal, which reflects this additional variance uncertainty.
As $n$ grows, $\sigma^2$ is estimated more precisely and the $t$ distribution becomes closer to a Normal distribution.
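One rough way to see the tails tighten is to estimate the upper 97.5% point of a standard $t$ distribution with $n-1$ degrees of freedom for a few sample sizes. The sketch below uses Monte Carlo draws from NumPy, so the printed values are approximate; the sample sizes and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
upper = {}
for n in (5, 30, 1000):
    # Standard t draws with n - 1 degrees of freedom.
    draws = rng.standard_t(df=n - 1, size=200_000)
    upper[n] = np.quantile(draws, 0.975)
    print(f"n = {n:4d}: estimated 97.5% point of t_{n - 1} is {upper[n]:.2f}")
```

The estimates should decrease toward the familiar Normal value of about 1.96 as $n$ increases, which is the sense in which the extra variance uncertainty fades with more data.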
Conjugate Normal-Inverse-Chi-Squared Prior
What Problem Does It Solve?
The non-informative prior is useful, but sometimes we have real prior information about both the mean and the variance.
A conjugate prior is
$$\mu \mid \sigma^2 \sim N\!\left(\mu_0, \frac{\sigma^2}{\kappa_0}\right)$$
and
$$\sigma^2 \sim \text{Inv-}\chi^2(\nu_0,\, \sigma_0^2).$$
The hyperparameters have useful interpretations:
- $\mu_0$ is the prior center for the mean;
- $\kappa_0$ is the prior strength for the mean, on the scale of a number of prior observations;
- $\nu_0$ is the prior degrees of freedom for the variance;
- $\sigma_0^2$ is the prior scale for the variance.
The Conjugate Update
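The posterior has the same family:
$$\mu \mid \sigma^2, y \sim N\!\left(\mu_n, \frac{\sigma^2}{\kappa_n}\right)$$
and
$$\sigma^2 \mid y \sim \text{Inv-}\chi^2(\nu_n,\, \sigma_n^2).$$
The updated parameters are
$$\mu_n = \frac{\kappa_0}{\kappa_0 + n}\,\mu_0 + \frac{n}{\kappa_0 + n}\,\bar{y}$$
and
$$\kappa_n = \kappa_0 + n, \qquad \nu_n = \nu_0 + n.$$
The updated variance scale satisfies
$$\nu_n \sigma_n^2 = \nu_0 \sigma_0^2 + (n-1)s^2 + \frac{\kappa_0 n}{\kappa_0 + n}\,(\bar{y} - \mu_0)^2.$$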
How to Read the Update
The posterior mean is a weighted average of the prior mean and the sample mean.
The variance update has three contributions:
- prior variance information;
- within-sample variation;
- disagreement between the prior mean and the sample mean.
The marginal posterior for the mean is
$$\mu \mid y \sim t_{\nu_n}\!\left(\mu_n, \frac{\sigma_n^2}{\kappa_n}\right).$$
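The conjugate update is plain arithmetic, so a short function may help when checking a calculation by hand. The sketch below assumes the standard Normal-inverse-chi-squared update, with the formulas repeated in the comments; the function name `nix_update` and the toy inputs are hypothetical.

```python
import numpy as np

def nix_update(y, mu0, kappa0, nu0, sigma0_sq):
    """Return (mu_n, kappa_n, nu_n, sigma_n_sq) for Normal data y."""
    y = np.asarray(y, dtype=float)
    n = y.size
    ybar = y.mean()
    within = ((y - ybar) ** 2).sum()                 # equals (n - 1) * s^2

    kappa_n = kappa0 + n
    nu_n = nu0 + n
    mu_n = (kappa0 * mu0 + n * ybar) / kappa_n       # weighted average of mu0 and ybar
    # nu_n * sigma_n^2 = prior part + within-sample part + prior/data disagreement
    disagreement = kappa0 * n / kappa_n * (ybar - mu0) ** 2
    sigma_n_sq = (nu0 * sigma0_sq + within + disagreement) / nu_n
    return mu_n, kappa_n, nu_n, sigma_n_sq

# Toy check: three observations and a weak prior centered at 0.
print(nix_update([1.0, 2.0, 3.0], mu0=0.0, kappa0=1.0, nu0=1.0, sigma0_sq=1.0))
```

For this toy input the sample mean is 2 and the prior counts as one observation at 0, so the posterior center is 6/4 = 1.5, and the three variance contributions are 1, 2, and 3, giving a variance scale of 6/4 = 1.5.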
Multinomial Model
What Problem Are We Solving?
The multinomial model handles counts in several categories.
Suppose there are $k$ categories and observed counts
$$y = (y_1, \dots, y_k).$$
Let
$$\theta = (\theta_1, \dots, \theta_k)$$
be the category probabilities, where
$$\theta_j \ge 0, \qquad \sum_{j=1}^{k} \theta_j = 1.$$
Ignoring constants that do not depend on $\theta$, the likelihood is
$$p(y \mid \theta) \propto \prod_{j=1}^{k} \theta_j^{y_j}.$$
Dirichlet Prior
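Why This Prior?
The Dirichlet distribution is a distribution over probability vectors. It is the multivariate analogue of the Beta distribution.
Write
$$\theta \sim \text{Dirichlet}(\alpha_1, \dots, \alpha_k).$$
Its density kernel is
$$p(\theta) \propto \prod_{j=1}^{k} \theta_j^{\alpha_j - 1}.$$
The prior mean for category $j$ is
$$E(\theta_j) = \frac{\alpha_j}{\sum_{l=1}^{k} \alpha_l}.$$
The total
$$\alpha_0 = \sum_{j=1}^{k} \alpha_j$$
controls how concentrated the prior is.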
The Dirichlet-Multinomial Update
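Multiply likelihood and prior:
$$p(\theta \mid y) \propto \prod_{j=1}^{k} \theta_j^{y_j} \prod_{j=1}^{k} \theta_j^{\alpha_j - 1}.$$
Collect powers:
$$p(\theta \mid y) \propto \prod_{j=1}^{k} \theta_j^{\alpha_j + y_j - 1}.$$
Therefore
$$\theta \mid y \sim \text{Dirichlet}(\alpha_1 + y_1, \dots, \alpha_k + y_k).$$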
How to Remember It
Each category updates independently in the parameter list:
$$\alpha_j \;\to\; \alpha_j + y_j.$$
The prior count for a category is increased by the observed count in that category.
Simulating from a Dirichlet Distribution
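The Algorithm
To simulate
$$\theta \sim \text{Dirichlet}(\alpha_1, \dots, \alpha_k),$$
use independent Gamma draws:
- Draw $x_j \sim \text{Gamma}(\alpha_j, 1)$ independently for $j = 1, \dots, k$.
- Normalize: $\theta_j = x_j / \sum_{l=1}^{k} x_l$.
Then
$$\theta = (\theta_1, \dots, \theta_k)$$
has the desired Dirichlet distribution.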
Why It Works Intuitively
The Gamma draws create positive category weights. Dividing by their sum turns those weights into probabilities that add to one.
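The Gamma-then-normalize recipe can be sketched in a few lines of NumPy. The helper name `sample_dirichlet` and the example parameter vector are hypothetical; NumPy also ships a built-in `Generator.dirichlet`, so this version is only meant to show the mechanics.

```python
import numpy as np

def sample_dirichlet(alpha, n_draws, rng):
    """Draw n_draws probability vectors from Dirichlet(alpha)
    by normalizing independent Gamma(alpha_j, 1) variables."""
    alpha = np.asarray(alpha, dtype=float)
    # One Gamma draw per category per simulation; shape broadcasts over rows.
    x = rng.gamma(shape=alpha, scale=1.0, size=(n_draws, alpha.size))
    # Divide each row by its sum so the weights become probabilities.
    return x / x.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
theta = sample_dirichlet([2.0, 3.0, 5.0], n_draws=50_000, rng=rng)
print(theta.sum(axis=1)[:3])     # every row sums to one
print(theta.mean(axis=0))        # Monte Carlo estimate of the Dirichlet mean
```

With parameters (2, 3, 5) the column means should land close to the Dirichlet mean (0.2, 0.3, 0.5), up to simulation noise.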
Example: Market Shares
A survey of 513 smartphone owners gives:
| Category | Count |
|---|---|
| iPhone | 180 |
| Android | 230 |
| Windows | 62 |
| Other | 41 |
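An older survey suggested shares of 30%, 30%, 20%, and 20%. Represent that prior as a 50-person pseudo-survey:
$$\theta \sim \text{Dirichlet}(15, 15, 10, 10).$$
The posterior is
$$\theta \mid y \sim \text{Dirichlet}(180 + 15,\ 230 + 15,\ 62 + 10,\ 41 + 10) = \text{Dirichlet}(195, 245, 72, 51).$$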
The interpretation is direct: prior pseudo-counts plus observed counts.
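The same bookkeeping can be done in code, using the survey counts from the table and the 50-person pseudo-survey prior. This is a sketch: the variable names are made up, and the final comparison of Android against iPhone is an extra illustration that is not part of the original example.

```python
import numpy as np

labels = ["iPhone", "Android", "Windows", "Other"]
counts = np.array([180, 230, 62, 41])                 # observed survey counts
prior_shares = np.array([0.30, 0.30, 0.20, 0.20])     # older survey
prior_alpha = 50 * prior_shares                       # 50-person pseudo-survey

post_alpha = prior_alpha + counts                     # add counts to pseudo-counts
post_mean = post_alpha / post_alpha.sum()

for label, a, m in zip(labels, post_alpha, post_mean):
    print(f"{label:8s} alpha = {a:5.0f}   posterior mean share = {m:.3f}")

# Illustrative extra: posterior probability that Android's share exceeds iPhone's.
rng = np.random.default_rng(0)
theta = rng.dirichlet(post_alpha, size=20_000)
prob_android_leads = np.mean(theta[:, 1] > theta[:, 0])
print("P(Android share > iPhone share | y) is roughly", prob_android_leads)
```

Because the prior carries only 50 pseudo-observations against 513 real ones, the posterior mean shares sit much closer to the observed proportions than to the older survey.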
Multivariate Normal Model
What Problem Are We Solving?
Now each observation is a vector:
$$y_i \in \mathbb{R}^d, \qquad i = 1, \dots, n.$$
Assume
$$y_i \mid \mu, \Sigma \overset{\text{iid}}{\sim} N(\mu, \Sigma),$$
where $\Sigma$ is known and $\mu$ is unknown.
The goal is to learn the vector mean $\mu$.
The Likelihood
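Let
$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i.$$
As a function of $\mu$, the likelihood has the same shape as a multivariate Normal density centered at $\bar{y}$:
$$p(y_1, \dots, y_n \mid \mu) \propto \exp\!\left(-\frac{n}{2}\,(\mu - \bar{y})^{\top} \Sigma^{-1} (\mu - \bar{y})\right).$$
The matrix $\Sigma^{-1}$ is the data precision matrix for one observation.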
Normal Prior for the Vector Mean
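Use the conjugate prior
$$\mu \sim N(\mu_0, \Lambda_0).$$
Here $\mu_0$ is the prior mean vector and $\Lambda_0$ is the prior covariance matrix.
The posterior is
$$\mu \mid y \sim N(\mu_n, \Lambda_n).$$
The posterior precision matrix is
$$\Lambda_n^{-1} = \Lambda_0^{-1} + n \Sigma^{-1}.$$
The posterior mean is
$$\mu_n = \left(\Lambda_0^{-1} + n \Sigma^{-1}\right)^{-1} \left(\Lambda_0^{-1} \mu_0 + n \Sigma^{-1} \bar{y}\right).$$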
How to Read the Multivariate Formula
This is the vector version of the one-dimensional Normal update.
The posterior precision equals the prior precision plus the data precision:
$$\Lambda_n^{-1} = \Lambda_0^{-1} + n \Sigma^{-1}.$$
The posterior mean combines:
- prior information, $\Lambda_0^{-1} \mu_0$;
- data information, $n \Sigma^{-1} \bar{y}$.
If the prior is made very weak, then $\Lambda_0^{-1}$ approaches zero and
$$\mu \mid y \approx N\!\left(\bar{y}, \frac{\Sigma}{n}\right).$$
So the posterior centers on the sample mean vector with covariance shrinking at rate $1/n$.
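The precision-weighted update translates directly into matrix code. The sketch below assumes a known data covariance and a Normal prior on the mean vector; the function name `mvn_mean_update` and the two-dimensional toy data are hypothetical.

```python
import numpy as np

def mvn_mean_update(y, Sigma, mu0, Lambda0):
    """Posterior mean and covariance of mu for N(mu, Sigma) data with
    known Sigma and prior mu ~ N(mu0, Lambda0)."""
    y = np.asarray(y, dtype=float)                   # shape (n, d)
    n = y.shape[0]
    ybar = y.mean(axis=0)
    Sigma_inv = np.linalg.inv(Sigma)                 # data precision, one observation
    Lambda0_inv = np.linalg.inv(Lambda0)             # prior precision

    post_precision = Lambda0_inv + n * Sigma_inv     # precisions add
    rhs = Lambda0_inv @ mu0 + n * Sigma_inv @ ybar   # prior info + data info
    mu_n = np.linalg.solve(post_precision, rhs)
    Lambda_n = np.linalg.inv(post_precision)
    return mu_n, Lambda_n

y = np.array([[1.0, 2.0], [3.0, 4.0]])               # made-up observations
mu_n, Lambda_n = mvn_mean_update(y, Sigma=np.eye(2), mu0=np.zeros(2), Lambda0=np.eye(2))
print(mu_n)
print(Lambda_n)

# A very diffuse prior: the posterior mean moves to the sample mean.
mu_weak, _ = mvn_mean_update(y, np.eye(2), np.zeros(2), 1e8 * np.eye(2))
print(mu_weak)
```

With identity matrices the toy case reduces to two independent one-dimensional updates: the prior acts like one observation at 0, so the sample mean (2, 3) is shrunk by a factor 2/3 and the posterior covariance is the identity divided by 3.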
Study Questions
- What does it mean to marginalize over a nuisance parameter?
- Why does the Normal mean have a $t$ posterior when $\sigma^2$ is unknown?
- In the Normal-inverse-chi-squared update, what does $\kappa_0$ control?
- How does the Dirichlet prior update after multinomial counts are observed?
- How is the multivariate Normal update similar to the one-dimensional Normal update?
Chapter Summary
Multiparameter Bayesian models use joint posterior distributions. When one parameter is not the main target, it is removed by marginalization. In the Normal model with unknown variance, uncertainty about $\sigma^2$ makes the marginal posterior for the mean a Student $t$ distribution. The conjugate Normal-inverse-chi-squared prior adds prior information about both mean and variance. For categorical counts, the Dirichlet prior updates by adding observed counts to prior pseudo-counts. For a multivariate Normal model with known covariance, the vector mean update is the same precision-weighted idea as in the one-dimensional Normal model.