Glossary

This glossary collects technical terms introduced throughout the textbook.

Acceptance Probability

In Metropolis–Hastings, the probability $\alpha$ of accepting a proposed move: the smaller of 1 and the ratio of posterior densities times the proposal ratio.

Bayes Factor

The ratio of marginal likelihoods $B_{12} = p_1(y)/p_2(y)$, measuring the relative evidence for model $M_1$ versus $M_2$.

Bayes’ Theorem

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$. Combines prior and likelihood into the posterior.

Bayesian LOO-CV

Leave-one-out cross-validation computed from the posterior, asymptotically equivalent to WAIC.

Bernstein–von Mises Theorem

In large samples, the posterior distribution is approximately normal centered at the MLE, regardless of the prior (under regularity conditions).

Beta Distribution

$p(\theta) \propto \theta^{\alpha-1}(1-\theta)^{\beta-1}$ for $\theta \in [0,1]$. Conjugate prior for the Bernoulli/binomial likelihood.

Burn-in

The initial portion of an MCMC chain, discarded because the chain has not yet converged to the stationary distribution.

Conjugate Prior

A prior family such that the posterior belongs to the same family. Examples: Beta–Bernoulli, Normal–Normal, Gamma–Poisson, Dirichlet–multinomial.
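
As a small illustration of conjugacy, the sketch below updates a Beta prior with Bernoulli data. The function names and the data are hypothetical, chosen only for this example; the update rule itself (add successes to $\alpha$, failures to $\beta$) is the standard Beta/Bernoulli result.

```python
def beta_bernoulli_update(alpha, beta, data):
    """Return the Beta posterior parameters after observing 0/1 outcomes."""
    successes = sum(data)
    failures = len(data) - successes
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical data: 7 successes in 10 trials, with a uniform Beta(1, 1) prior.
data = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]
post_alpha, post_beta = beta_bernoulli_update(1.0, 1.0, data)
print(post_alpha, post_beta, beta_mean(post_alpha, post_beta))
```

The posterior is again a Beta distribution, so no numerical integration is needed.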

Credible Interval

An interval $[a, b]$ such that $\Pr(\theta \in [a,b] \mid y) = 1 - \alpha$. Interpreted as a probability statement about $\theta$.

Data Augmentation

Introducing latent variables to make Gibbs sampling conditionals tractable. Used in mixture models, logistic regression, and probit regression.

Decision Theory

Framework for choosing actions $a$ to maximize posterior expected utility $E_{p(\theta \mid y)}[U(a, \theta)]$.

Dirichlet Distribution

Multivariate generalization of the Beta distribution. Conjugate prior for the multinomial likelihood.

Effective Sample Size (ESS)

$\mathrm{ESS} = N/\mathrm{IF}$. The number of independent draws equivalent to $N$ autocorrelated MCMC draws.

Fisher Information

$I(\theta) = -E_{x \mid \theta}[\partial^2 \ln p(x \mid \theta)/\partial\theta^2]$. Measures the information that data carries about $\theta$.

Full Conditional

$p(\theta_j \mid \theta_{-j}, y)$. The distribution of one block of parameters given all others and the data. Used in Gibbs sampling.

Gamma Distribution

$p(\theta) \propto \theta^{\alpha-1}\exp(-\beta\theta)$ for $\theta > 0$. Conjugate prior for the Poisson likelihood.

Gibbs Sampling

An MCMC algorithm that iteratively draws from full conditional distributions. Converges to the joint target distribution.
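
A minimal sketch of the idea, assuming a target that is not taken from the text: a bivariate normal with zero means, unit variances, and correlation `rho`. Its full conditionals are $x \mid y \sim N(\rho y,\, 1-\rho^2)$ and $y \mid x \sim N(\rho x,\, 1-\rho^2)$. All function names are hypothetical.

```python
import random


def gibbs_bivariate_normal(rho, n_draws, seed=0):
    """Alternate draws from the two full conditionals of a bivariate normal."""
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5
    x, y = 0.0, 0.0  # arbitrary starting point
    draws = []
    for _ in range(n_draws):
        x = rng.gauss(rho * y, cond_sd)  # draw x given the current y
        y = rng.gauss(rho * x, cond_sd)  # draw y given the new x
        draws.append((x, y))
    return draws


def sample_correlation(draws):
    """Sample correlation of the (x, y) pairs."""
    n = len(draws)
    mx = sum(d[0] for d in draws) / n
    my = sum(d[1] for d in draws) / n
    sxy = sum((d[0] - mx) * (d[1] - my) for d in draws)
    sxx = sum((d[0] - mx) ** 2 for d in draws)
    syy = sum((d[1] - my) ** 2 for d in draws)
    return sxy / (sxx * syy) ** 0.5


draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000, seed=1)
print(sample_correlation(draws))  # should be close to 0.8
```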

Hamiltonian Monte Carlo (HMC)

An MCMC algorithm that uses Hamiltonian dynamics (via the leapfrog integrator) to make distant proposals with high acceptance rates.

Highest Posterior Density (HPD) Interval

The shortest credible interval containing a given probability mass.
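
Given posterior draws, one common way to approximate an HPD interval is to sort the draws and slide a window that covers the required fraction, keeping the narrowest one. The sketch below assumes a unimodal posterior (otherwise the HPD region may not be a single interval); `hpd_interval` is a hypothetical helper name.

```python
import math


def hpd_interval(draws, mass=0.95):
    """Shortest interval containing at least `mass` of the sorted draws."""
    ordered = sorted(draws)
    n = len(ordered)
    k = max(1, math.ceil(mass * n))  # number of draws the interval must cover
    best_lo, best_hi = ordered[0], ordered[k - 1]
    for start in range(1, n - k + 1):
        lo, hi = ordered[start], ordered[start + k - 1]
        if hi - lo < best_hi - best_lo:  # keep the narrowest window so far
            best_lo, best_hi = lo, hi
    return best_lo, best_hi


# Toy draws: the tightest half of the sample sits between 5 and 7.
print(hpd_interval([0, 5, 6, 7, 20, 30], mass=0.5))  # (5, 7)
```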

Hierarchical Model

A model where parameters have priors whose hyperparameters themselves have priors, creating a multi-level structure.

Inefficiency Factor (IF)

$\mathrm{IF} = 1 + 2\sum_{k=1}^\infty \rho_k$. Measures the efficiency loss due to MCMC autocorrelation.
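
In practice the infinite sum must be truncated. The sketch below estimates IF from sample autocorrelations and then computes $\mathrm{ESS} = N/\mathrm{IF}$. The truncation rule (stop at the first non-positive estimated autocorrelation) is an assumption of this example, one of several rules in use, and the AR(1) chain is simulated only to have something autocorrelated to analyze.

```python
import random


def autocorrelation(chain, lag):
    """Sample autocorrelation of the chain at the given lag."""
    n = len(chain)
    mean = sum(chain) / n
    var = sum((x - mean) ** 2 for x in chain) / n
    cov = sum((chain[t] - mean) * (chain[t + lag] - mean)
              for t in range(n - lag)) / n
    return cov / var


def inefficiency_factor(chain, max_lag=200):
    """Estimate IF = 1 + 2 * sum of autocorrelations, with truncation."""
    total = 0.0
    for k in range(1, max_lag + 1):
        rho_k = autocorrelation(chain, k)
        if rho_k <= 0.0:  # assumed truncation rule for this sketch
            break
        total += rho_k
    return 1.0 + 2.0 * total


def effective_sample_size(chain, max_lag=200):
    """ESS = N / IF."""
    return len(chain) / inefficiency_factor(chain, max_lag)


# Simulated AR(1) chain with coefficient 0.5; its theoretical IF is (1+0.5)/(1-0.5) = 3.
rng = random.Random(42)
phi = 0.5
chain = [0.0]
for _ in range(9999):
    chain.append(phi * chain[-1] + rng.gauss(0.0, 1.0))

if_hat = inefficiency_factor(chain)
ess_hat = effective_sample_size(chain)
print(if_hat, ess_hat)
```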

Jeffreys’ Prior

$p(\theta) \propto |I(\theta)|^{1/2}$. A non-informative prior based on Fisher information. Transformation-invariant, but it can violate the likelihood principle because $I(\theta)$ depends on the sampling model.

Laplace Approximation

A normal approximation to the posterior obtained by Taylor-expanding the log-posterior around the mode.
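
For a concrete case, the sketch below applies the approximation to a Beta$(\alpha, \beta)$ posterior with $\alpha, \beta > 1$, where the mode and the second derivative of the log density are available in closed form. The choice of a Beta posterior and the function name are assumptions made for illustration.

```python
def laplace_beta(alpha, beta):
    """Normal approximation N(mode, variance) to a Beta(alpha, beta) density.

    Requires alpha > 1 and beta > 1 so that the mode lies inside (0, 1).
    """
    mode = (alpha - 1.0) / (alpha + beta - 2.0)
    # Second derivative of (alpha-1)*log(t) + (beta-1)*log(1-t) at the mode.
    curvature = -(alpha - 1.0) / mode ** 2 - (beta - 1.0) / (1.0 - mode) ** 2
    variance = -1.0 / curvature  # inverse of the negative curvature
    return mode, variance


mode, variance = laplace_beta(8.0, 4.0)
print(mode, variance)
```

The approximate variance is the inverse of the negative curvature at the mode, which is the second-order term of the Taylor expansion.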

Lasso

$L_1$-regularized regression. Equivalent to the posterior mode under a Laplace (double-exponential) prior on coefficients.

Likelihood Function

$p(y \mid \theta)$ viewed as a function of $\theta$ for fixed data. Not a probability distribution over $\theta$.

Likelihood Principle

All evidence about $\theta$ in a sample is contained in the likelihood function. Bayesian inference respects this principle.

Marginal Likelihood

$p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta$. The probability of the data averaged over the prior. Used for model comparison.
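
In conjugate models the integral has a closed form. As a sketch, for a Bernoulli sequence with $s$ successes and $f$ failures under a Beta$(\alpha, \beta)$ prior, $p(y) = B(\alpha + s, \beta + f)/B(\alpha, \beta)$, and a Bayes factor between two priors is the ratio of two such quantities. The function names below are hypothetical.

```python
import math


def log_beta_fn(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_marginal_likelihood(successes, failures, alpha, beta):
    """log p(y) for a Bernoulli sequence under a Beta(alpha, beta) prior."""
    return (log_beta_fn(alpha + successes, beta + failures)
            - log_beta_fn(alpha, beta))


def bayes_factor(successes, failures, prior_1, prior_2):
    """B_12 for two models that differ only in their Beta prior."""
    return math.exp(log_marginal_likelihood(successes, failures, *prior_1)
                    - log_marginal_likelihood(successes, failures, *prior_2))


# One success and one failure under a uniform prior: p(y) = B(2, 2) = 1/6.
print(math.exp(log_marginal_likelihood(1, 1, 1.0, 1.0)))
```

Working on the log scale avoids underflow when the sample is large.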

Marginalization

Integrating out nuisance parameters from the joint posterior: $p(\theta_1 \mid y) = \int p(\theta_1, \theta_2 \mid y)\, d\theta_2$.

Markov Chain Monte Carlo (MCMC)

A family of algorithms that construct a Markov chain whose stationary distribution is the target posterior.

Metropolis–Hastings (MH)

A general MCMC algorithm: propose from $q(\theta^p \mid \theta)$, accept with probability $\alpha = \min(1, \text{posterior ratio} \times \text{proposal ratio})$.
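
The sketch below is a random-walk version of the algorithm. It assumes a standard normal target, used only as a stand-in for a posterior, and a symmetric Gaussian proposal, for which the proposal ratio equals 1 and drops out of $\alpha$. The names `log_target` and `random_walk_mh` are hypothetical.

```python
import math
import random


def log_target(theta):
    """Unnormalized log posterior; a standard normal is assumed here."""
    return -0.5 * theta ** 2


def random_walk_mh(n_draws, step=1.0, seed=0):
    """Random-walk MH; returns the draws and the observed acceptance rate."""
    rng = random.Random(seed)
    theta = 0.0
    draws, accepted = [], 0
    for _ in range(n_draws):
        proposal = theta + rng.gauss(0.0, step)
        # Symmetric proposal, so alpha = min(1, posterior ratio).
        log_alpha = min(0.0, log_target(proposal) - log_target(theta))
        if rng.random() < math.exp(log_alpha):
            theta = proposal
            accepted += 1
        draws.append(theta)  # a rejected move repeats the current value
    return draws, accepted / n_draws


draws, rate = random_walk_mh(20000, step=2.4, seed=1)
print(sum(draws) / len(draws), rate)
```

Only the ratio of posterior densities enters, so the normalizing constant $p(y)$ is never needed.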

Model Averaging

Combining predictions or inferences across models, weighted by posterior model probabilities.

Monte Carlo Simulation

Approximating $E[g(\theta)]$ by $\frac{1}{N}\sum_{i=1}^N g(\theta^{(i)})$ where $\theta^{(i)} \sim p(\theta)$.
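
A minimal sketch of the estimator, assuming $\theta \sim N(0, 1)$ and $g(\theta) = \theta^2$, so that the true value is $E[\theta^2] = 1$. The helper name and the choice of $g$ are illustrative only.

```python
import random


def monte_carlo_expectation(g, sampler, n_draws):
    """Average g over n_draws independent draws produced by sampler()."""
    return sum(g(sampler()) for _ in range(n_draws)) / n_draws


rng = random.Random(0)
estimate = monte_carlo_expectation(
    g=lambda theta: theta ** 2,
    sampler=lambda: rng.gauss(0.0, 1.0),
    n_draws=100_000,
)
print(estimate)  # should be close to 1
```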

No-U-Turn Sampler (NUTS)

An adaptive variant of HMC that automatically selects trajectory length, implemented in Stan.

Polya-Gamma Augmentation

Introducing Polya-Gamma latent variables to make logistic regression amenable to Gibbs sampling.

Posterior Distribution

$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$. The updated belief about $\theta$ after observing data.

Posterior Predictive Distribution

$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta$. The distribution of future data averaging over posterior uncertainty.
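
The integral can be approximated by simulation: draw $\theta$ from the posterior, then draw $\tilde{y}$ given that $\theta$. The sketch below assumes a Beta posterior for a Bernoulli success probability; the posterior parameters and the function name are hypothetical.

```python
import random


def posterior_predictive_draws(post_alpha, post_beta, n_draws, seed=0):
    """Simulate future Bernoulli outcomes under a Beta posterior."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        theta = rng.betavariate(post_alpha, post_beta)  # theta from p(theta | y)
        draws.append(1 if rng.random() < theta else 0)  # y_tilde given theta
    return draws


y_tilde = posterior_predictive_draws(8.0, 4.0, n_draws=50000, seed=1)
print(sum(y_tilde) / len(y_tilde))  # close to the posterior mean 8/12
```

For this model the predictive probability of a success equals the posterior mean of $\theta$, which gives a simple check on the simulation.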

Posterior Predictive P-Value

$\Pr[T(y^{\mathrm{rep}}) \ge T(y) \mid y]$. Measures whether the observed data are extreme relative to model-generated replicates.

Predictive Distribution

The distribution of future observations given observed data and model. Accounts for both parameter and sampling uncertainty.

Prior Distribution

$p(\theta)$. The belief about $\theta$ before observing data.

Probabilistic Programming Language

A language for specifying probabilistic models and performing automatic Bayesian inference (e.g., Stan).

Probit Regression

$\Pr(y_i = 1 \mid x_i) = \Phi(x_i'\beta)$. Alternative to logistic regression using the normal CDF link.

Ridge Regression

$L_2$-regularized regression. Equivalent to the posterior mode (which coincides with the posterior mean when the likelihood is also Normal) under a zero-mean Normal prior on coefficients.

Scaled Inverse Chi-Squared Distribution

A conjugate prior for the variance $\sigma^2$ in Normal models.

Smoothness Prior

A prior that shrinks parameters (e.g., spline coefficients) toward zero to prevent overfitting. Controls model flexibility.

Stan

A probabilistic programming language that implements HMC/NUTS for Bayesian inference. Written in C++, callable from R.

Stationary Distribution

The distribution $\pi$ satisfying $\pi = \pi P$ for a Markov chain with transition matrix $P$. MCMC constructs chains with the posterior as stationary distribution.
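
For a small discrete chain, $\pi$ can be found by applying $P$ repeatedly to any starting distribution. The sketch below assumes an ergodic chain, so that the iteration converges; the two-state matrix is a made-up example.

```python
def stationary_distribution(P, n_iter=500):
    """Iterate pi <- pi P from a uniform start (assumes an ergodic chain)."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi


# Hypothetical two-state transition matrix; each row sums to 1.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print(pi)  # approximately [5/6, 1/6]
```

Solving $\pi = \pi P$ by hand for this matrix gives $\pi = (5/6, 1/6)$, which the iteration reproduces.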

Subjective Probability

Probability as a degree of belief, not a long-run frequency. Foundation of Bayesian inference.

Variable Selection Indicators

Binary indicators $I = (I_1, \ldots, I_p)$ determining which covariates are included in a regression model.

WAIC

Widely Applicable Information Criterion. Measures predictive accuracy using the log pointwise predictive density and an effective number of parameters.