Glossary
This glossary collects technical terms introduced throughout the textbook.
Acceptance Probability
In Metropolis–Hastings, the probability of accepting a proposed move, computed as the minimum of 1 and the ratio of posterior densities times the proposal ratio.
Bayes Factor
The ratio of marginal likelihoods $B_{12} = p(y \mid M_1) / p(y \mid M_2)$, measuring the relative evidence for model $M_1$ versus $M_2$.
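As a worked illustration of a Bayes factor, the sketch below compares two hypothetical models for a sequence of `n` Bernoulli trials with `s` successes: model 1 fixes the success probability at `theta0`, while model 2 places a Beta(`a`, `b`) prior on it. The two models and all function names are assumptions made for this example, chosen because both marginal likelihoods have closed forms.

```python
import math

def log_beta(a, b):
    """Log of the Beta function B(a, b), computed with log-gamma."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal_fixed(s, n, theta0):
    """Log marginal likelihood of one observed sequence when theta is fixed."""
    return s * math.log(theta0) + (n - s) * math.log(1.0 - theta0)

def log_marginal_beta(s, n, a, b):
    """Log marginal likelihood of the same sequence under a Beta(a, b) prior."""
    return log_beta(a + s, b + n - s) - log_beta(a, b)

def bayes_factor(s, n, theta0=0.5, a=1.0, b=1.0):
    """Bayes factor of model 1 (fixed theta0) against model 2 (Beta prior)."""
    return math.exp(log_marginal_fixed(s, n, theta0) - log_marginal_beta(s, n, a, b))

print(bayes_factor(5, 10))   # balanced data: the ratio is above 1, favoring model 1
print(bayes_factor(10, 10))  # ten successes in a row: the ratio is below 1, favoring model 2
```

Working on the log scale keeps the computation stable when `n` is large and the individual marginal likelihoods underflow.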
Bayes’ Theorem
$p(\theta \mid y) = \dfrac{p(y \mid \theta)\, p(\theta)}{p(y)}$. Combines prior and likelihood into the posterior.
Bayesian LOO-CV
Leave-one-out cross-validation computed from the posterior, asymptotically equivalent to WAIC.
Bernstein–von Mises Theorem
In large samples, the posterior distribution is approximately normal centered at the MLE, regardless of the prior (under regularity conditions).
Beta Distribution
$p(\theta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha-1}(1-\theta)^{\beta-1}$ for $0 \le \theta \le 1$. Conjugate prior for Bernoulli/binomial likelihood.
Burn-in
The initial portion of an MCMC chain, discarded because the chain has not yet converged to the stationary distribution.
Conjugate Prior
A prior family such that the posterior belongs to the same family. Examples: Beta–Bernoulli, Normal–Normal, Gamma–Poisson, Dirichlet–multinomial.
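The Beta–Bernoulli pair makes conjugacy concrete: observing 0/1 data under a Beta(`a`, `b`) prior gives a Beta posterior whose parameters are the prior parameters plus the counts of successes and failures. The function name and the data below are illustrative assumptions.

```python
def beta_bernoulli_update(a, b, data):
    """Return the Beta posterior parameters after observing 0/1 data."""
    successes = sum(data)
    failures = len(data) - successes
    return a + successes, b + failures

a_post, b_post = beta_bernoulli_update(2, 2, [1, 0, 1, 1])
posterior_mean = a_post / (a_post + b_post)
print(a_post, b_post, posterior_mean)  # → 5 3 0.625
```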
Credible Interval
An interval $[a, b]$ such that $\Pr(a \le \theta \le b \mid y) = 1 - \alpha$. Interpreted as a probability statement about $\theta$.
Data Augmentation
Introducing latent variables to make Gibbs sampling conditionals tractable. Used in mixture models, logistic regression, and probit regression.
Decision Theory
Framework for choosing actions $a$ to maximize posterior expected utility $\mathbb{E}[U(a, \theta) \mid y] = \int U(a, \theta)\, p(\theta \mid y)\, d\theta$.
Dirichlet Distribution
Multivariate generalization of the Beta distribution. Conjugate prior for the multinomial likelihood.
Effective Sample Size (ESS)
$\mathrm{ESS} = N / \mathrm{IF}$, where $\mathrm{IF}$ is the inefficiency factor. The number of independent draws equivalent to $N$ autocorrelated MCMC draws.
Fisher Information
$I(\theta) = -\mathbb{E}\left[\dfrac{\partial^2 \log p(y \mid \theta)}{\partial \theta^2}\right]$. Measures the information that data carries about $\theta$.
Full Conditional
$p(\theta_j \mid \theta_{-j}, y)$. The distribution of one block of parameters given all others and the data. Used in Gibbs sampling.
Gamma Distribution
$p(\theta) \propto \theta^{\alpha-1} e^{-\beta\theta}$ for $\theta > 0$ (shape $\alpha$, rate $\beta$). Conjugate prior for the Poisson likelihood.
Gibbs Sampling
An MCMC algorithm that iteratively draws from full conditional distributions. Converges to the joint target distribution.
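A minimal sketch of the alternating draws, assuming a toy target that is not taken from the text: a standard bivariate normal with correlation `rho`, whose full conditionals are the univariate normals $x \mid y \sim N(\rho y, 1 - \rho^2)$ and $y \mid x \sim N(\rho x, 1 - \rho^2)$. The function name and seed are illustrative.

```python
import math
import random

def gibbs_bivariate_normal(rho, n_draws, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho."""
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho**2)  # standard deviation of each full conditional
    x, y = 0.0, 0.0                    # arbitrary starting point
    draws = []
    for _ in range(n_draws):
        x = rng.gauss(rho * y, cond_sd)  # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_sd)  # draw y from p(y | x), using the new x
        draws.append((x, y))
    return draws

draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000)
mean_xy = sum(x * y for x, y in draws) / len(draws)
print(mean_xy)  # should be close to rho, because both margins have unit variance
```

Each update conditions on the most recent value of the other coordinate, which is what makes consecutive draws autocorrelated.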
Hamiltonian Monte Carlo (HMC)
An MCMC algorithm that uses Hamiltonian dynamics (via the leapfrog integrator) to make distant proposals with high acceptance rates.
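The following sketch shows the leapfrog integrator inside a bare-bones HMC loop. It assumes a one-dimensional standard normal target, so the potential energy is $\theta^2/2$ with gradient $\theta$, and a unit mass for the momentum. The step size, trajectory length, and function names are illustrative choices, not tuned values.

```python
import math
import random

def leapfrog(theta, r, grad_u, step, n_steps):
    """Simulate Hamiltonian dynamics with the leapfrog integrator."""
    r = r - 0.5 * step * grad_u(theta)      # initial half step for momentum
    for i in range(n_steps):
        theta = theta + step * r            # full step for position
        if i < n_steps - 1:
            r = r - step * grad_u(theta)    # full step for momentum
    r = r - 0.5 * step * grad_u(theta)      # final half step for momentum
    return theta, r

def hmc_standard_normal(n_draws, step=0.2, n_steps=10, seed=0):
    """HMC for a standard normal target; returns draws and acceptance rate."""
    rng = random.Random(seed)
    u = lambda t: 0.5 * t * t               # potential energy: -log target
    grad_u = lambda t: t
    theta, draws, accepted = 0.0, [], 0
    for _ in range(n_draws):
        r0 = rng.gauss(0.0, 1.0)            # resample the momentum
        theta_new, r_new = leapfrog(theta, r0, grad_u, step, n_steps)
        h_old = u(theta) + 0.5 * r0 * r0
        h_new = u(theta_new) + 0.5 * r_new * r_new
        # Metropolis correction for the integrator's discretization error.
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            theta = theta_new
            accepted += 1
        draws.append(theta)
    return draws, accepted / n_draws

draws, accept_rate = hmc_standard_normal(5000)
print(accept_rate)
```

Because leapfrog nearly conserves the Hamiltonian, the energy difference in the acceptance step stays small even though the proposal lands far from the current state.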
Highest Posterior Density (HPD) Interval
The shortest credible interval containing a given probability mass.
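One common way to approximate an HPD interval from posterior draws is to sort them and slide a window that holds the required share of draws, keeping the narrowest window. The sketch below does this and, for contrast, also returns the central (equal-tail) window of the same size. The function name and the skewed Beta(2, 10) stand-in posterior are assumptions for illustration.

```python
import math
import random

def interval_windows(draws, mass=0.95):
    """Return (equal_tail, hpd) intervals that each contain ceil(mass * n) draws."""
    xs = sorted(draws)
    n = len(xs)
    k = math.ceil(mass * n)                    # draws each interval must cover
    start_et = (n - k) // 2                    # central window: equal tails
    equal_tail = (xs[start_et], xs[start_et + k - 1])
    # HPD: among all windows of k consecutive sorted draws, pick the narrowest.
    start_hpd = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    hpd = (xs[start_hpd], xs[start_hpd + k - 1])
    return equal_tail, hpd

rng = random.Random(0)
draws = [rng.betavariate(2, 10) for _ in range(5000)]  # right-skewed example
equal_tail, hpd = interval_windows(draws)
print(equal_tail, hpd)
```

For a symmetric unimodal posterior the two intervals nearly coincide; for a skewed one the HPD interval shifts toward the mode and is shorter.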
Hierarchical Model
A model where parameters have priors whose hyperparameters themselves have priors, creating a multi-level structure.
Inefficiency Factor (IF)
$\mathrm{IF} = 1 + 2\sum_{k=1}^{\infty} \rho_k$, where $\rho_k$ is the lag-$k$ autocorrelation of the chain. Measures the efficiency loss due to MCMC autocorrelation.
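A rough sketch of how the inefficiency factor, and from it the effective sample size, might be estimated from a chain. It sums sample autocorrelations and, as a simplifying assumption, stops at the first non-positive one; practical software uses more careful truncation rules. The AR(1) chain with coefficient 0.5 stands in for MCMC output, and its theoretical inefficiency factor is $(1 + 0.5)/(1 - 0.5) = 3$.

```python
import random

def inefficiency_factor(chain, max_lag=200):
    """Estimate IF = 1 + 2 * sum of autocorrelations, truncated at the first rho_k <= 0."""
    n = len(chain)
    mean = sum(chain) / n
    centered = [x - mean for x in chain]
    var = sum(c * c for c in centered) / n
    total = 0.0
    for k in range(1, min(max_lag, n - 1) + 1):
        rho_k = sum(centered[i] * centered[i + k] for i in range(n - k)) / (n * var)
        if rho_k <= 0:   # assumed truncation rule: later lags are treated as noise
            break
        total += rho_k
    return 1.0 + 2.0 * total

# Stand-in for MCMC output: an AR(1) process with coefficient 0.5.
rng = random.Random(0)
chain, x = [], 0.0
for _ in range(20000):
    x = 0.5 * x + rng.gauss(0.0, 1.0)
    chain.append(x)

if_hat = inefficiency_factor(chain)
ess = len(chain) / if_hat   # effective sample size
print(if_hat, ess)
```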
Jeffreys’ Prior
$p(\theta) \propto |I(\theta)|^{1/2}$. A non-informative prior based on Fisher information. Transformation-invariant but violates the likelihood principle.
Laplace Approximation
A normal approximation to the posterior obtained by Taylor-expanding the log-posterior around the mode.
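As a small worked case, assume the posterior is Beta(`a`, `b`) with `a`, `b` > 1. The log density is $(a-1)\log\theta + (b-1)\log(1-\theta)$ up to a constant, so the mode and the curvature at the mode are available in closed form, and the approximating normal has mean equal to the mode and variance equal to minus the inverse of the second derivative there. The function name is illustrative.

```python
def laplace_beta(a, b):
    """Laplace (normal) approximation to a Beta(a, b) density with a, b > 1."""
    mode = (a - 1.0) / (a + b - 2.0)
    # Second derivative of the log density evaluated at the mode (negative).
    d2 = -(a - 1.0) / mode**2 - (b - 1.0) / (1.0 - mode) ** 2
    variance = -1.0 / d2
    return mode, variance

mode, variance = laplace_beta(5, 3)
exact_variance = (5 * 3) / ((5 + 3) ** 2 * (5 + 3 + 1))  # variance of Beta(5, 3)
print(mode, variance, exact_variance)
```

With so little data the approximate and exact variances differ noticeably; the gap shrinks as `a` and `b` grow.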
Lasso
$L_1$-regularized regression. Equivalent to the posterior mode under a Laplace (double-exponential) prior on coefficients.
Likelihood Function
$p(y \mid \theta)$ viewed as a function of $\theta$ for fixed data. Not a probability distribution over $\theta$.
Likelihood Principle
All evidence about $\theta$ in a sample is contained in the likelihood function. Bayesian inference respects this principle.
Marginal Likelihood
$p(y) = \int p(y \mid \theta)\, p(\theta)\, d\theta$. The probability of the data averaged over the prior. Used for model comparison.
Marginalization
Integrating out nuisance parameters from the joint posterior: $p(\theta_1 \mid y) = \int p(\theta_1, \theta_2 \mid y)\, d\theta_2$.
Markov Chain Monte Carlo (MCMC)
A family of algorithms that construct a Markov chain whose stationary distribution is the target posterior.
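Metropolis–Hastings (MH)
A general MCMC algorithm: propose $\theta'$ from $q(\theta' \mid \theta)$, accept with probability $\alpha = \min\left(1, \dfrac{p(\theta' \mid y)\, q(\theta \mid \theta')}{p(\theta \mid y)\, q(\theta' \mid \theta)}\right)$.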
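A minimal sketch of the propose-and-accept loop, assuming a Gaussian random-walk proposal. Because that proposal is symmetric, the proposal ratio equals 1 and only the ratio of target densities remains in the acceptance probability. The target (a normal density with mean 2, known only up to a constant), the proposal scale, and the function name are illustrative assumptions.

```python
import math
import random

def random_walk_metropolis(log_target, theta0, n_draws, proposal_sd=1.0, seed=0):
    """Random-walk Metropolis; log_target may omit the normalizing constant."""
    rng = random.Random(seed)
    theta, draws, accepted = theta0, [], 0
    for _ in range(n_draws):
        proposal = rng.gauss(theta, proposal_sd)       # symmetric proposal
        log_ratio = log_target(proposal) - log_target(theta)
        accept_prob = math.exp(min(0.0, log_ratio))    # min(1, density ratio)
        if rng.random() < accept_prob:
            theta = proposal
            accepted += 1
        draws.append(theta)                            # rejected moves repeat theta
    return draws, accepted / n_draws

log_target = lambda t: -0.5 * (t - 2.0) ** 2           # unnormalized N(2, 1)
draws, accept_rate = random_walk_metropolis(log_target, 0.0, 20000, proposal_sd=2.4)
print(sum(draws) / len(draws), accept_rate)
```

A non-symmetric proposal would need the extra proposal-ratio term added to `log_ratio`.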
Model Averaging
Combining predictions or inferences across models, weighted by posterior model probabilities.
Monte Carlo Simulation
Approximating $\mathbb{E}[g(\theta)]$ by $\frac{1}{N}\sum_{i=1}^{N} g(\theta^{(i)})$ where $\theta^{(i)} \overset{\mathrm{iid}}{\sim} p(\theta)$.
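The sample-average idea in a few lines, assuming a case where the answer is known: for a standard normal draw, the expected value of its square is exactly 1. The helper name `monte_carlo_mean` is illustrative.

```python
import random

def monte_carlo_mean(g, sampler, n_draws):
    """Average g over n_draws independent draws produced by sampler()."""
    return sum(g(sampler()) for _ in range(n_draws)) / n_draws

rng = random.Random(0)
estimate = monte_carlo_mean(lambda t: t * t, lambda: rng.gauss(0.0, 1.0), 50000)
print(estimate)  # close to the exact value 1
```

The Monte Carlo error shrinks at rate $1/\sqrt{N}$, so four times as many draws halve the standard error.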
No-U-Turn Sampler (NUTS)
An adaptive variant of HMC that automatically selects trajectory length, implemented in Stan.
Polya-Gamma Augmentation
Introducing Polya-Gamma latent variables to make logistic regression amenable to Gibbs sampling.
Posterior Distribution
$p(\theta \mid y) \propto p(y \mid \theta)\, p(\theta)$. The updated belief about $\theta$ after observing data.
Posterior Predictive Distribution
$p(\tilde{y} \mid y) = \int p(\tilde{y} \mid \theta)\, p(\theta \mid y)\, d\theta$. The distribution of future data averaging over posterior uncertainty.
Posterior Predictive P-Value
$\Pr\left(T(y^{\mathrm{rep}}) \ge T(y) \mid y\right)$ for a test statistic $T$. Measures whether the observed data are extreme relative to model-generated replicates.
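A sketch of how such a p-value can be simulated, assuming an i.i.d. Bernoulli model with a Beta(`a`, `b`) prior and, as the test statistic, the number of switches between 0 and 1 in the sequence. The model, the statistic, and the function names are illustrative assumptions. Each replicate is generated by first drawing the parameter from its posterior and then drawing a data set of the same length.

```python
import random

def n_switches(seq):
    """Number of positions where consecutive values differ."""
    return sum(1 for u, v in zip(seq, seq[1:]) if u != v)

def posterior_predictive_pvalue(data, statistic, a=1.0, b=1.0, n_rep=5000, seed=0):
    """Share of replicated data sets whose statistic is at least the observed one."""
    rng = random.Random(seed)
    s, n = sum(data), len(data)
    t_obs = statistic(data)
    at_least = 0
    for _ in range(n_rep):
        theta = rng.betavariate(a + s, b + n - s)                  # posterior draw
        rep = [1 if rng.random() < theta else 0 for _ in range(n)]  # replicate data
        if statistic(rep) >= t_obs:
            at_least += 1
    return at_least / n_rep

blocked = [1] * 10 + [0] * 10   # a single switch: far fewer than independence predicts
print(posterior_predictive_pvalue(blocked, n_switches))
```

Values very close to 0 or 1 both signal misfit: here a value near 1 says almost every replicate switches more often than the observed sequence does.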
Predictive Distribution
The distribution of future observations given observed data and model. Accounts for both parameter and sampling uncertainty.
Prior Distribution
$p(\theta)$. The belief about $\theta$ before observing data.
Probabilistic Programming Language
A language for specifying probabilistic models and performing automatic Bayesian inference (e.g., Stan).
Probit Regression
$\Pr(y_i = 1 \mid x_i) = \Phi(x_i^\top \beta)$, where $\Phi$ is the standard normal CDF. Alternative to logistic regression using the normal CDF link.
Ridge Regression
$L_2$-regularized regression. Equivalent to the posterior mean under a Normal prior on coefficients.
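The equivalence is easiest to see with a single predictor and no intercept, which is the simplifying assumption behind this sketch. With Gaussian noise of variance $\sigma^2$ and prior $\beta \sim N(0, \sigma^2/\lambda)$, the posterior mean of $\beta$ is $\sum_i x_i y_i / (\sum_i x_i^2 + \lambda)$, which is also the ridge estimate with penalty $\lambda$. The function name and data are illustrative.

```python
def ridge_slope(x, y, lam):
    """Ridge estimate (equivalently, posterior mean) for one predictor, no intercept."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]
print(ridge_slope(x, y, 0.0))   # → 2.0 (no penalty: ordinary least squares)
print(ridge_slope(x, y, 14.0))  # → 1.0 (penalty shrinks the slope toward zero)
```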
Scaled Inverse Chi-Squared Distribution
A conjugate prior for the variance in Normal models.
Smoothness Prior
A prior that shrinks parameters (e.g., spline coefficients) toward zero to prevent overfitting. Controls model flexibility.
Stan
A probabilistic programming language that implements HMC/NUTS for Bayesian inference. Written in C++, callable from R.
Stationary Distribution
The distribution $\pi$ satisfying $\pi = \pi P$ for a Markov chain with transition matrix $P$. MCMC constructs chains with the posterior as stationary distribution.
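For a small finite chain the stationary distribution can be approximated by applying the transition matrix repeatedly to any starting distribution. The two-state matrix below is an illustrative assumption; its exact stationary distribution is $(5/6, 1/6)$, which can be checked by substituting into the stationarity equation.

```python
def stationary_distribution(P, n_iter=200):
    """Approximate the stationary row vector by repeated multiplication with P."""
    k = len(P)
    pi = [1.0 / k] * k   # start from the uniform distribution
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(k)) for j in range(k)]
    return pi

P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print(pi)  # approximately [5/6, 1/6]
```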
Subjective Probability
Probability as a degree of belief, not a long-run frequency. Foundation of Bayesian inference.
Variable Selection Indicators
Binary indicators determining which covariates are included in a regression model.
WAIC
Widely Applicable Information Criterion. Measures predictive accuracy using the log pointwise predictive density and an effective number of parameters.