Information about Markov Chain Monte Carlo

Markov chain Monte Carlo (MCMC) methods (which include random walk Monte Carlo methods), are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a large number of steps is then used as a sample from the desired distribution. The quality of the sample improves as a function of the number of steps.

Usually it is not hard to construct a Markov Chain with the desired properties. The more difficult problem is to determine how many steps are needed to converge to the stationary distribution within an acceptable error. A good chain will have rapid mixing—the stationary distribution is reached quickly starting from an arbitrary position—described further under Markov chain mixing time.

Typical use of MCMC sampling can only approximate the target distribution, as there is always some residual effect of the starting position. More sophisticated MCMC-based algorithms such as coupling from the past can produce exact samples, at the cost of additional computation and an unbounded (though finite on average) running time.

The most common application of these algorithms is numerically calculating multi-dimensional integrals. In these methods, an ensemble of "walkers" moves around randomly. At each point where the walker steps, the integrand value at that point is counted towards the integral. The walker then may make a number of tentative steps around the area, looking for a place with reasonably high contribution to the integral to move into next. Random walk methods are a kind of random simulation or Monte Carlo method. However, whereas the random samples of the integrand used in a conventional Monte Carlo integration are statistically independent, those used in MCMC are correlated. A Markov chain is constructed in such a way as to have the integrand as its equilibrium distribution. Surprisingly, this is often easy to do.

Multi-dimensional integrals often arise in Bayesian statistics and computational physics, so Markov chain Monte Carlo methods are widely used in those fields.

Random walk algorithms

Many Markov chain Monte Carlo methods move around the equilibrium distribution in relatively small steps, with no tendency for the steps to proceed in the same direction. These methods are easy to implement and analyse, but unfortunately it can take a long time for the walker to explore all of the space. The walker will often double back and cover ground already covered. Here are some random walk MCMC methods:
  • Metropolis-Hastings algorithm: Generates a random walk using a proposal density and a method for rejecting proposed moves.
  • Gibbs sampling: Requires that all the conditional distributions of the target distribution can be sampled exactly. Popular partly because when this is so, the method does not require any `tuning'.
  • Slice sampling: Depends on the principle that one can sample from a distribution by sampling uniformly from the region under the plot of its density function. This method alternates uniform sampling in the vertical direction with uniform sampling from the horizontal `slice' defined by the current vertical position.

Avoiding random walks

More sophisticated algorithms use some method of preventing the walker from doubling back. These algorithms may be harder to implement, but may exhibit faster convergence (i.e. fewer steps for an accurate result).
  • Successive over-relaxation: A Monte Carlo version of this technique can be seen as a variation on Gibbs sampling; it sometimes avoids random walks.
  • Hybrid Monte Carlo (HMC) (Would be better called `Hamiltonian Monte Carlo'): Tries to avoid random walk behaviour by introducing an auxiliary momentum vector and implementing Hamiltonian dynamics where the potential function is the target density. The momentum samples are discarded after sampling. The end result of Hybrid MCMC is that proposals move across the sample space in larger steps and are therefore less correlated and converge to the target distribution more rapidly.
  • Some variations on slice sampling also avoid random walks.

Changing dimension

The Reversible Jump method is a variant of Metropolis-Hastings that allows proposals that change the dimensionality of the space. This method was first proposed in 1995 by Peter Green of Bristol University[1].

See also

References

1. ^ P. J. Green. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika, 82(4):711–732, 1995
  • Christophe Andrieu et al, "An Introduction to MCMC for Machine Learning", 2003
  • Bernd A. Berg. "Markov Chain Monte Carlo Simulations and Their Statistical Analysis". Singapore, World Scientific 2004.
  • George Casella and Edward I. George. "Explaining the Gibbs sampler". The American Statistician, 46:167-174, 1992. (Basic summary and many references.)
  • A.E. Gelfand and A.F.M. Smith. "Sampling-Based Approaches to Calculating Marginal Densities". J. American Statistical Association, 85:398-409, 1990.
  • Andrew Gelman, John B. Carlin, Hal S. Stern, and Donald B. Rubin. Bayesian Data Analysis. London: Chapman and Hall. First edition, 1995. (See Chapter 11.)
  • S. Geman and D. Geman. "Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images". IEEE Transactions on Pattern Analysis and Machine Intelligence, 6:721-741, 1984.
  • Radford M. Neal, Probabilistic Inference Using Markov Chain Monte Carlo Methods, 1993.
  • Gilks W.R., Richardson S. and Spiegelhalter D.J. "Markov Chain Monte Carlo in Practice". Chapman & Hall/CRC, 1996.
  • C.P. Robert and G. Casella. "Monte Carlo Statistical Methods" (second edition). New York: Springer-Verlag, 2004.
  • R. L. Smith ``Efficient Monte Carlo Procedures for Generating Points Uniformly Distributed Over Bounded Regions,'' Operations Research , Vol. 32, pp. 1296-1308, 1984.
This article or section may be confusing or unclear for some readers.
Please [improve the article] or discuss this issue on the talk page. This article has been tagged since September 2007.
..... Click the link for more information.
Monte Carlo methods are a widely used class of computational algorithms for simulating the behavior of various physical and mathematical systems, and for other computations.
..... Click the link for more information.
random walk, sometimes called a "drunkard's walk," is a formalization in mathematics, computer science, and physics of the intuitive idea of taking successive steps, each in a random direction.
..... Click the link for more information.
probability distribution that assigns a probability to every subset (more precisely every measurable subset) of its state space in such a way that the probability axioms are satisfied.
..... Click the link for more information.
This article or section may be confusing or unclear for some readers.
Please [improve the article] or discuss this issue on the talk page. This article has been tagged since September 2007.
..... Click the link for more information.
Mixing time refers to any of several variant formalizations of the idea: how large must t be until the time-t distribution is approximately π ? One variant, variation distance mixing time, is defined as the smallest t such that

..... Click the link for more information.
Running Time may refer to:
  • Running Time (film)
  • see Analysis of algorithms

..... Click the link for more information.
INTErnational Gamma-Ray Astrophysics Laboratory (INTEGRAL) is detecting some of the most energetic radiation that comes from space. It is the most sensitive gamma ray observatory ever launched.
..... Click the link for more information.
ensemble (also statistical ensemble or thermodynamic ensemble) is an idealization consisting of a large number of mental copies (sometimes infinitely many) of a system, considered all at once, each of which represents a possible state that the real system might be in.
..... Click the link for more information.
Monte Carlo methods are a widely used class of computational algorithms for simulating the behavior of various physical and mathematical systems, and for other computations.
..... Click the link for more information.
In probability theory, to say that two events are independent, intuitively means that the occurrence of one event makes it neither more nor less probable that the other occurs.
..... Click the link for more information.
correlation, also called correlation coefficient, indicates the strength and direction of a linear relationship between two random variables. In general statistical usage, correlation or co-relation refers to the departure of two variables from independence.
..... Click the link for more information.
This article or section may be confusing or unclear for some readers.
Please [improve the article] or discuss this issue on the talk page. This article has been tagged since September 2007.
..... Click the link for more information.
Bayesian inference is statistical inference in which evidence or observations are used to update or to newly infer the probability that a hypothesis may be true. The name "Bayesian" comes from the frequent use of Bayes' theorem in the inference process.
..... Click the link for more information.
Computational physics is the study and implementation of numerical algorithms in order to solve problems in physics for which a quantitative theory already exists. It is often regarded as a subdiscipline of theoretical physics but some consider it an intermediate branch between
..... Click the link for more information.
Metropolis-Hastings algorithm is a rejection sampling algorithm used to generate a sequence of samples from a probability distribution that is difficult to sample from directly. This sequence can be used in Markov chain Monte Carlo simulation to approximate the distribution (i.e.
..... Click the link for more information.
random walk, sometimes called a "drunkard's walk," is a formalization in mathematics, computer science, and physics of the intuitive idea of taking successive steps, each in a random direction.
..... Click the link for more information.
In mathematics and physics, Gibbs sampling is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables. The purpose of such a sequence is to approximate the joint distribution (i.e.
..... Click the link for more information.
Given two jointly distributed random variables X and Y, the conditional probability distribution of Y given X (written "Y | X") is the probability distribution of Y when X is known to be a particular value.
..... Click the link for more information.
In mathematics and physics, Slice sampling is a type of Markov chain Monte Carlo sampling algorithm based on the observation that to sample a random variable one can sample uniformly from the region under the graph of its density function.
..... Click the link for more information.
Successive over-relaxation (SOR) is a numerical method used to speed up convergence of the Gauss–Seidel method for solving a linear system of equations. A similar method can be used for any slowly converging iterative process. It was devised simultaneously by David M.
..... Click the link for more information.
In mathematics and physics, Gibbs sampling is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables. The purpose of such a sequence is to approximate the joint distribution (i.e.
..... Click the link for more information.
momentum (pl. momenta; SI unit kg m/s, or, equivalently, N•s) is the product of the mass and velocity of an object. For more accurate measures of momentum, see the section "modern definitions of momentum" on this page.
..... Click the link for more information.
Hamiltonian mechanics is a re-formulation of classical mechanics that was invented in 1833 by Irish mathematician William Rowan Hamilton. It arose from Lagrangian mechanics, another re-formulation of classical mechanics, introduced by Joseph Louis Lagrange in 1788.
..... Click the link for more information.

..... Click the link for more information.
Bayesian inference is statistical inference in which evidence or observations are used to update or to newly infer the probability that a hypothesis may be true. The name "Bayesian" comes from the frequent use of Bayes' theorem in the inference process.
..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter