Information about Ziggurat Algorithm

The ziggurat algorithm is an algorithm to generate random numbers from a non-uniform distribution. It belongs to the class of rejection sampling algorithms and can be used for choosing values from a monotone decreasing probability distribution. It can also be applied to symmetric unimodal distributions such as the normal distribution by choosing a point from one half and then randomly choosing a half. It was developed by George Marsaglia and Wai Wan Tang.

Like most such algorithms, it assumes an underlying source of uniformly-distributed random numbers, typically a pseudo-random number generator. In the common case (97.5% of the time for normal and exponential distributions with typical table sizes), the algorithm consists of generating a random floating-point value, a random table index, performing one table lookup, one multiply and one comparison. This is considerably faster than the two more commonly used methods to generate normally distributed random numbers, the Marsaglia polar method or the Box-Muller transform, which require at least a logarithm and a square root.

On the other hand, the ziggurat algorithm is more complex to implement and requires precomputed tables, so it is best used when large numbers of random values are required.

The ziggurat algorithm is so named because it covers the probability distribution with rectangular segments stacked in decreasing size order, resembling a ziggurat.

Theory of operation

The ziggurat algorithm is a rejection sampling algorithm; it randomly generates a point in a distribution slightly larger than the desired distribution, then tests whether the generated point is inside the desired distribution. If not, it tries again. Given a random point underneath a probability distribution curve, its x coordinate is a random number with the desired distribution.

The distribution the ziggurat algorithm chooses from is made up of n equal-area regions; n−1 rectangles that cover the bulk of the desired distribution, on top of a non-rectangular base that includes the tail of the distribution.

Given a monotone decreasing probability distribution function f(x), defined for all x≥0, the base of the ziggurat is defined as all points inside the distribution and below y0 = f(x0). This consists of a rectangular region from (0, 0) to (x0, y0), and the (typically infinite) tail of the distribution, where x>x0.

This layer has area A. On top of this, add a rectangular layer of width x0 and height A/x0, so it also has area A. The top of this layer intersects the distribution function at a point (x1, y1), where y1 = f(x1) = y0 + A/x0. This layer includes every point in the distibution function between y0 and y1, but (unlike the base layer) also includes points such as (x0, y1) which are not in the desired distribution.

Further layers are then stacked on top. To use a precomputed table of size n (n=256 is typical), one chooses x0 such that xn−1=0, meaning that the top box, box n−1, reaches the distribution's peak at (0, f(0)) exactly.

Layer i extends vertically from yi−1 to yi, and can be divided into two regions horizontally: the (generally larger) portion from 0 to xi which is entirely contained within the desired distribution, and the (small) portion from xi to xi−1, which is only partially contained.

Ignoring for a moment the problem of layer 0, and given uniform random variables U0 and U1 ∈ [0,1), the ziggurat algorithm can be described as:
  1. Choose a random layer 0 ≤ i < n.
  2. Let x = U0xi−1.
  3. If x < xi, return x.
  4. Let y = yi−1 + U1(yiyi−1).
  5. Compute f(x). If y < f(x), return x.
  6. Otherwise, choose new random numbers and go back to step 1.


Step 1 amounts to choosing a low-resolution y coordinate. Step 3 tests if the x coordinate is clearly within the desired distribution function without knowing more about the y coordinate. If it is not, step 4 chooses a high-resolution y coordinate and step 5 does the rejection test.

With closely spaced layers, the algorithm terminates at step 3 a very large fraction of the time. Note that for the top layer n−1, however, this test always fails, because xn−1 = 0.

Layer 0 can also be divided into a central region and an edge, but the edge is an infinite tail. To use the same algorithm to check if the point is in the central region, generate a fictitious x−1 = A/y0. This will generate points with x < x0 with the correct frequency, and in the rare case that layer 0 is selected and xx0, use a special fallback algorithm to select a point at random from the tail. Because the fallback algorithm is used less than one time in a thousand, speed is not essential.

Thus, the full ziggurat algorithm for one-sided distributions is:
  1. Choose a random layer 0 ≤ i < n.
  2. Let x = U0xi−1
  3. If x < xi, return x.
  4. If i=0, generate a point from the tail using the fallback algorithm.
  5. Let y = yi−1 + U1(yiyi−1).
  6. Compute f(x). If y < f(x), return x.
  7. Otherwise, choose new random numbers and go back to step 1.


For a two-sided distribution, of course, the result must be negated 50% of the time. This can often be done conveniently by choosing U0 ∈ (−1,1) and, in step 3, testing if |x| < xi.

Fallback algorithms for the tail

Because the ziggurat algorithm only generates most outputs very rapidly, and requires a fallback algorithm for outliers, it is always more complex than a more direct implementation. The fallback algorithm, of course, depends on the distribution.

For an exponential distribution, the tail looks just like the body of the distribution. One way is to fall back to the most elementary algorithm E=−ln(U1) and let x=x0−ln(U1). Another is to call the ziggurat algorithm recursively.

For a normal distribution, Marsaglia suggests a compact algorithm:
  1. Let x = −ln(U1)/x0
  2. Let y = −ln(U2)
  3. If 2y > x², return x+x0
  4. Otherwise, go back to step 1.


Since x0 ≈ 3.5 for typical table sizes, the test in step 3 is almost always successful. Note also that −ln(U) is just a simple way to generate an exponentially distributed random number. If you have a ziggurat exponential distribution generator available, you can use it instead.

Optimizations

The algorithm can be performed efficiently with precomputed tables of xi and yi = f(xi), but there are some modifications to make it even faster:
  • Nothing in the ziggurat algorithm depends on the probability distribution function being normalized (integral under the curve equal to 1); removing normalizing constants can speed up the computation of f(x).
  • Most uniform random number generators are based on integer random number generators which return an integer in the range [0, 232−1]. A table of 2−32xi lets you use such numbers directly for U0.
  • When computing two-sided distributions using a two-sided U0 as described earlier, the random integer can be interpreted as a signed number in the range [−231, 231−1] and a scale factor of 2−31 can be used.
  • Rather than comparing the result of the multiplication in step 3, it is possible to precompute xi/xi−1 and compare U0 with that directly. If U0 is an integer random number generator, these limits may be premultiplied by 232 (or 231, as appropriate) so an integer comparison can be used.
  • With the above two changes, the table of unmodified xi values is no longer needed and may be deleted.
  • When generating IEEE 754 single-precision floating point values, which only have a 24-bit mantissa (including the implicit leading 1), the least-significant bits of a 32-bit integer random number are not used. These bits may be used to select the layer number. (See the references below for a detailed discussion of this.)
  • The first three steps may be put into an inline function, which can call an out-of-line implementation of the less frequently needed steps.

Generating the tables

It is possible to store the entire table precomputed, or just include the values n, y0, A, and an implementation of f −1(x) in the source code, and compute the remaining values when initializing the random number generator.

As previously described, you can find xi = f −1(yi) and yi+1 = yiA/xi. Repeat n−1 times for the layers of the ziggurat. At the end, you should have yn−1=f(0). There will, of course, be some round-off error, but it is a useful sanity test to see that it is acceptably small.

When actually filling in the table values, just assume that xn−1=0 and yn−1=f(0), and accept the slight difference in layer n−1's area as rounding error.

Finding x0 and A

Given an initial (guess at) x0, you need a way to compute the area t of the tail for which x>x0. For the exponential distribution, this is just −log(x0), while for the normal distribution, assuming you are using the unnormalized f(x) = ex²/2, this is √(π/2)erfc(x/√2). For more awkward distributions, numerical integration may be required.

With this in hand, from x0, you can find y0 = f(x0), the area t in the tail, and the area of the base layer A = x0y0 + t.

The compute the series yi and xi as above. If yi > f(0) for any i < n, then the initial estimate x0 was too low, leading to too large an area A. If yn−1 < f(0), then the initial estimate x0 was too high.

Given this, use a root-finding algorithm (such as the bisection method) to find the value x0 which produces yn−1 as close to f(0) as possible. Alternatively, look for the value which makes the area of the topmost layer, xn−2(f(0)−yn−2), as close to the desired value A as possible. This saves one evaluation of f −1(x) and is actually the condition of greatest interest.

See also

References

In mathematics, the uniform distributions are simple probability distributions. There are two types:
  • The discrete uniform distribution
  • The continuous uniform distribution

..... Click the link for more information.
In mathematics, rejection sampling is a technique used to generate observations from a distribution. It is also commonly called the acceptance-rejection method or "accept-reject algorithm".
..... Click the link for more information.
monotonic function (or monotone function) is a function which preserves the given order. This concept first arose in calculus, and was later generalized to the more abstract setting of order theory.
..... Click the link for more information.
probability distribution that assigns a probability to every subset (more precisely every measurable subset) of its state space in such a way that the probability axioms are satisfied.
..... Click the link for more information.
In mathematics, a symmetric function of multiple variables is one that is invariant under permutation of its variables; that is, the value of the function does not depend on the order of the n-tuple of arguments.
..... Click the link for more information.
unimodal if for some value m (the mode), it is monotonically increasing for xm and monotonically decreasing for xm.
..... Click the link for more information.
normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields. Each member of the family may be defined by two parameters, location and scale: the mean ("average",
..... Click the link for more information.
George Marsaglia is a mathematician and computer scientist. He is perhaps best known for establishing the lattice structure[1] of congruential random number generators in the paper "Random numbers fall mainly in the planes", often called The Marsaglia effect,
..... Click the link for more information.
A pseudorandom number generator (PRNG) is an algorithm to generate a sequence of numbers that approximate the properties of random numbers. The sequence is not truly random in that it is completely determined by a relatively small set of initial values, called the PRNG's
..... Click the link for more information.
exponential distributions are a class of continuous probability distribution. They are often used to model the time between independent events that happen at a constant average rate.
..... Click the link for more information.
Marsaglia polar method is a method for generating a pair of independent standard normal random variables by choosing random points (xy) in the square −1 < x < 1, −1 < y < 1 until
..... Click the link for more information.
Box-Müller transform (by George Edward Pelham Box and Mervin Edgar Müller 1958)[1] is a method of generating pairs of independent standard normally distributed (zero expectation, unit variance) random numbers, given a source of uniformly distributed random numbers.
..... Click the link for more information.

Description

Ziggurats date from the 6th century BC. The top of the ziggurat was flat, unlike many pyramids. The step pyramid style began near the end of the Early Dynastic Period.
..... Click the link for more information.
Recursion, in mathematics and computer science, is a method of defining functions in which the function being defined is applied within its own definition. The term is also used more generally to describe a process of repeating objects in a self-similar way.
..... Click the link for more information.
The concept of a normalizing constant arises in probability theory and a variety of other areas of mathematics.

Definition and examples

In probability theory, a normalizing constant
..... Click the link for more information.
The IEEE Standard for Binary Floating-Point Arithmetic (IEEE 754) is the most widely-used standard for floating-point computation, and is followed by many CPU and FPU implementations.
..... Click the link for more information.
In computer science, an inline function is a programming language construct used to suggest to a compiler that a particular function be subjected to in-line expansion; that is, it suggests that the compiler insert the complete body of the function in every context where that
..... Click the link for more information.
A round-off error, also called rounding error, is the difference between the calculated approximation of a number and its exact mathematical value. Numerical analysis specifically tries to estimate this error when using approximation equations and/or algorithms, especially
..... Click the link for more information.
A sanity test or sanity check is a basic test to quickly evaluate the validity of a claim or calculation. In mathematics, for example, when multiplying by three or nine, verifying that the sum of the digits of the result is a multiple of 3 or 9 respectively is a sanity test.
..... Click the link for more information.
error function (also called the Gauss error function) is a non-elementary function which occurs in probability, statistics and partial differential equations. It is defined as:

Properties

The error function is odd:


..... Click the link for more information.
numerical integration constitutes a broad family of algorithms for calculating the numerical value of a definite integral, and by extension, the term is also sometimes used to describe the numerical solution of differential equations.
..... Click the link for more information.
A root-finding algorithm is a numerical method, or algorithm, for finding a value x such that f(x) = 0, for a given function f. Such an x is called a root of the function f.
..... Click the link for more information.
bisection method is a root-finding algorithm which works by repeatedly dividing an interval in half and then selecting the subinterval in which the root exists.

Suppose we want to solve the equation f(x) = 0.
..... Click the link for more information.
Box-Müller transform (by George Edward Pelham Box and Mervin Edgar Müller 1958)[1] is a method of generating pairs of independent standard normally distributed (zero expectation, unit variance) random numbers, given a source of uniformly distributed random numbers.
..... Click the link for more information.
A random number generator (often abbreviated as RNG) is a computational or physical device designed to generate a sequence of numbers or symbols that lack any pattern, i.e. appear random.
..... Click the link for more information.
George Marsaglia is a mathematician and computer scientist. He is perhaps best known for establishing the lattice structure[1] of congruential random number generators in the paper "Random numbers fall mainly in the planes", often called The Marsaglia effect,
..... Click the link for more information.
MATLAB is a numerical computing environment and programming language. Created by The MathWorks, MATLAB allows easy matrix manipulation, plotting of functions and data, implementation of algorithms, creation of user interfaces, and interfacing with programs in other languages.
..... Click the link for more information.
normal distribution, also called the Gaussian distribution, is an important family of continuous probability distributions, applicable in many fields. Each member of the family may be defined by two parameters, location and scale: the mean ("average",
..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter