Information about Quasi Newton Methods

In optimization, quasi-Newton methods (also known as variable metric methods) are well-known algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method to find the stationary point of a function, where the gradient is 0. Newton's method assumes that the function can be locally approximated as a quadratic in the region around the optimum, and use the first and second derivatives (gradient and Hessian) to find the stationary point.

In Quasi-Newton methods the Hessian matrix of second derivatives of the function to be minimized does not need to be computed. The Hessian is updated by analyzing successive gradient vectors instead. Quasi-Newton methods are a generalization of the secant method to find the root of the first derivative for multidimensional problems. In multi-dimensions the secant equation is under-determined, and quasi-Newton methods differ in how they constrain the solution, typically by adding a simple low-rank update to the current estimate of the Hessian.

The first quasi-Newton algorithm was proposed by W.C. Davidon, a physicist working at Argonne National Laboratory. Like the story of QuickSort, Davidon got the trouble of applying optimization algorithms on his research, thus he decided to improve such algorithm. Eventually he developed the first quasi-Newton algorithm in 1959: the DFP updating formula, which was later popularized by Fletcher and Powell in 1963, but is rarely used today. The most common quasi-Newton algorithms are currently the SR1 formula (for symmetric rank one) and the widespread BFGS method, that was suggested independently by Broyden, Fletcher, Goldfarb, and Shanno, in 1970. The Broyden's class is a linear combination of the DFP and BFGS methods.

The SR1 formula does not guarantee the update matrix to maintain positive-definiteness and can be used for indefinite problems. The Broyden's method does not require the update matrix to be symmetric and it is used to find the root of a general system of equations (rather than the gradient) by updating the Jacobian (rather than the Hessian).

Description of the method

As in the Newton's method, one uses the second order approximation to find the minimum of a function . The Taylor series of is:
:,
where () is the gradient and the Hessian matrix. The Taylor series of the gradient itself:
:,
is called secant equation. Solving for provides the Newton step:
:,
but is unknown. In one dimension, solving for and applying the Newton's step with the updated value is is equivalent to the secant method. In multidimensions is under determined. Various methods are used to find the solution to the secant equation that is symmetric () and closest to the current approximate value according to some metric . An approximate initial value of is often sufficient to achieve rapid convergence. The unknown is updated applying the Newton's step calculated using the current approximate Hessian matrix
  • , with chosen to satisfy the Wolfe conditions;
  • ;
  • The gradient computed at the new point , and
,
  • is used to update the Hessian , or directly its inverse . The most popular update formulas are:
Method
DFP
BFGS
Broyden
Broyden Family,
SR1

See also

References

  • Eventually W.C. Davidon's paper published. William C. Davidon, Variable Metric Method for Minimization, SIOPT Volume 1 Issue 1, Pages 1-17, 1991.
  • Nocedal, Jorge & Wright, Stephen J. (1999). Numerical Optimization. Springer-Verlag. ISBN 0-387-98793-2.
  • Edwin K.P.Chong and Stanislaw H.Zak, An Introduction to Optimization 2ed, John Wiley & Sons Pte. Ltd. August 2001.
In mathematics, the term optimization, or mathematical programming, refers to the study of problems in which one seeks to minimize or maximize a real function by systematically choosing the values of real or integer variables from within an allowed set.
..... Click the link for more information.
maxima and minima, known collectively as extrema, are the largest value (maximum) or smallest value (minimum), that a function takes in a point either within a given neighbourhood (local extremum) or on the function domain in its entirety (global
..... Click the link for more information.
function expresses dependence between two quantities, one of which is given (the independent variable, argument of the function, or its "input") and the other produced (the dependent variable, value of the function, or "output").
..... Click the link for more information.
In mathematics, Newton's method is a well-known algorithm for finding roots of equations in one or more dimensions. It can also be used to find local maxima and local minima of functions by noticing that if a real number is a stationary point of a function , then is a root of the
..... Click the link for more information.
stationary point is an input to a function where the derivative is zero (equivalently, the gradient is zero): where the function "stops" increasing or decreasing (hence the name).
..... Click the link for more information.
gradient of a scalar field is a vector field which points in the direction of the greatest rate of increase of the scalar field, and whose magnitude is the greatest rate of change.
..... Click the link for more information.
In mathematics, the Hessian matrix is the square matrix of second-order partial derivatives of a function. Given the real-valued function



if all second partial derivatives of f exist, then the Hessian matrix of f is the matrix
..... Click the link for more information.
In English, derivative primarily refers to anything derived from a source - not primitive or original.

Derivative may also specifically refer to:
  • Derivative, in mathematics, the instantaneous rate of change of a function

..... Click the link for more information.
In optimization, quasi-Newton methods (also known as variable metric methods) are well-known algorithms for finding local maxima and minima of functions. Quasi-Newton methods are based on Newton's method to find the stationary point of a function, where the gradient is 0.
..... Click the link for more information.
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines lines to better approximate a root of a function f.
..... Click the link for more information.
Argonne National Laboratory is one of the United States Department of Energy's oldest and largest science and engineering research national laboratories and is the largest in the Midwest, about twice as large as the nearby Fermilab.
..... Click the link for more information.
Quicksort is a well-known sorting algorithm developed by C. A. R. Hoare that, on average, makes (big O notation) comparisons to sort n items. However, in the worst case, it makes comparisons.
..... Click the link for more information.
The Davidon-Fletcher-Powell formula (DFP) finds the solution to the secant equation that is closest to the current estimate and satisfies the curvature condition (see below).
..... Click the link for more information.
The Symmetric Rank 1 method is a quasi-Newton method to update the second derivative (Hessian) based on the derivatives (gradients) calculated at two points. It is a generalization to the secant method for a multidimensional problem.
..... Click the link for more information.
In mathematics, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is a method to solve an unconstrained nonlinear optimization problem.

The BFGS method is derived from the Newton's method in optimization, a class of hill-climbing optimization techniques that
..... Click the link for more information.
In linear algebra, a positive-definite matrix is a Hermitian matrix which in many ways is analogous to a positive real number. The notion is closely related to a positive-definite symmetric bilinear form (or a sesquilinear form in the complex case).
..... Click the link for more information.
In mathematics, Broyden's method is a quasi-Newton method for the numerical solution of nonlinear equations in more than one variable. It was originally described by C. G. Broyden in 1965.
..... Click the link for more information.
Jacobian is shorthand for either the Jacobian matrix or its determinant, the Jacobian determinant.

In algebraic geometry the Jacobian of a curve means the Jacobian variety: a group variety associated to the curve, in which the curve can be embedded.
..... Click the link for more information.
In mathematics, Newton's method is a well-known algorithm for finding roots of equations in one or more dimensions. It can also be used to find local maxima and local minima of functions by noticing that if a real number is a stationary point of a function , then is a root of the
..... Click the link for more information.
prevew not available
..... Click the link for more information.
gradient of a scalar field is a vector field which points in the direction of the greatest rate of increase of the scalar field, and whose magnitude is the greatest rate of change.
..... Click the link for more information.
In mathematics, the Hessian matrix is the square matrix of second-order partial derivatives of a function. Given the real-valued function



if all second partial derivatives of f exist, then the Hessian matrix of f is the matrix
..... Click the link for more information.
prevew not available
..... Click the link for more information.
In numerical analysis, the secant method is a root-finding algorithm that uses a succession of roots of secant lines lines to better approximate a root of a function f.
..... Click the link for more information.
system of linear equations (or linear system) is a collection of linear equations involving the same set of variables. For example,
is a system of three equations in the three variables .
..... Click the link for more information.
In (unconstrained) optimization, the Wolfe conditions are a set of inequalities for performing inexact linesearches; that is, for efficiently selecting a step length in the linesearch algorithm.

Let be a smooth objective function, and let be a given search direction.
..... Click the link for more information.
The Davidon-Fletcher-Powell formula (DFP) finds the solution to the secant equation that is closest to the current estimate and satisfies the curvature condition (see below).
..... Click the link for more information.
In mathematics, the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method is a method to solve an unconstrained nonlinear optimization problem.

The BFGS method is derived from the Newton's method in optimization, a class of hill-climbing optimization techniques that
..... Click the link for more information.
In mathematics, Broyden's method is a quasi-Newton method for the numerical solution of nonlinear equations in more than one variable. It was originally described by C. G. Broyden in 1965.
..... Click the link for more information.
The Symmetric Rank 1 method is a quasi-Newton method to update the second derivative (Hessian) based on the derivatives (gradients) calculated at two points. It is a generalization to the secant method for a multidimensional problem.
..... Click the link for more information.


This article is copied from an article on Wikipedia.org - the free encyclopedia created and edited by online user community. The text was not checked or edited by anyone on our staff. Although the vast majority of the wikipedia encyclopedia articles provide accurate and timely information please do not assume the accuracy of any particular article. This article is distributed under the terms of GNU Free Documentation License.
Herod_Archelaus


page counter