Halley's method

A method of numerically finding roots of a function

In numerical analysis, Halley's method is a root-finding algorithm used for functions of one real variable with a continuous second derivative. It is named after its inventor, Edmond Halley, an English mathematician and astronomer.

The algorithm is second in the class of Householder's methods, after Newton's method. Like the latter, it iteratively produces a sequence of approximations to the root; the rate of convergence of this sequence to the root is cubic. Multidimensional versions of this method exist.[citation needed]

Halley's method exactly finds the roots of a linear-over-linear Padé approximation to the function, in contrast to Newton's method or the secant method, which approximate the function linearly, or Muller's method, which approximates the function quadratically.[1]
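
For concreteness (a routine computation stated here in the article's notation, not a quotation from the cited source, and assuming f′(xn) ≠ 0), the linear-over-linear Padé approximant of f at xn that matches f(xn), f′(xn), and f″(xn) is

{\displaystyle P_{n}(x)={\frac {f(x_{n})+\left(f'(x_{n})-{\frac {f(x_{n})f''(x_{n})}{2f'(x_{n})}}\right)(x-x_{n})}{1-{\frac {f''(x_{n})}{2f'(x_{n})}}(x-x_{n})}},}

and setting its numerator to zero yields exactly the iterate xn+1 defined in the Method section below.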

Method

Halley's method is a numerical algorithm for solving the nonlinear equation f(x) = 0, where f is a function of one real variable. The method consists of a sequence of iterations:

{\displaystyle x_{n+1}=x_{n}-{\frac {2f(x_{n})f'(x_{n})}{2{[f'(x_{n})]}^{2}-f(x_{n})f''(x_{n})}}}

beginning with an initial guess x0.[2]
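
As a concrete illustration, here is a minimal sketch of the iteration in Python (the function name halley, the tolerance, and the iteration cap are choices of this example, not part of any cited source):

```python
def halley(f, df, d2f, x0, tol=1e-12, max_iter=50):
    """Find a root of f by Halley's method, given callables for f
    and its first two derivatives. Returns the final iterate."""
    x = x0
    for _ in range(max_iter):
        fx, dfx, d2fx = f(x), df(x), d2f(x)
        step = 2 * fx * dfx / (2 * dfx**2 - fx * d2fx)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: the cube root of 2 as the root of f(x) = x^3 - 2.
print(halley(lambda x: x**3 - 2,
             lambda x: 3 * x**2,
             lambda x: 6 * x,
             x0=1.0))  # 1.2599210498948732
```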

If f is a three times continuously differentiable function and a is a zero of f but not of its derivative, then, in a neighborhood of a, the iterates xn satisfy:

{\displaystyle |x_{n+1}-a|\leq K\cdot {|x_{n}-a|}^{3},{\text{ for some }}K>0.}

This means that the iterates converge to the zero if the initial guess is sufficiently close, and that the convergence is cubic.[3]
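
The cubic rate can be observed numerically. The following sketch reuses the example f(x) = x^3 − 2 from above and relies on the mpmath arbitrary-precision library (an illustrative choice; double precision is exhausted after two steps):

```python
import mpmath  # arbitrary-precision floats, to make the cubing visible

mpmath.mp.dps = 200                       # work with 200 decimal digits
a = mpmath.mpf(2) ** (mpmath.mpf(1) / 3)  # reference root of x^3 - 2

x = mpmath.mpf(1)                         # initial guess
for n in range(1, 6):
    f, df, d2f = x**3 - 2, 3 * x**2, 6 * x
    x = x - 2 * f * df / (2 * df**2 - f * d2f)
    print(n, mpmath.nstr(abs(x - a), 3))
# Errors fall roughly as 1e-2, 4e-7, 3e-20, 1e-59, 8e-178:
# each step triples the number of correct digits.
```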

The following alternative formulation shows the similarity between Halley's method and Newton's method. The expression {\displaystyle f(x_{n})/f'(x_{n})} is computed only once, and it is particularly useful when {\displaystyle f''(x_{n})/f'(x_{n})} can be simplified:

{\displaystyle x_{n+1}=x_{n}-{\frac {f(x_{n})}{f'(x_{n})-{\frac {f(x_{n})}{f'(x_{n})}}{\frac {f''(x_{n})}{2}}}}=x_{n}-{\frac {f(x_{n})}{f'(x_{n})}}\left[1-{\frac {f(x_{n})}{f'(x_{n})}}\cdot {\frac {f''(x_{n})}{2f'(x_{n})}}\right]^{-1}.}

When the second derivative is very close to zero, the Halley iteration is almost the same as the Newton iteration.
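
In code, this form amounts to computing the Newton step once and then rescaling it; a minimal sketch, in the same illustrative style as above:

```python
def halley_step(f, df, d2f, x):
    """One Halley update written as a corrected Newton step."""
    t = f(x) / df(x)  # the Newton step f(x)/f'(x), computed once
    # When d2f(x) == 0 the bracket is 1 and this is exactly Newton.
    return x - t / (1 - t * d2f(x) / (2 * df(x)))
```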

Derivation

Consider the function

{\displaystyle g(x)={\frac {f(x)}{\sqrt {|f'(x)|}}}.}

Any root r of f that is not a root of its derivative is a root of g (i.e., {\displaystyle g(r)=0} when {\displaystyle f(r)=0\neq {\sqrt {|f'(r)|}}}), and any root r of g must be a root of f, provided the derivative of f at r is not infinite. Applying Newton's method to g gives

{\displaystyle x_{n+1}=x_{n}-{\frac {g(x_{n})}{g'(x_{n})}}}

with

{\displaystyle g'(x)={\frac {2[f'(x)]^{2}-f(x)f''(x)}{2f'(x){\sqrt {|f'(x)|}}}},}

and the result follows, as spelled out below. Notice that if f′(c) = 0, then this iteration cannot be applied at c, because g(c) involves division by zero and so is undefined.

Cubic convergence

Suppose a is a root of f but not of its derivative, that the third derivative of f exists and is continuous in a neighborhood of a, and that xn lies in that neighborhood. Then Taylor's theorem implies:

{\displaystyle 0=f(a)=f(x_{n})+f'(x_{n})(a-x_{n})+{\frac {f''(x_{n})}{2}}(a-x_{n})^{2}+{\frac {f'''(\xi )}{6}}(a-x_{n})^{3}}

and also

{\displaystyle 0=f(a)=f(x_{n})+f'(x_{n})(a-x_{n})+{\frac {f''(\eta )}{2}}(a-x_{n})^{2},}

where ξ and η are numbers lying between a and xn. Multiply the first equation by {\displaystyle 2f'(x_{n})} and subtract from it the second equation times {\displaystyle f''(x_{n})(a-x_{n})} to give:

{\displaystyle {\begin{aligned}0&=2f(x_{n})f'(x_{n})+2[f'(x_{n})]^{2}(a-x_{n})+f'(x_{n})f''(x_{n})(a-x_{n})^{2}+{\frac {f'(x_{n})f'''(\xi )}{3}}(a-x_{n})^{3}\\&\qquad -f(x_{n})f''(x_{n})(a-x_{n})-f'(x_{n})f''(x_{n})(a-x_{n})^{2}-{\frac {f''(x_{n})f''(\eta )}{2}}(a-x_{n})^{3}.\end{aligned}}}

Canceling {\displaystyle f'(x_{n})f''(x_{n})(a-x_{n})^{2}} and re-organizing terms yields:

{\displaystyle 0=2f(x_{n})f'(x_{n})+\left(2[f'(x_{n})]^{2}-f(x_{n})f''(x_{n})\right)(a-x_{n})+\left({\frac {f'(x_{n})f'''(\xi )}{3}}-{\frac {f''(x_{n})f''(\eta )}{2}}\right)(a-x_{n})^{3}.}

Put the second term on the left side and divide through by

{\displaystyle 2[f'(x_{n})]^{2}-f(x_{n})f''(x_{n})}

to get:

{\displaystyle a-x_{n}={\frac {-2f(x_{n})f'(x_{n})}{2[f'(x_{n})]^{2}-f(x_{n})f''(x_{n})}}-{\frac {2f'(x_{n})f'''(\xi )-3f''(x_{n})f''(\eta )}{6(2[f'(x_{n})]^{2}-f(x_{n})f''(x_{n}))}}(a-x_{n})^{3}.}

Thus:

{\displaystyle a-x_{n+1}=-{\frac {2f'(x_{n})f'''(\xi )-3f''(x_{n})f''(\eta )}{12[f'(x_{n})]^{2}-6f(x_{n})f''(x_{n})}}(a-x_{n})^{3}.}

The limit of the coefficient on the right side as xn → a is:

{\displaystyle -{\frac {2f'(a)f'''(a)-3f''(a)f''(a)}{12[f'(a)]^{2}-6f(a)f''(a)}}.}

If we take K to be a little larger than the absolute value of this limit, then we can take absolute values of both sides of the formula and replace the absolute value of the coefficient by its upper bound near a to get:

{\displaystyle |a-x_{n+1}|\leq K|a-x_{n}|^{3}}

which is what was to be proved.

To summarize,

{\displaystyle \Delta x_{i+1}={\frac {3(f'')^{2}-2f'f'''}{12(f')^{2}}}(\Delta x_{i})^{3}+O[\Delta x_{i}]^{4},\qquad \Delta x_{i}\triangleq x_{i}-a.}[4]
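
The leading constant in this expansion can also be checked numerically. The sketch below (illustrative, again using the example f(x) = x^3 − 2 and the mpmath library) compares the observed ratio Δx_{i+1}/(Δx_i)^3 against the predicted value of the coefficient at the root:

```python
import mpmath

mpmath.mp.dps = 100                       # 100-digit working precision
a = mpmath.mpf(2) ** (mpmath.mpf(1) / 3)  # root of f(x) = x^3 - 2

# Predicted constant (3 f''^2 - 2 f' f''') / (12 f'^2) evaluated at a:
fp, fpp, fppp = 3 * a**2, 6 * a, mpmath.mpf(6)
predicted = (3 * fpp**2 - 2 * fp * fppp) / (12 * fp**2)

x = mpmath.mpf("1.1")
for _ in range(4):
    f, df, d2f = x**3 - 2, 3 * x**2, 6 * x
    x_new = x - 2 * f * df / (2 * df**2 - f * d2f)
    print(mpmath.nstr((x_new - a) / (x - a) ** 3, 8))
    x = x_new
print(mpmath.nstr(predicted, 8))  # the printed ratios approach this value
```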

Halley's irrational method

Halley actually developed two third-order root-finding methods. The above, using only a division, is referred to as Halley's rational method. A second, "irrational" method uses a square root as well:[5][6][7]

{\displaystyle x_{n+1}=x_{n}-{\frac {f'(x_{n})-{\sqrt {[f'(x_{n})]^{2}-2f(x_{n})f''(x_{n})}}}{f''(x_{n})}}}

This iteration was "deservedly preferred" to the rational method by Halley[6] on the grounds that the denominator is smaller, making the division easier. A second advantage is that it tends to have about half the error of the rational method, a benefit that compounds as the iteration is repeated. On a computer it would appear slower, as it has two slow operations (division and square root) instead of one, but on modern computers the reciprocal of the denominator can be computed at the same time as the square root via instruction pipelining, so the latency of each iteration differs very little.[7]: 24
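
A sketch of one irrational-method step in Python (illustrative only, not taken from the cited report; math.copysign selects the root of the local quadratic model nearer to x, which for f′(x) > 0 coincides with the displayed formula):

```python
import math

def halley_irrational_step(f, df, d2f, x):
    """One step of Halley's irrational method: solve the quadratic
    Taylor model f + f'*h + (f''/2)*h^2 = 0 for the step h."""
    fx, dfx, d2fx = f(x), df(x), d2f(x)
    if d2fx == 0:                  # the quadratic model degenerates,
        return x - fx / dfx        # so fall back to a Newton step
    disc = dfx**2 - 2 * fx * d2fx  # must be >= 0 for a real step
    return x - (dfx - math.copysign(math.sqrt(disc), dfx)) / d2fx

# One step toward the cube root of 2, starting from x = 1:
print(halley_irrational_step(lambda t: t**3 - 2,
                             lambda t: 3 * t**2,
                             lambda t: 6 * t, 1.0))
# 1.2637626... (error ~3.8e-3, versus ~1.0e-2 for one rational step)
```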

References

  1. ^ Boyd, John P. (2013). "Finding the Zeros of a Univariate Equation: Proxy Rootfinders, Chebyshev Interpolation, and the Companion Matrix". SIAM Review. 55 (2): 375–396. doi:10.1137/110838297.
  2. ^ Scavo, T. R.; Thoo, J. B. (1995). "On the geometry of Halley's method". American Mathematical Monthly. 102 (5): 417–426. doi:10.2307/2975033. JSTOR 2975033.
  3. ^ Alefeld, G. (1981). "On the convergence of Halley's method". American Mathematical Monthly. 88 (7): 530–536. doi:10.2307/2321760. JSTOR 2321760.
  4. ^ Proinov, Petko D.; Ivanov, Stoil I. (2015). "On the convergence of Halley's method for simultaneous computation of polynomial zeros". J. Numer. Math. 23 (4): 379–394. doi:10.1515/jnma-2015-0026. S2CID 10356202.
  5. ^ Bateman, Harry (January 1938). "Halley's methods for solving equations". The American Mathematical Monthly. 45 (1): 11–17. doi:10.2307/2303467.
  6. ^ a b Halley, Edmond (May 1694). "Methodus nova accurata & facilis inveniendi radices æquationum quarumcumque generaliter, sine prævia reductione". Philosophical Transactions of the Royal Society (in Latin). 18 (210): 136–148. doi:10.1098/rstl.1694.0029. An English translation was published as Halley, Edmond (1809) [May 1694]. "A new, exact, and easy Method of finding the Roots of any Equations generally, and that without any previous Reduction". In C. Hutton; G. Shaw; R. Pearson (eds.). The Philosophical Transactions of the Royal Society of London, from their commencement, in 1665, to the year 1800. Vol. III from 1683 to 1694. pp. 640–649.
  7. ^ a b Leroy, Robin (21 June 2021). A correctly rounded binary64 cube root (PDF) (Technical report).
  • Weisstein, Eric W. "Halley's method". MathWorld.
  • Newton's method and high order iterations, Pascal Sebah and Xavier Gourdon, 2001 (the site has a link to a Postscript version for better formula display)