The Steepest Descent

126views

written 5.1 years ago by

The Steepest Descent method, also called the Cauchy Method ,was put forward by Cauchy in 1857. In this method, we choose an initial point $X_1$ and iteratively move along the steepest descent direction to find the optimum point. Minimization occurs in the negative direction of the gradient vector . The steps in this method are as follows:-

Step 1: Start with an arbitrary point $X_1$ . Consider the iteration number be represented by i. Then set $i=1.$

Step 2: Find the search direction. $S_i$ as given $S_i=-\nabla f_i=-\nabla f(X_i).$

Step 3: Find the optimal step length $\lambda$* in the direction $X_i$ as given in eq. $X_{i+1}= X_i + \lambda_i^* S_i = X_i - \lambda_i^* \nabla f_i.$

Step 4: Test the new point, $X_{i+1}$, for optimality condition whose derivative equals to 0 .In other words, check if the new point is lesser than the previous one. If it is lesser, then the direction of movement is ideal and $X_{i+1}$ becomes the optimum point. If $X_{i+1}$ is the optimum point, then stop the process , otherwise goto step 5.

Step 5: See the new iteration number $i =i+1 $ and goto step 2.

Q:- Minimize $f(x_1,x_2) = 4x_1-2x_2+2x_1^2+2x_1x_2+x_2^2$ starting from point $X_1 =\left\{ \begin{matrix} 0 \\ 0 \end{matrix} \right\} $

solution:- We have, $f(x_1,x_2) = 4x_1-2x_2+2x_1^2+2x_1x_2+x_2^2$

The gradient f is given by $\nabla f = \begin{cases} \frac { \partial f }{ \partial x_{ 1 } } \\ \frac { \partial f }{ \partial x_{ 2 } } \end{cases}= \begin{cases} 4+4x_{ 1 }+2x_{ 2 } \\ -2+2x_1+2x_2 \end{cases}$

$\nabla f_1=\nabla f(X_1)= \left\{ \begin{matrix} 4 \\ -2 \end{matrix} \right\}$

Step 2:- $S_i=-\nabla f_i= -\nabla f(X_i)$

Hence $S_1 =-\nabla f_1= \left\{ \begin{matrix} -4 \\ 2 \end{matrix} \right\}$

Step 3:- Determine the optimal step length $\lambda_i$* in the direction $S_i$.To find $X_2$ we need to find the optimal step length $\lambda_1$.

For this we minimize $f(X_1+\lambda_1S_1)$ with respect to $\lambda_1$. $f(X_1 +\lambda_1S_1) =f(-\lambda_1,\lambda_1) = \lambda_1^2 -2\lambda_1=X_1 1+\lambda_1S_1= \begin{bmatrix}0\\0\end{bmatrix}+\lambda_1 \begin{bmatrix}-4\\2\end{bmatrix}$

We have $f(x_1,x_2) = 4x_1-2x_2+2x_1^2+2x_1x_2+x_2^2$

Hence,

$f(\lambda_1,\lambda_2) = -4\lambda_1-2\lambda_1+2\lambda_1^2-2\lambda_1\lambda_1+\lambda_1^2 =\lambda_1^2-6\lambda_1.......................(1)$

Substituting $\dfrac{\partial f}{\partial \lambda_1}=0$ in equation(1) , we get $\dfrac{\partial( \lambda_1^2-6\lambda_1)}{\partial \lambda_1}=2\lambda_1-6=0$

$\lambda_1= 3$

$X_2 = X_1+\lambda_1^*S_1=\left\{ \begin{matrix} 0 \\ 0 \end{matrix} \right\}+3\left\{ \begin{matrix} -4 \\ 2 \end{matrix} \right\}=\left\{ \begin{matrix} -12 \\ 6 \end{matrix} \right\}$

As $\nabla f_2=\nabla f(X_2)\neq\left\{ \begin{matrix} 0 \\ 0 \end{matrix} \right\},X_2$ is not optimum

ADD COMMENT EDIT