Introduction to Optimization Techniques:

- Derivative based optimization- Steepest Descent, Newton method
- Derivative free optimization- Random Search, Down Hill Simplex

**Optimization**:

Optimization is at the heart of almost all machine learning techniques. Choosing the best element from some set of available alternatives. We are optimizing variables and basing our personal decisions all day long, most of the time without even recognizing the process consciously —

Scheduling the order at which you will answer the emails

Switching to a new route back home to minimize traffic woes

Trying to squeeze out the optimum gap between lunch and afternoon meeting to have a quiet walk around the office campus.

Each of these seemingly personal decisions can be modeled precisely using cold, hard mathematics to show that our brain is an amazing optimizer solving these computationally complex problems all day.

Advanced techniques of artificial intelligence or machine learning may be able to guide
businesses toward a better optimal solution at a faster clip, but they **must confront and solve the
same (or more complex) optimization problems as before.** A deluge of new data will aid this
process but the expectations will also grow as time goes by.

**Basic elements of optimization**

There are three basic elements of any optimization problem -

**Variables:**These are the free parameters which the algorithm can tune**Constraints:**These are the boundaries within which the parameters (or some combination thereof) must fall**Objective function:**This is the set of goal towards which the algorithm drives the solution. For machine learning, often this amount to minimizing some error measure or maximizing some utility function.

**Different techniques for exhibiting optimization are:**

i. Derivative based optimization- Steepest Descent

ii. Newton method

iii. Derivative free optimization- Random Search, Down Hill Simplex

**1 Derivative based optimization**

Derivative based optimization deals with gradient-based optimization techniques, capable of determining search directions according to an objective function’s derivative information.

**1.1 Steepest Descent**

In this method, the search starts from an initial trial point X1, and iteratively moves along the steepest descent directions until the optimum point is found. Although, the method is straightforward, it is not applicable to the problems having multiple local optima. In such cases the solution may get stuck at local optimum points.

**1.2 Newton method**

Newton’s method (sometimes called Newton-Raphson method) uses first and second derivatives and indeed performs better. Given a starting point, construct a quadratic approximation to the objective function that matches the first and second derivative values at that point. We then minimize the approximate (quadratic function) instead of the original objective function. The minimizer of the approximate function is used as the starting point in the next step and repeat the procedure iteratively.

Newton's method is a very popular method which is based on Taylor's Series expansion. The Taylor's Series expansion of a function

$f(X) \text{ at } X=X_i \text{ is given by:}$

$f(X)=f(X_i)+\nabla f_i^T(X-X_i)+\frac{1}{2}(X-X_i)^T[J_i](X-X_i)$------------**(1)**

where, $[J_i]=[J]|X_i$, is the Hessian matrix of f evaluated at point X;

Setting the partial derivatives of equation (1), to zero, the minimum value of f(X) can be obtained,

$\frac{\partial f(X)}{\partial X_j}=0, j=1,2,...,N$-------------**(2)**

From equation (1) and (2),

$\nabla f=\nabla f_i+[J_i](X-X_i)=0$--------------**(3)**

Equation (3) can be solved to obtain an improved solution $X_{i+1}.$

$X_{i+1}=X_i-[J_i]^{-1}\nabla f_i$---------------**(4)**

The procedure is repeated till convergence for finding out the optimal solution.

**2 Derivative free optimization**

Derivative free optimization algorithms are often used when it is difficult to find function derivatives, or if finding such derivatives are time consuming. Derivative-free optimization is a discipline in mathematical optimization that does not use derivative information in the classical sense to find optimal solutions: Sometimes information about the derivative of the objective function f is unavailable, unreliable or impractical to obtain. For example, f might be non- smooth, or time-consuming to evaluate, or in some way noisy, so that methods that rely on derivatives or approximate them via finite differences are of little use. The problem to find optimal points in such situations is referred to as derivative-free optimization.

**2.1 Random Search**

Random Search Method: This method generates trial solutions for the optimization model using random number generators for the decision variables. Random search method includes random jump method, random walk method and random walk method with direction exploitation. Random jump method generates huge number of data points for the decision variable assuming a uniform distribution for them and finds out the best solution by comparing the corresponding objective function values. Random walk method generates trial solution with sequential improvements which is governed by a scalar step length and a unit random vector. The random walk method with direct exploitation is an improved version of random walk method, in which, first the successful direction of generating trial solutions is found out and then maximum possible steps are taken along this successful direction.

**2.2 Down Hill Simplex**

Simplex Method: Simplex method is a conventional direct search algorithm where the best solution lies on the vertices of a geometric figure in N-dimensional space made of a set of N+1 points. The method compares the objective function values at the N+1 vertices and moves towards the optimum point iteratively. The movement of the simplex algorithm is achieved by reflection, contraction and expansion.