Activation Functions in detail
1 Answer

To make the network operate effectively and produce the desired output, an activation is applied to each processing element; this activation shapes the element's response and helps the network achieve the correct output.

The activation function is applied over the net input to calculate the output of an ANN.

The information processing of a processing element can be viewed as consisting of two major parts: input and output. An integration function (f) is associated with the input of the processing element; it combines activation, information, or evidence from external sources or other processing elements into a single net input. A nonlinear activation function is then used to ensure that the neuron's response is bounded, that is, the actual response of the neuron is conditioned or dampened for large or small activating stimuli and is thus controllable.
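As a rough sketch of these two parts in NumPy (the inputs, weights, and bias below are made-up values for illustration):

```python
import numpy as np

# Hypothetical processing element with three inputs (all values made up).
x = np.array([0.5, -1.0, 2.0])   # signals from other processing elements
w = np.array([0.4, 0.1, -0.3])   # connection weights
b = 0.2                          # bias

# Integration function: combine the evidence into a single net input.
y_in = np.dot(x, w) + b

# Activation function: convert the net input into the element's output
# (the binary sigmoid defined later, in item 4, is used here).
y = 1.0 / (1.0 + np.exp(-y_in))
print(y_in, y)
```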

Non-linear functions are what give a multilayer network its advantage over a single-layer network. When a signal is fed through a multilayer network with a linear activation function, the output remains the same as one that could be obtained from a single-layer network (see the check below). For this reason, non-linear functions are widely used in multilayer networks in preference to linear functions.
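A quick numerical check of this collapse, with arbitrarily chosen layer sizes and random weights:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=3)           # an arbitrary input signal

# Two layers whose activation is linear (identity)...
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
two_layer = W2 @ (W1 @ x)

# ...are equivalent to a single layer with weights W2 @ W1.
single_layer = (W2 @ W1) @ x
print(np.allclose(two_layer, single_layer))  # True
```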

There are several activation functions:

1. Identity function:- It is a linear function and can be defined as,

$f(x)=x \quad \forall x$

The output here remains the same as the input. The input layer of a network uses the identity activation function.
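A minimal sketch (trivially, the function returns its argument unchanged):

```python
def identity(x):
    """Identity activation: the output equals the net input."""
    return x
```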

2. Binary step function:- This function can be defined as,

$f(x)=\begin{cases} 1,\quad if\quad x\ge \theta \\ 0,\quad if\quad x\lt \theta \end{cases}$

where $\theta$ represents the threshold value. This function is most widely used in single-layer nets to convert the net input into a binary output (1 or 0).
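A sketch of the binary step (the test inputs and threshold are made up for illustration):

```python
import numpy as np

def binary_step(x, theta=0.0):
    """Binary step: 1 if x >= theta, else 0 (theta is the threshold)."""
    return np.where(x >= theta, 1, 0)

print(binary_step(np.array([-0.5, 0.2, 1.3]), theta=0.2))  # [0 1 1]
```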

3. Bipolar step function:- This function can be defined as,

$f(x)=\begin{cases} 1,\quad if\quad x\ge \theta \\ -1,\quad if\quad x\lt\theta \end{cases}$

where $\theta$ represents the threshold value. This function is also used in single-layer nets to convert the net input into a bipolar output (+1 or -1).
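The same sketch, adapted for the bipolar case (test inputs made up):

```python
import numpy as np

def bipolar_step(x, theta=0.0):
    """Bipolar step: +1 if x >= theta, else -1."""
    return np.where(x >= theta, 1, -1)

print(bipolar_step(np.array([-0.5, 0.2, 1.3])))  # [-1  1  1]
```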

4. Sigmoidal functions:- Sigmoidal functions are widely used in back-propagation nets because of the relationship between the value of the function at a point and the value of its derivative at that point, which reduces the computational burden during training.

Sigmoidal functions are of two types: -

(a) Binary sigmoid function

(b) Bipolar sigmoid function

(a) Binary sigmoid function:- Also called the logistic sigmoid function or unipolar sigmoid function, it can be defined as,

$f(x)=\dfrac{1}{1+e^{-\lambda x}}$

so that the output for a net input $y_{in}$ (taking $\lambda=1$) is

$y=f(y_{in})=\dfrac{1}{1+e^{-y_{in}}}$

where $\lambda$ is the steepness parameter.

The derivative of this function is,

$f'(x)=\lambda f(x)[1-f(x)]$

Here, the range of the sigmoid function is from 0 to 1.
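A sketch of the binary sigmoid and its derivative, with a finite-difference check of the identity $f'(x)=\lambda f(x)[1-f(x)]$ (the test point and $\lambda$ are arbitrary):

```python
import numpy as np

def binary_sigmoid(x, lam=1.0):
    """Logistic (binary) sigmoid with steepness parameter lam; range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-lam * x))

def binary_sigmoid_deriv(x, lam=1.0):
    """Derivative expressed through the function value itself:
    f'(x) = lam * f(x) * (1 - f(x)) -- the property that makes
    back-propagation cheap."""
    fx = binary_sigmoid(x, lam)
    return lam * fx * (1.0 - fx)

# Numerical check against a central finite difference:
x, lam, h = 0.7, 2.0, 1e-6
fd = (binary_sigmoid(x + h, lam) - binary_sigmoid(x - h, lam)) / (2 * h)
print(np.isclose(fd, binary_sigmoid_deriv(x, lam)))  # True
```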

(b) Bipolar sigmoid function:- This function can be defined as,

$f(x)=\dfrac{2}{1+e^{-\lambda x}}-1=\dfrac{1-e^{-\lambda x}}{1+e^{-\lambda x}}$

so that the output for a net input $y_{in}$ (taking $\lambda=1$) is

$y=f(y_{in})=\dfrac{1-e^{-y_{in}}}{1+e^{-y_{in}}}$

where $\lambda$ is the steepness parameter. The range of the bipolar sigmoid function is from -1 to 1.

The bipolar sigmoid function is closely related to the hyperbolic tangent function, which is written as,

$h(x)=\dfrac {e^x-e^{-x}}{e^x+e^{-x}}=\dfrac {1-e^{-2x}}{1+e^{-2x}}$

The derivative of this function is,

$h'(x)=[1+h(x)][1-h(x)]$

If the network uses binary data, it is better to convert the data to bipolar form and use the bipolar sigmoid activation function or the hyperbolic tangent function.
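A sketch verifying the relationship numerically: the bipolar sigmoid equals $\tanh(\lambda x/2)$, so with $\lambda=2$ it coincides with $\tanh(x)$ (the test points are arbitrary):

```python
import numpy as np

def bipolar_sigmoid(x, lam=1.0):
    """Bipolar sigmoid with steepness parameter lam; range (-1, 1)."""
    return (1.0 - np.exp(-lam * x)) / (1.0 + np.exp(-lam * x))

x = np.linspace(-3, 3, 7)

# (1 - e^{-lam*x}) / (1 + e^{-lam*x}) = tanh(lam * x / 2)
print(np.allclose(bipolar_sigmoid(x, lam=2.0), np.tanh(x)))  # True
```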

5. Ramp function:- The ramp function is defined as,

$f(x)=\begin{cases} 1,\quad if\quad x\gt 1 \\ x,\quad if\quad 0\le x\le 1 \\ 0,\quad if\quad x\lt 0 \end{cases}$
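A sketch of the ramp function, which simply clips the net input to the interval $[0, 1]$ (test inputs made up):

```python
import numpy as np

def ramp(x):
    """Ramp activation: clips the net input to the interval [0, 1]."""
    return np.clip(x, 0.0, 1.0)

print(ramp(np.array([-0.5, 0.3, 1.7])))  # [0.  0.3 1. ]
```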
