In an artificial neural network, an activation function is the function applied to a neuron's weighted input (its pre-activation) to produce the output that is passed on to the next layer.
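A minimal sketch of this idea, assuming NumPy and illustrative names (`relu`, the layer shapes, and the random inputs are all made up for the example):

```python
import numpy as np

# One dense layer followed by an activation. The activation g is
# applied element-wise to the pre-activation z = W @ x + b.
def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weights: 3 inputs -> 4 neurons
b = np.zeros(4)               # biases
x = rng.normal(size=3)        # input vector

z = W @ x + b                 # linear pre-activation
a = relu(z)                   # activation: this is what the next layer sees
print(a)
```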
Why do we need non-linear activation functions?
- introduce non-linearity into the model, which allows neural networks to approximate complex, non-linear relationships in data. Without non-linear activation functions, a neural network can model only linear transformations, making it ineffective for a wide range of real-world problems: with a linear activation, the whole model reduces to simple linear regression.
- work with the backpropagation algorithm: activation functions are chosen to be differentiable (or at least subdifferentiable, like ReLU) so gradients can flow through the network. Without non-linear activations, the entire network collapses into a single linear transformation, so depth adds nothing and meaningful learning is prevented (see the sketch after this list).
- Some non-linear activation functions, like ReLU, output exact zeros for negative inputs and can therefore help create sparse representations.
- introduce constraints (e.g., bounded outputs in sigmoid/tanh) or noise (e.g., noisy variants of ReLU) that can act as a mild form of regularization and help reduce overfitting.
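A small NumPy sketch of the collapse argument above (the layer sizes and seed are arbitrary choices for illustration): two stacked linear layers are exactly equivalent to one linear layer, while inserting a ReLU breaks that equivalence and produces a sparse hidden vector.

```python
import numpy as np

rng = np.random.default_rng(1)
W1, W2 = rng.normal(size=(5, 3)), rng.normal(size=(2, 5))
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x)        # "deep" network with linear activations
one_layer = (W2 @ W1) @ x         # single equivalent linear layer
print(np.allclose(two_layers, one_layer))  # True: the network collapsed

# Inserting ReLU between the layers breaks the equivalence; entries of
# the hidden vector that were negative become exactly zero (sparsity).
h = np.maximum(0.0, W1 @ x)
print(h)
```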
How to choose an activation function?
For the output layer, the choice depends on the prediction task (a code sketch of each option follows the list):
- binary classification ➡️ sigmoid function OR tanh function
- regression where the target can be both positive and negative ➡️ linear activation function
- regression where the target is non-negative ➡️ ReLU (popular choice)
- probability distribution over classes ➡️ softmax
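A hedged NumPy sketch of the output activations listed above (the function names and test vector are illustrative; the softmax subtracts the max before exponentiating, a standard trick for numerical stability):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def tanh(z):
    return np.tanh(z)

def relu(z):
    return np.maximum(0.0, z)

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for numerical stability
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])
print(sigmoid(z))   # each entry squashed to (0, 1)
print(tanh(z))      # each entry squashed to (-1, 1)
print(relu(z))      # negative entries clipped to 0
print(softmax(z))   # entries sum to 1: a probability distribution
```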