Activation functions are needed in neural networks (and more complicated machine learning models) to introduce non-linearity. Without an activation function, or with a linear one, each layer performs only an affine transformation, and a stack of affine transformations collapses into a single affine transformation. More layers then add nothing: you might as well have one layer with different weights and biases, as the sketch below demonstrates.
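To make this concrete, here is a minimal NumPy sketch (the layer sizes are arbitrary) showing that two stacked linear layers are mathematically equivalent to one affine layer:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)                                    # input vector
W1, b1 = rng.standard_normal((5, 4)), rng.standard_normal(5)  # layer 1
W2, b2 = rng.standard_normal((3, 5)), rng.standard_normal(3)  # layer 2

# Two stacked affine layers with no activation in between...
two_layers = W2 @ (W1 @ x + b1) + b2

# ...collapse into a single affine layer with merged weights.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```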
To calculate the output size of a convolutional layer, use the formula:

Output_Shape = (W - F + 2P)/S + 1

where W is the input width (or height), F is the filter size, P is the padding, and S is the stride.

For example, consider the output size of a convolutional layer with an input of size 32x32x32 when 8 filters of size 6x6 are applied to it with a padding of 1 and a stride of 2.
Output_Shape = (32-6+2(1))/2+1 = 15
Output_Shape = 15 x 15 x 8

(The output depth equals the number of filters applied, here 8.)
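The same arithmetic as a small Python helper (the function name is our own, not from any library):

```python
def conv_output_size(w, f, p, s):
    """Spatial output size for input width w, filter size f, padding p, stride s."""
    return (w - f + 2 * p) // s + 1

# The worked example: 32x32x32 input, 8 filters of 6x6, padding 1, stride 2.
side = conv_output_size(32, 6, 1, 2)  # (32 - 6 + 2)/2 + 1 = 15
print((side, side, 8))                # (15, 15, 8)
```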
Here’s an example image (it’s AlexNet) with some concrete instances of this calculation:
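As a quick sanity check, the formula reproduces the output of AlexNet’s first convolutional layer, assuming the commonly cited figures of a 227x227x3 input and 96 filters of size 11x11 with stride 4 and no padding:

```python
# AlexNet conv1 (commonly cited figures, assumed here):
# 227x227x3 input, 96 filters of size 11x11, stride 4, padding 0.
side = (227 - 11 + 2 * 0) // 4 + 1
print(side)  # 55 -> conv1 output is 55 x 55 x 96
```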