
Sigmoid

Things to note:

\[y = \frac{1}{1+e^{-x}}\]
  • The max is 1, approached asymptotically (the output is essentially 1 by \(x \approx 4\)); the min is 0, also asymptotic (essentially 0 by \(x \approx -4\)), so the range of inputs over which the output actually changes is not great. When \(x=0\), \(y=0.5\). Sigmoid works like a probability since it returns a number between 0 and 1. A quick numeric check follows this bullet.
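
As a quick numeric check, here is a minimal sketch using NumPy (the function name `sigmoid` is just illustrative). It shows how fast the output saturates toward 0 and 1:

```python
import numpy as np

def sigmoid(x):
    """Logistic sigmoid: 1 / (1 + e^(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

# Saturation check: already ~0.018 at x = -4 and ~0.982 at x = +4.
for x in (-4.0, 0.0, 4.0):
    print(f"sigmoid({x:+}) = {sigmoid(x):.3f}")
```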

Figure: the sigmoid curve.

Figure: tanh vs. sigmoid.

Tanh

Things to note:

  • Max is 1 but the min is -1 (both approached asymptotically). At \(x=0\), \(y=0\). It has a steeper slope near zero, so it grows faster than the conventional sigmoid there. Note that "sigmoid" really just names an S-shape: technically tanh is also a sigmoid function, but data scientists mean the first equation when they say sigmoid. The identity relating the two is shown below.
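
For reference, tanh can be written explicitly and as a rescaled, shifted version of the logistic sigmoid \(\sigma\) from above:

\[\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}} = 2\,\sigma(2x) - 1, \qquad \sigma(x) = \frac{1}{1+e^{-x}}\]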

Linear activation

  • Not really an activation: it just passes the affine output through unchanged, \(y = wx + b\). Might show up on the exam. The identity after this bullet shows why it adds no expressive power.
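
One standard way to see why a linear activation buys nothing, assuming two stacked layers with weights \(W_1, W_2\) and biases \(b_1, b_2\): their composition is still a single linear (affine) map.

\[W_2 (W_1 x + b_1) + b_2 = (W_2 W_1)\,x + (W_2 b_1 + b_2)\]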

ReLU

\[y = \max(0, x)\]

Figure: the ReLU curve.

Things to note:

  • No saturation problem for positive inputs like sigmoids have, but it can suffer from the dying-ReLU issue: when the learning rate is too high, or when a neuron learns a large negative bias, the neuron's pre-activation is negative for every input, so the ReLU is stuck yielding zero all the time and its gradient there is also zero. That neuron is thus useless. A small sketch of this follows the bullet.
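
A minimal sketch of the mechanism (the weights and bias here are made up for illustration): once a unit's pre-activation is negative for every input, both the ReLU output and its gradient are zero, so gradient descent has no signal to revive it.

```python
import numpy as np

def relu(z):
    """ReLU activation: max(0, z)."""
    return np.maximum(0.0, z)

def relu_grad(z):
    """Derivative of ReLU: 1 for z > 0, 0 otherwise."""
    return (z > 0).astype(float)

# A "dead" unit: a large negative bias pushes every pre-activation below zero.
x = np.random.randn(5)   # a few example inputs
w, b = 0.5, -10.0        # weights/bias this unit has learned
z = w * x + b            # pre-activations are all negative

print(relu(z))       # all zeros -> the unit never fires
print(relu_grad(z))  # all zeros -> no gradient signal, so it stays dead
```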

Leaky ReLU

\[y = \max(0.1x, x)\]

Figure: the leaky ReLU curve.

Things to Note:

  • There is also a version called parametric ReLU (PReLU), which is \(\max(\alpha x, x)\), where \(\alpha\) is a learnable parameter giving the slope for negative inputs. Despite sharing the symbol, this \(\alpha\) is not the learning rate (step size) in gradient descent; it is a separate parameter that the optimizer updates along with the weights. A sketch of both variants follows.
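
A minimal sketch of both variants. The default slope of 0.1 follows the equation above (libraries often default to 0.01), and in a real PReLU layer `alpha` would be a trainable parameter rather than a fixed argument:

```python
import numpy as np

def leaky_relu(z, slope=0.1):
    """Leaky ReLU: max(slope * z, z) with a fixed small negative-side slope."""
    return np.maximum(slope * z, z)

def prelu(z, alpha):
    """Parametric ReLU: same shape, but alpha is learned during training."""
    return np.maximum(alpha * z, z)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu(z))          # [-0.2  -0.05  0.   1.5 ]
print(prelu(z, alpha=0.25))   # [-0.5  -0.125 0.   1.5 ]
```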