RMSprop is similar to gradient descent with momentum, but instead of accumulating the gradient itself, it accumulates a running average of the squared gradient and uses it to normalize each step, so the update depends mostly on the sign of the gradient rather than its magnitude.

For the kth epoch, RMSprop updates as follows:

\[\begin{aligned} S_k & := \beta S_{k-1} + (1 - \beta) \, \nabla L \odot \nabla L \\ \theta_k & := \theta_{k-1} - \alpha \frac{\nabla L}{\sqrt{S_k} + \epsilon} \end{aligned}\]
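
As a concrete reference, here is a minimal NumPy sketch of one such update step; the function name and hyperparameter defaults (α, β, ε) are illustrative assumptions, not values from this post:

```python
import numpy as np

def rmsprop_step(theta, grad, S, alpha=0.001, beta=0.9, eps=1e-8):
    # S_k = beta * S_{k-1} + (1 - beta) * (elementwise squared gradient)
    S = beta * S + (1 - beta) * grad * grad
    # theta_k = theta_{k-1} - alpha * grad / (sqrt(S_k) + eps)
    theta = theta - alpha * grad / (np.sqrt(S) + eps)
    return theta, S

# Example: one step on a 3-parameter model
theta = np.zeros(3)
S = np.zeros(3)
grad = np.array([0.5, -2.0, 0.1])
theta, S = rmsprop_step(theta, grad, S)
```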

We can rearrange the last term: since \(\epsilon \ll \sqrt{S}\), divide the numerator and denominator by \(\sqrt{S}\).

\[\begin{aligned} \frac{1}{\sqrt{S} + \epsilon } & = \frac{1}{\sqrt{S}} * \frac{1}{1 + \epsilon/ \sqrt{S} } \\ & = \frac{1}{\sqrt{S}} * (1 + \epsilon/ \sqrt{S} )^{-1} \end{aligned}\]

Now, Taylor expand to first order around \(x = 0\), using \((1+x)^n \approx 1 + nx\):

\[\begin{aligned} & = \frac{1}{\sqrt{S}} * (1 - \frac{\epsilon}{\sqrt{S}}) \\ & = \frac{1}{\sqrt{S}} - \frac{\epsilon}{S} \end{aligned}\]
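
To sanity-check the expansion, we can compare both forms numerically; the sample values of \(S\) below are arbitrary, chosen so that \(\epsilon \ll \sqrt{S}\) holds:

```python
import numpy as np

eps = 1e-8
S = np.array([1e-6, 1e-2, 1.0, 100.0])  # arbitrary sample values, all >> eps**2

exact = 1.0 / (np.sqrt(S) + eps)
approx = 1.0 / np.sqrt(S) - eps / S

# relative error should be on the order of (eps / sqrt(S))**2
print((exact - approx) / exact)
```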

Substituting this back in gives the approximate update:

\[\begin{aligned} S_k & := \beta S_{k-1} + (1 - \beta) \, \nabla L \odot \nabla L \\ \theta_k & := \theta_{k-1} - \alpha \nabla L \left( \frac{1}{\sqrt{S_k}} - \frac{\epsilon}{S_k} \right) \end{aligned}\]

Is this any more efficient? I doubt it, but it was worth doing the math and seeing the expanded formula to think about it. The problem is that what may be simpler to solve on paper isn't necessarily an easier problem for a computer.
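
If you want to test the efficiency question empirically, here is a rough timing sketch (the array size and contents are arbitrary). Counting operations already suggests the answer: the expanded form needs an extra division and subtraction per element, so there is no reason to expect it to beat the original:

```python
import timeit
import numpy as np

eps = 1e-8
S = np.random.rand(1_000_000) + 0.01  # arbitrary running averages, bounded away from 0
g = np.random.randn(1_000_000)        # arbitrary gradient vector

# original form: g / (sqrt(S) + eps)
t_orig = timeit.timeit(lambda: g / (np.sqrt(S) + eps), number=100)
# expanded form: g * (1/sqrt(S) - eps/S)
t_expd = timeit.timeit(lambda: g * (1.0 / np.sqrt(S) - eps / S), number=100)

print(f"original: {t_orig:.3f}s  expanded: {t_expd:.3f}s")
```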