Week of December 13th
GRUs, strings, and batch normalization. I didn't post anything last week because I was busy with grad school work and traveling – I flew back home to New York from Poland. I'll also be busy next week with Christmas and transitioning from Android to iPhone.
What I did
- Chatbot, TensorFlow edition. I went through an RNN and transformer recap
- I’ve been working my way through the original seq2seq and transformer papers
What I Learned
- `str()` vs `repr()`: they basically do the same thing, though `repr()` gives the unambiguous string representation of an object, which is more useful for debugging, whereas `str()` returns a cleaner form, more suitable for a non-debugging print statement. Also, `repr()` can preserve more detail, such as more significant figures when converting floats to strings in older Python versions (a quick example follows this list).
- Recall that vanilla RNNs suffer from the vanishing gradient problem during backpropagation. This causes an RNN to forget what it learned earlier in a sequence (its memory is too short). GRUs and LSTMs use gates to maintain a hidden state that mitigates this short-term memory problem.
- A GRU (gated recurrent unit) is similar to an LSTM, but typically performs slightly worse. When a GRU works just as well as an LSTM, though, it is preferable because it is less computationally expensive. Its pieces (written out as equations after this list):
- Hidden state
- Reset gate (decides how much of the previous hidden state to take into account when forming the candidate state, a tanh of the current input and the gated previous state. If $R_t = 0$, only $x_t$ and the bias are used; if $R_t = 1$, the previous state is fully taken into account).
- Update gate (blends the previous hidden state with the candidate state to produce the new hidden state. $Z_t = 1$ means don't update, i.e., keep the old state; $Z_t = 0$ means update entirely, i.e., replace it with the candidate).
- In my lectures, I learned that batch normalization was effective because it minimized internal covariate shift. Recently, however, I've read that its effectiveness may instead come from smoothing the loss landscape. There are other theories about why it works, but I'm more interested in knowing what works and when it works, and in developing an intuition for it, rather than debating subtleties that don't change how it's used in practice (a minimal usage sketch follows this list).
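A quick illustration of the `str()` vs `repr()` distinction (the values here are just for illustration):

```python
import datetime

now = datetime.datetime(2020, 12, 13, 9, 30)
print(str(now))   # 2020-12-13 09:30:00 -- clean, human-readable
print(repr(now))  # datetime.datetime(2020, 12, 13, 9, 30) -- unambiguous, good for debugging

s = "hello\nworld"
print(str(s))     # prints the string with an actual newline in it
print(repr(s))    # 'hello\nworld' -- shows the quotes and escape explicitly
```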
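For reference, here is one common formulation of the GRU step, matching the bullets above ($\sigma$ is the sigmoid, $\odot$ is element-wise multiplication, and $\tilde{h}_t$ is the candidate state; the exact placement of the weights varies across sources):

$$
\begin{aligned}
R_t &= \sigma(x_t W_{xr} + h_{t-1} W_{hr} + b_r) \\
Z_t &= \sigma(x_t W_{xz} + h_{t-1} W_{hz} + b_z) \\
\tilde{h}_t &= \tanh\big(x_t W_{xh} + (R_t \odot h_{t-1}) W_{hh} + b_h\big) \\
h_t &= Z_t \odot h_{t-1} + (1 - Z_t) \odot \tilde{h}_t
\end{aligned}
$$

Setting $R_t = 0$ drops the previous state from the candidate, and setting $Z_t = 1$ keeps the old state untouched, exactly as described above.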
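Whatever the explanation for why it works, batch normalization is easy to drop into a network. A minimal sketch in TensorFlow/Keras (the layer sizes and placement are arbitrary choices for illustration):

```python
import tensorflow as tf

# Toy classifier: Dense -> BatchNorm -> ReLU is one common ordering,
# though placing BatchNorm after the activation also appears in practice.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, input_shape=(100,)),
    tf.keras.layers.BatchNormalization(),  # normalize each feature over the batch, then rescale with learned gamma/beta
    tf.keras.layers.Activation("relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```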
What I will do next
- Chatbot. Then, deploy, deploy, deploy.