Week of May 30th
What I did
- Worked on the backprop post. I wrote the backprop section.
- Major updates to resume. Formed a new branch for my 1 page resume, specifically tailored to data science jobs.
- Read another chapter of the stats textbook
- I’ve spent a lot of time this week reflecting on my privilege and how different my experiences growing up have been. Black lives matter. While, this blog is only meant to serve as a form of note taking for what I am learning in data science, I mention this introspection here because it may seem like I didn’t do much learning this week. In reality, I have done a lot of learning and growing. Just as a person instead of as a data scientist.
What I learned
- With regards to the backprop post, I feel much more confident in how backprop works. It’s not that I didn’t understand it before, but more that I have a better understanding now. It’s the difference of doing a physics problem and explaining a physics problem. These posts are my Oxbridge-style tutorials.
- T-tests, a test statistic to compare two population means. Use a pooled t-test when the standard deviations of the samples are very similar. \(S_1\) and \(S_2\) are the STD of the respective samples. Only use pooled samples when the standard deviations are the same (or very close), the samples are independent (obviously).
This means that the confidence interval of (1-\(\alpha\))% is:
\[(\overline{x}_1 - \overline{x}_2) \pm t_{\alpha/2} \sqrt{\sigma_1^2/n_1 + \sigma_2^2/n_2}\]for a two tailed hypothesis test, where t is calculated from the degrees of freedom I mentioned above.
-
If each population is normal (or sample size is large enough, i.e. the central limit theorem), then the sampling distribution of \(\bar{x_i}\) is also normally distributed with mean \(\mu_i\), Standard Error \(\frac{\sigma_i}{\sqrt{n_i}}\) and Estimated Standard Error of \(\frac{s_i}{\sqrt{n_i}}\). i is for sample 1 or 2.
-
Pandas Series vs Numpy Array Review
pd.Series([1,2,3])
is a 1D, indexed array.
ds = pd.Series([2,4,6]) print(ds) > 0 2 > 1 4 > 2 6 print(ds.values) # Returns a numpy array > array([2,4,6]) print(ds[2]) > 6
What I will do next
- I really need to start another Kaggle project.
- The next i2dl project is on PyTorch. Time to master pytorch
- It would be a good idea to read through my previous posts to help consolidate what I’ve covered into more permanent.