Week of May 10th
What I did
- Installed Ubuntu on a VM in Windows and set up a virtual environment
- Completed TUM Course Exercise 03 – 3 Jupyter notebooks to build data loaders
- Continued reading stats textbook.
What I learned
- I learned a lot about data-loading and managing large quantities of data. For example:
- If you need to load more data than your computer has memory to hold, you have to do it in increments. For this, we can implement the dunder method `__iter__` as a generator function. For example, this function chunks the dataset into lists of 5:
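    def __iter__(self):
        batch = []  # named `batch` to avoid shadowing the built-in `list`
        for x in self.dataset:
            batch.append(x)  # collect items until the chunk is full
            if len(batch) == 5:
                yield batch
                batch = []
        if batch:  # yield the final, possibly shorter, chunk
            yield batch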
Or, if you need to read a giant CSV, you can do this:
    def csv_reader(file_name):
        # Yield one line at a time so the whole file is never held in memory
        with open(file_name, "r") as f:
            for row in f:
                yield row
Or, more simply: `csv_gen = (row for row in open(file_name))`.
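To try the chunking idea end to end, here is a minimal sketch. The class name `ChunkedLoader` and its `chunk_size` parameter are made up for illustration (they are not from the course exercise), and `range(12)` stands in for a dataset that would not fit in memory.

```python
from typing import Iterator, List


class ChunkedLoader:
    """Hypothetical loader that yields a dataset in fixed-size chunks."""

    def __init__(self, dataset, chunk_size: int = 5):
        self.dataset = dataset
        self.chunk_size = chunk_size

    def __iter__(self) -> Iterator[List]:
        batch = []
        for x in self.dataset:
            batch.append(x)
            if len(batch) == self.chunk_size:
                yield batch
                batch = []
        if batch:  # final, possibly shorter, chunk
            yield batch


# range() stands in for a data source too large to hold in memory
loader = ChunkedLoader(range(12), chunk_size=5)
chunks = list(loader)  # materialized here only so the result can be printed
print(chunks)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

In real use you would loop with `for chunk in loader:` instead of calling `list(...)`, so that only one chunk is held in memory at a time.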
- I learned about confidence intervals, and refreshed myself on the 68.26-95.44-99.74 rule. These are numbers that I don’t ever want to forget. Here’s an example of finding a confidence interval:
You have a sample size of 36 people and know the population standard deviation is 4. Thus the standard deviation of the sample means will be:
\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{36}} = \frac{2}{3}\]
You can then calculate \(\bar{x}\) (the sample mean) and say that \(\mu\), the population mean, lies within \(\bar{x} \pm 2\sigma_{\bar{x}}\) with 95.44% confidence. You can also change your confidence level by looking up the value of \(z_{\alpha/2}\). For example, for a 95% confidence interval, \(\alpha = 1 - .95 = .05\). So you can look it up in a table and find that \(z_{.05/2} = 1.96\). So the confidence interval for the population mean is:
\[\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\]
If you don’t know the population standard deviation, you can do the same calculation but with the Student's t-score and the sample standard deviation. That is to say:
\[\mu = \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}\]
Recall that the t-score is the same idea as the z-score, except that it uses the sample standard deviation \(s\) in place of \(\sigma\), so the standard error is estimated as \(\frac{s}{\sqrt{n}}\) (with \(n - 1\) degrees of freedom when looking up \(t_{\alpha/2}\)).
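The z-based interval can be checked numerically with a short sketch using only the Python standard library. The sample size and population standard deviation come from the example above; the sample mean of 50 is a made-up value chosen only for illustration, and `z_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

n = 36         # sample size (from the example)
sigma = 4.0    # known population standard deviation (from the example)
x_bar = 50.0   # hypothetical sample mean, chosen only for illustration

# Standard deviation of the sample means: sigma / sqrt(n) = 4 / 6
std_error = sigma / sqrt(n)


def z_interval(x_bar, std_error, confidence):
    """Return (low, high) bounds of a z-based confidence interval."""
    alpha = 1.0 - confidence
    # z_{alpha/2}: the point with alpha/2 probability in the upper tail
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    margin = z * std_error
    return x_bar - margin, x_bar + margin


low, high = z_interval(x_bar, std_error, 0.95)
print(f"95% CI: ({low:.2f}, {high:.2f})")  # 95% CI: (48.69, 51.31)
```

`NormalDist().inv_cdf(0.975)` plays the role of the table lookup and returns roughly 1.96. The t-based version would need a t-distribution quantile, which the standard library does not provide.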
- I also learned how to insert LaTeX equations into GitHub markdown.
What I will do next
- I have a good amount of schoolwork to do, so I will likely focus on my CS project and do the homework for my functional materials class.
- The forecast for tomorrow looks pretty nice, so I might read more of my statistics textbook outside.