Week of May 10th

What I did

Installed Ubuntu on a VM in Windows and set up a virtual environment
Completed TUM Course Exercise 03 – 3 Jupyter notebooks to build data loaders
Continued reading stats textbook.

What I learned

I learned a lot about data-loading and managing large quantities of data. For example:
- If you need to load more data than you computer has memory to hold, you have to do it in increments. For this, we can overload the dunder method __iter__ with a generator function. For example, this function chunks the dataset into lists of 5.

def __iter__(self):
list = []
for x in self.dataset:
  if len(list) == 5:
    yield list
    list = []

or if you need to read a giant CSV, you can do this:

  def csv_reader(file_name):
    for row in open(file_name, "r"):
        yield row

or more simply: csv_gen = (row for row in open(file_name)).

-I learned about confidence intervals, and refreshed myself on the 68.26-95.44-99.74 rule. These are numbers that I don’t ever want to forget. Here’s an example of finding a confidence interval:

You have a sample size of 36 people and know the population std is 4. Thus the std of the sample means will be:

\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{36}}\]

You can then calculate \(< x >\) (the sample mean) and say that \(\mu\) – the population mean – lies within \(\bar{x} \pm 2\sigma_{\bar{x}}\) with 95.44% confidence. You can also change your confidence interval by looking up the value of \(z_{\alpha/2}\). For example, for a 95% confidence interval, \(\alpha = 1 - .95\). So you can look it up in a table and find that \(z_{.5/2}\) = 1.96. So the confidence interval for the population mean is:

\[\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}}\]

If you don’t know the population mean, you can do the same calculation but with the studentized normal score and the sample mean. That is to say:

\[\mu = \bar{x} \pm t_{\alpha/2} * \frac{s}{\sqrt{n}}\]

Recall the the t-score is the same idea as the z-score, except with sample mean and sample standard deviation (or \(s=\frac{\sigma}{\sqrt{n}}\) if s is unknown).

I also learned how to insert latex equations into github markdown.

What I will do next

I have a good amount of schoolwork to do, so I will likely focus on my CS project and do the homework for my functional materials class.
The forecast for tomorrow looks pretty nice, so I might read more of my statistics textbook outside.