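`__iter__` with a generator function. For example, this method chunks the dataset into lists of 5, plus a final, shorter list for any leftover items:

```python
def __iter__(self):
    # Collect items into a chunk and yield it each time it reaches 5 items.
    chunk = []
    for x in self.dataset:
        chunk.append(x)
        if len(chunk) == 5:
            yield chunk
            chunk = []
    # Yield whatever is left over as a final, shorter chunk.
    if chunk:
        yield chunk
```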
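Here is a self-contained sketch that wraps the method above in a class so it can be run directly. The class name `ChunkedDataset` and the `size` parameter are made up for illustration.

```python
class ChunkedDataset:
    """Hypothetical wrapper: iterating over it yields lists of up to `size` items."""

    def __init__(self, dataset, size=5):
        self.dataset = dataset
        self.size = size

    def __iter__(self):
        # Because this method contains `yield`, each call to iter() on the
        # object returns a fresh generator, so the object can be looped over repeatedly.
        chunk = []
        for x in self.dataset:
            chunk.append(x)
            if len(chunk) == self.size:
                yield chunk
                chunk = []
        if chunk:  # leftover items that didn't fill a whole chunk
            yield chunk


chunks = list(ChunkedDataset(range(12)))
print(chunks)  # [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

Turning the chunk size into a `size` parameter is only a convenience; with the default `size=5` it behaves like the method above.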
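Or, if you need to read a giant CSV file without loading it all into memory, you can do this:

```python
def csv_reader(file_name):
    # Yield one line at a time; the `with` block closes the file
    # once the generator is exhausted or closed.
    with open(file_name, "r") as f:
        for row in f:
            yield row
```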
Or, more simply, use a generator expression: `csv_gen = (row for row in open(file_name))`.
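A runnable sketch of the idea: it writes a tiny temporary CSV so it can run on its own, then counts the lines lazily with the generator above. Since iterating over a file yields raw text lines rather than parsed fields, it also adds a hypothetical `csv_rows` helper that hands the open file to the standard-library `csv.reader`, which still produces one row at a time.

```python
import csv
import os
import tempfile


def csv_reader(file_name):
    # Yield one raw line at a time; only the current line is held in memory.
    with open(file_name, "r") as f:
        for row in f:
            yield row


def csv_rows(file_name):
    # Hypothetical variant: csv.reader splits each line into fields
    # (handling quoted commas) while still reading lazily.
    with open(file_name, "r", newline="") as f:
        yield from csv.reader(f)


# Write a small sample file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    tmp.write('name,city\nAda,London\n"Hopper, Grace",New York\n')
    path = tmp.name

line_count = sum(1 for _ in csv_reader(path))  # counts lines without building a list
rows = list(csv_rows(path))
os.remove(path)

print(line_count)  # 3
print(rows)  # [['name', 'city'], ['Ada', 'London'], ['Hopper, Grace', 'New York']]
```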
- I learned about confidence intervals and refreshed my memory of the 68.26-95.44-99.74 rule (the percentages of a normal distribution that lie within 1, 2, and 3 standard deviations of the mean). These are numbers I never want to forget. Here's an example of finding a confidence interval:
You have a sample of 36 people and know that the population standard deviation is 4. Thus the standard deviation of the sample means (the standard error) will be:

\[\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{4}{\sqrt{36}} = \frac{2}{3} \approx 0.667\]
You can then calculate \(\bar{x}\) (the sample mean) and say that \(\mu\), the population mean, lies within \(\bar{x} \pm 2\sigma_{\bar{x}}\) with 95.44% confidence. You can also change the confidence level by looking up a different value of \(z_{\alpha/2}\). For example, for a 95% confidence interval, \(\alpha = 1 - 0.95 = 0.05\), and a z-table gives \(z_{0.05/2} = z_{0.025} = 1.96\). So the confidence interval for the population mean is:

\[\mu = \bar{x} \pm z_{\alpha/2} \frac{\sigma}{\sqrt{n}} = \bar{x} \pm 1.96 \cdot \frac{2}{3} \approx \bar{x} \pm 1.31\]
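To check these numbers without a printed z-table, here is a minimal sketch using Python's standard-library `statistics.NormalDist` (Python 3.8+). It prints the within-1, 2, and 3 standard-deviation probabilities (more precisely 68.27%, 95.45%, and 99.73%; the figures above are what a four-decimal z-table gives), recovers the 1.96 critical value, and builds the 95% interval for the example with \(\sigma = 4\) and \(n = 36\). The helper name `z_confidence_interval` and the sample mean of 50 are hypothetical, since the example doesn't give a sample mean.

```python
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# 68-95-99.7 rule: probability of landing within k standard deviations of the mean.
for k in (1, 2, 3):
    p = std_normal.cdf(k) - std_normal.cdf(-k)
    print(f"within {k} sd: {p:.2%}")  # 68.27%, 95.45%, 99.73%


def z_confidence_interval(x_bar, sigma, n, confidence=0.95):
    """Hypothetical helper: interval for mu when the population sigma is known."""
    se = sigma / sqrt(n)  # standard error of the mean
    alpha = 1 - confidence
    z = std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}: upper-tail area of alpha/2
    return x_bar - z * se, x_bar + z * se


print(round(std_normal.inv_cdf(0.975), 2))  # 1.96, the table value

# sigma = 4 and n = 36 come from the example; x_bar = 50.0 is a made-up sample mean.
low, high = z_confidence_interval(50.0, sigma=4, n=36)
print(round(low, 3), round(high, 3))  # 48.693 51.307
```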
If you don't know the population standard deviation, you can do the same calculation with Student's t-score and the sample standard deviation \(s\). That is to say:

\[\mu = \bar{x} \pm t_{\alpha/2} \frac{s}{\sqrt{n}}\]
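Recall that the t-score is the same idea as the z-score, except that it uses the sample standard deviation \(s\) in place of \(\sigma\), so the standard error is estimated as \(s/\sqrt{n}\):

\[t = \frac{\bar{x} - \mu}{s/\sqrt{n}}\]

The critical value \(t_{\alpha/2}\) is looked up in a t-table with \(n - 1\) degrees of freedom.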
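The t version can be sketched the same way. This assumes a made-up sample of 10 measurements, so there are 9 degrees of freedom. Because Python's standard library has no t distribution, the critical value \(t_{0.025} \approx 2.262\) for 9 degrees of freedom is hard-coded from a t-table (SciPy's `scipy.stats.t.ppf(0.975, 9)` computes the same value).

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical sample of n = 10 measurements (made-up data for illustration).
sample = [12.1, 11.4, 12.8, 13.0, 11.9, 12.5, 12.2, 11.7, 12.9, 12.4]

n = len(sample)
x_bar = mean(sample)
s = stdev(sample)  # sample standard deviation (divides by n - 1)
se = s / sqrt(n)   # estimated standard error of the mean

# t_{alpha/2} for a 95% interval with n - 1 = 9 degrees of freedom, from a t-table.
t_crit = 2.262

low, high = x_bar - t_crit * se, x_bar + t_crit * se
print(f"mean = {x_bar:.3f}, s = {s:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Because the t distribution has heavier tails than the normal, 2.262 is larger than 1.96, so the interval comes out wider, reflecting the extra uncertainty from estimating \(\sigma\) with \(s\).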