What I did

What I learned

The strings in the export are mojibake: utf-8 text that was read as latin_1, so accented characters show up as pairs like Ã©. The object_hook below re-encodes every string as latin_1 and decodes it back as utf-8 while json.load parses the file:

import json
import pandas as pd

def parse_obj(obj):
    # Repair mojibake: re-encode each string as latin_1, then decode it as utf-8.
    for key in obj:
        if isinstance(obj[key], str):
            obj[key] = obj[key].encode('latin_1').decode('utf-8')
        elif isinstance(obj[key], list):
            obj[key] = [x.encode('latin_1').decode('utf-8') if isinstance(x, str) else x
                        for x in obj[key]]
    return obj

# sarah_json1 is the path to the exported JSON file (defined elsewhere).
with open(sarah_json1) as f:
    fixed_json = json.load(f, object_hook=parse_obj)
    df = pd.json_normalize(fixed_json["messages"])
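As a quick sanity check, the round trip recovers the original characters. The sample record below is hypothetical, not from the real export:

# Hypothetical record with 'élise' and 'ça va?' mangled into latin_1 mojibake.
broken = {"sender_name": "Ã©lise", "content": ["Ã§a va?"]}
print(parse_obj(broken))
# {'sender_name': 'élise', 'content': ['ça va?']}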

L1 vs L2 regularization

\[W = (1-\alpha \lambda)W - \alpha\frac{\partial J}{\partial W}\]

where $\alpha$ is the learning rate, $\lambda$ is the regularization hyper-parameter, and $J$ is the cost. At every iteration the weights are pulled toward zero, since you're multiplying them by a number \(<1\). For L2, the cost is:

\[\text{L2 cost} = \Sigma (y_i - \hat{y}_i)^2 + \lambda \Sigma W^2\]

L1 regularization is the same as above, but with an absolute value for the regularization term instead of a square. L1 is known as LASSO (least absolute shrinkage and selection operator) because it shrinks the less important features' coefficients all the way to zero: for small weights, \(|w|\) is a much stiffer penalty than \(w^2\). This makes L1 a good choice when you have a ton of features and suspect only a few of them matter.
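A minimal sketch of the difference on a toy weight vector (numpy only; alpha, lam, and the zeroed-out grad_J are made-up values, and the L1 step is written as the soft-thresholding / proximal update rather than a plain subgradient step):

import numpy as np

alpha, lam = 0.1, 0.5               # learning rate and regularization strength (assumed)
W = np.array([0.03, -0.02, 1.5])    # toy weights: two small, one large
grad_J = np.zeros_like(W)           # pretend the data-fit gradient is zero to isolate the penalty

# L2 / weight decay: every weight is scaled toward zero, but none reach it exactly.
W_l2 = (1 - alpha * lam) * W - alpha * grad_J

# L1 soft-thresholding: weights with |w| < alpha*lam are set exactly to zero.
W_l1 = np.sign(W) * np.maximum(np.abs(W) - alpha * lam, 0.0) - alpha * grad_J

print(W_l2)   # [ 0.0285 -0.019   1.425 ] -- small weights shrink but survive
print(W_l1)   # [ 0.     -0.      1.45  ] -- small weights are zeroed out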

Famous Network Architectures

The Pearson correlation coefficient $r$ measures the strength of the linear relationship between two variables $x$ and $y$:

\[r = \frac{cov(x,y)}{\sigma(x)\sigma(y)} = \frac{\Sigma\left[(x-\bar{x})(y-\bar{y})\right]}{\sqrt{\Sigma(x-\bar{x})^2} \sqrt{\Sigma(y-\bar{y})^2}}\]

Here, the top term is the covariance of $x$ and $y$,

\[cov(x,y) = \frac{\Sigma\left[(x-\bar{x})(y-\bar{y})\right]}{N}\]

and the bottom terms are just the standard deviations of $x$ and of $y$ (the factors of $N$ cancel, leaving the sum-only form on the right):

\[\sigma(x) = \sqrt{\frac{\Sigma(x-\bar{x})^2}{N}}, \hspace{2em} \sigma(y) = \sqrt{\frac{\Sigma(y-\bar{y})^2}{N}}\]
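A quick numpy check of the sum form against np.corrcoef, on made-up data:

import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up data for illustration
y = np.array([2.0, 4.1, 5.9, 8.2, 9.8])

# Pearson r straight from the sum form above.
xc, yc = x - x.mean(), y - y.mean()
r = (xc * yc).sum() / (np.sqrt((xc ** 2).sum()) * np.sqrt((yc ** 2).sum()))

print(r)                        # ≈ 0.999
print(np.corrcoef(x, y)[0, 1])  # matches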

What I will do next