sklearn and from scratch), as well as calculating the Karl Pearson Correlation Coefficient (both with numpy and from scratch).

utf-8, but gets decoded as latin-1, before being loaded as utf-8. To remedy this, I reversed the process: I loaded the JSON, then went through it string by string, encoding each one in latin-1, then decoding it as utf-8 again.

```python
import json
import pandas as pd

def parse_obj(obj):
    # Re-encode every string (and every string inside a list) to undo
    # the bad latin-1 decode.
    for key in obj:
        if isinstance(obj[key], str):
            obj[key] = obj[key].encode('latin_1').decode('utf-8')
        elif isinstance(obj[key], list):
            obj[key] = [x if not isinstance(x, str)
                        else x.encode('latin_1').decode('utf-8')
                        for x in obj[key]]
    return obj

with open(sarah_json1) as f:
    fixed_json = json.load(f, object_hook=parse_obj)
df = pd.json_normalize(fixed_json["messages"])
```
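As a quick sanity check of the round-trip (this snippet is my own illustration, not part of the original pipeline): the utf-8 bytes for U+2019 are `E2 80 99`, and mis-decoding them as latin-1 produces the three-character string below.

```python
# utf-8 bytes E2 80 99 ('\u2019') mis-decoded as latin-1 become these
# three characters; re-encoding as latin-1 and decoding as utf-8
# recovers the original right single quotation mark.
garbled = "â\x80\x99"
fixed = garbled.encode('latin_1').decode('utf-8')
print(fixed == "\u2019")  # → True
```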
u'\u2019' (i.e. right single quotation marks) that show up. I can’t seem to regex-replace them. This is particularly troublesome for this project because the word-cloud package interprets that symbol as cause to split the word in two. Thus, it’ll split a word like they’re into they and re. The STOPWORDS package removes they from the cloud, but obviously re isn’t a word, though it is super common since they’re is a common word. For now I’ve just added all the phonemes that follow apostrophes to the list of stop-words, but I need to fix this issue for the chatbot training data.

\[W \leftarrow (1 - \alpha\lambda)W - \alpha\frac{\partial J}{\partial W}\]

where $\alpha$ is the learning rate, $\lambda$ is the regularization hyper-parameter, and $J$ is the cost. Every iteration the weights are pushed closer to zero since you’re multiplying the weights by a number \(<1\). For L2, we have:
\[\text{L2 cost} = \Sigma (y_i - \hat{y}_i)^2 + \lambda \Sigma W^2\]

L1 regularization is the same as above, but with an absolute value for the regularization term instead of a square:

\[\text{L1 cost} = \Sigma (y_i - \hat{y}_i)^2 + \lambda \Sigma |W|\]

L1 is known as LASSO (least absolute shrinkage and selection operator), because it shrinks the less important features’ coefficients to zero. This is because for small values \(|w|\) is a much stiffer penalty than \(w^2\). Thus, L1 is a good choice when you have a ton of features.
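A minimal from-scratch sketch of that shrinkage behavior (all names here are my own, and the data gradient is set to zero so only the penalty acts): one weight is updated with the L2 gradient, the other with the L1 subgradient, clipped so the L1 step can’t overshoot zero.

```python
# One weight, no data gradient: only the regularization term acts.
alpha, lam = 0.1, 1.0
w_l2, w_l1 = 0.05, 0.05

for _ in range(100):
    w_l2 -= alpha * 2 * lam * w_l2                 # gradient of lam * w^2
    step = alpha * lam * (1 if w_l1 > 0 else -1)   # subgradient of lam * |w|
    w_l1 = 0.0 if abs(step) >= abs(w_l1) else w_l1 - step

print(w_l1)   # exactly 0.0 — L1 zeroes the weight
print(w_l2)   # tiny, but never exactly zero
```

The constant-size L1 step is what kills small weights outright, while the L2 step shrinks proportionally and so never quite reaches zero.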

A 5 × 5 kernel has the same receptive field as two stacked 3 × 3 kernels (no padding and stride of 1 on an input of 5 × 5 × 1). Likewise, one 7 × 7 can be replaced by three 3 × 3 kernels, and one 11 × 11 by five 3 × 3 kernels.
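A quick check of that equivalence with a naive valid-mode convolution (the helper below is my own, not a standard API): two 3 × 3 passes over a 5 × 5 input collapse it to 1 × 1, exactly like a single 5 × 5 kernel.

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive stride-1, no-padding 2-D convolution."""
    kh, kw = k.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

x = np.arange(25, dtype=float).reshape(5, 5)
one_5x5 = conv2d_valid(x, np.ones((5, 5)))
two_3x3 = conv2d_valid(conv2d_valid(x, np.ones((3, 3))), np.ones((3, 3)))
print(one_5x5.shape, two_3x3.shape)  # → (1, 1) (1, 1)
```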

Here, the top term is the covariance of x and y
\[cov(x,y) = \frac{\Sigma\left[(x-\bar{x})(y-\bar{y})\right]}{N}\]

and the bottom terms are just the standard deviations of x and of y.
\[\sigma(x) = \sqrt{\frac{\Sigma(x-\bar{x})^2}{N}}, \hspace{2em} \sigma(y) = \sqrt{\frac{\Sigma(y-\bar{y})^2}{N}}\]
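Putting those pieces together from scratch and checking against numpy (the function name is mine; both use the population, divide-by-N, form, so the Ns cancel):

```python
import numpy as np

def pearson(x, y):
    # r = cov(x, y) / (sigma(x) * sigma(y)), population (divide-by-N) form
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return cov / (x.std() * y.std())

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 4.0, 6.0, 8.0]
print(pearson(x, y))             # ≈ 1.0 for perfectly linear data
print(np.corrcoef(x, y)[0, 1])   # numpy agrees
```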