sklearn and from scratch), as well as calculating the Karl Pearson Correlation Coefficient (both with numpy and from scratch).

The raw data is utf-8, but it gets decoded as latin-1 before being loaded as utf-8. To remedy this, I reversed the process: I loaded the JSON, then went through it line by line, encoding it in latin-1 and then decoding it as utf-8 again.

```python
import json

import pandas as pd


def parse_obj(obj):
    """Re-encode every string in a decoded JSON object to undo the mojibake."""
    for key in obj:
        if isinstance(obj[key], str):
            obj[key] = obj[key].encode('latin_1').decode('utf-8')
        elif isinstance(obj[key], list):
            obj[key] = [x.encode('latin_1').decode('utf-8') if isinstance(x, str) else x
                        for x in obj[key]]
    return obj


with open(sarah_json1) as f:
    fixed_json = json.load(f, object_hook=parse_obj)
df = pd.json_normalize(fixed_json["messages"])
```
There are also u'\u2019' characters (i.e. right single quotation marks) that show up, and I can’t seem to regex-replace them. This is particularly troublesome for this project because the word-cloud package interprets that symbol as cause to split the word in two. Thus, it’ll split a word like "they're" into "they" and "re". The STOPWORDS set from the word-cloud package removes "they" from the cloud, but obviously "re" isn’t a word; it is super common, though, since "they're" is a common word. For now I’ve just added all the phonemes that follow apostrophes to the list of stop-words, but I need to fix this issue for the chatbot training data.

In gradient descent with L2 regularization, the weight update is

\[W \leftarrow W - \alpha\left(\frac{\partial J}{\partial W} + \lambda W\right) = W(1 - \alpha\lambda) - \alpha\frac{\partial J}{\partial W}\]

where $\alpha$ is the learning rate, $\lambda$ is the regularization hyper-parameter, and $J$ is the cost. Every iteration the weights are pushed closer to zero, since you’re multiplying the weights by a number \(<1\). For L2, we have:
\[\text{L2 cost} = \Sigma (y_i - \hat{y}_i)^2 + \lambda \Sigma W^2\]

L1 regularization is the same as above, but with an absolute value for the regularization term instead of a square. L1 is known as LASSO (least absolute shrinkage and selection operator) because it shrinks the less important features’ coefficients to zero: for small values, \(\lvert w \rvert\) is a much stiffer penalty than \(w^2\). Thus, L1 is a good choice when you have a ton of features.
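The shrinkage-to-zero behavior is easy to see with sklearn's `Lasso` and `Ridge` on made-up toy data (the data here is purely illustrative, not from this project): fit both on a design matrix where only two of ten features matter, then count coefficients that are exactly zero.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical toy data: 10 features, but only the first 2 actually matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

lasso = Lasso(alpha=0.1).fit(X, y)  # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)  # L2 penalty

# LASSO drives the uninformative coefficients exactly to zero;
# ridge only shrinks them toward zero.
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
```

On data like this, LASSO typically zeroes all eight noise coefficients while ridge zeroes none of them, which is exactly the feature-selection behavior described above.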
A 5 x 5 kernel has the same receptive field as two stacked 3 x 3 kernels (no padding and a stride of 1 on an input of 5 x 5 x 1). Likewise, one 7 x 7 can be replaced by three 3 x 3 kernels, or one 11 x 11 by five 3 x 3 kernels.
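These equivalences are just the no-padding output-size formula \((n - k)/s + 1\) applied once per stacked kernel. A small sketch (the `output_size` helper is my own illustration, not project code) verifies each pairing collapses the same input down to 1 x 1:

```python
def output_size(n, kernels, stride=1):
    """Spatial output size after applying each kernel in turn (no padding)."""
    for k in kernels:
        n = (n - k) // stride + 1
    return n

# One 5 x 5 kernel vs two stacked 3 x 3 kernels on a 5 x 5 input:
assert output_size(5, [5]) == output_size(5, [3, 3]) == 1
# One 7 x 7 vs three 3 x 3 kernels on a 7 x 7 input:
assert output_size(7, [7]) == output_size(7, [3, 3, 3]) == 1
# One 11 x 11 vs five 3 x 3 kernels on an 11 x 11 input:
assert output_size(11, [11]) == output_size(11, [3] * 5) == 1
```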
Here, the top term is the covariance of x and y:

\[cov(x,y) = \frac{\Sigma\left[(x-\bar{x})(y-\bar{y})\right]}{N}\]

and the bottom terms are just the standard deviations of x and of y:
\[\sigma(x) = \sqrt{\frac{\Sigma(x-\bar{x})^2}{N}}, \hspace{2em} \sigma(y) = \sqrt{\frac{\Sigma(y-\bar{y})^2}{N}}\]
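A minimal from-scratch sketch using exactly these two formulas, checked against numpy's `np.corrcoef` (the sample arrays here are made up for illustration):

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation built from the covariance / std-dev formulas above."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance (divide by N)
    return cov / (x.std() * y.std())                # np.std also divides by N, so factors cancel

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
assert np.isclose(pearson(x, y), np.corrcoef(x, y)[0, 1])
```

Note that it doesn't matter whether you use the population (divide by \(N\)) or sample (divide by \(N-1\)) versions, as long as you're consistent: the correction factors cancel in the ratio.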