SQL, precision, recall, and fun with NumPy

What I did

What I learned

\[\text{Precision} = \frac{\text{\# of True Positives}}{\text{\# of True Positives + \# of False Positives}} = \frac{\text{True Positives}}{\text{Predicted Positives}}\]


\[\text{Recall} = \frac{\text{\# of True Positives}}{\text{\# of True Positives + \# of False Negatives}} = \frac{\text{True Positives}}{\text{Actual Positives}}\]

For example, if a classifier labels 20 images as cats, only 10 of which really are cats, but it misses none of the 10 actual cats:

\[\text{Precision} = \frac{\text{10 cats}}{10 + 10} = 0.5 \qquad \text{Recall} = \frac{\text{10 cats}}{10 + 0} = 1.0\]

The tradeoff is that the higher your recall, the more irrelevant results you sweep in, which drags precision down. High precision is important for things like search results, where you want only relevant results. JSTOR, for example, has shitty precision and high recall unless you really dig into the advanced search features.
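The cat example above can be sketched in a few lines of Python; the counts are made up purely for illustration:

```python
# Toy sketch: precision and recall from raw confusion counts.
def precision(tp, fp):
  # fraction of predicted positives that were right
  return tp / (tp + fp)

def recall(tp, fn):
  # fraction of actual positives that were found
  return tp / (tp + fn)

# 10 true positives, 10 false positives, 0 false negatives
print(precision(10, 10))  # 0.5
print(recall(10, 0))      # 1.0
```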

import numpy as np

vec1 = [1,2,3,4,5]
vec2 = [6,7,8,9,10]

def euclidean_dist(vec1, vec2):
  return np.linalg.norm(np.array(vec1) - np.array(vec2))
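Run on the sample vectors above, each component differs by 5, so the distance should be \(\sqrt{5 \cdot 5^2} = \sqrt{125}\); a quick sanity check (repeating the function so the snippet runs on its own):

```python
import numpy as np

def euclidean_dist(vec1, vec2):
  # norm of the elementwise difference
  return np.linalg.norm(np.array(vec1) - np.array(vec2))

# all 5 components differ by 5, so the distance is sqrt(125)
print(euclidean_dist([1, 2, 3, 4, 5], [6, 7, 8, 9, 10]))  # ~11.1803
```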
vec1 = [1,2,3,4]
vec2 = [5,6,7,8]

# 1st way (note: this formula gives cosine *similarity*; cosine distance is 1 minus this)
def cosine_dist(vec1, vec2):
  vec1 = np.array(vec1)
  vec2 = np.array(vec2)
  # divide by the product of the norms, not the norm of the elementwise product
  return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

# 2nd way
def cosine_dist(vec1, vec2):
  vec1 = np.array(vec1)
  vec2 = np.array(vec2)
  # .T is a no-op on 1-D arrays, so @ alone is enough
  return (vec1 @ vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))
vec1 = [1,2,3,4]
vec2 = [5,6,7,8]
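A quick sanity check for the cosine formula: dividing by the product of the two norms, the result must land in [-1, 1] (the name `cosine_sim` here is mine, since the formula gives similarity):

```python
import numpy as np

def cosine_sim(vec1, vec2):
  vec1 = np.array(vec1)
  vec2 = np.array(vec2)
  # dot product over the product of the norms
  return np.dot(vec1, vec2) / (np.linalg.norm(vec1) * np.linalg.norm(vec2))

sim = cosine_sim([1, 2, 3, 4], [5, 6, 7, 8])
print(sim)             # ~0.9689 -- nearly parallel vectors
print(-1 <= sim <= 1)  # True
```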

def truncate(vec):
  vec = np.array(vec)
  return np.where(vec > 1, 1, vec).tolist()
  # without the tolist it returns a numpy array, not a list

# this is the equivalent
def truncate(vec):
  return [1 if val > 1 else val for val in vec]
vec = [1,2,3,4]
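NumPy also has a built-in for this kind of capping; a sketch of the same truncation using np.clip, where None means "no lower bound":

```python
import numpy as np

def truncate(vec):
  # clip every value to at most 1; None leaves the lower end unbounded
  return np.clip(np.array(vec), None, 1).tolist()

print(truncate([1, 2, 3, 4]))  # [1, 1, 1, 1]
```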

def any_nans(vec):
  return np.isnan(np.array(vec)).any()

# or also, less elegantly
def any_nans(vec):
  for value in np.array(vec).ravel():
    if np.isnan(value):
      return True
  return False  # without this, a clean vector returns None

def print_2x(*args):
  # a bare generator expression inside print() would print a generator object
  print([2*num for num in args])

print_2x(1,2,5,5,6,7)

What I will do next