Bayesian analysis of Amazon reviews in R and the most important functions in Pandas

What I did

What I Learned

# Bayesian Analysis of Amazon Reviews

# Cable A, $19.99
# 222 reviews, 4.4 stars, 74% positive (4 & 5 stars)
# Cable B, $15.99
# 638 reviews, 4.2 stars, 79% positive (4 & 5 stars)
# Cable C, $15.99
# 293 reviews, 4.5 stars, 88% positive (4 & 5 stars)
# Posterior for each cable's positive-review rate: Beta(1 + positives, 1 + negatives),
# i.e. a uniform Beta(1, 1) prior updated with the review counts.
theta = seq(0, 1, .01)
plot(theta, dbeta(theta, 1 + 222*.74, 1 + 222*.26), col = "red", type = "l", ylim = c(0, 25),
    main = "Amazon Ratings",
    xlab = "Positive sentiment",
    ylab = "",
    sub = "for cables") # A
lines(theta, dbeta(theta, 1 + 638*.79, 1 + 638*.21), col = "blue") # B
lines(theta, dbeta(theta, 1 + 293*.88, 1 + 293*.12), col = "green") # C
legend("topleft", legend = c("A", "B", "C"),
       col = c("red", "blue", "green"), lwd = 1)
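The same posteriors can be summarized numerically instead of being read off the plot. Below is a minimal Python sketch using only the standard library, assuming the same uniform Beta(1, 1) prior and the rounded review shares listed above. The names `cables`, `posterior_params`, `posterior_mean`, and `prob_better` are illustrative and not part of the original analysis.

```python
import random

# (number of reviews, share of positive reviews) copied from the listings above
cables = {"A": (222, 0.74), "B": (638, 0.79), "C": (293, 0.88)}

def posterior_params(n, positive_share):
    """Beta posterior parameters under a uniform Beta(1, 1) prior."""
    return 1 + n * positive_share, 1 + n * (1 - positive_share)

def posterior_mean(n, positive_share):
    """Mean of the Beta posterior: a / (a + b)."""
    a, b = posterior_params(n, positive_share)
    return a / (a + b)

def prob_better(first, second, draws=20_000, seed=0):
    """Monte Carlo estimate of P(theta_first > theta_second)."""
    rng = random.Random(seed)
    a1, b1 = posterior_params(*cables[first])
    a2, b2 = posterior_params(*cables[second])
    wins = sum(
        rng.betavariate(a1, b1) > rng.betavariate(a2, b2)
        for _ in range(draws)
    )
    return wins / draws

for name, (n, share) in cables.items():
    print(name, round(posterior_mean(n, share), 3))
print("P(C > B) is roughly", prob_better("C", "B"))
```

Comparing posterior draws pairwise answers the question the plot only hints at: how likely it is that one cable's true positive rate exceeds another's, given how many reviews each has.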

import pandas as pd
df = pd.DataFrame({'Column 1': [50, 21], 'Column 2': [131, 2]})
df.iloc[0]    # row 0 as a Series (all columns)
df.iloc[:, 0] # the entire first column, 'Column 1'
df.iloc[:2]   # rows 0 and 1 of all columns; the end of an iloc slice is excluded (compare with loc)
import pandas as pd
df = pd.DataFrame({'Column 1': [10, 20], 'Column 2': [30, 40]})
df.loc[0, 'Column 1']                # the value at row label 0 in 'Column 1'
df.loc[:, ['Column 1', 'Column 2']]  # all rows for the two columns
df.loc[:1]  # rows 0 and 1 of all columns; the end label of a loc slice is included (compare with iloc)
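To make the slicing difference concrete, here is a small sketch on a toy frame (the data and the names `by_position`, `by_label`, and `df2` are made up for illustration). `iloc` slices by integer position and excludes the end, while `loc` slices by label and includes it.

```python
import pandas as pd

# Toy frame with string labels so positions and labels are clearly different things
df = pd.DataFrame(
    {"Column 1": [10, 20, 30], "Column 2": [40, 50, 60]},
    index=["a", "b", "c"],
)

by_position = df.iloc[:2]  # positions 0 and 1; position 2 is excluded
by_label = df.loc[:"b"]    # labels "a" through "b"; the end label is included

print(by_position.equals(by_label))

# With the default 0, 1, 2 index the same-looking slice gives different row counts
df2 = df.reset_index(drop=True)
print(len(df2.iloc[:1]), len(df2.loc[:1]))
```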
reviews.loc[reviews.country == 'Italy']

# Here's a useful one I forgot about
reviews.loc[reviews.country.isin(['Italy', 'France'])]
reviews.loc[reviews.price.notnull()]
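Each of the filters above works by passing a boolean Series (a mask) to `loc`. The sketch below uses a tiny hypothetical stand-in for `reviews`; only the column names `country` and `price` come from the snippets above, and the values are invented.

```python
import pandas as pd

# Hypothetical stand-in for the reviews data
reviews = pd.DataFrame({
    "country": ["Italy", "France", "US", "Italy"],
    "price": [15.0, None, 30.0, None],
})

is_italy = reviews.country == "Italy"                    # one True/False per row
in_italy_or_france = reviews.country.isin(["Italy", "France"])
has_price = reviews.price.notnull()

# Masks combine element-wise with & (and), | (or) and ~ (not)
italian_with_price = reviews.loc[is_italy & has_price]
print(italian_with_price)
```

Naming the masks first keeps long conditions readable and lets the same mask be reused across several selections.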
# col1 is a placeholder for any column, e.g. df.reviews or df['reviews']; I use it to keep the examples general
df.describe()
df.col1.mean()
df.col1.unique()
df.col1.value_counts()

Mapping

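While Python is not a purely functional language, it has adopted a few functional features, such as lambda functions and map(). Pandas offers analogous methods: Series.map() transforms a column one value at a time, and DataFrame.apply() calls a function on each row or column.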

review_points_mean = reviews.points.mean()
reviews.points.map(lambda p: p - review_points_mean)
def remean_points(row):
    row.points = row.points - review_points_mean
    return row

reviews.apply(remean_points, axis='columns')
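As a quick check of what these calls return, here is a sketch on a toy `reviews` frame (the data is invented, and `centered`, `row_wise`, and `vectorized` are illustrative names). It compares `map` and a row-wise `apply` against plain vectorized subtraction, which is usually the simplest option for arithmetic like this.

```python
import pandas as pd

reviews = pd.DataFrame({"points": [80, 90, 100]})  # hypothetical toy data
review_points_mean = reviews.points.mean()

# Series.map: one value in, one value out, returned as a new Series
centered = reviews.points.map(lambda p: p - review_points_mean)

# DataFrame.apply with axis="columns": the function receives one row at a time
row_wise = reviews.apply(lambda row: row.points - review_points_mean, axis="columns")

# Plain vectorized arithmetic gives the same numbers without a Python-level loop
vectorized = reviews.points - review_points_mean

print(centered.tolist(), row_wise.tolist(), vectorized.tolist())
print(reviews.points.tolist())  # the original column is left as it was
```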
# idxmax() returns an index label, so use loc (label-based) rather than iloc (position-based)
bargain_wine = reviews.loc[(reviews.points / reviews.price).idxmax(), 'title']
# or in 2 lines:
bargain_index = (reviews.points / reviews.price).idxmax()
bargain_name = reviews.loc[bargain_index, 'title']
# remember that .loc takes the row label (bargain_index) and the column label ('title') and returns that single value
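The label-versus-position distinction matters as soon as the index is not the default 0, 1, 2, .... The sketch below uses an invented three-row frame with a deliberately non-default index; the wine names and numbers are made up, and `bargain_position` is an illustrative name.

```python
import pandas as pd

reviews = pd.DataFrame(
    {
        "title": ["Wine X", "Wine Y", "Wine Z"],
        "points": [90, 88, 85],
        "price": [45.0, 11.0, 17.0],
    },
    index=[10, 20, 30],  # deliberately not 0, 1, 2
)

ratio = reviews.points / reviews.price
bargain_index = ratio.idxmax()     # index label of the best points-per-price ratio
bargain_position = ratio.argmax()  # integer position of the same row

by_label = reviews.loc[bargain_index, "title"]
by_position = reviews.iloc[bargain_position]["title"]
print(by_label, by_position)

# reviews.iloc[bargain_index] would raise IndexError here, because the label
# is not a valid position in a three-row frame
```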

What I Will Do Next