Bayesian analysis of Amazon reviews in R and the most important functions in Pandas
R
# Bayesian Analysis of Amazon Reviews
# Cable A, $19.99
# 222 reviews, 4.4 stars, 74% positive (4 & 5 stars)
# Cable B, $15.99
# 638 reviews, 4.2 stars, 79% positive (4 & 5 stars)
# Cable C, $15.99
# 293 reviews, 4.5 stars, 88% positive (4 & 5 stars)
theta = seq(0, 1, .01)
plot(theta, dbeta(theta, 1 + 222*.74, 1 + 222*.26), col = "red", type="l", ylim = c(0,25),
main="Amazon Ratings",
xlab="Positive sentiment",
ylab="",
sub="for cables") # A
lines(theta, dbeta(theta, 1 + 638*.79, 1 + 638*.21), col = "blue", type="l") # B
lines(theta, dbeta(theta, 1 + 293*.88, 1 + 293*.12), col = "green") # C
legend("topleft", legend = c("A", "B", "C"),
       col = c("red", "blue", "green"), lwd = 1)
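The curves above are Beta posteriors under a uniform Beta(1, 1) prior, with the positive and negative counts recovered from the review totals and percentages. As a rough sketch of the same calculation in Python (standard library only), the snippet below computes each posterior's parameters and mean, and estimates by simulation how likely it is that one cable's true positive rate exceeds another's. The helper names `beta_posterior` and `prob_greater`, the number of draws, and the seed are assumptions made for illustration.

```python
import random

# (number of reviews, share positive) for cables A, B and C, as listed above
cables = {"A": (222, 0.74), "B": (638, 0.79), "C": (293, 0.88)}


def beta_posterior(n_reviews, share_positive):
    """Return (alpha, beta) of the Beta posterior under a uniform Beta(1, 1) prior."""
    positives = n_reviews * share_positive
    negatives = n_reviews * (1 - share_positive)
    return 1 + positives, 1 + negatives


posteriors = {name: beta_posterior(n, p) for name, (n, p) in cables.items()}

# The mean of a Beta(alpha, beta) distribution is alpha / (alpha + beta)
posterior_means = {name: a / (a + b) for name, (a, b) in posteriors.items()}


def prob_greater(first, second, draws=20_000, seed=0):
    """Monte Carlo estimate of P(theta_first > theta_second)."""
    rng = random.Random(seed)
    a1, b1 = posteriors[first]
    a2, b2 = posteriors[second]
    wins = sum(
        rng.betavariate(a1, b1) > rng.betavariate(a2, b2) for _ in range(draws)
    )
    return wins / draws


for name, mean in posterior_means.items():
    print(name, round(mean, 3))
print("P(C > B) is roughly", prob_greater("C", "B"))
```

The posterior mean sits very close to the raw positive share because the prior adds only one pseudo-review on each side; the width of each curve is what the review count changes.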

Pandas uses loc and iloc for selection. Use iloc for position-based (integer index) selections in the row, column format. iloc follows standard Python slicing, so the first index is included and the last is excluded. For example:
import pandas as pd
df = pd.DataFrame({'Column 1': [50, 21], 'Column 2': [131, 2]})
df.iloc[0] # gives all of row 0 (i.e. Column 1 = 50, Column 2 = 131)
df.iloc[:, 0] # gives the entire first column, 'Column 1'
df.iloc[:2] # gives rows 0 and 1 of all the columns. Note how this differs from loc
Use loc for label-based selections. loc indexes inclusively. Remember this by thinking about how loc should be inclusive because you typically pass it strings. For example:
import pandas as pd
df = pd.DataFrame({'Column 1': [10, 20], 'Column 2': [30, 40]})
df.loc[0, 'Column 1'] # gives the value at row label 0 in 'Column 1'
df.loc[:, ['Column 1', 'Column 2']] # gives all rows for the 2 columns
df.loc[:1] # gives rows 0 and 1 of all the columns. Note how this differs from iloc
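To make the inclusive versus exclusive difference concrete, here is a small sketch that runs the same slice through both accessors. The three-row frame and the string-labelled copy are made up for illustration.

```python
import pandas as pd

# Hypothetical frame with the default integer index 0, 1, 2
df = pd.DataFrame({"Column 1": [10, 20, 30], "Column 2": [40, 50, 60]})

by_position = df.iloc[:1]  # positions up to, but not including, 1
by_label = df.loc[:1]      # labels up to and including 1

# The same frame with string row labels, where only loc can slice by name
named = df.set_index(pd.Index(["a", "b", "c"]))
named_slice = named.loc["a":"b"]  # both end labels are included

print(len(by_position), len(by_label), len(named_slice))
```

With a default integer index the two accessors look interchangeable, which is exactly when the off-by-one difference is easiest to miss.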
To select the reviews where the entry in the column country is “Italy”:
reviews.loc[reviews.country == 'Italy']
# Here's a useful one I forgot about
reviews.loc[reviews.country.isin(['Italy', 'France'])] # rows whose country is Italy or France
reviews.loc[reviews.price.notnull()] # rows where price is not missing
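Each condition passed to loc above is just a boolean Series, so conditions can be stored and combined. The sketch below shows this on a tiny stand-in for the reviews data; the rows are invented purely for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the reviews data; the values are made up
reviews = pd.DataFrame(
    {
        "country": ["Italy", "France", "US", "Italy"],
        "price": [20.0, None, 15.0, 35.0],
        "points": [88, 90, 85, 92],
    }
)

is_italian = reviews.country == "Italy"  # boolean Series, one entry per row
has_price = reviews.price.notnull()

# Combine masks with & (and) or | (or); inline comparisons need parentheses
italian_with_price = reviews.loc[is_italian & has_price]
italy_or_france = reviews.loc[reviews.country.isin(["Italy", "France"])]

print(italian_with_price)
print(italy_or_france)
```

Python's `and` and `or` do not work element-wise on a Series, which is why the bitwise operators are used here.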
# col1 stands for any column, e.g. df.reviews or df['reviews']; I'm just trying to keep this general
df.describe()
df.col1.mean()
df.col1.unique()
df.col1.value_counts()
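A quick sketch of what these four summary calls return, on a made-up frame with one text column and one numeric column (the column names and values are assumptions for illustration):

```python
import pandas as pd

# Hypothetical data: col1 holds categories, col2 holds numbers
df = pd.DataFrame({"col1": ["a", "b", "a", "c", "a"], "col2": [1, 2, 3, 4, 5]})

summary = df.describe()          # count, mean, std, min, quartiles, max of numeric columns
mean_col2 = df.col2.mean()       # arithmetic mean of a numeric column
distinct = df.col1.unique()      # distinct values, in order of first appearance
counts = df.col1.value_counts()  # occurrences per value, most frequent first

print(summary)
print(mean_col2)
print(distinct)
print(counts)
```

On a frame that mixes text and numbers, describe() summarises only the numeric columns by default.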
While Python is not a functional language, it has adopted a few functional features, such as the lambda function and map().
review_points_mean = reviews.points.mean()
reviews.points.map(lambda p: p - review_points_mean)
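To transform each row of a DataFrame with a custom function, use the apply function to enact it.
def remean_points(row):
    row = row.copy()  # avoid mutating the row that apply passes in
    row.points = row.points - review_points_mean
    return row
reviews.apply(remean_points, axis='columns')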
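For a simple shift like this one, map and apply give the same numbers as plain column arithmetic. The sketch below compares the three on a tiny invented reviews frame. The data is hypothetical, and the row is copied inside the function because pandas does not support mutating the object that apply passes in.

```python
import pandas as pd

# Hypothetical stand-in for the reviews data
reviews = pd.DataFrame({"points": [80, 90, 100], "price": [10.0, 20.0, 40.0]})
review_points_mean = reviews.points.mean()

# 1. map: one value in, one value out, returns a new Series
with_map = reviews.points.map(lambda p: p - review_points_mean)


# 2. apply: one row in, one row out, returns a new DataFrame
def remean_points(row):
    row = row.copy()  # leave the row handed over by apply untouched
    row["points"] = row["points"] - review_points_mean
    return row


with_apply = reviews.apply(remean_points, axis="columns")

# 3. Vectorized arithmetic: same result without a Python-level loop
vectorized = reviews.points - review_points_mean

print(with_map.tolist(), with_apply["points"].tolist(), vectorized.tolist())
```

None of the three changes reviews itself; each returns a new object, so assign the result if you want to keep it.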
idxmax returns the index label of the maximum value.
bargain_wine = reviews.loc[(reviews.points/reviews.price).idxmax(), 'title']
# or in 2 lines:
bargain_index = (reviews.points/reviews.price).idxmax()
bargain_name = reviews.loc[bargain_index, 'title']
# remember that .loc gives the row of bargain_index and the column (i.e. value) 'title'
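Because idxmax returns an index label rather than a position, loc is the matching accessor. The sketch below uses a made-up frame whose index is deliberately not 0, 1, 2 to show the lookup; the wine names and numbers are invented.

```python
import pandas as pd

# Hypothetical data with a non-default index so labels and positions differ
reviews = pd.DataFrame(
    {
        "title": ["Wine X", "Wine Y", "Wine Z"],
        "points": [90, 88, 95],
        "price": [30.0, 11.0, 50.0],
    },
    index=[10, 20, 30],
)

ratio = reviews.points / reviews.price  # points per unit of price
bargain_index = ratio.idxmax()          # index label of the largest ratio
bargain_name = reviews.loc[bargain_index, "title"]

print(bargain_index, bargain_name)
```

With the default integer index the label and the position happen to coincide, so iloc would also work there, but on a frame like this one the label does not correspond to any valid position.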