
R equivalencies in Python.

What I did

  • I finished the Bayesian stats course on Saturday, but I wanted to redo the final exercises in Python instead of R.
  • I worked on some multiple linear regression and some more traditional statistics in Python. Also matplotlib.

What I learned

  • So, I had already learned about p-values, but I hadn’t seen one in the context of a linear regression. It struck me as odd, and I needed to look up what it was actually referring to.
from scipy import stats

# p_value here tests the slope, not the overall fit
slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)

According to the documentation, p_value is the “Two-sided p-value for a hypothesis test whose null hypothesis is that the slope is zero, using Wald Test with t-distribution of the test statistic.” Essentially: what is the probability of obtaining a slope at least as extreme as the one measured, assuming the true slope is zero? For a two-tailed test, with the test statistic following a t-distribution with n − 2 degrees of freedom, the p-value is:

\[\text{p-value} = 2P(T > |t_0|) \\ \text{where } t_0 = \frac{\hat{\beta}_1}{\text{SE}(\hat{\beta}_1)}\]
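As a sanity check, here is a quick sketch (with made-up data) that recomputes that two-sided p-value by hand from the t statistic and compares it to what linregress reports:

import numpy as np
from scipy import stats

# made-up sample data for illustration
rng = np.random.default_rng(0)
x = np.arange(20, dtype=float)
y = 2.0 * x + rng.normal(scale=3.0, size=20)

result = stats.linregress(x, y)

# t statistic: slope over its standard error, with n - 2 degrees
# of freedom (two parameters estimated: slope and intercept)
t0 = result.slope / result.stderr
p_manual = 2 * stats.t.sf(abs(t0), len(x) - 2)

print(p_manual, result.pvalue)  # the two values should agree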
  • A good Python equivalent to regression in R is:
import statsmodels.formula.api as sfm

# regress accuracy onto avg_dist, R-formula style
mod = sfm.ols(formula='accuracy ~ avg_dist', data=df)
res = mod.fit()
print(res.summary())
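And if you want individual statistics the way you would pull them out of a fitted model in R, the results object exposes them as attributes; a short sketch:

print(res.params)      # coefficients, like coef(fit) in R
print(res.pvalues)     # per-coefficient p-values
print(res.rsquared)    # R-squared
print(res.conf_int())  # 95% confidence intervals, like confint(fit) in R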

What I will do next

  • I should redo the last section of that stats course in R. Since most jobs require Python or R, I figured I should just know how to do everything in Python, but there are definitely some things that are better in R, such as getting statistical information out of models and plotting.
  • Practice manipulating dataframes with pandas. It’s been a while since I learned them, and I keep having to look up basic things. I will find a W3Schools or Kaggle walk-through for pandas to refresh myself; a quick sketch of the kinds of basics I mean is below.
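A sketch of the sort of basic operations I keep looking up (the dataframe and column names here are made up for illustration):

import pandas as pd

# made-up dataframe for illustration
df = pd.DataFrame({'accuracy': [0.9, 0.7, 0.8, 0.6],
                   'avg_dist': [1.2, 3.4, 2.1, 4.0],
                   'group':    ['a', 'a', 'b', 'b']})

high = df[df['accuracy'] > 0.75]                # boolean filtering
means = df.groupby('group')['accuracy'].mean()  # grouped aggregation
df = df.sort_values('avg_dist')                 # sorting by a column
df['ratio'] = df['accuracy'] / df['avg_dist']   # derived column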
