1 minute read

Examples of using pandas .pipe()

What I did

  • Monday: Made a nice PowerPoint of my issues with the regression skew-norm. Worked on my \(g(r)\) script.
  • Tuesday: Had a good meeting where I realized I was calculating MSD incorrectly. I worked on some scripts to fix these issues and finished my \(g(r)\) script.
  • Wednesday: Worked on fixing the MSD values via the script fix-cipher2.ipynb.
  • Thursday: I completed tasks one and two (as explained in the Wednesday What I learned section).
  • Friday: Completed task three.

What I learned

  • Monday: I’m really into using .pipe() and chaining commands in pandas now (there’s a quick sketch after this list).
  • Tuesday: XANES is not sensitive to the whole 516-atom structure, so I should only calculate the MSD of the 13 nearest neighbors. Also, \(MSD \neq shift\) since I changed my methodology, so I have to recalculate all those MSD values as well.
  • Wednesday: There are essentially three tasks I need to do: one is a script to fix the MSD values in the folders of the real disordered structures, one is a script to do the same for the isotropic structures, and then I need to modify the multi-moment gaussianator script (and its variants).
  • Thursday: I wrote a nice .query() call. Note that in .query(), outside variables need an @ before them, and column names that contain a space need backticks around them.
  • Friday: Learned about SHAP values.
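
Since the title promises examples, here is a minimal sketch of the kind of .pipe() chain and .query() call from the Monday and Thursday notes. The DataFrame, column names, and helper functions are all made up for illustration:

```python
import pandas as pd

# hypothetical spectra table, just for illustration
df = pd.DataFrame({
    "structure id": [1, 1, 2, 3],
    "MSD": [0.12, 0.12, 0.34, 0.27],
    "shift": [0.50, 0.50, 1.10, 0.90],
})

def drop_duplicate_spectra(frame):
    # keep one row per structure
    return frame.drop_duplicates(subset="structure id")

def add_normalized_msd(frame):
    # .assign returns a new frame, which keeps the chain tidy
    return frame.assign(msd_norm=frame["MSD"] / frame["MSD"].max())

msd_cutoff = 0.2  # an outside variable, referenced with @ inside .query()

result = (
    df.pipe(drop_duplicate_spectra)
      .pipe(add_normalized_msd)
      # backticks around column names containing spaces, @ for outside variables
      .query("`structure id` > 1 and MSD >= @msd_cutoff")
)
print(result)
```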

What I will do next

  • Investigate why some spectra still have duplicates and what to do about them.

Random Aside

I was doing some interview practice and typed these up while talking about beta coefficients. Figured I might as well leave them here. The simple linear regression equation is \(y=\beta x + \beta_0\), where:

\[\beta = \frac{S_{xy}}{S_{xx}} \\ \beta_0 = \bar{y} - \beta\bar{x} \\ S_{xy} = \sum(x-\bar{x})(y-\bar{y}) \\ S_{xx} = \sum(x-\bar{x})(x-\bar{x})\]
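A small numpy sketch of those formulas on made-up numbers, just as a sanity check:

```python
import numpy as np

# made-up data purely to check the formulas
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

S_xy = np.sum((x - x.mean()) * (y - y.mean()))
S_xx = np.sum((x - x.mean()) ** 2)

beta = S_xy / S_xx                    # slope
beta_0 = y.mean() - beta * x.mean()   # intercept

print(beta, beta_0)  # should match np.polyfit(x, y, 1)
```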

The multiple regression equation for OLS is: \(\beta = (X^T X)^{-1}X^T y\)

This would work for simple linear regression, too, and is probably how you would actually want to calculate it in code, because of the optimizations processors have for vectorized operations.
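A sketch of how that looks in numpy. Solving the normal equations with np.linalg.solve avoids forming the inverse explicitly but gives the same \(\beta\); the data here is synthetic:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
# design matrix: intercept column plus two regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([1.0, 2.0, -0.5])
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# normal equations: beta = (X^T X)^{-1} X^T y
# solve() is numerically safer than np.linalg.inv(X.T @ X) @ X.T @ y
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)
```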

Note that \(X\), the design matrix or matrix of regressors, is assumed to be full rank. This means that the columns (the regressors) are linearly independent, which in turn means \(X^T X\) is invertible. You can test that it’s full rank by making sure the determinant of \(X^T X\) is non-zero.
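
A quick numpy check along these lines (the determinant check applies to \(X^T X\), since \(X\) itself is generally not square); the matrix here is made up:

```python
import numpy as np

# small made-up design matrix: intercept column plus one regressor
X = np.column_stack([np.ones(5), np.arange(5.0)])

rank = np.linalg.matrix_rank(X)
full_rank = rank == X.shape[1]     # True when all columns are independent
det = np.linalg.det(X.T @ X)       # non-zero when X has full column rank

print(full_rank, det)
```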