3rd-May-2019

The list of libraries that Mode supports makes a good list of libraries and directions in data analysis that one could (or should) check out. For instance, before seeing this page I hadn't heard of "defensive data analysis" (see Engarde), and I hadn't seen a description of survival analysis nearly this good (from the Lifelines package).


Last Friday and Saturday I was wondering about a dataset that omits the median but provides the max, min, and average. Could one use the midpoint of the max and min ((max + min)/2) as a stand-in for the median, and compare this midpoint with the average to tell whether the distribution is skewed? I suggested to Ryan that the ratio Avg/Midpoint might show whether a given average (the arithmetic mean) is above or below the median. I'd need to test this by running simulations on data with known medians, though (my first instance of realizing when and why one might run simulations!).

I was considering this after looking over a salary guide/report from the recruiting agency I've been working with, Accounting Principals. For each of many positions, the guide gave only a lower bound, an upper bound, and an average salary. I was worried that the averages might be distorted by skewed salary distributions. So I got to thinking whether and how I could use the other figures given to tell whether a distribution is in fact skewed. The comparison I came up with was Avg/((max + min)/2), but I'm still not certain whether it would behave the way I'm hoping, either always or a decent percentage of the time.
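The simulation I had in mind might look something like the minimal Python sketch below. The distributions (scaled beta distributions standing in for salaries), the sample size of 200, and the helper names `midpoint_ratio`, `ratio_agrees`, and `agreement_rate` are all hypothetical choices of mine, not anything from the salary guide. Because each dataset is generated, its true median is known. That means we can check, for each one, whether Avg/Midpoint landing above or below 1 matches whether the mean really is above or below the median.

```python
import random
import statistics


def midpoint_ratio(values):
    """Avg / ((max + min) / 2), the proposed stand-in for a mean-vs-median check."""
    midpoint = (max(values) + min(values)) / 2
    return statistics.mean(values) / midpoint


def ratio_agrees(values):
    """True if the ratio's side of 1 matches whether the mean is truly above the median."""
    ratio_says_mean_above = midpoint_ratio(values) > 1
    mean_is_above = statistics.mean(values) > statistics.median(values)
    return ratio_says_mean_above == mean_is_above


def agreement_rate(sampler, n_values=200, trials=500, seed=0):
    """Fraction of simulated datasets on which the ratio test gets the direction right."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        values = [sampler(rng) for _ in range(n_values)]
        if ratio_agrees(values):
            hits += 1
    return hits / trials


# Hypothetical salary-like distributions (scaled betas, always positive),
# chosen only to cover right-skewed, left-skewed, and symmetric shapes.
SCALE = 100_000
samplers = {
    "right-skewed": lambda rng: SCALE * rng.betavariate(1.5, 5),
    "left-skewed": lambda rng: SCALE * rng.betavariate(5, 1.5),
    "symmetric": lambda rng: SCALE * rng.betavariate(5, 5),
}

for name, sampler in samplers.items():
    print(f"{name}: ratio test agreed {agreement_rate(sampler):.1%} of the time")
```

Trying other shapes or sample sizes only means editing `samplers` or `n_values`. The agreement rate printed for each shape is exactly the "decent percentage of the time" question above.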

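Edit: May 23, 2019: I think this test would fail in precisely the cases I hoped to use it to detect: cases in which there's skew. The max and min are exactly the values a long tail stretches, so the midpoint can be pulled even further toward the tail than the average is.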
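A tiny made-up example (hypothetical numbers, not taken from the Accounting Principals guide) shows the failure. With the salaries below, the mean is 61 and the median is 48, so the mean sits above the median. But the midpoint (40 + 150)/2 = 95 sits above both. That gives Avg/Midpoint = 61/95 ≈ 0.64 < 1, which points the wrong way.

```python
import statistics

# Hypothetical salaries in $1000s (made up for illustration): most cluster
# in the 40s and 50s, with one high earner forming a long right tail.
salaries = [40, 42, 45, 48, 50, 52, 150]

mean = statistics.mean(salaries)                # pulled up by the 150
median = statistics.median(salaries)            # the middle value
midpoint = (max(salaries) + min(salaries)) / 2  # pulled up even further
ratio = mean / midpoint

# The ratio is below 1, which the test would read as "mean below median",
# yet the mean is actually above the median.
print(f"mean={mean}, median={median}, midpoint={midpoint}, ratio={ratio:.2f}")
```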