26th-January-2018: Liberapay Stats Analysis
I'm diving into the paydays.json file for Liberapay (live, archived), and oh boy has this file cleared up a bunch of issues:
- Delineated USD/EUR data! The most obvious improvement over my scraped data is that this JSON file breaks the figures down by currency.
- (Come to find out, the original data was actually a sum of both currencies, with USD converted to EUR! I had no idea this was the case while looking at the Liberapay Stats graphs, but the head developer told me so in Liberapay's Gitter chat.)
- This should help with isolating which weeks were pre- and post-USD inclusion. (I was unsure about this before.)
- Date/time-series data! No more wondering which week's data corresponds to which dates!
- This should help with future analyses.
I'm now digging into it with RStudio, learning how to take it apart. I've learned to use the R library jsonlite, and I'm following this guide: https://cran.r-project.org/web/packages/jsonlite/vignettes/json-aaquickstart.html
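For reference, here's a minimal sketch of the loading step, assuming the file has been saved locally (fromJSON() can also take the live URL). I'm not asserting the exact field names; str() is how I'm finding out what they actually are:

```r
# Minimal sketch: load paydays.json into a data frame with jsonlite.
library(jsonlite)

# flatten = TRUE collapses nested objects into plain columns
# (hypothetical example: a per-currency sub-object becoming columns
# like "transfer_volume.EUR" / "transfer_volume.USD").
paydays <- fromJSON("paydays.json", flatten = TRUE)

# Inspect what we actually got: column names, types, and a few rows.
str(paydays)
head(paydays)
```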
I'm not certain whether I should bother making a CSV from this; at least for my own purposes, I'm learning to work with the JSON directly from within RStudio.
However, I do think that a CSV version of it would remove some technical barriers to analysis (say, for Quantitative Econ students, or even for myself when a professor demands that I use EViews). Given this, I think that making a CSV from it would be a good form of community service + further exercise in R-lang.
CSV Structuring
But how should I structure it? Interleaving the same measures side by side in different currencies would be confusing, so I'll put one full set of columns for one currency, then another full set for the other. Each week will still be one record, as before. A sketch of the layout I have in mind is below.
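These column names are my own placeholders, not necessarily what paydays.json calls the fields:

```r
# Hypothetical wide layout: the date column first, then one block of
# columns per currency. One row = one payday week.
layout <- c("week_start",
            "transfer_volume_eur", "payin_volume_eur",   # EUR block
            "transfer_volume_usd", "payin_volume_usd")   # USD block
```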
My previous Liberapay scraper worked by building a list for each column, then combining all the columns into a central DataFrame. I think I'll do the same thing here because (1) it's familiar, which should make things go faster for me, and (2) I don't intend to do this very often, so even if the process is inefficient (as I suspect it is), that's fine.
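As a sketch of that column-wise assembly (again with placeholder field names, and assuming the flattened paydays data frame from earlier):

```r
# Build one vector per column, then combine them into a single data
# frame at the end, mirroring how my old scraper worked.
# The right-hand-side names are placeholders for whatever the
# flattened paydays.json columns are actually called.
week_start          <- paydays$week_start
transfer_volume_eur <- paydays$transfer_volume.EUR
transfer_volume_usd <- paydays$transfer_volume.USD

liberapay_csv <- data.frame(week_start,
                            transfer_volume_eur,
                            transfer_volume_usd)

# row.names = FALSE keeps R from writing an extra index column.
write.csv(liberapay_csv, "liberapay_paydays.csv", row.names = FALSE)
```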