Detects changes in time series with a Python wrapper around the R package gets (https://cran.r-project.org/web/packages/gets/index.html). It uses a combination of Google BigQuery and Python to query data, which is then fed to the R change detection code, and outputs a table containing the results.
pip install change_detection
Anaconda users may have to conda install rpy2 and conda install geopandas if not already installed.
See https://github.com/ebmdatalab/change_detection/blob/master/examples/examples.ipynb for examples of use.
- Get data, by:
  - using a csv in data/<name>, which must have only the fields code, month, numerator and denominator
  - creating a BigQuery SQL query in the same folder as the notebook that you're using; the query must produce a table with only the fields code, month, numerator and denominator
  - querying any number of the OpenPrescribing measures in BigQuery
- Reshapes data with Pandas
- Splits data into chunks and passes each chunk to the R change detection code
- The resulting output is then extracted with further R code
- The R outputs are then concatenated
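
The first steps above (checking the input fields, reshaping with Pandas, and splitting into chunks) can be sketched roughly as follows. This is an illustration only, not the package's own code: the function names, the chunk size, and the choice to analyse numerator divided by denominator are all assumptions made for the example.

```python
import pandas as pd

# The four fields the input table is required to have.
REQUIRED_COLUMNS = ["code", "month", "numerator", "denominator"]


def validate_columns(df):
    """Raise ValueError unless df has exactly the required fields (hypothetical helper)."""
    if sorted(df.columns) != sorted(REQUIRED_COLUMNS):
        raise ValueError(
            f"expected only the fields {REQUIRED_COLUMNS}, got {list(df.columns)}"
        )


def reshape_to_wide(df):
    """Pivot long data to one row per month and one column per code.

    Assumption: the quantity of interest is numerator / denominator.
    """
    df = df.copy()
    df["ratio"] = df["numerator"] / df["denominator"]
    return df.pivot(index="month", columns="code", values="ratio")


def split_into_chunks(wide, chunk_size):
    """Split the wide table into groups of at most chunk_size columns."""
    codes = list(wide.columns)
    return [wide[codes[i:i + chunk_size]] for i in range(0, len(codes), chunk_size)]


# Made-up illustrative data: three codes observed over two months.
example = pd.DataFrame(
    {
        "code": ["A", "A", "B", "B", "C", "C"],
        "month": ["2019-01-01", "2019-02-01"] * 3,
        "numerator": [5, 6, 1, 2, 3, 3],
        "denominator": [10, 10, 4, 4, 6, 12],
    }
)

validate_columns(example)
wide = reshape_to_wide(example)
chunks = split_into_chunks(wide, chunk_size=2)
```

Each element of chunks is a smaller table that could be handed to the R code independently, which is what makes the final concatenation step necessary.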
 
- name: specifies either the name of the custom SQL file, or the name of the BigQuery measure to be queried
- verbose: makes the R output more verbose to help with bug fixing (default = False)
- sample: for testing purposes, takes a random sample of 100 entities to reduce processing time (default = False)
- measure: specifies that the name given refers to a measure, rather than custom SQL (default = False)
- direction: specifies which direction to look for changes; may be 'up', 'down', or 'both' (default = 'both')
- use_cache: passes the use_cache option to bq.cached_read (default = True)
- csv_name: specifies a .csv file to be used in the change detection, rather than getting the data from BigQuery
- overwrite: forces reprocessing of the change detection; the default behaviour is to not re-run if the output files exist (default = False)
- draw_figures: draws an R plot for each of the time series, along with regression lines/breaks. These are stored in the figures folder. Options are 'no' or 'yes' (default = 'no')
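
To make the options and their defaults easier to scan, here is a small sketch that mirrors them as a dataclass with basic validation. The class name ChangeDetectionOptions is hypothetical and is not part of the package; the default of None for csv_name is also an assumption, since no default is documented for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeDetectionOptions:
    """Hypothetical container mirroring the documented options (not the package API)."""

    name: str
    verbose: bool = False
    sample: bool = False
    measure: bool = False
    direction: str = "both"
    use_cache: bool = True
    csv_name: Optional[str] = None  # assumed default; none is documented
    overwrite: bool = False
    draw_figures: str = "no"

    def __post_init__(self):
        # Reject values outside the documented choices.
        if self.direction not in ("up", "down", "both"):
            raise ValueError("direction must be 'up', 'down', or 'both'")
        if self.draw_figures not in ("no", "yes"):
            raise ValueError("draw_figures must be 'no' or 'yes'")


# 'my_query' is a made-up name used only for illustration.
options = ChangeDetectionOptions(name="my_query", direction="down")
```

See the examples notebook linked above for how the options are actually passed to the package.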
- is.tfirst: First negative break
- is.tfirst.pknown: First negative break after a known intervention date
- is.tfirst.pknown.offs: First negative break after a known intervention date not offset by a XX% increase
- is.tfirst.offs: First negative break not offset by a XX% increase
- is.tfirst.big: Steepest break as identified by is.slope.ma
- is.slope.ma: Average slope over steepest segment contributing at least XX% of total drop
- is.slope.ma.prop: Average slope as proportion to prior level
- is.slope.ma.prop.lev: Percentage of the total drop the segment used to evaluate the slope makes up
- is.intlev.initlev: Pre-drop level
- is.intlev.finallev: End level
- is.intlev.levd: Difference between pre and end level
- is.intlev.levdprop: Proportion of drop
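
As a rough guide to how the last four level fields relate, the sketch below computes a level difference and a proportional drop from a pre-drop level and an end level. The formulas are an assumed reading of the descriptions above (difference = pre-drop level minus end level, proportion = difference divided by pre-drop level) and have not been checked against the R code; the function name and the input values are made up.

```python
def level_change(initial_level, final_level):
    """Return (difference, proportion of drop) under the assumed definitions.

    Assumptions: difference = initial - final, and the proportion is taken
    relative to the initial (pre-drop) level.
    """
    if initial_level == 0:
        raise ValueError("initial_level must be non-zero to compute a proportion")
    difference = initial_level - final_level
    proportion = difference / initial_level
    return difference, proportion


# Made-up levels: a series that falls from 0.5 to 0.25.
difference, proportion = level_change(0.5, 0.25)
```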
Python with an associated install of R. Python dependencies should be dealt with on installation (though for my install, I had to install rpy2 separately). R packages should be installed when the package is first loaded.
- ebmdatalab library https://github.com/ebmdatalab/datalab-pandas
 - rpy2 (to install R and the below libraries)
 - pandas
 - pandas-gbq
 - numpy
 
- zoo
 - caTools
 - gets
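
If installation problems are suspected, a quick way to see which of the Python packages listed above can be imported is sketched below. The import names (for example pandas_gbq for pandas-gbq, and ebmdatalab for the ebmdatalab library) are assumptions about how each package is imported, and the helper name is hypothetical. It does not check the R libraries.

```python
from importlib.util import find_spec

# Assumed import names for the Python dependencies listed above.
PYTHON_DEPENDENCIES = ["ebmdatalab", "rpy2", "pandas", "pandas_gbq", "numpy"]


def check_importable(module_names):
    """Map each top-level module name to True if it can be found, else False."""
    return {name: find_spec(name) is not None for name in module_names}


status = check_importable(PYTHON_DEPENDENCIES)
for module_name, available in status.items():
    print(f"{module_name}: {'found' if available else 'MISSING'}")
```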