Change detection in prescribing data

Detects changes in time series with a python wrapper around the R package gets (https://cran.r-project.org/web/packages/gets/index.html). Uses a combination of Google BigQuery and Python to query data, which is then fed to the R change detection code. Outputs a table containing results.

Installation

pip install change_detection

Anaconda users may have to conda install rpy2 and conda install geopandas if not already installed.

Usage

See https://github.com/ebmdatalab/change_detection/blob/master/examples/examples.ipynb for examples of use.

Data flow

Get data, by:
- using a csv in data/<name>, which must have only the fields code, month, numerator and denominator
- creating a BigQuery SQL query in the same folder as the notebook that you're using, query must produce a table with only the fields code, month, numerator and denominator
- querying any number of the OpenPrescribing measures in BigQuery
Reshapes data with Pandas
Splits data into chunks and passes each chunk to the R change detection code
The resulting output is then extracted with further R code
The R outputs are then concatenated

Options

name specifies either the name of the custom SQL file, or the name of the BigQuery measure to be queried
verbose makes the R output more verbose to help with bug fixing default = False
sample for testing purposes, takes a random sample of 100 entities, to reduce processing time default = False
measure specifies that the name specified refers to a measure, rather than custom SQL default = False
direction specifies which direction to look for changes, may be 'up', 'down', or 'both', default = 'both'
use_cache passes the use_cache option to bq.cached_read default = True
csv_name to specify a .csv file to be used in the change detection, rather than getting the data from BigQuery
overwrite forces reprocessing of the change detection, default behaviour is to not re-run if the output files exist default = False
draw_figures draw an R plot for each of the time-series, along with plotting regression lines/breaks. These are stored in the figures folder. Options are 'no' or 'yes' default = 'no'

Output table

Timing Measures

is.tfirst First negative break is.tfirst.pknown First negative break after a known intervention date is.tfirst.pknown.offs First negative break after a known intervention date not offset by a XX% increase is.tfirst.offs First negative break not offset by a XX% increase is.tfirst.big Steepest break as identified by is.slope.ma

Slope Measures

is.slope.ma Average slope over steepest segment contributing at least XX% of total drop is.slope.ma.prop Average slope as proportion to prior level is.slope.ma.prop.lev Percentage of the total drop the segment used to evaluate the slope makes up

Level Measures

is.intlev.initlev Pre-drop level is.intlev.finallev End level is.intlev.levd Difference between pre and end level is.intlev.levdprop Proportion of drop

Requirements

Python with an associated install of R. Python dependencies should be dealt with on installation (though for my install, I had to install rpy2 separately. R packages should be installed with the package is first loaded.

Python installation requires:

ebmdatalab library https://github.com/ebmdatalab/datalab-pandas
rpy2 (to install R and the below libraries)
pandas
pandas-gbq
numpy

R installation requires:

zoo
caTools
gets

Name		Name	Last commit message	Last commit date
Latest commit History 70 Commits
change_detection		change_detection
examples		examples
.gitignore		.gitignore
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Change detection in prescribing data

Installation

Usage

Data flow

Options

Output table

Timing Measures

Slope Measures

Level Measures

Requirements

Python installation requires:

R installation requires:

About

Uh oh!

Releases 1

Packages

Languages

ebmdatalab/change_detection

Folders and files

Latest commit

History

Repository files navigation

Change detection in prescribing data

Installation

Usage

Data flow

Options

Output table

Timing Measures

Slope Measures

Level Measures

Requirements

Python installation requires:

R installation requires:

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages