Price index RAP

This repo contains an example of a simple pipeline to make a price index with dvc.

To make the example concrete, consider building a fairly standard hierarchical price index for the retail services industry over a five-year period. Price data come from a sample of 1,000 businesses and revenue weights for aggregation come from a statistical register of businesses. Each business is assigned a four-digit hierarchical code to classify the sub-sector of the retail services industry in which that business operates. The index is constructed by first calculating elementary indexes for each business and then aggregating the elementary indexes with an arithmetic (Lowe or Young, depending on the interpretation of the weights) index.

flowchart TD
    top["1"] --> a1["11"]
    top --> a2["12"]
    a1 --> b1["111"]
    a1 --> b2["112"]
    b1 --> c1["1111"]
    b1 --> c2["1112"]
    c1 --> B000
    c1 --> B001
    c1 --> B003
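
The aggregation step described above can be sketched in a few lines of Python. This is only an illustration, not the code used in this repo (which is written in R): the function name `aggregate`, the business `B002`, and all index values and weights are made up, and the sketch assumes that each digit of the four-digit code adds one level to the hierarchy, as in the flowchart.

```python
from collections import defaultdict


def aggregate(elementary, weights, classification):
    """Aggregate elementary indexes up a prefix-based hierarchy.

    elementary:     business id -> elementary index for one period
    weights:        business id -> revenue weight
    classification: business id -> four-digit code, e.g. "1111"

    Returns a dict mapping each code (at every level) to the weighted
    arithmetic mean of the elementary indexes below it.
    """
    numerator = defaultdict(float)
    denominator = defaultdict(float)
    for business, index in elementary.items():
        code = classification[business]
        weight = weights[business]
        # Every prefix of the code is an ancestor: "1", "11", "111", "1111".
        for level in range(1, len(code) + 1):
            prefix = code[:level]
            numerator[prefix] += weight * index
            denominator[prefix] += weight
    return {code: numerator[code] / denominator[code] for code in numerator}


# Hypothetical values for three businesses in one month.
elementary = {"B000": 1.10, "B001": 1.00, "B002": 1.20}
weights = {"B000": 2.0, "B001": 1.0, "B002": 1.0}
classification = {"B000": "1111", "B001": "1111", "B002": "1112"}

result = aggregate(elementary, weights, classification)
for code in sorted(result):
    print(code, round(result[code], 4))
```

Because the aggregate is a weighted arithmetic mean, each higher-level index can equally be computed from the indexes one level down, provided the weights are summed up the hierarchy in the same way.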

Prices are collected from most businesses with a monthly survey to capture the prices of 10 products each month. These data are first processed and then used to construct a Jevons elementary index for each business each month. Ten businesses are not given a survey as pre-processed retail-scanner data are available for the parent enterprise of these businesses. A Fisher GEKS index is made for the parent enterprise and used as the elementary index for these 10 businesses each month. This index is not revisable and so the GEKS index is made over a 13-month rolling window and combined with a mean splice.
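
As a rough illustration of the two elementary index formulas named above, the following Python sketch computes a Jevons index (the geometric mean of price relatives) and a Fisher GEKS index over a single window. It is not the code used in this repo: the function names and all prices and quantities are hypothetical, products are assumed to be matched across periods, and the rolling window and mean splice are left out.

```python
import math


def jevons(base_prices, current_prices):
    """Jevons index: geometric mean of price relatives for matched products."""
    products = base_prices.keys() & current_prices.keys()
    if not products:
        raise ValueError("no matched products")
    log_sum = sum(math.log(current_prices[i] / base_prices[i]) for i in products)
    return math.exp(log_sum / len(products))


def fisher(p0, q0, p1, q1):
    """Fisher index from period 0 to period 1 over matched products."""
    products = p0.keys() & p1.keys()
    laspeyres = (sum(p1[i] * q0[i] for i in products)
                 / sum(p0[i] * q0[i] for i in products))
    paasche = (sum(p1[i] * q1[i] for i in products)
               / sum(p0[i] * q1[i] for i in products))
    return math.sqrt(laspeyres * paasche)


def geks(prices, quantities):
    """Fisher GEKS index for each period in a window, relative to the first.

    prices, quantities: lists of dicts (product -> value), one per period.
    """
    n = len(prices)
    index = []
    for t in range(n):
        # Geometric mean, over every link period k, of the chained
        # comparison first period -> k -> t.
        log_sum = 0.0
        for k in range(n):
            first_to_k = fisher(prices[0], quantities[0], prices[k], quantities[k])
            k_to_t = fisher(prices[k], quantities[k], prices[t], quantities[t])
            log_sum += math.log(first_to_k * k_to_t)
        index.append(math.exp(log_sum / n))
    return index


# Hypothetical survey prices for one business in two months.
jevons_index = jevons({"a": 10.0, "b": 20.0}, {"a": 20.0, "b": 10.0})

# Hypothetical scanner data for three months; every price rises 10% a month.
prices = [{"a": 1.0, "b": 2.0}, {"a": 1.1, "b": 2.2}, {"a": 1.21, "b": 2.42}]
quantities = [{"a": 5.0, "b": 3.0}, {"a": 4.0, "b": 6.0}, {"a": 7.0, "b": 2.0}]
geks_index = geks(prices, quantities)
```

In the first example one price doubles and the other halves, so the geometric mean of the relatives is 1. In the second, all prices move in proportion, so the GEKS index tracks the common price change whatever the quantities are.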

Setup

Start by cloning this repository, then use conda to set up an environment.

conda env create -f environment.yml
conda activate price-index-pipeline

dvc init

Make the data

Make the aggregation weights and raw price quotes from businesses in data/ and put these data under version control. This produces both survey-like and scanner-like price data.

Rscript make-weights.R
Rscript make-prices.R 2020-01 2024-12

dvc add data/raw-survey-prices.parquet data/scanner-prices.parquet data/weights.parquet

Make the index

The index is made by first processing the raw survey prices to remove extreme prices and making the GEKS elementary indexes from the scanner data. The remaining elementary indexes are calculated from the processed survey prices, and the aggregation weights are then used to build the hierarchical index (along with product contributions).

dvc repro

flowchart TD
        node1["data/raw-survey-prices.parquet.dvc"]
        node2["data/scanner-prices.parquet.dvc"]
        node3["data/weights.parquet.dvc"]
        node4["make-geks"]
        node5["make-index"]
        node6["process-prices"]
        node1-->node6
        node2-->node4
        node3-->node5
        node4-->node5
        node6-->node5
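
The flowchart is a dependency graph: each stage can only run once everything pointing into it is available. A small Python sketch of that ordering constraint, using the standard-library `graphlib` module, is shown below. It only illustrates the idea and makes no claim about how DVC schedules stages internally; the `dependencies` dictionary is copied by hand from the flowchart.

```python
from graphlib import TopologicalSorter

# Each stage mapped to the set of things it depends on (from the flowchart).
dependencies = {
    "process-prices": {"data/raw-survey-prices.parquet.dvc"},
    "make-geks": {"data/scanner-prices.parquet.dvc"},
    "make-index": {"data/weights.parquet.dvc", "make-geks", "process-prices"},
}

# static_order() yields every node after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for step in order:
    print(step)
```

Any valid order ends with make-index, since it depends on both of the other stages and on the weights.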

The results can be put under version control.

git add .
git commit -m "Index for 2024-12"
git tag -a "2024-12" -m "Index for 2024-12"

UN-Task-Team-for-Scanner-Data/price-index-pipeline
