Commit 261503a

Merge pull request #225 from cmu-delphi/safegraph_patterns
pipeline for Safegraph patterns
2 parents cf6f4c1 + 3d76de4 commit 261503a

38 files changed (+57374, -0 lines changed)

safegraph_patterns/.gitignore

Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
# You should hard commit a prototype for this file, but we
# want to avoid accidental adding of API tokens and other
# private data parameters
params.json

# Do not commit output files
receiving/*.csv

# Remove macOS files
.DS_Store

# virtual environment
dview/

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
coverage.xml
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
.static_storage/
.media/
local_settings.py

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# pyenv
.python-version

# celery beat schedule file
celerybeat-schedule

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/

safegraph_patterns/.pylintrc

Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
[DESIGN]

min-public-methods=1


[MESSAGES CONTROL]

disable=R0801, C0330, E1101, E0611, C0114, C0116, C0103, R0913, R0914, W0702

safegraph_patterns/DETAILS.md

Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
# Patterns Dataset in Safegraph Mobility Data

We import Zip Code-level raw mobility indicators from the Safegraph **Weekly
Patterns** dataset, calculate functions of the raw data, and then aggregate
the data to the county, HRR, MSA, and state levels.

## Brand Information
Safegraph provides the daily number of visits to points of interest (POIs) in
the Weekly Patterns dataset, which is documented [here](https://docs.safegraph.com/docs/weekly-patterns).
Base information such as location name, address, category, and brand association
for POIs is provided in the **Places Schema** dataset, which is documented
[here](https://docs.safegraph.com/docs/places-schema). Safegraph does not update
its list of POIs frequently, but versioning issues do exist. The release
version can be found in `release-metadata` in the Weekly Patterns dataset, and a
corresponding `brand_info.csv` is provided in the Places Schema dataset. To save
storage space, we do not download the whole Places Schema dataset; we only add
each newly required `brand_info.csv` to `./statics` with the suffix YYYYMM (the
release version).

## Geographical Levels
* `county`: reported using zero-padded FIPS codes (consistent with the
  other COVIDcast data)
* `msa`: reported using CBSA (consistent with all other COVIDcast sensors)
* `hrr`: reported using HRR number (consistent with all other COVIDcast sensors)
* `state`: reported using two-letter postal code
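
As a small illustration of the zero-padded county codes listed above, the sketch below normalizes a FIPS code to a five-character string. The helper name `format_fips` is hypothetical and is not taken from this module; it only assumes the standard five-digit county FIPS format.

```python
def format_fips(code):
    """Return a county FIPS code as a zero-padded, five-character string.

    Hypothetical helper for illustration; accepts an int or a digit string.
    """
    return str(int(code)).zfill(5)


# A code read as an integer loses its leading zero; padding restores it.
print(format_fips(1001))
print(format_fips("42003"))
```

Keeping the codes as strings avoids losing the leading zero again when the data is written out or joined with other COVIDcast sources.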

## Metrics, Level 1 (`m1`)
* `bars_visit`: The number of visits to bars (places with NAICS code = 722410)
* `restaurants_visit`: The number of visits to restaurants (places with NAICS
  code = 722511)
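
The two metrics above differ only in the NAICS code used to select POIs. A minimal sketch of that selection follows; the record layout and the field names `naics_code` and `visits` are assumptions made for illustration, not the module's actual data structures.

```python
# NAICS codes quoted in the metric definitions above.
METRIC_NAICS = {"bars_visit": 722410, "restaurants_visit": 722511}


def count_visits(records, metric):
    """Sum visits over POI records whose NAICS code matches the metric."""
    code = METRIC_NAICS[metric]
    return sum(r["visits"] for r in records if r["naics_code"] == code)


# Hypothetical rows; the field names are illustrative only.
records = [
    {"naics_code": 722410, "visits": 12},
    {"naics_code": 722511, "visits": 40},
    {"naics_code": 722410, "visits": 3},
    {"naics_code": 445110, "visits": 99},  # not a bar or restaurant
]

print(count_visits(records, "bars_visit"))         # 15
print(count_visits(records, "restaurants_visit"))  # 40
```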

## Metrics, Level 2 (`m2`)
* `num`: number of visits in a given week
* `prop`: `num` / population * 100,000 (Note that the population here only
  includes the population aggregated at the Zip Code level. If there are no
  POIs for a certain Zip Code, its population is not counted.)
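
To make the `prop` definition concrete, here is a minimal sketch of the arithmetic. The function name and the input layout are hypothetical; the only assumption taken from the text is that Zip Codes without POIs never appear in the input, so their population is left out of the denominator.

```python
def prop_per_100k(zip_rows):
    """Compute num / population * 100,000 for one aggregated region.

    zip_rows is a list of (num, population) pairs, one per Zip Code that
    has at least one POI. Zip Codes without POIs are simply absent.
    """
    total_num = sum(num for num, _ in zip_rows)
    total_pop = sum(pop for _, pop in zip_rows)
    if total_pop <= 0:
        raise ValueError("population must be positive")
    return total_num / total_pop * 100_000


# Two Zip Codes with POIs; a third Zip Code in the same region has no POIs,
# so it contributes neither counts nor population.
rows = [(30, 20_000), (20, 30_000)]
print(round(prop_per_100k(rows), 1))
```

Because the denominator excludes Zip Codes without POIs, `prop` can be higher than a rate computed against the full population of the region.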


## API Key

We access the Safegraph data using an AWS key-secret pair which is valid
until June 15, 2021. The AWS credentials have been issued under
@huisaddison's Safegraph Data Catalog account.

safegraph_patterns/README.md

Lines changed: 67 additions & 0 deletions
@@ -0,0 +1,67 @@
# Patterns Dataset in Safegraph Mobility Data

We import raw mobility data from Safegraph Weekly Patterns, calculate some
statistics on it, and aggregate the data from the Zip Code level to the County,
HRR, MSA and State levels. For detailed information, see the file `DETAILS.md`
contained in this directory.

## Running the Indicator

The indicator is run by directly executing the Python module contained in this
directory. The safest way to do this is to create a virtual environment,
install the common DELPHI tools, and then install the module and its
dependencies. To do this, run the following code from this directory:

```
python -m venv env
source env/bin/activate
pip install ../_delphi_utils_python/.
pip install .
```

One must also install the
[AWS CLI](https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-welcome.html).
Please refer to OS-specific instructions to install this command line
interface, and verify that it is installed by calling `which aws`.
If `aws` is not installed prior to running the pipeline, the pipeline will
raise a `FileNotFoundError`.
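
The same check can be done from Python before starting a run. The sketch below is only an illustration and is not part of the module: the helper name `require_executable` is hypothetical, and it relies on the standard-library `shutil.which`, which returns `None` when a command is not on `PATH`.

```python
import shutil


def require_executable(name="aws"):
    """Return the full path of `name`, or raise if it is not on PATH."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"'{name}' was not found on PATH; install it before running the pipeline."
        )
    return path


# Report the result instead of failing, so this is safe to run anywhere.
try:
    print("aws found at", require_executable("aws"))
except FileNotFoundError as err:
    print(err)
```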

All of the user-changeable parameters are stored in `params.json`. To execute
the module and produce the output datasets (by default, in `receiving`), run
the following:

```
env/bin/python -m delphi_safegraph_patterns
```
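
For orientation, the sketch below shows one way a parameter file like `params.json` could be read with the standard library. It is not the module's own loader, and the key name `export_dir` is an assumption made for illustration; only the default output directory `receiving` comes from the text above.

```python
import json
from pathlib import Path


def load_params(path="params.json"):
    """Read the parameter file and return its contents as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def export_dir(params, default="receiving"):
    """Return the output directory; `export_dir` is a hypothetical key."""
    return Path(params.get("export_dir", default))


# Only read the file if it exists, so the sketch runs in any directory.
if Path("params.json").exists():
    print(export_dir(load_params()))
else:
    print(export_dir({}))
```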

Once you are finished with the code, you can deactivate the virtual environment
and (optionally) remove the environment itself.

```
deactivate
rm -r env
```

## Testing the code

To do a static test of the code style, it is recommended to run **pylint** on
the module. To do this, run the following from the main module directory:

```
env/bin/pylint delphi_safegraph_patterns
```

The most aggressive checks are turned off; only relatively important issues
should be raised and they should be manually checked (or better, fixed).

Unit tests are also included in the module. To execute these, run the following
command from this directory:

```
(cd tests && ../env/bin/pytest --cov=delphi_safegraph_patterns --cov-report=term-missing)
```

The output will show the number of unit tests that passed and failed, along
with the percentage of code covered by the tests. None of the tests should
fail, and the number of code lines not covered by unit tests should be small
and should not include critical sub-routines.

safegraph_patterns/REVIEW.md

Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
## Code Review (Python)

A code review of this module should include a careful look at the code and the
output. To assist in the process, but certainly not in place of it, please
check the following items.

**Documentation**

- [ ] the README.md file template is filled out and currently accurate; it is
possible to load and test the code using only the instructions given
- [ ] minimal docstrings (one line describing what the function does) are
included for all functions; full docstrings describing the inputs and expected
outputs should be given for non-trivial functions

**Structure**

- [ ] code should use 4 spaces for indentation; other style decisions are
flexible, but be consistent within a module
- [ ] any required metadata files are checked into the repository and placed
within the directory `static`
- [ ] any intermediate files that are created and stored by the module should
be placed in the directory `cache`
- [ ] final expected output files to be uploaded to the API are placed in the
`receiving` directory; output files should not be committed to the repository
- [ ] all options and API keys are passed through the file `params.json`
- [ ] template parameter file (`params.json.template`) is checked into the
code; no personal (e.g., usernames) or private (e.g., API keys) information is
included in this template file

**Testing**

- [ ] module can be installed in a new virtual environment
- [ ] pylint with the default `.pylintrc` settings run over the module produces
minimal warnings; warnings that do exist have been confirmed as false positives
- [ ] reasonably high level of unit test coverage covering all of the main logic
of the code (e.g., missing coverage for raised errors that do not currently seem
possible to reach is okay; missing coverage for options that will be needed is
not)
- [ ] all unit tests run without errors

safegraph_patterns/cache/.gitignore

Whitespace-only changes.
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# -*- coding: utf-8 -*-
"""Module to process Safegraph mobility data.

This file defines the functions that are made public by the module. As the
module is intended to be executed through the main method, these are primarily
for testing.
"""

from __future__ import absolute_import

from . import process
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
# -*- coding: utf-8 -*-
"""Call the function run_module when executed.

This file indicates that calling the module (`python -m MODULE_NAME`) will
call the function `run_module` found within the run.py file. There should be
no need to change this template.
"""

from .run import run_module  # pragma: no cover

run_module()  # pragma: no cover
