Gridtracer

An open source project from the Data to AI Lab at MIT.

This repository is part of a research project developed for a master's thesis. The goal is to create a data preprocessing pipeline for generating georeferenced datasets that serve as the foundation for building synthetic low-voltage grid infrastructure schemata in the United States.

Project Focus: This repository focuses solely on the data preprocessing pipeline. It takes a user-defined US region and generates georeferenced datasets suitable as input for downstream synthetic grid modeling tasks (handled in a separate project).

Overview

This pipeline collects and processes geospatial data for any US region (state, county, or subdivision) to create comprehensive building-level datasets.

Key Outputs:

Classified building footprints with energy-relevant attributes
Routable road networks for transportation analysis
Points of Interest and land use data
Regional boundaries and demographic information

All outputs are georeferenced and organized by administrative hierarchy for seamless integration into energy system modeling, urban planning, or technoeconomic analysis workflows.

Installation

make install

Configuration

The pipeline is configured through a YAML file (gridtracer/config/config.yaml). This file specifies the geographic scope for data collection:

REGION:
  STATE: "MA"                     # Required: State abbreviation (e.g., "MA")
  COUNTY: "Middlesex County"      # Required: Full county name
  COUNTY_SUBDIVISION: "Cambridge city" # Optional: Full county subdivision name. If omitted, processes the entire county.

Input validation checks the state abbreviation and county name against a FIPS code lookup, so misspelled or unknown regions are rejected before any data is downloaded.
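The validation step can be illustrated with a minimal sketch. It assumes the YAML file has already been parsed into a dictionary (for example with `yaml.safe_load` from PyYAML), and it uses a tiny hard-coded table in place of real FIPS reference data. `FIPS_TABLE` and `resolve_region` are hypothetical names for illustration, not part of Gridtracer's API.

```python
# Hypothetical lookup: state abbreviation -> (state FIPS, {county name: county FIPS}).
# A real pipeline would load the complete table from Census reference data.
FIPS_TABLE = {
    "MA": ("25", {"Middlesex County": "017"}),
}


def resolve_region(config: dict) -> dict:
    """Validate the REGION block and return the resolved FIPS codes."""
    region = config.get("REGION") or {}
    state = region.get("STATE")
    county = region.get("COUNTY")
    if not state or not county:
        raise ValueError("REGION.STATE and REGION.COUNTY are required")
    if state not in FIPS_TABLE:
        raise ValueError(f"Unknown state abbreviation: {state!r}")
    state_fips, counties = FIPS_TABLE[state]
    if county not in counties:
        raise ValueError(f"Unknown county {county!r} in state {state!r}")
    return {
        "state_fips": state_fips,
        # Full county FIPS = 2-digit state code + 3-digit county code.
        "county_fips": state_fips + counties[county],
        # The subdivision is optional; None means "process the whole county".
        "subdivision": region.get("COUNTY_SUBDIVISION"),
    }


config = {"REGION": {"STATE": "MA", "COUNTY": "Middlesex County"}}
resolved = resolve_region(config)
```

Raising `ValueError` early keeps a typo in the configuration from surfacing later as an empty download or a confusing geometry error.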

Data Sources

| Source | Data Extracted | Purpose |
| --- | --- | --- |
| OpenStreetMap | Buildings, POIs, Land Use, Roads, Power Infrastructure | Base geometry, network, feature extraction |
| NREL | Residential building typology datasets | Building vintage distributions for energy modeling |
| US Census TIGER | Administrative boundaries (state, county, subdivision) | Defining regional scope, FIPS code resolution |
| US Census Data | Demographic data (population density, housing units) | Building classification heuristics |
| Microsoft Buildings | ML-derived building footprints with height data | Enhanced building geometry and attributes |

Pipeline Outputs

The primary outputs generated for the specified region are:

Routable Road Network: An .sql file containing the road network processed and formatted for direct import into a PostgreSQL/PostGIS database with the pgRouting extension (roads_pgr.sql).
Classified Building Footprints: A shapefile (buildings_classified.shp) containing building polygons with attributes derived from the detailed classification heuristic (see Pipeline Workflow below).
Transformer Network: A GeoJSON file (transformers.geojson) containing points representing electrical transformers extracted from OSM.
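As an illustration of consuming one of these outputs, the sketch below reads a GeoJSON file with the standard library and collects the coordinates of every Point feature. It assumes transformers.geojson is a standard FeatureCollection. The exact properties Gridtracer writes are not documented here, so only the geometry is read. `load_transformer_points` is a hypothetical helper name.

```python
import json


def load_transformer_points(path):
    """Return (longitude, latitude) tuples for every Point feature in a GeoJSON file."""
    with open(path, encoding="utf-8") as f:
        collection = json.load(f)
    points = []
    for feature in collection.get("features", []):
        geometry = feature.get("geometry") or {}
        if geometry.get("type") == "Point":
            # GeoJSON stores positions as [longitude, latitude, (optional) elevation].
            lon, lat = geometry["coordinates"][:2]
            points.append((lon, lat))
    return points
```

A call such as `load_transformer_points("transformers.geojson")` would return a plain list that can be handed to any downstream tool without requiring a geospatial library.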

Output Structure

All outputs are organized in a hierarchical directory structure by administrative region:

output/
└── [STATE]/                        # e.g., MA
    └── [COUNTY]/                   # e.g., Middlesex_County
        └── [SUBDIVISION]/          # e.g., Cambridge_city (optional)
            ├── CENSUS/             # Administrative boundaries and census data
            ├── NREL/               # Building typology distributions
            ├── OSM/                # OpenStreetMap extracts
            ├── MICROSOFT_BUILDINGS/ # ML-derived building footprints
            ├── BUILDINGS_OUTPUT/   # Final classified buildings
            ├── ROAD_NETWORK/       # Routable road networks
            └── PLOTS/              # Visualization outputs
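The layout above can be reproduced with a short `pathlib` sketch. It assumes that spaces in region names are replaced with underscores (as the `Middlesex_County` example suggests) and that the data folders sit directly under the county folder when no subdivision is given. Both points are assumptions, and `region_dir` and `create_output_dirs` are hypothetical helper names.

```python
from pathlib import Path
from typing import Optional

# Subdirectory names taken from the tree above.
SUBDIRS = [
    "CENSUS", "NREL", "OSM", "MICROSOFT_BUILDINGS",
    "BUILDINGS_OUTPUT", "ROAD_NETWORK", "PLOTS",
]


def region_dir(base: Path, state: str, county: str,
               subdivision: Optional[str] = None) -> Path:
    """Build the region folder path, e.g. output/MA/Middlesex_County/Cambridge_city."""
    parts = [state, county.replace(" ", "_")]
    if subdivision:
        parts.append(subdivision.replace(" ", "_"))
    return Path(base).joinpath(*parts)


def create_output_dirs(base: Path, state: str, county: str,
                       subdivision: Optional[str] = None) -> Path:
    """Create the region folder and all data subfolders; return the region folder."""
    root = region_dir(base, state, county, subdivision)
    for name in SUBDIRS:
        # exist_ok makes repeated runs for the same region safe.
        (root / name).mkdir(parents=True, exist_ok=True)
    return root


example = region_dir(Path("output"), "MA", "Middlesex County", "Cambridge city")
```

Separating path construction from directory creation lets other stages compute where a file belongs without touching the filesystem.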

Pipeline Workflow

The pipeline processes data through seven sequential stages:

Step 1: Census Boundary Definition

Parse YAML configuration and resolve FIPS codes for target region
Establish precise geographic boundaries using Census TIGER/Line data
Create output directory structure for all data products

Step 2: Census Subdivision Segmentation

Generate comprehensive subdivision datasets for the target region
Extract population and housing unit metrics for classification heuristics

Step 3: NREL Data Processing

Process NREL residential building typology datasets
Extract vintage distribution data for energy modeling parameters

Step 4: OpenStreetMap Data Extraction

Query and download OSM data (buildings, roads, POIs, power infrastructure)
Clip and store processed OSM datasets for subsequent analysis steps

Step 5: Microsoft Buildings Integration

Download ML-derived building footprints with height information
Integrate with existing building data for enhanced geometry

Step 6: Building Classification

Apply energy-focused classification heuristics using all data sources
Generate final classified building footprints with typology and structural attributes

Step 7: Road Network Generation

Process OSM road network topology for routing applications
Export pgRouting-compatible SQL files for database integration
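The ordering of the seven stages can be sketched as a small runner that checks each stage's inputs before executing it. This is not Gridtracer's actual orchestration code: the `Stage` type, the context keys, and the placeholder stage bodies are all hypothetical, and the dependencies are inferred from the step descriptions above.

```python
from typing import Callable, Dict, List, NamedTuple


class Stage(NamedTuple):
    name: str
    requires: List[str]          # context keys that must already exist
    run: Callable[[Dict], None]  # mutates the shared context


def run_pipeline(stages: List[Stage], context: Dict) -> List[str]:
    """Run stages in order, failing fast if a stage's inputs are missing."""
    completed = []
    for stage in stages:
        missing = [key for key in stage.requires if key not in context]
        if missing:
            raise RuntimeError(f"{stage.name} is missing inputs: {missing}")
        stage.run(context)
        completed.append(stage.name)
    return completed


def produces(key: str) -> Callable[[Dict], None]:
    """Build a placeholder stage body that only records its output key."""
    def run(context: Dict) -> None:
        context[key] = f"<{key} data>"
    return run


STAGES = [
    Stage("Census Boundary Definition", [], produces("boundary")),
    Stage("Census Subdivision Segmentation", ["boundary"], produces("census")),
    Stage("NREL Data Processing", ["boundary"], produces("nrel")),
    Stage("OpenStreetMap Data Extraction", ["boundary"], produces("osm")),
    Stage("Microsoft Buildings Integration", ["boundary"], produces("ms_buildings")),
    Stage("Building Classification",
          ["census", "nrel", "osm", "ms_buildings"], produces("buildings")),
    Stage("Road Network Generation", ["osm"], produces("roads")),
]

context: Dict = {}
completed = run_pipeline(STAGES, context)
```

Declaring inputs explicitly makes the reason for the sequence visible: classification cannot start until every data source has been collected, while road network generation depends only on the OSM extract.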

Usage

Running the Pipeline

# Run the complete data processing pipeline
python -m gridtracer.scripts.main

# Or run directly from the scripts directory
python gridtracer/scripts/main.py

Testing

# Run tests with coverage
make test

# Run specific test file
python -m pytest tests/path/to/test_file.py

# Check code style
make lint

# Auto-fix code style issues
make fix-lint

Documentation

📄 Homepage: https://github.com/DAI-Lab/gridtracer
📚 Documentation: https://DAI-Lab.github.io/gridtracer
