
# 📊 Geospatial Performance Benchmark

This document tracks the performance and characteristics of different geospatial software stacks for the use cases defined in the project.

## Use Case 1: Data Ingestion

Goal: To measure the one-time cost of preparing OSM data (.pbf) for analysis, either by loading it into a database or by creating a smaller subset.

| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- |
| Load entire .pbf into a database | PostGIS + osm2pgsql | italy-latest.osm.pbf | Import Time: ~1840s (~31 min); Disk Space: 19 GB | Represents the high upfront cost of creating a fully indexed, queryable database. Ideal for many subsequent, fast queries. |
| Extract a regional .pbf from a national .pbf | osmium-tool | italy-latest.osm.pbf | Execution Time: ~21s | Represents the cost of creating a smaller, more manageable file for file-based workflows. A pre-processing step for tools like PyOsmium. |

(Note: For file-based tools like QuackOSM, the "ingestion" and "filtering" steps happen simultaneously, so their performance is measured in Use Case 2.)
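
For reference, the regional extract above was produced with osmium-tool. Below is a minimal sketch of how such an extraction can be scripted and timed from Python; the bounding box (an approximate extent for Lombardy) and the file names are illustrative assumptions, not the exact parameters used in the benchmark.

```python
import subprocess
import time

# Illustrative inputs: an approximate Lombardy bounding box
# (min_lon,min_lat,max_lon,max_lat); the benchmark may have used a polygon instead.
SOURCE_PBF = "italy-latest.osm.pbf"
TARGET_PBF = "lombardy-latest.osm.pbf"
BBOX = "8.5,44.7,11.5,46.7"

start = time.perf_counter()
# `osmium extract` cuts a geographic subset out of a larger PBF file.
subprocess.run(
    ["osmium", "extract", "--bbox", BBOX, "--overwrite", "-o", TARGET_PBF, SOURCE_PBF],
    check=True,
)
print(f"Extraction took {time.perf_counter() - start:.1f}s")
```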


## Use Case 2: Data Filtering and Extraction

Goal: To measure the efficiency of extracting a specific subset of data (buildings in Milan) from a larger source, analyzing the impact of system caches.

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Extract buildings from .pbf and save to .geoparquet | DuckDB + QuackOSM | Cold Start | italy-latest.osm.pbf | Execution Time: ~1.34s; Output: 6.97 MB | Performance is I/O-bound. This time reflects a true first-time run, limited by reading the large PBF from disk. |
| Extract buildings from .pbf and save to .geoparquet | DuckDB + QuackOSM | Hot Cache Avg. | italy-latest.osm.pbf | Execution Time: ~0.22s; Runs: 99 | Pure in-memory performance. The ~6x speedup comes from the operating system's file cache, showing the bottleneck is disk I/O, not CPU. |
| Extract buildings from database | PostGIS | Cold Start | planet_osm_polygon table | Execution Time: ~1.10s | Represents a first-time query, limited by reading data pages and indexes from disk into PostgreSQL's internal cache. |
| Extract buildings from database | PostGIS | Hot Cache Avg. | planet_osm_polygon table | Execution Time: ~0.40s; Runs: 99 | Pure database query performance. The ~2.75x speedup is due to the database's internal caching mechanism. |
| Read .pbf, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Cold Start | lombardy-latest.osm.pbf | Execution Time: ~358.9s; Output: 7.61 MB | Performance is heavily CPU-bound. This time reflects the high computational cost of the two-pass parsing and geometry construction in Python. |
| Read .pbf, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Hot Cache Avg. | lombardy-latest.osm.pbf | Execution Time: ~360.1s; Runs: 49 | Nearly identical to the cold start, confirming the bottleneck is CPU processing, not disk I/O. The OS cache provides minimal benefit. |
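
The PostGIS rows above boil down to an attribute filter on the osm2pgsql output table, repeated to average the hot-cache behaviour. A minimal sketch is shown below; the connection string, the Milan bounding box, and the column names (`way`, `building`) are assumptions based on osm2pgsql defaults (EPSG:3857 geometries in `planet_osm_polygon`), not the exact benchmark script.

```python
import time
import geopandas as gpd
from sqlalchemy import create_engine

# Hypothetical connection string; adjust to your PostGIS instance.
engine = create_engine("postgresql://postgres:postgres@localhost:5432/osm")

# osm2pgsql's default schema stores polygon features in planet_osm_polygon,
# with the geometry in the `way` column and OSM tags as text columns.
SQL = """
    SELECT osm_id, building, way
    FROM planet_osm_polygon
    WHERE building IS NOT NULL
      AND way && ST_Transform(
            ST_MakeEnvelope(9.04, 45.38, 9.28, 45.54, 4326), 3857)
"""

timings = []
for _ in range(5):  # the benchmark averaged 99 hot-cache runs
    start = time.perf_counter()
    buildings = gpd.read_postgis(SQL, engine, geom_col="way")
    timings.append(time.perf_counter() - start)

print(f"{len(buildings)} buildings, avg {sum(timings) / len(timings):.2f}s")
buildings.to_parquet("milan_buildings.geoparquet")
```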

## Use Case 3: Single Table Analysis (Geometric Properties)

Goal: To evaluate performance on calculations that operate on geometries without complex joins, scaling across datasets of different sizes.

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_buildings.geoparquet | Execution Time: ~0.028s | Found 10 results. Minimal time difference from the hot run. |
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_buildings.geoparquet | Execution Time: ~0.014s; Runs: 99 | The ~2x speedup on the small dataset suggests a file-metadata caching benefit. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Cold Start | pinerolo_buildings table | Execution Time: ~0.044s | Found 10 results. Performance comparable to the other technologies. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_buildings table | Execution Time: ~0.017s; Runs: 99 | Significant speedup from database caching on the small dataset. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Cold Start | pinerolo_buildings.geoparquet | Execution Time: ~0.071s | Found 10 results. Slower on cold start due to library overhead. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_buildings.geoparquet | Execution Time: ~0.053s; Runs: 99 | The slowest of the three on this task, but still very fast. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_buildings.geoparquet | Execution Time: ~0.117s | Result ~11.0M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_buildings.geoparquet | Execution Time: ~0.115s; Runs: 99 | The result remains a significant outlier compared to the other technologies. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Cold Start | pinerolo_buildings table | Execution Time: ~0.144s | Result ~7.05M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_buildings table | Execution Time: ~0.144s; Runs: 99 | The result is consistent with GeoPandas, suggesting it is the correct baseline. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Cold Start | pinerolo_buildings.geoparquet | Execution Time: ~0.174s | Result ~7.05M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_buildings.geoparquet | Execution Time: ~0.176s; Runs: 99 | The result confirms the PostGIS calculation. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | Execution Time: ~0.025s | Found 16 results. Spatial anti-join operation. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | Execution Time: ~0.007s; Runs: 99 | The result count differs from the other two technologies (PostGIS/GeoPandas). |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Cold Start | pinerolo_restaurants and pinerolo_bus_stops tables | Execution Time: ~0.003s | Found 15 results. Extremely fast due to database optimizations. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Hot Cache Avg. | pinerolo_restaurants and pinerolo_bus_stops tables | Execution Time: ~0.0008s; Runs: 99 | The fastest technology for this small-scale join. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Cold Start | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | Execution Time: ~0.007s | Found 17 results. The count differs from both of the other technologies. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Hot Cache Avg. | pinerolo_restaurants.geoparquet and pinerolo_bus_stops.geoparquet | Execution Time: ~0.004s; Runs: 99 | Very fast, but slower than PostGIS. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Cold Start | milan_buildings.geoparquet | Execution Time: ~0.196s | Found 10 results. The operation is now CPU-bound on a larger dataset. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_buildings.geoparquet | Execution Time: ~0.189s; Runs: 99 | Minimal speedup, as the ST_Area calculation over all geometries is the dominant cost. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Cold Start | milan_buildings table | Execution Time: ~0.214s | Found 10 results. Performance is nearly identical to DuckDB. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Hot Cache Avg. | milan_buildings table | Execution Time: ~0.196s; Runs: 99 | Similar CPU-bound behavior. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Cold Start | milan_buildings.geoparquet | Execution Time: ~0.835s | Found 10 results. Significantly slower than the database approaches on a medium dataset. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Hot Cache Avg. | milan_buildings.geoparquet | Execution Time: ~0.869s; Runs: 99 | The performance gap widens on larger data. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Cold Start | milan_buildings.geoparquet | Execution Time: ~1.98s | Result ~178.3M sqm. The numerical result is the outlier. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_buildings.geoparquet | Execution Time: ~1.99s; Runs: 99 | Performance is consistently CPU-bound. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Cold Start | milan_buildings table | Execution Time: ~1.37s | Result ~111.5M sqm. Faster than DuckDB and numerically consistent with GeoPandas. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Hot Cache Avg. | milan_buildings table | Execution Time: ~1.34s; Runs: 99 | Consistently CPU-bound behavior. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Cold Start | milan_buildings.geoparquet | Execution Time: ~2.99s | Result ~111.5M sqm. Slower than the databases, but confirms the PostGIS result. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Hot Cache Avg. | milan_buildings.geoparquet | Execution Time: ~2.93s; Runs: 99 | Consistently CPU-bound. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Cold Start | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | Execution Time: ~5.27s | Result: 2384. Each technology returns a different count. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | Execution Time: ~5.11s; Runs: 99 | Fast, but numerically different from the other technologies. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Cold Start | milan_restaurants and milan_bus_stops tables | Execution Time: ~10.53s | Result: 2135. Slower than DuckDB on this operation. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Hot Cache Avg. | milan_restaurants and milan_bus_stops tables | Execution Time: ~10.49s; Runs: 99 | The count differs from both GeoPandas and DuckDB, and PostGIS is also notably slow on this operation. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Cold Start | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | Execution Time: ~0.015s | Result: 2672. Extraordinarily fast; sjoin_nearest is highly optimized. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Hot Cache Avg. | milan_restaurants.geoparquet and milan_bus_stops.geoparquet | Execution Time: ~0.016s; Runs: 99 | The count differs from both DuckDB and PostGIS; these discrepancies require further investigation. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Cold Start | rome_buildings.geoparquet | Execution Time: ~0.56s | Performance scales predictably. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_buildings.geoparquet | Execution Time: ~0.54s; Runs: 99 | Consistently CPU-bound on the largest dataset. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Cold Start | rome_buildings table | Execution Time: ~0.40s | PostGIS becomes faster than DuckDB on the largest dataset for this query. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Hot Cache Avg. | rome_buildings table | Execution Time: ~0.38s; Runs: 99 | Demonstrates better scaling than DuckDB for this operation. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Cold Start | rome_buildings.geoparquet | Execution Time: ~2.52s | The performance gap between GeoPandas and the databases is largest on this dataset. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Hot Cache Avg. | rome_buildings.geoparquet | Execution Time: ~2.63s; Runs: 99 | Confirms that for pure computation on large in-memory data, the database engines are more optimized. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Cold Start | rome_buildings.geoparquet | Execution Time: ~6.08s | Result ~423.0M sqm. The numerical result remains the outlier. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_buildings.geoparquet | Execution Time: ~6.27s; Runs: 99 | Performance is stable and CPU-bound. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Cold Start | rome_buildings table | Execution Time: ~2.80s | Result ~299.2M sqm. PostGIS is over 2x faster than DuckDB on this heavy computation. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Hot Cache Avg. | rome_buildings table | Execution Time: ~2.78s; Runs: 99 | Confirms superior performance for complex, single-table aggregations on large data. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Cold Start | rome_buildings.geoparquet | Execution Time: ~9.38s | Result ~299.2M sqm. Numerically consistent with PostGIS but the slowest technology. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Hot Cache Avg. | rome_buildings.geoparquet | Execution Time: ~9.14s; Runs: 99 | The performance gap widens significantly on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Cold Start | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | Execution Time: ~22.9s | Result: 2562. The result count still differs across technologies. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | Execution Time: ~17.2s; Runs: 99 | The faster of the two database systems for this join. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Cold Start | rome_restaurants and rome_bus_stops tables | Execution Time: ~40.1s | Result: 2298. The count discrepancies persist. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Hot Cache Avg. | rome_restaurants and rome_bus_stops tables | Execution Time: ~40.3s; Runs: 99 | The join is significantly slower than DuckDB on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Cold Start | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | Execution Time: ~0.035s | Result: 3069. The performance of sjoin_nearest is extraordinary, orders of magnitude faster than the databases. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Hot Cache Avg. | rome_restaurants.geoparquet and rome_bus_stops.geoparquet | Execution Time: ~0.036s; Runs: 99 | The count still differs, suggesting the method may be incorrect despite the exceptional performance. |
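
As an illustration of the query behind operation 3.1, the sketch below computes the ten largest building areas with DuckDB's spatial extension from Python. It assumes the GeoParquet files store geometry as WKB in a `geometry` column (typical of QuackOSM output); if the column is already a typed geometry, the `ST_GeomFromWKB` call can be dropped, and areas in square metres additionally require geometries in a projected CRS.

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")

# Top 10 largest building footprints by area (units depend on the stored CRS).
query = """
    SELECT ST_Area(ST_GeomFromWKB(geometry)) AS area
    FROM read_parquet('milan_buildings.geoparquet')
    ORDER BY area DESC
    LIMIT 10
"""
top10 = con.execute(query).fetchdf()
print(top10)
```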

## Use Case 4: Complex Spatial Join

Goal: To test performance on computationally intensive join operations on large datasets.

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | Execution Time: ~0.008s; Output: 0.13 MB | Found 13 valid polygon neighborhoods. The join is extremely fast on small data. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_neighborhoods.geoparquet and pinerolo_restaurants.geoparquet | Execution Time: ~0.006s; Runs: 99 | Minimal speedup, suggesting the operation is CPU-bound even on small data. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | Execution Time: ~0.067s | Result: 0 trees and ~6.9k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_hospitals.geoparquet, pinerolo_residential_streets.geoparquet and pinerolo_trees.geoparquet | Execution Time: ~0.058s; Runs: 99 | A plausible null result for a small town, confirmed to be CPU-bound. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Cold Start | pinerolo_parks.geoparquet and osmnx boundary | Execution Time: ~0.014s | Result ~105 km². Numerically incorrect due to the oversized boundary from osmnx. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | pinerolo_parks.geoparquet and osmnx boundary | Execution Time: ~0.014s; Runs: 99 | Confirms the calculation is based on a flawed input. Performance is CPU-bound. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Cold Start | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | Execution Time: ~0.360s; Output: 0.29 MB | Found 34 valid polygon neighborhoods. Performance scales well. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_neighborhoods.geoparquet and milan_restaurants.geoparquet | Execution Time: ~0.352s; Runs: 99 | The operation remains strongly CPU-bound on the medium dataset. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Cold Start | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | Execution Time: ~4.03s | Result: 79 trees and ~26.5k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_hospitals.geoparquet, milan_residential_streets.geoparquet and milan_trees.geoparquet | Execution Time: ~4.01s; Runs: 99 | The heavy computational cost makes the operation CPU-bound. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Cold Start | milan_parks.geoparquet and osmnx boundary | Execution Time: ~6.58s | Result ~349 km². Numerically incorrect. The osmnx boundary for Milan is also oversized. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Hot Cache Avg. | milan_parks.geoparquet and osmnx boundary | Execution Time: ~6.57s; Runs: 99 | Confirms the data/calculation issue persists. The task is CPU-bound. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Cold Start | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | Execution Time: ~4.78s; Output: 2.24 MB | Found 52 valid polygon neighborhoods. Performance continues to scale predictably. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_neighborhoods.geoparquet and rome_restaurants.geoparquet | Execution Time: ~4.72s; Runs: 99 | The operation is definitively CPU-bound on the largest dataset. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Cold Start | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | Execution Time: ~18.37s | Result: 268 trees and ~111k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_hospitals.geoparquet, rome_residential_streets.geoparquet and rome_trees.geoparquet | Execution Time: ~18.61s; Runs: 99 | The heaviest join operation, confirmed to be entirely CPU-bound. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Cold Start | rome_parks.geoparquet and osmnx boundary | Execution Time: ~13.01s | Result ~2,242 km². The numerical result remains incorrect due to the osmnx boundary. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Hot Cache Avg. | rome_parks.geoparquet and osmnx boundary | Execution Time: ~12.89s; Runs: 99 | This serves as a benchmark for a computationally extreme task, highlighting the impact of input data quality. |
| (Selected Operations) | PostGIS | Test to be done | pinerolo/milan/rome datasets | Query Time (s) | Test to be done |
| (Selected Operations) | GeoPandas | Test to be done | pinerolo/milan/rome datasets | Execution Time (s) | Test to be done |
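
Operation 4.1 is essentially a point-in-polygon join followed by a group-by. A minimal sketch under the same assumptions as before (WKB geometry columns, a `name` attribute on the neighborhood polygons, hypothetical file names) could look like this:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")

# Count restaurants inside each neighborhood polygon; LEFT JOIN keeps
# neighborhoods with zero restaurants, and count(r.geometry) ignores the NULLs.
query = """
    SELECT n.name AS neighborhood,
           count(r.geometry) AS restaurant_count
    FROM read_parquet('milan_neighborhoods.geoparquet') AS n
    LEFT JOIN read_parquet('milan_restaurants.geoparquet') AS r
      ON ST_Contains(ST_GeomFromWKB(n.geometry), ST_GeomFromWKB(r.geometry))
    GROUP BY n.name
    ORDER BY restaurant_count DESC
"""
print(con.execute(query).fetchdf().head())
```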

## Appendix: Benchmarks on NON-OSM Data Formats

This section tracks the performance of the core technologies on standard geospatial data formats, without the complexity of OSM-specific parsing libraries.

### Pure Vector Data (Shapefile)

Goal: To measure ingestion and filtering performance on a standard Shapefile dataset, analyzing the impact of system caches.

#### Use Case 1: Ingestion

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Read Shapefile using ST_Read | DuckDB Spatial | Cold Start | Com01012025_WGS84.shp | Execution Time: ~0.44s | Found 7896 features. Very fast initial read from disk. |
| Read Shapefile using ST_Read | DuckDB Spatial | Hot Cache Avg. | Com01012025_WGS84.shp | Execution Time: ~0.33s; Runs: 99 | The OS file cache provides a noticeable but smaller (~1.3x) speedup. |
| Load Shapefile into database | PostGIS + shp2pgsql | N/A | Com01012025_WGS84.shp | Execution Time: ~21s | Represents the one-time cost of manual ingestion via CLI. Significantly slower due to database overhead. |
| Read Shapefile into GeoDataFrame | GeoPandas | Cold Start | Com01012025_WGS84.shp | Execution Time: ~0.83s | Found 7896 features. Performance reflects reading the file from disk. |
| Read Shapefile into GeoDataFrame | GeoPandas | Hot Cache Avg. | Com01012025_WGS84.shp | Execution Time: ~0.34s; Runs: 99 | The ~2.4x speedup is due to the OS file cache. |
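
The DuckDB rows above rely on the spatial extension's `ST_Read` table function, which goes through GDAL and can therefore open Shapefiles directly. A minimal sketch:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL spatial; LOAD spatial;")

# ST_Read uses GDAL under the hood, so the Shapefile is read without conversion.
count = con.execute(
    "SELECT count(*) FROM ST_Read('Com01012025_WGS84.shp')"
).fetchone()[0]
print(f"{count} features")
```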

#### Use Case 2: Filtering

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Filter table by attribute | DuckDB Spatial | Cold Start | Com01012025_WGS84.shp | Execution Time: ~0.020s; Output: 12.57 MB | Filtered to 1180 features. Very fast in-memory query. |
| Filter table by attribute | DuckDB Spatial | Hot Cache Avg. | Com01012025_WGS84.shp | Execution Time: ~0.019s; Runs: 99 | CPU-bound, with minimal difference between cold and hot runs. |
| Filter table by attribute | PostGIS | Cold Start | comuni_istat table | Execution Time: ~0.22s | Found 1180 features. Represents a query where data is read from disk into the database cache. |
| Filter table by attribute | PostGIS | Hot Cache Avg. | comuni_istat table | Execution Time: ~0.075s; Runs: 99 | The ~2.9x speedup is due to PostgreSQL's internal caching mechanism. |
| Filter GeoDataFrame by attribute | GeoPandas | Cold Start | Com01012025_WGS84.shp | Execution Time: ~0.0010s; Output: 12.58 MB | Filtered to 1180 features. In-memory operation is extremely fast. |
| Filter GeoDataFrame by attribute | GeoPandas | Hot Cache Avg. | Com01012025_WGS84.shp | Execution Time: ~0.0009s; Runs: 99 | Negligible difference from cold start, confirming the operation is CPU-bound. |
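
For the GeoPandas rows, the filter is a plain boolean mask on an in-memory GeoDataFrame, which is why it costs well under a millisecond once the file has been read. A minimal sketch; the `COD_REG` column and the value 1 for Piedmont are assumptions based on ISTAT's region coding, not necessarily the exact filter used in the benchmark:

```python
import geopandas as gpd

# Read the ISTAT municipalities layer (ingestion step).
comuni = gpd.read_file("Com01012025_WGS84.shp")

# Attribute filter (filtering step): keep only Piedmont municipalities.
# COD_REG == 1 is an assumption based on ISTAT's region coding.
piedmont = comuni[comuni["COD_REG"] == 1]
print(f"Filtered to {len(piedmont)} features")
```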

### Raster Data (GeoTIFF)

Goal: To measure ingestion and filtering performance on a large-scale raster dataset (GHS-POP), analyzing the impact of system caches and comparing file-based vs database approaches.

#### Use Case 1: Ingestion

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Open GeoTIFF file | DuckDB Spatial | N/A | GHS_POP_ITALY_100m.tif | Execution Time: N/A | Technology does not support native raster operations. |
| Load Raster into database | PostGIS + raster2pgsql | N/A | GHS_POP_ITALY_100m.tif | Execution Time: ~370s (~6.2 min) | One-time cost for manual ingestion via CLI. Includes table creation, tiling, and indexing of the entire Italy raster. |
| Open GeoTIFF file | Python (rioxarray) | Cold Start | GHS_POP_ITALY_100m.tif | Execution Time: ~0.15s | Measures the time to read file metadata from disk. |
| Open GeoTIFF file | Python (rioxarray) | Hot Cache Avg. | GHS_POP_ITALY_100m.tif | Execution Time: ~0.03s; Runs: 99 | The ~5.7x speedup is due to OS caching of the file headers. |
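
The rioxarray "open" step only reads the GeoTIFF metadata and builds a lazy array; pixel data is not loaded until it is actually needed, which is why the timings above are in the tens of milliseconds. A minimal sketch:

```python
import rioxarray

# Opening is lazy: only metadata (CRS, transform, shape) is read from disk.
pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif", masked=True)
print(pop.rio.crs, pop.rio.resolution(), pop.shape)
```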

#### Use Case 2: Filtering

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Clip raster to Milan | DuckDB Spatial | N/A | GHS_POP_ITALY_100m.tif | Execution Time: N/A | Technology does not support native raster operations. |
| Clip raster to Milan | PostGIS | Cold Start | ghs_population table | Execution Time: ~8.12s | Found 1,256,044 people. The query is computationally expensive. |
| Clip raster to Milan | PostGIS | Hot Cache Avg. | ghs_population table | Execution Time: ~7.40s; Runs: 99 | The minimal speedup confirms the operation is CPU-bound (raster processing) rather than I/O-bound. |
| Clip raster to Milan | Python (rioxarray) | Cold Start | GHS_POP_ITALY_100m.tif | Execution Time: ~7.57s; Output: 0.08 MB | Found 1,256,044 people. Performance is highly competitive with PostGIS. |
| Clip raster to Milan | Python (rioxarray) | Hot Cache Avg. | GHS_POP_ITALY_100m.tif | Execution Time: ~7.91s; Runs: 99 | Performance is nearly identical to the cold start, definitively proving the operation is CPU-bound. |

(Note: GHS_POP_ITALY_100m.tif was generated with the GDAL merge tool via CLI, by merging the four GHS-POP tiles downloaded from Copernicus that together cover all of Italy.)
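
The rioxarray clip can be reproduced roughly as sketched below. The Milan boundary source (here a hypothetical GeoPackage) is an assumption; the benchmark obtained the boundary separately. Since GHS-POP stores people per cell, the clipped values can simply be summed.

```python
import geopandas as gpd
import rioxarray

# Hypothetical boundary source; the benchmark derived Milan's boundary separately.
milan = gpd.read_file("milan_boundary.gpkg")

pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif", masked=True)

# Reproject the vector boundary to the raster CRS, then clip to its geometry.
milan = milan.to_crs(pop.rio.crs)
clipped = pop.rio.clip(milan.geometry, milan.crs)

# GHS-POP stores people per cell, so the total is a simple sum.
print(f"Population: {float(clipped.sum()):,.0f}")
clipped.rio.to_raster("milan_population.tif")
```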


## Use Case 5: Vector-Raster Analysis

Goal: To evaluate the performance of different technologies for calculating raster statistics based on vector zones.

| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
| --- | --- | --- | --- | --- | --- |
| Calculate Population per Municipality (Piedmont) | PostGIS | Cold Start | comuni_istat_clean and ghs_population tables | Execution Time: ~2.50s; GPKG Loading Time: ~17.87s | Calculated a total population of 4,222,758 for 1180 municipalities. The result is numerically identical to the Python stack, although an additional script is needed to export the result to a GPKG file and reload the data into the database, which adds the loading time shown. |
| Calculate Population per Municipality (Piedmont) | PostGIS | Hot Cache Avg. | comuni_istat_clean and ghs_population tables | Execution Time: ~2.43s; Runs: 99 | The operation is CPU-bound. PostGIS is ~5.5x faster than the Python stack for this task. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Cold Start | Com01012025_WGS84.shp and GHS_POP_ITALY_100m.tif | Execution Time: ~13.97s | Calculated a total population of 4,222,758 for 1180 municipalities. The result is numerically identical to PostGIS. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Hot Cache Avg. | Com01012025_WGS84.shp and GHS_POP_ITALY_100m.tif | Execution Time: ~13.45s; Runs: 99 | The operation is also CPU-bound. The high-level libraries introduce a significant performance overhead compared to the database. |

(Note: DuckDB still does not support native raster operations.)
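
The Python-stack rows use rasterstats' `zonal_stats`, which rasterizes each polygon and aggregates the pixels underneath it. A minimal sketch, again assuming the Piedmont filter from the Shapefile section; the vector layer is reprojected to the raster's CRS before computing the statistics:

```python
import geopandas as gpd
import rasterio
from rasterstats import zonal_stats

RASTER = "GHS_POP_ITALY_100m.tif"

# Piedmont municipalities (COD_REG == 1 is an assumed ISTAT region code).
comuni = gpd.read_file("Com01012025_WGS84.shp")
piedmont = comuni[comuni["COD_REG"] == 1]

# Zonal statistics require vector and raster to share a CRS.
with rasterio.open(RASTER) as src:
    piedmont = piedmont.to_crs(src.crs)

# Sum the population cells falling inside each municipality polygon.
stats = zonal_stats(piedmont, RASTER, stats=["sum"])
piedmont = piedmont.assign(population=[s["sum"] or 0 for s in stats])
print(f"Total population: {piedmont['population'].sum():,.0f}")
```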