# Tables
This document tracks the performance and characteristics of different geospatial software stacks for the use cases defined in the project.
## Use Case 1: Data Ingestion & Preparation

**Goal:** To measure the one-time cost of preparing OSM data (`.pbf`) for analysis, either by loading it into a database or by creating a smaller subset.
| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|
| Load entire `.pbf` into a database | PostGIS + osm2pgsql | `italy-latest.osm.pbf` | • Import Time: ~1840s (~31 min) • Disk Space: 19 GB | Represents the high upfront cost of creating a fully indexed, queryable database. Ideal for many subsequent, fast queries. |
| Extract a regional `.pbf` from a national `.pbf` | osmium-tool | `italy-latest.osm.pbf` | • Execution Time: ~21s | Represents the cost of creating a smaller, more manageable file for file-based workflows. A pre-processing step for tools like PyOsmium. |
(Note: For file-based tools like QuackOSM, the "ingestion" and "filtering" steps happen simultaneously; their performance is measured in Use Case 2.)
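For reference, a minimal sketch of the two preparation paths, driven from Python. The command names and flags are real; the database name and the bounding box are illustrative assumptions.

```python
import subprocess

# Load the full PBF into PostGIS; osm2pgsql creates the planet_osm_* tables.
# The database name "osm_italy" is an illustrative assumption.
subprocess.run(
    ["osm2pgsql", "--create", "--database", "osm_italy", "italy-latest.osm.pbf"],
    check=True,
)

# Alternatively, cut a regional extract with osmium-tool.
# The bounding box (left,bottom,right,top) is a rough value for Lombardy.
subprocess.run(
    ["osmium", "extract", "--bbox", "8.5,44.7,11.4,46.6",
     "--output", "lombardy-latest.osm.pbf", "italy-latest.osm.pbf"],
    check=True,
)
```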
## Use Case 2: Data Filtering & Extraction

**Goal:** To measure the efficiency of extracting a specific subset of data (buildings in Milan) from a larger source, analyzing the impact of system caches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Extract buildings from `.pbf` and save to `.geoparquet` | DuckDB + QuackOSM | Cold Start | `italy-latest.osm.pbf` | • Execution Time: ~1.34s • Output: 6.97 MB | Performance is I/O-bound. This time reflects a true first-time run, limited by reading the large PBF from disk. |
| Extract buildings from `.pbf` and save to `.geoparquet` | DuckDB + QuackOSM | Hot Cache Avg. | `italy-latest.osm.pbf` | • Execution Time: ~0.22s • Runs: 99 | Pure in-memory performance. The ~6x speedup comes from the operating system's file cache, proving the bottleneck is disk I/O, not CPU. |
| Extract buildings from database | PostGIS | Cold Start | `planet_osm_polygon` table | • Execution Time: ~1.10s | Represents a first-time query, limited by reading data pages and indexes from disk into PostgreSQL's internal cache. |
| Extract buildings from database | PostGIS | Hot Cache Avg. | `planet_osm_polygon` table | • Execution Time: ~0.40s • Runs: 99 | Pure database query performance. The ~2.75x speedup is due to the database's internal caching mechanism. |
| Read `.pbf`, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Cold Start | `lombardy-latest.osm.pbf` | • Execution Time: ~358.9s • Output: 7.61 MB | Performance is heavily CPU-bound. This time reflects the high computational cost of the two-pass parsing and geometry construction in Python. |
| Read `.pbf`, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Hot Cache Avg. | `lombardy-latest.osm.pbf` | • Execution Time: ~360.1s • Runs: 49 | Nearly identical to the cold start, confirming the bottleneck is CPU processing, not disk I/O. The OS cache provides minimal benefit. |
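As a reference for the file-based rows above, a minimal sketch of the QuackOSM extraction; `convert_pbf_to_parquet` and its `tags_filter` argument are assumed from QuackOSM's documented API, which may differ between versions.

```python
import quackosm

# Extract building footprints from the national PBF straight to GeoParquet.
result_path = quackosm.convert_pbf_to_parquet(
    "italy-latest.osm.pbf",
    tags_filter={"building": True},  # keep only features tagged as buildings
)
print(result_path)  # path of the generated .geoparquet file
```

And a sketch of the PyOsmium two-pass approach the last two rows describe: node locations are cached (`locations=True`) so that way and relation geometries can be assembled, which is where the CPU time goes.

```python
import osmium
import shapely.wkb
import geopandas as gpd

class BuildingHandler(osmium.SimpleHandler):
    """Collect building footprints as shapely geometries."""
    def __init__(self):
        super().__init__()
        self.factory = osmium.geom.WKBFactory()
        self.geoms = []

    def area(self, a):
        if "building" in a.tags:
            wkb = self.factory.create_multipolygon(a)
            self.geoms.append(shapely.wkb.loads(wkb, hex=True))

handler = BuildingHandler()
# locations=True enables the node-location cache needed to build geometries.
handler.apply_file("lombardy-latest.osm.pbf", locations=True)
gdf = gpd.GeoDataFrame(geometry=handler.geoms, crs="EPSG:4326")
```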
## Use Case 3: Single-Table Geometric Calculations

**Goal:** To evaluate performance on calculations that do not require joins but operate on geometries, scaling across different data sizes.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.028s | Found 10 results. Minimal time difference from the hot run. |
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.014s • Runs: 99 | The ~2x speedup on the small dataset suggests a file metadata caching benefit. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Cold Start | `pinerolo_buildings` table | • Execution Time: ~0.044s | Found 10 results. Performance comparable to the other technologies. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Hot Cache Avg. | `pinerolo_buildings` table | • Execution Time: ~0.017s • Runs: 99 | Significant speedup from database caching on the small dataset. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Cold Start | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.071s | Found 10 results. Slower on cold start due to library overhead. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Hot Cache Avg. | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.053s • Runs: 99 | The slowest of the three on this task, but still very fast. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.117s | Result ~11.0M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.115s • Runs: 99 | The result remains a significant outlier compared to the other technologies. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Cold Start | `pinerolo_buildings` table | • Execution Time: ~0.144s | Result ~7.05M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Hot Cache Avg. | `pinerolo_buildings` table | • Execution Time: ~0.144s • Runs: 99 | The result is consistent with GeoPandas, suggesting it is the correct baseline. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Cold Start | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.174s | Result ~7.05M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Hot Cache Avg. | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.176s • Runs: 99 | The result confirms the PostGIS calculation. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_restaurants.geoparquet` and `pinerolo_bus_stops.geoparquet` | • Execution Time: ~0.025s | Found 16 results. Spatial anti-join operation. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_restaurants.geoparquet` and `pinerolo_bus_stops.geoparquet` | • Execution Time: ~0.007s • Runs: 99 | The result count differs from the other two technologies (PostGIS/GeoPandas). |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Cold Start | `pinerolo_restaurants` and `pinerolo_bus_stops` tables | • Execution Time: ~0.003s | Found 15 results. Extremely fast due to database optimizations. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Hot Cache Avg. | `pinerolo_restaurants` and `pinerolo_bus_stops` tables | • Execution Time: ~0.0008s • Runs: 99 | The fastest technology for this small-scale join. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Cold Start | `pinerolo_restaurants.geoparquet` and `pinerolo_bus_stops.geoparquet` | • Execution Time: ~0.007s | Found 17 results. The count differs from both other technologies. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Hot Cache Avg. | `pinerolo_restaurants.geoparquet` and `pinerolo_bus_stops.geoparquet` | • Execution Time: ~0.004s • Runs: 99 | Very fast, but slower than PostGIS. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Cold Start | `milan_buildings.geoparquet` | • Execution Time: ~0.196s | Found 10 results. The operation is now CPU-bound on a larger dataset. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_buildings.geoparquet` | • Execution Time: ~0.189s • Runs: 99 | Minimal speedup, as the ST_Area calculation on all geometries is the dominant cost. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Cold Start | `milan_buildings` table | • Execution Time: ~0.214s | Found 10 results. Performance is nearly identical to DuckDB. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Hot Cache Avg. | `milan_buildings` table | • Execution Time: ~0.196s • Runs: 99 | Similar CPU-bound behavior. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Cold Start | `milan_buildings.geoparquet` | • Execution Time: ~0.835s | Found 10 results. Significantly slower than the database approaches on a medium dataset. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Hot Cache Avg. | `milan_buildings.geoparquet` | • Execution Time: ~0.869s • Runs: 99 | The performance gap widens on larger data. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Cold Start | `milan_buildings.geoparquet` | • Execution Time: ~1.98s | Result ~178.3M sqm. The numerical result is the outlier. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_buildings.geoparquet` | • Execution Time: ~1.99s • Runs: 99 | Performance is consistently CPU-bound. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Cold Start | `milan_buildings` table | • Execution Time: ~1.37s | Result ~111.5M sqm. Faster than DuckDB and numerically consistent with GeoPandas. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Hot Cache Avg. | `milan_buildings` table | • Execution Time: ~1.34s • Runs: 99 | Consistently CPU-bound behavior. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Cold Start | `milan_buildings.geoparquet` | • Execution Time: ~2.99s | Result ~111.5M sqm. Slower than the databases, but confirms the PostGIS result. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Hot Cache Avg. | `milan_buildings.geoparquet` | • Execution Time: ~2.93s • Runs: 99 | Consistently CPU-bound. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Cold Start | `milan_restaurants.geoparquet` and `milan_bus_stops.geoparquet` | • Execution Time: ~5.27s | Result: 2384. Each technology returns a different count. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_restaurants.geoparquet` and `milan_bus_stops.geoparquet` | • Execution Time: ~5.11s • Runs: 99 | Fast performance, but numerically different from the other technologies. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Cold Start | `milan_restaurants` and `milan_bus_stops` tables | • Execution Time: ~10.53s | Result: 2135. Slower than DuckDB on this operation. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Hot Cache Avg. | `milan_restaurants` and `milan_bus_stops` tables | • Execution Time: ~10.49s • Runs: 99 | The count differs from both GeoPandas and DuckDB, and PostGIS is also very slow on this operation. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Cold Start | `milan_restaurants.geoparquet` and `milan_bus_stops.geoparquet` | • Execution Time: ~0.015s | Result: 2672. Extraordinarily fast; `sjoin_nearest` is highly optimized. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Hot Cache Avg. | `milan_restaurants.geoparquet` and `milan_bus_stops.geoparquet` | • Execution Time: ~0.016s • Runs: 99 | The count differs from both DuckDB and PostGIS; these discrepancies require further investigation. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Cold Start | `rome_buildings.geoparquet` | • Execution Time: ~0.56s | Performance scales predictably. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_buildings.geoparquet` | • Execution Time: ~0.54s • Runs: 99 | Consistently CPU-bound on the largest dataset. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Cold Start | `rome_buildings` table | • Execution Time: ~0.40s | PostGIS becomes faster than DuckDB on the largest dataset for this query. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Hot Cache Avg. | `rome_buildings` table | • Execution Time: ~0.38s • Runs: 99 | Demonstrates better scaling than DuckDB for this operation. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Cold Start | `rome_buildings.geoparquet` | • Execution Time: ~2.52s | The performance gap between GeoPandas and the databases is largest on this dataset. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Hot Cache Avg. | `rome_buildings.geoparquet` | • Execution Time: ~2.63s • Runs: 99 | Confirms that for pure computation on large in-memory data, the database engines are more optimized. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Cold Start | `rome_buildings.geoparquet` | • Execution Time: ~6.08s | Result ~423.0M sqm. The numerical result remains the outlier. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_buildings.geoparquet` | • Execution Time: ~6.27s • Runs: 99 | Performance is stable and CPU-bound. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Cold Start | `rome_buildings` table | • Execution Time: ~2.80s | Result ~299.2M sqm. PostGIS is over 2x faster than DuckDB on this heavy computation. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Hot Cache Avg. | `rome_buildings` table | • Execution Time: ~2.78s • Runs: 99 | Confirms superior performance for complex, single-table aggregations on large data. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Cold Start | `rome_buildings.geoparquet` | • Execution Time: ~9.38s | Result ~299.2M sqm. Numerically consistent with PostGIS but the slowest technology. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Hot Cache Avg. | `rome_buildings.geoparquet` | • Execution Time: ~9.14s • Runs: 99 | The performance gap widens significantly on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Cold Start | `rome_restaurants.geoparquet` and `rome_bus_stops.geoparquet` | • Execution Time: ~22.9s | Result: 2562. The result count still differs across all three technologies. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_restaurants.geoparquet` and `rome_bus_stops.geoparquet` | • Execution Time: ~17.2s • Runs: 99 | The fastest of the database systems for this join. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Cold Start | `rome_restaurants` and `rome_bus_stops` tables | • Execution Time: ~40.1s | Result: 2298. The discrepancies persist. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Hot Cache Avg. | `rome_restaurants` and `rome_bus_stops` tables | • Execution Time: ~40.3s • Runs: 99 | The join is significantly slower than DuckDB on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Cold Start | `rome_restaurants.geoparquet` and `rome_bus_stops.geoparquet` | • Execution Time: ~0.035s | Result: 3069. `sjoin_nearest` is extraordinarily fast, orders of magnitude faster than the databases. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Hot Cache Avg. | `rome_restaurants.geoparquet` and `rome_bus_stops.geoparquet` | • Execution Time: ~0.036s • Runs: 99 | The count still differs, suggesting the method may be incorrect despite the exceptional performance. |
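For context, minimal sketches of the three operations, assuming a DuckDB build that reads GeoParquet geometry natively. The 50 m buffer in 3.2, the 100 m "near" threshold in 3.3, and the projected CRS (EPSG:32632) are illustrative assumptions; differing distance semantics like these are a plausible source of the mismatched counts in 3.3.

```python
import duckdb
import geopandas as gpd

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# 3.1 Top 10 largest building areas (assumes the file is already in a
# metric CRS; otherwise ST_Area returns square degrees).
top10 = con.sql("""
    SELECT ST_Area(geometry) AS area
    FROM 'milan_buildings.geoparquet'
    ORDER BY area DESC
    LIMIT 10
""").df()

# 3.2 Total buffered area: union the buffers before measuring so that
# overlapping buffers are not double-counted.
buffered = con.sql("""
    SELECT ST_Area(ST_Union_Agg(ST_Buffer(geometry, 50))) AS total_area
    FROM 'milan_buildings.geoparquet'
""").df()

# 3.3 Restaurants NOT near bus stops as a GeoPandas anti-join: a left
# sjoin_nearest with max_distance leaves NaN for restaurants that have
# no bus stop within the threshold.
restaurants = gpd.read_parquet("milan_restaurants.geoparquet").to_crs(32632)
stops = gpd.read_parquet("milan_bus_stops.geoparquet").to_crs(32632)
joined = gpd.sjoin_nearest(restaurants, stops, how="left", max_distance=100)
print(len(joined[joined["index_right"].isna()]))
```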
## Use Case 4: Computationally Intensive Joins

**Goal:** To test performance on computationally intensive join operations on large datasets.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_neighborhoods.geoparquet` and `pinerolo_restaurants.geoparquet` | • Execution Time: ~0.008s • Output: 0.13 MB | Found 13 valid polygon neighborhoods. The join is extremely fast on small data. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_neighborhoods.geoparquet` and `pinerolo_restaurants.geoparquet` | • Execution Time: ~0.006s • Runs: 99 | Minimal speedup, suggesting the operation is CPU-bound even on small data. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_hospitals.geoparquet`, `pinerolo_residential_streets.geoparquet` and `pinerolo_trees.geoparquet` | • Execution Time: ~0.067s | Result: 0 trees and ~6.9k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_hospitals.geoparquet`, `pinerolo_residential_streets.geoparquet` and `pinerolo_trees.geoparquet` | • Execution Time: ~0.058s • Runs: 99 | A plausible null result for a small town, confirmed to be CPU-bound. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_parks.geoparquet` and osmnx boundary | • Execution Time: ~0.014s | Result ~105 km². Numerically incorrect due to the oversized boundary from osmnx. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_parks.geoparquet` and osmnx boundary | • Execution Time: ~0.014s • Runs: 99 | Confirms the calculation is based on a flawed input. Performance is CPU-bound. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Cold Start | `milan_neighborhoods.geoparquet` and `milan_restaurants.geoparquet` | • Execution Time: ~0.360s • Output: 0.29 MB | Found 34 valid polygon neighborhoods. Performance scales well. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_neighborhoods.geoparquet` and `milan_restaurants.geoparquet` | • Execution Time: ~0.352s • Runs: 99 | The operation remains strongly CPU-bound on the medium dataset. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Cold Start | `milan_hospitals.geoparquet`, `milan_residential_streets.geoparquet` and `milan_trees.geoparquet` | • Execution Time: ~4.03s | Result: 79 trees and ~26.5k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_hospitals.geoparquet`, `milan_residential_streets.geoparquet` and `milan_trees.geoparquet` | • Execution Time: ~4.01s • Runs: 99 | The heavy computational cost makes the operation CPU-bound. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Cold Start | `milan_parks.geoparquet` and osmnx boundary | • Execution Time: ~6.58s | Result ~349 km². Numerically incorrect. The osmnx boundary for Milan is also oversized. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_parks.geoparquet` and osmnx boundary | • Execution Time: ~6.57s • Runs: 99 | Confirms the data/calculation issue persists. The task is CPU-bound. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Cold Start | `rome_neighborhoods.geoparquet` and `rome_restaurants.geoparquet` | • Execution Time: ~4.78s • Output: 2.24 MB | Found 52 valid polygon neighborhoods. Performance continues to scale predictably. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_neighborhoods.geoparquet` and `rome_restaurants.geoparquet` | • Execution Time: ~4.72s • Runs: 99 | The operation is definitively CPU-bound on the largest dataset. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Cold Start | `rome_hospitals.geoparquet`, `rome_residential_streets.geoparquet` and `rome_trees.geoparquet` | • Execution Time: ~18.37s | Result: 268 trees and ~111k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_hospitals.geoparquet`, `rome_residential_streets.geoparquet` and `rome_trees.geoparquet` | • Execution Time: ~18.61s • Runs: 99 | The heaviest join operation, confirmed to be entirely CPU-bound. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Cold Start | `rome_parks.geoparquet` and osmnx boundary | • Execution Time: ~13.01s | Result ~2,242 km². The numerical result remains incorrect due to the osmnx boundary. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_parks.geoparquet` and osmnx boundary | • Execution Time: ~12.89s • Runs: 99 | This serves as a benchmark for a computationally extreme task, highlighting the impact of input data quality. |
| (Selected Operations) | PostGIS | Test to be done | pinerolo/milan/rome datasets | • Query Time (s) | Test to be done |
| (Selected Operations) | GeoPandas | Test to be done | pinerolo/milan/rome datasets | • Execution Time (s) | Test to be done |
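A sketch of how two of the joins above might be phrased in DuckDB Spatial. The `name` column on the neighborhoods file is an assumption, and the boundary is fetched with osmnx precisely because that is the flawed input flagged in the 4.3 rows.

```python
import duckdb
import osmnx as ox

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# 4.1 Restaurants per neighborhood: point-in-polygon join plus group-by.
counts = con.sql("""
    SELECT n.name, count(r.geometry) AS restaurants
    FROM 'milan_neighborhoods.geoparquet' AS n
    LEFT JOIN 'milan_restaurants.geoparquet' AS r
           ON ST_Contains(n.geometry, r.geometry)
    GROUP BY n.name
""").df()

# 4.3 Area not covered by parks: subtract the unioned parks from the
# city boundary, passed in as WKT (CRS handling omitted for brevity).
boundary_wkt = ox.geocode_to_gdf("Milan, Italy").geometry.iloc[0].wkt
uncovered = con.execute("""
    SELECT ST_Area(ST_Difference(
             ST_GeomFromText(?),
             (SELECT ST_Union_Agg(geometry) FROM 'milan_parks.geoparquet')
           )) AS area_not_parks
""", [boundary_wkt]).df()
```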
## Standard Data Formats

This section tracks the performance of the core technologies on standard geospatial data formats, without the complexity of OSM-specific parsing libraries.
**Goal:** To measure ingestion and filtering performance on a standard Shapefile dataset, analyzing the impact of system caches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Read Shapefile using `ST_Read` | DuckDB Spatial | Cold Start | `Com01012025_WGS84.shp` | • Execution Time: ~0.44s | Found 7896 features. Very fast initial read from disk. |
| Read Shapefile using `ST_Read` | DuckDB Spatial | Hot Cache Avg. | `Com01012025_WGS84.shp` | • Execution Time: ~0.33s • Runs: 99 | The OS file cache provides a noticeable, but smaller (~1.3x), speedup. |
| Load Shapefile into database | PostGIS + shp2pgsql | One-time | `Com01012025_WGS84.shp` | • Execution Time: ~21s | Represents the one-time cost of manual ingestion via the CLI. Significantly slower due to database overhead. |
| Read Shapefile into GeoDataFrame | GeoPandas | Cold Start | `Com01012025_WGS84.shp` | • Execution Time: ~0.83s | Found 7896 features. Performance reflects reading the file from disk. |
| Read Shapefile into GeoDataFrame | GeoPandas | Hot Cache Avg. | `Com01012025_WGS84.shp` | • Execution Time: ~0.34s • Runs: 99 | The ~2.4x speedup is due to the OS file cache. |
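The three read paths, sketched in Python; the SRID passed to shp2pgsql and the database name are assumptions to be adapted to the actual setup.

```python
import subprocess
import duckdb
import geopandas as gpd

# DuckDB Spatial: ST_Read uses GDAL under the hood to read the Shapefile.
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")
n_features = con.sql(
    "SELECT count(*) FROM ST_Read('Com01012025_WGS84.shp')"
).fetchone()[0]

# PostGIS: shp2pgsql emits SQL that psql loads into the comuni_istat table.
# -s sets the SRID (assumed here), -I builds a spatial index after loading.
subprocess.run(
    "shp2pgsql -s 32632 -I Com01012025_WGS84.shp comuni_istat | psql -d gis",
    shell=True, check=True,
)

# GeoPandas: read the whole Shapefile into an in-memory GeoDataFrame.
gdf = gpd.read_file("Com01012025_WGS84.shp")
```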
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Filter table by attribute | DuckDB Spatial | Cold Start | `Com01012025_WGS84.shp` | • Execution Time: ~0.020s • Output: 12.57 MB | Filtered to 1180 features. Very fast in-memory query. |
| Filter table by attribute | DuckDB Spatial | Hot Cache Avg. | `Com01012025_WGS84.shp` | • Execution Time: ~0.019s • Runs: 99 | CPU-bound, with minimal difference between cold and hot runs. |
| Filter table by attribute | PostGIS | Cold Start | `comuni_istat` table | • Execution Time: ~0.22s | Found 1180 features. Represents a query where data is read from disk into the database cache. |
| Filter table by attribute | PostGIS | Hot Cache Avg. | `comuni_istat` table | • Execution Time: ~0.075s • Runs: 99 | The ~2.9x speedup is due to PostgreSQL's internal caching mechanism. |
| Filter GeoDataFrame by attribute | GeoPandas | Cold Start | `Com01012025_WGS84.shp` | • Execution Time: ~0.0010s • Output: 12.58 MB | Filtered to 1180 features. The in-memory operation is extremely fast. |
| Filter GeoDataFrame by attribute | GeoPandas | Hot Cache Avg. | `Com01012025_WGS84.shp` | • Execution Time: ~0.0009s • Runs: 99 | Negligible difference from cold start, confirming the operation is CPU-bound. |
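The attribute filter itself, sketched for the two file-based stacks; the `COD_REG = 1` predicate (ISTAT's region code, with 1 corresponding to Piedmont) is an assumption about which attribute the benchmark filters on.

```python
import duckdb
import geopandas as gpd

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# DuckDB: filter while reading the Shapefile (column name assumed).
piedmont = con.sql("""
    SELECT * FROM ST_Read('Com01012025_WGS84.shp') WHERE COD_REG = 1
""").df()

# GeoPandas: the same filter as an in-memory boolean mask.
gdf = gpd.read_file("Com01012025_WGS84.shp")
piedmont_gdf = gdf[gdf["COD_REG"] == 1]
```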
**Goal:** To measure ingestion and filtering performance on a large-scale raster dataset (GHS-POP), analyzing the impact of system caches and comparing file-based vs. database approaches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Open GeoTIFF file | DuckDB Spatial | N/A | `GHS_POP_ITALY_100m.tif` | • Execution Time: N/A | The technology does not support native raster operations. |
| Load raster into database | PostGIS + raster2pgsql | One-time | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~370s (~6.2 min) | One-time cost for manual ingestion via the CLI. Includes table creation, tiling, and indexing of the entire Italy raster. |
| Open GeoTIFF file | Python (rioxarray) | Cold Start | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~0.15s | Measures the time to read file metadata from disk. |
| Open GeoTIFF file | Python (rioxarray) | Hot Cache Avg. | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~0.03s • Runs: 99 | The ~5.7x speedup is due to OS caching of the file headers. |
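Sketches of the two ingestion paths. For raster2pgsql, `-t` sets the tile size, `-I` indexes the tiles, and `-C` applies raster constraints; the SRID is an assumption and should match the raster's actual CRS.

```python
import subprocess
import rioxarray

# PostGIS: raster2pgsql tiles the GeoTIFF and emits SQL for psql to load.
subprocess.run(
    "raster2pgsql -s 4326 -I -C -t 100x100 GHS_POP_ITALY_100m.tif"
    " ghs_population | psql -d gis",
    shell=True, check=True,
)

# rioxarray: opening the file only reads metadata; pixel data loads lazily,
# which is why the "Open GeoTIFF" timings above are so small.
pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif")
print(pop.rio.crs, pop.shape)
```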
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Clip raster to Milan | DuckDB Spatial | N/A | `GHS_POP_ITALY_100m.tif` | • Execution Time: N/A | The technology does not support native raster operations. |
| Clip raster to Milan | PostGIS | Cold Start | `ghs_population` table | • Execution Time: ~8.12s | Found 1,256,044 people. The query is computationally expensive. |
| Clip raster to Milan | PostGIS | Hot Cache Avg. | `ghs_population` table | • Execution Time: ~7.40s • Runs: 99 | The minimal speedup confirms the operation is CPU-bound (raster processing) rather than I/O-bound. |
| Clip raster to Milan | Python (rioxarray) | Cold Start | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~7.57s • Output: 0.08 MB | Found 1,256,044 people. Performance is highly competitive with PostGIS. |
| Clip raster to Milan | Python (rioxarray) | Hot Cache Avg. | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~7.91s • Runs: 99 | Performance is nearly identical to the cold start, confirming the operation is CPU-bound. |
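A sketch of the clip-and-sum in the Python stack, with the equivalent PostGIS query as a comment. The Milan boundary file is a hypothetical placeholder for however the boundary is obtained.

```python
import geopandas as gpd
import rioxarray

# Clip the national raster to the Milan boundary and sum the pixels.
milan = gpd.read_file("milan_boundary.geojson")  # hypothetical boundary file
pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif", masked=True)
clipped = pop.rio.clip(milan.geometry, crs=milan.crs)
print(float(clipped.sum()))  # total population inside the boundary

# Equivalent PostGIS query (illustrative; table/column names assumed):
# SELECT SUM((ST_SummaryStats(ST_Clip(rast, geom))).sum)
# FROM ghs_population, milan_boundary
# WHERE ST_Intersects(rast, geom);
```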
(Note: `GHS_POP_ITALY_100m.tif` was generated with GDAL's merge tool via the CLI, from the four tiles downloaded from Copernicus that together cover all of Italy.)
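The merge step might look like the following; the tile file names are placeholders for the four Copernicus downloads.

```python
import subprocess

# Merge the four GHS-POP tiles covering Italy into a single GeoTIFF.
subprocess.run(
    ["gdal_merge.py", "-o", "GHS_POP_ITALY_100m.tif",
     "ghs_pop_tile_1.tif", "ghs_pop_tile_2.tif",
     "ghs_pop_tile_3.tif", "ghs_pop_tile_4.tif"],
    check=True,
)
```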
**Goal:** To evaluate the performance of different technologies for calculating raster statistics based on vector zones.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Calculate Population per Municipality (Piedmont) | PostGIS | Cold Start | `comuni_istat_clean` and `ghs_population` tables | • Execution Time: ~2.50s • GPKG Loading Time: ~17.87s | Calculated a total population of 4,222,758 for 1180 municipalities. The result is numerically identical to the Python stack, although an additional script is needed to convert the data to a GPKG file and load it into the database, which adds considerable time (see GPKG Loading Time). |
| Calculate Population per Municipality (Piedmont) | PostGIS | Hot Cache Avg. | `comuni_istat_clean` and `ghs_population` tables | • Execution Time: ~2.43s • Runs: 99 | The operation is CPU-bound. PostGIS is ~5.5x faster than the Python stack for this task. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Cold Start | `Com01012025_WGS84.shp` and `GHS_POP_ITALY_100m.tif` | • Execution Time: ~13.97s | Calculated a total population of 4,222,758 for 1180 municipalities. The result is numerically identical to PostGIS. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Hot Cache Avg. | `Com01012025_WGS84.shp` and `GHS_POP_ITALY_100m.tif` | • Execution Time: ~13.45s • Runs: 99 | The operation is also CPU-bound. The high-level libraries introduce a significant performance overhead compared to the database. |
(Note: DuckDB still does not support native raster operations.)