# Tables
This document tracks the performance and characteristics of different geospatial software stacks for the use cases defined in the project.
## Use Case 1: Data Ingestion & Preparation

**Goal:** To measure the one-time cost of preparing OSM data (`.pbf`) for analysis, either by loading it into a database or by creating a smaller subset.
| Operation Performed | Technology | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|
| Load entire `.pbf` into a database | PostGIS + osm2pgsql | `italy-latest.osm.pbf` | • Import Time: ~1840s (~31 min) • Disk Space: 19 GB | Represents the high upfront cost of creating a fully indexed, queryable database. Ideal for many subsequent, fast queries. |
| Extract a regional `.pbf` from a national `.pbf` | osmium-tool | `italy-latest.osm.pbf` | • Execution Time: ~21s | Represents the cost of creating a smaller, more manageable file for file-based workflows. A pre-processing step for tools like PyOsmium. |
(Note: For file-based tools like QuackOSM, the "ingestion" and "filtering" steps happen simultaneously; their performance is measured in Use Case 2.)
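For reference, a minimal sketch of the two preparation paths, driven from Python. The command names and flags are real; the database name and the bounding box are illustrative assumptions.

```python
import subprocess

# Load the full PBF into PostGIS; osm2pgsql creates the planet_osm_* tables.
# The database name "osm_italy" is an illustrative assumption.
subprocess.run(
    ["osm2pgsql", "--create", "--database", "osm_italy", "italy-latest.osm.pbf"],
    check=True,
)

# Alternatively, cut a regional extract with osmium-tool.
# The bounding box (left,bottom,right,top) is a rough value for Lombardy.
subprocess.run(
    ["osmium", "extract", "--bbox", "8.5,44.7,11.4,46.6",
     "--output", "lombardy-latest.osm.pbf", "italy-latest.osm.pbf"],
    check=True,
)
```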
## Use Case 2: Data Filtering & Extraction

**Goal:** To measure the efficiency of extracting a specific subset of data (buildings in Milan) from a larger source, analyzing the impact of system caches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Extract buildings from `.pbf` and save to `.geoparquet` | DuckDB + QuackOSM | Cold Start | `italy-latest.osm.pbf` | • Execution Time: ~1.34s • Output: 6.97 MB | Performance is I/O-bound. This time reflects a true first-time run, limited by reading the large PBF from disk. |
| Extract buildings from `.pbf` and save to `.geoparquet` | DuckDB + QuackOSM | Hot Cache Avg. | `italy-latest.osm.pbf` | • Execution Time: ~0.22s • Runs: 99 | Pure in-memory performance. The ~6x speedup comes from the operating system's file cache, proving the bottleneck is disk I/O, not CPU. |
| Extract buildings from database | PostGIS | Cold Start | `planet_osm_polygon` table | • Execution Time: ~1.10s | Represents a first-time query, limited by reading data pages and indexes from disk into PostgreSQL's internal cache. |
| Extract buildings from database | PostGIS | Hot Cache Avg. | `planet_osm_polygon` table | • Execution Time: ~0.40s • Runs: 99 | Pure database query performance. The ~2.75x speedup is due to the database's internal caching mechanism. |
| Read `.pbf`, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Cold Start | `lombardy-latest.osm.pbf` | • Execution Time: ~358.9s • Output: 7.61 MB | Performance is heavily CPU-bound. This time reflects the high computational cost of the two-pass parsing and geometry construction in Python. |
| Read `.pbf`, build GeoDataFrame and filter by area | PyOsmium + GeoPandas | Hot Cache Avg. | `lombardy-latest.osm.pbf` | • Execution Time: ~360.1s • Runs: 49 | Nearly identical to the cold start, confirming the bottleneck is CPU processing, not disk I/O. The OS cache provides minimal benefit. |
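As a reference for the file-based rows above, a minimal sketch of the QuackOSM extraction; `convert_pbf_to_parquet` and its `tags_filter` argument are assumed from QuackOSM's documented API, which may differ between versions.

```python
import quackosm

# Extract building footprints from the national PBF straight to GeoParquet.
result_path = quackosm.convert_pbf_to_parquet(
    "italy-latest.osm.pbf",
    tags_filter={"building": True},  # keep only features tagged as buildings
)
print(result_path)  # path of the generated .geoparquet file
```

And a sketch of the PyOsmium two-pass approach the last two rows describe: node locations are cached (`locations=True`) so that way and relation geometries can be assembled, which is where the CPU time goes.

```python
import osmium
import shapely.wkb
import geopandas as gpd

class BuildingHandler(osmium.SimpleHandler):
    """Collect building footprints as shapely geometries."""
    def __init__(self):
        super().__init__()
        self.factory = osmium.geom.WKBFactory()
        self.geoms = []

    def area(self, a):
        if "building" in a.tags:
            wkb = self.factory.create_multipolygon(a)
            self.geoms.append(shapely.wkb.loads(wkb, hex=True))

handler = BuildingHandler()
# locations=True enables the node-location cache needed to build geometries.
handler.apply_file("lombardy-latest.osm.pbf", locations=True)
gdf = gpd.GeoDataFrame(geometry=handler.geoms, crs="EPSG:4326")
```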
## Use Case 3: Single-Table Geometric Calculations

**Goal:** To evaluate performance on calculations that do not require joins but operate on geometries, scaling across different data sizes.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.028s | Found 10 results. Minimal time difference from the hot run. |
| 3.1. Top 10 Largest Areas (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.014s • Runs: 99 | The ~2x speedup on the small dataset suggests a file metadata caching benefit. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Cold Start | `pinerolo_buildings` table | • Execution Time: ~0.044s | Found 10 results. Performance comparable to the other technologies. |
| 3.1. Top 10 Largest Areas (Pinerolo) | PostGIS | Hot Cache Avg. | `pinerolo_buildings` table | • Execution Time: ~0.017s • Runs: 99 | Significant speedup from database caching on the small dataset. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Cold Start | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.071s | Found 10 results. Slower on cold start due to library overhead. |
| 3.1. Top 10 Largest Areas (Pinerolo) | GeoPandas | Hot Cache Avg. | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.053s • Runs: 99 | The slowest of the three on this task, but still very fast. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.117s | Result ~11.0M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.115s • Runs: 99 | The result remains a significant outlier compared to the other technologies. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Cold Start | `pinerolo_buildings` table | • Execution Time: ~0.144s | Result ~7.05M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | PostGIS | Hot Cache Avg. | `pinerolo_buildings` table | • Execution Time: ~0.144s • Runs: 99 | The result is consistent with GeoPandas, suggesting it is the correct baseline. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Cold Start | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.174s | Result ~7.05M sqm. Performance is CPU-bound. |
| 3.2. Total Buffered Area (Pinerolo) | GeoPandas | Hot Cache Avg. | `pinerolo_buildings.geoparquet` | • Execution Time: ~0.176s • Runs: 99 | The result confirms the PostGIS calculation. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_restaurants.geoparquet` and `pinerolo_bus_stops.geoparquet` | • Execution Time: ~0.025s | Found 16 results. Spatial anti-join operation. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_restaurants.geoparquet` and `pinerolo_bus_stops.geoparquet` | • Execution Time: ~0.007s • Runs: 99 | The result count differs from the other two technologies (PostGIS/GeoPandas). |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Cold Start | `pinerolo_restaurants` and `pinerolo_bus_stops` tables | • Execution Time: ~0.003s | Found 15 results. Extremely fast due to database optimizations. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | PostGIS | Hot Cache Avg. | `pinerolo_restaurants` and `pinerolo_bus_stops` tables | • Execution Time: ~0.0008s • Runs: 99 | The fastest technology for this small-scale join. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Cold Start | `pinerolo_restaurants.geoparquet` and `pinerolo_bus_stops.geoparquet` | • Execution Time: ~0.007s | Found 17 results. The count differs from both other technologies. |
| 3.3. Restaurants NOT near Bus Stops (Pinerolo) | GeoPandas | Hot Cache Avg. | `pinerolo_restaurants.geoparquet` and `pinerolo_bus_stops.geoparquet` | • Execution Time: ~0.004s • Runs: 99 | Very fast, but slower than PostGIS. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Cold Start | `milan_buildings.geoparquet` | • Execution Time: ~0.196s | Found 10 results. The operation is now CPU-bound on a larger dataset. |
| 3.1. Top 10 Largest Areas (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_buildings.geoparquet` | • Execution Time: ~0.189s • Runs: 99 | Minimal speedup, as the ST_Area calculation on all geometries is the dominant cost. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Cold Start | `milan_buildings` table | • Execution Time: ~0.214s | Found 10 results. Performance is nearly identical to DuckDB. |
| 3.1. Top 10 Largest Areas (Milan) | PostGIS | Hot Cache Avg. | `milan_buildings` table | • Execution Time: ~0.196s • Runs: 99 | Similar CPU-bound behavior. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Cold Start | `milan_buildings.geoparquet` | • Execution Time: ~0.835s | Found 10 results. Significantly slower than the database approaches on a medium dataset. |
| 3.1. Top 10 Largest Areas (Milan) | GeoPandas | Hot Cache Avg. | `milan_buildings.geoparquet` | • Execution Time: ~0.869s • Runs: 99 | The performance gap widens on larger data. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Cold Start | `milan_buildings.geoparquet` | • Execution Time: ~1.98s | Result ~178.3M sqm. The numerical result is the outlier. |
| 3.2. Total Buffered Area (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_buildings.geoparquet` | • Execution Time: ~1.99s • Runs: 99 | Performance is consistently CPU-bound. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Cold Start | `milan_buildings` table | • Execution Time: ~1.37s | Result ~111.5M sqm. Faster than DuckDB and numerically consistent with GeoPandas. |
| 3.2. Total Buffered Area (Milan) | PostGIS | Hot Cache Avg. | `milan_buildings` table | • Execution Time: ~1.34s • Runs: 99 | Consistently CPU-bound behavior. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Cold Start | `milan_buildings.geoparquet` | • Execution Time: ~2.99s | Result ~111.5M sqm. Slower than the databases, but confirms the PostGIS result. |
| 3.2. Total Buffered Area (Milan) | GeoPandas | Hot Cache Avg. | `milan_buildings.geoparquet` | • Execution Time: ~2.93s • Runs: 99 | Consistently CPU-bound. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Cold Start | `milan_restaurants.geoparquet` and `milan_bus_stops.geoparquet` | • Execution Time: ~5.27s | Result: 2384. Each technology returns a different count. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_restaurants.geoparquet` and `milan_bus_stops.geoparquet` | • Execution Time: ~5.11s • Runs: 99 | Fast performance, but numerically different from the other technologies. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Cold Start | `milan_restaurants` and `milan_bus_stops` tables | • Execution Time: ~10.53s | Result: 2135. Slower than DuckDB on this operation. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | PostGIS | Hot Cache Avg. | `milan_restaurants` and `milan_bus_stops` tables | • Execution Time: ~10.49s • Runs: 99 | The count differs from both GeoPandas and DuckDB, and PostGIS is also very slow on this operation. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Cold Start | `milan_restaurants.geoparquet` and `milan_bus_stops.geoparquet` | • Execution Time: ~0.015s | Result: 2672. Extraordinarily fast; `sjoin_nearest` is highly optimized. |
| 3.3. Restaurants NOT near Bus Stops (Milan) | GeoPandas | Hot Cache Avg. | `milan_restaurants.geoparquet` and `milan_bus_stops.geoparquet` | • Execution Time: ~0.016s • Runs: 99 | The count differs from both DuckDB and PostGIS; these discrepancies require further investigation. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Cold Start | `rome_buildings.geoparquet` | • Execution Time: ~0.56s | Performance scales predictably. |
| 3.1. Top 10 Largest Areas (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_buildings.geoparquet` | • Execution Time: ~0.54s • Runs: 99 | Consistently CPU-bound on the largest dataset. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Cold Start | `rome_buildings` table | • Execution Time: ~0.40s | PostGIS becomes faster than DuckDB on the largest dataset for this query. |
| 3.1. Top 10 Largest Areas (Rome) | PostGIS | Hot Cache Avg. | `rome_buildings` table | • Execution Time: ~0.38s • Runs: 99 | Demonstrates better scaling than DuckDB for this operation. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Cold Start | `rome_buildings.geoparquet` | • Execution Time: ~2.52s | The performance gap between GeoPandas and the databases is largest on this dataset. |
| 3.1. Top 10 Largest Areas (Rome) | GeoPandas | Hot Cache Avg. | `rome_buildings.geoparquet` | • Execution Time: ~2.63s • Runs: 99 | Confirms that for pure computation on large in-memory data, the database engines are more optimized. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Cold Start | `rome_buildings.geoparquet` | • Execution Time: ~6.08s | Result ~423.0M sqm. The numerical result remains the outlier. |
| 3.2. Total Buffered Area (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_buildings.geoparquet` | • Execution Time: ~6.27s • Runs: 99 | Performance is stable and CPU-bound. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Cold Start | `rome_buildings` table | • Execution Time: ~2.80s | Result ~299.2M sqm. PostGIS is over 2x faster than DuckDB on this heavy computation. |
| 3.2. Total Buffered Area (Rome) | PostGIS | Hot Cache Avg. | `rome_buildings` table | • Execution Time: ~2.78s • Runs: 99 | Confirms superior performance for complex, single-table aggregations on large data. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Cold Start | `rome_buildings.geoparquet` | • Execution Time: ~9.38s | Result ~299.2M sqm. Numerically consistent with PostGIS but the slowest technology. |
| 3.2. Total Buffered Area (Rome) | GeoPandas | Hot Cache Avg. | `rome_buildings.geoparquet` | • Execution Time: ~9.14s • Runs: 99 | The performance gap widens significantly on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Cold Start | `rome_restaurants.geoparquet` and `rome_bus_stops.geoparquet` | • Execution Time: ~22.9s | Result: 2562. The result count still differs across all three technologies. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_restaurants.geoparquet` and `rome_bus_stops.geoparquet` | • Execution Time: ~17.2s • Runs: 99 | The fastest of the database systems for this join. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Cold Start | `rome_restaurants` and `rome_bus_stops` tables | • Execution Time: ~40.1s | Result: 2298. The discrepancies persist. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | PostGIS | Hot Cache Avg. | `rome_restaurants` and `rome_bus_stops` tables | • Execution Time: ~40.3s • Runs: 99 | The join is significantly slower than DuckDB on the largest dataset. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Cold Start | `rome_restaurants.geoparquet` and `rome_bus_stops.geoparquet` | • Execution Time: ~0.035s | Result: 3069. `sjoin_nearest` is extraordinarily fast, orders of magnitude faster than the databases. |
| 3.3. Restaurants NOT near Bus Stops (Rome) | GeoPandas | Hot Cache Avg. | `rome_restaurants.geoparquet` and `rome_bus_stops.geoparquet` | • Execution Time: ~0.036s • Runs: 99 | The count still differs, suggesting the method may be incorrect despite the exceptional performance. |
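For context, minimal sketches of the three operations, assuming a DuckDB build that reads GeoParquet geometry natively. The 50 m buffer in 3.2, the 100 m "near" threshold in 3.3, and the projected CRS (EPSG:32632) are illustrative assumptions; differing distance semantics like these are a plausible source of the mismatched counts in 3.3.

```python
import duckdb
import geopandas as gpd

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# 3.1 Top 10 largest building areas (assumes the file is already in a
# metric CRS; otherwise ST_Area returns square degrees).
top10 = con.sql("""
    SELECT ST_Area(geometry) AS area
    FROM 'milan_buildings.geoparquet'
    ORDER BY area DESC
    LIMIT 10
""").df()

# 3.2 Total buffered area: union the buffers before measuring so that
# overlapping buffers are not double-counted.
buffered = con.sql("""
    SELECT ST_Area(ST_Union_Agg(ST_Buffer(geometry, 50))) AS total_area
    FROM 'milan_buildings.geoparquet'
""").df()

# 3.3 Restaurants NOT near bus stops as a GeoPandas anti-join: a left
# sjoin_nearest with max_distance leaves NaN for restaurants that have
# no bus stop within the threshold.
restaurants = gpd.read_parquet("milan_restaurants.geoparquet").to_crs(32632)
stops = gpd.read_parquet("milan_bus_stops.geoparquet").to_crs(32632)
joined = gpd.sjoin_nearest(restaurants, stops, how="left", max_distance=100)
print(len(joined[joined["index_right"].isna()]))
```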
## Use Case 4: Computationally Intensive Joins

**Goal:** To test performance on computationally intensive join operations on large datasets.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_neighborhoods.geoparquet` and `pinerolo_restaurants.geoparquet` | • Execution Time: ~0.008s • Output: 0.13 MB | Found 13 valid polygon neighborhoods. The join is extremely fast on small data. |
| 4.1. Restaurants per Neighborhood (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_neighborhoods.geoparquet` and `pinerolo_restaurants.geoparquet` | • Execution Time: ~0.006s • Runs: 99 | Minimal speedup, suggesting the operation is CPU-bound even on small data. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_hospitals.geoparquet`, `pinerolo_residential_streets.geoparquet` and `pinerolo_trees.geoparquet` | • Execution Time: ~0.067s | Result: 0 trees and ~6.9k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_hospitals.geoparquet`, `pinerolo_residential_streets.geoparquet` and `pinerolo_trees.geoparquet` | • Execution Time: ~0.058s • Runs: 99 | A plausible null result for a small town, confirmed to be CPU-bound. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Cold Start | `pinerolo_parks.geoparquet` and osmnx boundary | • Execution Time: ~0.014s | Result ~105 km². Numerically incorrect due to the oversized boundary from osmnx. |
| 4.3. Area Not Covered by Parks (Pinerolo) | DuckDB Spatial | Hot Cache Avg. | `pinerolo_parks.geoparquet` and osmnx boundary | • Execution Time: ~0.014s • Runs: 99 | Confirms the calculation is based on a flawed input. Performance is CPU-bound. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Cold Start | `milan_neighborhoods.geoparquet` and `milan_restaurants.geoparquet` | • Execution Time: ~0.360s • Output: 0.29 MB | Found 34 valid polygon neighborhoods. Performance scales well. |
| 4.1. Restaurants per Neighborhood (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_neighborhoods.geoparquet` and `milan_restaurants.geoparquet` | • Execution Time: ~0.352s • Runs: 99 | The operation remains strongly CPU-bound on the medium dataset. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Cold Start | `milan_hospitals.geoparquet`, `milan_residential_streets.geoparquet` and `milan_trees.geoparquet` | • Execution Time: ~4.03s | Result: 79 trees and ~26.5k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_hospitals.geoparquet`, `milan_residential_streets.geoparquet` and `milan_trees.geoparquet` | • Execution Time: ~4.01s • Runs: 99 | The heavy computational cost makes the operation CPU-bound. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Cold Start | `milan_parks.geoparquet` and osmnx boundary | • Execution Time: ~6.58s | Result ~349 km². Numerically incorrect. The osmnx boundary for Milan is also oversized. |
| 4.3. Area Not Covered by Parks (Milan) | DuckDB Spatial | Hot Cache Avg. | `milan_parks.geoparquet` and osmnx boundary | • Execution Time: ~6.57s • Runs: 99 | Confirms the data/calculation issue persists. The task is CPU-bound. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Cold Start | `rome_neighborhoods.geoparquet` and `rome_restaurants.geoparquet` | • Execution Time: ~4.78s • Output: 2.24 MB | Found 52 valid polygon neighborhoods. Performance continues to scale predictably. |
| 4.1. Restaurants per Neighborhood (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_neighborhoods.geoparquet` and `rome_restaurants.geoparquet` | • Execution Time: ~4.72s • Runs: 99 | The operation is definitively CPU-bound on the largest dataset. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Cold Start | `rome_hospitals.geoparquet`, `rome_residential_streets.geoparquet` and `rome_trees.geoparquet` | • Execution Time: ~18.37s | Result: 268 trees and ~111k meters of streets. |
| 4.2. Tree Count and Length Sum of Residential Roads Near Hospitals (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_hospitals.geoparquet`, `rome_residential_streets.geoparquet` and `rome_trees.geoparquet` | • Execution Time: ~18.61s • Runs: 99 | The heaviest join operation, confirmed to be entirely CPU-bound. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Cold Start | `rome_parks.geoparquet` and osmnx boundary | • Execution Time: ~13.01s | Result ~2,242 km². The numerical result remains incorrect due to the osmnx boundary. |
| 4.3. Area Not Covered by Parks (Rome) | DuckDB Spatial | Hot Cache Avg. | `rome_parks.geoparquet` and osmnx boundary | • Execution Time: ~12.89s • Runs: 99 | This serves as a benchmark for a computationally extreme task, highlighting the impact of input data quality. |
| (Selected Operations) | PostGIS | Test to be done | pinerolo/milan/rome datasets | • Query Time (s) | Test to be done |
| (Selected Operations) | GeoPandas | Test to be done | pinerolo/milan/rome datasets | • Execution Time (s) | Test to be done |
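A sketch of how two of the joins above might be phrased in DuckDB Spatial. The `name` column on the neighborhoods file is an assumption, and the boundary is fetched with osmnx precisely because that is the flawed input flagged in the 4.3 rows.

```python
import duckdb
import osmnx as ox

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# 4.1 Restaurants per neighborhood: point-in-polygon join plus group-by.
counts = con.sql("""
    SELECT n.name, count(r.geometry) AS restaurants
    FROM 'milan_neighborhoods.geoparquet' AS n
    LEFT JOIN 'milan_restaurants.geoparquet' AS r
           ON ST_Contains(n.geometry, r.geometry)
    GROUP BY n.name
""").df()

# 4.3 Area not covered by parks: subtract the unioned parks from the
# city boundary, passed in as WKT (CRS handling omitted for brevity).
boundary_wkt = ox.geocode_to_gdf("Milan, Italy").geometry.iloc[0].wkt
uncovered = con.execute("""
    SELECT ST_Area(ST_Difference(
             ST_GeomFromText(?),
             (SELECT ST_Union_Agg(geometry) FROM 'milan_parks.geoparquet')
           )) AS area_not_parks
""", [boundary_wkt]).df()
```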
## Standard Data Formats

This section tracks the performance of the core technologies on standard geospatial data formats, without the complexity of OSM-specific parsing libraries.
**Goal:** To measure ingestion and filtering performance on a standard Shapefile dataset, analyzing the impact of system caches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Read Shapefile using `ST_Read` | DuckDB Spatial | Cold Start | `Com01012025_WGS84.shp` | • Execution Time: ~0.44s | Found 7896 features. Very fast initial read from disk. |
| Read Shapefile using `ST_Read` | DuckDB Spatial | Hot Cache Avg. | `Com01012025_WGS84.shp` | • Execution Time: ~0.33s • Runs: 99 | The OS file cache provides a noticeable, but smaller (~1.3x), speedup. |
| Load Shapefile into database | PostGIS + shp2pgsql | One-time | `Com01012025_WGS84.shp` | • Execution Time: ~21s | Represents the one-time cost of manual ingestion via the CLI. Significantly slower due to database overhead. |
| Read Shapefile into GeoDataFrame | GeoPandas | Cold Start | `Com01012025_WGS84.shp` | • Execution Time: ~0.83s | Found 7896 features. Performance reflects reading the file from disk. |
| Read Shapefile into GeoDataFrame | GeoPandas | Hot Cache Avg. | `Com01012025_WGS84.shp` | • Execution Time: ~0.34s • Runs: 99 | The ~2.4x speedup is due to the OS file cache. |
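The three read paths, sketched in Python; the SRID passed to shp2pgsql and the database name are assumptions to be adapted to the actual setup.

```python
import subprocess
import duckdb
import geopandas as gpd

# DuckDB Spatial: ST_Read uses GDAL under the hood to read the Shapefile.
con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")
n_features = con.sql(
    "SELECT count(*) FROM ST_Read('Com01012025_WGS84.shp')"
).fetchone()[0]

# PostGIS: shp2pgsql emits SQL that psql loads into the comuni_istat table.
# -s sets the SRID (assumed here), -I builds a spatial index after loading.
subprocess.run(
    "shp2pgsql -s 32632 -I Com01012025_WGS84.shp comuni_istat | psql -d gis",
    shell=True, check=True,
)

# GeoPandas: read the whole Shapefile into an in-memory GeoDataFrame.
gdf = gpd.read_file("Com01012025_WGS84.shp")
```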
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Filter table by attribute | DuckDB Spatial | Cold Start | `Com01012025_WGS84.shp` | • Execution Time: ~0.020s • Output: 12.57 MB | Filtered to 1180 features. Very fast in-memory query. |
| Filter table by attribute | DuckDB Spatial | Hot Cache Avg. | `Com01012025_WGS84.shp` | • Execution Time: ~0.019s • Runs: 99 | CPU-bound, with minimal difference between cold and hot runs. |
| Filter table by attribute | PostGIS | Cold Start | `comuni_istat` table | • Execution Time: ~0.22s | Found 1180 features. Represents a query where data is read from disk into the database cache. |
| Filter table by attribute | PostGIS | Hot Cache Avg. | `comuni_istat` table | • Execution Time: ~0.075s • Runs: 99 | The ~2.9x speedup is due to PostgreSQL's internal caching mechanism. |
| Filter GeoDataFrame by attribute | GeoPandas | Cold Start | `Com01012025_WGS84.shp` | • Execution Time: ~0.0010s • Output: 12.58 MB | Filtered to 1180 features. The in-memory operation is extremely fast. |
| Filter GeoDataFrame by attribute | GeoPandas | Hot Cache Avg. | `Com01012025_WGS84.shp` | • Execution Time: ~0.0009s • Runs: 99 | Negligible difference from cold start, confirming the operation is CPU-bound. |
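The attribute filter itself, sketched for the two file-based stacks; the `COD_REG = 1` predicate (ISTAT's region code, with 1 corresponding to Piedmont) is an assumption about which attribute the benchmark filters on.

```python
import duckdb
import geopandas as gpd

con = duckdb.connect()
con.install_extension("spatial")
con.load_extension("spatial")

# DuckDB: filter while reading the Shapefile (column name assumed).
piedmont = con.sql("""
    SELECT * FROM ST_Read('Com01012025_WGS84.shp') WHERE COD_REG = 1
""").df()

# GeoPandas: the same filter as an in-memory boolean mask.
gdf = gpd.read_file("Com01012025_WGS84.shp")
piedmont_gdf = gdf[gdf["COD_REG"] == 1]
```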
**Goal:** To measure ingestion and filtering performance on a large-scale raster dataset (GHS-POP), analyzing the impact of system caches and comparing file-based vs. database approaches.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Open GeoTIFF file | DuckDB Spatial | N/A | `GHS_POP_ITALY_100m.tif` | • Execution Time: N/A | The technology does not support native raster operations. |
| Load raster into database | PostGIS + raster2pgsql | One-time | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~370s (~6.2 min) | One-time cost for manual ingestion via the CLI. Includes table creation, tiling, and indexing of the entire Italy raster. |
| Open GeoTIFF file | Python (rioxarray) | Cold Start | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~0.15s | Measures the time to read file metadata from disk. |
| Open GeoTIFF file | Python (rioxarray) | Hot Cache Avg. | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~0.03s • Runs: 99 | The ~5.7x speedup is due to OS caching of the file headers. |
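Sketches of the two ingestion paths. For raster2pgsql, `-t` sets the tile size, `-I` indexes the tiles, and `-C` applies raster constraints; the SRID is an assumption and should match the raster's actual CRS.

```python
import subprocess
import rioxarray

# PostGIS: raster2pgsql tiles the GeoTIFF and emits SQL for psql to load.
subprocess.run(
    "raster2pgsql -s 4326 -I -C -t 100x100 GHS_POP_ITALY_100m.tif"
    " ghs_population | psql -d gis",
    shell=True, check=True,
)

# rioxarray: opening the file only reads metadata; pixel data loads lazily,
# which is why the "Open GeoTIFF" timings above are so small.
pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif")
print(pop.rio.crs, pop.shape)
```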
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Clip raster to Milan | DuckDB Spatial | N/A | `GHS_POP_ITALY_100m.tif` | • Execution Time: N/A | The technology does not support native raster operations. |
| Clip raster to Milan | PostGIS | Cold Start | `ghs_population` table | • Execution Time: ~8.12s | Found 1,256,044 people. The query is computationally expensive. |
| Clip raster to Milan | PostGIS | Hot Cache Avg. | `ghs_population` table | • Execution Time: ~7.40s • Runs: 99 | The minimal speedup confirms the operation is CPU-bound (raster processing) rather than I/O-bound. |
| Clip raster to Milan | Python (rioxarray) | Cold Start | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~7.57s • Output: 0.08 MB | Found 1,256,044 people. Performance is highly competitive with PostGIS. |
| Clip raster to Milan | Python (rioxarray) | Hot Cache Avg. | `GHS_POP_ITALY_100m.tif` | • Execution Time: ~7.91s • Runs: 99 | Performance is nearly identical to the cold start, confirming the operation is CPU-bound. |
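A sketch of the clip-and-sum in the Python stack, with the equivalent PostGIS query as a comment. The Milan boundary file is a hypothetical placeholder for however the boundary is obtained.

```python
import geopandas as gpd
import rioxarray

# Clip the national raster to the Milan boundary and sum the pixels.
milan = gpd.read_file("milan_boundary.geojson")  # hypothetical boundary file
pop = rioxarray.open_rasterio("GHS_POP_ITALY_100m.tif", masked=True)
clipped = pop.rio.clip(milan.geometry, crs=milan.crs)
print(float(clipped.sum()))  # total population inside the boundary

# Equivalent PostGIS query (illustrative; table/column names assumed):
# SELECT SUM((ST_SummaryStats(ST_Clip(rast, geom))).sum)
# FROM ghs_population, milan_boundary
# WHERE ST_Intersects(rast, geom);
```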
(Note: `GHS_POP_ITALY_100m.tif` was generated with GDAL's merge tool via the CLI, from the four tiles downloaded from Copernicus that together cover all of Italy.)
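The merge step might look like the following; the tile file names are placeholders for the four Copernicus downloads.

```python
import subprocess

# Merge the four GHS-POP tiles covering Italy into a single GeoTIFF.
subprocess.run(
    ["gdal_merge.py", "-o", "GHS_POP_ITALY_100m.tif",
     "ghs_pop_tile_1.tif", "ghs_pop_tile_2.tif",
     "ghs_pop_tile_3.tif", "ghs_pop_tile_4.tif"],
    check=True,
)
```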
**Goal:** To evaluate the performance of different technologies for calculating raster statistics based on vector zones.
| Operation Performed | Technology | Run Type | Test Dataset | Key Metrics | Notes / Qualitative Observations |
|---|---|---|---|---|---|
| Calculate Population per Municipality (Piedmont) | PostGIS | Cold Start | `comuni_istat_clean` and `ghs_population` tables | • Execution Time: ~2.50s • GPKG Loading Time: ~17.87s | Calculated a total population of 4,222,758 for 1180 municipalities. The result is numerically identical to the Python stack, although an additional script is needed to convert the data to a GPKG file and load it into the database, which adds considerable time (see GPKG Loading Time). |
| Calculate Population per Municipality (Piedmont) | PostGIS | Hot Cache Avg. | `comuni_istat_clean` and `ghs_population` tables | • Execution Time: ~2.43s • Runs: 99 | The operation is CPU-bound. PostGIS is ~5.5x faster than the Python stack for this task. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Cold Start | `Com01012025_WGS84.shp` and `GHS_POP_ITALY_100m.tif` | • Execution Time: ~13.97s | Calculated a total population of 4,222,758 for 1180 municipalities. The result is numerically identical to PostGIS. |
| Calculate Population per Municipality (Piedmont) | Python (rasterstats) | Hot Cache Avg. | `Com01012025_WGS84.shp` and `GHS_POP_ITALY_100m.tif` | • Execution Time: ~13.45s • Runs: 99 | The operation is also CPU-bound. The high-level libraries introduce a significant performance overhead compared to the database. |
(Note: DuckDB still does not support native raster operations.)