This repository contains a collection of hands-on demonstrations for modern data engineering on Google Cloud.
All demos in this repository require the following:
- A Google Cloud Platform (GCP) project with billing enabled.
- The Google Cloud SDK (gcloud CLI) installed and authenticated to your project.
- For Python-based demos, uv is used for package management. See individual demo guides for details.
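
As a quick sanity check of the tooling prerequisites above, the following minimal sketch reports which command-line tools are missing from your `PATH`. It assumes the executables are named `gcloud` and `uv`. The helper name `missing_tools` is hypothetical and not part of this repository. The check covers installation only; it does not verify authentication or billing.

```python
import shutil


def missing_tools(tools=("gcloud", "uv")):
    """Return the names in `tools` that cannot be found on PATH.

    The default tool names are an assumption based on the
    prerequisites listed above.
    """
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("gcloud and uv were found on PATH.")
```

If a tool is reported missing, install it before following the individual demo guides.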

The demos included are:
- BigQuery & Iceberg Open Lakehouse
  - Demonstrates a multi-engine lakehouse using BigQuery and Spark on a single Iceberg table.
- Data Engineering Agent Introduction
  - Shows how to use a conversational AI agent to build a data pipeline from natural language prompts.
- Dataproc Serverless Performance Benchmark
  - Compares the performance and cost-efficiency of the Dataproc Serverless Premium Tier against the Standard Tier on a complex, shuffle-intensive workload.
-
  - Demonstrates how to enforce fine-grained data access for multiple users on a single, shared Dataproc cluster using service account-based security.