A Rust crate to quickly build replication solutions for Postgres. Build data pipelines which continually copy data from Postgres to other systems.
This crate builds abstractions on top of Postgres's logical streaming replication protocol and steers users toward the pit of success, without making them worry about the low-level details of the protocol.
- Features
- Installation
- Quickstart
- Database Setup
- Running Tests
- Docker
- Architecture
- Troubleshooting
- License
## Features

The `etl` crate supports the following destinations:
- BigQuery
- Apache Iceberg (planned)
- DuckDB (planned)
## Installation

To use `etl` in your Rust project, add the core library and desired destinations via git dependencies in `Cargo.toml`:
```toml
[dependencies]
etl = { git = "https://github.com/supabase/etl" }
etl-destinations = { git = "https://github.com/supabase/etl", features = ["bigquery"] }
```
The `etl` crate provides the core replication functionality, while `etl-destinations` contains the destination-specific implementations. Each destination is gated behind a feature of the same name in the `etl-destinations` crate. The git dependencies are needed for now because the crates are not yet published on crates.io.
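Because the dependencies are pulled straight from git, builds can shift as the repository moves. If you want reproducible builds, Cargo's standard `rev` (or `tag`) field lets you pin both crates to the same commit; the revision below is a placeholder, not a real commit:

```toml
[dependencies]
# Pin both crates to the same commit so they stay in sync
# (the rev value is a placeholder; substitute an actual commit hash).
etl = { git = "https://github.com/supabase/etl", rev = "abc1234" }
etl-destinations = { git = "https://github.com/supabase/etl", rev = "abc1234", features = ["bigquery"] }
```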
## Quickstart

To quickly get started with `etl`, see the `etl-examples` crate, which contains practical examples and detailed setup instructions.
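The examples live in the repository itself, so the quickest path is to clone it and build the examples crate locally. The commands below assume only a standard Rust toolchain and the repository layout referenced in this README:

```bash
# Clone the repository and build the examples crate
git clone https://github.com/supabase/etl.git
cd etl
cargo build -p etl-examples
# Browse etl-examples/ for the individual examples and their setup notes
```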
## Database Setup

Before running the examples, tests, or the API and replicator components, you'll need to set up a PostgreSQL database. We provide a convenient script to help you with this setup. For detailed instructions on how to use the database setup script, please refer to our Database Setup Guide.
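The script automates the details, but at its core logical replication only needs two things on the Postgres side: `wal_level` set to `logical` and a publication covering the tables you want to replicate. As a rough sketch of what that involves (the publication and table names here are illustrative, not necessarily what the script creates):

```sql
-- Logical replication requires wal_level = logical
-- (set in postgresql.conf or via ALTER SYSTEM, followed by a restart).
ALTER SYSTEM SET wal_level = 'logical';

-- Publish the tables you want to replicate (names are illustrative).
CREATE PUBLICATION my_publication FOR TABLE public.orders, public.customers;
```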
## Running Tests

To run the test suite:

```bash
cargo test --all-features
```
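The repository is a Cargo workspace with several crates, so you can also scope a run to a single crate with Cargo's package filter (crate names taken from the layout referenced in this README):

```bash
# Run only the core crate's tests
cargo test -p etl --all-features

# Run only the destination implementations' tests
cargo test -p etl-destinations --all-features
```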
## Docker

The repository includes Docker support for both the `replicator` and `api` components:

```bash
# Build replicator image
docker build -f ./etl-replicator/Dockerfile .

# Build api image
docker build -f ./etl-api/Dockerfile .
```
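If you intend to run what you build, it helps to tag the images at build time. The tag names below are arbitrary, and any runtime configuration (connection strings, credentials) is omitted because it depends on how you deploy the components:

```bash
# Build and tag the images (tag names are arbitrary)
docker build -f ./etl-replicator/Dockerfile -t etl-replicator:local .
docker build -f ./etl-api/Dockerfile -t etl-api:local .

# List the resulting images
docker image ls | grep etl
```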
## Architecture

For a detailed explanation of the ETL architecture and design decisions, please refer to our Design Document.
## Troubleshooting

If you see the following error when running tests on macOS:

```text
called `Result::unwrap()` on an `Err` value: Os { code: 24, kind: Uncategorized, message: "Too many open files" }
```

raise the limit of open files per process with:

```bash
ulimit -n 10000
```
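The change only applies to the current shell session, so run it in the same terminal you use for `cargo test`. You can check the limit currently in effect with:

```bash
# Show the per-process open-file limit for this shell
ulimit -n
```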
Currently, the system parallelizes the copying of different tables, but each individual table is still copied in sequential batches. This limits performance for large tables. We plan to address this once the ETL system reaches greater stability.
## License

Distributed under the Apache-2.0 License. See `LICENSE` for more information.