AutoMQ x Databend: Cloud Data Warehouse with Kafka
Databend is a state-of-the-art, cloud-native data warehouse developed in Rust, designed for cloud architectures and built on object storage. It provides enterprises with a robust big data analytics platform featuring an integrated lakehouse architecture and separation of compute and storage.
This article outlines the steps to import data from AutoMQ into Databend using bend-ingest-kafka.
First, navigate to Databend Cloud to launch a Warehouse, then create a database and a test table in the worksheet:
create database automq_db;
create table users (
    id bigint NOT NULL,
    name string NOT NULL,
    ts timestamp,
    status string
);
Follow the Stand-alone Deployment guide to set up AutoMQ, ensuring there is network connectivity between AutoMQ and Databend.
Next, create a topic named example_topic in AutoMQ and populate it with test JSON data by following these instructions.
To set up a topic using Apache Kafka® command-line tools, first ensure that you have access to a Kafka environment and the Kafka service is active. Here's an example command to create a topic:
./kafka-topics.sh --create --topic example_topic --bootstrap-server 10.0.96.4:9092 --partitions 1 --replication-factor 1
When executing the command, replace the topic name and the `bootstrap-server` address with those of the Kafka server you are actually using.
Once the topic is created, use the command below to confirm that the topic was successfully established.
./kafka-topics.sh --describe --topic example_topic --bootstrap-server 10.0.96.4:9092
Create a JSON-formatted test record that matches the table created earlier:
{
  "id": 1,
  "name": "testuser",
  "timestamp": "2023-11-10T12:00:00",
  "status": "active"
}
Write the test data to the `example_topic` topic using Kafka's command-line tools or programmatically. Here is an example using the command-line tool:
echo '{"id": 1, "name": "testuser", "ts": "2023-11-10T12:00:00", "status": "active"}' | ./kafka-console-producer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic
Again, replace the topic name and the `bootstrap-server` address with those of your Kafka environment.
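Alternatively, you can write the record programmatically. The following is a minimal sketch using the kafka-python client (the choice of client is an assumption; any Kafka-compatible client works, since AutoMQ is compatible with the Kafka protocol):

import json
from kafka import KafkaProducer

# Connect to the AutoMQ (Kafka-compatible) broker; the address mirrors
# the CLI examples above.
producer = KafkaProducer(
    bootstrap_servers="10.0.96.4:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# The same test record as in the CLI example, with field names
# matching the columns of the users table.
record = {"id": 1, "name": "testuser", "ts": "2023-11-10T12:00:00", "status": "active"}
producer.send("example_topic", record)
producer.flush()  # block until the record has been delivered
producer.close()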
Use the following command to view the data just written to the topic:
./kafka-console-consumer.sh --bootstrap-server 10.0.96.4:9092 --topic example_topic --from-beginning
bend-ingest-kafka monitors Kafka and batches data into a Databend table. Once bend-ingest-kafka is deployed, the data import job can be started:
bend-ingest-kafka --kafka-bootstrap-servers="localhost:9094" --kafka-topic="example_topic" --kafka-consumer-group="Consumer Group" --databend-dsn="https://cloudapp:password@host:443" --databend-table="automq_db.users" --data-format="json" --batch-size=5 --batch-max-interval=30s
When executing the command, replace `kafka-bootstrap-servers` with the actual Kafka server address being used.
The DSN for connecting to the warehouse is provided by Databend Cloud; see the Databend Cloud documentation for details on obtaining it.
bend-ingest-kafka accumulates data up to the specified batch-size before initiating a data synchronization.
Access the Databend Cloud worksheet and query the automq_db.users table to verify that the data has been synchronized from AutoMQ to the Databend table.
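For example, a simple query such as the following should return the test record written earlier:

select * from automq_db.users;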