FleetStream is an experimental sensor‑to‑dashboard playground. It emulates a fleet of vehicles, captures raw telemetry, crunches it in real time with Apache Spark Structured Streaming, and exposes the results for further analysis.
| Layer | Tooling | Role |
|---|---|---|
| Simulation | FastAPI (containerised REST service) | Fires random telemetry events into Kafka (`vehicle.telemetry.raw`) |
| Queue | Apache Kafka 3.5 | Buffers raw messages and receives aggregated streams |
| Processing | Apache Spark 3.5 (Structured Streaming) | Reads from `vehicle.telemetry.raw`, computes stats (avg. speed, fuel level, etc.) and writes to `vehicle.telemetry.agg` |
| Orchestration | Apache Airflow 2.8 | Scheduled batches & housekeeping (e.g. Kafka log compaction, backups) |
| Analytics | Metabase 0.47 | Live dashboards on the aggregated data |
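To make the data flow concrete, here is a minimal sketch of how the simulator might publish a single event. The `kafka-python` client and the field names are illustrative assumptions; the actual schema lives in `services/main.py`:

```python
import json
import random
import time

from kafka import KafkaProducer  # pip install kafka-python

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Illustrative payload -- the real schema is defined in services/main.py.
event = {
    "vehicle_id": f"veh-{random.randint(1, 50):03d}",
    "timestamp": time.time(),
    "speed_kmh": round(random.uniform(0.0, 130.0), 1),
    "fuel_level": round(random.uniform(0.0, 100.0), 1),
}

producer.send("vehicle.telemetry.raw", value=event)
producer.flush()
```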
Everything is wrapped in Docker Compose – start / stop the whole stack with a single command.
┌────────────┐   HTTP/JSON    ┌────────────┐           ┌────────┐
│ Simulator  ├───────────────►│   Kafka    │◄─────────►│ Spark  │
└────────────┘  (vehicle.*)   └────────────┘   topic   │(stream)│
      ▲                             ▲                  └────────┘
      │                             │                       │
      │  DAGs                       │                       ▼
┌────────────┐      Airflow        │                   Metabase
│  REST API  │◄────────────────────┘                  Dashboards
└────────────┘
Prerequisites: Docker & Docker Compose v2 (Linux, macOS or WSL 2).
# 1. Clone the repository
$ git clone https://github.com/<your‑handle>/fleetstream.git
$ cd fleetstream/docker
# 2. Build and launch the entire stack (add -d to detach and mute logs)
$ docker compose up --build
# 3. (optional) Create topics if Kafka is fresh
$ bin/create_topics.sh # helper script
# 4. Open the portals
- Spark UI: http://localhost:8080
- Metabase: http://localhost:3000
- Airflow: http://localhost:8081 (login: admin / admin)
# 5. Start the simulator
$ ../scripts/simulation_start.sh # ~1 msg/s by default
# 6. Peek at the results (a sketch of the consumer follows below)
$ ./consume.sh
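Under the hood, `consume.sh` just tails the aggregated topic. A rough Python equivalent, assuming the `kafka-python` client (the bundled `services/consumer.py` may differ):

```python
import json

from kafka import KafkaConsumer  # pip install kafka-python

# Tail the aggregated topic from the beginning; Ctrl-C to stop.
consumer = KafkaConsumer(
    "vehicle.telemetry.agg",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    print(message.value)
```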
To stop and wipe volumes: `docker compose down -v`.
.
├─ docker/                      # Compose files, infra images & helpers
│  ├─ docker-compose.yml
│  ├─ dags/                     # Airflow DAGs (mounted into the scheduler)
│  ├─ logs/                     # Local log volume mounts
│  ├─ spark.Dockerfile          # Custom Spark image
│  └─ …                         # Other configuration files
├─ services/                    # Application-level code & images
│  ├─ processing/               # Streaming jobs
│  │  └─ spark/stream_agg.py
│  ├─ consumer.py               # Kafka consumer helper
│  ├─ main.py                   # FastAPI entry-point
│  ├─ Dockerfile                # Builds the API container
│  └─ requirements.txt          # Service-specific deps
├─ scripts/                     # Bash helpers – start/stop the simulator
│  ├─ simulation_start.sh
│  └─ simulation_stop.sh
├─ README.md                    # You’re reading it
├─ requirements.txt             # Local dev / tooling deps
└─ .gitignore
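The heart of the pipeline is `services/processing/spark/stream_agg.py`. A condensed sketch of what such a job can look like — the field names, window size and checkpoint path are illustrative, and it assumes the Spark‑Kafka connector is on the classpath (presumably bundled by `spark.Dockerfile`):

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType, StringType, StructField, StructType

spark = SparkSession.builder.appName("fleetstream-agg").getOrCreate()

# Illustrative event schema -- match it to what the simulator actually emits.
schema = StructType([
    StructField("vehicle_id", StringType()),
    StructField("timestamp", DoubleType()),
    StructField("speed_kmh", DoubleType()),
    StructField("fuel_level", DoubleType()),
])

# Read raw events from Kafka and parse the JSON value column.
events = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("subscribe", "vehicle.telemetry.raw")
    .load()
    .select(F.from_json(F.col("value").cast("string"), schema).alias("e"))
    .select("e.*")
    .withColumn("event_time", F.col("timestamp").cast("timestamp"))
)

# 5-minute tumbling-window averages per vehicle, with a late-data watermark.
agg = (
    events.withWatermark("event_time", "10 minutes")
    .groupBy(F.window("event_time", "5 minutes"), "vehicle_id")
    .agg(F.avg("speed_kmh").alias("avg_speed_kmh"),
         F.avg("fuel_level").alias("avg_fuel_level"))
)

# Serialise each aggregate back to JSON and publish to the output topic.
query = (
    agg.select(F.to_json(F.struct("window", "vehicle_id",
                                  "avg_speed_kmh", "avg_fuel_level")).alias("value"))
    .writeStream.format("kafka")
    .option("kafka.bootstrap.servers", "kafka:9092")
    .option("topic", "vehicle.telemetry.agg")
    .option("checkpointLocation", "/tmp/fleetstream-checkpoint")
    .outputMode("update")
    .start()
)
query.awaitTermination()
```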
The key knobs live in docker/docker-compose.yml – adjust partitions, ports or default simulation rate there first.
- `KAFKA_CFG_ADVERTISED_LISTENERS` **must** be `PLAINTEXT://kafka:9092` inside the stack.
- `SPARK_MODE` = `master` / `worker` depending on the container.
- Tweak executor memory in `services/processing/spark/Dockerfile` (add `--executor-memory`).
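If a client can't connect, the advertised listener is the usual culprit: `kafka:9092` only resolves on the Compose network. A quick sanity check from inside that network, assuming `kafka-python` is installed:

```python
from kafka import KafkaAdminClient  # pip install kafka-python

# Run this inside a container on the Compose network, where "kafka" resolves.
admin = KafkaAdminClient(bootstrap_servers="kafka:9092")
print(admin.list_topics())  # expect vehicle.telemetry.raw / vehicle.telemetry.agg
admin.close()
```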
Once the containers are up and humming you’ll probably want to see something rather than stare at logs.
- Metabase first‑run wizard (soon)
- Open http://localhost:3000
- Create the initial admin user.
- Add a new PostgreSQL database connection only if you’ve enabled the future Postgres sink (see Roadmap).
- Click Skip on sample data, then Ask a question → Native query and point it to the vehicle.telemetry.agg topic via the Kafka JDBC connector (already bundled).
- Airflow sanity check (soon)
- Visit http://localhost:8081 (credentials: admin / admin).
- Enable the bundled example DAG fleetstream_daily_housekeeping – it just prints the size of each Kafka topic to the logs every hour.
- Trigger it manually once and watch the task logs populate.
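For orientation, here is a stripped‑down sketch of what such a housekeeping DAG could look like — the bundled DAG may differ, and `kafka-python` being available in the Airflow image is an assumption:

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from kafka import KafkaConsumer, TopicPartition  # pip install kafka-python


def log_topic_sizes():
    """Print the approximate message count of each topic to the task log."""
    consumer = KafkaConsumer(bootstrap_servers="kafka:9092")
    for topic in ("vehicle.telemetry.raw", "vehicle.telemetry.agg"):
        parts = [TopicPartition(topic, p)
                 for p in consumer.partitions_for_topic(topic) or []]
        start = consumer.beginning_offsets(parts)
        end = consumer.end_offsets(parts)
        print(f"{topic}: ~{sum(end[p] - start[p] for p in parts)} messages")
    consumer.close()


with DAG(
    dag_id="fleetstream_daily_housekeeping",
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
) as dag:
    PythonOperator(task_id="log_topic_sizes", python_callable=log_topic_sizes)
```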
- Build your first dashboard (soon)
- In Metabase, create a new Question with avg(speed_kmh) grouped by 5‑minute bins and vehicle_id.
- Save it to a dashboard named Fleet overview.
- Verify Spark is streaming
- Spark UI → Streaming tab → confirm the Input Rate isn’t flat‑lining.
- Click on the latest batch to inspect operator metrics.
Feel free to crank the event rate with `curl -X POST http://localhost:8000/start-sim -d 'rate_hz=10'` (10 msgs/s) – Spark's micro‑batches absorb the higher input rate automatically.
| Phase | Milestone | Why it matters |
|---|---|---|
| 🔜 Short‑term | Persist aggregates to **PostgreSQL** and surface them in Metabase via a CDC pipeline (Debezium). | Durable storage & SQL joins with reference data |
| | Bundle **Grafana + Loki** for centralised dashboards and log aggregation. | One place for infra + app metrics |
| | **GitHub Actions** CI/CD: build & push Docker images, run smoke tests. | Reproducible builds & early breakage detection |
| 🛫 Mid‑term | Ship a **Helm chart** so the stack can be deployed on any Kubernetes cluster. | Cloud‑deployable in a single `helm install` |
| | Add **Prometheus exporters** for Kafka & Spark to enable alerting. | Production‑grade observability |
| | Beef up the simulator: realistic fault codes, GPS drift, harsh braking. | More interesting analytics scenarios |
| 🌅 Long‑term | REST gateway for **real OBD‑II / CAN‑bus hardware** ingestion. | Bridge from lab to the road |
| | Showcase **stateful & windowed joins** (e.g. geofencing alerts) in Spark. | Advanced stream‑processing patterns |
| | Explore **edge deployment**: mini‑Kafka + Spark Connect on a Raspberry Pi. | Low‑latency local analytics |
Excited to hack on any of these? Open an issue or send a PR – contributions welcome! 👋
Released under the MIT License.
Have fun & drive safe – even if it’s only bytes on the road 🚗💨