FleetStream 🚀

FleetStream is an experimental sensor‑to‑dashboard playground. It emulates a fleet of vehicles, captures raw telemetry, crunches it in real‑time with Apache Spark Structured Streaming, and exposes the results for further analysis.


✨ What does it do?

| Layer | Tooling | Role |
|---|---|---|
| Simulation | FastAPI (containerised REST service) | Fires random telemetry events into Kafka (`vehicle.telemetry.raw`) |
| Queue | Apache Kafka 3.5 | Buffers raw messages and receives aggregated streams |
| Processing | Apache Spark 3.5 (Structured Streaming) | Reads from `vehicle.telemetry.raw`, computes stats (avg. speed, fuel level, etc.) and writes to `vehicle.telemetry.agg` |
| Orchestration | Apache Airflow 2.8 | Scheduled batches & housekeeping (e.g. Kafka log compaction, backups) |
| Analytics | Metabase 0.47 | Live dashboards on the aggregated data |
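The simulator's job is simply to fabricate plausible readings and push them into the raw topic. A minimal sketch of such an event generator is shown below; the field names (`vehicle_id`, `speed_kmh`, `fuel_level`) are illustrative assumptions, not the project's actual schema.

```python
import json
import random
import time

def make_telemetry_event(vehicle_id: str) -> dict:
    """Build one random telemetry reading for a single vehicle.

    Field names are illustrative; check services/main.py for the real schema.
    """
    return {
        "vehicle_id": vehicle_id,
        "ts": time.time(),                               # epoch seconds
        "speed_kmh": round(random.uniform(0, 130), 1),   # random speed
        "fuel_level": round(random.uniform(0.0, 1.0), 2)
    }

event = make_telemetry_event("veh-001")
payload = json.dumps(event).encode("utf-8")  # bytes a Kafka producer would send
```

With the stack running, a producer would ship `payload` to `vehicle.telemetry.raw`; the sketch stops short of that so it can run without a broker.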

Everything is wrapped in Docker Compose – start / stop the whole stack with a single command.


🏗️ Architecture at a glance

┌────────────┐   HTTP/JSON   ┌────────────┐        ┌────────┐
│  Simulator ├──────────────►│   Kafka    │◄──────►│ Spark  │
└────────────┘  (vehicle.*)  └────────────┘  topic │(stream)│
                        ▲                    ▲     └────────┘
                        │                    │         │
                        │        DAGs        │         ▼
                 ┌────────────┐   Airflow    │      Metabase
                 │  REST API  │◄─────────────┘     Dashboards
                 └────────────┘

⚡ Quick start

Prerequisites: Docker & Docker Compose v2 (Linux, macOS or WSL 2).

# 1. Clone the repository
$ git clone https://github.com/<your‑handle>/fleetstream.git
$ cd fleetstream/docker

# 2. Build and launch the entire stack (add -d to detach and mute logs)
$ docker compose up --build

# 3. (optional) Create topics if Kafka is fresh
$ bin/create_topics.sh        # helper script

# 4. Open the portals
- Spark UI:  http://localhost:8080
- Metabase:  http://localhost:3000
- Airflow:   http://localhost:8081  (login: admin / admin)

# 5. Start the simulator
$ ../scripts/simulation_start.sh   # ~1 msg/s by default

# 6. Peek at the results
$ ./consume.sh

To stop and wipe volumes: docker compose down -v.
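Under the hood, peeking at results just means consuming and decoding JSON records from the aggregated topic. The decoding step can be sketched as below; the sample record's fields are assumptions, not the project's actual output schema.

```python
import json

def decode_record(raw: bytes) -> dict:
    """Turn a raw Kafka message value (JSON bytes) into a dict."""
    return json.loads(raw.decode("utf-8"))

# A made-up record of the shape vehicle.telemetry.agg might carry:
sample = b'{"vehicle_id": "veh-001", "avg_speed_kmh": 72.4}'
record = decode_record(sample)
```

In the real helper this function would sit inside a Kafka consumer loop (see `services/consumer.py`); the sketch runs standalone so it needs no broker.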


🗂️ Repository layout

.
├─ docker/                       # Compose files, infra images & helpers
│   ├─ docker-compose.yml
│   ├─ dags/                     # Airflow DAGs (mounted into the scheduler)
│   ├─ logs/                     # Local log volume mounts
│   ├─ spark.Dockerfile          # Custom Spark image
│   └─ …                         # Other configuration files
├─ services/                     # Application-level code & images
│   ├─ processing/               # Streaming jobs
│   │   └─ spark/stream_agg.py
│   ├─ consumer.py               # Kafka consumer helper
│   ├─ main.py                   # FastAPI entry-point
│   ├─ Dockerfile                # Builds the API container
│   └─ requirements.txt          # Service-specific deps
├─ scripts/                      # Bash helpers – start/stop the simulator
│   ├─ simulation_start.sh
│   └─ simulation_stop.sh
├─ README.md                     # You’re reading it
├─ requirements.txt              # Local dev / tooling deps
└─ .gitignore
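The heart of the pipeline is `services/processing/spark/stream_agg.py`. Its actual code isn't reproduced here, but the aggregation it performs on Spark – per-vehicle averages over time windows – can be illustrated in plain Python. The 5-minute tumbling window and the event fields are assumptions based on the rest of this README.

```python
from collections import defaultdict

WINDOW_S = 300  # assumed 5-minute tumbling window

def window_averages(events: list) -> dict:
    """Average speed per (vehicle_id, window_start) bucket.

    Mirrors, in miniature, what a Spark groupBy(window(...), "vehicle_id")
    aggregation would compute on the stream.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [speed_sum, count]
    for e in events:
        window_start = int(e["ts"]) // WINDOW_S * WINDOW_S
        key = (e["vehicle_id"], window_start)
        sums[key][0] += e["speed_kmh"]
        sums[key][1] += 1
    return {k: s / n for k, (s, n) in sums.items()}

events = [
    {"vehicle_id": "veh-001", "ts": 0,   "speed_kmh": 60.0},
    {"vehicle_id": "veh-001", "ts": 120, "speed_kmh": 80.0},
    {"vehicle_id": "veh-002", "ts": 10,  "speed_kmh": 50.0},
]
agg = window_averages(events)
# agg[("veh-001", 0)] -> 70.0
```

The Spark version additionally handles late data and writes each micro-batch back to `vehicle.telemetry.agg`; this sketch only shows the bucketing arithmetic.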

🔧 Configuration tips

The key knobs live in docker/docker-compose.yml – adjust partitions, ports or default simulation rate there first.

  • KAFKA_CFG_ADVERTISED_LISTENERS must be PLAINTEXT://kafka:9092 inside the stack.
  • SPARK_MODE = master / worker depending on the container.
  • Tweak executor memory in services/processing/spark/Dockerfile (add --executor-memory).

🚀 First things to try

Once the containers are up and humming, you’ll probably want to see something rather than stare at logs.

  1. Metabase first‑run wizard (soon)
    • Open http://localhost:3000
    • Create the initial admin user.
    • Add a new PostgreSQL database connection only if you’ve enabled the future Postgres sink (see Roadmap).
    • Click Skip on sample data, then Ask a question → Native query and point it to the vehicle.telemetry.agg topic via the Kafka JDBC connector (already bundled).
  2. Airflow sanity check (soon)
    • Visit http://localhost:8081 (credentials: admin / admin).
    • Enable the bundled example DAG fleetstream_daily_housekeeping – it just prints the size of each Kafka topic to the logs every hour.
    • Trigger it manually once and watch the task logs populate.
  3. Build your first dashboard (soon)
    • In Metabase, create a new Question with avg(speed_kmh) grouped by 5‑minute bins and vehicle_id.
    • Save it to a dashboard named Fleet overview.
  4. Verify Spark is streaming
    • Spark UI → Streaming tab → confirm the Input Rate isn’t flat‑lining.
    • Click on the latest batch to inspect operator metrics.

Feel free to crank the event rate with curl -X POST http://localhost:8000/start-sim -d 'rate_hz=10' (10 msgs/s) – Spark will automatically scale partitions.
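The same rate bump can be issued from Python. The endpoint and `rate_hz` parameter come from the curl command above; whether the FastAPI handler expects form data or JSON is an assumption, so the helper only builds the request and leaves the actual `requests.post` to you.

```python
def start_sim_request(rate_hz: int):
    """Return the URL and payload for the simulator's start endpoint.

    The form-data shape is an assumption; check the FastAPI handler in
    services/main.py before relying on it.
    """
    return "http://localhost:8000/start-sim", {"rate_hz": rate_hz}

url, data = start_sim_request(10)
# With the stack running:  requests.post(url, data=data)
```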


🛣️ Roadmap

| Phase | Milestone | Why it matters |
|---|---|---|
| 🔜 Short‑term | Persist aggregates to PostgreSQL and surface them in Metabase via a CDC pipeline (Debezium) | Durable storage & SQL joins with reference data |
| | Bundle Grafana + Loki for centralised dashboards and log aggregation | One place for infra + app metrics |
| | GitHub Actions CI/CD: build & push Docker images, run smoke tests | Reproducible builds & early breakage detection |
| 🛫 Mid‑term | Ship a Helm chart so the stack can be deployed on any Kubernetes cluster | Cloud‑deployable in a single `helm install` |
| | Add Prometheus exporters for Kafka & Spark to enable alerting | Production‑grade observability |
| | Beef up the simulator – realistic fault codes, GPS drift, harsh braking | More interesting analytics scenarios |
| 🌅 Long‑term | REST gateway for real OBD‑II / CAN‑bus hardware ingestion | Bridge from lab to the road |
| | Showcase stateful & windowed joins (e.g. geofencing alerts) in Spark | Advanced stream‑processing patterns |
| | Explore edge deployment: mini‑Kafka + Spark Connect on Raspberry Pi | Low‑latency local analytics |

Excited to hack on any of these? Open an issue or send a PR – contributions welcome! 👋


📝 License

Released under the MIT License.

Have fun & drive safe – even if it’s only bytes on the road 🚗💨
