KubeSight is a high-performance approximate query engine designed specifically for Kubernetes observability data. It answers queries 10-100x faster than exact execution, at 90-95% accuracy, by combining probabilistic data structures with adaptive sampling.
Exact observability queries over large Kubernetes metric datasets can take hours to complete. KubeSight instead returns approximate answers in milliseconds, with accuracy sufficient for trend analysis and decision making.
- High Performance: 10-100x faster than exact queries with sub-second response times
- Probabilistic Algorithms: HyperLogLog, Count-Min Sketch, and Bloom Filters for efficient approximation
- Real-time Processing: Stream processing for live Kubernetes metrics via Kafka
- SQL-like Interface: Familiar query syntax with automatic error bounds
- Kubernetes Native: Designed specifically for container and pod metrics
- Production Ready: Complete monitoring stack with Prometheus and Grafana integration
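The Count-Min Sketch listed above answers "how often did this key occur?" from a small fixed table of counters. Below is a minimal Go sketch of the standard structure, not KubeSight's implementation. It assumes `width` and `depth` have their usual meaning (the `cms_width: 2048` and `cms_depth: 5` settings in the configuration look like these parameters). The per-row hashing (FNV-1a followed by the MurmurHash3 fmix64 finalizer with a row offset) and every name here are hypothetical choices for illustration. In the idealized analysis, an estimate never undercounts and exceeds the true count by at most (e/width)·N with probability at least 1 − e^(−depth), where N is the total of all counts: about 0.13% of N with roughly 99.3% probability for 2048 × 5.

```go
package main

import "hash/fnv"

// CountMinSketch is a hypothetical, minimal Count-Min Sketch for illustration.
type CountMinSketch struct {
	width int
	table [][]uint64 // depth rows of width counters
}

// NewCountMinSketch allocates a depth x width table of zeroed counters.
func NewCountMinSketch(width, depth int) *CountMinSketch {
	table := make([][]uint64, depth)
	for i := range table {
		table[i] = make([]uint64, width)
	}
	return &CountMinSketch{width: width, table: table}
}

// hashKey returns the 64-bit FNV-1a hash of key.
func hashKey(key string) uint64 {
	f := fnv.New64a()
	f.Write([]byte(key))
	return f.Sum64()
}

// mix64 is the MurmurHash3 fmix64 finalizer; it spreads input bits across the output.
func mix64(h uint64) uint64 {
	h ^= h >> 33
	h *= 0xff51afd7ed558ccd
	h ^= h >> 33
	h *= 0xc4ceb9fe1a85ec53
	h ^= h >> 33
	return h
}

// column picks the counter for a key in one row; the row-dependent offset
// makes each row behave like a separate hash function.
func (c *CountMinSketch) column(base uint64, row int) int {
	return int(mix64(base+uint64(row+1)*0x9e3779b97f4a7c15) % uint64(c.width))
}

// Add increments the key's counter in every row.
func (c *CountMinSketch) Add(key string, count uint64) {
	base := hashKey(key)
	for row := range c.table {
		c.table[row][c.column(base, row)] += count
	}
}

// Estimate returns the smallest of the key's counters: never below the true
// count, and inflated only by hash collisions.
func (c *CountMinSketch) Estimate(key string) uint64 {
	base := hashKey(key)
	est := ^uint64(0)
	for row := range c.table {
		if v := c.table[row][c.column(base, row)]; v < est {
			est = v
		}
	}
	return est
}
```

For example, after `Add("pod-a", 3)` and `Add("pod-a", 2)`, `Estimate("pod-a")` returns at least 5. Taking the minimum across rows is what keeps collisions from inflating the answer more than necessary.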
- Go 1.22+
- Docker & Docker Compose
- kubectl (for K8s deployment)
- Make (recommended)
git clone https://github.com/asmit27rai/kubesight.git
cd kubesight
make build
make run
make docker-up
This starts:
- KubeSight Server: http://localhost:8080
- Kafka UI: http://localhost:8081
- Grafana: http://localhost:3000 (admin/admin)
- Prometheus: http://localhost:9090
make setup-grafana
make generate-query
make docker-down
make clean
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────────┐
│   Kubernetes    │───▶│ Stream Processor │───▶│  Probabilistic DS   │
│  Metrics/Logs   │    │  (Kafka + Flink) │    │  (HLL, CMS, Bloom)  │
└─────────────────┘    └──────────────────┘    └─────────────────────┘
                                                          │
┌─────────────────┐    ┌──────────────────┐               │
│    Dashboard    │◀───│   Query Engine   │◀──────────────┘
│    (Web UI)     │    │   (Go Service)   │
└─────────────────┘    └──────────────────┘
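The Probabilistic DS stage in the diagram keeps compact summaries instead of raw rows. A HyperLogLog, the usual structure behind COUNT_DISTINCT-style queries, could look like the minimal Go sketch below. It is not taken from the KubeSight source: it assumes a precision `p` giving 2^p registers (matching the `hll_precision: 14` setting in the configuration), hashes with FNV-1a plus the MurmurHash3 fmix64 finalizer, and applies the standard bias constant with a linear-counting correction for small cardinalities. All names are hypothetical. The relative standard error of HyperLogLog is about 1.04/√(2^p), roughly 0.81% for p = 14.

```go
package main

import (
	"hash/fnv"
	"math"
	"math/bits"
)

// HyperLogLog is a hypothetical, minimal cardinality estimator.
type HyperLogLog struct {
	p         uint8
	registers []uint8
}

// NewHyperLogLog allocates 2^p registers.
func NewHyperLogLog(p uint8) *HyperLogLog {
	size := 1 << p
	return &HyperLogLog{p: p, registers: make([]uint8, size)}
}

// hash64 runs FNV-1a and then the fmix64 finalizer so the high bits used
// for register selection are well mixed.
func hash64(s string) uint64 {
	f := fnv.New64a()
	f.Write([]byte(s))
	h := f.Sum64()
	h ^= h >> 33
	h *= 0xff51afd7ed558ccd
	h ^= h >> 33
	h *= 0xc4ceb9fe1a85ec53
	h ^= h >> 33
	return h
}

// Add records one observation; repeated items leave the registers unchanged.
func (h *HyperLogLog) Add(item string) {
	x := hash64(item)
	idx := x >> (64 - h.p) // top p bits choose the register
	w := x << h.p          // remaining bits, left-aligned
	rank := uint8(bits.LeadingZeros64(w)) + 1
	if maxRank := 64 - h.p + 1; rank > maxRank {
		rank = maxRank
	}
	if rank > h.registers[idx] {
		h.registers[idx] = rank
	}
}

// Estimate returns the approximate number of distinct items added.
func (h *HyperLogLog) Estimate() float64 {
	m := float64(len(h.registers))
	sum, zeros := 0.0, 0
	for _, r := range h.registers {
		sum += math.Pow(2, -float64(r))
		if r == 0 {
			zeros++
		}
	}
	alpha := 0.7213 / (1 + 1.079/m) // bias-correction constant for m >= 128
	est := alpha * m * m / sum
	// Small-range correction: linear counting over empty registers.
	if est <= 2.5*m && zeros > 0 {
		est = m * math.Log(m/float64(zeros))
	}
	return est
}
```

With p = 14 the sketch occupies 16,384 one-byte registers regardless of how many pod names stream through, which is the memory trade-off that makes approximate COUNT_DISTINCT cheap.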
COUNT_DISTINCT(pod_name) WHERE cluster_id='production'
PERCENTILE(95, cpu_usage) WHERE namespace='web-services'
TOP_K(10, memory_usage) WHERE cluster_id='production'
SUM(network_bytes) WHERE timestamp > '1h ago'
AVG(response_time) WHERE service='api'
CONTAINS('pod-xyz-123') FROM pod_restarts
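A CONTAINS query such as the last example is the classic Bloom filter case: it may report false positives but never false negatives. The minimal Go sketch below assumes `bloom_size` in the configuration is a bit count and `bloom_hashes` the number of hash functions; it derives the k bit positions by double hashing (the Kirsch-Mitzenmacher technique) over FNV-1a plus an fmix64 finalizer. The type and method names are hypothetical, not KubeSight's API.

```go
package main

import "hash/fnv"

// BloomFilter is a hypothetical, fixed-size Bloom filter.
type BloomFilter struct {
	bits []uint64 // bit array packed into 64-bit words
	m    uint64   // number of bits
	k    int      // number of hash functions
}

// NewBloomFilter allocates an m-bit filter that sets k bits per key.
func NewBloomFilter(m uint64, k int) *BloomFilter {
	return &BloomFilter{bits: make([]uint64, (m+63)/64), m: m, k: k}
}

// mix64 is the MurmurHash3 fmix64 finalizer.
func mix64(h uint64) uint64 {
	h ^= h >> 33
	h *= 0xff51afd7ed558ccd
	h ^= h >> 33
	h *= 0xc4ceb9fe1a85ec53
	h ^= h >> 33
	return h
}

// hashes derives two base hashes; bit i is (h1 + i*h2) mod m.
func hashes(key string) (uint64, uint64) {
	f := fnv.New64a()
	f.Write([]byte(key))
	base := f.Sum64()
	return mix64(base), mix64(base^0x9e3779b97f4a7c15) | 1
}

// Add sets the k bits for key.
func (b *BloomFilter) Add(key string) {
	h1, h2 := hashes(key)
	for i := 0; i < b.k; i++ {
		pos := (h1 + uint64(i)*h2) % b.m
		b.bits[pos/64] |= 1 << (pos % 64)
	}
}

// MayContain returns false only if key was definitely never added.
func (b *BloomFilter) MayContain(key string) bool {
	h1, h2 := hashes(key)
	for i := 0; i < b.k; i++ {
		pos := (h1 + uint64(i)*h2) % b.m
		if b.bits[pos/64]&(1<<(pos%64)) == 0 {
			return false
		}
	}
	return true
}
```

With 1,000,000 bits and 5 hash functions, the standard approximation p ≈ (1 − e^(−kn/m))^k gives a false-positive rate of about 0.9% after n = 100,000 inserted keys, so a positive answer to CONTAINS should be read as "probably present".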
POST /api/v1/query
Content-Type: application/json
{
  "query": "COUNT_DISTINCT(pod_name)",
  "query_type": "count_distinct",
  "filters": {
    "cluster_id": "production",
    "namespace": "default"
  }
}
GET /api/v1/stats
GET /api/v1/health
POST /api/v1/demo/generate
Content-Type: application/json
{
  "count": 10000,
  "cluster_id": "test-cluster"
}
server:
  host: "0.0.0.0"
  port: 8080
kafka:
  brokers: ["localhost:9092"]
  topics:
    metrics: "k8s-metrics"
    logs: "k8s-logs"
    events: "k8s-events"
sampling:
  default_rate: 0.05       # 5% base sampling
  incident_rate: 0.5       # 50% during anomalies
  reservoir_size: 10000
  window_size_min: 60
  adaptive_enabled: true
storage:
  hll_precision: 14        # 2^14 registers: ~0.81% standard error (~±1.6% at 2σ)
  cms_width: 2048
  cms_depth: 5
  bloom_size: 1000000
  bloom_hashes: 5
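The `sampling` settings suggest a sampler whose keep-rate switches between a base rate and an incident rate, backed by a fixed-size reservoir. The Go sketch below is one way to combine them, under stated assumptions: each event is admitted with probability `default_rate` (or `incident_rate` while an externally set anomaly flag is on), and admitted events feed a reservoir of `reservoir_size` values maintained with Vitter's Algorithm R. Anomaly detection, `window_size_min` handling, and every name here are hypothetical; KubeSight's actual sampler may work differently.

```go
package main

import "math/rand"

// SamplingConfig mirrors part of the sampling block above (hypothetical type).
type SamplingConfig struct {
	DefaultRate   float64 // e.g. 0.05
	IncidentRate  float64 // e.g. 0.5
	ReservoirSize int     // e.g. 10000
}

// AdaptiveSampler applies Bernoulli admission at an adaptive rate, then
// keeps a uniform sample of admitted values via Algorithm R.
type AdaptiveSampler struct {
	cfg       SamplingConfig
	rng       *rand.Rand
	Incident  bool      // set by an external anomaly detector (assumed)
	seen      int       // number of admitted events so far
	reservoir []float64 // uniform sample of admitted values
}

// NewAdaptiveSampler creates a sampler with a deterministic random source.
func NewAdaptiveSampler(cfg SamplingConfig, seed int64) *AdaptiveSampler {
	return &AdaptiveSampler{cfg: cfg, rng: rand.New(rand.NewSource(seed))}
}

// rate returns the admission probability for the current mode.
func (s *AdaptiveSampler) rate() float64 {
	if s.Incident {
		return s.cfg.IncidentRate
	}
	return s.cfg.DefaultRate
}

// Observe offers one metric value to the sampler.
func (s *AdaptiveSampler) Observe(v float64) {
	if s.rng.Float64() >= s.rate() {
		return // not admitted at the current rate
	}
	s.seen++
	if len(s.reservoir) < s.cfg.ReservoirSize {
		s.reservoir = append(s.reservoir, v)
		return
	}
	// Algorithm R: the seen-th admitted value replaces a random slot
	// with probability ReservoirSize/seen.
	if j := s.rng.Intn(s.seen); j < s.cfg.ReservoirSize {
		s.reservoir[j] = v
	}
}

// Sample returns the current reservoir contents.
func (s *AdaptiveSampler) Sample() []float64 { return s.reservoir }
```

One design consequence worth noting: events admitted during an incident are over-represented relative to normal traffic, so aggregates computed from a mixed-rate sample need each value weighted by the inverse of the rate it was admitted at.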
- kubesight_queries_total: Total queries processed
- kubesight_query_duration_milliseconds: Query latency distribution
- kubesight_samples_total: Total samples processed
- kubesight_error_rate: Approximation error rate
- kubesight_memory_usage_bytes: Memory consumption
make setup-grafana
make k8s-deploy
😊 Start exploring Kubernetes at scale with KubeSight today!