A high-performance Rust-based REST API service for generating sentence embeddings using transformer models. Built with Axum and rust-bert for fast, scalable NLP workloads.
- Fast Performance: 2-4x faster than Python equivalents using the LibTorch backend
- REST API: Simple HTTP endpoints for embedding generation
- Batch Processing: Process multiple texts in a single request
- GPU Support: Optional CUDA acceleration (6x performance improvement)
- Thread-Safe: Concurrent request handling with async mutex protection
- Docker Ready: Multi-stage builds with production-ready containers
- Health Monitoring: Built-in health checks and logging
- Rust 1.70+
- Docker (optional)
- CUDA toolkit (optional, for GPU acceleration)
# Build the service
cargo build --release
# Run the service
cargo run
# Service will be available at http://localhost:9000
# Build and run with Docker Compose
docker-compose up -d
# View logs
docker-compose logs -f embeddings-service
# Production deployment with nginx
docker-compose --profile production up -d
GET /health
Response:
{
  "status": "healthy",
  "models": ["all-minilm-l6-v2"]
}
POST /embeddings
Content-Type: application/json
{
  "texts": ["Hello world", "How are you?"],
  "model": "all-minilm-l6-v2"
}
Response:
{
  "embeddings": [[0.1, 0.2, ...], [0.3, 0.4, ...]],
  "model": "all-minilm-l6-v2",
  "dimensions": 384
}
GET /embeddings?text=Hello%20world&model=all-minilm-l6-v2
- all-minilm-l6-v2: Default model (384 dimensions)
- Sentence transformers model optimized for semantic similarity
- Cached locally in `~/.cache/.rustbert/` (or `/app/.cache/.rustbert/` in Docker)
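Internally, `all-minilm-l6-v2` maps onto rust-bert's sentence-embeddings pipeline; the standalone sketch below shows how such a model can be loaded and queried (weights are downloaded into the cache directory above on first use; the exact wiring in this service may differ):

```rust
// Standalone sketch: load the MiniLM sentence-embeddings model via rust-bert
// and encode a small batch of texts.
use rust_bert::pipelines::sentence_embeddings::{
    SentenceEmbeddingsBuilder, SentenceEmbeddingsModelType,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Downloads (or reuses) the cached model files, then builds the pipeline.
    let model = SentenceEmbeddingsBuilder::remote(SentenceEmbeddingsModelType::AllMiniLmL6V2)
        .create_model()?;

    // `encode` takes a batch of sentences and returns one 384-dimensional vector per input.
    let embeddings = model.encode(&["Hello world", "How are you?"])?;
    println!("{} vectors, {} dims", embeddings.len(), embeddings[0].len());
    Ok(())
}
```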
| Variable | Default | Description |
|---|---|---|
| `PORT` | `9000` | Server port |
| `RUST_LOG` | `info` | Logging level |
| `TORCH_CUDA_VERSION` | - | CUDA version for GPU builds (e.g., `cu124`) |
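For illustration, the service can resolve these variables with their documented defaults roughly like this (a sketch; the actual configuration code may differ):

```rust
use std::env;

fn main() {
    // Fall back to the documented default when PORT is unset or not a valid number.
    let port: u16 = env::var("PORT")
        .ok()
        .and_then(|value| value.parse().ok())
        .unwrap_or(9000);

    let addr = format!("0.0.0.0:{port}");
    println!("binding to {addr}");
}
```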
# docker-compose.yaml
services:
  embeddings-service:
    build: .
    ports:
      - "3000:3000"
    environment:
      - RUST_LOG=info
      - PORT=3000
    volumes:
      - model_cache:/app/.cache/.rustbert

# Named volume so downloaded models persist across container restarts
volumes:
  model_cache:
- CPU: 2-4x faster than Python-based solutions
- GPU: 6x improvement with CUDA acceleration
- Memory: 512MB minimum, 2GB recommended for production
- First Build: 5-15 minutes (downloads LibTorch)
- Models are loaded once at startup
- Thread-safe model access with async mutex
- Batch processing for multiple texts
- Memory-efficient Docker builds (~200MB final image)
# Debug build
cargo build
# Release build (optimized)
cargo build --release
# Quick syntax check
cargo check
# Run unit tests
cargo test
# Run with logging
RUST_LOG=debug cargo test
# Build with CUDA support
docker build --build-arg TORCH_CUDA_VERSION=cu124 -t embeddings-service-gpu .
- Web Framework: Axum with CORS support
- ML Backend: rust-bert with LibTorch
- Model Management: Arc-wrapped state with async mutex
- Serialization: Serde for JSON handling
- Logging: Tracing with configurable levels
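Put together, those pieces roughly correspond to the sketch below (a simplified illustration assuming axum 0.7, tokio, rust-bert, and serde; the real handler signatures, error handling, and CORS layer in this repository may differ):

```rust
use std::sync::Arc;

use axum::{extract::State, routing::{get, post}, Json, Router};
use rust_bert::pipelines::sentence_embeddings::{
    SentenceEmbeddingsBuilder, SentenceEmbeddingsModel, SentenceEmbeddingsModelType,
};
use tokio::sync::Mutex;

// The model is loaded once and shared behind an Arc + async Mutex, so
// concurrent requests queue on the lock instead of loading separate copies.
#[derive(Clone)]
struct AppState {
    model: Arc<Mutex<SentenceEmbeddingsModel>>,
}

#[derive(serde::Deserialize)]
struct EmbeddingsRequest {
    texts: Vec<String>,
}

async fn health() -> &'static str {
    "healthy"
}

async fn embeddings(
    State(state): State<AppState>,
    Json(request): Json<EmbeddingsRequest>,
) -> Json<Vec<Vec<f32>>> {
    // Hold the lock only for the duration of the batched encode call.
    let model = state.model.lock().await;
    let vectors = model.encode(&request.texts).expect("encoding failed");
    Json(vectors)
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Load the model once at startup, off the async worker threads.
    let model = tokio::task::spawn_blocking(|| {
        SentenceEmbeddingsBuilder::remote(SentenceEmbeddingsModelType::AllMiniLmL6V2)
            .create_model()
    })
    .await??;

    let state = AppState { model: Arc::new(Mutex::new(model)) };
    let app = Router::new()
        .route("/health", get(health))
        .route("/embeddings", post(embeddings))
        .with_state(state);

    let listener = tokio::net::TcpListener::bind("0.0.0.0:9000").await?;
    axum::serve(listener, app).await?;
    Ok(())
}
```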
- Memory: 2GB limit, 512MB reservation
- CPU: 1.0 limit, 0.5 reservation
- Storage: Volume for model cache persistence
- Health check endpoint with 30s intervals
- Structured logging with tracing
- Container restart policies
- Optional nginx reverse proxy
- Fork the repository
- Create a feature branch
- Run tests: `cargo test`
- Submit a pull request
This project is licensed under the MIT License.