
Inferox

Safe, Fast, and Modular ML Inference Engine for Rust

Overview

Inferox is a high-performance ML inference engine built in Rust, designed with a two-pillar architecture that separates model compilation from runtime execution. Compile your model architectures into shared libraries (.so/.dylib) and load them dynamically into the engine with complete type safety.

Key Features

  • 🔒 Type-Safe Dynamic Loading: Load models as trait objects; no hand-written FFI required
  • 🚀 Multiple Backend Support: Candle backend implemented, extensible to ONNX, TensorFlow, and others
  • 🎯 Zero-Copy Inference: Efficient tensor operations without unnecessary allocations
  • 🔧 Hot Reloadable: Swap model libraries without recompiling the engine
  • 🦀 Pure Rust: Memory safety and RAII throughout; the only unsafe code is confined to dynamic loading via libloading

Architecture

┌─────────────────────────────────────────────┐
│  Model Library (libmlp_classifier.dylib)    │
│  ┌──────────────────────────────────────┐   │
│  │ #[no_mangle]                         │   │
│  │ pub fn create_model()                │   │
│  │   -> Box<dyn Model>                  │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│  Engine (loads via libloading)              │
│  ┌──────────────────────────────────────┐   │
│  │ let lib = Library::new(path)         │   │
│  │ let factory = lib.get("create_model")│   │
│  │ let model: Box<dyn Model> = factory()│   │
│  │ engine.register_boxed_model(model)   │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
                    ↓
┌─────────────────────────────────────────────┐
│  InferoxEngine manages all models           │
│  - Type-safe trait interface                │
│  - RAII memory management                   │
│  - No unsafe in user code                   │
└─────────────────────────────────────────────┘
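
The contract between the two pillars is the Model trait from inferox-core. The sketch below shows the shape it takes as inferred from the Quick Start code; it is illustrative only, and the authoritative definition (including trait bounds and the state-management methods) lives in inferox-core.

// Illustrative sketch of the Model trait, inferred from the usage below.
// See inferox-core for the actual definition.
pub trait Model {
    /// Backend the model executes on (e.g. CandleBackend).
    type Backend;
    /// Tensor types consumed and produced by forward().
    type Input;
    type Output;

    /// Run a single forward pass.
    fn forward(&self, input: Self::Input) -> Result<Self::Output, InferoxError>;

    /// Identifier used when registering the model with the engine.
    fn name(&self) -> &str;

    /// Version, description, and other metadata.
    fn metadata(&self) -> ModelMetadata;
}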

Quick Start

1. Define Your Model Architecture

use inferox_core::{Model, ModelMetadata, InferoxError};
use inferox_candle::{CandleBackend, CandleTensor};
use candle_nn::{Linear, Module, VarBuilder};

pub struct MLP {
    fc1: Linear,
    fc2: Linear,
    fc3: Linear,
    name: String,
}

impl Model for MLP {
    type Backend = CandleBackend;
    type Input = CandleTensor;
    type Output = CandleTensor;
    
    fn forward(&self, input: Self::Input) -> Result<Self::Output, InferoxError> {
        let x = self.fc1.forward(&input.inner())?;
        let x = x.relu()?;
        let x = self.fc2.forward(&x)?;
        let x = x.relu()?;
        let x = self.fc3.forward(&x)?;
        Ok(CandleTensor::new(x))
    }
    
    fn name(&self) -> &str {
        &self.name
    }
    
    fn metadata(&self) -> ModelMetadata {
        ModelMetadata::new("mlp", "1.0.0")
            .with_description("Multi-Layer Perceptron")
    }
}
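
Step 2 calls MLP::new with a VarBuilder obtained from CandleModelBuilder. A possible constructor is sketched below using candle_nn::linear; the parameter names and weight prefixes ("fc1", "fc2", "fc3") are assumptions for illustration and may differ from the actual example code.

use candle_nn::{linear, VarBuilder};

impl MLP {
    /// Build a three-layer MLP (in -> hidden -> hidden -> out) from a VarBuilder.
    /// Sketch only; the dimensions match the "10 → 8 → 8 → 3" classifier shown later.
    pub fn new(
        name: &str,
        in_dim: usize,
        hidden_dim: usize,
        out_dim: usize,
        vb: VarBuilder,
    ) -> candle_core::Result<Self> {
        Ok(Self {
            fc1: linear(in_dim, hidden_dim, vb.pp("fc1"))?,
            fc2: linear(hidden_dim, hidden_dim, vb.pp("fc2"))?,
            fc3: linear(hidden_dim, out_dim, vb.pp("fc3"))?,
            name: name.to_string(),
        })
    }
}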

2. Compile to Shared Library

Create a library crate with crate-type = ["cdylib"]:

// models/classifier/src/lib.rs
use inferox_candle::{CandleBackend, CandleModelBuilder, CandleTensor};
use inferox_core::Model;
use candle_core::Device;
// `MLP` is the architecture defined in step 1, brought in from its own module or crate.

#[no_mangle]
pub fn create_model() -> Box<dyn Model<Backend = CandleBackend, Input = CandleTensor, Output = CandleTensor>> {
    let builder = CandleModelBuilder::new(Device::Cpu);
    let model = MLP::new("classifier", 10, 8, 3, builder.var_builder())
        .expect("Failed to create classifier model");
    Box::new(model)
}

Build the model:

cargo build --release -p mlp-classifier

3. Load and Run in Engine

use inferox_engine::{InferoxEngine, EngineConfig};
use inferox_candle::{CandleBackend, CandleTensor};
use inferox_core::Model;
use libloading::{Library, Symbol};

type ModelFactory = fn() -> Box<dyn Model<Backend = CandleBackend, Input = CandleTensor, Output = CandleTensor>>;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let backend = CandleBackend::cpu();
    let config = EngineConfig::default();
    let mut engine = InferoxEngine::new(backend.clone(), config);
    
    // Load a model from a shared library (.dylib on macOS, .so on Linux)
    unsafe {
        let lib = Library::new("target/release/libmlp_classifier.dylib")?;
        let factory: Symbol<ModelFactory> = lib.get(b"create_model")?;
        let model = factory();
        engine.register_boxed_model(model);
        // Intentionally leak the library handle so the loaded code
        // stays mapped for the lifetime of the process.
        std::mem::forget(lib);
    }
    
    // Run inference
    let input = backend.tensor_builder().build_from_vec(
        vec![0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0],
        &[1, 10]
    )?;
    
    let output = engine.infer("classifier", input)?;
    println!("Output: {:?}", output.to_vec2::<f32>()?);
    
    Ok(())
}
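
std::mem::forget keeps the loaded library mapped for the rest of the process, which is the simplest way to guarantee the model's code outlives the engine. If you prefer explicit ownership, one alternative is to store the Library handles next to the engine; the sketch below assumes nothing about InferoxEngine beyond it owning the registered models, and relies on Rust dropping struct fields in declaration order (engine first, libraries last).

/// Sketch: pair the engine with the Library handles backing its models.
/// Field order matters: `engine` (and the trait objects it owns) is dropped
/// before the libraries are unloaded.
struct EngineWithLibraries<E> {
    engine: E,
    _libraries: Vec<libloading::Library>,
}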

Project Structure

inferox/
├── crates/
│   ├── inferox-core/        # Core traits and types
│   ├── inferox-candle/      # Candle backend implementation
│   └── inferox-engine/      # Inference engine runtime
├── examples/
│   └── mlp/                 # MLP example with dynamic loading
│       ├── src/
│       │   ├── lib.rs       # MLP architecture
│       │   └── main.rs      # Engine runtime
│       └── models/
│           ├── classifier/  # Compiled to .dylib/.so
│           └── small/       # Compiled to .dylib/.so
├── Makefile                 # Development commands
└── .github/workflows/       # CI/CD pipelines

Core Components

inferox-core

Core trait definitions for backends, tensors, and models:

  • Backend - Hardware abstraction (CPU, CUDA, Metal, etc.)
  • Tensor - N-dimensional array operations
  • Model - Model trait with forward pass, metadata, and state management (see the sketch after this list)
  • DataType - Numeric type system with safe conversions
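
Because Model is an ordinary Rust trait, generic helpers can be written against it without knowing the concrete backend or tensor types. A minimal sketch, assuming only the trait members used in the Quick Start:

use inferox_core::{InferoxError, Model};

/// Run a forward pass on any model, whatever its backend or tensor types.
fn run<M: Model>(model: &M, input: M::Input) -> Result<M::Output, InferoxError> {
    println!("running {}", model.name());
    model.forward(input)
}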

inferox-candle

Candle backend implementation using Hugging Face's Candle:

  • CandleBackend - Backend for Candle tensors
  • CandleTensor - Tensor wrapper with type-safe operations
  • CandleModelBuilder - Model initialization with weight loading
  • CandleVarMap - Weight management and serialization

inferox-engine

High-level inference engine with model management:

  • InferoxEngine - Multi-model inference orchestration
  • InferenceSession - Stateful inference with context
  • EngineConfig - Runtime configuration (batch size, device, etc.)
  • Dynamic model loading via trait objects

Examples

MLP Example

A complete example demonstrating the two-pillar architecture:

# Build model libraries
make models

# Run the engine with multiple models
cargo run --bin mlp --release -- \
  target/release/libmlp_classifier.dylib \
  target/release/libmlp_small.dylib

Output:

Inferox MLP Engine
==================

✓ Created CPU backend

Loading model from: target/release/libmlp_classifier.dylib
✓ Registered 'classifier' - Multi-Layer Perceptron (10 → 8 → 8 → 3)

Loading model from: target/release/libmlp_small.dylib
✓ Registered 'small' - Multi-Layer Perceptron (5 → 4 → 4 → 2)

2 models loaded

Available models:
  - classifier v1.0.0: Multi-Layer Perceptron (10 → 8 → 8 → 3)
  - small v1.0.0: Multi-Layer Perceptron (5 → 4 → 4 → 2)

Running test inference on all models:
  classifier -> output shape: [1, 3]
  small -> output shape: [1, 2]

✓ All models working!

See examples/mlp/README.md for detailed documentation.
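
Internally, the example's runtime walks the library paths passed on the command line, loading and registering each model in turn. A hedged sketch of such a loop, reusing the ModelFactory alias and the engine from the Quick Start (not the exact example code); it would sit inside main():

// Inside main(): load every library path given on the command line.
for path in std::env::args().skip(1) {
    println!("Loading model from: {}", path);
    unsafe {
        let lib = Library::new(&path)?;
        let factory: Symbol<ModelFactory> = lib.get(b"create_model")?;
        engine.register_boxed_model(factory());
        // Keep the library mapped for the lifetime of the process.
        std::mem::forget(lib);
    }
}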

Development

Prerequisites

  • Rust 1.70+ (2021 edition)
  • Cargo

Building

# Build all crates
make build

# Build in release mode
make build-release

# Build model libraries
make models

# Build examples
make examples

Testing

# Run tests + quick lint (recommended)
make test

# Run tests only
make test-quick

# Run specific crate tests
make test-core
make test-candle
make test-engine

Linting and Formatting

# Format code
make format

# Run clippy linter
make lint

# Quick pre-commit checks
make pre-commit

Documentation

# Generate and open docs
make doc

# Generate docs including private items
make doc-private

CI/CD

The project uses GitHub Actions for continuous integration:

  • Format Check: Ensures code follows rustfmt standards
  • Clippy Lint: Catches common mistakes and anti-patterns
  • Test Suite: Runs on Ubuntu and macOS with stable Rust
  • Model Libraries: Verifies model binaries build correctly
  • Documentation: Ensures docs build without warnings
  • Examples: Validates all examples compile and run

See .github/workflows/pr-checks.yml for the complete pipeline.

Roadmap

  • Core trait system
  • Candle backend
  • Inference engine
  • Dynamic model loading
  • MLP example
  • ResNet18 example
  • ONNX backend
  • Batch inference optimization
  • Model quantization support
  • GPU acceleration (CUDA, Metal)
  • Production deployment guide

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Run make pre-commit before committing
  4. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.
