Encoderfile packages transformer encoders—optionally with classification heads—into a single, self-contained executable. No Python runtime, no dependencies, no network calls. Just a fast, portable binary that runs anywhere.
While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures with optional classification heads. It supports embedding, sequence classification, and token classification models—covering most encoder-based NLP tasks, from text similarity to classification and tagging—all within one compact binary.
Under the hood, Encoderfile uses ONNX Runtime for inference, ensuring compatibility with a wide range of transformer architectures.
Why?
- Smaller footprint: a single binary measured in tens-to-hundreds of megabytes, not gigabytes of runtime and packages
- Compliance-friendly: deterministic, offline, security-boundary-safe
- Integration-ready: drop into existing systems as a CLI, microservice, or API without refactoring your stack
Encoderfiles can run as:
- REST API
- gRPC microservice
- CLI
- (Future) MCP server
- (Future) FFI support for near-universal cross-language embedding
Encoderfile supports the following Hugging Face model classes (and their ONNX-exported equivalents):
| Task | Supported classes | Example models |
|---|---|---|
| Embeddings / Feature Extraction | `AutoModel`, `AutoModelForMaskedLM` | `bert-base-uncased`, `distilbert-base-uncased` |
| Sequence Classification | `AutoModelForSequenceClassification` | `distilbert-base-uncased-finetuned-sst-2-english`, `roberta-large-mnli` |
| Token Classification | `AutoModelForTokenClassification` | `dslim/bert-base-NER`, `bert-base-cased-finetuned-conll03-english` |
- ✅ All architectures must be encoder-only transformers: no decoders, no encoder–decoder hybrids (so no T5, no BART).
- ⚙️ Models must have ONNX-exported weights (`path/to/your/model/model.onnx`).
- 🧠 The ONNX graph input must include `input_ids` and, optionally, `attention_mask`.
- 🚫 Models relying on generation heads (`AutoModelForSeq2SeqLM`, `AutoModelForCausalLM`, etc.) are not supported.
- 🚫 XLNet, Transformer-XL, and derivative architectures are not yet supported.
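If you're not sure whether an exported model meets the graph-input requirement, you can inspect it with the `onnx` package before building. A minimal sketch (assumes `onnx` is installed and your model lives at `my_model/model.onnx`):

```python
import onnx

# Load the exported graph and list its top-level input names.
model = onnx.load("my_model/model.onnx")
input_names = [i.name for i in model.graph.input]
print(input_names)  # e.g. ['input_ids', 'attention_mask']

# input_ids is required; attention_mask is optional.
assert "input_ids" in input_names
```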
Prerequisites:
To set up your dev environment, run the following:

```
make setup
```

This will install Rust dependencies, create a virtual environment, and download model weights for integration tests (these will show up in `models/`).
To create an Encoderfile, you must have a HuggingFace model downloaded in an accessible directory. The model directory must have exported ONNX weights.
```
optimum-cli export onnx \
  --model <model_id> \
  --task <task_type> \
  <path_to_model_directory>
```

Task types: see the Hugging Face task guide for available tasks (`feature-extraction`, `text-classification`, `token-classification`, etc.).
Some models on HuggingFace already have ONNX weights in their repos.
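To check whether a repo already ships ONNX weights before exporting, you can list its files with `huggingface_hub`. A small sketch (assumes `huggingface_hub` is installed; the repo id is just an example):

```python
from huggingface_hub import list_repo_files

repo_id = "distilbert-base-uncased-finetuned-sst-2-english"

# If any .onnx file is present, you may be able to skip the export step.
files = list_repo_files(repo_id)
has_onnx = any(f.endswith(".onnx") for f in files)
print(f"{repo_id} ships ONNX weights: {has_onnx}")
```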
Your model directory should look like this:
```
my_model/
├── config.json
├── model.onnx
├── special_tokens_map.json
├── tokenizer_config.json
├── tokenizer.json
└── vocab.txt
```
```
uv run -m encoderbuild build \
  -n my-model-name \
  -t [embedding|sequence_classification|token_classification] \
  -m path/to/model/dir
```

Your final binary is `target/release/encoderfile`. To run it as a server:
Default port: 8080 (override with `--http-port`)

```
chmod +x target/release/encoderfile
./target/release/encoderfile serve
```
```
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["this is a sentence"]}'
```

For embedding models, this extracts token-level embeddings.
```
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["this is a sentence"]}'
```

For classification models, this returns predictions and logits.
Let's use Encoderfile to perform sentiment analysis on a few input strings. We'll work with `distilbert-base-uncased-finetuned-sst-2-english`, a fine-tuned version of the DistilBERT model.
```
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification \
  <path_to_model_directory>
```

```
uv run -m encoderbuild build \
  -n sentiment-analyzer \
  -t sequence_classification \
  -m <path_to_model_directory>
```

Use the `--http-port` parameter to start the REST server on a specific port.
```
./target/release/encoderfile serve
```

```
curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{"inputs": ["This is the cutest cat ever!", "Boring video, waste of time", "These cats are so funny!"]}'
```

JSON output:

```json
{
"results": [
{
"logits": [
-4.045369,
4.3970084
],
"scores": [
0.00021549074,
0.9997845
],
"predicted_index": 1,
"predicted_label": "POSITIVE"
},
{
"logits": [
4.7616825,
-3.8323877
],
"scores": [
0.9998148,
0.0001851664
],
"predicted_index": 0,
"predicted_label": "NEGATIVE"
},
{
"logits": [
-4.2407384,
4.565653
],
"scores": [
0.00014975043,
0.9998503
],
"predicted_index": 1,
"predicted_label": "POSITIVE"
}
],
"model_id": "sentiment-analyzer"
}
```
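The `scores` in each result match a softmax over its `logits`, so the probabilities can be reproduced independently. A quick sanity check in Python, using the first result above:

```python
import math

logits = [-4.045369, 4.3970084]

# scores[i] = exp(logits[i]) / sum_j exp(logits[j])
exps = [math.exp(x) for x in logits]
scores = [e / sum(exps) for e in exps]
print(scores)  # ~[0.000215, 0.999785], matching the response
```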