Retab is the complete developer platform and SDK for shipping state-of-the-art document processing in the age of LLMs.
We built Retab for one defined purpose: to help you SHIP FAST automations that deliver STRUCTURED & QUALITY data.
To get you there, we provide best-in-class preprocessing, help you generate prompts and extraction schemas that fit your preferred model providers, let you iterate on and evaluate the accuracy of your configuration, and make it easy to ship your automation directly from your code or through your preferred platforms such as n8n or Dify [WIP].
Because of a new, lighter paradigm
Large Language Models collapse entire layers of legacy OCR pipelines into a single, elegant abstraction. When a model can read, reason, and structure text natively, we no longer need brittle heuristics, handcrafted parsers, or heavyweight ETL jobs. Instead, we can expose a small, principled API: "give me the document, tell me the schema, and get back structured truth." Complexity evaporates, reliability rises, speed follows, and costs fall—because every component you remove is one that can no longer break.
LLM-first design lets us focus less on plumbing and more on the questions we actually want answered, and that is where Retab stands. We help you unlock these capabilities by offering all the software-defined primitives you need to build your own document processing solutions. We see it as Stripe for document processing.
Check our documentation.
Join our Discord and share your feedback.
To use the API, you need to sign up on Retab.
- Install the SDK
pip install retab
- Generate a Schema
from retab import Retab

client = Retab(api_key="YOUR_RETAB_API_KEY")

response = client.schemas.generate(
    documents=["Invoice.pdf"],
    model="gpt-4.1",      # or any model your plan supports
    temperature=0.0,      # keep the generation deterministic
    modality="native",    # "native" = let the API decide the best modality
)
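The extraction step below expects the schema as a JSON file, so persist the generated schema to disk first. This is a minimal sketch that assumes the generate() response exposes the schema as a plain dict under a json_schema attribute; check the response object in your SDK version for the exact field name.
import json

# Assumption: the response exposes the generated schema as a dict under `json_schema`.
with open("Invoice_schema.json", "w") as f:
    json.dump(response.json_schema, f, indent=2)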
- Extract Data
from retab import Retab

client = Retab()  # uses the API key from your environment when none is passed explicitly

response = client.documents.extract(
    json_schema="Invoice_schema.json",
    document="Invoice.pdf",
    model="gpt-4.1-nano",
    temperature=0,
)
print(response)
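Before wiring the output into downstream systems, you may want to validate it against the same schema. The snippet below is a sketch under two assumptions: that the response follows the familiar chat-completion shape, with the extracted JSON in response.choices[0].message.content, and that the jsonschema package is installed; the total_amount field is purely illustrative.
import json
from jsonschema import validate

# Load the schema that was used for the extraction.
with open("Invoice_schema.json") as f:
    invoice_schema = json.load(f)

# Assumption: the extracted payload is returned as a JSON string in the
# chat-completion-style message content; adjust the accessor to your SDK version.
extracted = json.loads(response.choices[0].message.content)

# Fail fast if the model's output drifts from the schema.
validate(instance=extracted, schema=invoice_schema)
print(extracted.get("total_amount"))  # hypothetical field, for illustration only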
On the Platform, Projects provide a systematic way to test and validate your extraction schemas against known ground truth data. Think of it as unit testing for document AI—you can measure accuracy, compare different models, and optimize your extraction pipelines with confidence.
The project workflow for schema optimization:
- Run initial project → identify low-accuracy fields
- Refine descriptions and add reasoning prompts → re-run project
- Compare accuracy improvements → iterate until satisfied
- Deploy optimized schema to production
from retab import Retab

client = Retab()

# Submit a single document to the deployed configuration
completion = client.deployments.extract(
    project_id="eval_***",
    iteration_id="base-configuration",  # or the configuration that gave you the best precision score
    document="path/to/document.pdf",
)
print(completion)
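Because the deployment call is a single function, batch processing is just a loop. The sketch below runs every PDF in an invoices/ folder through the same deployment and collects the completions; the folder path and result handling are illustrative, not part of the SDK.
from pathlib import Path

results = {}
for pdf in Path("invoices").glob("*.pdf"):  # illustrative input folder
    results[pdf.name] = client.deployments.extract(
        project_id="eval_***",
        iteration_id="base-configuration",
        document=str(pdf),
    )

print(f"Processed {len(results)} documents")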
Projects give you an automation engine that is easy to integrate into your codebase and workflows.
Check our documentation.
Let's create the future of document processing together.
Join our Discord to share your journey, discuss best practices, and give us your feedback. You can also follow us on X (Twitter).
We can't wait to see how you'll use Retab.
- API: Documentation
- SDKs: Python & JavaScript SDK
- Low-code Frameworks: Dify
We share our roadmap publicly. Please submit your feature requests on GitHub.
Among the features we're working on:
- Schema optimization autopilot
- Sources API
- Document Edit API
- n8n plugin