
Integrator


Design Points

Ingress

  • Model databases (HF, KB) host framework models (PT, XLA, ONNX); the integrator shouldn't need to know how to load each format, but should delegate to the Model Importer module, which can also be fed a plain Python file.
  • All torch-mlir usage will be enclosed in this importer, which helps users make sense of its APIs (current and future). The project should also use this experience to help design its own, more user-facing API.
  • We could cache many artifacts along the way: MLIR text/binary files, saved models. We can cache all or some of them, in separate local directories. These caches should not be committed (unless a file is explicitly promoted to an actual input directory). We should either provide a script that regenerates the cache or simply let it be populated through usage (a minimal importer-plus-cache sketch follows).
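A minimal sketch of how the importer facade plus cache could look. All names here (`import_model`, `CACHE_DIR`, `loader`) are illustrative assumptions, not an existing API; the framework-specific import call stays hidden behind the `loader` callback:

```python
# Hypothetical sketch: `import_model`, `CACHE_DIR` and `loader` are
# illustrative names, not an existing API. All framework-specific import
# logic (torch-mlir, ONNX, XLA) hides behind `loader`; the resulting MLIR
# text is cached in a local, uncommitted directory.
import hashlib
from pathlib import Path
from typing import Callable

CACHE_DIR = Path(".integrator-cache")  # local only, never committed

def import_model(source: str, loader: Callable[[str], str]) -> str:
    """Return MLIR text for `source`, reusing the cache when possible."""
    key = hashlib.sha256(source.encode()).hexdigest()[:16]
    cached = CACHE_DIR / f"{key}.mlir"
    if cached.exists():
        return cached.read_text()
    mlir_text = loader(source)          # the only torch-mlir touch point
    CACHE_DIR.mkdir(exist_ok=True)
    cached.write_text(mlir_text)
    return mlir_text
```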

Schedules & Pipelines

  • As with the workload input, schedules should be cached as MLIR, but here we also want to make it easy to pass a Python function that builds the schedule on the workload module directly.
  • When building a pass pipeline, we're restricted to Python functions and external compiled binaries (e.g. tpp-mlir and IREE). In the latter case we don't even need a schedule/pipeline on the integrator side: just pipe the workload module to the execution engine and let the runner execute its own pipeline (see the sketch below).
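A sketch of what accepting both kinds of schedule could look like; all names are assumptions, not existing code:

```python
# Illustrative only: a schedule is either a Python callable that transforms
# the workload module, or the argv of an external compiler binary (e.g. a
# tpp-mlir or IREE driver) that runs its own pipeline on the piped-in module.
import subprocess
from typing import Callable, List, Union

Schedule = Union[Callable[[str], str], List[str]]

def apply_schedule(workload_mlir: str, schedule: Schedule) -> str:
    if callable(schedule):
        # Python function: build/apply the schedule on the module directly.
        return schedule(workload_mlir)
    # External binary: pipe the workload in, take the lowered module out.
    result = subprocess.run(schedule, input=workload_mlir,
                            capture_output=True, text=True, check=True)
    return result.stdout
```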

Execution Engine

  • This is more of an "engine driver" than an engine: the actual engine is the runner (mlir-runner, tpp-run, iree).
  • The driver itself just provides a configurable way to load the required libraries, set up the environment, and invoke the runners correctly (sketched below).
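One way the driver could be shaped, as a hedged sketch; the config fields and the `--shared-libs` flag spelling are assumptions based on how mlir-runner-style tools are typically invoked:

```python
# Sketch of the "engine driver": it executes nothing itself, it only
# assembles the environment and invokes the configured runner binary
# (mlir-runner, tpp-run, iree-run-module, ...). Field and flag names
# are assumptions for illustration.
import os
import subprocess
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class RunnerConfig:
    binary: str                                   # e.g. "mlir-runner"
    shared_libs: List[str] = field(default_factory=list)
    extra_args: List[str] = field(default_factory=list)
    env: Dict[str, str] = field(default_factory=dict)

def run(config: RunnerConfig, mlir_file: str) -> str:
    cmd = [config.binary, mlir_file, *config.extra_args]
    for lib in config.shared_libs:
        cmd += [f"--shared-libs={lib}"]           # runtime libraries to load
    out = subprocess.run(cmd, env={**os.environ, **config.env},
                         capture_output=True, text=True, check=True)
    return out.stdout
```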

Auto-Tuning

  • This is the least well-defined part of the integrator, and it would only really work if the schedule/pipeline is expressed in Python. But we could still use it to run, profile, and update cost models, even when we're only driving the engine (see the sketch after this list).
  • It is still too early to decide whether the ingress workload should be provided to the schedule builder (there is a risk of overfitting solutions).
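A speculative sketch of the tuning loop, assuming a hypothetical cost-model interface (`suggest`/`update`) and a `run_and_time` callback that drives the execution engine:

```python
# Speculative: every name below is an assumed future interface, not code
# that exists today. The loop picks candidate schedules, measures them by
# driving the execution engine, and feeds the timings back into a cost model.
import random

def tune(candidates, run_and_time, cost_model=None, steps=10):
    best, best_time = None, float("inf")
    for _ in range(steps):
        candidate = (cost_model.suggest(candidates)
                     if cost_model else random.choice(candidates))
        elapsed = run_and_time(candidate)     # run + profile via the engine
        if cost_model:
            cost_model.update(candidate, elapsed)
        if elapsed < best_time:
            best, best_time = candidate, elapsed
    return best, best_time
```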