22 changes: 16 additions & 6 deletions docs/compilation/compile_models.rst
@@ -3,18 +3,29 @@
Compile Model Libraries
=======================

To run a model with MLC LLM on any platform, you need:
To run a model with MLC LLM on any platform, we need:

1. **Model weights** converted to MLC format (e.g. `RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC <https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/tree/main>`__; a fetch sketch follows this list).
2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs>`__).

If you are simply adding a model variant, follow :ref:`convert-weights-via-MLC` suffices.
2. **Model library** that comprises the inference logic
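
For example, the pre-converted weights above can be fetched straight from Hugging Face.
The following is a minimal sketch; it assumes ``git`` and ``git-lfs`` are installed:

.. code:: bash

   # Fetch pre-converted weights (git-lfs is needed for the large weight shards)
   git lfs install
   git clone https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC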

This page describes how to compile a model library with MLC LLM. Model compilation optimizes
the model inference for a given platform, allowing users to bring their own model
architectures, use different quantization modes, and customize the overall model
optimization flow.



Notably, in many cases you do not need to explicitly call compile.

- If you are using the Python API, you can skip specifying ``model_lib``, and
  the system will JIT-compile the library (see the sketch after this list).

- If you are building an iOS/Android package, check out :ref:`package-libraries-and-weights`,
  which provides a simpler high-level command that runs the compilation behind the scenes.
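
For the first bullet above, here is a minimal sketch. It assumes the ``mlc_llm``
Python package is installed; because no ``model_lib`` is passed, the model library
is JIT-compiled on first use:

.. code:: python

   from mlc_llm import MLCEngine

   model = "HF://mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC"

   # No model_lib is given, so MLC LLM JIT-compiles the model library
   # for this machine on first use and caches the result locally.
   engine = MLCEngine(model)

   response = engine.chat.completions.create(
       messages=[{"role": "user", "content": "What is MLC LLM?"}],
       model=model,
   )
   print(response.choices[0].message.content)

   engine.terminate()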


This page is still helpful for understanding the compilation flow behind the scenes,
or for creating model libraries explicitly.
We compile ``RedPajama-INCITE-Chat-3B-v1`` with ``q4f16_1`` as an example for all platforms.
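
As a preview of the flow this page walks through, compiling the example model for a
local GPU could look roughly like this (the paths and the target device are
illustrative, not fixed conventions):

.. code:: bash

   # Compile the model library; the mlc-chat-config.json comes from the
   # weight-conversion / config-generation steps described below.
   mlc_llm compile ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json \
       --device cuda -o ./dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-cuda.so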

.. note::
@@ -23,8 +34,7 @@ We compile ``RedPajama-INCITE-Chat-3B-v1`` with ``q4f16_1`` as an example for all platforms.

Please also follow the instructions in :ref:`deploy-cli` / :ref:`deploy-python-chat-module` to obtain
the CLI app / Python API that can be used to chat with the compiled model.
Finally, we strongly recommend you to read :ref:`project-overview` first to get
familiarized with the high-level terminologies.


.. contents:: Table of Contents
:depth: 1
3 changes: 1 addition & 2 deletions docs/compilation/convert_weights.rst
@@ -26,8 +26,7 @@ This can be extended to, e.g.:

Please also follow the instructions in :ref:`deploy-cli` / :ref:`deploy-python-chat-module` to obtain
the CLI app / Python API that can be used to chat with the compiled model.
Finally, we strongly recommend you to read :ref:`project-overview` first to get
familiarized with the high-level terminologies.


.. contents:: Table of Contents
:depth: 1
docs/compilation/package_model_libraries_weights.rst → docs/compilation/package_libraries_and_weights.rst
@@ -1,7 +1,7 @@
.. _package-model-libraries-weights:
.. _package-libraries-and-weights:

Package Model Libraries & Weights
=================================
Package Libraries and Weights
=============================

When we want to build LLM applications with MLC LLM (e.g., iOS/Android apps),
usually we need to build static model libraries and app binding libraries,
@@ -177,6 +177,17 @@ Example:
}
}

Compilation Cache
-----------------
``mlc_llm package`` leverages a local JIT cache to avoid repeatedly compiling the same input.
It also leverages a local cache for weights downloaded from remote sources. These caches
are shared across the entire project. Sometimes it is helpful to force a rebuild, e.g. when
there is a new compiler update or when something goes wrong with the cached library.
You can do so by setting the environment variable ``MLC_JIT_POLICY=REDO``:

.. code:: bash

MLC_JIT_POLICY=REDO mlc_llm package

Arguments of ``mlc_llm package``
--------------------------------
4 changes: 2 additions & 2 deletions docs/deploy/ide_integration.rst
@@ -1,7 +1,7 @@
.. _deploy-ide-integration:

Code Completion IDE Integration
===============================
IDE Integration
===============

.. contents:: Table of Contents
:local:
4 changes: 2 additions & 2 deletions docs/deploy/ios.rst
@@ -1,7 +1,7 @@
.. _deploy-ios:

iOS and Swift SDK
=================
iOS Swift SDK
=============

.. contents:: Table of Contents
:local:
6 changes: 3 additions & 3 deletions docs/deploy/mlc_chat_config.rst
@@ -1,7 +1,7 @@
.. _configure-mlc-chat-json:

Customize MLC Config File in JSON
=================================
Customize MLC Chat Config
=========================

``mlc-chat-config.json`` is required at both compile time and runtime, and hence serves two purposes:

@@ -112,7 +112,7 @@ Conversation Structure
^^^^^^^^^^^^^^^^^^^^^^

MLC-LLM provides a set of pre-defined conversation templates, which you can directly use by
specifying ``--conv-template [name]`` when generating config. Below is a (non-exhaustive) list of
supported conversation templates:

- ``llama-2``
39 changes: 20 additions & 19 deletions docs/deploy/rest.rst
@@ -73,6 +73,7 @@ MODEL The model folder after compiling with MLC-LLM build process
(e.g. ``Llama-2-7b-chat-hf-q4f16_1``), or a full path to the model
folder. In the former case, we will use the provided name to search
for the model folder over possible paths.

--model-lib A field to specify the full path to the model library file to use (e.g. a ``.so`` file).
--device The description of the device to run on. User should provide a string in the
form of 'device_name:device_id' or 'device_name', where 'device_name' is one of
@@ -137,7 +138,7 @@ The REST API provides the following endpoints:
- **name** (*Optional[str]*): An optional name for the sender of the message.
- **tool_calls** (*Optional[List[ChatToolCall]]*): A list of calls to external tools or functions made within this message, applicable when the role is `tool`.
- **tool_call_id** (*Optional[str]*): A unique identifier for the tool call, relevant when integrating external tools or services.

- **model** (*str*, required): The model to be used for generating responses.

- **frequency_penalty** (*float*, optional, default=0.0): Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat tokens.
@@ -183,51 +184,51 @@ The REST API provides the following endpoints:
**ChatCompletionResponseChoice**

- **finish_reason** (*Optional[Literal["stop", "length", "tool_calls", "error"]]*, optional): The reason the completion process was terminated. It can be due to reaching a stop condition, the maximum length, output of tool calls, or an error.

- **index** (*int*, required, default=0): Indicates the position of this choice within the list of choices.

- **message** (*ChatCompletionMessage*, required): The message part of the chat completion, containing the content of the chat response.

- **logprobs** (*Optional[LogProbs]*, optional): Optionally includes log probabilities for each output token.

**ChatCompletionStreamResponseChoice**

- **finish_reason** (*Optional[Literal["stop", "length", "tool_calls"]]*, optional): Specifies why the streaming completion process ended. Valid reasons are "stop", "length", and "tool_calls".

- **index** (*int*, required, default=0): Indicates the position of this choice within the list of choices.

- **delta** (*ChatCompletionMessage*, required): Represents the incremental update or addition to the chat completion message in the stream.

- **logprobs** (*Optional[LogProbs]*, optional): Optionally includes log probabilities for each output token.

**ChatCompletionResponse**

- **id** (*str*, required): A unique identifier for the chat completion session.

- **choices** (*List[ChatCompletionResponseChoice]*, required): A collection of `ChatCompletionResponseChoice` objects, representing the potential responses generated by the model.

- **created** (*int*, required, default=current time): The UNIX timestamp representing when the response was generated.

- **model** (*str*, required): The name of the model used to generate the chat completions.

- **system_fingerprint** (*str*, required): A system-generated fingerprint that uniquely identifies the computational environment.

- **object** (*Literal["chat.completion"]*, required, default="chat.completion"): A string literal indicating the type of object, here always "chat.completion".

- **usage** (*UsageInfo*, required, default=empty `UsageInfo` object): Contains information about the API usage for this specific request.

**ChatCompletionStreamResponse**

- **id** (*str*, required): A unique identifier for the streaming chat completion session.

- **choices** (*List[ChatCompletionStreamResponseChoice]*, required): A list of `ChatCompletionStreamResponseChoice` objects, each representing a part of the streaming chat response.

- **created** (*int*, required, default=current time): The creation time of the streaming response, represented as a UNIX timestamp.

- **model** (*str*, required): Specifies the model that was used for generating the streaming chat completions.

- **system_fingerprint** (*str*, required): A unique identifier for the system generating the streaming completions.

- **object** (*Literal["chat.completion.chunk"]*, required, default="chat.completion.chunk"): A literal indicating that this object represents a chunk of a streaming chat completion.
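
Putting the stream-response fields together, a single streamed chunk could look
roughly like the following (all values are illustrative):

.. code:: json

   {
     "id": "chatcmpl-0",
     "choices": [
       {
         "index": 0,
         "delta": {"role": "assistant", "content": "Hello"},
         "finish_reason": null
       }
     ],
     "created": 1700000000,
     "model": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
     "system_fingerprint": "",
     "object": "chat.completion.chunk"
   }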

------------------------------------------------
@@ -238,7 +239,7 @@
Below is an example of using the API to interact with MLC-LLM in Python with streaming.

.. code:: python

import requests
import json

88 changes: 0 additions & 88 deletions docs/get_started/project_overview.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -45,7 +45,7 @@ Check out :ref:`introduction-to-mlc-llm` for the introduction and tutorial of a

compilation/convert_weights.rst
compilation/compile_models.rst
compilation/package_model_libraries_weights.rst
compilation/package_libraries_and_weights.rst
compilation/define_new_models.rst

.. toctree::
Expand Down