22 changes: 16 additions & 6 deletions docs/compilation/compile_models.rst
@@ -3,18 +3,29 @@
Compile Model Libraries
=======================

To run a model with MLC LLM on any platform, you need:
To run a model with MLC LLM on any platform, we need:

1. **Model weights** converted to MLC format (e.g. `RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC <https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/tree/main>`__; a fetch sketch follows this list).
2. **Model library** that comprises the inference logic (see repo `binary-mlc-llm-libs <https://github.com/mlc-ai/binary-mlc-llm-libs>`__).

If you are simply adding a model variant, follow :ref:`convert-weights-via-MLC` suffices.
2. **Model library** that comprises the inference logic
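
For example, the pre-converted weights above can be fetched straight from Hugging Face.
The following is a minimal sketch; it assumes ``git`` and ``git-lfs`` are installed:

.. code:: bash

   # Fetch pre-converted weights (git-lfs is needed for the large weight shards)
   git lfs install
   git clone https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC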

This page describes how to compile a model library with MLC LLM. Model compilation optimizes
the model inference for a given platform, allowing users to bring their own model
architectures, use different quantization modes, and customize the overall model
optimization flow.



Notably, in many cases you do not need to explicitly call compile.

- If you are using the Python API, you can skip specifying ``model_lib``, and
  the system will JIT-compile the library (see the sketch after this list).

- If you are building an iOS/Android package, check out :ref:`package-libraries-and-weights`,
  which provides a simpler high-level command that runs the compilation behind the scenes.
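
For the first bullet above, here is a minimal sketch. It assumes the ``mlc_llm``
Python package is installed; because no ``model_lib`` is passed, the model library
is JIT-compiled on first use:

.. code:: python

   from mlc_llm import MLCEngine

   model = "HF://mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC"

   # No model_lib is given, so MLC LLM JIT-compiles the model library
   # for this machine on first use and caches the result locally.
   engine = MLCEngine(model)

   response = engine.chat.completions.create(
       messages=[{"role": "user", "content": "What is MLC LLM?"}],
       model=model,
   )
   print(response.choices[0].message.content)

   engine.terminate()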


This page is still helpful for understanding the compilation flow behind the scenes,
or for creating model libraries explicitly.
We compile ``RedPajama-INCITE-Chat-3B-v1`` with ``q4f16_1`` as an example for all platforms.
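
As a preview of the flow this page walks through, compiling the example model for a
local GPU could look roughly like this (the paths and the target device are
illustrative, not fixed conventions):

.. code:: bash

   # Compile the model library; the mlc-chat-config.json comes from the
   # weight-conversion / config-generation steps described below.
   mlc_llm compile ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json \
       --device cuda -o ./dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-cuda.so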

.. note::
@@ -23,8 +34,7 @@ We compile ``RedPajama-INCITE-Chat-3B-v1`` with ``q4f16_1`` as an example for all platforms.

Please also follow the instructions in :ref:`deploy-cli` / :ref:`deploy-python-chat-module` to obtain
the CLI app / Python API that can be used to chat with the compiled model.
Finally, we strongly recommend you to read :ref:`project-overview` first to get
familiarized with the high-level terminologies.


.. contents:: Table of Contents
:depth: 1
3 changes: 1 addition & 2 deletions docs/compilation/convert_weights.rst
@@ -26,8 +26,7 @@ This can be extended to, e.g.:

Please also follow the instructions in :ref:`deploy-cli` / :ref:`deploy-python-chat-module` to obtain
the CLI app / Python API that can be used to chat with the compiled model.
Finally, we strongly recommend you to read :ref:`project-overview` first to get
familiarized with the high-level terminologies.


.. contents:: Table of Contents
:depth: 1
docs/compilation/package_model_libraries_weights.rst → docs/compilation/package_libraries_and_weights.rst
@@ -1,7 +1,7 @@
.. _package-model-libraries-weights:
.. _package-libraries-and-weights:

Package Model Libraries & Weights
=================================
Package Libraries and Weights
=============================

When we want to build LLM applications with MLC LLM (e.g., iOS/Android apps),
usually we need to build static model libraries and app binding libraries,
@@ -177,6 +177,17 @@ Example:
}
}

Compilation Cache
-----------------
``mlc_llm package`` leverages a local JIT cache to avoid repeatedly compiling the same input.
It also leverages a local cache for weights downloaded from remote sources. These caches
are shared across the entire project. Sometimes it is helpful to force a rebuild, e.g. when
there is a new compiler update or when something goes wrong with the cached library.
You can do so by setting the environment variable ``MLC_JIT_POLICY=REDO``:

.. code:: bash

MLC_JIT_POLICY=REDO mlc_llm package

Arguments of ``mlc_llm package``
--------------------------------
4 changes: 2 additions & 2 deletions docs/deploy/ide_integration.rst
@@ -1,7 +1,7 @@
.. _deploy-ide-integration:

Code Completion IDE Integration
===============================
IDE Integration
===============

.. contents:: Table of Contents
:local:
4 changes: 2 additions & 2 deletions docs/deploy/ios.rst
@@ -1,7 +1,7 @@
.. _deploy-ios:

iOS and Swift SDK
=================
iOS Swift SDK
=============

.. contents:: Table of Contents
:local:
6 changes: 3 additions & 3 deletions docs/deploy/mlc_chat_config.rst
@@ -1,7 +1,7 @@
.. _configure-mlc-chat-json:

Customize MLC Config File in JSON
=================================
Customize MLC Chat Config
=========================

``mlc-chat-config.json`` is required at both compile time and runtime, and hence serves two purposes:

@@ -112,7 +112,7 @@ Conversation Structure
^^^^^^^^^^^^^^^^^^^^^^

MLC-LLM provides a set of pre-defined conversation templates, which you can directly use by
specifying ``--conv-template [name]`` when generating config. Below is a (non-exhaustive) list of
supported conversation templates:

- ``llama-2``
39 changes: 20 additions & 19 deletions docs/deploy/rest.rst
@@ -73,6 +73,7 @@ MODEL The model folder after compiling with MLC-LLM build process
(e.g. ``Llama-2-7b-chat-hf-q4f16_1``), or a full path to the model
folder. In the former case, we will use the provided name to search
for the model folder over possible paths.

--model-lib A field to specify the full path to the model library file to use (e.g. a ``.so`` file).
--device The description of the device to run on. User should provide a string in the
form of 'device_name:device_id' or 'device_name', where 'device_name' is one of
@@ -137,7 +138,7 @@ The REST API provides the following endpoints:
- **name** (*Optional[str]*): An optional name for the sender of the message.
- **tool_calls** (*Optional[List[ChatToolCall]]*): A list of calls to external tools or functions made within this message, applicable when the role is `tool`.
- **tool_call_id** (*Optional[str]*): A unique identifier for the tool call, relevant when integrating external tools or services.

- **model** (*str*, required): The model to be used for generating responses.

- **frequency_penalty** (*float*, optional, default=0.0): Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model’s likelihood to repeat tokens.
@@ -183,51 +184,51 @@ The REST API provides the following endpoints:
**ChatCompletionResponseChoice**

- **finish_reason** (*Optional[Literal["stop", "length", "tool_calls", "error"]]*, optional): The reason the completion process was terminated. It can be due to reaching a stop condition, the maximum length, output of tool calls, or an error.

- **index** (*int*, required, default=0): Indicates the position of this choice within the list of choices.

- **message** (*ChatCompletionMessage*, required): The message part of the chat completion, containing the content of the chat response.

- **logprobs** (*Optional[LogProbs]*, optional): Optionally includes log probabilities for each output token.

**ChatCompletionStreamResponseChoice**

- **finish_reason** (*Optional[Literal["stop", "length", "tool_calls"]]*, optional): Specifies why the streaming completion process ended. Valid reasons are "stop", "length", and "tool_calls".

- **index** (*int*, required, default=0): Indicates the position of this choice within the list of choices.

- **delta** (*ChatCompletionMessage*, required): Represents the incremental update or addition to the chat completion message in the stream.

- **logprobs** (*Optional[LogProbs]*, optional): Optionally includes log probabilities for each output token.

**ChatCompletionResponse**

- **id** (*str*, required): A unique identifier for the chat completion session.

- **choices** (*List[ChatCompletionResponseChoice]*, required): A collection of `ChatCompletionResponseChoice` objects, representing the potential responses generated by the model.

- **created** (*int*, required, default=current time): The UNIX timestamp representing when the response was generated.

- **model** (*str*, required): The name of the model used to generate the chat completions.

- **system_fingerprint** (*str*, required): A system-generated fingerprint that uniquely identifies the computational environment.

- **object** (*Literal["chat.completion"]*, required, default="chat.completion"): A string literal indicating the type of object, here always "chat.completion".

- **usage** (*UsageInfo*, required, default=empty `UsageInfo` object): Contains information about the API usage for this specific request.

**ChatCompletionStreamResponse**

- **id** (*str*, required): A unique identifier for the streaming chat completion session.

- **choices** (*List[ChatCompletionStreamResponseChoice]*, required): A list of `ChatCompletionStreamResponseChoice` objects, each representing a part of the streaming chat response.

- **created** (*int*, required, default=current time): The creation time of the streaming response, represented as a UNIX timestamp.

- **model** (*str*, required): Specifies the model that was used for generating the streaming chat completions.

- **system_fingerprint** (*str*, required): A unique identifier for the system generating the streaming completions.

- **object** (*Literal["chat.completion.chunk"]*, required, default="chat.completion.chunk"): A literal indicating that this object represents a chunk of a streaming chat completion.
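
Putting the stream-response fields together, a single streamed chunk could look
roughly like the following (all values are illustrative):

.. code:: json

   {
     "id": "chatcmpl-0",
     "choices": [
       {
         "index": 0,
         "delta": {"role": "assistant", "content": "Hello"},
         "finish_reason": null
       }
     ],
     "created": 1700000000,
     "model": "RedPajama-INCITE-Chat-3B-v1-q4f16_1",
     "system_fingerprint": "",
     "object": "chat.completion.chunk"
   }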

------------------------------------------------
@@ -238,7 +239,7 @@
Below is an example of using the API to interact with MLC-LLM in Python with streaming.

.. code:: python

import requests
import json

88 changes: 0 additions & 88 deletions docs/get_started/project_overview.rst

This file was deleted.

2 changes: 1 addition & 1 deletion docs/index.rst
@@ -45,7 +45,7 @@ Check out :ref:`introduction-to-mlc-llm` for the introduction and tutorial of a

compilation/convert_weights.rst
compilation/compile_models.rst
compilation/package_model_libraries_weights.rst
compilation/package_libraries_and_weights.rst
compilation/define_new_models.rst

.. toctree::
Expand Down