[RFC] Structured Error Handling Mechanism #2279

@JennyChou5

Problem Statement
TVM is gaining users in both the research community and production deployments. Simplifying the user experience can drive faster adoption, particularly among developers and users unfamiliar with the stack, and better error handling is a critical part of that experience. A structured error-handling mechanism would reduce the time users of the TVM stack spend debugging and would pay long-term dividends in usability. Such a mechanism must separate error-handling code from regular code and must be able to propagate errors up the call stack. Although the TVM stack uses built-in Python/C++ exception handling, the current mechanism provides no way to group and differentiate error types.

Current TVM Error Handling
The TVM stack currently reports errors by throwing built-in Python/C++ exceptions. Some parts of the code throw an error at the site where the condition is detected, collecting the information relevant to the error there, while other parts throw errors that are passed implicitly to exception handlers using the now-classic try/except idiom. Exception classes are an easy-to-use feature of modern object-oriented programming languages. However, this scheme is insufficient for production use when TVM is incorporated into a larger stack or service.

Proposed Error Handling Mechanism
We propose a structured error-handling mechanism characterized by the following features:
o Represents all errors as functions or objects
o Allows you to define your own error types or classes
o Allows you to explicitly raise an error
o Allows you to catch particular error types in a particular layer
o Allows you to propagate errors from an inner layer to an outer layer
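The features listed above can be sketched in plain Python. This is only an illustration: `StructuredError`, `ShapeMismatchError`, and the three functions are hypothetical names chosen for this example and are not part of the TVM codebase.

```python
class StructuredError(Exception):
    """Hypothetical root class: every structured error is an object of this type."""


class ShapeMismatchError(StructuredError):
    """Hypothetical user-defined error type."""


def check_shapes(lhs, rhs):
    # Innermost layer: explicitly raise a typed error at the detection site.
    if lhs != rhs:
        raise ShapeMismatchError("shape mismatch: {} vs {}".format(lhs, rhs))
    return lhs


def run_pass(lhs, rhs):
    # Middle layer: does not catch anything, so the error propagates upward.
    return check_shapes(lhs, rhs)


def compile_model(lhs, rhs):
    # Outer layer: catches only the error type it knows how to handle.
    try:
        return run_pass(lhs, rhs)
    except ShapeMismatchError as err:
        return "recovered: {}".format(err)
```

Because the outer layer catches `ShapeMismatchError` by type, any other exception still propagates unchanged, which is the grouping ability that plain built-in exceptions lack.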

First, we separate TVM into four major layers: the model parsing layer (parses TF, MX, PyTorch, and ONNX models into an NNVM graph), the NNVM pass layer (shape and type inference and graph-level optimizations such as op fusion and precompute), the TVM compute layer, and the TVM schedule layer. For each layer, we catalog errors into pre-defined objects and keep them together in one util folder. All other files import the error-handling classes from there.
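As a rough sketch of the per-layer catalog, a single shared module could define one base class per layer. All class names, the `layer` attribute, and the `describe` helper below are assumptions made for illustration; the RFC does not fix any of them.

```python
class TVMStructuredError(Exception):
    """Hypothetical common root for the error catalog."""
    layer = "unknown"


class ModelParsingError(TVMStructuredError):
    layer = "model parsing"


class GraphPassError(TVMStructuredError):
    layer = "NNVM pass"


class ComputeError(TVMStructuredError):
    layer = "TVM compute"


class ScheduleError(TVMStructuredError):
    layer = "TVM schedule"


def describe(err):
    # An outer service can tell which layer an error came from by its type.
    if isinstance(err, TVMStructuredError):
        return "[{}] {}".format(err.layer, err)
    return "[unstructured] {}".format(err)
```

Keeping the catalog in one place means a new error type only has to subclass the base class of its layer to be reported consistently.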

Example of the Proposed Error Handling Mechanism
For example, in the model parsing layer, which parses models from different frameworks into NNVM graphs, we can define the error-handling functions in one class. Here, we first categorize the errors in this layer into four types: OperatorNotImplemented, OperatorAttributeNotImplemented, OperatorAttributeRequired, and OperatorAttributeValueNotValid, and provide three helper functions for raising them (see the functions _required_attr, _raise_attr_value_error, and _raise_not_supported). All framework parsing files, such as mxnet.py, tensorflow.py, and onnx.py, can import the corresponding error-handling class/functions and use them. Note that the following functions are from mxnet.py.

[image: the error-handling functions from mxnet.py]

In this way, we provide a unified way to categorize the error types in this layer. Users can easily add new error types, and errors can be propagated up the call stack.

Risk and Limitation
We have to define the error-handling templates for each layer first and ask TVM’s contributors to follow them. In addition, moving the existing codebase to a structured error-handling scheme will take some effort.
