[RFC] Structured Error Handling Mechanism

**Problem Statement**
TVM is gaining users in the research community and in production deployments. Simplifying the user experience can drive faster adoption, particularly among developers and users unfamiliar with the stack. A critical aspect of UX is better error handling. A structured error handling mechanism would reduce the time TVM users spend debugging and pay long-term dividends in usability. Such a mechanism must separate error-handling code from regular code and allow errors to propagate up the call stack. Although the TVM stack uses built-in Python/C++ exception handling, the current mechanism cannot group errors or differentiate between error types.


**Current TVM Error Handling**
The TVM stack currently reports errors by throwing built-in Python/C++ exceptions. Some parts of the code throw at the site that detects the condition, collecting the information relevant to the error there, while other parts throw errors that are passed implicitly to exception handlers using the classic `try <desired action> except <recovery action>` idiom. Exceptions are an easy-to-use feature of modern object-oriented programming languages. However, the scheme is insufficient for production use when TVM is incorporated into a larger stack or service.
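
To illustrate the limitation, here is a small sketch in Python. It is not taken from the TVM codebase: `convert_op`, `classify_failure`, and the messages are hypothetical. Every failure surfaces as the same built-in exception type, so a caller that embeds the converter in a larger service can only tell the cases apart by inspecting the message text.

```python
def convert_op(op_name, attrs):
    """Hypothetical converter that reports every failure with a built-in exception."""
    supported = {"conv2d": ["kernel_size"]}
    if op_name not in supported:
        raise RuntimeError("Operator {} is not supported.".format(op_name))
    for key in supported[op_name]:
        if key not in attrs:
            raise RuntimeError(
                "Attribute {} is required by {}.".format(key, op_name))
    return (op_name, attrs)


def classify_failure(op_name, attrs):
    """The caller has a single exception type to catch, so it parses the message."""
    try:
        convert_op(op_name, attrs)
    except RuntimeError as err:
        # Fragile: this breaks as soon as the wording of the message changes.
        if "not supported" in str(err):
            return "unsupported-operator"
        return "other"
    return "ok"
```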


**Proposed Error Handling Mechanism**
We propose a structured error-handling mechanism characterized by the following features:
- Represents all errors as functions or objects
- Allows you to define your own error types or classes
- Allows you to explicitly raise an error
- Allows you to catch particular error types in a particular layer
- Allows you to propagate errors from the inner layer to the outer layer
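
The features above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a proposed API: `StructuredError`, `ShapeMismatchError`, and the three functions are hypothetical names chosen for the example.

```python
class StructuredError(Exception):
    """Hypothetical root of the hierarchy; every error is an object."""

    def __init__(self, message, layer=None):
        super().__init__(message)
        self.layer = layer  # name of the layer that raised the error


class ShapeMismatchError(StructuredError):
    """Example of a user-defined error type."""


def infer_shape(lhs, rhs):
    # Explicitly raise a typed error at the site that detects the condition.
    if lhs != rhs:
        raise ShapeMismatchError(
            "shape mismatch: {} vs {}".format(lhs, rhs), layer="pass")
    return lhs


def run_pass(lhs, rhs):
    # The inner layer does not handle the error, so it propagates to the caller.
    return infer_shape(lhs, rhs)


def compile_model(lhs, rhs):
    # The outer layer catches only the error type it knows how to report.
    try:
        return run_pass(lhs, rhs)
    except ShapeMismatchError as err:
        return "[{}] {}".format(err.layer, err)
```

Because the error carries its type and originating layer, the outer layer does not need to parse message strings to decide how to react.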

First, we separate TVM into four major layers: the model parsing layer (which parses TensorFlow, MXNet, PyTorch, and ONNX models into an NNVM graph), the NNVM pass layer (shape and type inference and graph-level optimizations such as op fusion and precompute), the TVM compute layer, and the TVM schedule layer. For each layer, we catalog errors into pre-defined classes and collect them in a single util folder. All files will import the error-handling classes from there.
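
One possible layout for such a shared module is sketched below. All class names except `OperatorNotImplemented` are assumptions made for illustration, as are the `LAYER_BASES` table and the `layer_of` helper.

```python
class LayerError(Exception):
    """Hypothetical root class shared by all four layers."""


class ModelParsingError(LayerError):
    """Base class for errors from the model parsing layer."""


class GraphPassError(LayerError):
    """Base class for errors from the NNVM pass layer."""


class ComputeError(LayerError):
    """Base class for errors from the TVM compute layer."""


class ScheduleError(LayerError):
    """Base class for errors from the TVM schedule layer."""


class OperatorNotImplemented(ModelParsingError):
    """A concrete error type cataloged under the model parsing layer."""


# Maps a human-readable layer name to the base class of its errors.
LAYER_BASES = {
    "model parsing": ModelParsingError,
    "NNVM pass": GraphPassError,
    "TVM compute": ComputeError,
    "TVM schedule": ScheduleError,
}


def layer_of(err):
    """Return the name of the layer whose base class the error derives from."""
    for name, base in LAYER_BASES.items():
        if isinstance(err, base):
            return name
    return "unknown"
```

With this grouping, an outer layer could write `except ModelParsingError:` to handle every frontend error at once, or `except OperatorNotImplemented:` to handle a single case.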

**Example of the Proposed Error Handling Mechanism**
For example, in the model parsing layer, which parses models from different frameworks into NNVM graphs, we can define the error-handling functions in one class. Here, we first categorize the errors in this layer into four types: `OperatorNotImplemented`, `OperatorAttributeNotImplemented`, `OperatorAttributeRequired`, and `OperatorAttributeValueNotValid`. We then provide three helper functions for raising them (see the functions `_required_attr`, `_raise_attr_value_error`, and `_raise_not_supported`). All framework parsing files, such as mxnet.py, tensorflow.py, and onnx.py, can import the corresponding error-handling class or functions and use them. Note that the following functions are from mxnet.py.


![image](https://user-images.githubusercontent.com/39837160/49856801-a7b6b400-fda5-11e8-9ce9-1db735be93e4.png)


In this way, we provide a unified way to categorize the error types in this layer. Users can easily add new error types, and errors can be propagated up the call stack.


**Risk and Limitation**
We have to define the error-handling templates for each layer first and ask TVM’s contributors to follow them. In addition, moving the existing codebase to a structured error-handling scheme will take some effort.
     


[RFC] Structured Error Handling Mechanism #2279
