Currently, frontend models have two different input layouts: NHWC and NCHW. TensorFlow and TFLite models use the NHWC layout, while frontends like CoreML use NCHW.
For converting models with NHWC input layout, there is currently no unified way. Some frameworks convert NHWC into NCHW input layout, for example Intel OpenVINO and the TensorFlow-CoreML converter (https://github.com/tf-coreml/tf-coreml); a minimal sketch of what this conversion involves is shown after the list below. This has some advantages, for example on GPU. And TVM supports NCHW very well, for example:
- auto tuning only supports NCHW currently (https://github.com/dmlc/tvm/blob/master/python/tvm/autotvm/task/topi_integration.py#L132)
- conv2d_transpose only supports NCHW (https://github.com/dmlc/tvm/blob/master/nnvm/python/nnvm/top/nn.py#L287)
- many other places...
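For reference, the NHWC-to-NCHW conversion itself amounts to permuting the axes of activations and weights. A minimal NumPy sketch (the helper names are hypothetical, not TVM APIs; a TensorFlow-style HWIO weight layout is assumed):

```python
import numpy as np

def nhwc_to_nchw(data):
    """Permute an activation tensor from NHWC to NCHW."""
    return np.transpose(data, (0, 3, 1, 2))

def hwio_to_oihw(weight):
    """Permute a TensorFlow-style HWIO conv weight to OIHW."""
    return np.transpose(weight, (3, 2, 0, 1))

data = np.zeros((1, 224, 224, 3), dtype="float32")    # NHWC
weight = np.zeros((3, 3, 3, 32), dtype="float32")     # HWIO
print(nhwc_to_nchw(data).shape)    # (1, 3, 224, 224)
print(hwio_to_oihw(weight).shape)  # (32, 3, 3, 3)
```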
This is the way our TVM TensorFlow Lite frontend takes (it converts TFLite's NHWC into NCHW). However, it also has disadvantages. When we handle shape transformations (like Reshape, Squeeze, Concat and so on), we must be very careful. For example, the TensorFlow-CoreML converter has complex logic to handle reshape: https://github.com/tf-coreml/tf-coreml/blob/master/tfcoreml/_shape_sensitive_layers.py#L222.
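To illustrate why Reshape needs this care, a small NumPy example (shapes chosen arbitrarily): once the converter has turned an NHWC tensor into NCHW, applying the original Reshape directly produces the elements in the wrong order, so the converter must insert a transpose back (or rewrite the target shape):

```python
import numpy as np

x_nhwc = np.arange(24).reshape(1, 2, 3, 4)      # N=1, H=2, W=3, C=4
x_nchw = np.transpose(x_nhwc, (0, 3, 1, 2))     # same data in NCHW

# The original NHWC model flattens the tensor to (1, 24).
y_ref = x_nhwc.reshape(1, 24)

# Naively reshaping the converted NCHW tensor yields a different
# element order, i.e. a wrong result.
y_naive = x_nchw.reshape(1, 24)
print(np.array_equal(y_ref, y_naive))           # False

# A correct conversion must go back to NHWC before the reshape.
y_fixed = np.transpose(x_nchw, (0, 2, 3, 1)).reshape(1, 24)
print(np.array_equal(y_ref, y_fixed))           # True
```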
Another way is to keep the original model's input layout, i.e. NHWC. This is the way our TVM TensorFlow frontend takes. However, for performance we extend NCHW layout support when we want to run on GPU, and the way we do this is to insert transpose ops before / after each convolution. These transposes cost a noticeable fraction of the runtime when the convolution itself executes fast; in one of our model tests we even found they occupied half of the total running time.
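Sketched in NumPy, the pattern is that every NCHW convolution in an otherwise-NHWC graph gets bracketed by a pair of transposes, and it is exactly these transposes that show up in the profile (`conv2d_nchw` here is a naive stand-in for the real kernel; the function names are mine):

```python
import numpy as np

def conv2d_nchw(data, weight):
    """Naive stride-1, no-padding NCHW convolution, for illustration only."""
    n, c, h, w = data.shape
    o, _, kh, kw = weight.shape
    out = np.zeros((n, o, h - kh + 1, w - kw + 1), dtype=data.dtype)
    for i in range(h - kh + 1):
        for j in range(w - kw + 1):
            patch = data[:, :, i:i + kh, j:j + kw]
            out[:, :, i, j] = np.tensordot(patch, weight,
                                           axes=([1, 2, 3], [1, 2, 3]))
    return out

def conv2d_in_nhwc_graph(data_nhwc, weight_oihw):
    """Bracket an NCHW conv with two transposes so the rest of the
    graph can stay NHWC; both transposes are pure overhead."""
    data_nchw = np.transpose(data_nhwc, (0, 3, 1, 2))   # NHWC -> NCHW
    out_nchw = conv2d_nchw(data_nchw, weight_oihw)
    return np.transpose(out_nchw, (0, 2, 3, 1))         # NCHW -> NHWC

out = conv2d_in_nhwc_graph(np.ones((1, 8, 8, 3), dtype="float32"),
                           np.ones((16, 3, 3, 3), dtype="float32"))
print(out.shape)  # (1, 6, 6, 16)
```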
To avoid this issue, maybe we should handle it in a graph pass and eliminate transpose ops as far as possible.
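One possible shape of such a pass, sketched on a toy linear op sequence (a hypothetical representation, not TVM's actual graph IR): fold adjacent transposes by composing their axis permutations, and drop any that compose to the identity, so a transpose-to-NCHW immediately followed by a transpose-back vanishes:

```python
def compose(p, q):
    """Compose two axis permutations: the permutation equivalent to
    transposing by p first, then by q."""
    return tuple(p[i] for i in q)

def eliminate_transposes(ops):
    """Fold adjacent transposes in a linear op sequence and drop the
    ones that become the identity. Each op is ("transpose", axes) or
    ("op", name)."""
    out = []
    for op in ops:
        if op[0] == "transpose" and out and out[-1][0] == "transpose":
            fused = compose(out[-1][1], op[1])
            out.pop()
            if fused != tuple(range(len(fused))):   # keep only non-identity
                out.append(("transpose", fused))
        else:
            out.append(op)
    return out

ops = [("transpose", (0, 3, 1, 2)),   # NHWC -> NCHW
       ("transpose", (0, 2, 3, 1))]   # NCHW -> NHWC: cancels out
print(eliminate_transposes(ops))      # []
```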
This RFC is opened just to raise this concern and to let us discuss how to do this better.