Closed
Labels: question (Question about the usage)
Description
I recently went through this tutorial: https://tvm.apache.org/docs/tutorial/autotvm_relay_x86.html
Model execution performance on Orange Pi Mali improved quite a lot during the optimization process; crucially, the optimization is not a fixed set of optimizations but rather an iterative search that improves model inference performance on your specific hardware.
In contrast, it looks like MLC compilation using Relax, even at the maximum optimization settings, applies a fixed set of optimizations, with no equivalent iterative search.
I wonder if an iterative search like AutoTVM's could yield remarkable improvements in LLM inference speed with MLC on certain hardware.
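To illustrate the distinction being drawn, here is a minimal conceptual sketch (not actual TVM or MLC APIs; all names and numbers are made up for illustration) of why an iterative search over a schedule space can beat a fixed optimization pipeline on hardware whose characteristics the compiler's heuristics did not anticipate:

```python
import random

# Conceptual sketch only: an AutoTVM-style iterative search benchmarks many
# candidate configurations on the target hardware and keeps the best one,
# whereas a fixed pipeline commits to a single heuristic choice up front.

def measured_latency_ms(tile_size: int) -> float:
    """Stand-in for benchmarking one candidate schedule on real hardware.

    The hardware's sweet spot (tile size 32 here) is unknown to the compiler.
    """
    return abs(tile_size - 32) * 0.1 + 1.0

def fixed_pipeline() -> float:
    # A fixed optimization pass bakes in one heuristic configuration.
    return measured_latency_ms(tile_size=8)

def iterative_search(trials: int = 50, seed: int = 0) -> float:
    # Random search over the candidate space, keeping the best measurement.
    rng = random.Random(seed)
    best = float("inf")
    for _ in range(trials):
        candidate = rng.choice([4, 8, 16, 32, 64, 128])
        best = min(best, measured_latency_ms(candidate))
    return best

print(fixed_pipeline())    # heuristic choice, possibly far from optimal
print(iterative_search())  # converges toward the hardware's sweet spot
```

Real AutoTVM replaces the random choice with guided search (e.g. a cost model) and the stand-in function with on-device measurements, but the core idea is the same: measurements on the specific hardware drive the search, rather than a one-shot heuristic.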
Thoughts?