> This could potentially be added as a pass as well: https://github.com/pytorch/executorch/blob/main/backends/arm/_passes/broadcast_args_pass.py. But long term the ideal solution would be to add broadcast support to CMSIS-NN to get it accelerated w/o memcopies.

_Originally posted by @AdrianLundell in #13296 (comment)_
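
For context, here is a rough sketch of what the pass-based approach from the quoted comment amounts to. It is a toy in plain Python, not the ExecuTorch pass API and not the linked `broadcast_args_pass.py`: `Node`, `broadcast_shape`, `materialize_broadcast`, and the `expand_copy` op name are hypothetical names used only for illustration. The idea is that a binary op with mismatched input shapes gets an explicit expand inserted ahead of time, so the kernel only ever sees equal shapes.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Shape = Tuple[int, ...]


def broadcast_shape(a: Shape, b: Shape) -> Shape:
    """Right-aligned, NumPy/PyTorch-style broadcast of two shapes."""
    out = []
    for i in range(1, max(len(a), len(b)) + 1):
        da = a[-i] if i <= len(a) else 1  # missing leading dims count as 1
        db = b[-i] if i <= len(b) else 1
        if da != db and da != 1 and db != 1:
            raise ValueError(f"shapes {a} and {b} are not broadcastable")
        out.append(db if da == 1 else da)
    return tuple(reversed(out))


@dataclass
class Node:
    """Hypothetical graph node: an op name, an output shape, and inputs."""
    op: str
    shape: Shape
    inputs: List["Node"] = field(default_factory=list)


def materialize_broadcast(node: Node) -> Node:
    """Give both inputs of a binary op the broadcast output shape.

    Inputs that already match are left alone; the others are wrapped in
    a (hypothetical) expand_copy node, i.e. an explicit copy.
    """
    if node.op not in {"add", "mul"} or len(node.inputs) != 2:
        return node
    lhs, rhs = node.inputs
    target = broadcast_shape(lhs.shape, rhs.shape)
    new_inputs = [
        inp if inp.shape == target else Node("expand_copy", target, [inp])
        for inp in (lhs, rhs)
    ]
    return Node(node.op, target, new_inputs)


# Usage: a per-channel bias added to an NHWC activation.
x = Node("placeholder", (1, 8, 8, 16))
bias = Node("placeholder", (16,))
out = materialize_broadcast(Node("add", (1, 8, 8, 16), [x, bias]))
print([n.op for n in out.inputs])  # ['placeholder', 'expand_copy']
```

The inserted `expand_copy` is exactly the memcopy cost the comment refers to, which is why native broadcast support in CMSIS-NN would be preferable in the long run.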
Creating this issue to track further improvements in the AOT phase:
- CMSIS-NN kernel dispatch could be further abstracted into a pass (either the existing broadcast pass or a similar new dedicated pass):