Now that we can reliably determine that functions are pure, we should make our broadcast/map functionality take advantage of this as far as possible. I'm not 100% sure what needs to be done here to make sure we generate good code, but it would be nice to get faster performance automatically (at least for the simple stuff) without needing LoopVectorization.