It would be very cool if the performance improvements from https://github.com/ggerganov/llama.cpp/pull/613 could be backported to this repo. I couldn't find an existing issue for this; if there is one, I'm happy to close this one.