Looking at the source code, it seems this crate always runs the derivation sequentially on a single core, even when the scrypt parameter p is greater than 1. Since the p blocks are mixed independently of one another, the derivation could be made faster for p > 1 by processing them in parallel. Are there plans to implement this?
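
To illustrate the idea, here is a minimal sketch using only the standard library. It is not taken from this crate: `mix_block` is a hypothetical stand-in for the real per-block mixing step (it is not ROMix), and `mix_sequential` / `mix_parallel` are names invented for this example. It assumes `block_len` is non-zero and that one thread per block is acceptable.

```rust
use std::thread;

/// Hypothetical stand-in for scrypt's per-block mixing step.
/// This is NOT the real ROMix; it only gives each block some
/// independent, deterministic work to do.
fn mix_block(block: &mut [u8]) {
    let mut state: u8 = 0x5a;
    for byte in block.iter_mut() {
        state = state.wrapping_mul(31).wrapping_add(*byte);
        *byte ^= state;
    }
}

/// Sequential baseline: mix the p blocks one after another.
/// Assumes block_len > 0 (chunks_mut panics on a zero chunk size).
fn mix_sequential(buf: &mut [u8], block_len: usize) {
    for block in buf.chunks_mut(block_len) {
        mix_block(block);
    }
}

/// Parallel variant: spawn one scoped thread per block.
/// The blocks are disjoint mutable slices, so no locking is needed,
/// and thread::scope joins every thread before it returns.
fn mix_parallel(buf: &mut [u8], block_len: usize) {
    thread::scope(|s| {
        for block in buf.chunks_mut(block_len) {
            s.spawn(move || mix_block(block));
        }
    });
}
```

Calling `mix_parallel(&mut buf, block_len)` on a buffer of p blocks should give the same bytes as `mix_sequential`, since each block is processed independently. One trade-off to keep in mind: in real scrypt each concurrently running lane needs its own scratch memory of 128 * N * r bytes, so a parallel version would use more memory than a sequential one that reuses a single scratch buffer.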