OpenCL AES: Optionally use inverse tables for decryption key schedule #5806
base: bleeding-jumbo
343bb99 to 9709487
Ready to merge. As I said in #5800, AMD Vega got a 42% boost and an NVIDIA 1080 Ti got 44% when running cryptosafe-opencl, which has no KDF at all and decrypts just a single block of AES-256 per key setup. Good stuff. BTW, it also boosts key setup for encryption a tiny bit (~1%), as it no longer copies unneeded tables to local memory.
Using relbench, I see that only cryptosafe gets this huge boost, but I believe/hope that's mainly because test vectors are usually smaller than real-world hashes. Cryptosafe is so poorly designed that our format needs an on-device mask to fully exploit it. Oh, and another reason might be that some formats need a cost parameter to actually use AES, or only AES (where the default cost may be e.g. RC4). I used
Not a proper review, but I skimmed...
b4c3e16 to a5c921c
For completeness, I should probably improve the non-table inverse code path to match the more efficient code we now have in the inverse-table code path. Then we can compare apples to apples.
b21e38f to b8237ab
This boosts AES_set_decrypt_key() by halving the number of table lookups. It mostly affects formats that decrypt a small amount per key (several formats only decrypt one or two blocks). Closes openwall#5800, see openwall#5613 (comment)
Now only a macro definition differs between using tables or not. This revealed that the inverse tables do no good; the improved swap/invert code does. Consequently, we disable the use of tables but leave the option in place.
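For readers unfamiliar with what a decryption key setup does, here is a minimal table-free sketch in plain C. It is not the code from this PR: the names `xtime`, `gmul`, `inv_mix_column` and `enc_to_dec_schedule` are hypothetical, and it assumes round-key words are packed big-endian (first column byte in the top 8 bits). It shows the two generic steps of the equivalent inverse cipher from FIPS-197: swap the round keys into reverse order, then apply InvMixColumns to every round key except the first and the last.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply by 2 in GF(2^8), reduced by the AES polynomial 0x11b. */
uint8_t xtime(uint8_t b)
{
    return (uint8_t)((b << 1) ^ ((b & 0x80) ? 0x1b : 0x00));
}

/* Generic GF(2^8) multiply (shift-and-add); slow but table-free. */
uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;

    while (b) {
        if (b & 1)
            r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

/* InvMixColumns on one column, packed big-endian (assumption). */
uint32_t inv_mix_column(uint32_t w)
{
    uint8_t a0 = (uint8_t)(w >> 24), a1 = (uint8_t)(w >> 16);
    uint8_t a2 = (uint8_t)(w >> 8),  a3 = (uint8_t)w;
    uint8_t b0 = gmul(a0, 14) ^ gmul(a1, 11) ^ gmul(a2, 13) ^ gmul(a3, 9);
    uint8_t b1 = gmul(a0, 9)  ^ gmul(a1, 14) ^ gmul(a2, 11) ^ gmul(a3, 13);
    uint8_t b2 = gmul(a0, 13) ^ gmul(a1, 9)  ^ gmul(a2, 14) ^ gmul(a3, 11);
    uint8_t b3 = gmul(a0, 11) ^ gmul(a1, 13) ^ gmul(a2, 9)  ^ gmul(a3, 14);

    return ((uint32_t)b0 << 24) | ((uint32_t)b1 << 16) |
           ((uint32_t)b2 << 8) | (uint32_t)b3;
}

/*
 * Turn an encryption key schedule of 4 * (rounds + 1) words into a
 * decryption schedule, in place.
 */
void enc_to_dec_schedule(uint32_t *rk, int rounds)
{
    int i, j, k;

    /* Step 1: swap round keys so they end up in reverse order. */
    for (i = 0, j = 4 * rounds; i < j; i += 4, j -= 4) {
        for (k = 0; k < 4; k++) {
            uint32_t t = rk[i + k];

            rk[i + k] = rk[j + k];
            rk[j + k] = t;
        }
    }

    /* Step 2: InvMixColumns on all but the first and last round key. */
    for (i = 4; i < 4 * rounds; i++)
        rk[i] = inv_mix_column(rk[i]);
}
```

A table-based variant would replace the body of `inv_mix_column` with lookups and leave the swap loop untouched, which is roughly the kind of single switch point a macro can select.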
b8237ab to 9d5193c
As I said in #5800, this revealed that it was the improved swap/invert code, not the tables, that gave us the 40% boost. Let's not merge this yet; I'll try to get to the bottom of it.
This made little difference except maybe on CPU, at least under macOS.
This is my current take on #5800, subject to change.