OpenCL AES: Optionally use inverse tables for decryption key schedule #5806

Open
wants to merge 4 commits into
base: bleeding-jumbo

Conversation

magnumripper
Member

This is my current take on #5800, subject to change.

said50-sys

This comment was marked as off-topic.

@magnumripper magnumripper force-pushed the opencl-aes-inv-tables branch 2 times, most recently from 343bb99 to 9709487 Compare July 24, 2025 00:16
@magnumripper
Member Author

magnumripper commented Jul 24, 2025

Ready to merge. As I said in #5800, an AMD Vega got a 42% boost and an NVIDIA 1080 Ti got 44% when running cryptosafe-opencl, which has no KDF at all and decrypts just a single block of AES-256 per key setup. Good stuff.

BTW it also boosts key setup for encryption a tiny bit (~1%) as it no longer copies unneeded tables to local memory.

@magnumripper magnumripper requested a review from solardiz July 24, 2025 00:34
@magnumripper
Member Author

magnumripper commented Jul 24, 2025

Using relbench, I see that only cryptosafe gets this huge boost, but I believe/hope that's mainly because test vectors are usually smaller than real-world hashes. Cryptosafe is so poorly designed that our format needs an on-device mask to fully exploit it. Oh, and another reason might be that some formats need a cost parameter to actually use AES, or only AES (where the default cost may be e.g. RC4).

I used -format:@aes,+opencl and a fixed LWS of 256 but let the formats autotune GWS.
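
One way to read the reasoning above, as a rough model rather than a measurement: suppose a fraction $f$ of the per-candidate time is spent in the AES decryption key setup, and that part becomes $s$ times faster. The symbols $f$, $s$ and $S$ are assumptions introduced here for illustration. Amdahl's law then gives the overall speedup:

$$
S = \frac{1}{(1 - f) + \dfrac{f}{s}}
$$

For a format with no KDF that decrypts a single block per key, $f$ is large and $S$ approaches $s$. For a format dominated by a KDF or by decrypting many blocks, $f$ is small and $S$ stays close to 1 whatever $s$ is.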

Member

@solardiz solardiz left a comment

Not a proper review, but I skimmed...

@magnumripper magnumripper force-pushed the opencl-aes-inv-tables branch from b4c3e16 to a5c921c Compare July 25, 2025 21:21
@magnumripper
Member Author

For completeness I should probably improve the non-table inverse code path similar to the more efficient code we now have in the inverse table code path. Then we can compare apples to apples.

@magnumripper magnumripper force-pushed the opencl-aes-inv-tables branch from b21e38f to b8237ab Compare July 25, 2025 23:44
magnumripper and others added 2 commits July 26, 2025 01:53
This boosts AES_set_decrypt_key() by halving the number of table lookups.
It mostly affects formats that decrypt a small amount per key (several
formats only decrypt one or two blocks).

Closes openwall#5800, see openwall#5613 (comment)
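
A minimal sketch of what "halving the number of table lookups" could look like. It assumes the decryption schedule applies InvMixColumns to the middle round keys, and that the earlier code followed the common reference-style pattern of a forward S-box lookup followed by a decryption-table lookup for each byte (eight lookups per word). Neither assumption is confirmed by this thread. The table `inv_tab` and all function names are hypothetical, not the PR's code. A dedicated InvMixColumns table needs one lookup per input byte, so four per word:

```c
#include <stdint.h>

/* Multiply by x (0x02) in GF(2^8) modulo the AES polynomial 0x11b. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiply; used for the reference path and table setup. */
static uint8_t gmul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;

    while (b) {
        if (b & 1)
            p ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return p;
}

/* Reference InvMixColumns on one big-endian column word (a0 a1 a2 a3). */
static uint32_t inv_mix_ref(uint32_t w)
{
    uint8_t a0 = (uint8_t)(w >> 24), a1 = (uint8_t)(w >> 16);
    uint8_t a2 = (uint8_t)(w >> 8), a3 = (uint8_t)w;
    uint8_t b0 = gmul(a0, 14) ^ gmul(a1, 11) ^ gmul(a2, 13) ^ gmul(a3, 9);
    uint8_t b1 = gmul(a0, 9) ^ gmul(a1, 14) ^ gmul(a2, 11) ^ gmul(a3, 13);
    uint8_t b2 = gmul(a0, 13) ^ gmul(a1, 9) ^ gmul(a2, 14) ^ gmul(a3, 11);
    uint8_t b3 = gmul(a0, 11) ^ gmul(a1, 13) ^ gmul(a2, 9) ^ gmul(a3, 14);

    return ((uint32_t)b0 << 24) | ((uint32_t)b1 << 16) |
           ((uint32_t)b2 << 8) | b3;
}

/* Hypothetical dedicated table: inv_tab[x] packs the column (14x, 9x, 13x, 11x). */
static uint32_t inv_tab[256];

static void build_inv_tab(void)
{
    for (int x = 0; x < 256; x++)
        inv_tab[x] = ((uint32_t)gmul((uint8_t)x, 14) << 24) |
                     ((uint32_t)gmul((uint8_t)x, 9) << 16) |
                     ((uint32_t)gmul((uint8_t)x, 13) << 8) |
                     gmul((uint8_t)x, 11);
}

static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Four lookups per word: one per input byte, rotated into position. */
static uint32_t inv_mix_tab(uint32_t w)
{
    return inv_tab[w >> 24] ^
           ror32(inv_tab[(w >> 16) & 0xff], 8) ^
           ror32(inv_tab[(w >> 8) & 0xff], 16) ^
           ror32(inv_tab[w & 0xff], 24);
}
```

Call `build_inv_tab()` once before using `inv_mix_tab()`. Rotating a single table keeps the footprint at 1 KiB; whether separate pre-rotated tables are faster depends on the device.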
Now only a macro definition differs between the table and non-table code
paths. This revealed that the inverse tables do no good; the improved
swap/invert code does.

Consequently we disable the use of tables - but leave the option in there.
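
A sketch of how a single macro can select between the two paths while the swap/invert loop stays shared. The macro name `USE_INV_TABLES`, the helper names, and the loop itself are hypothetical illustrations of the idea, not the code in this PR. The loop reverses the round-key order and applies InvMixColumns to every round key except the first and last, in one in-place pass:

```c
#include <stdint.h>

/* Hypothetical compile-time switch; the PR's actual macro name is not shown. */
#ifndef USE_INV_TABLES
#define USE_INV_TABLES 0
#endif

/* xtime on four packed bytes at once (GF(2^8), AES polynomial 0x11b). */
static uint32_t xtime4(uint32_t w)
{
    return ((w & 0x7f7f7f7fu) << 1) ^ (((w >> 7) & 0x01010101u) * 0x1bu);
}

static uint32_t rol32(uint32_t v, unsigned n)
{
    return (v << n) | (v >> (32 - n));
}

static uint32_t ror32(uint32_t v, unsigned n)
{
    return (v >> n) | (v << (32 - n));
}

/* Arithmetic InvMixColumns on one big-endian column word, no tables. */
static uint32_t inv_mix_calc(uint32_t w)
{
    uint32_t w2 = xtime4(w), w4 = xtime4(w2), w8 = xtime4(w4);
    uint32_t w9 = w8 ^ w;        /*  9x per byte */
    uint32_t w11 = w9 ^ w2;      /* 11x per byte */
    uint32_t w13 = w9 ^ w4;      /* 13x per byte */
    uint32_t w14 = w8 ^ w4 ^ w2; /* 14x per byte */

    return w14 ^ rol32(w11, 8) ^ rol32(w13, 16) ^ rol32(w9, 24);
}

/* Table path: inv_tab[x] is InvMixColumns of the column (x, 0, 0, 0). */
static uint32_t inv_tab[256];

static void build_inv_tab(void)
{
    for (uint32_t x = 0; x < 256; x++)
        inv_tab[x] = inv_mix_calc(x << 24);
}

static uint32_t inv_mix_tab(uint32_t w)
{
    return inv_tab[w >> 24] ^
           ror32(inv_tab[(w >> 16) & 0xff], 8) ^
           ror32(inv_tab[(w >> 8) & 0xff], 16) ^
           ror32(inv_tab[w & 0xff], 24);
}

/* The only line that differs between the two builds. */
#if USE_INV_TABLES
#define INV_MIX(w) inv_mix_tab(w)
#else
#define INV_MIX(w) inv_mix_calc(w)
#endif

/*
 * Convert an expanded encryption key of 4 * (rounds + 1) words into a
 * decryption key in place: swap round i with round (rounds - i) and apply
 * InvMixColumns to all rounds except the first and last.
 */
static void invert_key_schedule(uint32_t *rk, int rounds)
{
    for (int i = 0, j = rounds; i <= j; i++, j--) {
        for (int k = 0; k < 4; k++) {
            uint32_t lo = rk[4 * i + k], hi = rk[4 * j + k];

            if (i > 0) { /* middle rounds only */
                lo = INV_MIX(lo);
                hi = INV_MIX(hi);
            }
            rk[4 * i + k] = hi;
            rk[4 * j + k] = lo;
        }
    }
}
```

With the table build, `build_inv_tab()` has to run before `invert_key_schedule()`. Keeping everything but `INV_MIX` identical is what makes a table versus no-table benchmark an apples-to-apples comparison.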
@magnumripper magnumripper force-pushed the opencl-aes-inv-tables branch from b8237ab to 9d5193c Compare July 25, 2025 23:57
@magnumripper
Member Author

> For completeness I should probably improve the non-table inverse code path similar to the more efficient code we now have in the inverse table code path. Then we can compare apples to apples.

As I said in #5800, this revealed that it was that code that gave us the 40% boost, not the tables.

Let's not merge this yet, I'll try to get to the bottom of it.

This made little difference except maybe on CPU, at least under macOS.
3 participants