Adopt s2n-bignum's AArch64 bignum functions to aws-lc's montgomery multiplication #1108
…tion

This patch adopts s2n-bignum's bignum functions to aws-lc's montgomery multiplication. Additionally, if the `__ARM_NEON` macro is defined, it invokes the functions that use NEON instructions (https://developer.arm.com/documentation/ihi0053/d/).

The scalar bignum functions are fully verified versions (verified by @jargh). I am still verifying the functions that use NEON instructions. I will mark this PR as a draft, and mark it as ready once that is done.

If NEON is used, the performance numbers are as follows. The processor is Graviton 2, and `tool/bssl speed -filter RSA` was used.

```
Unit: ops/sec

Bits  Operation           baseline AWS-LC  s2n-bignum (+ NEON)  speedup vs baseline
2048  RSA sign                      299.3                495.8               65.65%
      verify (fresh key)          10736.3              18836.6               75.45%
3072  RSA sign                       95.4                126.4               32.49%
      verify (fresh key)           4917.7               6579.1               33.78%
4096  RSA sign                       41.7                 78.3               87.77%
      verify (fresh key)           2781.6               3800.3               36.62%
```

Without NEON, there is a milder but still observable speedup.

```
Bits  Operation           baseline AWS-LC  s2n-bignum  speedup vs baseline
2048  RSA sign                      299.3       399                 33.31%
      verify (fresh key)          10736.3     15491                 44.29%
3072  RSA sign                       95.4       113.2               18.66%
      verify (fresh key)           4917.7      6001.7               22.04%
4096  RSA sign                       41.7        63.2               51.56%
      verify (fresh key)           2781.6      3451                 24.07%
```

Currently, the s2n-bignum alternatives are always used, but I expect they will need to be used conditionally in some cases. I would be happy to receive feedback on this part.
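The "speedup vs baseline" figures appear to be the relative gain in ops/sec over the baseline. A minimal sketch of that arithmetic follows; the function name `speedup_percent` is hypothetical and is not part of aws-lc or `bssl speed`.

```c
#include <assert.h>

/*
 * Relative throughput gain in percent:
 *   (candidate / baseline - 1) * 100
 * Both arguments are throughputs in ops/sec.
 * speedup_percent is an illustrative name, not an aws-lc function.
 */
double speedup_percent(double baseline_ops, double candidate_ops) {
  assert(baseline_ops > 0.0); /* a zero baseline has no meaningful ratio */
  return (candidate_ops / baseline_ops - 1.0) * 100.0;
}
```

For the 2048-bit RSA sign row, `speedup_percent(299.3, 495.8)` evaluates to roughly 65.65, which agrees with the reported NEON figure.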
This commit
- Removes unused assembly files in s2n-bignum/arm/generic
- Adds x86 scalar s2n-bignum ops
- Adds the BN_MONTGOMERY_USE_S2N_BIGNUM macro
- Fixes compilation errors
The condition follows that of p384.c
From the latest CI failure I could see this error message, but couldn't figure out how to fix it:

I would very much appreciate any advice about this failure, thanks.
I could locally reproduce the failure after building boringssl's For this case I can manually calculate I can preprocess these macros by simply compiling & disassembling s2n-bignum assemblies - this will remove these uses of complicated syntax.
The offset form has to match one of the regexes here: https://github.com/aws/aws-lc/blob/main/util/fipstools/delocate/delocate.peg#L114-L123. So ideally we should add an expression to that list that covers your particular case.
Hello @dkostic , thanks for the reference! I could fix the case by adding rules specific to the patterns to the

However, I am afraid that I have a few concerns about fixing delocate.peg...

```
    .set I, 1
    .rep ((16/2)-1)
    ldp x0, x1, [x19, #16*(16+I)]
    ldp x2, x3, [x19, #16*((16/2)+I)]
    adcs x0, x0, x2
    adcs x1, x1, x3
    stp x0, x1, [x19, #16*(16 +I)]
    .set I, (I+1)
    .endr
```

It seems the PEG syntax isn't considering macro directives in ARM such as

Another concern is that extending delocate.peg doesn't seem straightforward. I think help is needed in this direction. Or, things will be simpler if we simply put macro-expanded assemblies into third_party/s2n-bignum directory.
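To illustrate what "macro-expanded assemblies" would mean for the snippet quoted in this comment, here is a rough sketch that evaluates the `.set`/`.rep` loop by hand and emits the instructions with literal offsets. The helper names (`offset_a`, `offset_b`, `rep_count`, `expand_rep_block`) are hypothetical. It only mirrors the arithmetic of this one snippet and is not how s2n-bignum or delocate actually preprocess files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Byte offset used by the first ldp and by the stp: #16*(16+I). */
int offset_a(int i) { return 16 * (16 + i); }

/* Byte offset used by the second ldp: #16*((16/2)+I). */
int offset_b(int i) { return 16 * ((16 / 2) + i); }

/* Iteration count of the loop: .rep ((16/2)-1). */
int rep_count(void) { return (16 / 2) - 1; }

/*
 * Writes the expanded loop body into out (capacity cap bytes).
 * Returns the number of characters written, or -1 if out is too small.
 */
int expand_rep_block(char *out, size_t cap) {
  size_t used = 0;
  int i = 1; /* .set I, 1 */
  assert(out != NULL);
  for (int n = 0; n < rep_count(); n++) {
    int w = snprintf(out + used, cap - used,
                     "ldp x0, x1, [x19, #%d]\n"
                     "ldp x2, x3, [x19, #%d]\n"
                     "adcs x0, x0, x2\n"
                     "adcs x1, x1, x3\n"
                     "stp x0, x1, [x19, #%d]\n",
                     offset_a(i), offset_b(i), offset_a(i));
    if (w < 0 || (size_t)w >= cap - used) {
      return -1; /* output truncated or encoding error */
    }
    used += (size_t)w;
    i = i + 1; /* .set I, (I+1) */
  }
  return (int)used;
}
```

With `I` running from 1 to 7, the first iteration uses offsets 272 and 144 and the last uses 368 and 240, so the expanded text contains only plain immediates of the form `[x19, #272]`.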
Closing this PR, as it will be superseded by #1164.
Description of changes:
This patch adopts s2n-bignum's bignum functions to aws-lc's montgomery multiplication. Additionally, if the `__ARM_NEON` macro is defined, it invokes the functions that use NEON instructions (https://developer.arm.com/documentation/ihi0053/d/).

The scalar bignum functions are fully verified versions (verified by @jargh). The NEON instructions are still under verification by myself. I will mark this PR as draft, and mark it as ready once it is done.
If NEON is used, the performance numbers of RSA signing are as follows. The processor is Graviton 2, and `tool/bssl speed -filter RSA` was used. (Unit: ops/sec)

Without NEON, there is a milder but still observable speedup.
This is only adopted for AArch64. For x86-64 the performance benefit isn't clear. For other architectures, s2n-bignum does not have implementation yet.
Currently, the s2n-bignum alternatives are always used, but I expect they will need to be used conditionally in some cases. I would be happy to receive feedback on this part.
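As a rough illustration of the compile-time selection described above (AArch64 only, with the NEON path taken when `__ARM_NEON` is defined), the sketch below picks one of three code paths. The enum and function names are hypothetical labels for this example. They are not aws-lc or s2n-bignum identifiers, and the actual patch may gate the choice differently, for instance through the `BN_MONTGOMERY_USE_S2N_BIGNUM` macro.

```c
#include <assert.h>

/* Hypothetical labels for the three code paths discussed in this PR. */
enum mont_backend {
  MONT_BACKEND_GENERIC = 0,    /* existing aws-lc implementation */
  MONT_BACKEND_S2N_SCALAR = 1, /* s2n-bignum scalar functions */
  MONT_BACKEND_S2N_NEON = 2    /* s2n-bignum functions using NEON */
};

/* Chooses a backend from compiler-predefined macros only. */
enum mont_backend select_mont_backend(void) {
#if defined(__aarch64__) && defined(__ARM_NEON)
  return MONT_BACKEND_S2N_NEON;
#elif defined(__aarch64__)
  return MONT_BACKEND_S2N_SCALAR;
#else
  return MONT_BACKEND_GENERIC; /* x86-64 and others keep the baseline */
#endif
}

/* Human-readable name, e.g. for logging next to benchmark output. */
const char *mont_backend_name(enum mont_backend b) {
  static const char *const names[] = {"generic", "s2n-bignum scalar",
                                      "s2n-bignum NEON"};
  assert((unsigned)b < sizeof(names) / sizeof(names[0]));
  return names[b];
}
```

A runtime condition, such as restricting the s2n-bignum path to particular modulus sizes, could be layered on top of this, but which conditions are appropriate is exactly the open question raised here.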
Call-outs:
This PR contains addition of multiple files from awslabs/s2n-bignum.
Testing:
Tested via `bssl speed -filter RSA`.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.