| author | Frank Denis <124872+jedisct1@users.noreply.github.com> | 2022-11-17 13:07:07 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2022-11-17 13:07:07 +0100 |
| commit | 7cfeae1ce7aa9f1b3a219d032c43bc2e694ba63b (patch) | |
| tree | 7818e427398bef4e3415a095db5bfe600fbd22fc /src/stage1 | |
| parent | 58d9004cea5f8aa73c76382cd21e1c88b1bc21e1 (diff) | |
| download | zig-7cfeae1ce7aa9f1b3a219d032c43bc2e694ba63b.tar.gz zig-7cfeae1ce7aa9f1b3a219d032c43bc2e694ba63b.zip | |
std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs (#13566)
* std.crypto.onetimeauth.ghash: faster GHASH on modern CPUs
Carryless multiplication used to be slow on older Intel CPUs, which
justified the use of Karatsuba multiplication.
This is no longer the case: using 4 multiplications to multiply two
128-bit numbers is actually faster than 3 multiplications plus the
extra shifts and additions that Karatsuba requires.
This is also true on aarch64.
Keep using Karatsuba only when targeting x86 (granted, this is a bit
of a blunt shortcut; we should really list all the CPU models that
have a slow clmul instruction).
Also remove the useless agg_2 threshold and restore the ability to
precompute only H and H^2 in ReleaseSmall.
Finally, avoid using u256; using 128-bit registers is actually faster.
* Use a switch, add some comments
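
To make the tradeoff concrete, here is a minimal sketch in C with x86 intrinsics (not the Zig code touched by this commit): the same 128x128-bit carryless multiplication computed once with 4 PCLMULQDQ instructions (schoolbook) and once with 3 PCLMULQDQ plus the extra shifts and XORs that Karatsuba needs. The function names and the `u256` struct are illustrative assumptions, not identifiers from the Zig standard library.

```c
/* Sketch only: 128x128 -> 256-bit carryless multiplication on x86-64.
 * Build with: cc -O2 -msse2 -mpclmul clmul128.c -c
 */
#include <wmmintrin.h>  /* _mm_clmulepi64_si128 (PCLMULQDQ) */
#include <emmintrin.h>  /* SSE2 shifts and XOR */

/* Hypothetical 256-bit result type: two 128-bit registers, no u256 integer. */
typedef struct { __m128i lo, hi; } u256;

/* Schoolbook: 4 carryless multiplications, then just XORs to recombine.
 * Fast when clmul itself has low latency / high throughput. */
static u256 clmul128_schoolbook(__m128i a, __m128i b) {
    __m128i lo  = _mm_clmulepi64_si128(a, b, 0x00); /* a0 * b0 */
    __m128i hi  = _mm_clmulepi64_si128(a, b, 0x11); /* a1 * b1 */
    __m128i m0  = _mm_clmulepi64_si128(a, b, 0x10); /* a0 * b1 */
    __m128i m1  = _mm_clmulepi64_si128(a, b, 0x01); /* a1 * b0 */
    __m128i mid = _mm_xor_si128(m0, m1);
    u256 r;
    r.lo = _mm_xor_si128(lo, _mm_slli_si128(mid, 8)); /* fold mid's low 64 bits */
    r.hi = _mm_xor_si128(hi, _mm_srli_si128(mid, 8)); /* fold mid's high 64 bits */
    return r;
}

/* Karatsuba: only 3 carryless multiplications, but extra shuffles and XORs
 * to form (a0^a1)*(b0^b1) and recover the middle term. Only a win where
 * clmul is slow (older x86 CPUs). */
static u256 clmul128_karatsuba(__m128i a, __m128i b) {
    __m128i lo = _mm_clmulepi64_si128(a, b, 0x00);             /* a0 * b0 */
    __m128i hi = _mm_clmulepi64_si128(a, b, 0x11);             /* a1 * b1 */
    __m128i as = _mm_xor_si128(a, _mm_shuffle_epi32(a, 0x4e)); /* a0 ^ a1 */
    __m128i bs = _mm_xor_si128(b, _mm_shuffle_epi32(b, 0x4e)); /* b0 ^ b1 */
    __m128i mid = _mm_clmulepi64_si128(as, bs, 0x00);          /* (a0^a1)*(b0^b1) */
    mid = _mm_xor_si128(mid, _mm_xor_si128(lo, hi));           /* middle term */
    u256 r;
    r.lo = _mm_xor_si128(lo, _mm_slli_si128(mid, 8));
    r.hi = _mm_xor_si128(hi, _mm_srli_si128(mid, 8));
    return r;
}
```

On CPUs where carryless multiplication is cheap (recent x86_64, and PMULL on aarch64), the fourth multiply in the schoolbook version costs less than Karatsuba's extra shuffles and XORs, which is the observation this commit acts on.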
Diffstat (limited to 'src/stage1')
0 files changed, 0 insertions, 0 deletions
