| author | Frank Denis <124872+jedisct1@users.noreply.github.com> | 2022-11-07 21:45:29 +0100 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2022-11-07 21:45:29 +0100 |
| commit | 7d48cb11389f5dfe73d25e59c1038398a8ba9ca9 (patch) | |
| tree | 7a06b19b14b978ac1ed6fd0a6afb642017d1c982 /src/codegen/llvm.zig | |
| parent | 88d2e4f66a988dabca45c9add992c9f029c2176f (diff) | |
std.crypto: make ghash faster, esp. for small messages (#13464)
* std.crypto: make ghash faster, esp. for small messages
Aggregated reduction requires 5 additional multiplications (to
precompute the powers of H), in order to save 2 multiplications
per batch.
So, only use large batches when doing so is actually worthwhile.
For the last blocks, reuse the precomputations in order to perform
a single reduction.
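The identity behind aggregated reduction can be sketched as follows. This is a minimal illustration in Python, not the code from this commit: it uses a plain (non-bit-reflected) polynomial representation, so its outputs are not bit-compatible with real GHASH, and the names `clmul`, `reduce`, `ghash_serial` and `ghash_aggregated` are hypothetical. It only shows that XORing the unreduced products with precomputed powers of H and reducing once gives the same result as reducing after every block. It does not reproduce the multiplication counts quoted above.

```python
# Field polynomial x^128 + x^7 + x^2 + x + 1 (plain, non-reflected bit order).
P = (1 << 128) | 0x87


def clmul(a, b):
    """Carry-less multiplication of two non-negative ints, without reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce(v):
    """Reduce the polynomial v modulo P, clearing every bit at index >= 128."""
    for i in range(v.bit_length() - 1, 127, -1):
        if (v >> i) & 1:
            v ^= P << (i - 128)
    return v


def ghash_serial(h, blocks, y=0):
    """Horner evaluation: one multiplication and one reduction per block."""
    for x in blocks:
        y = reduce(clmul(y ^ x, h))
    return y


def ghash_aggregated(h, blocks, y=0):
    """Same value, but with a single reduction for the whole batch."""
    n = len(blocks)
    if n == 0:
        return y
    # powers[k] holds h^(k+1); this is the precomputation cost of a batch.
    powers = [h]
    for _ in range(n - 1):
        powers.append(reduce(clmul(powers[-1], h)))
    # The first block absorbs the running state and meets the highest power.
    acc = clmul(y ^ blocks[0], powers[n - 1])
    for i in range(1, n):
        acc ^= clmul(blocks[i], powers[n - 1 - i])
    return reduce(acc)
```

Because reduction is linear over XOR, both functions agree for any batch length, which is why a short tail of blocks can reuse the already computed powers.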
Also, even in .ReleaseSmall, allow 2-block aggregation.
The speedup is worth it, and the increase in code size is reasonable.
And in .ReleaseFast, bump the upper batch size up to 16.
Also leverage comptime instead of duplicating code.
std/crypto/benchmark.zig on Apple M1:
Zig 0.10.0: 2769 MiB/s
Before: 6014 MiB/s
After: 7334 MiB/s
Also normalize function names.
* Change clmul() to accept the half to be processed
This avoids a bunch of truncate() calls.
* Add more ghash tests to check all code paths
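The clmul() change noted above can be pictured with a small sketch. This is Python rather than Zig, and the selector names `"lo"`, `"hi"` and `"hi_lo"`, along with `clmul64`, `clmul_half` and `clmul_full`, are assumptions made for illustration, not the signatures used in std.crypto. The idea is that the multiply helper receives full 128-bit operands and picks the 64-bit halves itself, so callers do not have to truncate or shift beforehand.

```python
MASK64 = (1 << 64) - 1


def clmul64(a, b):
    """Carry-less product of two 64-bit values (result fits in 127 bits)."""
    r = 0
    for i in range(64):
        if (b >> i) & 1:
            r ^= a << i
    return r


def clmul_half(x, y, half):
    """Multiply selected 64-bit halves of the 128-bit operands x and y.

    half is a hypothetical selector:
      "lo"    -> low half of x  times low half of y
      "hi"    -> high half of x times high half of y
      "hi_lo" -> high half of x times low half of y
    """
    if half == "lo":
        a, b = x & MASK64, y & MASK64
    elif half == "hi":
        a, b = x >> 64, y >> 64
    elif half == "hi_lo":
        a, b = x >> 64, y & MASK64
    else:
        raise ValueError("unknown half selector: %r" % (half,))
    return clmul64(a, b)


def clmul_full(x, y):
    """Schoolbook 128x128 carry-less product built from the half products."""
    lo = clmul_half(x, y, "lo")
    hi = clmul_half(x, y, "hi")
    # Cross terms: x_hi * y_lo and y_hi * x_lo.
    mid = clmul_half(x, y, "hi_lo") ^ clmul_half(y, x, "hi_lo")
    return lo ^ (mid << 64) ^ (hi << 128)
```

Callers pass the same two operands each time and vary only the selector, which keeps the call sites free of explicit truncation.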
Diffstat (limited to 'src/codegen/llvm.zig')
0 files changed, 0 insertions, 0 deletions
