| author | Cody Tapscott <topolarity@tapscott.me> | 2022-10-24 09:47:31 -0700 |
|---|---|---|
| committer | Cody Tapscott <topolarity@tapscott.me> | 2022-10-28 15:21:10 -0700 |
| commit | 4c1f71e866088a1a2e943331256115ed7e3daf98 (patch) | |
| tree | ff801e8aa7b5f1f578434198144c28a8350da6ed /src/codegen.zig | |
| parent | ee241c47ee675050e4e4b0eabd6ba06a82cc626e (diff) | |
std.crypto: Optimize SHA-256 intrinsics for AMD x86-64

This gets us most of the way back to the performance I had when
I was using the LLVM intrinsics:
- Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz:
    190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s
- AMD EPYC 7763 (VM) @ 2.45 GHz:
    240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s
- Apple M1:
    216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s

Minor changes to this source can swing performance from 400 MB/s to
1400 MB/s or... 20 MB/s, depending on how it interacts with the
optimizer. I have a sneaking suspicion that despite LLVM inheriting
GCC's extremely strict inline assembly semantics, its passes are
rather skittish around inline assembly (and almost certainly, its
instruction cost models can assume nothing about it).
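
To put the throughput figures above on a common scale, the sketch below computes the speedup factor for each CPU. It is not part of the commit: the `results` table simply copies the MB/s numbers from the message, and the names `results`, `speedup`, and `speedups` are made up for illustration.

```python
# Throughput in MB/s, copied from the commit message:
# (without intrinsics, with intrinsics).
results = {
    "Intel Core i7-1068NG7": (190.67, 1285.08),
    "AMD EPYC 7763 (VM)": (240.09, 1360.78),
    "Apple M1": (216.96, 2133.69),
}


def speedup(before: float, after: float) -> float:
    """Return how many times faster `after` is than `before`."""
    return after / before


# Hypothetical helper table: CPU name -> speedup factor.
speedups = {cpu: speedup(before, after) for cpu, (before, after) in results.items()}

for cpu, factor in speedups.items():
    print(f"{cpu}: {factor:.2f}x")
```

On these numbers the intrinsics path comes out roughly 6.7x faster on the Intel part, 5.7x on the EPYC VM, and 9.8x on the M1. The factors only compare each machine against itself; they say nothing about how close the result is to the earlier LLVM-intrinsics version, which the message does not quantify.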
