std.crypto: Optimize SHA-256 intrinsics for AMD x86-64 - zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software. https://ziglang.org


author	Cody Tapscott <topolarity@tapscott.me>	2022-10-24 09:47:31 -0700
committer	Cody Tapscott <topolarity@tapscott.me>	2022-10-28 15:21:10 -0700
commit	4c1f71e866088a1a2e943331256115ed7e3daf98 (patch)
tree	ff801e8aa7b5f1f578434198144c28a8350da6ed /src/codegen.zig
parent	ee241c47ee675050e4e4b0eabd6ba06a82cc626e (diff)

std.crypto: Optimize SHA-256 intrinsics for AMD x86-64

This gets us most of the way back to the performance I had when I was using the LLVM intrinsics:

- Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz: 190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s
- AMD EPYC 7763 (VM) @ 2.45 GHz: 240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s
- Apple M1: 216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s

Minor changes to this source can swing performance from 400 MB/s to 1400 MB/s or... 20 MB/s, depending on how it interacts with the optimizer. I have a sneaking suspicion that despite LLVM inheriting GCC's extremely strict inline assembly semantics, its passes are rather skittish around inline assembly (and almost certainly, its instruction cost models can assume nothing).
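
To put the throughput figures above in relative terms, here is a minimal sketch that computes the speedup factor for each machine. The numbers are copied from the commit message. The `speedup` helper and the dictionary layout are illustrative assumptions and not part of the Zig change itself.

```python
# Throughput in MB/s, copied from the commit message:
# (without intrinsics, with intrinsics)
RESULTS = {
    "Intel Core i7-1068NG7": (190.67, 1285.08),
    "AMD EPYC 7763 (VM)": (240.09, 1360.78),
    "Apple M1": (216.96, 2133.69),
}


def speedup(before_mb_s: float, after_mb_s: float) -> float:
    """Return how many times faster the 'after' throughput is."""
    return after_mb_s / before_mb_s


# Hypothetical summary table; not output produced by the commit.
SPEEDUPS = {cpu: speedup(before, after) for cpu, (before, after) in RESULTS.items()}

if __name__ == "__main__":
    for cpu, factor in SPEEDUPS.items():
        print(f"{cpu}: {factor:.2f}x")
```

On these figures the intrinsics path is roughly 6.7x faster on the Intel part, 5.7x on the AMD VM, and 9.8x on the Apple M1.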

Diffstat (limited to 'src/codegen.zig')

0 files changed, 0 insertions, 0 deletions

