author: Cody Tapscott <topolarity@tapscott.me> 2022-10-24 09:47:31 -0700
committer: Cody Tapscott <topolarity@tapscott.me> 2022-10-28 15:21:10 -0700
commit: 4c1f71e866088a1a2e943331256115ed7e3daf98
tree: ff801e8aa7b5f1f578434198144c28a8350da6ed /src/ThreadPool.zig
parent: ee241c47ee675050e4e4b0eabd6ba06a82cc626e
std.crypto: Optimize SHA-256 intrinsics for AMD x86-64
This gets us most of the way back to the performance I had when I was using the LLVM intrinsics:

- Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz: 190.67 MB/s (w/o intrinsics) -> 1285.08 MB/s
- AMD EPYC 7763 (VM) @ 2.45 GHz: 240.09 MB/s (w/o intrinsics) -> 1360.78 MB/s
- Apple M1: 216.96 MB/s (w/o intrinsics) -> 2133.69 MB/s

Minor changes to this source can swing performance from 400 MB/s to 1400 MB/s or... 20 MB/s, depending on how it interacts with the optimizer. I have a sneaking suspicion that, despite LLVM inheriting GCC's extremely strict inline assembly semantics, its passes are rather skittish around inline assembly (and its instruction cost models almost certainly can assume nothing about it).
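Taking the throughput figures reported for each machine at face value, the speedup from the intrinsics works out roughly as follows. These are simple ratios of the numbers given in this message, with $S$ defined here as throughput with intrinsics divided by throughput without; they are not additional measurements.

$$
\begin{aligned}
S &= \frac{\text{throughput with intrinsics}}{\text{throughput w/o intrinsics}} \\[4pt]
S_{\text{i7-1068NG7}} &= \frac{1285.08}{190.67} \approx 6.74 \\[4pt]
S_{\text{EPYC 7763}} &= \frac{1360.78}{240.09} \approx 5.67 \\[4pt]
S_{\text{M1}} &= \frac{2133.69}{216.96} \approx 9.83
\end{aligned}
$$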
Diffstat (limited to 'src/ThreadPool.zig')
0 files changed, 0 insertions, 0 deletions