speed up chacha20 - zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software. https://ziglang.org


author	Marc Tiehuis <marctiehuis@gmail.com>	2018-08-27 22:55:53 -0700
committer	Shawn Landden <shawn@git.icu>	2018-08-27 22:55:53 -0700
commit	87eb95f816b01c0133de47eb3c94ac470f9d8bf2 (patch)
tree	2400b9a02d7da98cff4c0ef50d0e264ace2ae8d4 /src/codegen.cpp
parent	444edd9aed84ebfd9153817259e2a4e1228b7120 (diff)

speed up chacha20

The main changes are:

 - Unrolling the inner rounds of salsa20_wordtobyte, which doubles the speed.
 - Passing the output slice explicitly instead of returning the array. This saves a copy (which copy elision could optimize out in future) and gives a ~10% improvement.
 - Inlining the outer loop, which gives a ~15-20% improvement but costs an extra 4Kb of code space. I think the tradeoff is worthwhile here.

The other inline loops are small and can be left to the compiler, which can inline them if it is worthwhile. Replacing the rotate function does not change performance compared with the former implementation.

The modified throughput test I used to benchmark is linked below. Interestingly, we need to allocate memory instead of using a fixed buffer, otherwise Zig optimizes the whole thing out.

https://github.com/ziglang/zig/pull/1369#issuecomment-416456628
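The first two changes can be illustrated with a minimal sketch. It is written in Rust rather than Zig and is not the code from this commit: the names quarter_round, ROUNDS, block_looped and block_unrolled are hypothetical, and the state layout is assumed to be the usual 16-word ChaCha state. block_looped drives the eight quarter rounds from an index table and returns the block by value; block_unrolled writes the same eight calls out by hand and stores the result into a caller-provided buffer. Both should produce identical bytes. Whether the unrolled form is actually faster depends on the compiler, so the figures quoted above apply only to the Zig change.

```rust
/// One ChaCha quarter round on four words of the 16-word state.
/// `wrapping_add` is addition mod 2^32; `rotate_left` is the standard-library rotate.
#[inline(always)]
fn quarter_round(x: &mut [u32; 16], a: usize, b: usize, c: usize, d: usize) {
    x[a] = x[a].wrapping_add(x[b]);
    x[d] = (x[d] ^ x[a]).rotate_left(16);
    x[c] = x[c].wrapping_add(x[d]);
    x[b] = (x[b] ^ x[c]).rotate_left(12);
    x[a] = x[a].wrapping_add(x[b]);
    x[d] = (x[d] ^ x[a]).rotate_left(8);
    x[c] = x[c].wrapping_add(x[d]);
    x[b] = (x[b] ^ x[c]).rotate_left(7);
}

/// Word indices for the four column rounds followed by the four diagonal rounds.
const ROUNDS: [[usize; 4]; 8] = [
    [0, 4, 8, 12], [1, 5, 9, 13], [2, 6, 10, 14], [3, 7, 11, 15],
    [0, 5, 10, 15], [1, 6, 11, 12], [2, 7, 8, 13], [3, 4, 9, 14],
];

/// Table-driven variant: inner loop over ROUNDS, block returned by value.
fn block_looped(input: &[u32; 16]) -> [u8; 64] {
    let mut x = *input;
    for _ in 0..10 {
        for r in ROUNDS.iter() {
            quarter_round(&mut x, r[0], r[1], r[2], r[3]);
        }
    }
    let mut out = [0u8; 64];
    for i in 0..16 {
        // Add the original input word, then serialize little-endian.
        let word = x[i].wrapping_add(input[i]);
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_le_bytes());
    }
    out
}

/// Unrolled variant: the eight quarter rounds are written out explicitly,
/// and the block is stored into a buffer supplied by the caller.
fn block_unrolled(out: &mut [u8; 64], input: &[u32; 16]) {
    let mut x = *input;
    for _ in 0..10 {
        // Column rounds.
        quarter_round(&mut x, 0, 4, 8, 12);
        quarter_round(&mut x, 1, 5, 9, 13);
        quarter_round(&mut x, 2, 6, 10, 14);
        quarter_round(&mut x, 3, 7, 11, 15);
        // Diagonal rounds.
        quarter_round(&mut x, 0, 5, 10, 15);
        quarter_round(&mut x, 1, 6, 11, 12);
        quarter_round(&mut x, 2, 7, 8, 13);
        quarter_round(&mut x, 3, 4, 9, 14);
    }
    for i in 0..16 {
        let word = x[i].wrapping_add(input[i]);
        out[4 * i..4 * i + 4].copy_from_slice(&word.to_le_bytes());
    }
}
```

A caller would allocate one 64-byte buffer and pass it to block_unrolled for each block, rather than receiving a fresh array from every call.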

Diffstat (limited to 'src/codegen.cpp')

0 files changed, 0 insertions, 0 deletions

