std.mem.countScalar: rework to benefit from simd (#25477) - zig - General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software. https://ziglang.org

author	Henry John Kupty <hkupty@users.noreply.github.com>	2025-10-07 18:32:13 +0200
committer	GitHub <noreply@github.com>	2025-10-07 09:32:13 -0700
commit	163ebe044b76ada70b2bee2e17b9f3e948d54754 (patch)
tree	2cf3358476275fab5283e835452565e61dd14ce6 /lib/std/Io
parent	9760068826e01e5540da9168d2f02e15957a99cc (diff)
download	zig-163ebe044b76ada70b2bee2e17b9f3e948d54754.tar.gz zig-163ebe044b76ada70b2bee2e17b9f3e948d54754.zip

std.mem.countScalar: rework to benefit from simd (#25477)

`findScalarPos` may do repetitive work even when it uses SIMD. For example, when searching the string `/abcde/fghijk/lm` for the character `/`, a 16-byte-wide comparison yields the match mask `1000001000000100`, but only the first `1` is counted and the remainder of the string is searched again.

In local testing, the difference was significant:

```
count scalar
  5737 iterations    522.83us per iterations
  0 bytes per iteration
  worst: 2370us    median: 512us    stddev: 107.64us

count v2
  38333 iterations    78.03us per iterations
  0 bytes per iteration
  worst: 713us    median: 76us    stddev: 10.62us

count scalar v2
  99565 iterations    29.80us per iterations
  0 bytes per iteration
  worst: 41us    median: 29us    stddev: 1.04us
```

Note that `count v2` is a simple scalar loop, similar to the one the SIMD approach uses for the remainder of the haystack:

```
pub fn countV2(comptime T: type, haystack: []const T, needle: T) usize {
    const n = haystack.len;
    if (n < 1) return 0;
    var count: usize = 0;
    for (haystack[0..n]) |item| {
        count += @intFromBool(item == needle);
    }
    return count;
}
```

This implies that the compiler generates better-optimized code for the simple loop than for the `findScalarPos`-based approach, hence the iterative approach for the remainder of the haystack.

Co-authored-by: StAlKeR7779 <stalkek7779@yandex.ru>
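The idea in the commit message (count every set bit of each match mask instead of stopping at the first, then finish with a plain loop) could be sketched roughly as follows. This is an illustrative sketch only, not the code merged in the commit: the name `countScalarSketch`, the fixed 16-lane width, and the restriction to `u8` are assumptions made for the example.

```zig
const std = @import("std");

/// Hypothetical sketch: counts occurrences of `needle` in `haystack`.
/// The 16-lane block width is an assumption, not the value std uses.
pub fn countScalarSketch(haystack: []const u8, needle: u8) usize {
    const lanes = 16;
    const Block = @Vector(lanes, u8);
    const needles: Block = @splat(needle);

    var count: usize = 0;
    var i: usize = 0;

    // Full blocks: compare all lanes at once, then count every match
    // in the resulting mask rather than only the first one.
    while (i + lanes <= haystack.len) : (i += lanes) {
        const block: Block = haystack[i..][0..lanes].*;
        const mask: u16 = @bitCast(block == needles);
        count += @popCount(mask);
    }

    // Remainder: the simple scalar loop from the commit message.
    for (haystack[i..]) |item| {
        count += @intFromBool(item == needle);
    }
    return count;
}
```

With the commit message's example input, `countScalarSketch("/abcde/fghijk/lm", '/')` processes the whole string as one 16-byte block and counts all three matches in a single pass.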

Diffstat (limited to 'lib/std/Io')

0 files changed, 0 insertions, 0 deletions

