aboutsummaryrefslogtreecommitdiff
path: root/lib/std/crypto/aes/soft.zig
AgeCommit message (Collapse)Author
2025-07-13std.crypto: remove `inline` from most functionsAndrew Kelley
To quote the language reference, It is generally better to let the compiler decide when to inline a function, except for these scenarios: * To change how many stack frames are in the call stack, for debugging purposes. * To force comptime-ness of the arguments to propagate to the return value of the function, as in the above example. * Real world performance measurements demand it. Don't guess! Note that inline actually restricts what the compiler is allowed to do. This can harm binary size, compilation speed, and even runtime performance. `zig run lib/std/crypto/benchmark.zig -OReleaseFast` [-before-] vs {+after+} md5: [-990-] {+998+} MiB/s sha1: [-1144-] {+1140+} MiB/s sha256: [-2267-] {+2275+} MiB/s sha512: [-762-] {+767+} MiB/s sha3-256: [-680-] {+683+} MiB/s sha3-512: [-362-] {+363+} MiB/s shake-128: [-835-] {+839+} MiB/s shake-256: [-680-] {+681+} MiB/s turboshake-128: [-1567-] {+1570+} MiB/s turboshake-256: [-1276-] {+1282+} MiB/s blake2s: [-778-] {+789+} MiB/s blake2b: [-1071-] {+1086+} MiB/s blake3: [-1148-] {+1137+} MiB/s ghash: [-10044-] {+10033+} MiB/s polyval: [-9726-] {+10033+} MiB/s poly1305: [-2486-] {+2703+} MiB/s hmac-md5: [-991-] {+998+} MiB/s hmac-sha1: [-1134-] {+1137+} MiB/s hmac-sha256: [-2265-] {+2288+} MiB/s hmac-sha512: [-765-] {+764+} MiB/s siphash-2-4: [-4410-] {+4438+} MiB/s siphash-1-3: [-7144-] {+7225+} MiB/s siphash128-2-4: [-4397-] {+4449+} MiB/s siphash128-1-3: [-7281-] {+7374+} MiB/s aegis-128x4 mac: [-73385-] {+74523+} MiB/s aegis-256x4 mac: [-30160-] {+30539+} MiB/s aegis-128x2 mac: [-66662-] {+67267+} MiB/s aegis-256x2 mac: [-16812-] {+16806+} MiB/s aegis-128l mac: [-33876-] {+34055+} MiB/s aegis-256 mac: [-8993-] {+9087+} MiB/s aes-cmac: 2036 MiB/s x25519: [-20670-] {+16844+} exchanges/s ed25519: [-29763-] {+29576+} signatures/s ecdsa-p256: [-4762-] {+4900+} signatures/s ecdsa-p384: [-1465-] {+1500+} signatures/s ecdsa-secp256k1: [-5643-] {+5769+} signatures/s ed25519: [-21926-] {+21721+} verifications/s ed25519: [-51200-] {+50880+} verifications/s (batch) chacha20Poly1305: [-1189-] {+1109+} MiB/s xchacha20Poly1305: [-1196-] {+1107+} MiB/s xchacha8Poly1305: [-1466-] {+1555+} MiB/s xsalsa20Poly1305: [-660-] {+620+} MiB/s aegis-128x4: [-76389-] {+78181+} MiB/s aegis-128x2: [-53946-] {+53495+} MiB/s aegis-128l: [-27219-] {+25621+} MiB/s aegis-256x4: [-49351-] {+49542+} MiB/s aegis-256x2: [-32390-] {+32366+} MiB/s aegis-256: [-8881-] {+8944+} MiB/s aes128-gcm: [-6095-] {+6205+} MiB/s aes256-gcm: [-5306-] {+5427+} MiB/s aes128-ocb: [-8529-] {+13974+} MiB/s aes256-ocb: [-7241-] {+9442+} MiB/s isapa128a: [-204-] {+214+} MiB/s aes128-single: [-133857882-] {+134170944+} ops/s aes256-single: [-96306962-] {+96408639+} ops/s aes128-8: [-1083210101-] {+1073727253+} ops/s aes256-8: [-762042466-] {+767091778+} ops/s bcrypt: 0.009 s/ops scrypt: [-0.018-] {+0.017+} s/ops argon2: [-0.037-] {+0.060+} s/ops kyber512d00: [-206057-] {+205779+} encaps/s kyber768d00: [-156074-] {+150711+} encaps/s kyber1024d00: [-116626-] {+115469+} encaps/s kyber512d00: [-181149-] {+182046+} decaps/s kyber768d00: [-136965-] {+135676+} decaps/s kyber1024d00: [-101307-] {+100643+} decaps/s kyber512d00: [-123624-] {+123375+} keygen/s kyber768d00: [-69465-] {+70828+} keygen/s kyber1024d00: [-43117-] {+43208+} keygen/s
2024-11-22std.crypto.aes: introduce AES block vectors (#22023)Frank Denis
* std.crypto.aes: introduce AES block vectors Modern Intel CPUs with the VAES extension can handle more than a single AES block per instruction. So can some ARM and RISC-V CPUs. Software implementations with bitslicing can also greatly benefit from this. Implement low-level operations on AES block vectors, and the parallel AEGIS variants on top of them. AMD Zen4: aegis-128x4: 73225 MiB/s aegis-128x2: 51571 MiB/s aegis-128l: 25806 MiB/s aegis-256x4: 46742 MiB/s aegis-256x2: 30227 MiB/s aegis-256: 8436 MiB/s aes128-gcm: 5926 MiB/s aes256-gcm: 5085 MiB/s AES-GCM, and anything based on AES-CTR are also going to benefit from this later. * Make AEGIS-MAC twice a fast
2024-11-20crypto.aes.soft: use std.atomic.cache_line instead of a harcoded value (#22026)Frank Denis
2024-08-21std: update eval branch quotas after bdbc485mlugg
Also, update `std.math.Log2Int[Ceil]` to more efficient implementations that don't use up so much damn quota!
2023-10-31std.builtin.Endian: make the tags lower caseAndrew Kelley
Let's take this breaking change opportunity to fix the style of this enum.
2023-10-31mem: fix ub in writeIntJacob Young
Use inline to vastly simplify the exposed API. This allows a comptime-known endian parameter to be propogated, making extra functions for a specific endianness completely unnecessary.
2023-07-18Replace hand-written endian-specific loads with std.mem.readInt*() (#16431)Frank Denis
And when we have the choice, favor little-endian because it's 2023. Gives a slight performance improvement: md5: 552 -> 555 MiB/s sha1: 768 -> 786 MiB/s sha512: 211 -> 217 MiB/s
2023-06-24all: migrate code to new cast builtin syntaxmlugg
Most of this migration was performed automatically with `zig fmt`. There were a few exceptions which I had to manually fix: * `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten * `@truncate`'s fixup is incorrect for vectors * Test cases are not formatted, and their error locations change
2023-05-16workaround AstGen's love for copying arraysVeikka Tuominen
2023-03-14Move std.crypto.config options to std.options (#14906)Frank Denis
Options have been moved to a single namespace.
2023-03-13Add configurable side channels mitigations; enable them on soft AES (#13739)Frank Denis
* Add configurable side channels mitigations; enable them on soft AES Our software AES implementation doesn't have any mitigations against side channels. Go's generic implementation is not protected at all either, and even OpenSSL only has minimal mitigations. Full mitigations against cache-based attacks (bitslicing, fixslicing) come at a huge performance cost, making AES-based primitives pretty much useless for many applications. They also don't offer any protection against other classes of side channel attacks. In practice, partially protected, or even unprotected implementations are not as bad as it sounds. Exploiting these side channels requires an attacker that is able to submit many plaintexts/ciphertexts and perform accurate measurements. Noisy measurements can still be exploited, but require a significant amount of attempts. Wether this is exploitable or not depends on the platform, application and the attacker's proximity. So, some libraries made the choice of minimal mitigations and some use better mitigations in spite of the performance hit. It's a tradeoff (security vs performance), and there's no one-size-fits all implementation. What applies to AES applies to other cryptographic primitives. For example, RSA signatures are very sensible to fault attacks, regardless of them using the CRT or not. A mitigation is to verify every produced signature. That also comes with a performance cost. Wether to do it or not depends on wether fault attacks are part of the threat model or not. Thanks to Zig's comptime, we can try to address these different requirements. This PR adds a `side_channels_protection` global, that can later be complemented with `fault_attacks_protection` and possibly other knobs. It can have 4 different values: - `none`: which doesn't enable additional mitigations. "Additional", because it only disables mitigations that don't have a big performance cost. For example, checking authentication tags will still be done in constant time. - `basic`: which enables mitigations protecting against attacks in a common scenario, where an attacker doesn't have physical access to the device, cannot run arbitrary code on the same thread, and cannot conduct brute-force attacks without being throttled. - `medium`: which enables additional mitigations, offering practical protection in a shared environement. - `full`: which enables all the mitigations we have. The tradeoff is that the more mitigations we enable, the bigger the performance hit will be. But this let applications choose what's best for their use case. `medium` is the default. Currently, this only affects software AES, but that setting can later be used by other primitives. For AES, our implementation is a traditional table-based, with 4 32-bit tables and a sbox. Lookups in that table have been replaced by function calls. These functions can add a configurable noise level, making cache-based attacks more difficult to conduct. In the `none` mitigation level, the behavior is exactly the same as before. Performance also remains the same. In other levels, we compress the T tables into a single one, and read data from multiple cache lines (all of them in `full` mode), for all bytes in parallel. More precise measurements and way more attempts become necessary in order to find correlations. In addition, we use distinct copies of the sbox for key expansion and encryption, so that they don't share the same L1 cache entries. The best known attacks target the first two AES round, or the last one. While future attacks may improve on this, AES achieves full diffusion after 4 rounds. So, we can relax the mitigations after that. This is what this implementation does, enabling mitigations again for the last two rounds. In `full` mode, all the rounds are protected. The protection assumes that lookups within a cache line are secret. The cachebleed attack showed that it can be circumvented, but that requires an attacker to be able to abuse hyperthreading and run code on the same core as the encryption, which is rarely a practical scenario. Still, the current AES API allows us to transparently switch to using fixslicing/bitslicing later when the `full` mitigation level is enabled. * Software AES: use little-endian representation. Virtually all platforms are little-endian these days, so optimizing for big-endian CPUs doesn't make sense any more.
2023-02-18improve error message for byref capture of byval arrayAndrew Kelley
2023-02-18update std lib and compiler sources to new for loop syntaxAndrew Kelley
2022-05-25std.crypto: cosmetic improvement to AES multiplication algorithm (#11616)Helio Machado
std.crypto: cosmetic improvement to AES multiplication algorithm (#11616)
2022-05-09std.crypto: generate AES constants at compile time (#11612)Helio Machado
* std/crypto: generate AES constants at compile time * Apply suggestions from code review Co-authored-by: Frank Denis <124872+jedisct1@users.noreply.github.com> * Update lib/std/crypto/aes/soft.zig * Separate encryption and decryption tables * Run `zig fmt` * Increase branch quota and remove redundant align * Update lib/std/crypto/aes/soft.zig Co-authored-by: Frank Denis <124872+jedisct1@users.noreply.github.com> * Rename identifiers and simplify dataflow * Increase branch quota (again) and fix comment Co-authored-by: Frank Denis <124872+jedisct1@users.noreply.github.com>
2021-08-24remove redundant license headers from zig standard libraryAndrew Kelley
We already have a LICENSE file that covers the Zig Standard Library. We no longer need to remind everyone that the license is MIT in every single file. Previously this was introduced to clarify the situation for a fork of Zig that made Zig's LICENSE file harder to find, and replaced it with their own license that required annual payments to their company. However that fork now appears to be dead. So there is no need to reinforce the copyright notice in every single file.
2021-06-21std, src, doc, test: remove unused variablesJacob G-W
2021-05-20Run `zig fmt` on src/ and lib/std/Isaac Freund
This replaces callconv(.Inline) with the more idiomatic inline keyword.
2021-02-10Convert inline fn to callconv(.Inline) everywhereTadeo Kondrak
2020-12-31Year++Frank Denis
2020-10-24Fix a typo (s/multple/multiple/)Frank Denis
2020-10-17std/crypto: make the whole APIs more consistentFrank Denis
- use `PascalCase` for all types. So, AES256GCM is now Aes256Gcm. - consistently use `_length` instead of mixing `_size` and `_length` for the constants we expose - Use `minimum_key_length` when it represents an actual minimum length. Otherwise, use `key_length`. - Require output buffers (for ciphertexts, macs, hashes) to be of the right size, not at least of that size in some functions, and the exact size elsewhere. - Use a `_bits` suffix instead of `_length` when a size is represented as a number of bits to avoid confusion. - Functions returning a constant-sized slice are now defined as a slice instead of a pointer + a runtime assertion. This is the case for most hash functions. - Use `camelCase` for all functions instead of `snake_case`. No functional changes, but these are breaking API changes.
2020-09-29std/crypto: add the AEGIS128L AEADFrank Denis
Showcase that Zig can be a great option for high performance cryptography. The AEGIS family of authenticated encryption algorithms was selected for high-performance applications in the final portfolio of the CAESAR competition. They reuse the AES core function, but are substantially faster than the CCM, GCM and OCB modes while offering a high level of security. AEGIS algorithms are especially fast on CPUs with built-in AES support, and the 128L variant fully takes advantage of the pipeline in modern Intel CPUs. Performance of the Zig implementation is on par with libsodium.
2020-09-24Revamp crypto/aesFrank Denis
* Reorganize crypto/aes in order to separate parameters, implementations and modes. * Add a zero-cost abstraction over the internal representation of a block, so that blocks can be kept in vector registers in optimized implementations. * Add architecture-independent aesenc/aesdec/aesenclast/aesdeclast operations, so that any AES-based primitive can be implemented, including these that don't use the original key schedule (AES-PRF, AEGIS, MeowHash...) * Add support for parallelization/wide blocks to take advantage of hardware implementations. * Align T-tables to cache lines in the software implementations to slightly reduce side channels. * Add an optimized implementation for modern Intel CPUs with AES-NI. * Add new tests (AES256 key expansion). * Reimplement the counter mode to work with any block cipher, any endianness and to take advantage of wide blocks. * Add benchmarks for AES.