aboutsummaryrefslogtreecommitdiff
path: root/test/cases/compile_errors/invalid_unicode_escape.zig
AgeCommit message (Collapse)Author
2025-09-16test: rename backend=stage2 to backend=selfhosted, and add backend=autoAlex Rønne Petersen
backend=auto (now the default if backend is omitted) means to let the compiler pick whatever backend it wants as the default. This is important for platforms where we don't yet have a self-hosted backend, such as loongarch64. Also purge a bunch of redundant target=native.
2024-07-31std.zig.tokenizer: simplifyAndrew Kelley
I pointed a fuzzer at the tokenizer and it crashed immediately. Upon inspection, I was dissatisfied with the implementation. This commit removes several mechanisms: * Removes the "invalid byte" compile error note. * Dramatically simplifies tokenizer recovery by making recovery always occur at newlines, and never otherwise. * Removes UTF-8 validation. * Moves some character validation logic to `std.zig.parseCharLiteral`. Removing UTF-8 validation is a regression of #663, however, the existing implementation was already buggy. When adding this functionality back, it must be fuzz-tested while checking the property that it matches an independent Unicode validation implementation on the same file. While we're at it, fuzzing should check the other properties of that proposal, such as no ASCII control characters existing inside the source code. Other changes included in this commit: * Deprecate `std.unicode.utf8Decode` and its WTF-8 counterpart. This function has an awkward API that is too easy to misuse. * Make `utf8Decode2` and friends use arrays as parameters, eliminating a runtime assertion in favor of using the type system. After this commit, the crash found by fuzzing, which was "\x07\xd5\x80\xc3=o\xda|a\xfc{\x9a\xec\x91\xdf\x0f\\\x1a^\xbe;\x8c\xbf\xee\xea" no longer causes a crash. However, I did not feel the need to add this test case because the simplified logic eradicates most crashes of this nature.
2024-07-15Tokenizer bug fixes and improvementsgooncreeper
Fixes many error messages corresponding to invalid bytes displaying the wrong byte. Additionaly improves handling of UTF-8 in some places.