|
I meant to call it this originally, I just got mixed up -- sorry!
|
|
This is essentially just a rename. I also changed the representation of
`AnalSubject` to use a `packed struct` rather than a non-exhaustive
enum, but that change is relatively trivial.
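A minimal sketch of the representational difference, assuming illustrative field names and bit widths (this is not the compiler's actual `AnalSubject` definition):
```zig
const std = @import("std");

const SubjectEnum = enum(u32) {
    // Non-exhaustive: the raw bits encode both the kind and an index, but the
    // split is implicit and needs helper functions to pack/unpack.
    _,
};

const SubjectPacked = packed struct(u32) {
    // The same 32 bits, but the layout is explicit and self-documenting.
    kind: enum(u1) { decl, func },
    index: u31,
};

test "both representations occupy 32 bits" {
    try std.testing.expect(@bitSizeOf(SubjectEnum) == 32);
    try std.testing.expect(@bitSizeOf(SubjectPacked) == 32);
}
```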
|
|
We need special logic for updating line numbers anyway, so it's fine to
just use absolute numbers here. This eliminates a field from `Decl`.
|
|
This patch is a pure rename, apart from changing the file path at
`@import` sites, so it is not expected to create version control
conflicts, even when rebasing.
|
|
`LazySrcLoc` now stores a reference to the "base AST node" to which it
is relative. The previous tagged union is now `LazySrcLoc.Offset`. To make
working with this structure convenient, `Sema.Block` has a `src` method
which takes an `Offset` and returns a `LazySrcLoc`.
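A minimal sketch of this shape, assuming illustrative names (`TrackedInstIndex`, `src_base_inst`, and the `Offset` tags shown are stand-ins, not the compiler's exact definitions):
```zig
const std = @import("std");

const TrackedInstIndex = enum(u32) { _ }; // stand-in for a tracked ZIR instruction

const LazySrcLoc = struct {
    /// The "base AST node", identified via a tracked instruction.
    base_node_inst: TrackedInstIndex,
    /// What used to be the whole tagged union is now just the offset part.
    offset: Offset,

    const Offset = union(enum) {
        unneeded,
        node_offset: i32,
        token_offset: u32,
    };
};

const Block = struct {
    src_base_inst: TrackedInstIndex,

    /// Convenience wrapper so call sites only have to supply an `Offset`.
    fn src(block: *const Block, offset: LazySrcLoc.Offset) LazySrcLoc {
        return .{ .base_node_inst = block.src_base_inst, .offset = offset };
    }
};

test "building a LazySrcLoc from a block-relative offset" {
    const block: Block = .{ .src_base_inst = @enumFromInt(42) };
    const loc = block.src(.{ .node_offset = 3 });
    try std.testing.expectEqual(@as(i32, 3), loc.offset.node_offset);
}
```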
The "base node" of a source location is no longer given by a `Decl`, but
rather a `TrackedInst` representing either a `declaration`,
`struct_decl`, `union_decl`, `enum_decl`, or `opaque_decl`. This is a
more appropriate model, and removes an unnecessary responsibility from
`Decl` in preparation for the upcoming refactor which will split it into
`Nav` and `Cau`.
As a part of these `Decl` reworks, the `src_node` field is eliminated.
This change aids incremental compilation, and simplifies `Decl`. In some
cases -- particularly in backends -- the source location of a
declaration is desired. This was previously `Decl.srcLoc` and worked for
any `Decl`. Now, it is `Decl.navSrcLoc` in reference to the upcoming
refactor, since the set of `Decl`s this works for precisely corresponds
to what will in future become a `Nav` -- that is, source-level
declarations and generic function instantiations, but *not* type owner
Decls.
This commit introduces more tags to `LazySrcLoc.Offset` so as to
eliminate the concept of `error.NeededSourceLocation`. Now, `.unneeded`
should only be used to assert that an error path is unreachable. In the
future, uses of `.unneeded` can probably be replaced with `undefined`.
The `src_decl` field of `Sema.Block` no longer has a role in type
resolution. Its main remaining purpose is to handle namespacing of type
names. It will be eliminated entirely in a future commit to remove
another undue responsibility from `Decl`.
It is worth noting that in future, the `Zcu.SrcLoc` type should probably
be eliminated entirely in favour of storing `Zcu.LazySrcLoc` values.
This is because `Zcu.SrcLoc` is not valid across incremental updates,
and we want to be able to reuse error messages from previous updates
even if the source file in question changed. The error reporting logic
should instead simply resolve the location from the `LazySrcLoc` on the
fly.
|
|
This is in preparation for some upcoming changes to how we represent
source locations in the compiler. The bulk of the change here is dealing
with the removal of `src()` methods from `Zir` types.
|
|
This was a "fake" type used to handle C varargs parameters, much like
generic poison. In fact, it is treated identically to generic poison in
all cases other than one (the final coercion of a call argument), which
is trivially special-cased. Thus, it makes sense to remove this special
tag and instead use `generic_poison_type` in its place. This fixes
several bugs in Sema related to missing handling of this tag.
Resolves: #19781
|
|
Closes #19721
|
|
We've got a big one here! This commit reworks how we represent pointers
in the InternPool, and rewrites the logic for loading and storing from
them at comptime.
Firstly, the pointer representation. Previously, pointers were
represented in a highly structured manner: pointers to fields, array
elements, etc, were explicitly represented. This works well for simple
cases, but is quite difficult to handle in the cases of unusual
reinterpretations, pointer casts, offsets, etc. Therefore, pointers are
now represented in a more "flat" manner. For types without well-defined
layouts -- such as comptime-only types, automatic-layout aggregates, and
so on -- we still use this "hierarchical" structure. However, for types
with well-defined layouts, we use a byte offset associated with the
pointer. This allows the comptime pointer access logic to deal with
reinterpreted pointers far more gracefully, because the "base address"
of a pointer -- for instance a `field` -- is a single value which
pointer accesses cannot exceed, since the parent type has no well-defined layout.
This strategy is also more useful to most backends -- see the updated
logic in `codegen.zig` and `codegen/llvm.zig`. For backends which do
prefer a chain of field and elements accesses for lowering pointer
values, such as SPIR-V, there is a helpful function in `Value` which
creates a strategy to derive a pointer value using ideally only field
and element accesses. This is actually more correct than the previous
logic, since it correctly handles pointer casts which, after the dust
has settled, end up referring exactly to an aggregate field or array
element.
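A rough conceptual sketch of the two pointer shapes described above, using made-up names (`flat`, `field`, and so on); the real InternPool encoding is considerably more involved:
```zig
const std = @import("std");

const PtrRepr = union(enum) {
    /// For pointees inside a type with a well-defined layout: a base address
    /// plus a flat byte offset, so pointer casts and reinterpretations only
    /// ever adjust `byte_offset`.
    flat: struct {
        base: u32, // some interned base address, e.g. a global
        byte_offset: u64,
    },
    /// For types without a well-defined layout (comptime-only types,
    /// automatic-layout aggregates, ...): keep the hierarchical form, since
    /// accesses can never legally cross the field boundary anyway.
    field: struct {
        parent: *const PtrRepr,
        field_index: u32,
    },
};

test "reinterpreting a well-defined-layout pointer is just offset arithmetic" {
    var p: PtrRepr = .{ .flat = .{ .base = 1, .byte_offset = 0 } };
    // e.g. moving to the second u32 field of an extern struct:
    p.flat.byte_offset += @sizeOf(u32);
    try std.testing.expectEqual(@as(u64, 4), p.flat.byte_offset);
}
```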
In terms of the pointer access code, it has been rewritten from the
ground up. The old logic had become rather a mess of special cases being
added whenever bugs were hit, and was still riddled with bugs. The new
logic was written to handle the "difficult" cases correctly, the most
notable of which is restructuring of a comptime-only array (for
instance, converting a `[3][2]comptime_int` to a `[2][3]comptime_int`).
Currently, loads and stores are handled somewhat differently, but a
future change will likely bring the load logic more in line with the
store strategy. As far as I can tell, the rewrite
has fixed all bugs exposed by #19414.
As a part of this, the comptime bitcast logic has also been rewritten.
Previously, bitcasts simply worked by serializing the entire value into
an in-memory buffer, then deserializing it. This strategy has two key
weaknesses: pointers, and undefined values. Representations of these
values at comptime cannot be easily serialized/deserialized whilst
preserving data, which means many bitcasts would become runtime-known if
pointers were involved, or would turn `undefined` values into `0xAA`.
The new logic works by "flattening" the data structure to be cast into a
sequence of bit-packed atomic values and then "unflattening" it, using
serialization when necessary, but with special handling for `undefined`
values and for pointers which align in virtual memory. The resulting
code is definitely slower -- more on this later -- but it is correct.
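A very rough sketch of the "flatten then unflatten" idea, assuming an invented `Atom` shape and names; the compiler's actual representation differs:
```zig
const std = @import("std");

const Atom = union(enum) {
    bits: struct { width: u16, value: u64 }, // ordinary serialized bits
    undef: u16, // width of a run of undefined bits, preserved as-is
    ptr: u32, // an interned pointer value, kept whole when alignment allows
};

fn totalBits(atoms: []const Atom) u32 {
    var n: u32 = 0;
    for (atoms) |a| n += switch (a) {
        .bits => |b| b.width,
        .undef => |w| w,
        .ptr => 64, // illustrative: pretend pointers are 64 bits here
    };
    return n;
}

test "flattened atoms cover the whole value being bitcast" {
    const atoms = [_]Atom{
        .{ .bits = .{ .width = 32, .value = 42 } },
        .{ .undef = 32 },
    };
    try std.testing.expectEqual(@as(u32, 64), totalBits(&atoms));
}
```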
The pointer access and bitcast logic required some helper functions and
types which are not generally useful elsewhere, so I opted to split them
into separate files `Sema/comptime_ptr_access.zig` and
`Sema/bitcast.zig`, with simple re-exports in `Sema.zig` for their small
public APIs.
Whilst working on this branch, I caught various unrelated bugs with
transitive Sema errors, and with the handling of `undefined` values.
These bugs have been fixed, and corresponding behavior tests added.
In terms of performance, I do anticipate that this commit will regress
performance somewhat, because the new pointer access and bitcast logic
is necessarily more complex. I have not yet taken performance
measurements, but will do shortly, and post the results in this PR. If
the performance regression is severe, I will work to optimize the
new logic before merge.
Resolves: #19452
Resolves: #19460
|
|
This deletes a ton of lookups and avoids many UAF bugs.
Closes #19485
|
|
* `{}` for decls
* `{p}` for enum fields
* `{p_}` for struct fields and in contexts following a `.`
Elsewhere, `{p}` was used since it's equivalent to the old behavior.
|
|
There is no way to know the expected parent pointer attributes (most
notably alignment) from the type of the field pointer, so provide them
in the first argument.
|
|
Closes #14904
|
|
Legacy anon decls now have three uses:
* Type owner decls
* Function owner decls
* `@export` and `@extern`
Therefore, there are no longer any cases where we wish to explicitly
omit legacy anon decls from the binary. This means we can remove the
concept of an "alive" vs "dead" `Decl`, which also allows us to remove
the separate `anon_work_queue` in `Compilation`.
|
|
`Decl` can no longer store un-interned values, so this field is now
unnecessary. The type can instead be fetched with the new `typeOf`
helper method, which just gets the type of the Decl's `Value`.
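A minimal sketch of the idea, with a stubbed value pool standing in for the real `InternPool` (names and signatures here are hypothetical):
```zig
const std = @import("std");

const Pool = struct {
    types: []const u32, // types[i] is the type index of interned value i

    fn typeOf(pool: Pool, val: u32) u32 {
        return pool.types[val];
    }
};

const Decl = struct {
    val: u32, // always an interned value index now; no separate `ty` field

    fn typeOf(decl: Decl, pool: Pool) u32 {
        return pool.typeOf(decl.val);
    }
};

test "the type is derived from the value rather than stored on Decl" {
    const pool: Pool = .{ .types = &.{ 7, 7, 9 } };
    const decl: Decl = .{ .val = 2 };
    try std.testing.expectEqual(@as(u32, 9), decl.typeOf(pool));
}
```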
|
|
This commit changes how we represent comptime-mutable memory
(`comptime var`) in the compiler in order to implement the intended
behavior that references to such memory can only exist at comptime.
It does *not* clean up the representation of mutable values, improve the
representation of comptime-known pointers, or fix the many bugs in the
comptime pointer access code. These will be future enhancements.
Comptime memory lives for the duration of a single Sema, and is not
permitted to escape that one analysis, either by becoming runtime-known
or by becoming comptime-known to other analyses. These restrictions mean
that we can represent comptime allocations not via Decl, but with state
local to Sema - specifically, the new `Sema.comptime_allocs` field. All
comptime-mutable allocations, as well as any comptime-known const allocs
containing references to such memory, live in here. This allows for
relatively fast checking of whether a value references any
comptime-mutable memory, since we need only traverse values up to
pointers: pointers to Decls can never reference comptime-mutable memory,
and pointers into `Sema.comptime_allocs` always do.
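A user-level illustration of the restriction, assuming the behavior as described here (the comments paraphrase this commit message):
```zig
const std = @import("std");

test "comptime-mutable memory stays inside one comptime evaluation" {
    const ok = comptime blk: {
        var x: u32 = 1;
        const p = &x; // fine: the pointer never leaves this evaluation
        p.* += 1;
        break :blk x; // only the resulting value escapes
    };
    try std.testing.expectEqual(@as(u32, 2), ok);
    // What is now rejected is letting the pointer itself escape -- for
    // example `break :blk p;` above, which would make a reference to
    // comptime-mutable memory visible outside this one analysis.
}
```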
This change exposed some faulty pointer access logic in `Value.zig`.
I've fixed the important cases, but there are some TODOs I've put in
which are definitely possible to hit with sufficiently esoteric code. I
plan to resolve these by auditing all direct accesses to pointers (most
of them ought to use Sema to perform the pointer access!), but for now
this is sufficient for all realistic code and to get tests passing.
This change eliminates `Zcu.tmp_hack_arena`, instead using the Sema
arena for comptime memory mutations, which is possible since comptime
memory is now local to the current Sema.
This change should allow `Decl` to store only an `InternPool.Index`
rather than a full-blown `ty: Type, val: Value`. This commit does not
perform this refactor.
|
|
A pointer type already has an alignment, so this information does not
need to be duplicated on the function type. There is already precedent
for this with `addrspace`, which is likewise disallowed on function types for this
reason. Also fixes `@TypeOf(&func)` to have the correct addrspace and
alignment.
|
|
|
|
This was an oversight in my original design. This new form of dependency
is invalidated when the resolved IES for a runtime function changes.
|
|
Most basic re-analysis logic is now in place. Trivial updates are
hitting linker assertions.
|
|
|
|
|
|
This fixes an issue with the implementation of #18816. Consider the
following code:
```zig
pub fn Wrap(comptime T: type) type {
return struct {
pub const T1 = T;
inner: struct { x: T1 },
};
}
```
Previously, the type of `inner` was not considered to be "capturing" any
value, as `T1` is a decl. However, since it is declared within a generic
function, this decl reference depends on the context, and thus should be
treated as a capture.
AstGen has been augmented to tunnel references to decls through closure
when the decl was declared in a potentially-generic context (i.e. within
a function).
|
|
This implements the accepted proposal #18816. Namespace-owning types
(struct, enum, union, opaque) are no longer unique whenever analysed;
instead, their identity is determined based on their AST node and the
set of values they capture.
Reified types (`@Type`) are deduplicated based on the structure of the
type created. For instance, if two structs are created by the same
reification with identical fields, layout, etc, they will be the same
type.
This commit does not produce a working compiler; the next commit, adding
captures for decl references, is necessary. It felt appropriate to split
this up.
Resolves: #18816
|
|
These were previously associated with the type's namespace, but we need
to store them directly in the InternPool for #18816.
|
|
Namespace types (`struct`, `enum`, `union`, `opaque`) do not use
structural equality - equivalence is based on their Decl index (and soon
will change to AST node + captures). However, we previously stored all
other information in the corresponding `InternPool.Key` anyway. For
logical consistency, it makes sense to have the key only be the true key
(that is, the Decl index) and to load all other data through another
function. This introduces those functions, by the name of
`loadStructType` etc. It's a big diff, but most of it is no-brainer
changes.
In future, it might be nice to eliminate a bunch of the loaded state in
favour of accessor functions on the `LoadedXyzType` types (like how we
have `LoadedUnionType.size()`), but that can be explored at a later
date.
|
|
This changes the representation of closures in Zir and Sema. Rather than
a pair of instructions `closure_capture` and `closure_get`, the system
now works as follows:
* Each ZIR type declaration (`struct_decl` etc) contains a list of
captures in the form of ZIR indices (or, for efficiency, direct
references to parent captures). This is an ordered list; indexes into
it are used to refer to captured values.
* The `extended(closure_get)` ZIR instruction refers to a value in this
list via a 16-bit index (limiting this index to 16 bits allows us to
store this in `extended`).
* `Module.Namespace` has a new field `captures` which contains the list
of values captured in a given namespace. This is initialized based on
the ZIR capture list whenever a type declaration is analyzed.
This change eliminates `CaptureScope` from semantic analysis, which is a
nice simplification; but the main motivation here is that this change is
a prerequisite for #18816.
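A user-level sketch of what gets captured; the comments paraphrase the mechanism described above rather than exact ZIR encodings:
```zig
const std = @import("std");

fn Counter(comptime start: u32) type {
    // The struct declaration below captures `start`. Its ZIR `struct_decl`
    // now carries an ordered capture list, and each use of `start` inside
    // lowers to `extended(closure_get)` with a 16-bit index into that list;
    // `Module.Namespace.captures` holds the resolved values during analysis.
    return struct {
        pub var value: u32 = start;
    };
}

test "each instantiation captures its own `start`" {
    try std.testing.expectEqual(@as(u32, 7), Counter(7).value);
    try std.testing.expectEqual(@as(u32, 9), Counter(9).value);
}
```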
|
|
* Introduce `-Ddebug-extensions` for enabling compiler debug helpers
* Replace safety mode checks with `std.debug.runtime_safety`
* Replace debugger helper checks with `!builtin.strip_debug_info`
Sometimes, you just have to debug optimized compilers...
|
|
Part of an effort to ship more of the compiler in source form.
|
|
Note that the correctness of these enum tag values is still protected
by the comptime logic at the top of Zcu (currently src/Module.zig).
|
|
Part of an effort to ship more of the compiler in source form.
|
|
x86_64: pass more tests
|
|
* make test names contain the fully qualified name
* make test filters match the fully qualified name
* allow multiple test filters, where a test is skipped if it does not
match any of the specified filters
|
|
These used to be lowered elementwise in AIR, and are now a single AIR
instruction that can be lowered elementwise in the backend if necessary.
|
|
|
|
If an adapted string key with embedded nulls was put in a hash map with
`std.hash_map.StringIndexAdapter`, an incorrect hash would be entered for
that entry. A later lookup for the key matching the prefix of the original
key up to the first null would then sometimes match this entry due to hash
collisions, and sometimes not (for example, if performed after a grow +
rehash). This caused the same key to exist with two different indices,
breaking every string equality comparison from then on -- for example,
claiming that a container type doesn't contain a field, because the field
name string in the struct and the string representing the identifier to
look up might be equal strings but have different string indices. This
could maybe be fixed by changing `std.hash_map.StringIndexAdapter.hash` to
only hash up to the first null, therefore ensuring that the entry's hash
is correct and that all future lookups are consistent, but I don't trust
anything, so instead I assert that there are no embedded nulls.
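A small illustration of the hazard, using `std.hash_map.hashString` and `std.mem.sliceTo` (the hash-inequality check here is effectively certain for these inputs, though not formally guaranteed):
```zig
const std = @import("std");

test "embedded nulls break the hash/key correspondence" {
    // The adapter hashes the full adapted slice, embedded null and all...
    const full = "ab\x00cd";
    const prefix = "ab";
    try std.testing.expect(std.hash_map.hashString(full) !=
        std.hash_map.hashString(prefix));
    // ...but once stored null-terminated, reading the key back only sees the
    // part before the first null, i.e. `prefix`:
    try std.testing.expectEqualStrings(prefix, std.mem.sliceTo(full, 0));
}
```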
|
|
Begin re-implementing incremental compilation
|
|
This commit does only the file rename, to reduce the chance of version
control conflicts.
|
|
* Functions failing codegen now set this failure on the function
analysis state. Decl analysis `codegen_failure` is reserved for
failures generating constant values.
* `liveness_failure` is consolidated into `codegen_failure`, as we do
not need to distinguish these, and Liveness.Verify is just a debugging
feature anyway.
* `sema_failure_retryable` and `codegen_failure_retryable` are removed.
Instead, retryable failures are recorded in the new
`Zcu.retryable_failures` list. On an incremental update, this list is
flushed, and all elements are marked as outdated so that we re-attempt
analysis and code generation.
Also remove the `generation` fields from `Zcu` and `Decl` as these are
not needed by our new strategy for incremental updates.
|
|
Sema now tracks dependencies appropriately. Early logic in Zcu for
resolving outdated decls/functions is in place. The setup does not
support `usingnamespace`; compilations using this construct cannot yet
use this incremental compilation model.
|
|
This change eliminates some problematic recursive logic in InternPool,
and provides a safer API.
|
|
It is problematic for the cached `InternPool` state to directly
reference ZIR instruction indices, as these are not stable across
incremental updates. The existing ZIR mapping logic attempts to handle
this by iterating the existing Decl graph for a file after `AstGen` and
updating ZIR indices on `Decl`s, struct types, etc. However, this is
unreliable due to generic instantiations, and relies on specialized
logic for everything which may refer to a ZIR instruction (e.g. a
struct's owner decl). I therefore determined that a prerequisite change
for incremental compilation would be to rework how we store these
indices.
This commit introduces a `TrackedInst` type which provides a stable
index (`TrackedInst.Index`) for a single ZIR instruction in the
compilation. The `InternPool` now stores these values in place of ZIR
instruction indices. This makes the ZIR mapping logic relatively
trivial: after `AstGen` completes, we simply iterate all `TrackedInst`
values and update those indices which have changed. In future, if the
corresponding ZIR instruction has been removed, we must also invalidate
any dependencies on this instruction to trigger any required
re-analysis; however, the dependency system does not yet exist.
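A heavily simplified sketch of the idea, with hypothetical names; the real `TrackedInst` machinery is more involved:
```zig
const std = @import("std");

const TrackedInst = struct {
    file: u32, // which file's ZIR this instruction lives in
    inst: u32, // the instruction's current index within that file's ZIR
};

/// After `AstGen`, remap every tracked instruction in one pass, instead of
/// walking Decls, struct types, etc. looking for stored ZIR indices.
fn applyZirMapping(tracked: []TrackedInst, file: u32, old_to_new: []const u32) void {
    for (tracked) |*ti| {
        if (ti.file == file) ti.inst = old_to_new[ti.inst];
    }
}

test "remapping tracked instructions after AstGen" {
    var tracked = [_]TrackedInst{.{ .file = 0, .inst = 5 }};
    // Suppose the updated ZIR moved instruction 5 to index 7:
    const old_to_new = [_]u32{ 0, 1, 2, 3, 4, 7 };
    applyZirMapping(&tracked, 0, &old_to_new);
    try std.testing.expectEqual(@as(u32, 7), tracked[0].inst);
}
```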
|
|
This commit changes how declarations (`const`, `fn`, `usingnamespace`,
etc) are represented in ZIR. Previously, these were represented in the
container type's extra data (e.g. as trailing data on a `struct_decl`).
However, this introduced the complexity of the ZIR mapping logic having
to also correlate some ZIR extra data indices. That isn't really a
problem today, but it's tricky for the introduction of `TrackedInst` in
the commit following this one. Instead, these type declarations now
simply contain a trailing list of ZIR indices to `declaration`
instructions, which directly encode all data related to the declaration
(including the declaration's body). Additionally, the ZIR for
`align` etc. has been split out into separate bodies. This is not
strictly necessary, but it's much simpler to understand for an
insignificant cost in bytes, and will simplify the resolution of #131
(where we may need to evaluate the pointer type, including align etc,
without immediately evaluating the value body).
|
|
|
|
Closes #17817
|
|
|
|
|
|
Closes #18039
|
|
The motivating problem here was a memory leak in the hash maps of
Module.Namespace.
The commit deletes more of the legacy incremental compilation
implementation. It had things like use of `orderedRemove` and attempts
to do too much OOP-style creation and deletion of objects.
Instead, this commit iterates over all the namespaces on Module deinit
and calls deinit on the hash map fields. This logic is much simpler to
reason about.
Similarly, change global inline assembly to an array hash map since
iterating over the values is a primary use of it, and clean up the
remaining values on Module deinit, solving another memory leak.
After this, no memory leaks remain when using the
x86 backend in a libc-less compiler.
|
|
This change allows struct field inits to use layout information
of their own struct without causing a circular dependency.
`semaStructFields` caches the ranges of the init bodies in the `StructType`
trailing data. The init bodies are then resolved by `resolveStructFieldInits`,
which is called before the inits are actually required.
Within the init bodies, the struct decl's instruction is repurposed to refer
to the field type itself. This allows us to easily rebuild the `inst_map`
entries required for the init body instructions to resolve to the field type.
Thanks to @mlugg for the guidance on this one!
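A user-level example of what this enables, assuming the behavior described above (a field default that depends on its own struct's layout):
```zig
const std = @import("std");

const S = extern struct {
    a: u32,
    b: u32 = @sizeOf(S), // init body resolved later by resolveStructFieldInits
};

test "field init may use the struct's own layout" {
    const s: S = .{ .a = 1 };
    try std.testing.expectEqual(@as(u32, 8), s.b);
}
```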
|