| Age | Commit message | Author |
|
Removes the `files` field from the Wasm linker, storing the ZigObject
as its own field instead using a tagged union.
This removes a layer of indirection when accessing the ZigObject, and
untangles logic so that we can introduce a "pre-link" phase that
prepares the linker state to handle only incremental updates to the
ZigObject and then minimize logic inside flush().
Furthermore, don't make array elements store their own indexes, that's
always a waste.
Flattens some of the file system hierarchy and unifies variable names
for easier refactoring.
Introduces type safety for optional object indexes.
|
|
|
|
|
|
This introduces some type safety so we cannot accidentally pass an atom
index as a symbol index. This also means we do not have to store any
optionals, which allows for memory optimizations. Lastly, we can now
always access the symbol index of an atom directly, rather than having
to call `getSymbolIndex`, which is easy to forget.
|
|
Rather than using an optional, we now directly use `File.Index`, which
can already represent an unknown file through its `.null` value. This
means we do not pay the memory cost of an optional.
This type of index is now used for:
- SymbolLoc
- Key of the functions map
- InitFunc
Now we can simply pass things like atom.file, object.file, loc.file,
etc. whenever we need to access the object file a value belongs to,
which makes this a lot easier.
|
|
|
|
Also, consolidate the creation of Atoms so they all use `createAtom`.
|
|
This corrects the calculation of offsets into the code section, as we
now allocate the code atoms during write and take the 'size' field into
account. We also handle dead (garbage-collected) symbols by writing -2
for the ranges and loc sections and -1 for all other sections, so that
their entries are skipped.
|
|
We delay atom allocation for the code section until we write the actual
atoms. We do this to ensure the offset of the atom also includes the
'size' field which is leb128-encoded and therefore variable. We need this
correct offset to ensure debug info works correctly.
The ordering of the code section is now automatic due to iterating the
function section and then finding the corresponding atom to each
function. This also ensures each function corresponds to the right atom,
and they do not go out-of-sync.
Lastly, we removed the `next` field as it is no longer required, and
also removed manually setting the offset in synthetic functions. This
means atoms use less memory and synthetic functions are less
error-prone. They will also be placed correctly, in function order.
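
To illustrate why the offset depends on the 'size' field, here is a
minimal sketch in Python (not the linker's actual code). The helper
names `uleb128_len` and `allocate_code_atoms` are hypothetical, and it
assumes an atom's offset points at the function body right after its
size prefix.

```python
def uleb128_len(value: int) -> int:
    """Number of bytes needed to encode a non-negative integer as unsigned LEB128."""
    length = 1
    while value >= 0x80:
        value >>= 7
        length += 1
    return length


def allocate_code_atoms(body_sizes, section_start=0):
    """Return the offset of each function body, in function order.

    Assumption for this sketch: every body is preceded by its size,
    encoded as unsigned LEB128, so the prefix width varies per function.
    """
    offsets = []
    cursor = section_start
    for size in body_sizes:
        cursor += uleb128_len(size)  # variable-width 'size' field comes first
        offsets.append(cursor)       # the body starts right after the prefix
        cursor += size               # next function follows this body
    return offsets


# Bodies of 10, 200 and 3 bytes: the 200-byte body needs a 2-byte prefix.
offsets = allocate_code_atoms([10, 200, 3])
```

Because the prefix width is only known once the final body size is
known, computing offsets at write time avoids guessing it earlier.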
|
|
Rather than parsing every symbol into an atom, we now only parse a
symbol into an atom when that atom is marked. This means
garbage-collected symbols are no longer parsed into atoms, and neither
are discarded symbols which have been resolved by other symbols (such
as multiple weak symbols).
This also introduces a binary search for finding the start index into
the list of relocations. This speeds up finding the corresponding
relocations tremendously, as they are sorted in ascending order by
address.
Lastly, we re-use the memory of an atom's data as well as its
relocations instead of duplicating it. This means we halve the memory
usage of atom data and relocations for linked object files. As we are
aware of decls and synthetic atoms, we free the memory of those atoms
independently of the atoms of object files to prevent double-frees.
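
The binary search described above can be sketched as follows. This is
an illustrative Python version, not the linker's code: the
`Relocation` tuple and the functions `find_start` and
`relocations_in_range` are hypothetical names, and the only property
relied upon is that relocations are sorted by ascending offset.

```python
from typing import NamedTuple


class Relocation(NamedTuple):
    offset: int        # address the relocation applies to
    symbol_index: int  # target symbol (illustrative field)


def find_start(relocs, address):
    """Index of the first relocation whose offset is >= address."""
    lo, hi = 0, len(relocs)
    while lo < hi:
        mid = (lo + hi) // 2
        if relocs[mid].offset < address:
            lo = mid + 1
        else:
            hi = mid
    return lo


def relocations_in_range(relocs, start, size):
    """All relocations that fall inside [start, start + size)."""
    end = start + size
    i = find_start(relocs, start)
    found = []
    while i < len(relocs) and relocs[i].offset < end:
        found.append(relocs[i])
        i += 1
    return found


relocs = [Relocation(o, n) for n, o in enumerate([2, 8, 16, 20, 40])]
hits = relocations_in_range(relocs, start=8, size=16)
```

Only the relocations belonging to the symbol's range are visited after
the logarithmic search, instead of scanning the whole list.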
|
|
Let's take this breaking change opportunity to fix the style of this
enum.
|
|
Use inline to vastly simplify the exposed API. This allows a
comptime-known endian parameter to be propagated, making extra functions
for a specific endianness completely unnecessary.
|
|
Structs were previously using `SegmentedList` to be given indexes, but
were not actually backed by the InternPool arrays.
After this, the only remaining uses of `SegmentedList` in the compiler
are `Module.Decl` and `Module.Namespace`. Once those last two are
migrated to become backed by InternPool arrays as well, we can introduce
state serialization via writing these arrays to disk all at once.
Unfortunately there are a lot of source code locations that touch the
struct type API, so this commit is still work-in-progress. Once I get it
compiling and passing the test suite, I can provide some interesting
data points such as how it affected the InternPool memory size and
performance comparison against master branch.
I also couldn't resist migrating a bunch of alignment API over to use
the log2 Alignment type rather than a mishmash of u32 and u64 byte
units with 0 meaning something implicitly different and special at every
location. Turns out you can do all the math you need directly on the
log2 representation of alignments.
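
As a rough illustration of doing alignment math on the log2
representation, here is a small Python sketch. The function names are
hypothetical and are not the compiler's `Alignment` API; an alignment
is simply stored as the exponent of a power of two.

```python
def from_byte_units(n: int) -> int:
    """Convert a power-of-two byte alignment to its log2 form."""
    assert n > 0 and n & (n - 1) == 0, "alignment must be a power of two"
    return n.bit_length() - 1


def to_byte_units(log2_align: int) -> int:
    """Convert a log2 alignment back to bytes."""
    return 1 << log2_align


def forward(address: int, log2_align: int) -> int:
    """Round an address up to the next multiple of the alignment."""
    mask = (1 << log2_align) - 1
    return (address + mask) & ~mask


def stricter(a: int, b: int) -> int:
    """The stricter of two alignments is just the larger exponent."""
    return max(a, b)


align8 = from_byte_units(8)  # stored as 3
padded = forward(13, align8)  # next multiple of 8 at or after 13
```

In this form there is no special zero value to interpret: exponent 0
means 1-byte alignment, and comparing alignments is plain integer
comparison.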
|
|
|
|
Most of this migration was performed automatically with `zig fmt`. There
were a few exceptions which I had to manually fix:
* `@alignCast` and `@addrSpaceCast` cannot be automatically rewritten
* `@truncate`'s fixup is incorrect for vectors
* Test cases are not formatted, and their error locations change
|
|
|
|
The linker now parses segments with regard to TLS segments. If the name
represents a TLS segment but does not contain the TLS flag, we set it
manually, as the object file was created using an older compiler (LLVM).
For now we panic when we find a TLS relocation; those will be
implemented later.
|
|
For data symbols we now store the virtual address. This means we no
longer have to calculate it each time a relocation asks for the
address. It is now computed once for each data symbol rather than for
every single relocation targeting that symbol.
This also allows us to directly store the virtual address of synthetic
symbols without having to create an atom for them. This means we also
don't need a "synthetic" segment any longer and do not emit synthetic
symbols such as __heap_end and __heap_base into the final binary.
|
|
|
|
|
|
|
|
When exporting a data symbol, generate a regular global and use
the data symbol's virtual address as the value (init) of the global.
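
For illustration, the bytes of such a global in the Wasm binary format
can be sketched in Python as below. The helper names are hypothetical,
and the choice of an immutable `i32` global is an assumption of this
sketch; the commit message does not state the global's type or
mutability.

```python
def sleb128(value: int) -> bytes:
    """Encode an integer as signed LEB128 (used by i32.const immediates)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        sign_bit_set = (byte & 0x40) != 0
        done = (value == 0 and not sign_bit_set) or (value == -1 and sign_bit_set)
        out.append(byte if done else byte | 0x80)
        if done:
            return bytes(out)


def global_for_data_symbol(virtual_address: int) -> bytes:
    """Encode one global entry whose init expression is the address.

    Assumption: an immutable i32 global (type 0x7F, mutability 0x00).
    """
    I32, IMMUTABLE, I32_CONST, END = 0x7F, 0x00, 0x41, 0x0B
    return bytes([I32, IMMUTABLE, I32_CONST]) + sleb128(virtual_address) + bytes([END])


entry = global_for_data_symbol(1024)
```

The init expression is a constant expression (`i32.const <address>`
followed by `end`), so the address must be known when the global
section is written.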
|
|
When an atom has one or more aliases, we could not find the target
atom from the aliased symbol. This is solved by ensuring that we also
insert each alias symbol in the symbol-atom map.
|
|
Previously we used the relocation index to find the corresponding
symbol that represents the type. However, the index actually
represents the index into the list of types. We solved this by
first retrieving the original type, and then finding its location
in the new list of types. When the atom file is 'null', it means
the type originates from a Zig function pointer or a synthetic
function. In both cases, the final type index was already resolved
and therefore equals the relocation's index value.
|
|
|
|
Addends in relocations are signed integers, as an addend can
theoretically be negative. As an Atom's offsets are relative to its
parent section, the relocation value should still result in a positive
number. For this reason, the final result is stored as an unsigned
integer.
Also, rather than using `null` for relocations that do not support
addends, we now set the value to 0 for those relocations, and callers
have to call `addendIsPresent` to determine whether an addend exists.
This means each Relocation costs 4 bytes less than before, saving
memory while linking.
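
A minimal Python sketch of this idea follows. The enum lists only an
illustrative subset of relocation types, the field and function names
are hypothetical, and `addend_is_present` merely mirrors the
`addendIsPresent` mentioned above rather than reproducing it.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RelocType(Enum):
    # Illustrative subset of Wasm relocation types.
    FUNCTION_INDEX_LEB = auto()
    TYPE_INDEX_LEB = auto()
    MEMORY_ADDR_LEB = auto()
    MEMORY_ADDR_I32 = auto()


# In this sketch only the memory-address relocations carry an addend.
WITH_ADDEND = frozenset({RelocType.MEMORY_ADDR_LEB, RelocType.MEMORY_ADDR_I32})


@dataclass
class Relocation:
    type: RelocType
    offset: int
    index: int
    addend: int = 0  # plain signed integer; 0 when the type has no addend

    def addend_is_present(self) -> bool:
        """The type, not a null check, decides whether the addend is meaningful."""
        return self.type in WITH_ADDEND


def data_relocation_value(reloc: Relocation, symbol_address: int) -> int:
    """Final value for a data relocation; expected to be non-negative."""
    value = symbol_address + (reloc.addend if reloc.addend_is_present() else 0)
    assert value >= 0, "result is stored as an unsigned integer"
    return value


with_addend = Relocation(RelocType.MEMORY_ADDR_I32, offset=0, index=1, addend=-4)
without = Relocation(RelocType.FUNCTION_INDEX_LEB, offset=8, index=2)
value = data_relocation_value(with_addend, symbol_address=1024)
```

Dropping the optional removes the extra tag storage per relocation,
while the relocation type already carries the information needed to
tell the two cases apart.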
|
|
|
|
Although the wasm-linker previously already supported
debug information in incremental-mode, this was no longer
working as-is with the addition of supporting object-file-parsed
debug information. This commit implements the Zig-created debug information
structure from scratch which is a lot more robust and also allows
being linked with debug information from other object files.
|
|
When linking a Zig-compilation with an object file,
we allow mixing the debug atoms to make sure debug
information is preserved from object files. By default,
we now always initialize all debug sections if the `strip` flag
is unset.
This also fixes relocations for debug information as previously
the offset of an atom wasn't calculated, and neither was the code
size itself which meant that debug lines were off and file names
from other object files were missing.
|
|
This correctly performs a relocation for debug sections.
The result is that the wasm-linker can now correctly create
a binary from object files while preserving all debug information.
|
|
This also fixes performing relocations for data symbols
of which the target symbol exists in an external object file.
We do this by checking if the target symbol was discarded,
and if so: get the new location so that we can find the
corresponding atom that belongs to said new location. Previously
it would always assume the symbol would live in the same file
as the atom/symbol that is doing the relocation.
|
|
Generate symbols for extern variables and try to resolve them.
Unresolved 'data' symbols generate an error as they cannot be
exported from the Wasm runtime into a Wasm module. This means they
can only be resolved by other object files, such as those produced from
other Zig or C code compiled to Wasm.
|
|
|
|
When a new symbol is resolved to an existing symbol where
it doesn't overwrite the existing symbol, we now add this symbol
to the discarded list. This is required so when any relocation points
to the symbol, we can retrieve the correct symbol it's resolved by instead.
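
The role of the discarded list can be sketched in Python as follows.
This is a simplified illustration with hypothetical names: the
existing symbol always wins here, whereas a real linker would also
consider symbol binding (for example weak versus strong) before
deciding which symbol to keep.

```python
from typing import NamedTuple


class SymbolLoc(NamedTuple):
    file: int   # index of the object file
    index: int  # index of the symbol within that file


def resolve_symbol(resolved, discarded, name, new_loc):
    """Register a symbol; if the name is already resolved, discard the new one."""
    existing = resolved.get(name)
    if existing is None:
        resolved[name] = new_loc
        return new_loc
    # The new symbol does not overwrite the existing one, so remember
    # which symbol it was resolved by.
    discarded[new_loc] = existing
    return existing


def final_location(discarded, loc):
    """Location a relocation should use, following the discarded mapping."""
    return discarded.get(loc, loc)


resolved, discarded = {}, {}
first = resolve_symbol(resolved, discarded, "foo", SymbolLoc(file=0, index=1))
second = resolve_symbol(resolved, discarded, "foo", SymbolLoc(file=1, index=5))
target = final_location(discarded, SymbolLoc(file=1, index=5))
```

A relocation in file 1 that still refers to its local copy of `foo`
is thereby redirected to the symbol that was kept.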
|
|
When performing relocations for a type index, we first check if the
target symbol is undefined, in which case we obtain the type from the
`import` rather than looking into the `functions` table.
|
|
Multiple symbols can point to the same function. This means that when we loop over
the symbol list, we must avoid adding the same function twice.
Additionally, once we have appended a new type and set the type
index on a function, we must not do this again for the same function.
This commit also implements sorting of code atoms so that their order matches
the order of the function section, ensuring each function signature matches
its function body.
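
A small Python sketch of the deduplication and sorting described
above. The data shapes and the names `collect_functions` and
`sort_code_atoms` are hypothetical and chosen only for illustration.

```python
def collect_functions(symbols):
    """Map each distinct function to its position in the function section.

    `symbols` is a list of (symbol_name, function_id) pairs; several
    symbols may refer to the same function, which is added only once.
    """
    order = {}
    for _name, function_id in symbols:
        if function_id not in order:
            order[function_id] = len(order)
    return order


def sort_code_atoms(atoms, order):
    """Order code atoms to match the function section."""
    return sorted(atoms, key=lambda atom: order[atom["function"]])


symbols = [("main", "f0"), ("main_alias", "f0"), ("helper", "f1")]
order = collect_functions(symbols)
atoms = [{"function": "f1", "code": b"\x0b"}, {"function": "f0", "code": b"\x0b"}]
sorted_atoms = sort_code_atoms(atoms, order)
```

Since the i-th entry of the function section is paired with the i-th
body of the code section, keeping both in the same order is what ties
a signature to its body.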
|
|
|
|
This commit adds the ability to emit the following debug sections:
.debug_info
.debug_abbrev
.debug_line
.debug_str
Line information and files are now being loaded correctly by browser debuggers.
|
|
|
|
When linking with an object file, verify if a relocation is a table index relocation.
If that's the case, add the relocation target to the function table.
|
|
- atoms may have relocations, so freeing them when we update the parent
atom will cause segfaults.
- Not all declarations will live in symbol_atom
|
|
This also unifies the wasm backend to use `generateSymbol` when lowering a constant
that cannot be lowered to an immediate value.
As both decls and constants are now refactored, the old `genTypedValue` is removed.
|
|
All symbols read from object files, as well as those generated from Zig code,
will now be interned and have their offset into the string table saved on the `Symbol` instead.
Besides interning, local symbols now also use a decl's fully qualified name.
When a decl/symbol is extern/to-be-imported, the name of the decl itself will be used for symbol resolving.
Similarly, symbols that will be exported will have their 'export name' set.
|
|
Rather than ping-ponging between codegen and the linker to generate the symbols/atoms
for a local constant and its relocations, we now create all necessary objects within the linker.
This simplifies the code as we can now simply call `lowerUnnamedConst` from anywhere in codegen,
allowing us to further improve lowering constants into .rodata so we do not have to give up on
lowering certain values such as a `decl_ref` whose type is a slice.
|
|
We now correctly implement exporting decls. This means it is possible to export
a decl with a different name than the decl that is doing the export.
This also sets the symbols with the correct flags, so when we emit a relocatable
object file, a linker can correctly resolve symbols and/or export the symbol to the host environment.
This commit also includes fixes to ensure relocations are emitted with the offset that other
linkers expect, rather than the one we use internally.
Other linkers expect an offset relative to the section.
Internally we use an offset relative to the atom.
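
The conversion between the two conventions can be sketched in Python
as below. The `Atom` shape and the function name are hypothetical; the
sketch only assumes that each atom knows its own start offset within
the section.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    offset: int          # start of the atom within its section
    reloc_offsets: list  # relocation offsets relative to the atom start


def section_relative_offsets(atoms):
    """Translate atom-relative relocation offsets to section-relative ones."""
    return [atom.offset + rel for atom in atoms for rel in atom.reloc_offsets]


atoms = [Atom(offset=0, reloc_offsets=[4, 12]), Atom(offset=32, reloc_offsets=[0, 8])]
emitted = section_relative_offsets(atoms)
```

Keeping atom-relative offsets internally means they stay valid when an
atom moves; the translation is only needed when emitting the object
file.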
|
|
When generating a relocatable object file, we now emit a custom "reloc.CODE" and "reloc.DATA" section
which will contain the relocations for each section.
Using a new symbol location -> Atom mapping, we can now easily find the corresponding `Atom` from a symbol.
This can be used to construct the symbol table, as well as easier access to a target atom when performing
a relocation for a data symbol.
|
|
When creating a relocatable object file, we no longer perform the following actions:
- Merge data segments
- Calculate stack size
- Relocations
We now also make the stack pointer symbol `undefined` for this use case as well as add the symbol
as an import.
|
|
- Correctly get a discarded symbol by first checking whether it was discarded.
- Remove imports if extern symbols were resolved by an object file.
- Correctly relocate data symbols by ensuring the atom is from the correct file.
- Fix the `Names` section by using the resolved symbols, rather than the ones defined in Zig code.
|
|
We now correctly allocate and create atoms for symbols from other object files.
Imports are now also resolved and appended when required.
Besides those changes, we now duplicate all symbol names, so we can correctly
generate unique names for unnamed constants.
TODO: String interning
|
|
This implements the merging of all sections, to generate a valid wasm binary where all symbols
have been resolved and their respective sections have been merged into the final binary.
|