wasm linker: aggressive rewrite towards Data-Oriented Design #22220

andrewrk · 2024-12-13T02:20:10Z

The goals of this branch are to:

* compile faster when using the wasm linker and backend
* enable saving compiler state by directly copying in-memory linker state to disk
* use compiler memory more efficiently
* introduce integer type safety to wasm linker code
* generate better WebAssembly code
* fully participate in incremental compilation
* do as much work as possible outside of flush(), while continuing to do linker garbage collection
* avoid unnecessary heap allocations
* avoid unnecessary indirect function calls
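
To illustrate the goal of saving state by copying in-memory linker state to disk, here is a minimal sketch in Python (not the branch's Zig code). It assumes the state is held in flat integer tables that refer to each other by index or offset rather than by pointer; the table names and the `save`/`load` helpers are hypothetical and chosen only for this example.

```python
from array import array

# Hypothetical linker state: flat integer tables, no object references.
# Typecode "I" is an unsigned int whose width is platform-dependent.
state = {
    "string_bytes": array("B"),    # NUL-terminated names, back to back
    "function_names": array("I"),  # per function: offset into string_bytes
    "function_types": array("I"),  # per function: index into a type table
}


def add_function(state, name, type_index):
    """Append a function and return its index."""
    strings = state["string_bytes"]
    state["function_names"].append(len(strings))
    strings.frombytes(name.encode("utf-8") + b"\0")
    state["function_types"].append(type_index)
    return len(state["function_names"]) - 1


def function_name(state, function_index):
    """Look a name up through its offset into the string table."""
    raw = state["string_bytes"].tobytes()
    start = state["function_names"][function_index]
    return raw[start:raw.index(0, start)].decode("utf-8")


def save(state):
    """Snapshot every table as raw bytes, ready to be written to a file."""
    return {key: (table.typecode, table.tobytes()) for key, table in state.items()}


def load(snapshot):
    """Rebuild the tables from a snapshot without any per-item fixups."""
    restored = {}
    for key, (typecode, blob) in snapshot.items():
        table = array(typecode)
        table.frombytes(blob)
        restored[key] = table
    return restored
```

Because nothing in the tables is a pointer, the restored copy is usable as soon as the bytes are back in memory, which is the property the goal above relies on.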

In order to accomplish these goals, this removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, unnecessary work, and needless complications that simply go away by creating a better in-memory data model and emitting more things lazily.

For example, this makes wasm codegen emit MIR, which is then lowered to wasm code during linking with optimal function indexes etc.; relocations are emitted only if outputting an object. Previously, codegen would always emit relocations, which are fully unnecessary when emitting an executable and which required all function calls to use the maximum size LEB encoding.
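
The size difference is easy to see with a small Python sketch (again, not the branch's Zig code). It assumes a relocatable function index is stored as a 5-byte padded unsigned LEB128 so it can be patched in place, while a final index can use the shortest encoding; the helper names are made up for this example.

```python
CALL = 0x10  # wasm `call` opcode


def encode_uleb128(value):
    """Shortest unsigned LEB128 encoding of a u32."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in u32")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_uleb128_padded(value, width=5):
    """Fixed-width unsigned LEB128, patchable in place by a relocation."""
    if not 0 <= value < 2**32:
        raise ValueError("value does not fit in u32")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte but the last
        out.append(byte)
    return bytes(out)


def decode_uleb128(data):
    """Decode one unsigned LEB128 value from the start of `data`."""
    result = 0
    for i, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            break
    return result


# A call to function index 3, with and without room for a relocation.
direct_call = bytes([CALL]) + encode_uleb128(3)              # 2 bytes
relocatable_call = bytes([CALL]) + encode_uleb128_padded(3)  # 6 bytes
```

Both forms decode to the same index; the padded one only exists so that a later pass can overwrite the index without moving any bytes.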

This branch introduces the concept of the "prelink" phase which occurs after all object files have been parsed, but before any Zcu updates are sent to the linker. This allows the linker to fully parse all objects into a compact memory model, which is guaranteed to be complete when Zcu code is generated.
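
The ordering guarantee can be pictured with the following Python sketch. It is only an illustration of the phase boundary described above, not the frontend's actual interface: the `Linker` class, its method names, and the symbol table are all hypothetical.

```python
from enum import Enum, auto


class Phase(Enum):
    PARSING_OBJECTS = auto()  # object files may still be added
    ZCU_UPDATES = auto()      # object data is complete and read-only


class Linker:
    """Hypothetical linker that enforces the prelink boundary."""

    def __init__(self):
        self.phase = Phase.PARSING_OBJECTS
        self.object_symbols = {}  # symbol name -> object file defining it

    def add_object(self, path, symbols):
        if self.phase is not Phase.PARSING_OBJECTS:
            raise RuntimeError("objects must be parsed before prelink")
        for name in symbols:
            self.object_symbols.setdefault(name, path)

    def prelink(self):
        if self.phase is not Phase.PARSING_OBJECTS:
            raise RuntimeError("prelink runs exactly once")
        self.phase = Phase.ZCU_UPDATES

    def update_function(self, name, callees):
        """Return the callees that object files already define."""
        if self.phase is not Phase.ZCU_UPDATES:
            raise RuntimeError("Zcu updates are only accepted after prelink")
        # The object symbol table is guaranteed complete at this point.
        return [callee for callee in callees if callee in self.object_symbols]
```

Since no object can arrive after `prelink`, code generated for Zcu updates never has to revisit a decision because a later object changed the picture.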

Merge Checklist

implement the prelink phase in the frontend
fix regressions / get the tests passing again
eliminate TODOs

Demo: Incremental Compilation

After this branch is ready to merge, I'll put a demo here.

Demo: Serializing and Deserializing Linker State

After this branch is ready to merge, I'll put a demo here.

Followup

After landing this branch I plan to set a firm release date for the 0.14.0 tag.

ELF, COFF, and MachO need the same treatment. I started with Wasm because it is significantly fewer lines of code. Some strategies can be shared there; however, I don't expect to keep as much in memory with those linkers, since the total object file size could be enormous.

Post-Merge Roadmap:

1. One month of QA for 0.14.0
2. Release 0.14.0
3. Enhance wasm linker enough to pass LLD's test suite for Wasm.
4. Remove dependency on LLD for Wasm.
5. Repeat steps 3-4 for ELF
6. Repeat steps 3-4 for COFF
7. Repeat steps 3-4 for MachO
8. Rework ELF linker code with respect to incremental compilation goals
9. Rework COFF linker code with respect to incremental compilation goals
10. Rework MachO linker code with respect to incremental compilation goals

This commit is not a complete implementation of all these goals; it is not even passing semantic analysis.

Makes linker functions have small error sets, which is required to report diagnostics properly, rather than having one massive error set with a lot of codes. Other linker implementations are not ported yet. Also, the branch is not passing semantic analysis yet.

Mainly, rework how relocations work. This is the point at which symbol indexes are known, not before. And don't emit unnecessary relocations! They're only needed when emitting an object file. Changes the wasm linker to keep MIR around long-lived so that fixups can be reapplied after linker garbage collection. Use labeled switch while we're at it.
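
As a rough picture of why keeping MIR around helps, the Python sketch below lowers the same stored instructions twice, once per garbage collection result. The MIR shape (a list of `(opcode, operand)` pairs), the breadth-first reachability pass, and all function names are assumptions made for this example and do not mirror the branch's data structures.

```python
from collections import deque


def uleb128(value):
    """Shortest unsigned LEB128 encoding of a non-negative integer."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


# Hypothetical long-lived MIR: calls name their target symbolically.
mir = {
    "main": [("call", "helper")],
    "unused": [("call", "helper")],
    "helper": [],
}


def garbage_collect(mir, roots):
    """Return live function names in breadth-first discovery order."""
    live, seen, queue = [], set(roots), deque(roots)
    while queue:
        name = queue.popleft()
        live.append(name)
        for opcode, operand in mir[name]:
            if opcode == "call" and operand not in seen:
                seen.add(operand)
                queue.append(operand)
    return live


def lower(body, indexes):
    """Lower one MIR body to wasm bytes using final function indexes."""
    code = bytearray()
    for opcode, operand in body:
        if opcode != "call":
            raise ValueError(f"unsupported MIR opcode: {opcode}")
        code.append(0x10)                  # wasm `call`
        code += uleb128(indexes[operand])  # index known only after GC
    code.append(0x0B)                      # wasm `end`
    return bytes(code)


def link(mir, roots):
    live = garbage_collect(mir, roots)
    indexes = {name: index for index, name in enumerate(live)}
    return {name: lower(mir[name], indexes) for name in live}
```

Calling `link(mir, ["main"])` drops `unused` and gives `helper` index 1; calling `link(mir, ["unused", "main"])` keeps all three and gives `helper` index 2. The stored MIR is untouched in both cases, so no relocation records are needed to redo the fixups.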

andrewrk added 24 commits December 12, 2024 17:49

remove "FIXME" from codebase

5a4252f

See #363. Please file issues rather than making TODO comments.

macho linker conforms to explicit error sets, again

53a608b

elf linker: conform to explicit error sets

5f4240c

rework error handling in the backends

c7f1eb5

compiler: add type safety for export indices

16d405e

std.array_list: tiny refactor for pleasure

35361d4

wasm codegen: fix some compilation errors

0beaca0

wasm: implement errors_len as a MIR opcode with no linker involvement

0392ea1

wasm codegen: switch on bool instead of int

b553e35

wasm codegen: rename func: CodeGen to cg: CodeGen

0939e97

wasm: move error_name lowering to Emit phase

f16253a

wasm: use call_intrinsic MIR instruction

a5d3c36

switch to ArrayListUnmanaged for machine code

1e5410e

wasm: fix many compilation errors

12a26ae

Still, the branch is not yet passing semantic analysis.

wasm linker: support export section as implicit symbols

7d3c56a

frontend: add const to more Zcu pointers

d074c1a

wasm linker: implement name, module name, and type for function imports

389c235

wasm linker: flush implemented up to the export section

ea14c09

wasm linker: flush export section

23c3bc4

wasm linker: finish the flush function

786e082

This branch is passing type checking now.

fix compilation when enabling llvm

c9bf6eb
