wasm linker: aggressive rewrite towards Data-Oriented Design #22220

Open: wants to merge 24 commits into master


andrewrk (Member) commented Dec 13, 2024

The goals of this branch are to:

  • compile faster when using the wasm linker and backend
  • enable saving compiler state by directly copying in-memory linker state to disk
  • use compiler memory more efficiently
  • introduce integer type safety to wasm linker code
  • generate better WebAssembly code
  • fully participate in incremental compilation
  • do as much work as possible outside of flush(), while still performing linker garbage collection
  • avoid unnecessary heap allocations
  • avoid unnecessary indirect function calls
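One common way to get integer type safety in data-oriented code is to give every kind of index its own distinct type, so that a function index cannot be passed where a global index is expected, while the data itself lives in flat per-field arrays. The Python sketch below is a hypothetical illustration of that idea only (in Zig it is often expressed with non-exhaustive `enum(u32)` types); `FunctionIndex`, `GlobalIndex`, and `Functions` are names invented here, not types from this branch.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FunctionIndex:
    """Index into Functions. A distinct type, so it can't be confused with other indexes."""
    value: int


@dataclass(frozen=True)
class GlobalIndex:
    """Index into a (hypothetical) table of globals."""
    value: int


@dataclass
class Functions:
    """Struct-of-arrays storage: one flat list per field rather than one object per function."""
    names: list[str] = field(default_factory=list)
    type_indexes: list[int] = field(default_factory=list)

    def add(self, name: str, type_index: int) -> FunctionIndex:
        self.names.append(name)
        self.type_indexes.append(type_index)
        return FunctionIndex(len(self.names) - 1)

    def name(self, index: FunctionIndex) -> str:
        # A static checker such as mypy flags a wrong index type before the
        # program runs; this runtime check catches it if the checker is skipped.
        if not isinstance(index, FunctionIndex):
            raise TypeError(f"expected FunctionIndex, got {type(index).__name__}")
        return self.names[index.value]
```

Because each per-field list holds only plain integers and strings rather than pointers between objects, a layout like this is also far easier to write out in bulk, which is the property the state-serialization goal depends on.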

In order to accomplish these goals, this branch removes the ZigObject abstraction, as well as Symbol and Atom. These abstractions resulted in overly generic code, unnecessary work, and needless complications, all of which simply go away with a better in-memory data model and more things emitted lazily.

For example, wasm codegen now emits MIR, which is lowered to wasm code during linking. When emitting an executable, lowering uses the optimal final function indexes and the like; relocations are emitted only when outputting an object file. Previously, codegen always emitted relocations, which are entirely unnecessary when emitting an executable and forced every function call to use the maximum-size LEB encoding.
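To make the cost of always emitting relocations concrete: a wasm `call` instruction is opcode `0x10` followed by the function index as an unsigned LEB128 integer. A relocatable call site has to reserve the maximum width (5 bytes for a 32-bit index) so the linker can patch in any final value later, whereas a call whose index is already known can use the minimal encoding. The Python sketch below contrasts the two; the function names are illustrative, not taken from the Zig source.

```python
def uleb128(value: int) -> bytes:
    """Minimal unsigned LEB128: 7 bits per byte, high bit set when more bytes follow."""
    if value < 0:
        raise ValueError("unsigned LEB128 requires a non-negative value")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def uleb128_padded(value: int, width: int = 5) -> bytes:
    """Fixed-width ULEB128, so a relocatable field can be patched in place later."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        byte = value & 0x7F
        value >>= 7
        # Every byte but the last carries the continuation bit, even if the rest is zero.
        out.append((byte | 0x80) if i < width - 1 else byte)
    return bytes(out)


CALL = 0x10  # wasm `call` opcode; its operand is a u32 function index in LEB128


def encode_call(func_index: int, relocatable: bool) -> bytes:
    operand = uleb128_padded(func_index) if relocatable else uleb128(func_index)
    return bytes([CALL]) + operand
```

For a small index such as 3, `encode_call(3, relocatable=False)` is 2 bytes, while the relocatable form is 6 bytes, and that overhead applies to every call site.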

This branch introduces the concept of a "prelink" phase, which occurs after all object files have been parsed but before any Zcu updates are sent to the linker. This allows the linker to fully parse all objects into a compact memory model that is guaranteed to be complete by the time Zcu code is generated.
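The ordering guarantee can be sketched as a small phase check. In the hypothetical Python model below (`Linker`, `Phase`, `load_object`, `prelink`, and `update_func` are invented names, not the branch's API), object files may only be loaded before prelink and Zcu updates are only accepted after it, so every symbol defined by an object is already known when a Zcu function is processed.

```python
from enum import Enum, auto


class Phase(Enum):
    LOADING_OBJECTS = auto()  # object files are still being parsed
    READY = auto()            # prelink done; object data is complete


class Linker:
    """Hypothetical model of the ordering the prelink phase guarantees."""

    def __init__(self) -> None:
        self.phase = Phase.LOADING_OBJECTS
        self.object_functions: dict[str, int] = {}  # symbol name -> object-function slot
        self.zcu_functions: list[str] = []

    def load_object(self, symbols: list[str]) -> None:
        if self.phase is not Phase.LOADING_OBJECTS:
            raise RuntimeError("objects must be parsed before prelink")
        for name in symbols:
            self.object_functions.setdefault(name, len(self.object_functions))

    def prelink(self) -> None:
        # From here on, the set of object-file symbols is final.
        self.phase = Phase.READY

    def update_func(self, name: str, callees: list[str]) -> list[int]:
        if self.phase is not Phase.READY:
            raise RuntimeError("Zcu updates are only accepted after prelink")
        self.zcu_functions.append(name)
        # Object parsing is complete, so object-defined callees can be looked up
        # immediately. These are slots in the linker's table, not final output indexes.
        return [self.object_functions[c] for c in callees]
```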

Merge Checklist

  • implement the prelink phase in the frontend
  • fix regressions / get the tests passing again
  • eliminate TODOs

Demo: Incremental Compilation

After this branch is ready to merge, I'll put a demo here.

Demo: Serializing and Deserializing Linker State

After this branch is ready to merge, I'll put a demo here.

Followup

After landing this branch I plan to set a firm release date for the 0.14.0 tag.

ELF, COFF, and MachO need the same treatment. I started with Wasm because it has significantly fewer lines of code. Some strategies can be shared with those linkers; however, I don't expect to keep as much in memory there, since the total object file size could be enormous.

Post-Merge Roadmap:

  1. One month of QA for 0.14.0
  2. Release 0.14.0
  3. Enhance wasm linker enough to pass LLD's test suite for Wasm.
  4. Remove dependency on LLD for Wasm.
  5. Repeat steps 3-4 for ELF
  6. Repeat steps 3-4 for COFF
  7. Repeat steps 3-4 for MachO
  8. Rework ELF linker code with respect to incremental compilation goals
  9. Rework COFF linker code with respect to incremental compilation goals
  10. Rework MachO linker code with respect to incremental compilation goals

Commit Messages

This commit is not a complete implementation of all these goals; it is not even passing semantic analysis.
Makes linker functions have small error sets, which forces them to report diagnostics properly rather than relying on a massive error set with many error codes.
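A minimal sketch of the "small error set plus diagnostics" pattern, assuming a design in which detailed messages are recorded in a diagnostics collector and callers only ever see one generic failure. Python exceptions stand in for Zig error sets here, and `LinkFailure`, `Diagnostics`, and `parse_magic` are hypothetical names.

```python
from dataclasses import dataclass, field


class LinkFailure(Exception):
    """The single error a linker function may raise; the details live in Diagnostics."""


@dataclass
class Diagnostics:
    messages: list[str] = field(default_factory=list)

    def fail(self, message: str) -> LinkFailure:
        # Record the human-readable details, then hand back the small, generic error.
        self.messages.append(message)
        return LinkFailure()


def parse_magic(diags: Diagnostics, path: str, data: bytes) -> None:
    """Hypothetical check: a wasm object must begin with the magic bytes 00 61 73 6D."""
    if data[:4] != b"\0asm":
        raise diags.fail(f"{path}: invalid magic bytes {data[:4]!r}")
```

Callers only need to handle `LinkFailure`; the compiler driver can print everything in `diags.messages` afterwards, instead of every caller matching on a long list of specific error codes.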

Other linker implementations are not ported yet.

Also, the branch is not passing semantic analysis yet.

See #363. Please file issues rather than making TODO comments.

Mainly, rework how relocations work. This is the point at which symbol indexes are known, not before. And don't emit unnecessary relocations! They're only needed when emitting an object file.

Changes the wasm linker to keep MIR alive long-term so that fixups can be reapplied after linker garbage collection.
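Why keeping MIR helps: garbage collection decides which functions survive, and only then are final function indexes known. If each function's MIR names its callees symbolically, the linker can rerun GC (for example after an incremental update) and simply reapply the call fixups from the retained MIR. The Python sketch below is a hypothetical model of that flow; `Call`, `live_functions`, and `lower` are invented names, and sorting survivors by name is just one arbitrary deterministic numbering.

```python
from dataclasses import dataclass

CALL = 0x10  # wasm `call` opcode; its operand is the callee's final function index


@dataclass(frozen=True)
class Call:
    """A MIR call instruction: names its callee symbolically, with no final index yet."""
    callee: str


def live_functions(mir: dict[str, list[Call]], roots: list[str]) -> set[str]:
    """Linker garbage collection: the set of functions reachable from `roots`."""
    live: set[str] = set()
    stack = list(roots)
    while stack:
        name = stack.pop()
        if name not in live:
            live.add(name)
            stack.extend(call.callee for call in mir[name])
    return live


def lower(mir: dict[str, list[Call]], roots: list[str]):
    """Number the live functions, then apply call fixups from the retained MIR."""
    index = {name: i for i, name in enumerate(sorted(live_functions(mir, roots)))}
    code = {name: [(CALL, index[call.callee]) for call in mir[name]] for name in index}
    return index, code
```

With `mir = {"main": [Call("helper")], "export_b": [Call("helper")], "helper": []}`, lowering from the root `main` gives `main` the instruction `(0x10, 0)`. Adding `export_b` as a root shifts `helper` to index 1, and rerunning `lower` updates `main`'s call operand without regenerating `main`'s MIR.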

Use a labeled switch while we're at it.

Still, the branch is not yet passing semantic analysis.

This branch is passing type checking now.