A C to x86-64 compiler, assembler and linker written from scratch with no external dependencies (except for Go and Linux).
The compiler produces System V ABI-compliant, position-independent code from C source (ANSI C grammar). This code is then converted to relocatable ELF files that the linker can link to produce:
- a single statically linked relocatable file from multiple relocatable files
- statically linked executables
- shared objects and dynamically linkable executables that are compatible with the GNU runtime dynamic linker
The linker can also link object files and shared libraries produced by the GNU toolchain, and the GNU toolchain can in turn link the files this linker produces.
Almost all C features are implemented. The most important missing ones are:
- unions and bit fields
- passing structs by value (pass by pointer instead)
- variadic functions
- a couple of bitwise operators
Apart from that, a full 64-bit type system is supported: integral and floating types, strings, pointers, and arrays and structs with arbitrary dimensions and nesting.
The assembler only supports the instructions that the compiler generates.
Build with: go build .
For all options, run ./main --help
- compile to relocatable ELF:
./main -o <path to output object file> -c <path to input C source>
- link relocatables into a single relocatable:
./main -o <path to output object file> -r <path to first object file>...
- build statically linked executable:
./main -o <path to output executable> -e <path to input object file>
- build shared library:
./main -o <path to output .so file> [-L <dependency search dir>...] [-l <dependency so>...] -s <path to input object file>
- build dynamically linkable executable:
./main -o <path to output executable file> [-L <dependency search dir>...] [-l <dependency so>...] -d <path to input object file>
Currently the linker does not add any libc-equivalent wrapping, so either run programs under gdb or link with libc manually. Since no debug symbols are present, running a dynamically linked executable takes three steps. First, find its entry point (the Entry field in the output of readelf -h <file>). Next, set a breakpoint on that address in gdb. Finally, use run. If the code is linked with GNU ld instead, it can simply be run from the command line.
The compiler uses the ANSI C grammar, which can be found here or, with slight modifications, in resources.
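The widely circulated ANSI C grammar is written as Yacc-style rules, such as primary_expression : IDENTIFIER | CONSTANT ... ;. The file format in resources is not described here, so the reader below is a sketch that assumes only the rules section of a Yacc-style file with whitespace-separated tokens. It ignores %token declarations and %% markers. The Production type and parseRules function are hypothetical. Quoted terminals such as ';' and '|' are distinct tokens from the bare ; and | separators.

```go
package main

import (
	"fmt"
	"strings"
)

// Production is one alternative of a rule: LHS -> RHS[0] RHS[1] ...
type Production struct {
	LHS string
	RHS []string
}

// parseRules reads rules of the form `lhs : sym sym | sym ;` and returns
// one Production per alternative (an empty RHS denotes an empty production).
func parseRules(src string) ([]Production, error) {
	var prods []Production
	toks := strings.Fields(src)
	for i := 0; i < len(toks); {
		if i+1 >= len(toks) || toks[i+1] != ":" {
			return nil, fmt.Errorf("expected 'lhs :' at token %d", i)
		}
		lhs := toks[i]
		i += 2
		rhs := []string{}
		for {
			if i >= len(toks) {
				return nil, fmt.Errorf("rule %q not terminated by ';'", lhs)
			}
			t := toks[i]
			i++
			if t == "|" || t == ";" {
				// End of one alternative.
				prods = append(prods, Production{lhs, rhs})
				rhs = []string{}
				if t == ";" {
					break // end of the whole rule
				}
			} else {
				rhs = append(rhs, t)
			}
		}
	}
	return prods, nil
}

func main() {
	src := `primary_expression : IDENTIFIER | CONSTANT | '(' expression ')' ;
expression_statement : ';' | expression ';' ;`
	prods, err := parseRules(src)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, p := range prods {
		fmt.Printf("%s -> %s\n", p.LHS, strings.Join(p.RHS, " "))
	}
}
```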
The compiler is built in layers, mostly based on the Stanford course. Each layer is implemented from scratch:
- Grammar reader that reads the grammar from resources.
- Lexer that tokenizes the source using the defined tokens.
- LALR(1) parser that constructs an Abstract Syntax Tree (AST) based on the grammar productions.
- Semantic analyzer and type engine that type-checks the AST and performs reasonably advanced error checking.
- Intermediate Representation generator that converts the AST to Three Address Code (TAC) IR.
- Code generator that produces System V ABI-compliant, position-independent x86-64 assembly.
For now the code generator uses a basic register allocation strategy, and almost no IR or code optimizations are performed. The resulting code is similar to what gcc -O0 produces.