Compiler From Scratch - Part 1
Intro
I just really like building compilers; there is a lot to be learned from building compilers and it’s a good way to validate knowledge of algorithms and data structures.
Nothing about building a compiler, or even a whole new programming language, is inherently hard. But it can blow out in complexity as you add scope, like anything in software development. By the end of these posts, I hope to have a full end-to-end compiler covering lexing, parsing, some tree-rewriting optimizations, code generation, LSP support, and memory management strategies. I’m keeping it flexible and open to changes in the specifications.
Just because I find it fun, I want to do code generation in 3 ways:
- A C backend - lets me cover language-to-language transformation, or transpilation as it is sometimes known. It is also the quickest way to produce a working binary for our code.
- A FASM backend - lets me play with flat assembler and produce raw x86_64 instructions. As bare metal as I care to go, and also FASM is just really cool.
- An LLVM backend - lets us play with the gold-standard backend of languages. clang can compile C to LLVM, and Rust also has an LLVM backend, so let’s do that too.
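To give a feel for what the C backend idea amounts to, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the tuple-based tree shape, and the names emit_c_expr and emit_c_program, are hypothetical and are not ShitLang’s actual design. It walks a tiny expression tree and prints out equivalent C source, which a C compiler could then turn into a binary.

```python
# Hypothetical mini-AST (not ShitLang's real one):
#   ("num", 3)            an integer literal
#   ("+", left, right)    a binary operation, likewise for "*"

def emit_c_expr(node):
    """Translate one expression node into a C expression string."""
    if node[0] == "num":
        return str(node[1])
    op, left, right = node
    # Parenthesize everything so we never have to think about precedence.
    return f"({emit_c_expr(left)} {op} {emit_c_expr(right)})"


def emit_c_program(node):
    """Wrap the expression in a C main() that prints its value."""
    return (
        "#include <stdio.h>\n"
        "int main(void) {\n"
        f'    printf("%d\\n", {emit_c_expr(node)});\n'
        "    return 0;\n"
        "}\n"
    )


tree = ("+", ("num", 1), ("*", ("num", 2), ("num", 3)))
c_source = emit_c_program(tree)
print(c_source)
```

The output is plain C text, so all the hard work of producing machine code is handed off to an existing C compiler. That is why this route tends to be the quickest way to a working binary.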
However, to support various features of the language, such as constant folding (which is really an optimization), it’s also going to have an interpreter mode. This is by far the easiest way to create the illusion of a programming language coming to life, as it allows us to “run” a program in our language, but I will explain the caveats of this later.
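As a rough illustration of constant folding, here is a small Python sketch. The tuple-based tree and the fold function are hypothetical, chosen only for this example. The idea is to evaluate, at compile time, any sub-expression whose operands are all known constants, and to leave everything else alone.

```python
# Hypothetical mini-AST (an assumption for this sketch):
#   ("num", 3), ("var", "x"), ("+", left, right), ("*", left, right)

def fold(node):
    """Return a tree where constant-only sub-expressions are pre-computed."""
    if node[0] in ("num", "var"):
        return node
    op, left, right = node
    left, right = fold(left), fold(right)
    if left[0] == "num" and right[0] == "num":
        # Both sides are known now, so "interpret" the operation right here.
        a, b = left[1], right[1]
        return ("num", a + b if op == "+" else a * b)
    return (op, left, right)


tree = ("+", ("var", "x"), ("*", ("num", 2), ("num", 3)))
folded = fold(tree)
print(folded)  # ('+', ('var', 'x'), ('num', 6))
```

The folding step is a tiny interpreter for the constant parts of the program, which is why having an interpreter mode around makes this kind of optimization easier to support.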
So, the goal is to write ShitLang. A shitty language to do shit things and it will do it rather shitly.
This is to ensure that the experience is lighthearted and enjoyable, and not a rush to create something
phenomenal. Good languages are drafted over the course of years of experience using other languages,
identifying pain points, and determining solutions to these. Beyond that, they are vetted by a community
of software engineers over more years, and backed by people with a vested interest in seeing the
language “take off” and dominate. ShitLang is shit and will inspire no one and do nothing of any
value in the software development world. But hopefully, you’ll learn something, and more importantly
I will have fun.
What is a compiler?
Compilers are effectively translators from one representation to another. What they are transforming is an expression of computation. When a compiler runs against source code, we’re asking it to transform that language into one the CPU can understand. At least, that’s the high-level expectation.
Compilers have several significant parts:
- Lexer
- Parser
- Optimizations? (This is optional, and also potentially several stages in itself)
- Interpreter? (Another optional stage, and possibly even the final stage for some)
- Intermediate form? (Also potentially optional, depending on design)
- Code generation? (… Wow, seems like everything is potentially optional)
A compiler you write will most likely have a subset of these parts, depending on its scope.
For ShitLang, we’re going to do all these parts plus more. ShitLang is full of scope creep and
the desire to touch as many different aspects of compiler design as possible.
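To make the stages listed above a little more concrete, here is a toy sketch in Python that chains a lexer, a parser, and an interpreter for integer arithmetic. It is an assumption-laden miniature, not ShitLang: the grammar, the tuple-based tree, and the function names lex, parse, and interpret are all made up for illustration.

```python
import re

# One token is a run of digits or a single operator/parenthesis.
TOKEN_RE = re.compile(r"\s*(\d+|[-+*/()])")


def lex(source):
    """Lexer: source text -> flat list of token strings."""
    tokens, pos = [], 0
    source = source.rstrip()
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def parse(tokens):
    """Parser: tokens -> nested tuples, with * and / binding tighter than + and -."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            node = expr()
            if take() != ")":
                raise SyntaxError("expected ')'")
            return node
        if tok.isdigit():
            return ("num", int(tok))
        raise SyntaxError(f"unexpected token {tok!r}")

    def term():
        node = atom()
        while peek() in ("*", "/"):
            node = (take(), node, atom())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            node = (take(), node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"unexpected token {peek()!r}")
    return tree


def interpret(node):
    """Interpreter: walk the tree and compute its value."""
    if node[0] == "num":
        return node[1]
    op, left, right = node
    a, b = interpret(left), interpret(right)
    if op == "+":
        return a + b
    if op == "-":
        return a - b
    if op == "*":
        return a * b
    return a // b  # integer division, an arbitrary choice for this sketch


print(interpret(parse(lex("1 + 2 * 3"))))    # 7
print(interpret(parse(lex("(1 + 2) * 3"))))  # 9
```

Here the interpreter is the final stage. Swapping interpret for a function that emits C, assembly, or LLVM IR from the same tree is what would turn this pipeline into a compiler with a code generation stage.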