An introduction to compiler construction using flex and yacc

Ever wondered how compilers work, or wanted to build one yourself but didn't know where to start? Seen the tools commonly used for compiler construction but couldn't wrap your head around how to use them? Then you have come to the right place. In today's blog post, I'd like to give an introduction to flex and yacc by showing how to implement a toy programming language, as well as a virtual machine able to run the compiled code of programs written in that language.

Since I am a strong believer in ready-to-run examples and fast results, I will present this how-to in the form of heavily documented source code, available as a ZIP archive for download. The reason for doing it this way, instead of providing snippets of code and explaining step by step how to put them together into a working compiler, is to reduce complexity. The bottom-up approach only works well with a firm idea of the finished product in mind. Implementing a well-known language (e.g. Java) would provide such an idea, at the price of adding a lot of overhead to the how-to. Implementing a simplified language removes this overhead, but also abandons the clear idea. The best way to deal with this problem is therefore to provide a ready-to-use compiler and then give pointers on how to decompose it.

To make use of the ZIP file, some prior knowledge is required, which must be gathered elsewhere:

  • Programming experience in general and C skills in particular.
  • Solid understanding of context-free grammars and BNF notation.
  • Some knowledge of assembly programming and/or hardware architectures.

The suggested way of using the how-to is to first unzip the source and build the compiler (instructions for doing this can be found in the doc/USING file). Afterwards, get familiar with the toy programming language by running the example programs in the doc/programs directory. The specification of loopwhile, the toy language, can be found in doc/concepts. The most interesting files with regard to compiler construction are src/engine.h, which describes the opcodes understood by the virtual machine, and src/langdef.y, which describes the compiler itself. The code is heavily documented and should not pose a problem to any seasoned C developer.
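
To give a rough idea of what "opcodes understood by a virtual machine" means before you open the archive, here is a minimal sketch of an interpreter loop in C. It is not taken from src/engine.h: the opcode names (OP_PUSH, OP_ADD, OP_MUL, OP_HALT), the struct instr layout, the vm_run function and the choice of a stack machine are all assumptions made up for illustration, and the real engine may well be organized differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for illustration only; NOT those from src/engine.h. */
enum opcode { OP_PUSH, OP_ADD, OP_MUL, OP_HALT };

/* One instruction: an opcode plus an operand (only OP_PUSH uses it). */
struct instr {
    enum opcode op;
    int arg;
};

#define STACK_MAX 64

/* Run a program until OP_HALT and return the value on top of the stack. */
int vm_run(const struct instr *code)
{
    int stack[STACK_MAX];
    size_t sp = 0; /* number of values currently on the stack */
    size_t pc = 0; /* index of the next instruction to execute */

    for (;;) {
        struct instr in = code[pc++];
        switch (in.op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = in.arg;
            break;
        case OP_ADD:
            /* Replace the two topmost values with their sum. */
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
            sp--;
            break;
        case OP_MUL:
            /* Replace the two topmost values with their product. */
            assert(sp >= 2);
            stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
            sp--;
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return -1;
        }
    }
}
```

With these made-up opcodes, a compiler could translate the expression (2 + 3) * 4 into the sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT, and passing that array to vm_run would yield 20. The compiler's job, roughly speaking, is to turn source text into such an instruction sequence; the virtual machine's job is to execute it.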