Modern Compiler Implementation in C

Andrew W. Appel, Maia Ginsburg


Describes all phases of a modern compiler, including techniques in code generation and register allocation for imperative, functional and object-oriented languages.


Mentioned in questions and answers.

Preferred languages: C/C++, Java, and Ruby.

I am looking for some helpful books/tutorials on how to write your own compiler simply for educational purposes. I am most familiar with C/C++, Java, and Ruby, so I prefer resources that involve one of those three, but any good resource is acceptable.

The Dragon Book is definitely the "building compilers" book, but if your language isn't quite as complicated as the current generation of languages, you may want to look at the Interpreter pattern from Design Patterns.

The example in the book designs a regular-expression-like language and is well thought through, but, as they say in the book, it's good for thinking through the process and is really only effective on small languages. However, it is much faster to write an interpreter for a small language with this pattern than to learn about all the different types of parsers, yacc and lex, et cetera...
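To give a flavor of what that looks like outside an OO language, here is a minimal sketch in C of the same idea: a tagged-union expression tree with a single eval function standing in for the pattern's interpret() method. The node kinds and the tiny language (integer literals, addition, multiplication) are made up purely for illustration, not taken from the book.

    #include <stdio.h>
    #include <stdlib.h>

    /* A tiny expression language: literals, addition, multiplication.
       Each node kind stands in for one "Expression" class of the pattern. */
    typedef enum { LIT, ADD, MUL } NodeKind;

    typedef struct Expr {
        NodeKind kind;
        int value;                  /* used when kind == LIT */
        struct Expr *left, *right;  /* used for ADD and MUL  */
    } Expr;

    static Expr *make_lit(int v) {
        Expr *e = malloc(sizeof *e);
        e->kind = LIT; e->value = v; e->left = e->right = NULL;
        return e;
    }

    static Expr *make_bin(NodeKind k, Expr *l, Expr *r) {
        Expr *e = malloc(sizeof *e);
        e->kind = k; e->value = 0; e->left = l; e->right = r;
        return e;
    }

    /* The "interpret" operation: one recursive walk over the tree. */
    static int eval(const Expr *e) {
        switch (e->kind) {
        case LIT: return e->value;
        case ADD: return eval(e->left) + eval(e->right);
        case MUL: return eval(e->left) * eval(e->right);
        }
        return 0; /* not reached */
    }

    int main(void) {
        /* Represents (2 + 3) * 4 without ever writing a parser. */
        Expr *tree = make_bin(MUL,
                              make_bin(ADD, make_lit(2), make_lit(3)),
                              make_lit(4));
        printf("%d\n", eval(tree)); /* prints 20 */
        return 0;
    }

The point is that you can have something like this running in an afternoon by building the tree directly (or with a small hand-written parser), long before you would have worked through yacc and lex.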

I think Modern Compiler Implementation in ML is the best introductory compiler writing text. There's a Java version and a C version too, either of which might be more accessible given your language background. The book packs a lot of useful basic material (scanning and parsing, semantic analysis, activation records, instruction selection, RISC and x86 native code generation) and various "advanced" topics (compiling OO and functional languages, polymorphism, garbage collection, optimization and static single assignment form) into relatively little space (~500 pages).

I prefer Modern Compiler Implementation to the Dragon Book because it surveys less of the field; instead, it gives really solid coverage of all the topics you would need to write a serious, decent compiler. After you work through this book you'll be ready to tackle research papers directly for more depth if you need it.

I must confess I have a serious soft spot for Niklaus Wirth's Compiler Construction. It is available online as a PDF. I find Wirth's programming aesthetic simply beautiful, though some people find his style too minimal (for example, Wirth favors recursive descent parsers while most CS courses focus on parser generator tools, and his language designs are fairly conservative). Compiler Construction is a very succinct distillation of Wirth's basic ideas, so whether you like his style or not, I highly recommend reading this book.

I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.

If you want a book that is lighter on theory, Game Scripting Mastery might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.

It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.
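To make the compile-to-bytecode-and-run-it-on-your-own-VM idea concrete, here is a minimal sketch of a stack-based dispatch loop in C. The opcodes and the example program are invented for illustration and have nothing to do with the book's actual instruction set.

    #include <stdio.h>

    /* Invented opcodes for a tiny stack machine. */
    enum { OP_PUSH, OP_ADD, OP_MUL, OP_PRINT, OP_HALT };

    static void run(const int *code) {
        int stack[256];
        int sp = 0;                 /* stack pointer   */
        for (int pc = 0; ; ) {      /* program counter */
            switch (code[pc++]) {
            case OP_PUSH:  stack[sp++] = code[pc++];          break;
            case OP_ADD:   sp--; stack[sp - 1] += stack[sp];  break;
            case OP_MUL:   sp--; stack[sp - 1] *= stack[sp];  break;
            case OP_PRINT: printf("%d\n", stack[--sp]);       break;
            case OP_HALT:  return;
            }
        }
    }

    int main(void) {
        /* Bytecode for (2 + 3) * 4; the compiler's job is to emit this array. */
        int program[] = { OP_PUSH, 2, OP_PUSH, 3, OP_ADD,
                          OP_PUSH, 4, OP_MUL, OP_PRINT, OP_HALT };
        run(program);   /* prints 20 */
        return 0;
    }

A real scripting VM adds jumps, calls, a constant pool, and error checking, but this dispatch loop is the basic shape of the thing you end up writing.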

Big List of Resources:

Legend:

  • ¶ Link to a PDF file
  • $ Link to a printed book

This is a pretty vague question, I think, just because of the depth of the topic involved. A compiler can, however, be decomposed into two separate parts: a top half and a bottom half. The top half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform-specific code generation.

Nonetheless, one easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process just by building the top half.

Just doing the top half lets you get the experience of writing the lexical analyzer and the parser and going on to generate some "code" (that intermediate representation I mentioned). So it will take your source program, convert it to another representation, and do some optimization (if you want), which is the heart of a compiler. The bottom half will then take that intermediate representation and generate the bytes needed to run the program on a specific architecture. For example, the bottom half will take your intermediate representation and generate a PE executable.
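As a rough illustration of what that intermediate representation might look like, here is a sketch in C of a three-address-code style IR. The struct layout and opcode names are made up; real compilers differ widely, but this is roughly the shape of what a top half emits and a bottom half consumes.

    #include <stdio.h>

    /* A made-up three-address-code IR: each instruction has an operator,
       up to two sources, and a destination.  Operands are just numbered
       temporaries here; a real IR also carries types, constants, labels. */
    typedef enum { IR_LOADI, IR_ADD, IR_MUL } IrOp;

    typedef struct {
        IrOp op;
        int dst;   /* destination temporary, e.g. t0          */
        int src1;  /* first source (an immediate for IR_LOADI) */
        int src2;  /* second source (unused for IR_LOADI)      */
    } IrInstr;

    static void print_instr(const IrInstr *i) {
        switch (i->op) {
        case IR_LOADI: printf("t%d = %d\n", i->dst, i->src1);                 break;
        case IR_ADD:   printf("t%d = t%d + t%d\n", i->dst, i->src1, i->src2); break;
        case IR_MUL:   printf("t%d = t%d * t%d\n", i->dst, i->src1, i->src2); break;
        }
    }

    int main(void) {
        /* What a top half might emit for the source expression (2 + 3) * 4. */
        IrInstr code[] = {
            { IR_LOADI, 0, 2, 0 },   /* t0 = 2       */
            { IR_LOADI, 1, 3, 0 },   /* t1 = 3       */
            { IR_ADD,   2, 0, 1 },   /* t2 = t0 + t1 */
            { IR_LOADI, 3, 4, 0 },   /* t3 = 4       */
            { IR_MUL,   4, 2, 3 },   /* t4 = t2 * t3 */
        };
        for (unsigned k = 0; k < sizeof code / sizeof code[0]; k++)
            print_instr(&code[k]);
        return 0;
    }

The bottom half's job is then, in principle, mechanical: map the temporaries onto registers and each IR opcode onto one or more machine instructions for the target.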

One book on this topic that I found particularly helpful was Compilers: Principles, Techniques, and Tools (better known as the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers context-free grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And, unsurprisingly enough, the book called "lex & yacc" picked up where the Dragon Book left off for this part.

The quickest approach is through two books:

  • 1990 version of An Introduction to Compiling Techniques: A First Course Using ANSI C, LeX, and YaCC by J. P. Bennett - a perfect balance of example code, parsing theory, and design; it contains a complete compiler written in C, lex, and yacc for a simple grammar

  • The Dragon Book (older version) - mostly a detailed reference for the features not covered in the former book

How do you evaluate publications? I'm currently searching for a CS research topic and reading various papers. My dilemma when reading a paper is usually: is it really worthwhile to continue research on this topic?

What are the indicators of research impact?

By the way, I'm currently interested in liveness analysis. What do you think of it?

The highest impact papers are those that are cited most. Citeseer and the ACM will show you how often a paper is cited. Really influential papers are cited long after they cease to actually be useful. Everyone cites papers they haven't read because they are certain that the paper is the definitive reference.

The definitive way to know the good papers is to have looked at everything in the area, but the question really is where to start.

A good strategy I've found is to start in textbooks, as they will sometimes cite the most important work at the time they are written. Obviously, use a recent text. Liveness comes under compilers, so try Cooper/Torczon, Muchnick, or Appel. Look at the end of the chapters, where there are often mini-literature surveys. (I don't usually recommend the Dragon Book. I just checked it though, and there's nothing useful.)

Finally, look for others who have done the work for you. Look at the comments at the top of source files in gcc or LLVM. Look for survey papers. Look for papers whose content you already know that touch on the topic, and follow the citation trail.

Example: Let's take a quick example. I remember a few papers that use liveness. One is Sam Guyer's 2006 PLDI paper, "Free Me". And I did a bit of work on SSA recently, and people use liveness a lot with SSA. I don't remember a specific recent paper, but I expect that Briggs' semi-pruned SSA probably talks about liveness, so that's somewhere to go second.

So looking at Guyer's paper, I went to the bibliography, and there were maybe two papers that mentioned liveness:

  • M. Hirzel, A. Diwan, and J. Henkel. On the usefulness of type and liveness accuracy for garbage collection and leak detection. ACM TOPLAS.
  • H. Inoue, D. Stefanović, and S. Forrest. Object lifetime prediction in Java. Technical Report TR-CS-2003-28, University of New Mexico, May 2003.

TOPLAS is a quality journal, so I'd look there first. And so on...

I'm planning to write a simple interpreter (like the TI-BASIC language for the TI-89) or a compiler (a C compiler) using C++. I'm currently taking a course on programming languages and learning the basics of BNF and EBNF. I wonder whether that is enough to start on this project. In addition, does anyone know of some good books about this area? Any feedback would be greatly appreciated.

Thanks,

Everyone I know rants and raves about Modern Compiler Implementation in C, although the Java version usually gets more credit. However, if you want a more C++ focused book, you can't go wrong with Writing Compilers and Interpreters.

I want to build a lexer in C and I am following the Dragon Book. I can understand the state transitions, but how do I implement them?

Is there a better book?

The issue is that I have to run a string through a number of states so that I can tell whether the string is acceptable or not.

If you're looking for a more modern treatment than the Dragon Book(s): Andrew W. Appel and Maia Ginsburg, Modern Compiler Implementation in C, Cambridge University Press, 2008.

Chapter 2 is focused on Lexical Analysis: lexical tokens, regular expressions, finite automata, nondeterministic finite automata, and lexical analyzer generators.

Look at the Table of Contents
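On the original question of how to implement the state transitions: one common technique (not specific to either book) is a table-driven DFA, where the transition function is a two-dimensional array indexed by current state and input character class. Here is a minimal sketch in C that accepts identifiers and integer literals; the states and character classes are invented for the example.

    #include <ctype.h>
    #include <stdio.h>

    /* Character classes collapse the input alphabet to a few columns. */
    enum { C_LETTER, C_DIGIT, C_OTHER, NUM_CLASSES };
    /* States: start, inside an identifier, inside a number, stuck. */
    enum { S_START, S_IDENT, S_NUM, S_DEAD, NUM_STATES };

    static int char_class(int c) {
        if (isalpha(c) || c == '_') return C_LETTER;
        if (isdigit(c))             return C_DIGIT;
        return C_OTHER;
    }

    /* transition[state][class] = next state.  This table IS the automaton. */
    static const int transition[NUM_STATES][NUM_CLASSES] = {
        /* S_START */ { S_IDENT, S_NUM,   S_DEAD },
        /* S_IDENT */ { S_IDENT, S_IDENT, S_DEAD },
        /* S_NUM   */ { S_DEAD,  S_NUM,   S_DEAD },
        /* S_DEAD  */ { S_DEAD,  S_DEAD,  S_DEAD },
    };

    /* Accepting states tell us whether the whole string was a valid token. */
    static const int accepting[NUM_STATES] = { 0, 1, 1, 0 };

    static int matches(const char *s) {
        int state = S_START;
        for (; *s; s++)
            state = transition[state][char_class((unsigned char)*s)];
        return accepting[state];
    }

    int main(void) {
        printf("%d\n", matches("count1")); /* 1: identifier */
        printf("%d\n", matches("42"));     /* 1: number     */
        printf("%d\n", matches("4x"));     /* 0: rejected   */
        return 0;
    }

The transition diagrams in either book map directly onto a table like this; the alternative is to hand-code each state as a switch case or labeled block, which implements the same automaton.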


A compiler is a program which translates one language into another. Compiler construction is the process of creating a compiler.

The tag should be applied to questions concerning the programming of compilers or to questions about the detailed inner workings of compilers.

One language to another? I thought they made executables!

The notional, garden-variety compiler does exactly that: it translates a human-readable programming language (like Fortran, C++, or Java) into a machine-executable format. Or not.

In fact, many real-world compilers translate a high-level language into assembly code, which is subsequently assembled by a separate program. The standard Java compiler translates Java code into JVM bytecode, which must be run by a dedicated program (the Java runtime), and that program may include a just-in-time (JIT) compiler that translates the bytecode into native machine instructions on the fly. The earliest versions of C++ were translated to C by a compiler called Cfront. And so on.
