LISP in Small Pieces

Christian Queinnec

This will become the new standard reference for people wanting to know about the Lisp family of languages.

Mentioned in questions and answers.

Preferred languages: C/C++, Java, and Ruby.

I am looking for some helpful books/tutorials on how to write your own compiler simply for educational purposes. I am most familiar with C/C++, Java, and Ruby, so I prefer resources that involve one of those three, but any good resource is acceptable.

The Dragon Book is definitely the "building compilers" book, but if your language isn't quite as complicated as the current generation of languages, you may want to look at the Interpreter pattern from Design Patterns.

The example in the book designs a regular expression-like language and is well thought through, but as they say in the book, it's good for thinking through the process but is effective really only on small languages. However, it is much faster to write an Interpreter for a small language with this pattern than having to learn about all the different types of parsers, yacc and lex, et cetera...
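The pattern boils down to one class per grammar construct, each with an interpret method. Here is a minimal sketch for a tiny regular-expression-like language similar in spirit to the book's example (the class and method names are illustrative, not taken from Design Patterns):

```python
# Interpreter pattern sketch: each grammar construct is a class with a
# matches() method; a composed expression tree interprets input directly.
class Expr:
    def matches(self, text):
        raise NotImplementedError

class Literal(Expr):                     # matches an exact string
    def __init__(self, s):
        self.s = s
    def matches(self, text):
        return text == self.s

class Alternation(Expr):                 # "a|b"
    def __init__(self, left, right):
        self.left, self.right = left, right
    def matches(self, text):
        return self.left.matches(text) or self.right.matches(text)

class Sequence(Expr):                    # "ab"
    def __init__(self, parts):
        self.parts = parts
    def matches(self, text):
        # Try every split point; fine for a toy language, slow in general.
        if not self.parts:
            return text == ""
        head, rest = self.parts[0], Sequence(self.parts[1:])
        return any(head.matches(text[:i]) and rest.matches(text[i:])
                   for i in range(len(text) + 1))

# ("cat" | "dog") "s"
expr = Sequence([Alternation(Literal("cat"), Literal("dog")), Literal("s")])
print(expr.matches("dogs"))   # True
print(expr.matches("cows"))   # False
```

No lexer, no parser generator: the "program" is built directly as an object tree, which is exactly why this approach stays practical only for small languages.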

I think Modern Compiler Implementation in ML is the best introductory compiler writing text. There's a Java version and a C version too, either of which might be more accessible given your languages background. The book packs a lot of useful basic material (scanning and parsing, semantic analysis, activation records, instruction selection, RISC and x86 native code generation) and various "advanced" topics (compiling OO and functional languages, polymorphism, garbage collection, optimization and single static assignment form) into relatively little space (~500 pages).

I prefer Modern Compiler Implementation to the Dragon book because Modern Compiler implementation surveys less of the field--instead it has really solid coverage of all the topics you would need to write a serious, decent compiler. After you work through this book you'll be ready to tackle research papers directly for more depth if you need it.

I must confess I have a serious soft spot for Niklaus Wirth's Compiler Construction. It is available online as a PDF. I find Wirth's programming aesthetic simply beautiful, though some people find his style too minimal (for example, Wirth favors recursive descent parsers while most CS courses focus on parser generator tools, and his language designs are fairly conservative). Compiler Construction is a very succinct distillation of Wirth's basic ideas, so whether you like his style or not, I highly recommend reading this book.

I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.

If you want a book that is lighter on theory, Game Scripting Mastery might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.

It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.
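The "compile to a bytecode that runs on a VM you also write" approach is easy to see in miniature. Here is a toy stack-based VM sketch (the opcode names are made up for illustration, not taken from Game Scripting Mastery):

```python
# Toy stack-based bytecode VM: the "compiler" emits opcodes, and this loop
# executes them against an operand stack.
PUSH, ADD, MUL, PRINT = "PUSH", "ADD", "MUL", "PRINT"

def run(program):
    stack = []
    for op, *args in program:
        if op == PUSH:
            stack.append(args[0])
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == PRINT:
            print(stack[-1])
    return stack

# Bytecode a compiler might emit for (2 + 3) * 4
bytecode = [(PUSH, 2), (PUSH, 3), (ADD,), (PUSH, 4), (MUL,)]
print(run(bytecode))  # [20]
```

Real VMs add a program counter, jumps, call frames and so on, but the dispatch-loop shape stays the same.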

Big List of Resources:

Legend:

  • ¶ Link to a PDF file
  • $ Link to a printed book

This is a pretty vague question, I think, just because of the depth of the topic involved. A compiler can be decomposed into two separate parts, however: a top half and a bottom half. The top half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform-specific code generation.

Nonetheless, one idea for an easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process by just building the top-half.

Just doing the top half lets you get the experience of writing the lexical analyzer and the parser, then generating some "code" (that intermediate representation I mentioned). So it will take your source program, convert it to another representation, and do some optimization (if you want), which is the heart of a compiler. The bottom half then takes that intermediate representation and generates the bytes needed to run the program on a specific architecture. For example, the bottom half would take your intermediate representation and generate a PE executable.
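A minimal sketch of that top half, assuming a tiny arithmetic language (tokenizer, recursive-descent parser, and a flat three-address-style IR; all names here are illustrative):

```python
import re

# Compiler "top half" sketch: tokenize source, parse to an AST, then lower
# the AST to a flat intermediate representation.
def tokenize(src):
    return re.findall(r"\d+|[+*()]", src)

def parse(tokens):
    # Recursive-descent parser producing nested tuples as the AST.
    def expr(i):                                  # expr := term ('+' term)*
        node, i = term(i)
        while i < len(tokens) and tokens[i] == "+":
            rhs, i = term(i + 1)
            node = ("+", node, rhs)
        return node, i
    def term(i):                                  # term := atom ('*' atom)*
        node, i = atom(i)
        while i < len(tokens) and tokens[i] == "*":
            rhs, i = atom(i + 1)
            node = ("*", node, rhs)
        return node, i
    def atom(i):                                  # atom := number | '(' expr ')'
        if tokens[i] == "(":
            node, i = expr(i + 1)
            return node, i + 1                    # skip ")"
        return ("num", int(tokens[i])), i + 1
    return expr(0)[0]

def lower(ast):
    # Emit three-address-style IR; each operation gets a fresh temporary.
    code, counter = [], [0]
    def walk(node):
        if node[0] == "num":
            return str(node[1])
        lhs, rhs = walk(node[1]), walk(node[2])
        counter[0] += 1
        tmp = "t%d" % counter[0]
        code.append("%s = %s %s %s" % (tmp, lhs, node[0], rhs))
        return tmp
    walk(ast)
    return code

print(lower(parse(tokenize("1 + 2 * 3"))))
# ['t1 = 2 * 3', 't2 = 1 + t1']
```

A bottom half would then map each IR line onto instructions for a concrete target.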

One book on this topic that I found particularly helpful was Compilers: Principles, Techniques, and Tools (or the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers context-free grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And interestingly enough, the book called "lex and yacc" picks up where the Dragon Book leaves off for this part.

The quickest approach is through two books:

1990 version of An Introduction to Compiling Techniques: A First Course using ANSI C, LeX, and YaCC by JP Bennett - a perfect balance of example code, parsing theory and design; it contains a complete compiler written in C, lex and yacc for a simple grammar

Dragon Book (older version) - mostly a detailed reference for the features not covered in the former book

I have created a compiler in C (using lex & bison) for a dynamically typed programming language that supports loops, function declarations inside functions, recursive calls, etc. I also created a virtual machine that runs the intermediate code created by the compiler.

I am now thinking of compiling to Java bytecode instead of my own intermediate code.

I saw that the question about creating a JVM language has already been asked but I don't find the answer very informative.

So here are my questions:

  1. I guess to create a language for the JVM a must is to read the JVM specification book; what other books can you suggest (except the Dragon Book, of course)? I'm mostly interested in books or tutorials on how to create a JVM language, not a compiler in general.
  2. There are many Java libraries to read, write and change .class files like jclasslib, bcel, gnu bytecode, etc. Which one would you suggest? Also, are you aware of C libraries that do the same job?
  3. I was thinking of having a look at another language that targets the JVM, like Clojure, Jython or JRuby. But all these languages are very high level and complicated (to create a compiler for). I was looking for a simpler (I don't mind if it's unknown or unused) programming language that targets the JVM and whose compiler is open source. Any ideas?

Last weekend, I was asking myself the same question to port my toy language to the JVM.

I spent only a few hours searching for information, so take these references with a grain of salt.

  • Language Implementation Patterns. I hate ANTLR, but this book looks very good. If you don't like ANTLR either, there is also a very good book about parsing: "Parsing Techniques: A Practical Guide."

    Learn to build configuration file readers, data readers, model-driven code generators, source-to-source translators, source analyzers, and interpreters. You don’t need a background in computer science—ANTLR creator Terence Parr demystifies language implementation by breaking it down into the most common design patterns. Pattern by pattern, you’ll learn the key skills you need to implement your own computer languages.

    Chapter 10 covers these topics in 30 pages (too fast, IMO), but there are other chapters you will probably be interested in.

    • 10 Building Bytecode Interpreters
      • 10.1 Programming Bytecode Interpreters
      • 10.2 Defining an Assembly Language Syntax
      • 10.3 Bytecode Machine Architecture
      • 10.4 Where to Go from Here
      • P.26. Bytecode Assembler
      • P.27. Stack-Based Bytecode Interpreter
      • P.28. Register-Based Bytecode Interpreter
      http://pragprog.com/titles/tpdsl/language-implementation-patterns
    • The Implementation of Lua 5.0. This is a great paper about register-based bytecode machines. Go and read it, even just for the sake of it.

    • Lisp in Small Pieces. This book teaches how to write two Scheme compilers that compile to C, so many lessons can be learned from it. I own a copy of this book and it is really good for anyone interested in Lisp, but maybe not your cup of tea.

      This is a comprehensive account of the semantics and the implementation of the whole Lisp family of languages, namely Lisp, Scheme and related dialects. It describes 11 interpreters and 2 compilers ...

    http://www.amazon.com/Lisp-Small-Pieces-Christian-Queinnec/dp/0521562473

Check the Dalvik VM, a register-based VM. The DVM operates on bytecodes that are transformed from the Java class files compiled by a Java compiler.

There is a mailing list about the topic, jvm-languages.

Are you planning to upload the code to anyplace? I would like to take a look.

I have decided to write a small interpreter as my next project, in Ruby. What knowledge/skills will I need to have to be successful?
I haven't decided on the language to interpret yet, but I am looking for something that is not a toy language, but would be relatively easy to write an interpreter for. Thanks in advance.

This SICP chapter shows how to write a Lisp interpreter in Lisp (a metacircular evaluator). In my opinion this is the best place to start. Then you can move on to Lisp in Small Pieces to learn how to write advanced interpreters and compilers for Lisp. The advantage of implementing a language like Lisp (in Lisp itself!) is that you get the lexical analyzer, parser, AST, data/program representation and REPL for free. You can concentrate on the task of getting your great language working!
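The shape of such an evaluator is small enough to sketch. This is not the SICP code itself, just a minimal Python sketch in the same spirit: a reader turns text into nested lists, and eval walks them (only integers, if, lambda, and a few arithmetic primitives are supported here):

```python
import operator as op

# Minimal Lisp-flavored evaluator sketch: parse s-expressions into nested
# Python lists, then evaluate them recursively against an environment.
def parse(src):
    tokens = src.replace("(", " ( ").replace(")", " ) ").split()
    def read(tokens):
        t = tokens.pop(0)
        if t == "(":
            lst = []
            while tokens[0] != ")":
                lst.append(read(tokens))
            tokens.pop(0)                  # drop ")"
            return lst
        try:
            return int(t)
        except ValueError:
            return t                       # a symbol
    return read(tokens)

GLOBAL = {"+": op.add, "-": op.sub, "*": op.mul}

def evaluate(x, env=GLOBAL):
    if isinstance(x, str):                 # variable reference
        return env[x]
    if not isinstance(x, list):            # literal number
        return x
    if x[0] == "if":                       # (if test then else)
        _, test, conseq, alt = x
        return evaluate(conseq if evaluate(test, env) else alt, env)
    if x[0] == "lambda":                   # (lambda (params) body)
        _, params, body = x
        return lambda *args: evaluate(body, {**env, **dict(zip(params, args))})
    f = evaluate(x[0], env)                # application
    return f(*[evaluate(a, env) for a in x[1:]])

print(evaluate(parse("((lambda (x) (* x x)) 7)")))  # 49
```

The reader here is what Lisp gives you "for free" via read; everything else is the evaluator proper.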

I've written a small Scheme interpreter in an unholy mix of C/C++, but I have yet to implement proper tail calls.

I am aware of the classic Cheney on the MTA algorithm, but are there other nice ways of implementing this? I know I could put the Scheme stack on the heap, but that would still not be proper elimination, as the standard says one should support an unlimited number of active tail calls.

I've also fiddled with longjmps, but so far I think it'll only work well for non-mutual recursive tail calls.

How do the major C-based Schemes implement proper tail recursion?

If you are interested in implementation techniques of interpreters, there is no way around the book "LiSP - Lisp in Small Pieces" by Christian Queinnec. It explains all aspects of implementing a Scheme system very thoroughly with complete code. It is a wonderful book.

http://www.amazon.com/exec/obidos/ASIN/0521562473/qid=945541473/sr=1-2/002-2995245-1849825

But don't forget to check out the papers on ReadScheme.org.

The section

Compiler Technology/Implementation Techniques and Optimization http://library.readscheme.org/page8.html

has quite a few papers on tail call optimization.

Among others you will find a link to Dybvig's thesis (a classic), which is very well written. It explains and motivates everything in a very clear manner.
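Besides Cheney on the MTA, one simple and portable technique (a standard trick, not something from Dybvig's thesis specifically) is a trampoline: functions in tail position return a thunk instead of calling directly, and a driver loop invokes thunks until a non-callable value appears, so the host stack never grows. A Python sketch:

```python
# Trampoline sketch for proper tail calls: tail calls return thunks, and the
# driver loop runs them iteratively, so even mutual recursion uses O(1) stack.
def trampoline(thunk):
    while callable(thunk):
        thunk = thunk()
    return thunk

# Mutually recursive even/odd written in trampolined style.
# (Works here because the final results, booleans, are not callable.)
def even(n):
    return True if n == 0 else (lambda: odd(n - 1))

def odd(n):
    return False if n == 0 else (lambda: even(n - 1))

print(trampoline(even(100001)))  # False, with no stack overflow
```

The cost is a closure allocation per tail call, which is why production Schemes prefer techniques like Cheney on the MTA or direct stack manipulation.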

McCarthy's Elementary S-functions and predicates were atom, eq, car, cdr, cons

He then went on to add to his basic notation, to enable writing what he called S-functions: quote, cond, lambda, label

On that basis, we'll call these "the LISP primitives" (although I'm open to an argument about type predicates like numberp)

How would you define the defmacro function using only these primitives in the LISP of your choice? (including Scheme and Clojure)

Explaining it fully in all its details would require an awful lot of space and time for an answer here, but the outline is really pretty simple. Every LISP eventually has in its core something like the READ-EVAL-PRINT loop, which is to say something that takes a list, element by element, interprets it, and changes state -- either in memory or by printing a result.

The read part looks at each element read and does something with it:

(cond ((atom elem)(lambda ...))
      ((function-p elem) (lambda ...)))

To interpret macros, you simply (?) need to implement a function that puts the template text of the macro somewhere in storage, a predicate for that repl loop -- which means simply defining a function -- that says "Oh, this is a macro!", and then copy that template text back into the reader so it's interpreted.

If you really want to see the hairy details, read Structure and Interpretation of Computer Programs or read Queinnec's Lisp in Small PIeces.
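The outline above can be made concrete with a small sketch: store a template keyed by the macro's name, and have the reader/evaluator expand any form whose head names a macro before interpreting it. The representation and names here are illustrative, and real defmacro also handles hygiene and arbitrary code in the expander, which this ignores:

```python
# Macro sketch: defmacro records a template; expand() rewrites macro calls
# into ordinary forms by substituting arguments into the template.
MACROS = {}

def defmacro(name, params, template):
    MACROS[name] = (params, template)

def substitute(template, bindings):
    if isinstance(template, list):
        return [substitute(t, bindings) for t in template]
    return bindings.get(template, template)

def expand(form):
    if isinstance(form, list) and form and form[0] in MACROS:
        params, template = MACROS[form[0]]
        bindings = dict(zip(params, form[1:]))
        return expand(substitute(template, bindings))   # expand recursively
    if isinstance(form, list):
        return [expand(f) for f in form]
    return form

# (unless test body) => (if test nil body)
defmacro("unless", ["test", "body"], ["if", "test", "nil", "body"])
print(expand(["unless", ["null?", "x"], ["print", "x"]]))
# ['if', ['null?', 'x'], 'nil', ['print', 'x']]
```

The essential point matches the answer above: a macro is template text plus a predicate in the loop that says "this is a macro", and expansion feeds the substituted text back through interpretation.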

Many (maybe all?) programming languages are ultimately implemented in assembly language.

How is Lisp implemented in assembly language?

Is there any good reference, manual, tutorial, or keyword for Google?

Are there any official rules/conventions for building your own Lisp implementation, such as rules that tail recursion must follow?

Thanks

Although the other comments and posts are right, this question is overly vague and maybe a bit confused, I can't help but share some recommendations. I've collected a number of links and books on implementing Lisp as I have recently developed a bit of a fascination with language implementation. It's a deep topic, of course, but reading the stuff related to Lisp is especially compelling because you can skip a lot of the intense reading on parsing if you implement a Lisp compiler or interpreter in Lisp, and just use read. This allows the authors to get to the meat of compilation or interpretation quickly. These recommendations are books I've read or started or am reading, and mostly deal with Scheme, not Common Lisp, but still might be of some interest.

If you've got no background in language implementation, and have never had the pleasure of reading about the classic Lisp and Scheme "metacircular evaluators", I would recommend Structure and Interpretation of Computer Programs. If you've seen Lisp-in-Lisp (or Scheme-in-Scheme...) you can skip ahead. In the last two chapters of SICP, the authors present a few different interpreters for Lisp/Scheme and a few variants, as well as a byte-code compiler and a virtual machine. It's simply a brilliant book, and free.

If you don't have time to read SICP, or don't want to slog through it just to get to the interpretation and compilation chapters, I would recommend The Little Schemer. Even though it's very short and intended for newcomers to Lisp and Scheme, if you've never seen a Lisp interpreter written in Lisp, they do present one, and it's quite a delightful book, but might not be for everyone due to the cute style.

There's another free book on Scheme similar to SICP, called An Introduction to Scheme and its Implementation, which I haven't read but used as a reference for a few bits. There are sections on interpreters and compilers in there, and it seems to go a bit deeper than SICP, dealing with hairier things like parsing too. It maybe needed an editor, but it's an impressive offering nonetheless.

With a decent idea of how to do Lisp in Lisp, you can approach implementing Lisp in something lower level.

Lisp in Small Pieces is frequently recommended. I've read most of it, and can say it's definitely a great book, full of nitty gritty stuff. I'm going back over it with a fine-tooth comb, because it's easy to skim when you don't understand stuff. I also struggled with getting the code from the author's site to run; if you get it, I recommend using Gambit Scheme and running the code that relies on Meroonet with Meroon, from this distribution. Lisp in Small Pieces presents a number of interpreters written in Scheme as well as a byte-code compiler and a compiler-to-C.

Lisp in Small Pieces moves fast, and it's quite dense. If it's too much for you, perhaps start with Essentials of Programming Languages. I've read some of it and it's quite good, though it's mostly about interpreters. Apparently one of the older editions (the 1st? I'm not sure...) included a compiler. There seems to be a lot of change between the three editions, but the first is super cheap on Amazon, so check it out.

As for compilation to C, this is kind of a gross subject with lots of hairy bits. Compilation to C brings up all these weird corner issues, like how to optimize tail-calls and handle closures, first-class continuations and garbage collection, but it's pretty interesting, and a lot of "real" implementations of Scheme go this route. Marc Feeley's presentation on this is pretty interesting, titled The 90 Minute Scheme to C compiler.

I have fewer resources on compiling all the way down to assembly, but there is an often-recommended paper that introduces compilation of Scheme to x86, called An Incremental Approach to Compiler Construction. It assumes little of the reader; however, I found that it simply goes too fast and doesn't fill in enough details. Maybe you'll have better luck.

Many of the above recommendations come from this monster comment on Hacker News from over a year ago, from mahmud. It references a number of ML resources, and compilation using continuations. I haven't gotten that far in my study, so I can't say what's good or not. But it's an incredibly valuable comment. The referenced works include Andrew Appel's "Compiling with Continuations" and Paul Wilson's "Uniprocessor Garbage Collection Techniques" paper.

Good luck!

That's a big question to answer well.

Short answer: JIT.

Big answer: Dragon book.

I'd like to study the official Clojure implementation. Can anyone who is familiar with the code recommend where to start reading it from? Are there certain parts which will make understanding the rest of it easier, or which are key to understanding how Clojure works?

There are some big ideas in there - that may not be apparent until you're familiar with implementing a LISP.

So even before you look at the Clojure code - you might want to look at the code for a basic LISP implementation (there are heaps online - this is one of my favourites).

Next I'd read a book like Christian Queinnec's Lisp in Small Pieces (LiSP), which is a book about implementing Lisp compilers, to get the paradigms.

In terms of actually starting in the Clojure source - I'd start with eval - here.

I have to do a term project in my symbolic programming class, but I'm not really sure what a good/legitimate project would be. Can anyone give me examples of symbolic programming? Just any generic ideas, because right now I'm leaning towards a turn-based fight game (JRPG style, basically), but I really don't want to do a game.

A simple arithmetic interpreter in Scheme that takes a list of symbols as input:

(define (symbolic-arith lst)
  (case (car lst)
    ((add) (+ (cadr lst) (caddr lst)))
    ((sub) (- (cadr lst) (caddr lst)))
    ((mult) (* (cadr lst) (caddr lst)))
    ((div) (/ (cadr lst) (caddr lst)))
    (else (error "unknown operator"))))

Test run:

> (symbolic-arith '(add 2 43))
=> 45
> (symbolic-arith '(sub 10 43))
=> -33
> (symbolic-arith '(mu 50 43))
unknown operator
> (symbolic-arith '(mult 50 43))
=> 2150

That shows the basic idea behind meta-circular interpreters. Lisp in Small Pieces is the best place to learn about such interpreters.
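The symbolic-arith interpreter above only handles flat three-element lists. The key step toward a real interpreter is making the evaluator call itself on sub-expressions so nesting works; here is that extension as a Python sketch (a translation of the idea, not the Scheme code above):

```python
# Recursive symbolic arithmetic evaluator: sub-expressions are evaluated by
# the same function, so expressions nest to any depth.
OPS = {"add":  lambda a, b: a + b,
       "sub":  lambda a, b: a - b,
       "mult": lambda a, b: a * b,
       "div":  lambda a, b: a / b}

def symbolic_arith(expr):
    if isinstance(expr, (int, float)):      # a number evaluates to itself
        return expr
    op, lhs, rhs = expr
    if op not in OPS:
        raise ValueError("unknown operator: " + str(op))
    return OPS[op](symbolic_arith(lhs), symbolic_arith(rhs))

print(symbolic_arith(["add", 2, ["mult", 5, 43]]))  # 217
```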

Another simple example where lists of symbols are used to implement a key-value data store:

> (define person (list (cons 'name 'mat) (cons 'age 20)))
> person
=> ((name . mat) (age . 20))    
> (assoc 'name person)
=> (name . mat)
> (assoc 'age person)
=> (age . 20)

If you are new to Lisp, Common Lisp: A Gentle Introduction to Symbolic Computation is a good place to start.

I know there is one, but it's not easy to implement the way I want.

I would like to know the steps to interpret the lisp language and what functions are essential to be implemented.

First, you learn Lisp, then read LiSP and (given you know ActionScript well enough) you just start. PAIP also has sections about implementing Lisp interpreters and compilers.

To get an idea how it generally can be approached, you can have a look at Write Yourself a Scheme in 48 hours. It uses Haskell as an implementation language, but it would give you an idea.

It surely wouldn't be trivial, but it has been done quite often, and there is much to learn from doing it.

If you want to implement a basic lisp in a higher level language, you might get some mileage out of the later chapters of The Little Schemer (where you're shown how to write a meta-circular evaluator in Scheme), the entirety of WYAS48 (where you're shown how to implement an R5RS Scheme in Haskell) and these two Norvig articles (wherein he implements a basic lisp-like in Python).

I'm currently working on a project for my class. I'm building a compiler with Flex (lex), Bison (YACC) and C. I have done just a little bit of semantic and syntax analysis, but I have been thinking about how I'm going to implement the object-oriented part; that is, how I can handle classes, overloading, polymorphism and inheritance.

I can't seem to find anything useful on Google, and the Dragon Book is just too low-level; I mean, too focused on building a compiler from scratch. So I was hoping that someone could point me to a good book, tutorial or example - something that can help me clear my doubts.

Thanks in advance for the help, and I'm sorry if anyone thinks this is asking to have my homework done.

I agree with the first comment that this question is far too broad to be answered. But I'll try anyway.

There are several aspects to your question:

  1. What are the semantics of the commonly used concepts of object-oriented programming?
  2. How can they be implemented in a compiler?
  3. How are they usually implemented in other compilers?
  4. What are good resources for further studies?

Semantics

Varies widely between languages, and there is also quite a bit of confusion/controversy about what OOP actually means (a nice presentation on that topic: http://www.infoq.com/presentations/It-Is-Possible-to-Do-OOP-in-Java which also has some examples of implementing OOP features). Just pick one model and look up a reference that defines its semantics, such as a language specification or a scientific paper on the model.

Javascript probably is the easiest model to implement as it very directly maps to the implementation without much of a necessary surrounding framework in the compiler. A static version of the Java model (compile time class compilation instead of runtime classloading) shouldn't be too hard either. More sophisticated models would be C++ (which allows multiple inheritance) and Smalltalk or Common Lisp/CLOS (with Meta-object-protocols).
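To see why the JavaScript model maps so directly onto an implementation: an object is just a property map plus a link to a prototype, and lookup walks that chain. A Python sketch of the mechanism (real engines add hidden classes and inline caches on top of this):

```python
# Prototype-based object model sketch: get() walks the prototype chain,
# set() always writes to the receiver's own slots.
class JSObject:
    def __init__(self, proto=None):
        self.slots = {}
        self.proto = proto

    def get(self, name):
        obj = self
        while obj is not None:
            if name in obj.slots:
                return obj.slots[name]
            obj = obj.proto            # fall back to the prototype
        raise KeyError(name)

    def set(self, name, value):
        self.slots[name] = value       # never touches the prototype

animal = JSObject()
animal.set("speak", "...")
dog = JSObject(proto=animal)
dog.set("speak", "woof")
cat = JSObject(proto=animal)
print(dog.get("speak"))   # woof  (own slot shadows the prototype's)
print(cat.get("speak"))   # ...   (inherited from the prototype)
```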

Possible Implementations

Again a wide range of choices. As the semantics are fixed and mostly rather straightforward, the implementation effort depends most strongly on the performance you want to achieve and the existing infrastructure of your compiler. Storing everything in lists and scanning them for the first entry that satisfies the rules is probably the easiest implementation.

Usual Implementation

Most programming languages in the Java/C#/C++ area do static compile-time name/signature lookups to find the definitions of the things referred to, and use a virtual method table (http://en.wikipedia.org/wiki/Virtual_method_table) to resolve polymorphic calls. They also use the vtable pointer for instanceof checks and for checking down-casts.
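The vtable mechanism itself is small. In a real compiler it is an array of function pointers indexed by a compile-time slot number; this Python sketch uses dicts for readability, but the dispatch shape is the same:

```python
# Vtable dispatch sketch: each class has one shared table of methods, each
# instance carries a pointer to its class's table, and a polymorphic call is
# a lookup in that table followed by an indirect call.
base_vtable = {"describe": lambda self: "a shape"}
circle_vtable = {**base_vtable,                       # inherit base methods
                 "describe": lambda self: "a circle of radius %s" % self["r"]}

def make_instance(vtable, **fields):
    return {"__vtable__": vtable, **fields}

def call(obj, method, *args):
    return obj["__vtable__"][method](obj, *args)      # dynamic dispatch

shapes = [make_instance(base_vtable), make_instance(circle_vtable, r=2)]
for s in shapes:
    print(call(s, "describe"))
# a shape
# a circle of radius 2
```

Comparing `obj["__vtable__"]` against a known table is also how this scheme supports instanceof-style checks.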

Resources

While only 30 pages are directly concerned with objects, I still think Lisp in Small Pieces (LiSP) is a great book for learning to work at that level within a compiler. It focuses on implementing language features, trade-offs in implementations, and fitting the pieces together. (if (you can get over the syntax used) (it's great)).

A compiler is a program which translates one language into another. Compiler construction is the process of creating a compiler.

The tag should be applied to questions concerning the programming of compilers or for questions about the detailed inner workings of compilers.

One language to another? I thought they made executables!

The notional, garden-variety compiler does exactly that: it translates a human-readable programming language (like Fortran or C++ or Java) into a machine-executable format. Or not.

In fact, many real-world compilers translate a high-level language into assembly code, which is subsequently assembled by a separate program. The standard Java compiler translates Java code into JVM bytecode, which must be run by a dedicated program (the Java execution environment) that may include a Just-In-Time (JIT) compiler translating the bytecode into chip-native machine instructions on the fly. The earliest versions of the language that became C++ were compiled by a program called cfront, which translated them to C. And so on.
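You can watch the compile-to-bytecode stage in CPython itself using the standard library's dis module, which disassembles the bytecode the compiler produced for a function:

```python
import dis

# CPython follows the "compile to bytecode for a VM" pipeline: source is
# compiled to bytecode objects, which the interpreter loop then executes.
def add(a, b):
    return a + b

# Exact opcode names vary by Python version (e.g. BINARY_ADD vs BINARY_OP).
dis.dis(add)
```

Running this prints the instruction stream (loads of the arguments, an add, a return), making the "one language translated to another" definition concrete.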
