I am looking for some helpful books/tutorials on how to write your own compiler simply for educational purposes. I am most familiar with C/C++, Java, and Ruby, so I prefer resources that involve one of those three, but any good resource is acceptable.

The Dragon Book is definitely the "building compilers" book, but if your language isn't quite as complicated as the current generation of languages, you may want to look at the Interpreter pattern from Design Patterns.

The example in the book designs a regular expression-like language and is well thought through, but as the authors say, the pattern is good for thinking through the process and really only effective on small languages. However, it is much faster to write an Interpreter for a small language with this pattern than to learn about all the different types of parsers, yacc and lex, et cetera...
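To give a sense of how little code the pattern needs, here is a minimal Python sketch for a tiny arithmetic language (not the book's regular-expression example). The class names Num, Var, Add and Mul and the interpret method are illustrative assumptions, not taken from Design Patterns.

```python
from dataclasses import dataclass


class Expr:
    """Base class: every node in the tree knows how to interpret itself."""

    def interpret(self, env: dict) -> float:
        raise NotImplementedError


@dataclass
class Num(Expr):
    value: float

    def interpret(self, env):
        return self.value


@dataclass
class Var(Expr):
    name: str

    def interpret(self, env):
        return env[self.name]  # look the variable up in the context


@dataclass
class Add(Expr):
    left: Expr
    right: Expr

    def interpret(self, env):
        return self.left.interpret(env) + self.right.interpret(env)


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr

    def interpret(self, env):
        return self.left.interpret(env) * self.right.interpret(env)


# (x + 2) * 3, built by hand; the pattern itself does not cover parsing.
tree = Mul(Add(Var("x"), Num(2)), Num(3))
result = tree.interpret({"x": 4})
print(result)  # 18
```

Each grammar rule becomes one class, which is why the approach stays pleasant for small languages and gets unwieldy for large ones.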

I think Modern Compiler Implementation in ML is the best introductory compiler writing text. There's a Java version and a C version too, either of which might be more accessible given your language background. The book packs a lot of useful basic material (scanning and parsing, semantic analysis, activation records, instruction selection, RISC and x86 native code generation) and various "advanced" topics (compiling OO and functional languages, polymorphism, garbage collection, optimization and static single assignment form) into relatively little space (~500 pages).

I prefer Modern Compiler Implementation to the Dragon Book because Modern Compiler Implementation surveys less of the field; instead it has really solid coverage of all the topics you would need to write a serious, decent compiler. After you work through this book you'll be ready to tackle research papers directly for more depth if you need it.

I must confess I have a serious soft spot for Niklaus Wirth's Compiler Construction. It is available online as a PDF. I find Wirth's programming aesthetic simply beautiful; however, some people find his style too minimal (for example, Wirth favors recursive descent parsers, but most CS courses focus on parser generator tools; Wirth's language designs are fairly conservative). Compiler Construction is a very succinct distillation of Wirth's basic ideas, so whether you like his style or not, I highly recommend reading this book.

I concur with the Dragon Book reference; IMO, it is the definitive guide to compiler construction. Get ready for some hardcore theory, though.

If you want a book that is lighter on theory, Game Scripting Mastery might be a better book for you. If you are a total newbie at compiler theory, it provides a gentler introduction. It doesn't cover more practical parsing methods (opting for non-predictive recursive descent without discussing LL or LR parsing), and as I recall, it doesn't even discuss any sort of optimization theory. Plus, instead of compiling to machine code, it compiles to a bytecode that is supposed to run on a VM that you also write.
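For readers unfamiliar with the idea of bytecode running on a VM you write yourself, here is a minimal Python sketch of a stack-based VM. The opcode names (PUSH, ADD, MUL, JUMP_IF_ZERO, PRINT, HALT) and the tuple encoding are assumptions made for illustration; they are not the instruction set used in Game Scripting Mastery.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a tiny stack machine."""
    stack = []
    output = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        opcode, arg = program[pc]
        pc += 1
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif opcode == "JUMP_IF_ZERO":
            if stack.pop() == 0:
                pc = arg  # arg is the index of the jump target
        elif opcode == "PRINT":
            output.append(stack.pop())
        elif opcode == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
    return output


# Hand-written bytecode for: print((2 + 3) * 4)
program = [
    ("PUSH", 2), ("PUSH", 3), ("ADD", None),
    ("PUSH", 4), ("MUL", None),
    ("PRINT", None), ("HALT", None),
]
print(run(program))  # [20]
```

A compiler targeting such a VM only has to emit these tuples, which is a much smaller job than emitting native machine code.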

It's still a decent read, particularly if you can pick it up for cheap on Amazon. If you only want an easy introduction into compilers, Game Scripting Mastery is not a bad way to go. If you want to go hardcore up front, then you should settle for nothing less than the Dragon Book.

This is a pretty vague question, I think, just because of the depth of the topic involved. A compiler can be decomposed into two separate parts, however: a top half and a bottom half. The top half generally takes the source language and converts it into an intermediate representation, and the bottom half takes care of the platform-specific code generation.

Nonetheless, one idea for an easy way to approach this topic (the one we used in my compilers class, at least) is to build the compiler in the two pieces described above. Specifically, you'll get a good idea of the entire process by just building the top-half.

Just doing the top half lets you get the experience of writing the lexical analyzer and the parser and then go on to generating some "code" (that intermediate representation I mentioned). So it will take your source program, convert it to another representation, and do some optimization (if you want), which is the heart of a compiler. The bottom half will then take that intermediate representation and generate the bytes needed to run the program on a specific architecture. For example, the bottom half will take your intermediate representation and generate a PE executable.
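As a rough illustration of what a "top half" looks like, the Python sketch below lexes an arithmetic expression, parses it by recursive descent, and emits a textual three-address intermediate representation. The grammar, the Parser class, and the IR format are all assumptions chosen for brevity; a class project would normally use a richer language and often lex/yacc-style tools.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(\d+)|([A-Za-z_]\w*)|(.))")


def tokenize(source):
    """Lexical analysis: turn the source text into (kind, text) pairs."""
    tokens = []
    for number, name, op in TOKEN_RE.findall(source):
        if number:
            tokens.append(("num", number))
        elif name:
            tokens.append(("id", name))
        elif op.strip():  # skip stray whitespace matched by '.'
            tokens.append(("op", op))
    return tokens


class Parser:
    """expr -> term (('+'|'-') term)*, term -> factor (('*'|'/') factor)*,
    factor -> num | id | '(' expr ')'."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0
        self.code = []   # emitted three-address instructions
        self.temps = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else ("eof", "")

    def advance(self):
        token = self.peek()
        self.pos += 1
        return token

    def binary(self, operand, operators):
        left = operand()
        while self.peek() in [("op", o) for o in operators]:
            op = self.advance()[1]
            right = operand()
            self.temps += 1
            temp = f"t{self.temps}"
            self.code.append(f"{temp} = {left} {op} {right}")
            left = temp
        return left

    def expr(self):
        return self.binary(self.term, "+-")

    def term(self):
        return self.binary(self.factor, "*/")

    def factor(self):
        kind, text = self.advance()
        if kind in ("num", "id"):
            return text
        if (kind, text) == ("op", "("):
            value = self.expr()
            if self.advance() != ("op", ")"):
                raise SyntaxError("expected ')'")
            return value
        raise SyntaxError(f"unexpected token {text!r}")


def compile_to_ir(source):
    parser = Parser(tokenize(source))
    result = parser.expr()
    if parser.peek()[0] != "eof":
        raise SyntaxError("trailing input")
    return parser.code + [f"result = {result}"]


for line in compile_to_ir("a + 2 * (b - 1)"):
    print(line)
```

A bottom half would then walk these instructions and pick registers and machine opcodes for a concrete target.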

One book on this topic that I found particularly helpful was Compilers: Principles, Techniques, and Tools (or the Dragon Book, due to the cute dragon on the cover). It's got some great theory and definitely covers Context-Free Grammars in a really accessible manner. Also, for building the lexical analyzer and parser, you'll probably use the *nix tools lex and yacc. And uninterestingly enough, the book called "lex and yacc" picked up where the Dragon Book left off for this part.

The quickest approach is through two books:

The 1990 version of Introduction to Compiling Techniques: A First Course Using ANSI C, LEX and YACC by J.P. Bennett - a perfect balance of example code, parsing theory and design; it contains a complete compiler written in C, lex and yacc for a simple grammar

Dragon Book (older version) - mostly a detailed reference for the features not covered in the former book

I have a C application and I want to include a Scripting Language to put certain functionality into scripts. I just have no experience with that and don't know exactly where to start (Still learning C and trying to understand the application).

How does embedding and communication between my app and the scripts actually work? I think I need the interpreter for the scripting language as a library (.dll on Windows or C Source Code that can be compiled into my application)? And then can I do something like

interpreter->run("myscript", some_object);

How would the script know about the properties of the object? Say my script wants to read or modify some_object->some_field?

Are there any scripting languages that are optimized for that sort of embedding? I know that there is Lua which is popular in game dev, and languages like Python, Perl, PHP or Ruby which seem to be more targeted as stand-alone applications, but my knowledge in the deep architecture does not allow more educated guesses :) (Tagged Lua and Python because they would be my favorites, but as long as it runs on x86 Windows, Linux and Mac OS X, I'm open for other scripting languages as long as they are easy to implement into a C application)

You might take a look at Game Scripting Mastery. As I am interested in the high-level aspect of computer games as well, this book has been recommended to me very often.

Unfortunately the book seems to be out of print (at least in Europe).

I’m starting a project where I need to implement a light-weight interpreter. The interpreter is used to execute simple scientific algorithms. The programming language that this interpreter will use should be simple, since it is targeting non-software developers (for example, mathematicians).

The interpreter should support basic programming languages features:

  • Real numbers, variables, multi-dimensional arrays
  • Binary (+, -, *, /, %) and Boolean (==, !=, <, >, <=, >=) operations
  • Loops (for, while), Conditional expressions (if)
  • Functions

MathWorks MATLAB is a good example of where I’m heading, just much simpler. The interpreter will be used as an environment to demonstrate algorithms: simple algorithms such as finding the average of a dataset/array, or slightly more complicated algorithms such as Gaussian elimination or RSA.

Best/Most practical resource I found on the subject is Ron Ayoub’s entry on Code Project (Parsing Algebraic Expressions Using the Interpreter Pattern) - a perfect example of a minified version of my problem.

The Purple Dragon Book seems to be too much, anything more practical?

The interpreter will be implemented as a .NET library, using C#. However, resources for any platform are welcome, since the design-architecture part of this problem is the most challenging.

Any practical resources?

(please avoid “this is not trivial” or “why re-invent the wheel” responses)

It might sound odd, but Game Scripting Mastery is a great resource for learning about parsing, compiling and interpreting code.

You should really check it out:

I need some resources for implementing a simple virtual machine and interpreted language. Something that is practical is most useful. I have read the Virtual Machine Implementation book and found that it is quite old and doesn't represent the VMs I see today. Also, if someone knows of a fairly simplistic language, that would be great as well.

You don't say if this is for a new project, to work with an existing project, for learning, or what target environment, language, and OS you're using.

If you want to learn about implementing your own VM and scripting language, get the book Game Scripting Mastery. Despite its title, it is actually about implementing your own virtual machine and scripting language. The source code is for Win32, but the concepts can be applied to .Net or Linux.

As a bonus, when you're done you will have a playable, scriptable, 2D adventure game.

A compiler is a program which translates one language into another. Compiler construction is the process of creating a compiler.

The tag should be applied to questions concerning the programming of compilers or for questions about the detailed inner workings of compilers.

One language to another? I thought they made executables!

The notional, garden-variety compiler does exactly that: it translates a human-readable computer programming language (like Fortran or C++ or Java) into a machine-executable format. Or not.

In fact, many real-world compilers translate a high-level language into assembly code, which is subsequently assembled by a separate program. The standard Java compiler translates Java code into JVM bytecode, which must be run by a dedicated program (the Java execution environment) that may include a Just In Time (JIT) compiler that translates the bytecode into chip-native machine instructions on the fly. The earliest compiler for the language that became C++ was called Cfront, and it compiled to C. And so on.
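The same split between a compiler and a separate execution environment can be observed in CPython, used here purely as an analogy to the Java case. The sketch below relies only on the built-in compile and eval functions and the standard dis module; the expression and variable names are made up for the example.

```python
import dis

source = "price * quantity + 1"

# compile() turns source text into a code object holding bytecode,
# not native machine instructions.
code = compile(source, "<example>", "eval")

print(type(code.co_code))  # the raw bytecode is a bytes object
dis.dis(code)              # readable listing; opcodes vary by Python version

# The bytecode only does something when the interpreter (the VM) runs it.
value = eval(code, {"price": 3, "quantity": 4})
print(value)
```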

Recently I have been extremely interested in language development. I've got multiple working front ends and have had various systems for executing the code. I've decided I would like to try to develop a virtual machine type system (kind of like the JVM, but much simpler of course). So I've managed to create a basic working instruction set with a stack and registers, but I'm just curious about how some things should be implemented.

In Java, for example, after you've written a program you compile it with the Java compiler and it creates a binary (.class) for the JVM to execute. I don't understand how this is done. How does the JVM interpret this binary? What's the transition from human-readable instructions to this binary? How could I create something similar?

Thanks for any help/suggestions!

Alright, I'll bite on this generic question.

Implementing a compiler/assembler/VM combo is a tall order, especially if you're doing it by yourself. That being said, if you keep your language specification simple enough, it is quite doable, even by yourself.

Basically, to create a binary, the following is done (this is a tad bit simplified):

1) Input source is read, lexed, and tokenized

2) The program logic is analyzed for semantic correctness.

E.g. while the following C++ would lex into valid tokens, it would be rejected during analysis:

float int* double = const (_identifier >><<) operator& * 

3) Build an Abstract Syntax Tree to represent the statements

4) Build symbol tables and resolve identifiers

5) Optional: Optimization of code

6) Generate code in an output format of your choice; for example binary opcodes/operands, string tables. Whatever format suits your needs best. Alternatively, you could create bytecode for an existing VM, or for a native CPU.
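The steps above can be sketched end to end for a toy language of assignments such as "x = 1 + 2;". This is a minimal Python sketch under stated assumptions: the grammar, the tuple-based AST, and the PUSH/LOAD/STORE/ADD/MUL instruction names are all hypothetical, the optimization step is omitted, and the semantic check runs after the AST is built, which is a choice of this sketch.

```python
import re


def lex(source):
    """Step 1: split the source into tokens (unknown characters are skipped)."""
    return re.findall(r"\d+|[A-Za-z_]\w*|[=+*;]", source)


def parse(tokens):
    """Step 3: build an AST; each statement is ('assign', name, expr)."""
    pos = 0

    def take():
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        pos += 1
        return tokens[pos - 1]

    def atom():
        token = take()
        if token.isdigit():
            return ("num", int(token))
        if token.isidentifier():
            return ("var", token)
        raise SyntaxError(f"unexpected token {token!r}")

    def chain(operand, op):
        nonlocal pos
        node = operand()
        while pos < len(tokens) and tokens[pos] == op:
            pos += 1
            node = (op, node, operand())
        return node

    def product():
        return chain(atom, "*")

    program = []
    while pos < len(tokens):
        name = take()
        if not name.isidentifier() or take() != "=":
            raise SyntaxError("expected '<name> ='")
        program.append(("assign", name, chain(product, "+")))
        if take() != ";":
            raise SyntaxError("expected ';'")
    return program


def check(program):
    """Steps 2 and 4: build a symbol table and reject undefined names."""
    symbols = {}

    def visit(node):
        if node[0] == "var" and node[1] not in symbols:
            raise NameError(f"undefined variable {node[1]!r}")
        if node[0] in ("+", "*"):
            visit(node[1])
            visit(node[2])

    for _, name, expr in program:
        visit(expr)
        symbols.setdefault(name, len(symbols))  # name -> storage slot
    return symbols


def generate(program, symbols):
    """Step 6: emit instructions for a hypothetical stack machine."""
    code = []

    def emit(node):
        kind = node[0]
        if kind == "num":
            code.append(("PUSH", node[1]))
        elif kind == "var":
            code.append(("LOAD", symbols[node[1]]))
        else:
            emit(node[1])
            emit(node[2])
            code.append(("ADD" if kind == "+" else "MUL", None))

    for _, name, expr in program:
        emit(expr)
        code.append(("STORE", symbols[name]))
    return code


def compile_source(source):
    program = parse(lex(source))
    return generate(program, check(program))


for instruction in compile_source("x = 1 + 2; y = x * 3;"):
    print(instruction)
```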

EDIT If you want to devise your own bytecode format, you can write, for example:

1) File Header
DWORD filesize
DWORD checksum
BYTE  endianness
DWORD entrypoint <-- Entry point for first instruction in main() or whatever

2) String table
DWORD numstrings
DWORD stringlen
<string bytes/words>

3) Instructions
DWORD numinstructions
DWORD opcode
DWORD numops <-- or deduce from opcode
DWORD op1_type <-- stack index, integer literal, index to string table, etc
DWORD operand1
DWORD op2_type
DWORD operand2
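A layout like this maps directly onto fixed-width binary packing. Below is a minimal Python sketch using the standard struct and zlib modules. Several details are assumptions, since the layout does not pin them down: fields are little-endian, the endianness byte is 0 for little-endian, the checksum is a CRC-32 of everything after the header, each string is stored as a length followed by UTF-8 bytes, and the opcode and operand-type numbers are hypothetical.

```python
import struct
import zlib

HEADER = struct.Struct("<IIBI")  # filesize, checksum, endianness, entrypoint


def write_image(strings, instructions, entrypoint=0):
    """Serialize strings and (opcode, [(type, value), ...]) instructions."""
    body = struct.pack("<I", len(strings))
    for text in strings:
        data = text.encode("utf-8")
        body += struct.pack("<I", len(data)) + data
    body += struct.pack("<I", len(instructions))
    for opcode, operands in instructions:
        body += struct.pack("<II", opcode, len(operands))
        for op_type, value in operands:
            body += struct.pack("<II", op_type, value)
    # Assumption: the checksum covers everything after the header.
    header = HEADER.pack(HEADER.size + len(body), zlib.crc32(body), 0, entrypoint)
    return header + body


def read_image(image):
    """Parse the bytes back into (entrypoint, strings, instructions)."""
    filesize, checksum, _endianness, entrypoint = HEADER.unpack_from(image, 0)
    if filesize != len(image) or checksum != zlib.crc32(image[HEADER.size:]):
        raise ValueError("corrupt image")
    offset = HEADER.size

    def read_dword():
        nonlocal offset
        (value,) = struct.unpack_from("<I", image, offset)
        offset += 4
        return value

    strings = []
    for _ in range(read_dword()):
        length = read_dword()
        strings.append(image[offset:offset + length].decode("utf-8"))
        offset += length
    instructions = []
    for _ in range(read_dword()):
        opcode, numops = read_dword(), read_dword()
        operands = [(read_dword(), read_dword()) for _ in range(numops)]
        instructions.append((opcode, operands))
    return entrypoint, strings, instructions


OP_PUSH, OP_PRINT = 1, 2        # hypothetical opcodes
TYPE_INT, TYPE_STRING = 0, 1    # hypothetical operand types

image = write_image(
    ["hello"],
    [(OP_PUSH, [(TYPE_STRING, 0)]), (OP_PRINT, [])],
)
print(len(image), read_image(image))
```

The VM's loader is essentially read_image: validate the header, rebuild the string table, then hand the instruction list to the execution loop.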


Overall, the steps are manageable, but, as always, the devil is in the details.

Some good references are:

The Dragon Book - This is heavy on theory, so it's a dry read, but worthwhile

Game Scripting Mastery - Guides you along while developing all three components in a more practical manner. The example code is rife with security issues, memory leaks, and overall lousy coding style (imho). However, you can take a lot of concepts away from this book, and it's worth a read.

The Art of Compiler Design - I have not read this one personally, but heard positive things about it.

If you decide to go down this road, be sure you know what you're getting yourself into. This is not something for the faint of heart, or for someone new to programming. It requires a lot of conceptual thinking and prior planning. It is, however, quite rewarding and fun.