Introduction to Automata Theory, Languages, and Computation

John E. Hopcroft, Rajeev Motwani, Jeffrey D. Ullman

Mentioned 5

This classic book on formal languages, automata theory, and computational complexity has been updated to present theoretical concepts in a concise and straightforward manner with the increase of hands-on, practical applications. This new edition comes with Gradiance, an online assessment tool developed for computer science. Gradiance is the most advanced online assessment tool developed for the computer science discipline. With its innovative underlying technology, Gradiance turns basic homework assignments and programming labs into an interactive learning experience for students. By using a series of root questions and hints, it not only tests a student's capability, but actually simulates a one-on-one teacher-student tutorial that allows for the student to more easily learn the material. Through the programming labs, instructors are capable of testing, tracking, and honing their students' skills, both in terms of syntax and semantics, with an unprecedented level of assessment never before offered. For more information about Gradiance, please visit

More on

Mentioned in questions and answers.

Is there a good set of tutorials for regular expressions? Particularly in a TextMate context? I am familiar with regular expression syntax, and the basic concepts. I even own a copy of Jeffrey Friedl's book "Mastering Regular Expressions" and have read though the Perl parts.

What I am looking for are some high quality demonstrations of regular expression usage with a clear explanation of the pattern being matched and how to decompose the regular expression syntax. I want to take my regular expression fu to the next level. I want to be able to think in regular expressions but I need something tangible to practice with to cement the knowledge in my head.

What would be helpful is some text and code examples to work through with specific tasks and associated regular expression solutions, preferably functional in a TextMate context. The place where I find I want to use it most is in the text editor. So the ability to do powerful search and replace functions is desired. And sometimes just search. So the ability to quickly write a partial regular expression that narrows the results down and then parse through a complex document iteratively would be handy.

Perhaps an interactive demo where the patterns are revealed as you type the regular expression. And some useful patterns and examples to test and play with. Perhaps some regular expression flash cards or a simple regular expression based game.

For you regular expression gurus out there what did you do to really cement your understanding of regular expressions?

Regex-fu, eh? In that vein: "Cry in the dojo, laugh on the battefield!"

Invest some time studying theoretical computer science, especially automata theory. Introduction to Automata Theory, Languages, and Computation is the textbook I used as an undergrad, and I recommend it highly!

"Pure" regular expressions correspond to deterministic finite state automata, and you'll want to develop a strong intuition for what they are, and are not, capable of.

Context-free grammars and methods for parsing them are something else you'll want to study carefully. This will help you know when to "upshift" to a more powerful paradigm when it's called for, rather than trying to shoehorn every text processing task into a regex exercise.

I am new in the world of regular expression. After googling I have found these books and places will be best reference for learning regular expression in depth. Here it is:


Regular Expression Cookbook

Regular Expression Recipes

net tuts+ Regular Expression Series:

  1. You Don’t Know Anything About Regular Expressions: A Complete Guide

  2. 8 Regular Expressions You Should Know

  3. Advanced Regular Expression Tips and Techniques

  4. Regular Expressions Cheat Sheets

  5. Regular Expressions for Dummies: Screencast Series

I'd like to know what are the best sites to learn about formal languages, automata, algorithms and data structures. Preferable with many solved questions... Thanks in advance

What I prefer is., a best book " On Theory of Automation", .,

I have read this book., superb it is.

What s the difference between a recursive set and recursive function?

The meaning of these terms varies depending upon your context. If we were discussing them purely from the standpoint of writing programs then recursive sets don't make much sense; however, it might just be that I have encountered it yet. That said, recursive functions are functions that call themselves in their execution. The calculation of a nth Fibonacci number is the classic example that is commonly presented:

/// <summary>A C# implementation of a function to return the nth Fibonacci number.</summary>
private static int Fibonacci(int n) {
 if (n <= 2) {
  return 1;
 } else {
  return Fibonacci(n - 1) + Fibonacci(n - 2);

That said, the other context for these terms in the context of computer science and specifically the theory of computation is when discussing of formal languages. In this context, a recursive set is defined to be a set for which there is a solvable membership problem. For example, we know that the set of all natural numbers ℕ is a recursive set because we can define a recursive function as follows:

Let f be defined as a function where f(x) = 1 if x ∈ ℕ and f(x) = 0 if x ∉ ℕ

The concept of a recursive set is important for the concept of computability because it leads to the recursively enumerable set which is a language that can be recognized by a Turing machine (i.e. Turing-recognizable). These are languages for which a Turing machine may or may not be able to determine if a given string is a member of a language, or in other words, the machine may either accept, reject, or loop. This is in contrast to a Turing-decidable language for which a the machine will enter into either the accept or reject state for a given input string.

This where the concept of a recursive function comes in play, as a recursive function (or total recursive function) can be computed by a Turing machine and halts for every input. Where as a partial recursive function can only be computed for a Turing machine with no guarantee in regard to halting behavior. Or in essence the recursive function is the counterpart to the recursive set.

So to bring things back around to your original question, in the context of the theory of computation, a recursive set is what can be generated (i.e. enumerated) or have membership tested by a recursive function on a Turing machine.

Further Reading:

I want to implement simple text directed regular expression engine using MFC in c++. So can u tell me how text directed regex engine works internally?

For example <.+> this will find a tag in html. How does this work internally?

And also please tell me how to use regular expression in MFC.

This is too complicated to explain in a SO answer. You either need an advanced textbook on compilers, or access to the source-code of an existing Regex engine. Fortunately, the OpenJDK source-code is open sourced ... as is the source code for the Apache Harmony version, the Perl version, the Python version and so on.

Some possible text books:

(Earlier editions of both books are available too ... for less money.)

For example <.+> this will find tag in html how it works internally.

The glib answer is that it depends on the Regex engine you are using.

Let's say I have several regular expressions:

expr_1: "test_file"

expr_2: "test_*"

expr_3: "test*"

All those match the string "test_file". How can I figure out in a program which rule is the most restrictive rule( in this case expr_1 )?

What I want to achieve:

I have a general rule that applies to a lot of files, but for .jpeg files for examples, I want to do a special operation. How can I figure out that the rule which selects ".jpeg" files is more restrictive than "*" rule for example?

Edit: I'm Using QRegExp from Qt, but this shouldn't change anything.

This is the correct way to solve that problem based on Language Theory:

Calculate the regexp that is the "and" or "combination" of all the other regexps. You can convert all your regex to DFA, and then you can create the intersection of all your automatons, which will give you a new DFA that will only accept things that are accepted by all three regexps. Then you can also minimize the automaton, and convert it back to a regexp. If you do that, you'll get a regexp which is as restrictive as all the other regexps together, and which is the shortest regexp possible for doing that.

Great book that explains how to do all that: Introduction to Automata Theory, Languages, and Computation