Purely Functional Data Structures

Chris Okasaki

Mentioned 52

This book describes data structures and data structure design techniques for functional languages.


Mentioned in questions and answers.

Does anyone know what is the worst possible asymptotic slowdown that can happen when programming purely functionally as opposed to imperatively (i.e. allowing side-effects)?

Clarification from comment by itowlson: is there any problem for which the best known non-destructive algorithm is asymptotically worse than the best known destructive algorithm, and if so by how much?

According to Pippenger [1996], when comparing a Lisp system that is purely functional (and has strict evaluation semantics, not lazy) to one that can mutate data, an algorithm written for the impure Lisp that runs in O(n) can be translated to an algorithm in the pure Lisp that runs in O(n log n) time (based on work by Ben-Amram and Galil [1992] about simulating random access memory using only pointers). Pippenger also establishes that there are algorithms for which that is the best you can do; there are problems which are O(n) in the impure system which are Ω(n log n) in the pure system.

There are a few caveats to be made about this paper. The most significant is that it does not address lazy functional languages, such as Haskell. Bird, Jones and De Moor [1997] demonstrate that the problem constructed by Pippenger can be solved in a lazy functional language in O(n) time, but they do not establish (and as far as I know, no one has) whether or not a lazy functional language can solve all problems in the same asymptotic running time as a language with mutation.

The problem constructed by Pippenger that requires Ω(n log n) is specifically constructed to achieve this result, and is not necessarily representative of practical, real-world problems. There are a few restrictions on the problem that are a bit unexpected, but necessary for the proof to work; in particular, the problem requires that results are computed on-line, without being able to access future input, and that the input consists of a sequence of atoms from an unbounded set of possible atoms, rather than a fixed size set. And the paper only establishes (lower bound) results for an impure algorithm of linear running time; for problems that require a greater running time, it is possible that the extra O(log n) factor seen in the linear problem could be "absorbed" by the extra operations necessary for algorithms with greater running times. These clarifications and open questions are explored briefly by Ben-Amram [1996].

In practice, many algorithms can be implemented in a pure functional language at the same efficiency as in a language with mutable data structures. For a good reference on techniques to use for implementing purely functional data structures efficiently, see Chris Okasaki's "Purely Functional Data Structures" [Okasaki 1998] (which is an expanded version of his thesis [Okasaki 1996]).

Anyone who needs to implement algorithms on purely-functional data structures should read Okasaki. You can always get at worst an O(log n) slowdown per operation by simulating mutable memory with a balanced binary tree, but in many cases you can do considerably better than that, and Okasaki describes many useful techniques, from amortized techniques to real-time ones that do the amortized work incrementally. Purely functional data structures can be a bit difficult to work with and analyze, but they provide many benefits like referential transparency that are helpful in compiler optimization, in parallel and distributed computing, and in implementation of features like versioning, undo, and rollback.
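To make the "simulate mutable memory with a balanced binary tree" remark concrete, here is a minimal sketch in Python (chosen only for illustration; the names build, get and set_ are hypothetical and not from Okasaki). An "array" is stored as a balanced tree, and an update copies only the nodes on the path from the root to the changed leaf, so both reads and writes cost O(log n) and the old version stays valid.

```python
# A tree is either ("leaf", value) or ("node", left_size, left, right).
# Nothing is ever modified in place; set_ returns a new tree.

def build(values):
    """Build a balanced tree over a non-empty list of values."""
    if len(values) == 1:
        return ("leaf", values[0])
    mid = len(values) // 2
    return ("node", mid, build(values[:mid]), build(values[mid:]))

def get(tree, i):
    """Read index i by walking one root-to-leaf path: O(log n)."""
    while tree[0] == "node":
        _, left_size, left, right = tree
        if i < left_size:
            tree = left
        else:
            tree, i = right, i - left_size
    return tree[1]

def set_(tree, i, value):
    """Return a new tree with index i replaced; copies only one path."""
    if tree[0] == "leaf":
        return ("leaf", value)
    _, left_size, left, right = tree
    if i < left_size:
        return ("node", left_size, set_(left, i, value), right)
    return ("node", left_size, left, set_(right, i - left_size, value))

old = build([10, 20, 30, 40, 50])
new = set_(old, 3, 99)
print(get(old, 3), get(new, 3))  # the old version is untouched
```

The untouched subtree is shared between the two versions rather than copied, which is where the logarithmic (rather than linear) cost per update comes from.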

Note also that all of this discusses only asymptotic running times. Many techniques for implementing purely functional data structures give you a certain amount of constant factor slowdown, due to extra bookkeeping necessary for them to work, and implementation details of the language in question. The benefits of purely functional data structures may outweigh these constant factor slowdowns, so you will generally need to make trade-offs based on the problem in question.


I'm going to be teaching a lower-division course in discrete structures. I have selected the text book Discrete Structures, Logic, and Computability in part because it contains examples and concepts that are conducive to implementation with a functional programming language. (I also think it's a good textbook.)

I want an easy-to-understand FP language that illustrates DS concepts and that the students can use. Most students will have had only one or two semesters of programming in Java, at best. After looking at Scheme, Erlang, Haskell, OCaml, and SML, I've settled on either Haskell or Standard ML. I'm leaning towards Haskell for the reasons outlined below, but I'd like the opinion of those who are active programmers in one or the other.

  • Both Haskell and SML have pattern matching which makes describing a recursive algorithm a cinch.
  • Haskell has nice list comprehensions that match nicely with the way such lists are expressed mathematically.
  • Haskell has lazy evaluation. Great for constructing infinite lists using the list comprehension technique.
  • SML has a truly interactive interpreter in which functions can be both defined and used. In Haskell, functions must be defined in a separate file and compiled before being used in the interactive shell.
  • SML gives explicit confirmation of the function argument and return types in a syntax that's easy to understand. For example: val foo = fn : int * int -> int. Haskell's implicit curry syntax is a bit more obtuse, but not totally alien. For example: foo :: Int -> Int -> Int.
  • Haskell uses arbitrary-precision integers by default. It's an external library in SML/NJ. And SML/NJ truncates output to 70 characters by default.
  • Haskell's lambda syntax is subtle -- it uses a single backslash. SML is more explicit. Not sure if we'll ever need lambda in this class, though.

Essentially, SML and Haskell are roughly equivalent. I lean toward Haskell because I'm loving the list comprehensions and infinite lists in Haskell. But I'm worried that the extensive number of symbols in Haskell's compact syntax might cause students problems. From what I've gathered reading other posts on SO, Haskell is not recommended for beginners starting out with FP. But we're not going to be building full-fledged applications, just trying out simple algorithms.

What do you think?


Edit: Upon reading some of your great responses, I should clarify some of my bullet points.

In SML, there's no syntactic distinction between defining a function in the interpreter and defining it in an external file. Let's say you want to write the factorial function. In Haskell you can put this definition into a file and load it into GHCi:

fac 0 = 1
fac n = n * fac (n-1)

To me, that's clear, succinct, and matches the mathematical definition in the book. But if you want to write the function in GHCi directly, you have to use a different syntax:

let fac 0 = 1; fac n = n * fac (n-1)

When working with interactive interpreters, from a teaching perspective it's very, very handy when the student can use the same code in both a file and the command line.

By "explicit confirmation of the function," I meant that upon defining the function, SML right away tells you the name of the function, the types of the arguments, and the return type. In Haskell you have to use the :type command and then you get the somewhat confusing curry notation.

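One more cool thing about Haskell--this is a valid function definition (n+k patterns are part of Haskell 98; Haskell 2010 dropped them, so current GHC needs the NPlusKPatterns extension):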

fac 0 = 1
fac (n+1) = (n+1) * fac n

Again, this matches a definition they might find in the textbook. Can't do that in SML!

Much as I love Haskell, here are the reasons I would prefer SML for a class in discrete math and data structures (and most other beginners' classes):

  • Time and space costs of Haskell programs can be very hard to predict, even for experts. SML offers much more limited ways to blow the machine.

  • Syntax for function definition in an interactive interpreter is identical to syntax used in a file, so you can cut and paste.

  • Although operator overloading in SML is totally bogus, it is also simple. It's going to be hard to teach a whole class in Haskell without having to get into type classes.

  • Students can debug using print. (Although, as a commenter points out, it is possible to get almost the same effect in Haskell using Debug.Trace.trace.)

  • Infinite data structures blow people's minds. For beginners, you're better off having them define a stream type complete with ref cells and thunks, so they know how it works:

    datatype 'a thunk_contents = UNEVALUATED of unit -> 'a
                               | VALUE of 'a
    type 'a thunk = 'a thunk_contents ref
    val delay : (unit -> 'a) -> 'a thunk
    val force : 'a thunk -> 'a
    

    Now it's not magic any more, and you can go from here to streams (infinite lists).

  • Layout is not as simple as in Python and can be confusing.
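The delay/force signatures in the list above only hint at the mechanics, so here is a rough sketch of the same idea. It is written in Python rather than SML purely for illustration; the Thunk class, integers_from and take are hypothetical names, and the mutable fields play the role of the ref cell.

```python
class Thunk:
    """A memoizing suspension: UNEVALUATED until forced, then VALUE."""
    def __init__(self, compute):
        self._compute = compute    # zero-argument function (the suspension)
        self._evaluated = False
        self._value = None

def delay(compute):
    """Wrap a zero-argument function without running it."""
    return Thunk(compute)

def force(thunk):
    """Run the suspension at most once and cache its result."""
    if not thunk._evaluated:
        thunk._value = thunk._compute()
        thunk._evaluated = True
        thunk._compute = None      # drop the closure once it has run
    return thunk._value

def integers_from(n):
    """An infinite stream: a pair of (head, thunk producing the tail)."""
    return (n, delay(lambda: integers_from(n + 1)))

def take(k, stream):
    """Collect the first k elements of a stream into a list."""
    out = []
    while k > 0:
        head, tail = stream
        out.append(head)
        stream = force(tail)
        k -= 1
    return out

print(take(5, integers_from(0)))
```

Only the part of the stream that take actually walks is ever computed, which is the whole trick behind "infinite" lists.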

There are two places Haskell has an edge:

  • In core Haskell you can write a function's type signature just before its definition. This is hugely helpful for students and other beginners. There just isn't a nice way to deal with type signatures in SML.

  • Haskell has better concrete syntax. The Haskell syntax is a major improvement over ML syntax. I have written a short note about when to use parentheses in an ML program; this helps a little.

Finally, there is a sword that cuts both ways:

  • Haskell code is pure by default, so your students are unlikely to stumble over impure constructs (IO monad, state monad) by accident. But by the same token, they can't print, and if you want to do I/O then at minimum you have to explain do notation, and return is confusing.

On a related topic, here is some advice for your course preparation: don't overlook Purely Functional Data Structures by Chris Okasaki. Even if you don't have your students use it, you will definitely want to have a copy.

One of the arguments I've heard against functional languages is that single assignment coding is too hard, or at least significantly harder than "normal" programming.

But looking through my code, I realized that I really don't have many (any?) use patterns that can't be written just as well using single assignment form if you're writing in a reasonably modern language.

So what are the use cases for variables that vary within a single invocation of their scope? Bear in mind that loop indexes, parameters, and other scope-bound values that vary between invocations aren't multiple assignments in this case (unless you have to change them in the body for some reason), and assume that you are writing in something far enough above the assembly language level that you can write things like

values.sum

or (in case sum isn't provided)

function collection.sum --> inject(zero, function (v,t) --> t+v )

and

x = if a > b then a else b

or

n = case s 
  /^\d*$/ : s.to_int
  ''      : 0
  '*'     : a.length
  '?'     : a.length.random
  else    fail "I don't know how many you want"

when you need to, and have list comprehensions, map/collect, and so forth available.

Do you find that you still want/need mutable variables in such an environment, and if so, what for?

To clarify, I'm not asking for a recitation of the objections to SSA form, but rather concrete examples where those objections would apply. I'm looking for bits of code that are clear and concise with mutable variables and couldn't be written so without them.

My favorite examples so far (and the best objection I expect to them):

  1. Paul Johnson's Fisher-Yates algorithm answer, which is pretty strong when you include the big-O constraints. But then, as catulahoops points out, the big-O issue isn't tied to the SSA question, but rather to having mutable data types, and with that set aside the algorithm can be written rather clearly in SSA:

     shuffle(Lst) ->
         array:to_list(shuffle(array:from_list(Lst), erlang:length(Lst) - 1)).
     shuffle(Array, 0) -> Array;
     shuffle(Array, N) ->
         K = random:uniform(N + 1) - 1,  % uniform index in 0..N, so an element may stay in place
         Ek = array:get(K, Array),
         En = array:get(N, Array),
         shuffle(array:set(K, En, array:set(N, Ek, Array)), N-1).
    
  2. jpalecek's area of a polygon example:

    def area(figure : List[Point]) : Float = {
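      if(figure.isEmpty) return 0
      var last = figure(0)   // previous vertex; reassigned on every iteration
      val first = figure(0)  // fixed reference vertex
      var ret = 0.0f         // running sum of cross products
      for (pt <- figure) {
        ret += crossprod(last - first, pt - first)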
        last = pt
      }
      ret
    }
    

    which might still be written something like:

    def area(figure : List[Point]) : Float = {
        if figure.length < 3
            0
          else
            val a = figure(0)
            val b = figure(1)
            val c = figure(2)
            if figure.length == 3
                magnitude(crossproduct(b-a,c-a))
              else
                foldLeft((0,a,b))(figure.rest) {
                   ((t,a,b),c) => (t+area([a,b,c]),a,c)
                   }._1   // keep only the accumulated total
    }
    

    Or, since some people object to the density of this formulation, it could be recast:

    def area([])    = 0.0   # An empty figure has no area
    def area([_])   = 0.0   # ...nor does a point
    def area([_,_]) = 0.0   # ...or a line segment
    def area([a,b,c]) =     # The area of a triangle can be found directly
        magnitude(crossproduct(b-a,c-a))
    def area(figure) =      # For larger figures, reduce to triangles and sum
        as_triangles(figure).collect(area).sum
    
    def as_triangles([])      = []  # No triangles without at least three points
    def as_triangles([_])     = []
    def as_triangles([_,_])   = []
    def as_triangles([a,b,c | rest]) = [[a,b,c] | as_triangles([a,c | rest])]
    
  3. Princess's point about the difficulty of implementing O(1) queues with immutable structures is interesting (and may well provide the basis for a compelling example) but as stated it's fundamentally about the mutability of the data structure, and not directly about the multiple assignment issue.

  4. I'm intrigued by the Sieve of Eratosthenes answer, but unconvinced. The proper big-O, pull as many primes as you'd like generator given in the paper he cited does not look easy to implement correctly with or without SSA.


Well, thanks everyone for trying. As most of the answers turned out to be either 1) based on mutable data structures, not on single-assignment, or 2) to the extent they were about single assignment form, easily countered by practitioners skilled in the art, I'm going to strike the line from my talk and / or restructure (maybe have it in backup as a discussion topic in the unlikely event I run out of words before I run out of time).

Thanks again.

I think you'll find the most productive languages allow you to mix functional and imperative styles, such as OCaml and F#.

In most cases, I can write code which is simply a long line of "map x to y, reduce y to z". In 95% of cases, functional programming simplifies my code, but there is one area where immutability shows its teeth:

The wide disparity between the ease of implementing an immutable stack and an immutable queue.

Stacks are easy and mesh well with persistence, queues are ridiculous.

The most common implementations of immutable queues use one or more internal stacks and stack rotations. The upside is that these queues run in O(1) most of the time, but some operations will run in O(n). If you're relying on persistence in your application, then it's possible in principle that every operation runs in O(n). These queues are no good when you need realtime (or at least consistent) performance.
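For readers who haven't seen the two-stack design, here is a minimal sketch of it in Python (illustrative only; the names cons-list encoding, enqueue and dequeue are my own choices, not from any particular library). The queue is a pair of immutable cons lists, and the occasional reversal in dequeue is the O(n) step mentioned above.

```python
EMPTY = None                 # the empty cons list
EMPTY_QUEUE = (EMPTY, EMPTY) # a queue is a pair (front, back)

def reverse(lst):
    """Reverse an immutable cons list of (head, tail) pairs: O(n)."""
    out = EMPTY
    while lst is not EMPTY:
        out = (lst[0], out)
        lst = lst[1]
    return out

def enqueue(queue, x):
    """Push onto the back stack; always O(1). Returns a new queue."""
    front, back = queue
    return (front, (x, back))

def dequeue(queue):
    """Return (element, new_queue). O(1) unless the front stack is empty."""
    front, back = queue
    if front is EMPTY:
        # The expensive step: move everything from back to front.
        front, back = reverse(back), EMPTY
    if front is EMPTY:
        raise IndexError("dequeue from empty queue")
    head, rest = front
    return head, (rest, back)

q = EMPTY_QUEUE
for item in ["a", "b", "c"]:
    q = enqueue(q, item)
first, q_rest = dequeue(q)
print(first)
```

Because q itself is never changed, calling dequeue(q) again repeats the same reversal, which is how persistence can defeat the amortized O(1) bound.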

Chris Okasaki provides an implementation of immutable queues in his book; they use laziness to achieve O(1) for all operations. It's a very clever, reasonably concise implementation of a realtime queue -- but it requires deep understanding of its underlying implementation details, and it's still an order of magnitude more complex than an immutable stack.

In contrast, I can write a stack and queue using mutable linked lists which run in constant time for all operations, and the resulting code would be very straightforward.


Regarding the area of a polygon, it's easy to convert it to functional form. Let's assume we have a Vector module like this:

module Vector =
    type point =
        { x : float; y : float}
        with
            static member ( + ) ((p1 : point), (p2 : point)) =
                { x = p1.x + p2.x;
                  y = p1.y + p2.y;}

            static member ( * ) ((p : point), (scalar : float)) =
                { x = p.x * scalar;
                  y = p.y * scalar;}

            static member ( - ) ((p1 : point), (p2 : point)) = 
                { x = p1.x - p2.x;
                  y = p1.y - p2.y;}

    let empty = { x = 0.; y = 0.;}
    let to_tuple2 (p : point) = (p.x, p.y)
    let from_tuple2 (x, y) = { x = x; y = y;}
    let crossproduct (p1 : point) (p2 : point) =
        { x = p1.x * p2.y; y = -p1.y * p2.x }

We can define our area function using a little bit of tuple magic:

let area (figure : point list) =
    figure
    |> Seq.map to_tuple2
    |> Seq.fold
        (fun (sum, (a, b)) (c, d) -> (sum + a*d - b*c, (c, d) ) )
        (0., to_tuple2 (List.head figure))
    |> fun (sum, _) -> abs(sum) / 2.0

Or we can use the cross product instead

let area2 (figure : point list) =
    figure
    |> Seq.fold
        (fun (acc, prev) cur -> (acc + (crossproduct prev cur), cur))
        (empty, List.head figure)
    |> fun (acc, _) -> abs(acc.x + acc.y) / 2.0

I don't find either function unreadable.

What would be an idiomatic way to represent a tree in Clojure? E.g.:

     A
    / \
   B   C
  /\    \
 D  E    F

Performance is not important and the trees won't grow past 1000 elements.

Trees underlie just about everything in Clojure because they lend themselves so nicely to structural sharing in persistent data structures. Maps and vectors are actually trees with a high branching factor to give them bounded lookup and insert time. So the shortest answer I can give (though it's not really that useful) is that I really recommend Purely Functional Data Structures by Chris Okasaki for a real answer to this question. Also see Rich Hickey's video on Clojure data structures on blip.tv.

(set 'A 'B 'C)

Basically, I know how to create graph data structures and use Dijkstra's algorithm in programming languages where side effects are allowed. Typically, graph algorithms use a structure to mark certain nodes as 'visited', but this has side effects, which I'm trying to avoid.

I can think of one way to implement this in a functional language, but it basically requires passing around large amounts of state to different functions, and I'm wondering if there is a more space-efficient solution.

I just keep the visited set as a set and pass it as a parameter. There are efficient log-time implementations of sets of any ordered type and extra-efficient sets of integers.

To represent a graph I use adjacency lists, or I'll use a finite map that maps each node to a list of its successors. It depends what I want to do.
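As a rough illustration of passing the visited set as a parameter over an adjacency-list graph, here is a small depth-first search in Python. The function name dfs and the dict-of-lists graph encoding are assumptions made for the example. Note that Python's frozenset copies on union, so this sketch does not have the log-time updates of the persistent sets mentioned above; it only shows the shape of the code.

```python
def dfs(graph, node, visited=frozenset()):
    """Depth-first preorder from node.

    graph maps each node to a list of its successors.
    Returns (order, visited), where visited is a new frozenset;
    the set passed in is never modified.
    """
    if node in visited:
        return [], visited
    visited = visited | {node}       # builds a new set
    order = [node]
    for succ in graph.get(node, ()):
        # Thread the updated visited set through each recursive call.
        sub_order, visited = dfs(graph, succ, visited)
        order = order + sub_order
    return order, visited

graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["A"]}
order, seen = dfs(graph, "A")
print(order)
```

The only "state" is the pair returned from each call, so the space cost is the set itself plus the recursion stack, much like the imperative version with a marked flag per node.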

Rather than Abelson and Sussman, I recommend Chris Okasaki's Purely Functional Data Structures. I've linked to Chris's dissertation, but if you have the money, he expanded it into an excellent book.


Just for grins, here's a slightly scary reverse postorder depth-first search done in continuation-passing style in Haskell. This is straight out of the Hoopl optimizer library:

postorder_dfs_from_except :: forall block e . (NonLocal block, LabelsPtr e)
                          => LabelMap (block C C) -> e -> LabelSet -> [block C C]
postorder_dfs_from_except blocks b visited =
 vchildren (get_children b) (\acc _visited -> acc) [] visited
 where
   vnode :: block C C -> ([block C C] -> LabelSet -> a) 
                      -> ([block C C] -> LabelSet -> a)
   vnode block cont acc visited =
        if setMember id visited then
            cont acc visited
        else
            let cont' acc visited = cont (block:acc) visited in
            vchildren (get_children block) cont' acc (setInsert id visited)
      where id = entryLabel block
   vchildren bs cont acc visited = next bs acc visited
      where next children acc visited =
                case children of []     -> cont acc visited
                                 (b:bs) -> vnode b (next bs) acc visited
   get_children block = foldr add_id [] $ targetLabels block
   add_id id rst = case lookupFact id blocks of
                      Just b -> b : rst
                      Nothing -> rst

Most functional languages support inner functions. So you can just create your graph representation in the outermost layer and just reference it from the inner function.

This book covers it extensively http://www.amazon.com/gp/product/0262510871/ref=pd_lpo_k2_dp_sr_1?ie=UTF8&cloe_id=aa7c71b1-f0f7-4fca-8003-525e801b8d46&attrMsgId=LPWidget-A1&pf_rd_p=486539851&pf_rd_s=lpo-top-stripe-1&pf_rd_t=201&pf_rd_i=0262011530&pf_rd_m=ATVPDKIKX0DER&pf_rd_r=114DJE8K5BG75B86E1QS

This question started out from

  1. My translating of "ML for the Working Programmer" (WorldCat) by L. C. PAULSON to F# which uses functors for the examples.
  2. Eventual desire to translate "Purely Functional Data Structures" (WorldCat) by Chris Okasaki which uses functors.
  3. Reading "CATEGORIES TYPES AND STRUCTURES - An Introduction to Category Theory for the working computer scientist" (WorldCat) by Andrea Asperti and Giuseppe Longo.
  4. Not understanding it all, mostly the category theory.

SML.NET can do functors and worked with Microsoft .NET.
* See: SML.NET User Guide Section 4.8.2 Class types and functors?

I keep seeing that F# cannot do true functors because of some limitation in Microsoft .NET.
* Can ML functors be fully encoded in .NET (C#/F#)?
* Any workaround for functor?

So if SML.NET could do functors on .NET then why can't F#? What did SML.NET do that F# can't?

The more I learn about functors coming from category theory, the more I see the beauty of them and desire to have them in F#.

EDIT

In a pursuit to better understand the relation between category theory and functional programming see these Q&A at CS:StackExchange.

There's no fundamental limitation of .NET that stops functors from being implemented in F#. True, they can't be represented directly in .NET metadata, but neither can other F# language features like union types. Compilers for languages with functors (e.g., Standard ML, OCaml) have a pass called defunctorize; it works just like C++ template expansion, in that it "flattens" the functors by specializing them into normal modules.

The F# compiler could do the same thing, but you then have to ask: how will this be exposed to other .NET languages? Since functors can't be directly encoded in the .NET type system, you'd need to come up with some way to represent them; and if that representation is difficult/impossible to use from C# or VB.NET, would it still make sense to include F# functors? A non-trivial part of F#'s success comes from its ability to easily interop (in both directions) with C# and VB.NET.

EDIT: Don't get me wrong -- I'd love to have functors in F#, they'd be really useful to handle a few cases which are currently painful and/or impossible to implement without them. I'm just pointing out that the main reason the language doesn't yet (and maybe won't ever) have functors is that the interop issue hasn't been solved; the metadata-encoding issue is actually the easy part.

EDIT 2: Code for the defunctorize pass of MLton: defunctorize.fun

Update: I had a thought about how functors actually could be expressed within the .NET type system, so I put together a little experiment. It isn't pretty, but it works -- so now we know it's at least plausible that F# could one day support functors. In practice, the complexity you see in my experimental code would all be hidden by the compiler/language. If you want to check it out: experimental-functors

I'm an OK C/C++ programmer. I find Haskell very intriguing. But it seems to me, that although it's relatively easy to write clean Haskell code, as it mimics math (which I'm very comfortable with) pretty well, it's very hard to write clean code in Haskell that runs fast.

A faster version of quicksort in Haskell is very long and scary, and bears no resemblance to the naive but short (two lines), clean and intuitive implementation. The long and scary Haskell version is actually still much slower than the shorter and simpler C counterpart.

Is it because the current Haskell compiler is too dumb or is it just impossible for mortals (other than SPJ of course) to write fast Haskell code?

You ask two different questions: learning and performance.

  • It took me about a month to become comfortable with functional programming using recursion, pattern matching, map, filter, and fold. I did all that with ML but it translated to Haskell very easily.
  • It took me two or three years to wrap my head around monads, but that's because I read the wrong stuff. I think there are better tutorials now. But if you're beginning, avoid monads for a while.
  • It took me several months to get good at creating new type classes, but using the existing ones was easy.
  • I'm still not sure I have the hang of lazy evaluation. But I love Haskell's purity and tend to treat lazy evaluation as an unhappy accident that only a few people (like John Hughes) know how to exploit.

You've observed a performance problem only because you've adapted an algorithm loaded with mutation, which Tony Hoare designed for imperative languages, and tried to translate into Haskell. In Haskell as in any other functional language the expensive operation is allocation. Try writing a merge sort and you'll find it's simple and performs very well.

How do you avoid making similar mistakes in the future? Have a look at Chris Okasaki's book Purely Functional Data Structures. Great book, and it will help you learn the 'functional way of doing things' without giving up performance.

I am currently working with the React JS & React Native frameworks. Along the way I came across immutability and the Immutable-JS library while reading about Facebook's Flux implementation & Redux implementation.

The question is, why is immutability so important? What is wrong in mutating objects? Doesn't it make things simple?

Giving an example, Let us consider a simple News reader app. With the opening screen being a list view of news headlines.

Say I initially set an array of objects with a value. I can't manipulate it. That's what the immutability principle says, right? (Correct me if I am wrong.) But what if I have a new News object that has to be added? In the usual case, I could have just added the object to the array. How do I achieve that in this case? Delete the store & recreate it? Isn't adding an object to the array a less expensive operation?

PS: If the example is not the right way to explain immutability, please do let me know what's the right practical example.

I am trying to learn what's right here. Please do enlighten me :)

Although the other answers are fine, to address your question about a practical use case (from the comments on the other answers) let's step outside your running code for a minute and look at the ubiquitous answer right under your nose: git. What would happen if every time you pushed a commit you overwrote the data in the repository?

Now we're into one of the problems that immutable collections face: memory bloat. Git is smart enough not to store a fresh copy of every file each time you make a change; content that hasn't changed is stored once and shared between commits.

While I don't know much about the inner workings of git, I can only assume it uses a similar strategy to that of libraries you reference: structural sharing. Under the hood the libraries use tries or other trees to only track the nodes that are different.

This strategy is also reasonably performant for in-memory data structures as there are well-known tree-operation algorithms that operate in logarithmic time.

Another use case: say you want an undo button on your webapp. With immutable representations of your data, implementing such is relatively trivial. But if you rely on mutation, that means you have to worry about caching the state of the world and making atomic updates.
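To show why undo becomes nearly trivial, here is a tiny sketch in Python using the news-reader example from the question. The helper names apply_edit, undo and current are hypothetical, and tuples stand in for whatever immutable collection your library provides. Every state is kept as a value, so undoing is just stepping back to the previous one.

```python
def current(history):
    """The latest state in the history."""
    return history[-1]

def apply_edit(history, new_state):
    """Record a new state; earlier states are left untouched."""
    return history + (new_state,)

def undo(history):
    """Drop the latest state, but never the initial one."""
    return history[:-1] if len(history) > 1 else history

# Each state is an immutable tuple of headlines.
history = (("Headline 1",),)
history = apply_edit(history, current(history) + ("Headline 2",))
history = apply_edit(history, current(history) + ("Headline 3",))
print(current(history))
print(current(undo(history)))
```

With a library that does structural sharing, consecutive states share most of their storage, so keeping the whole history is far cheaper than it looks here.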

In short, there's a price to pay for immutability in runtime performance and the learning curve. But any experienced programmer will tell you that debugging time outweighs code-writing time by an order of magnitude. And the slight hit on runtime performance is likely outweighed by the state-related bugs your users don't have to endure.

It is quite easy to fully understand the standard binary search tree and its operations. Because of that understanding, I don't even need to remember the implementations of the insert, delete, and search operations.

I am learning red-black trees now and I understand the properties that keep the tree balanced. However, I find it very hard to understand the insert and delete procedures.

I understand that when inserting a new node, we mark the node as red (because a red node breaks the fewest red-black tree laws). The new red node may still break the "no consecutive red nodes" law. Then we fix it via:

  1. check its uncle's colour, if red, then mark its parent and uncle as black, and go to grandparent.

  2. if it is right child, left rotate its parent

  3. mark its parent as black and its grandparent as red, then right rotate its grandparent.

done (basically like above).

Many places describe red-black tree insertion like the above. They just tell you how to do it. But why do those steps fix the tree? Why first rotate left, and then rotate right?

Can anyone explain why to me more clearly, even more clearly than CLRS? What's the magic of rotation?

I really wish to understand, so that after a year I can implement a red-black tree by myself without consulting a book.

Thanks

ignore my (now deleted) comment - i think okasaki's code is going to help you. if you have the book ("purely functional data structures"), look at the text on page 26 and figure 3.5 (facing, p 27). it's hard to get clearer than that.

unfortunately the thesis available on-line doesn't have that part.

i'm not going to copy it out because the diagram is important, but it shows that all the different cases are basically the same thing, and it gives some very simple ML code that hammers that home.
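for anyone without the book, here is a rough sketch of the widely reproduced four-case balance function, written in python rather than ML. it is not a transcription of the book's figure; the tuple encoding (color, left, value, right) and the helper names to_list and black_height are my own assumptions. the point it tries to show is the one above: a black node with a red child that has a red child comes in four shapes, and all four are rewritten to the same result, a red node with two black children.

```python
R, B = "R", "B"   # node colors; an empty tree is None

def balance(color, left, value, right):
    """Rewrite the four red-red shapes under a black node to one result."""
    if color == B:
        if left and left[0] == R:
            _, ll, lv, lr = left
            if ll and ll[0] == R:            # red left child, red left-left
                _, a, x, b = ll
                return (R, (B, a, x, b), lv, (B, lr, value, right))
            if lr and lr[0] == R:            # red left child, red left-right
                _, b, y, c = lr
                return (R, (B, ll, lv, b), y, (B, c, value, right))
        if right and right[0] == R:
            _, rl, rv, rr = right
            if rl and rl[0] == R:            # red right child, red right-left
                _, b, y, c = rl
                return (R, (B, left, value, b), y, (B, c, rv, rr))
            if rr and rr[0] == R:            # red right child, red right-right
                _, c, z, d = rr
                return (R, (B, left, value, rl), rv, (B, c, z, d))
    return (color, left, value, right)

def insert(tree, x):
    """Return a new tree containing x; the input tree is not modified."""
    def ins(t):
        if t is None:
            return (R, None, x, None)        # new nodes start red
        color, left, value, right = t
        if x < value:
            return balance(color, ins(left), value, right)
        if x > value:
            return balance(color, left, value, ins(right))
        return t                             # already present
    _, left, value, right = ins(tree)
    return (B, left, value, right)           # the root is always black

def to_list(t):
    """In-order traversal."""
    return [] if t is None else to_list(t[1]) + [t[2]] + to_list(t[3])

def black_height(t):
    """Black height of t; fails an assertion if an invariant is broken."""
    if t is None:
        return 1
    color, left, _, right = t
    if color == R:
        assert not (left and left[0] == R), "red node with red left child"
        assert not (right and right[0] == R), "red node with red right child"
    hl, hr = black_height(left), black_height(right)
    assert hl == hr, "unequal black heights"
    return hl + (1 if color == B else 0)
```

the "rotations" in the imperative description are hidden inside the four return lines: each one rebuilds the three keys and four subtrees in sorted order with the middle key on top.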

[update] it looks like you may be able to see this on amazon. go to the book's page, mouse over the image and enter "red black" in the search box. that gives you results that include pages 25 and 26, but you need to be logged on to see them (apparently - i haven't tried logging in to check).

I've been thinking for a while about how to go about implementing a deque (that is, a double-ended queue) as an immutable data structure.

There seem to be different ways of doing this. AFAIK, immutable data structures are generally hierarchical, so that major parts of it can be reused after modifying operations such as the insertion or removal of an item.

Eric Lippert has two articles on his blog about this topic, along with sample implementations in C#.

Both of his implementations strike me as more elaborate than is actually necessary. Couldn't deques simply be implemented as binary trees, where elements can only be inserted or removed on the very "left" (the front) and on the very "right" (the back) of the tree?

                               o
                              / \
                             …   …
                            /     \
                           …       …
                          / \     / \
              front -->  L   …   …   R  <-- back

Additionally, the tree would be kept reasonably balanced with rotations:

  • right rotations upon insertion at the front or upon removal from the back, and
  • left rotations upon removal from the front or insertion at the back.
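To make the proposal concrete, here is a minimal sketch of a right rotation on an immutable binary tree. The names `Tree`, `node`, `rotate_right` and `rotate_right_demo` are made up for illustration and are not taken from Eric Lippert's articles. The rotation allocates two new nodes and shares the three subtrees, so the old version of the tree stays valid.

```c
#include <assert.h>
#include <stdlib.h>

struct Tree {
  int value;
  const struct Tree * left;
  const struct Tree * right;
};

/* Allocate a new node; existing subtrees are shared, never copied.
   Nodes are never freed in this sketch. */
const struct Tree * node(const struct Tree * left, int value,
                         const struct Tree * right) {
  struct Tree * t = malloc(sizeof *t);
  if (t == NULL) abort();
  t->value = value;
  t->left = left;
  t->right = right;
  return t;
}

/*      y              x
       / \            / \
      x   c   ==>    a   y
     / \                / \
    a   b              b   c
   y must not be NULL. The input tree is left untouched. */
const struct Tree * rotate_right(const struct Tree * y) {
  const struct Tree * x = y->left;
  if (x == NULL) return y;  /* nothing to rotate */
  return node(x->left, x->value, node(x->right, y->value, y->right));
}

/* Hypothetical usage: rotate a three-node tree and check both versions. */
void rotate_right_demo(void) {
  const struct Tree * x = node(NULL, 1, NULL);
  const struct Tree * y = node(x, 2, NULL);
  const struct Tree * r = rotate_right(y);
  assert(r->value == 1 && r->right->value == 2);
  assert(y->left == x);  /* the old version is unchanged */
}
```

A left rotation is the mirror image. Note that this only shows a single rotation; reaching the node to rotate still means walking down from the root, which is where the Θ(lg n) cost comes from.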

Eric Lippert is, in my opinion, a very smart person whom I deeply respect, yet he apparently didn't consider this approach. Thus I wonder, was it for a good reason? Is my suggested way of implementing deques naïve?

As Daniel noted, implementing immutable deques with well-known balanced search trees like AVL or red-black trees gives Θ(lg n) worst case complexity. Some of the implementations Lippert discusses may seem elaborate at first glance, but there are many immutable deques with o(lg n) worst or average or amortized complexity that are built from balanced trees along with two simple ideas:

  1. Reverse the Spines

    To perform deque operations on a traditional balanced search tree, we need access to the ends, but we only have access to the center. To get to the left end, we must navigate left child pointers until we finally reach a dead end. It would be preferable to have a pointer to the left and right ends without all that navigation effort. In fact, we really don't need access to the root node very often. Let's store a balanced search tree so that access to the ends is O(1).

    Here is an example in C of how you might normally store an AVL tree:

    struct AVLTree {
      const char * value;
      int height;
      struct AVLTree * leftChild;
      struct AVLTree * rightChild;
    };
    

    To set up the tree so that we can start at the edges and move towards the root, we change the tree and store all of the pointers along the paths from the root to the left and rightmost children in reverse. (These paths are called the left and right spine, respectively). Just like reversing a singly-linked list, the last element becomes the first, so the leftmost child is now easily accessible.

    This is a little tricky to understand. To help explain it, imagine that you only did this for the left spine:

    struct LeftSpine {
      const char * value;
      int height;
      struct AVLTree * rightChild;
      struct LeftSpine * parent;
    };
    

    In some sense, the leftmost child is now the "root" of the tree. If you drew the tree this way, it would look very strange, but if you simply take your normal drawing of a tree and reverse all of the arrows on the left spine, the meaning of the LeftSpine struct should become clearer. Access to the left side of the tree is now immediate. The same can be done for the right spine:

    struct RightSpine {
      const char * value;
      int height;
      struct AVLTree * leftChild;
      struct RightSpine * parent;
    };
    

    If you store both a left and a right spine as well as the center element, you have immediate access to both ends. Inserting and deleting may still be Ω(lg n), because rebalancing operations may require traversing the entire left or right spine, but simply viewing the leftmost and rightmost elements is now O(1).
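As a rough sketch of the reversal, the function below turns an ordinary tree into a reversed left spine, reusing the `AVLTree` and `LeftSpine` layouts from above. The helper names `reverse_left_spine`, `peek_left` and `left_spine_demo` are assumptions made for this example. It works like reversing a singly-linked list, and the right subtrees are shared rather than copied.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct AVLTree {
  const char * value;
  int height;
  struct AVLTree * leftChild;
  struct AVLTree * rightChild;
};

struct LeftSpine {
  const char * value;
  int height;
  struct AVLTree * rightChild;
  struct LeftSpine * parent;
};

/* Walk from the root down the left children. Each new spine node
   points back at the node above it, so the last node created (the
   leftmost element) ends up at the head. Returns NULL for an empty tree. */
struct LeftSpine * reverse_left_spine(const struct AVLTree * t) {
  struct LeftSpine * above = NULL;
  while (t != NULL) {
    struct LeftSpine * s = malloc(sizeof *s);
    if (s == NULL) abort();
    s->value = t->value;
    s->height = t->height;
    s->rightChild = t->rightChild;  /* shared, not copied */
    s->parent = above;
    above = s;
    t = t->leftChild;
  }
  return above;
}

/* O(1): the leftmost element sits at the head of the reversed spine. */
const char * peek_left(const struct LeftSpine * s) {
  return s ? s->value : NULL;
}

/* Hypothetical usage: the tree b(a, c) becomes the spine a -> b,
   with c still hanging off b as its right child. */
void left_spine_demo(void) {
  struct AVLTree a = { "a", 1, NULL, NULL };
  struct AVLTree c = { "c", 1, NULL, NULL };
  struct AVLTree b = { "b", 2, &a, &c };
  struct LeftSpine * s = reverse_left_spine(&b);
  assert(strcmp(peek_left(s), "a") == 0);
  assert(strcmp(s->parent->value, "b") == 0);
  assert(s->parent->rightChild == &c);
  free(s->parent);
  free(s);
}
```

The conversion itself takes time proportional to the length of the spine. The point is that once the tree is stored this way, no walk from the root is needed to look at the left end.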

    An example of this strategy is used to make purely functional treaps with implementations in SML and Java (more documentation). This is also a key idea in several other immutable deques with o(lg n) performance.

  2. Bound the Rebalancing Work

    As noted above, inserting at the left or rightmost end of an AVL tree can require Ω(lg n) time for traversing the spine. Here is an example of an AVL tree that demonstrates this:

    A full binary tree is defined by induction as:

    • A full binary tree of height 0 is an empty node.
    • A full binary tree of height n+1 has a root node with full binary trees of height n as children.

    Pushing an element onto the left of a full binary tree will necessarily increase the maximum height of the tree. Since the AVL trees above store that information in each node, and since every tree along the left spine of a full binary tree is also a full binary tree, pushing an element onto the left of an AVL deque that happens to be a full binary tree will require incrementing Ω(lg n) height values along the left spine.

    (Two notes on this: (a) You can store AVL trees without keeping the height in the node; instead you keep only balance information (left-taller, right-taller, or even). This doesn't change the performance of the above example. (b) In AVL trees, you might need to do not only Ω(lg n) balance or height information updates, but Ω(lg n) rebalancing operations. I don't recall the details of this, and it may be only on deletions, rather than insertions.)
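The full binary tree example can be checked with a small experiment. The sketch below builds a full binary tree and counts how many stored heights would change if one element were pushed on the left. The `Node` layout and the function names are assumptions for illustration, and rotations are deliberately not modelled (none are needed in this example).

```c
#include <assert.h>
#include <stdlib.h>

struct Node {
  int height;
  struct Node * left;
  struct Node * right;
};

/* Full binary tree of height h; height 0 is the empty tree (NULL).
   Nodes are never freed in this sketch. */
struct Node * full_tree(int h) {
  if (h == 0) return NULL;
  struct Node * n = malloc(sizeof *n);
  if (n == NULL) abort();
  n->height = h;
  n->left = full_tree(h - 1);
  n->right = full_tree(h - 1);
  return n;
}

static int height(const struct Node * t) { return t ? t->height : 0; }

/* Returns the height t would have after a new leftmost leaf is added,
   and adds to *changed the number of existing nodes whose stored
   height would have to be rewritten. Does not modify t. */
int height_after_left_push(const struct Node * t, int * changed) {
  if (t == NULL) return 1;  /* the new leaf itself */
  int l = height_after_left_push(t->left, changed);
  int r = height(t->right);
  int h = 1 + (l > r ? l : r);
  if (h != t->height) (*changed)++;
  return h;
}

/* Hypothetical usage: how many heights change for a full tree of height h. */
int heights_changed_in_full_tree(int h) {
  int changed = 0;
  int new_height = height_after_left_push(full_tree(h), &changed);
  assert(new_height == h + 1);
  return changed;
}
```

For a full tree of height h, which has 2^h - 1 nodes, `heights_changed_in_full_tree(h)` returns h: every node on the left spine needs a new height, which is the Ω(lg n) work described above.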

    In order to achieve o(lg n) deque operations, we need to limit this work. Immutable deques represented by balanced trees usually use at least one of the following strategies:

    • Anticipate where rebalancing will be needed. If you are using a tree that requires o(lg n) rebalancing but you know where that rebalancing will be needed and you can get there quickly enough, you can perform your deque operations in o(lg n) time. Deques that use this as a strategy will store not just two pointers into the deque (the ends of the left and right spines, as discussed above), but some small number of jump pointers to places higher along the spines. Deque operations can then access the roots of the trees pointed to by the jump pointers in O(1) time. If o(lg n) jump pointers are maintained for all of the places where rebalancing (or changing node information) will be needed, deque operations can take o(lg n) time.

      (Of course, this makes the tree actually a dag, since the trees on the spines pointed to by jump pointers are also pointed to by their children on the spine. Immutable data structures don't usually get along with non-tree graphs, since replacing a node pointed to by more than one other node requires replacing all of the other nodes that point to it. I have seen this fixed by just eliminating the non-jump pointers, turning the dag back into a tree. One can then store a singly-linked list with jump pointers as a list of lists. Each subordinate list contains all of the nodes between the head of that list and its jump pointer. This requires some care to deal with partially overlapping jump pointers, and a full explanation is probably not appropriate for this aside.)

      This is one of the tricks used by Tsakalidis in his paper "AVL Trees for localized search" to allow O(1) deque operations on AVL trees with a relaxed balance condition. It is also the main idea used by Kaplan and Tarjan in their paper "Purely functional, real-time deques with catenation" and a later refinement of that by Mihaescu and Tarjan. Munro et al.'s "Deterministic Skip Lists" also deserves a mention here, though translating skip lists to an immutable setting by using trees sometimes changes the properties that allow such efficient modification near the ends. For examples of the translation, see Messeguer's "Skip trees, an alternative data structure to Skip lists in a concurrent approach", Dean and Jones's "Exploring the duality between skip lists and binary search trees", and Lamoureux and Nickerson's "On the Equivalence of B-trees and deterministic skip lists".

    • Do the work in bulk. In the full binary tree example above, no rebalancing is needed on a push, but Ω(lg n) nodes need to have their height or balance information updated. Instead of actually doing the incrementation, you could simply mark the spine at the ends as needing incrementation.

      One way to understand this process is by analogy to binary numbers. (2^n)-1 is represented in binary by a string of n 1's. When adding 1 to this number, you need to change all of the 1's to 0's and then add a 1 at the end. The following Haskell encodes binary numbers as non-empty strings of bits, least significant first.

      data Bit = Zero | One
      
      type Binary = (Bit,[Bit])
      
      incr :: Binary -> Binary
      incr (Zero,x) = (One,x)
      incr (One,[]) = (Zero,[One])
      incr (One,(x:xs)) = 
          let (y,ys) = incr (x,xs)
          in (Zero,y:ys)
      

      incr is a recursive function, and for numbers of the form (One,replicate k One), incr calls itself Ω(k) times.

      Instead, we might represent groups of equal bits by only the number of bits in the group. Neighboring bits or groups of bits are combined into one group if they are equal (in value, not in number). We can increment in O(1) time:

      data Bits = Zeros Int | Ones Int
      
      type SegmentedBinary = (Bits,[Bits])
      
      segIncr :: SegmentedBinary -> SegmentedBinary
      segIncr (Zeros 1,[]) = (Ones 1,[])
      segIncr (Zeros 1,(Ones n:rest)) = (Ones (n+1),rest)
      segIncr (Zeros n,rest) = (Ones 1,Zeros (n-1):rest)
      segIncr (Ones n,[]) = (Zeros n,[Ones 1])
      segIncr (Ones n,(Zeros 1:Ones m:rest)) = (Zeros n,Ones (m+1):rest)
      segIncr (Ones n,(Zeros p:rest)) = (Zeros n,Ones 1:Zeros (p-1):rest)
      

      Since segIncr is not recursive and doesn't call any functions other than plus and minus on Ints, you can see it takes O(1) time.

      Some of the deques mentioned in the section above entitled "Anticipate where rebalancing will be needed" actually use a different numerically-inspired technique called "redundant number systems" to limit the rebalancing work to O(1) and locate it quickly. Redundant numerical representations are fascinating, but possibly too far afield for this discussion. Elmasry et al.'s "Strictly-regular number system and data structures" is not a bad place to start reading about that topic. Hinze's "Bootstrapping one-sided flexible arrays" may also be useful.

      In "Making data structures persistent", Driscoll et al. describe lazy recoloring, which they attribute to Tsakalidis. They apply it to red-black trees, which can be rebalanced after insertion or deletion with O(1) rotations (but Ω(lg n) recolorings) (see Tarjan's "Updating a balanced tree in O(1) rotations"). The core of the idea is to mark a large path of nodes that need to be recolored but not rotated. A similar idea is used on AVL trees in the older versions of Brown & Tarjan's "A fast merging algorithm". (Newer versions of the same work use 2-3 trees; I have not read the newer ones and I do not know if they use any techniques like lazy recoloring.)

    • Randomize. Treaps, mentioned above, can be implemented in a functional setting so that they perform deque operations in O(1) time on average. Since deques do not need to inspect their elements, this average is not susceptible to malicious input degrading performance, unlike simple (no rebalancing) binary search trees, which are fast on average input. Treaps use an independent source of random bits instead of relying on randomness from the data.

      In a persistent setting, treaps may be susceptible to degraded performance from malicious input with an adversary who can both (a) use old versions of a data structure and (b) measure the performance of operations. Because they do not have any worst-case balance guarantees, treaps can become quite unbalanced, though this should happen rarely. If an adversary waits for a deque operation that takes a long time, she can initiate that same operation repeatedly in order to measure and take advantage of a possibly unbalanced tree.

      If this is not a concern, treaps are an attractively simple data structure. They are very close to the AVL spine tree described above.

      Skip lists, mentioned above, might also be amenable to functional implementations with O(1) average-time deque operations.

      The first two techniques for bounding the rebalancing work require complex modifications to data structures while usually affording a simple analysis of the complexity of deque operations. Randomization, along with the next technique, has simpler data structures but more complex analysis. The original analysis by Seidel and Aragon is not trivial, and there is some complex analysis of exact probabilities using more advanced mathematics than is present in the papers cited above -- see Flajolet et al.'s "Patterns in random binary search trees".

    • Amortize. There are several balanced trees that, when viewed from the roots up (as explained in "Reverse the Spines", above), offer O(1) amortized insertion and deletion time. Individual operations can take Ω(lg n) time, but they put the tree in such a nice state that a large number of operations following the expensive operation will be cheap.

      Unfortunately, this kind of analysis does not work when old versions of the tree are still around. A user can perform operations on the old, nearly-out-of-balance tree many times without any intervening cheap operations.
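To see the problem in a small case, here is a sketch that swaps in a much simpler structure than the trees discussed here: the classic immutable queue built from two lists, where popping from an empty front list reverses the back list. All names (`List`, `Queue`, `push_back`, `pop_front`, and the two cost functions) are assumptions made for this example. The counter `steps` counts allocated list cells as a stand-in for running time.

```c
#include <assert.h>
#include <stdlib.h>

struct List { int head; const struct List * tail; };
struct Queue { const struct List * front; const struct List * back; };

static long steps = 0;  /* number of list cells allocated so far */

/* Cells are never freed in this sketch. */
static const struct List * cons(int x, const struct List * tail) {
  struct List * c = malloc(sizeof *c);
  if (c == NULL) abort();
  c->head = x;
  c->tail = tail;
  steps++;
  return c;
}

/* O(1): returns a new version; q itself is unchanged. */
struct Queue push_back(struct Queue q, int x) {
  struct Queue r = { q.front, cons(x, q.back) };
  return r;
}

/* Pops the oldest element of a non-empty queue into *out. If the front
   list is empty, the whole back list is reversed first, which costs
   one step per element. */
struct Queue pop_front(struct Queue q, int * out) {
  const struct List * f = q.front;
  const struct List * b = q.back;
  if (f == NULL) {
    for (; b != NULL; b = b->tail) f = cons(b->head, f);
  }
  *out = f->head;
  struct Queue r = { f->tail, b };
  return r;
}

/* Pop `repeats` times from the SAME old version (requires n >= 1). */
long cost_of_repeating_pop(int n, int repeats) {
  struct Queue q = { NULL, NULL };
  int x;
  for (int i = 0; i < n; i++) q = push_back(q, i);
  long before = steps;
  for (int i = 0; i < repeats; i++) (void)pop_front(q, &x);
  return steps - before;
}

/* Pop `repeats` times, each from the newest version (requires repeats <= n). */
long cost_of_chained_pops(int n, int repeats) {
  struct Queue q = { NULL, NULL };
  int x;
  for (int i = 0; i < n; i++) q = push_back(q, i);
  long before = steps;
  for (int i = 0; i < repeats; i++) q = pop_front(q, &x);
  return steps - before;
}
```

With n = 1000 and 10 pops, the chained version pays for the reversal once (1000 steps), while popping the same old version ten times pays for it every time (10000 steps). The cheap pushes that were supposed to pay for one reversal cannot pay for ten.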

      One way to get amortized bounds in a persistent setting was invented by Chris Okasaki. It is not simple to explain how the amortization survives the ability to use arbitrary old versions of a data structure, but if I remember correctly, Okasaki's first (as far as I know) paper on the subject has a pretty clear explanation. For more comprehensive explanations, see his thesis or his book.

      As I understand it, there are two essential ingredients. First, instead of just guaranteeing that a certain number of cheap operations occur before each expensive operation (the usual approach to amortization) you actually designate and set up that specific expensive operation before performing the cheap operations that will pay for it. In some cases, the operation is scheduled to be started (and finished) only after many intervening cheap steps. In other cases, the operation is actually scheduled only O(1) steps in the future, but cheap operations may do part of the expensive operation and then reschedule more of it for later. An adversary looking to repeat an expensive operation over and over again is actually reusing the same scheduled operation each time. This sharing is where the second ingredient comes in.

      The computation is set up using laziness. A lazy value is not computed immediately, but, once performed, its result is saved. The first time a client needs to inspect a lazy value, its value is computed. Later clients can use that cached value directly, without having to recompute it.

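      #include <stdlib.h>

      /* A suspended computation: oper(arg) is run at most once. */
      struct lazy {
        int (*oper)(const char *);
        char * arg;
        int* ans;   /* null until the value has been forced */
      };

      typedef struct lazy * lazyop;

      lazyop suspend(int (*oper)(const char *), char * arg) {
        lazyop ans = (lazyop)malloc(sizeof(struct lazy));
        ans->oper = oper;
        ans->arg = arg;
        ans->ans = 0;   /* not computed yet; malloc does not zero memory */
        return ans;
      }

      void force(lazyop susp) {
        if (0 == susp) return;
        if (0 != susp->ans) return;   /* already computed: keep the cached result */
        susp->ans = (int*)malloc(sizeof(int));
        *susp->ans = susp->oper(susp->arg);
      }

      /* susp must not be null */
      int get(lazyop susp) {
        force(susp);
        return *susp->ans;
      }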
      

      Laziness constructs are included in some MLs, and Haskell is lazy by default. Under the hood, laziness is a mutation, which leads some authors to call it a "side effect". That might be considered bad if that kind of side effect doesn't play well with whatever the reasons were for selecting an immutable data structure in the first place, but, on the other hand, thinking of laziness as a side effect allows the application of traditional amortized analysis techniques to persistent data structures, as mentioned in a paper by Kaplan, Okasaki, and Tarjan entitled "Simple Confluently Persistent Catenable Lists".

      Consider again the adversary from above who is attempting to repeatedly force the computation of an expensive operation. After the first force of the lazy value, every remaining force is cheap.
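A small experiment along these lines, as a self-contained sketch: it uses a compact variant of the lazy-value code above (with the cached answer explicitly starting out as NULL) and counts how often the suspended function actually runs. The names `slow_length`, `calls` and `forced_calls_demo` are made up for this example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct lazy {
  int (*oper)(const char *);
  const char * arg;
  int * ans;  /* NULL until the value has been forced */
};

typedef struct lazy * lazyop;

lazyop suspend(int (*oper)(const char *), const char * arg) {
  lazyop l = malloc(sizeof *l);
  if (l == NULL) abort();
  l->oper = oper;
  l->arg = arg;
  l->ans = NULL;
  return l;
}

/* Computes the value on first use and returns the cached copy afterwards.
   l must not be NULL. */
int get(lazyop l) {
  if (l->ans == NULL) {
    l->ans = malloc(sizeof(int));
    if (l->ans == NULL) abort();
    *l->ans = l->oper(l->arg);
  }
  return *l->ans;
}

static int calls = 0;  /* how many times the "expensive" work really ran */

static int slow_length(const char * s) {
  calls++;
  return (int)strlen(s);
}

/* Force the same suspension `times` times; return how often
   slow_length was actually executed. */
int forced_calls_demo(int times) {
  lazyop l = suspend(slow_length, "deque");
  calls = 0;
  for (int i = 0; i < times; i++) {
    int v = get(l);
    assert(v == 5);
    (void)v;
  }
  free(l->ans);
  free(l);
  return calls;
}
```

`forced_calls_demo(1000)` returns 1: however many times the adversary asks for the value, the expensive function runs only once, and every later request is a pointer check and a read.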

      In his book, Okasaki explains how to build deques with O(1) amortized time required for each operation. It is essentially a B+-tree, which is a tree where all of the elements are stored at the leaves, nodes may vary in how many children they have, and every leaf is at the same depth. Okasaki uses the spine-reversal method discussed above, and he suspends (that is, stores as a lazy value) the spines above the leaf elements.

      A structure by Hinze and Paterson called "Finger trees: a simple general-purpose data structure" is halfway between the deques designed by Okasaki and the "Purely functional representations of catenable sorted lists" of Kaplan and Tarjan. Hinze and Paterson's structure has become very popular.

      As evidence of how tricky the amortized analysis is to understand, Hinze and Paterson's finger trees are frequently implemented without laziness, making the time bounds not O(1) but still O(lg n). One implementation that seems to use laziness is the one in functional-dotnet. That project also includes an implementation of lazy values in C# which might help explain them if my explanation above is lacking.

Could deques be implemented as binary trees? Yes, and their worst-case complexity when used persistently would be no worse than those presented by Eric Lippert. However, Eric's trees are actually not complicated enough to get O(1) deque operations in a persistent setting, though only by a small complexity margin (making the center lazy) if you are willing to accept amortized performance. A different but also simple view of treaps can get O(1) expected performance in a functional setting, assuming an adversary who is not too tricky. Getting O(1) worst-case deque operations with a tree-like structure in a functional setting requires a good bit more complexity than Eric's implementations.


Two final notes (though this is a very interesting topic and I reserve the right to add more later) :-)

  1. Nearly all of the deques mentioned above are finger search trees as well. In a functional setting this means they can be split at the ith element in O(lg(min(i,n-i))) time and two trees of size n and m can be concatenated in O(lg(min(n,m))) time.

  2. I know of two ways of implementing deques that don't use trees. Okasaki presents one in his book and thesis and the paper I linked to above. The other uses a technique called "global rebuilding" and is presented in Chuang and Goldberg's "Real-time deques, multihead Turing machines, and purely functional programming".

In his seminal thesis, Chris Okasaki described the technique of data-structural bootstrapping. What work, if any, has been done to use this technique to improve locality in data structures?

For example, balanced binary trees are commonly used to create purely functional sets and dictionaries, but a hash trie of small arrays is often significantly faster due to improved locality.

You could try references to his book by Haskell or Clojure folk rather than just the CMU pdf : e.g.,

http://www.amazon.com/Purely-Functional-Structures-Chris-Okasaki/dp/0521663504

There was a question here on SO at :

What is the benefit of purely functional data structure?

There is also this in the Clojure area :

https://github.com/viksit/clojure-datastructures

And there was this on SE :

http://cstheory.stackexchange.com/questions/1539/whats-new-in-purely-functional-data-structures-since-okasaki

Hope something there provides a basis for a search that bears results :-)

You may have to use an academic or biz ref search engine and you may want to look at poster sessions at a conf because search is not obvious here, e.g., Mercury can generate Erlang code ... so searching caching and locality with respect to performance in functional programming in some hardware area dealing with latency.

Canada's National Research Council (NRC) had some work going on ... you could try a search of their pub's/notices/reports

But note: a search with

bigdata latency locality NRC 2012

gives rather different results from

bigdata functional latency locality NSF