Algorithms + Data Structures = Programs

Niklaus Wirth


Fundamental data structures; Sorting; Recursive algorithms; Dynamic information structures; Language structures and compilers.


Mentioned in questions and answers.

Inspired by Eric Sink's interview on the Stack Overflow podcast, I would like to build a full compiler in my spare time for the learning experience. My initial thought was to build a C compiler, but I'm not sure whether it would take too much time.

I am wondering if there is a smaller general-purpose language that would be more appropriate to implement as a first compiler effort. Or is a C implementation doable on a reasonable timescale (200 hrs)?

It is my intention to target the CLR.

Pascal has already been mentioned, but I'd like to add that Niklaus Wirth's book Algorithms + Data Structures = Programs contains a complete implementation of a small Pascal-like language using recursive descent. If you're looking for a theory-intensive discussion of parsing, look elsewhere; but if you want straightforward code that lets you learn by doing, then I'd recommend A + DP = P.

There are a lot of old Delphi books available inexpensively. As a self-taught (advanced) beginner, it is difficult for me to know which ideas are still relevant and up to date, and which have become outdated. I'm hoping for a little guidance. For example, would it be outmoded to learn about databases powered by the BDE? Is COM no longer a commonly used model? (Note: I may be off in how I'm using these terms; I don't know much about them.)

Thanks for your thoughts!

The death of a coding practice is a very relative thing. I still know of COBOL applications that are running, mainly because they still work and don't deal with dates, so it was more than OK to just let them run. Sometimes it might not be the best "new way" of doing things, but if it works without any changes, why mess with it?

The concept of COM hasn't really died; it's evolving, and knowing how to use it can help you understand and apply the latest evolution. Do you need to know assembly to be a good Delphi programmer? Absolutely not, but it is knowledge that can help you understand how to better optimize your routines.

In Delphi, COM isn't just about the object model. It's also about interfaces. Interfaces can still be a very useful tool in the bag, and if you know COM development in Delphi, you know how interfaces work.

As for the legacy books, I say keep them on the shelf and glance at them from time to time. Sometimes looking back might help you leap forward. It's why I have a copy of Algorithms + Data Structures = Programs on my shelf. Funny thing: most of the code in the book still compiles with a few minor changes. Sure, the code is not OOP, but the concepts are still ones I use today. You might be amazed at how much a binary tree hasn't changed, and how the best way to optimize it is still the same. Or how sometimes a simple old-master/new-master routine is faster than loading the data into a SQL table and then performing an update.

It's not ALWAYS about the cool factor... sometimes it's about what works.
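For the curious, the old-master/new-master idea mentioned above is just a sequential merge of two sorted files. Here is a toy sketch; the record format (plain strings) and the matching rule are my own simplifications:

```java
import java.util.ArrayList;
import java.util.List;

// Classic old-master/new-master update: merge a sorted master list with a
// sorted list of updates in one sequential pass, producing a new master.
public class MasterUpdate {
    static List<String> applyUpdates(List<String> oldMaster, List<String> updates) {
        List<String> newMaster = new ArrayList<>();
        int i = 0, j = 0;
        while (i < oldMaster.size() && j < updates.size()) {
            int cmp = oldMaster.get(i).compareTo(updates.get(j));
            if (cmp < 0) newMaster.add(oldMaster.get(i++));      // keep unchanged record
            else if (cmp > 0) newMaster.add(updates.get(j++));   // insert new record
            else { newMaster.add(updates.get(j++)); i++; }       // update replaces old
        }
        while (i < oldMaster.size()) newMaster.add(oldMaster.get(i++));
        while (j < updates.size()) newMaster.add(updates.get(j++));
        return newMaster;
    }

    public static void main(String[] args) {
        System.out.println(applyUpdates(List.of("ada", "cobol", "pascal"),
                                        List.of("cobol", "delphi")));
        // prints [ada, cobol, delphi, pascal]
    }
}
```

One pass, no indexes, no round trips: that is why it can beat loading everything into a SQL table for a simple batch update.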

I have a problem for a university lab:

Write a short program that outputs all possible strings formed by using characters ‘c’, ‘a’, ‘r’, ‘b’, ‘o’, and ‘n’ exactly once.

It seems to be a common interview question and well documented.

So I've coded it in Java using a recursive method, which wasn't too hard. When or why would you choose not to use recursion, and what would be the easiest way of doing it without recursion?

I started to code a counter that would count in base 6; the output would then index into the chars and print the string.

Thanks,

Niklaus Wirth's Algorithms + Data Structures = Programs has a section called "When not to use recursion", but recursion is a useful tool for a programmer. I think that understanding recursion is a must for a programmer.

Your counter is a clever approach to the permutation problem. It can also be solved recursively (C#-like pseudocode):

private void PermutationsRecursive(string prefix, string s)
{
    int N = s.Length;

    if (N > 0)
        for (int i = 0; i < N; i++)
            PermutationsRecursive(prefix + s[i], s.Substring(0, i) + s.Substring(i + 1, N - i - 1));
    else
        WriteLine(prefix);
}

PermutationsRecursive("", "carbon");
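If you'd rather avoid recursion entirely, one standard iterative alternative (my sketch, not part of the answer above) generates the permutations in lexicographic order with a next-permutation step:

```java
import java.util.Arrays;

public class IterativePermutations {
    // Advance a to its next lexicographic permutation in place.
    // Returns false when a is already the last (fully descending) permutation.
    static boolean nextPermutation(char[] a) {
        int i = a.length - 2;
        while (i >= 0 && a[i] >= a[i + 1]) i--;   // rightmost ascent
        if (i < 0) return false;
        int j = a.length - 1;
        while (a[j] <= a[i]) j--;                 // smallest larger element to its right
        char t = a[i]; a[i] = a[j]; a[j] = t;
        for (int l = i + 1, r = a.length - 1; l < r; l++, r--) { // reverse the suffix
            t = a[l]; a[l] = a[r]; a[r] = t;
        }
        return true;
    }

    public static void main(String[] args) {
        char[] chars = "carbon".toCharArray();
        Arrays.sort(chars);   // start from the lexicographically smallest arrangement
        int count = 0;
        do {
            System.out.println(new String(chars));
            count++;
        } while (nextPermutation(chars));
        System.out.println(count + " permutations"); // 720 = 6!
    }
}
```

This is the same algorithm behind C++'s std::next_permutation, and it needs only O(1) extra space beyond the array itself.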

I made a custom LinkedList in Java, which is somewhat of a cross between a map and a list. I did this exercise only to learn; I know that HashMap is a better and faster implementation. I implemented a delete method for LinkedList, but am a little confused about the most optimal way to write deleteAll, which deletes all occurrences of a particular element.

Code:

public class LinkedListMain
{
    public static void main(String[] args)
    {
        LinkedList linkedList = new LinkedList();

        System.out.println("isEmpty: " + linkedList.isEmpty());

        linkedList.insert("abc", 34);
        linkedList.insert("pqr", 44);
        linkedList.insert("xyz", 54);
        linkedList.insert("asd", 64);
        linkedList.insert("abc", 74);

        linkedList.print();

/*      System.out.println("delete: " + linkedList.delete("abc"));
        System.out.println("delete: " + linkedList.delete("pqr"));
        System.out.println("delete: " + linkedList.delete("xyz"));
        System.out.println("delete: " + linkedList.delete("asd"));
*/
        System.out.println("deleteAll: " + linkedList.deleteAll("abc"));
        System.out.println("isEmpty: " + linkedList.isEmpty());
    }
}

class LinkedList
{
    private ListNode first;
    private ListNode last;

    public LinkedList()
    {
        first = null;
        last = first;
    }

    public void insert(String d1, int d2)
    {
        ListNode node = new ListNode(d1, d2);

        if(first == null)
        {
            node.next = null;
            first = node;
            last = node;
        }

        else
        {
            last.next = node;
            node.next = null;
            last = node;
        }
    }

    public String deleteAll(String str)
    {
        return "To Be Implemented";
    }

    public String delete(String str)
    {
        ListNode slow = first;
        ListNode fast = first;

        int count = 0;

        while(fast != null)
        {
            if(count > 1)
            {
                slow = slow.next;
            }

            if(count <= 1)
            {
                count++;
            }

            if(fast.getVal()==str)
            {
                if(fast == first)
                {
                    first = first.next;
                }

                else
                {
                    if(fast.next != null)
                    {
                        slow.next = fast.next;
                    }

                    else
                    {
                        slow.next = null;
                    }
                }

                fast = null;
                return str;     // fast.getVal()
            }

            fast = fast.next;
        }

        return "not found";
    }

    public void print()
    {
        ListNode currentNode = first;

        while(currentNode != null)
        {
            currentNode.print();
            currentNode = currentNode.next;
        }
    }

    public boolean isEmpty()
    {
//      return ( ((first==null) ? (true) : (false)) && ((last==null) ? (true) : (false)));
        return (first==null) ? (true) : (false);
    }
}

class ListNode
{
    private String data1;
    private int data2;

    public ListNode next;

    public ListNode(String d1, int d2)
    {
        data1 = d1;
        data2 = d2;
    }

    public String getVal()
    {
        return data1;
    }

//  public void printMe(ListNode node)
    public void print()
    {
        System.out.println("data1: [" + data1 + "], data2: [" + data2 + "]");
    }
}

I have 3 questions associated with this example:

  1. Should the ideal deleteAll function repeatedly use my delete function? Should I make changes to my delete function to accommodate this?
  2. Should the isEmpty function ideally compare first & last both to null? If last should be compared to null, then how should I change my delete & deleteAll functions for this to be implemented. I tried doing this with the current delete function but ran into some problems.
  3. In general, can this code be optimized significantly? Not in the sense that "if you need the perfect linked list, use the Collections", simply asking how to exactly optimize this singly linked list more if possible?

For deleteAll(), it is probably most convenient to pass the previous node to a node-deletion method. If delete() is expected to do anything beyond the obvious pointer manipulation, that action can be factored out:

/** 
 Deletes all nodes that contain target_string. 
 Returns the number of nodes deleted.
*/
public int deleteAll(String target_string) {
  int deleted_nodes_cnt = 0;
  ...
  if (prev_node.next.getVal().equals(target_string)) { // not ==
    deleteNextNode(prev_node); 
    prev_node = prev_node.next; 
    deleted_nodes_cnt += 1;
  }
  ...
  return deleted_nodes_cnt;
}

/** Delete the node after prev_node; prev_node.next != null */
private void deleteNextNode(Node prev_node) {
  Node dead_node = prev_node.next;
  prev_node.next = prev_node.next.next;
  dead_node.afterDelete(); // any custom cleanup, if required
}

public boolean delete(String target_string) {
   ... 

  if (prev_node.next.getVal().equals(target_string)) { // looks familiar?
    deleteNextNode(prev_node);
    return true; 
  }
  ...
  return false;
}

You may notice that delete() and deleteAll() use the same list iteration logic that can be nicely factored out, too.

/**
 Scan nodes, starting from this.
 Returns node X such that X.next.value equals target_string, or null. 
*/
private Node findNodeBeforeMatching(String target_string) {
  Node prev_node = this;
  while (prev_node.next != null) {
    if (prev_node.next.getVal().equals(target_string)) return prev_node;
    else prev_node = prev_node.next;
  }
  return null;
}

To use this method effectively, you will need to make LinkedList (essentially a "list head keeper") a subclass of Node, or make them both subclasses of a common class. Even better, make them implement the same interface that provides getNext() and deleteNext(). Alternatively, you could return a Node from each operation, but that is not compatible with the Collection interface, of course.
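For illustration, here is a self-contained sketch of the same single-pass idea using a dummy head node instead of subclassing (the names and structure are mine, not the code above). The dummy head lets every deletion, including the first element's, go through the same prev.next unlink:

```java
// Single-pass deleteAll() over a singly linked list with a dummy head node.
public class SketchList {
    private static class Node {
        final String val;
        Node next;
        Node(String val) { this.val = val; }
    }

    private final Node head = new Node(null); // dummy head, never deleted

    void insert(String val) {
        Node n = head;
        while (n.next != null) n = n.next;
        n.next = new Node(val);
    }

    // Delete every node holding target; return the number removed.
    int deleteAll(String target) {
        int deleted = 0;
        for (Node prev = head; prev.next != null; ) {
            if (prev.next.val.equals(target)) { // .equals, not ==
                prev.next = prev.next.next;     // unlink, and do NOT advance prev
                deleted++;
            } else {
                prev = prev.next;
            }
        }
        return deleted;
    }

    public static void main(String[] args) {
        SketchList list = new SketchList();
        list.insert("abc"); list.insert("pqr"); list.insert("abc");
        System.out.println(list.deleteAll("abc")); // 2
    }
}
```

With the dummy head there is no special case for deleting the first node, which is exactly the role the "list head keeper" plays in the design described above.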

Old, incorrect text: deleteAll() should not call the individual delete() method, unless that method may do something special in descendant classes. The reason is that such a deleteAll() is O(1), runs in constant time, while traversing the list to delete() each node is O(n).

As far as I can tell, the last instance variable serves only to speed up appending elements to the end of the list (though insert() does strange things to it). So isEmpty() should really check only first. If first == null, the method may assert last == null; I assume that having first == null but last != null means our bookkeeping was wrong, our data has been corrupted, and we can't continue.

As for optimizing, I don't see how your byzantine delete() is supposed to work. I don't think having two pointers speeds anything up in your case. Running two pointers at different "speeds" is a known way to detect cycles, but I don't see how that's applicable here.

If you want a speed-up on sorted data, read about the skip list, or use a tree. On unsorted data, a plain sequential scan is your best bet.

To me, the entire LinkedList class should be half as long, since it's logically so simple.

Your ListNode and LinkedList are unnecessarily tightly coupled in insert(), which should IMHO accept an instance rather than construct one. Better yet, ListNode and LinkedList should be the same thing, as in classical implementations.

Get Wirth's book on data structures and algorithms; it's very approachable. (When you have understood it fully, try to continue with Knuth's books, and if you're really brave, with Okasaki's book.)

I think parser generators are a pretty nice tool to have in your programming toolkit, so after playing around with some I wrote my own just to understand things better. It turned out better than I expected, so I've stuck with it.

One thing that has been bugging me lately, though, is error reporting and recovery. I don't do a very good job of it. I know one method is token synchronization, but the trail seems to stop there. Other than rolling your own recursive descent parser and including all sorts of heuristics, what are some general-purpose ways of handling error reporting and error recovery in parser generators?

With PEG, which is top-down, you can implement the "cut" feature, either automatically, or for manual inclusion, so you can report errors as close to their source as possible. See Grako and the referenced article by Kota Mizushima. A "cut" invalidates alternatives after certain tokens are seen on the input, so the parser can know how to fail early.

In general, I don't like error recovery, as the errors reported after the first tend to be a nuisance, as Turbo Pascal once proved.

The general strategy for recovery is to perform rewrites, inserts or deletes, on the input sequence so the parser can continue. For a simple recovery strategy based solely on deletes (skipping input until an expected token), see section 5.9 of Wirth's A+D=P.
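A minimal sketch of that delete-based recovery in a hand-rolled setting follows; the token-stream shape and the expect/syncSet names are my own assumptions, not taken from any particular generator:

```java
import java.util.List;
import java.util.Set;

// Panic-mode recovery sketch: on a mismatch, report the error once, then
// delete (skip) input tokens until one from the synchronization set appears,
// so parsing can resume at a known point.
public class PanicModeParser {
    private final List<String> tokens;
    private int pos = 0;
    int errors = 0;

    PanicModeParser(List<String> tokens) { this.tokens = tokens; }

    String peek() { return pos < tokens.size() ? tokens.get(pos) : "<eof>"; }
    String next() { return pos < tokens.size() ? tokens.get(pos++) : "<eof>"; }

    // Expect a specific token; on failure, report and skip to a sync token.
    void expect(String wanted, Set<String> syncSet) {
        if (peek().equals(wanted)) { next(); return; }
        errors++;
        System.out.println("error: expected '" + wanted + "', got '" + peek() + "'");
        while (!peek().equals("<eof>") && !syncSet.contains(peek())) next();
    }

    public static void main(String[] args) {
        PanicModeParser p = new PanicModeParser(List.of("x", "y", ";", "z"));
        p.expect(";", Set.of(";"));  // mismatch: reports once, skips "x" and "y"
        p.expect(";", Set.of(";"));  // now matches and consumes ";"
        System.out.println("errors: " + p.errors + ", next token: " + p.peek());
    }
}
```

Real generators automate the choice of synchronization tokens (yacc's error token, ANTLR's follow-set resynchronization), but the principle is the same skip-until-sync loop.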

I was able to get the binary code of double numbers:

Double d = 1.5E12;
long l = Double.doubleToLongBits(d);
String bin = Long.toBinaryString(l);
System.out.println(bin);
System.out.println(bin.length());

However, the resulting string doesn't look like a double's bit pattern; it doesn't make any sense:

1.5E12   --> 100001001110101110100111110111101111001100000000000000000000000 (length: 63)
-1.5E12  --> 1100001001110101110100111110111101111001100000000000000000000000 (length: 64)
1.5E-12  --> 11110101111010011000110110011001000001110001001101111100011010 (length: 62)
-1.5E-12 --> 1011110101111010011000110110011001000001110001001101111100011010 (length: 64)

I tried to understand those numbers by grouping them:

1.5E12   -->     10000100111  0101110100111110111101111001100000000000000000000000
-1.5E12  --> 1   10000100111  0101110100111110111101111001100000000000000000000000
1.5E-12  -->   (0)1111010111  1010011000110110011001000001110001001101111100011010
-1.5E-12 --> 1   01111010111  1010011000110110011001000001110001001101111100011010

First of all, it adds/removes a bit for the minus sign (that's a bad thing... speaking of parsers). But even more, the exponent shows very dubious numbers: e.g., 10000100111 is supposed to be 1035 (= 1023 + 12) but is instead 1063 (1063 - 1023 = 40!), and (0)1111010111 is supposed to be 1011 (= 1023 - 12) but is instead 983 (983 - 1023 = -40!).

Does anyone know how to read this double bit code? I.e., how do I get the right exponent and mantissa out of the above bit codes?

(An example: How to get the value 1.5E12 back again out of the bit code? 100001001110101110100111110111101111001100000000000000000000000 --?--> 1.5E12)

UPDATE:

Using the masks from the Java API, I ended up getting the values like this:

static final long SIGN = 0x8000000000000000L;
static final long EXPN = 0x7ff0000000000000L;
static final long SGNF = 0x000fffffffffffffL;

Double d = ...;
long lng = Double.doubleToLongBits(d);
String bin = Long.toBinaryString(lng);
long sign = (lng & SIGN) >>> 63;
long expn = (lng & EXPN) >>> 52;
long sgnf = lng & SGNF;

And I could easily print it:

System.out.println("sign-bin: "+Long.toBinaryString(sign));
System.out.println("expn-bin: "+Long.toBinaryString(expn));
System.out.println("sgnf-bin: "+Long.toBinaryString(sgnf));
System.out.println("sign-string: "+Long.toString(sign));
System.out.println("expn-string: "+Long.toString(expn));
System.out.println("sgnf-string: "+Long.toString(sgnf));

With the Double 1.5E12 I got this result:

sign-bin: 0
expn-bin: 10000100111
sgnf-bin: 101110100111110111101111001100000000000000000000000
sign-string: 0
expn-string: 1063
sgnf-string: 1640400372629504

Do you know how to get the "real" decimal values out of them? (E.g. 1640400372629504 --> 1.5 and 1063 --> E12)

As the Wikipedia page on double-precision floating point says, the value of the number is

value = (-1)^sign * 1.fraction (in binary) * 2^(exponent - 1023)

(this is the IEEE 754 double-precision format).

So the binary exponent is 1063 - 1023 = 40. This makes sense because it multiplies the mantissa by 2^40, which is about 10^12 (because 2^10 is about 1000, 2^40 is about 1000^4). (This method of coding exponents is called a biased, or excess-1023, representation. It's used rather than two's complement because it leads to simpler hardware.)

The mantissa raw bits are 0101110100111110111101111001100000000000000000000000, so prepending the 1 as the formula above says, we have:

1.0101110100111110111101111001100000000000000000000000

Now, shifting the binary point 40 places to the right to account for the exponent, we have:

10101110100111110111101111001100000000000.000000000000

Happily, in decimal this is the integer 1500000000000, exactly what we want.
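As a sanity check (my snippet, using the field values printed in the question's update), repacking the three fields gives the original value back, since Double.longBitsToDouble is the inverse of doubleToLongBits:

```java
// Repack sign, biased exponent, and significand into a 64-bit pattern and
// let the JVM interpret it as a double again.
public class Reconstruct {
    static double fromFields(long sign, long expn, long sgnf) {
        long bits = (sign << 63) | (expn << 52) | sgnf;
        return Double.longBitsToDouble(bits);
    }

    public static void main(String[] args) {
        // Question's values for 1.5E12: sign 0, expn 1063, sgnf 1640400372629504
        System.out.println(fromFields(0, 1063, 1640400372629504L)); // 1.5E12
    }
}
```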

From this you should be able to get the algorithm. The first step is to compute the integer part (just as I have done above) and use the standard algorithm for printing integers:

i = 0;
while (int_part > 0) {
   digit[i] = int_part % 10
   int_part = int_part / 10
   i = i + 1
}
if (i == 0) return '0'
else reverse digits[0..i-1] and return

This leaves the fraction bits. If there are N bits, then treat them as an N-bit integer and divide by 2^N. For example, a 1-bit fraction of 1 is 1/2. You can get decimal digits by implementing long division. For example, binary .11 is 3/4. Doing the long division, we first get 3*10/4 = 7R2, then 2*10/4 = 5R0. So the decimal digits are .75. In pseudocode:

num = frac_part
den = 2 ^ (number of bits in frac_part)
i = 0
do {
  num = num * 10
  digit[i] = num / den
  num = num % den
  i = i + 1
} while (num != 0)
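The two loops can be combined into a runnable sketch (class and method names are mine; it assumes the integer part and num * 10 fit in a long, so it is conceptual rather than printf-grade):

```java
// Turn an integer part plus an N-bit fraction into a decimal string using
// the digit-extraction and long-division loops described above.
public class BitsToDecimal {
    static String toDecimal(long intPart, long fracPart, int fracBits) {
        StringBuilder digits = new StringBuilder();
        long ip = intPart;
        do {                              // integer digits, least significant first
            digits.append(ip % 10);
            ip /= 10;
        } while (ip > 0);
        StringBuilder out = digits.reverse();
        if (fracPart != 0) {
            out.append('.');
            long num = fracPart;
            long den = 1L << fracBits;    // 2^N
            while (num != 0) {            // long division: one decimal digit per step
                num *= 10;
                out.append(num / den);
                num %= den;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toDecimal(0, 0b11, 2)); // binary .11  -> 0.75
        System.out.println(toDecimal(3, 0b1, 1));  // binary 11.1 -> 3.5
    }
}
```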

Note these algorithms are conceptual, not practical. For one thing, they assume arbitrary precision integers. In practice you don't want this overhead, so calculations are actually done in floating point. The good old book Algorithms + Data Structures = Programs by Wirth goes into some detail about how this works.

If you are implementing printf or similar library function, the details of base conversion are extremely hairy to get right with good speed. By "right" I mean being able to print a number in base-10 and read that representation back in with perfect confidence that you'll get exactly the same number. Don't underestimate this problem.

Also, as others have said, your Java print routines are just eliding all leading zeros because they're intended for people, and people usually don't care about leading zeros.

I am trying to figure out how to remove a node from a binary search tree. I understand that the procedure differs depending on whether the node is a leaf, has one child, or has two children. So as of now my function is nothing more than:


bool BinSTree::remove_root(treeNode*& node) {
   if(node -> left == NULL && node -> right == NULL) {


   }
   else if(node -> left != NULL && node -> right != NULL) {


   }  
   else {

   }                           

}

I am having a very hard time comprehending the logic. For instance, I know that for all of the cases I need to be able to find the parent of the node being deleted, which is where I am left clueless. Any help would be much appreciated!

In addition to the other answers (and assuming this is homework, the point being to learn), you may find Niklaus Wirth's "Algorithms + Data Structures = Programs" very enlightening, both in general and for your particular problem.

It's a classic little book, a gem.

Hopefully available at your nearest (university?) library?

Cheers & hth.,
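For reference, the three cases the question lists can be sketched like this (in Java rather than C++, with names of my own; returning the new subtree root is one standard way to avoid hunting for the parent node explicitly):

```java
public class BstDelete {
    static class Node {
        int key;
        Node left, right;
        Node(int key) { this.key = key; }
    }

    static Node insert(Node n, int key) {
        if (n == null) return new Node(key);
        if (key < n.key) n.left = insert(n.left, key);
        else if (key > n.key) n.right = insert(n.right, key);
        return n;
    }

    // Remove key from the subtree rooted at n and return the new subtree root.
    // The caller re-links via "n.left = remove(n.left, key)", so no explicit
    // parent pointer is ever needed.
    static Node remove(Node n, int key) {
        if (n == null) return null;                 // key not present
        if (key < n.key) n.left = remove(n.left, key);
        else if (key > n.key) n.right = remove(n.right, key);
        else if (n.left == null) return n.right;    // leaf, or only a right child
        else if (n.right == null) return n.left;    // only a left child
        else {                                      // two children: copy in the
            Node succ = n.right;                    // in-order successor's key,
            while (succ.left != null) succ = succ.left;
            n.key = succ.key;                       // then delete the successor,
            n.right = remove(n.right, succ.key);    // which has at most one child
        }
        return n;
    }

    public static void main(String[] args) {
        Node root = null;
        for (int k : new int[] {5, 3, 8, 7}) root = insert(root, k);
        root = remove(root, 5);                     // exercises the two-child case
        System.out.println(root.key);               // 7, the successor of 5
    }
}
```

The two-child case reduces to one of the easy cases: the in-order successor (leftmost node of the right subtree) has no left child, so deleting it never recurses into the hard case again.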