Algorithms in C++, Parts 1-4

Robert Sedgewick

Mentioned 6

Robert Sedgewick has thoroughly rewritten and substantially expanded and updated his popular work to provide current and comprehensive coverage of important algorithms and data structures. Christopher Van Wyk and Sedgewick have developed new C++ implementations that both express the methods in a concise and direct manner, and also provide programmers with the practical means to test them on real applications. Many new algorithms are presented, and the explanations of each algorithm are much more detailed than in previous editions. A new text design and detailed, innovative figures, with accompanying commentary, greatly enhance the presentation. The third edition retains the successful blend of theory and practice that has made Sedgewick's work an invaluable resource for more than 250,000 programmers! This particular book, Parts 1n4, represents the essential first half of Sedgewick's complete work. It provides extensive coverage of fundamental data structures and algorithms for sorting, searching, and related applications. Although the substance of the book applies to programming in any language, the implementations by Van Wyk and Sedgewick also exploit the natural match between C++ classes and ADT implementations. Highlights Expanded coverage of arrays, linked lists, strings, trees, and other basic data structures Greater emphasis on abstract data types (ADTs), modular programming, object-oriented programming, and C++ classes than in previous editions Over 100 algorithms for sorting, selection, priority queue ADT implementations, and symbol table ADT (searching) implementations New implementations of binomial queues, multiway radix sorting, randomized BSTs, splay trees, skip lists, multiway tries, B trees, extendible hashing, and much more Increased quantitative information about the algorithms, giving you a basis for comparing them Over 1000 new exercises to help you learn the properties of algorithms Whether you are learning the algorithms for the first time or wish to have up-to-date reference material that incorporates new programming styles with classic and new algorithms, you will find a wealth of useful information in this book.

More on Amazon.com

Mentioned in questions and answers.

How do you merge 2 Binary Search Trees in such a way that the resultant tree contains all the elements of both the trees and also maintains the BST property.

I saw the solution provided in how to merge two BST's efficiently?

However that solution involves converting into a Double Linked List. I was wondering if there is a more elegant way of doing this which could be done in place without the conversion. I came up with the following pseudocode. Does it work for all cases? Also I am having trouble with the 3rd case.

node* merge(node* head1, node* head2) {
        if (!head1)
            return head2;
        if (!head2)
            return head1;

        // Case 1.
        if (head1->info > head2->info) {
            node* temp = head2->right;
            head2->right = NULL;
            head1->left = merge(head1->left, head2);
            head1 = merge(head1, temp);
            return head1;
        } else if (head1->info < head2->info)  { // Case 2
            // Similar to case 1.
        } else { // Case 3
            // ...
        }
}

The following algorithm is from Algorithms in C++.

The idea is almost the same as in the algorithm posted by PengOne. This algorithm does in place merging, time complexity is O(n+m).

link join(link a, link b) {
    if (b == 0) return a;
    if (a == 0) return b;
    insert(b, a->item);
    b->left = join(a->left, b->left);
    b->right = join(a->right, b->right);
    delete a;
    return b;
}

insert just inserts an item in the right place in the tree.

void insert(link &h, Item x) {
    if (h == 0) {
        h = new node(x);
        return;
    }
    if (x.key() < h->item.key()) {
        insert(h->left, x);
        rotateRight(h);
    }
    else {
        insert(h->right, x);
        rotateLeft(h);
    }
}

rotateRight and rotateLeft keep tree in the right order.

void rotateRight(link &h) {
    link x = h->left;
    h->left = x->right;
    x->right = h;
    h = x;
}

void rotateLeft(link &h) {
    link x = h->right;
    h->right = x->left;
    x->left = h;
    h = x;
}

Here link is node *.

What is the best place or a link to learn algorithms in C? How do you know when and where to use the implementation of algorithms by just looking into the problems?

Algorithms in C by Sedgewick is a great place to start the investigation. Once you are familiar with what algorithms are available and what the performance characteristics of each are, you'll be able to see where to use each of them.

Algorithms aren't necessarily tied to a specific language, just to clarify, so any algorithms book will work great as long as you can understand the concept being the data structure/algorithm.

That said, this seems like a good choice: Algorithms in C. I have the C++ equivalent on my shelf.

There is also a book that seems language agnostic (correct me if I'm wrong) called Data Structures & Algorithm's, though I hear it's a bit dated, so you'll miss out on more recent structures.

Don't forget the internet has a plethora of information available to you. However, books are usually better for these sorts of things. This is because internet resources tend to focus on one thing at a time. For example, you need to understand what Big-O notation is before you can understand what it means when we say a List has O(1) [constant time] removal.

A book will cover these things in the correct order, but an internet resource will focus on either Big-O notation or data structures, but often won't easily connect the two.


When it comes to using it, you'll mostly make the connection when it comes to what you'll be doing with the data.

For example, you might want a vector (array) if you just need ordered elements, but if you need ordered elements and removal from any place (but can sacrifice random access), then a list would be more appropriate, due to it's constant-time removal.

I read Pointers on C by Kenneth Reek recently. I thought I was pretty well versed in C, but this book gave me a few epiphanies, despite being aimed at beginners. The code examples are things of beauty (but not the fastest code on a x86-like CPU). It provide good implementations of many of the most common algorithms and data-structures that are in use, with excellent explanations about why they are implemented as they are (and sometimes code or suggestions for alternative implementations).

On the same page as your question: patterns for creating reusable code in C (that is what we all want, isn't it?), C Interfaces and Implementations: Techniques for Creating Reusable Software, by David R. Hanson. It has been a few years since I read it, and I don't have a copy to verify what I recall is correct, but if I remember correctly it deals with how to create good C API:s to data structures and algorithms, as well as giving example implementations of some of the most common algorithms.

Of topic: As I have mostly written throw-away programs in C for private use, this one helped me get rid of some bad coding habits as well as being an excellent C reference: C: A reference Manual. Reminds me that I ought to buy that one.

Basically my program reads a text file with the following format:

3
chairs
tables
refrigerators

The number on the first line indicates the number of items in the file to read.

Here's my hash function:

int hash(string& item, int n) {
    int hashVal = 0;
    int len = item.length();

    for(int i = 0; i < len; i++)
      hashVal = hashVal*37 + item[i];

    hashVal %= n;   

    if(hashVal < 0) hashVal += n;

    return hashVal;
}

when my program read the text file above, it was successful. But when I tried another one:

5
sabel
ziyarah
moustache
math
pedobear

The program would freeze. Not a segmentation fault or anything but it would just stop.

Any ideas?

Edit:

int n, tableSize;
myFile >> n;

tableSize = generateTableSize(n); 

string item, hashTable[tableSize];

for(int i = 0; i < tableSize; i++)
    hashTable[i] = "--";

while(myFile >> item && n!=0) {
    int index = hash(item,tableSize);

    if(hashTable[index] == "--")
        hashTable[index] = item;

    else {
        int newIndex = rehash(item,tableSize);
        while(hashTable[newIndex] != "--") {
            newIndex = rehash(item,tableSize);
        }
        hashTable[newIndex] = item;
    }
    n--;
}

int rehash(string item, int n)  {
    return hash(item,n+1);
}

The code freezes because it ends in an endless loop:

int index = hash(item,tableSize);

if(hashTable[index] == "--")
    hashTable[index] = item;
else {
    int newIndex = rehash(item,tableSize);
    while(hashTable[newIndex] != "--") {
        newIndex = rehash(item,tableSize);
    }
    hashTable[newIndex] = item;
}

You continuously recalculate the index, but do not change the input parameters, so the output stays the same, and therefore it is being recalculated again.

In the code above newIndex is calculated, based on the same inputs as index was calculated from using a different calculaton function though, so most likely it will have a different value than the first time, however the new index is also occupied. So we recalculate the newIndex again this time using the same function as before, with the exact same input, which gives the exact same output again. You look up the same index in the hash table, which is still the same value as the last time you did so, so you recalculate again, once again with the same input parameters, giving the same output, which you look up in the hashtable once again, etc.

The reason why you didn't see this with the first 3 lines, is that you did not have a collision (or at least only a single collisison, meaning the newIndex calculated from the rehash function was useful the first time).

The solution is not to increment the table size (since incrementing the table size, will at best lower the chance of collision which in it self can be good, but won't solve your problem entirely), but to either alter the inputs to your functions, so you get a different output, or change the hashtable structure.

I always found Sedgewick's book on algorithms in C++ useful, there is a chapter on hashing it.

Sadly I don't have my copy of Algorithms in C++ at hand, so I cannot tell you how Sedgewick solved it, but I would suggest for the simple educational purpose of solving your problem, starting by simply incrementing the index by 1 until you find a free slot in the hash table.

Having been a hobbyist programmer for 3 years (mainly Python and C) and never having written an application longer than 500 lines of code, I find myself faced with two choices :

(1) Learn the essentials of data structures and algorithm design so I can become a l33t computer scientist.

(2) Learn Qt, which would help me build projects I have been itching to build for a long time.

For learning (1), everyone seems to recommend reading CLRS. Unfortunately, reading CLRS would take me at least an year of study (or more, I'm not Peter Krumins). I also understand that to accomplish any moderately complex task using (2), I will need to understand at least the fundamentals of (1), which brings me to my question : assuming I use C++ as the programming language of choice, which parts of CLRS would give me sufficient knowledge of algorithms and data structures to work on large projects using (2)?

In other words, I need a list of theoretical CompSci topics absolutely essential for everyday application programming tasks. Also, I want to use CLRS as a handy reference, so I don't want to skip any material critical to understanding the later sections of the book.

Don't get me wrong here. Discrete math and the theoretical underpinnings of CompSci have been on my "TODO: URGENT" list for about 6 months now, but I just don't have enough time owing to college work. After a long time, I have 15 days off to do whatever the hell I like, and I want to spend these 15 days building applications I really want to build rather than sitting at my desk, pen and paper in hand, trying to write down the solution to a textbook problem.

(BTW, a less-math-more-code resource on algorithms will be highly appreciated. I'm just out of high school and my math is not at the level it should be.)

Thanks :)

I would say the practical aspects of coding are more important. In particular, source control is vital if you don't use that already. I like bzr as an easy to set up and use system, though GUI support isn't as mature as it could be.

I'd then move on to one or both of the classics about the craft of coding, namely

You could also check out the list of recommended books on Stack Overflow.

For a less-math, more code resource on algorithms than CLRS, check out Algorithms in a Nutshell. If you're going to be writing desktop applications, I don't consider CLRS to be required reading. If you're using C++ I think Sedgewick is a more appropriate choice.

What are Splay tree, Red-black tree, AVL tree, B-tree and T-tree?

I'm looking for good implementations.

Besides the online resources I would also recommend you to get a real book about algorithms. I would strongly recommend Sedgewick:

These are great books that will teach various algorithms (trees, search, graphs, etc.).

So I wonder what is the most efficient implementation of a merge sort in Java (In case its efficiency in terms of time can change depending on the language). This question may be trivial but my ultimate goal is to learn from more experienced programmers. Here are 2 examples I made:

//version I made.
public static double[] mergeSort(double[] arreglo) {
    if (arreglo.length > 1) {
        int d = (arreglo.length / 2);
        double[] arreglo1 = Arrays.copyOfRange(arreglo, 0, d),
                arreglo2 = Arrays.copyOfRange(arreglo, d, arreglo.length);
        arreglo1 = mergeSort(arreglo1);
        arreglo2 = mergeSort(arreglo2);
        return merge(arreglo1, arreglo2);
    } else {
        return arreglo;
    }
}

public static double[] merge(double[] arreglo1, double[] arreglo2) {
    double[] convi = new double[arreglo1.length + arreglo2.length];
    for (int i = 0, m1 = 0, m2 = 0; i < convi.length; i++) {
        if (arreglo1.length > m1 && arreglo2.length > m2) {
            if (arreglo1[m1] <= arreglo2[m2])
                convi[i] = arreglo1[m1++];
            else {
                convi[i] = arreglo2[m2++];
            }
        } else {
            convi[i] = (arreglo1.length == m1) ? arreglo2[m2++] : arreglo1[m1++];
        }
    }
    return convi;
}

//Taken out of Cormens book.
public static void mergeSort(int[] arreglo, int i, int f) {
    if (f > i) {
        int d = ((i + f) / 2);
        mergeSort(arreglo, i, d);
        mergeSort(arreglo, d + 1, f);
        merge(arreglo, i, d, f);
    }
}


public static void merge(int[] arreglo, int i, int m, int f) {
    int n1 = (m - i) + 1;
    int n2 = (f - m);
    int[] mitad1 = new int[n1 + 1];
    int[] mitad2 = new int[n2 + 1];
    for (int v = 0; v < n1; v++) {
        mitad1[v] = arreglo[i + v];
    }
    for (int p = 0; p < n2; p++) {
        mitad2[p] = arreglo[p + m + 1];
    }
    mitad1[n1] = Integer.MAX_VALUE;
    mitad2[n2] = Integer.MAX_VALUE;
    for (int r = i, m1 = 0, m2 = 0; r <= f; r++) {
        if (mitad1[m1] <= mitad2[m2]) {
            arreglo[r] = mitad1[m1];
            m1++;
        } else {
            arreglo[r] = mitad2[m2];
            m2++;
        }
    }
}

The following program is translated from C++ example given in Robert Sedgewick's Algorithms in C++, Parts 1-4

It introduces one type of improvement. It does a single copy of the whole sorting array into a an auxiliary array that is further dealt with. Next, the recursion splitting is done on the auxiliary array by alternating between the auxiliary array and original array so that the extra copying operation of the merged arrays doesn’t happen. Basically, the algorithm switches the role of the input and auxiliary array in each recursive call. For example, conceptually:

Regular Mergesort:

--merge

(((8) (5))((2) (3)))(((1) (7))((4) (6)))

(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 ))

-- copy back and ignore previous (UNNECESSARY)

(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 ))

– – – – – – – –

This program:

--merge

(((8) (5))((2) (3)))(((1) (7))((4) (6)))

(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 ))

--merge backwards

( 2 3 5 8 )( 1 4 6 7 )

(( 5 8 )( 2 3 ))(( 1 7 )( 4 6 ))    

Also, after splitting the array into halves gives small enough arrays, the algorithm uses insertion sort since it performs better on small data sets than merge sort. The threshold for when exactly to use insertion sort can be determined with trial-and-error.

The code:

static int M = 10;

//insertion sort to be used once the mergesort partitions become small enough
static void insertionsort(int[] a, int l, int r) {
    int i, j, temp;
    for (i = 1; i < r + 1; i++) {
        temp = a[i];
        j = i;
        while ((j > 0) && a[j - 1] > temp) 
        {
            a[j] = a[j - 1];
            j = j - 1;
        }
        a[j] = temp;
    }
}

//standard merging two sorted half arrays into single sorted array
static void merge(int[] merged_a, int start_a, int[] half_a1, int start_a1, int size_a1, 
                         int[] half_a2, int start_a2, int size_a2) {
    int i, j, k;
    int total_s = size_a1 + size_a2;
    for (i = start_a1, j = start_a2, k = start_a; k < (total_s); k++) {
        // if reached end of first half array, run through the loop 
        // filling in only from the second half array
        if (i == size_a1) {
            merged_a[k] = half_a2[j++];
            continue;
        }
        // if reached end of second half array, run through the loop 
        // filling in only from the first half array
        if (j == size_a2) {
            merged_a[k] = half_a1[i++];
            continue;
        }
        // merged array is filled with the smaller element of the two 
        // arrays, in order
        merged_a[k] = half_a1[i] < half_a2[j] ?
                      half_a1[i++] : half_a2[j++];
    }
}

//merge sort data during merging without the additional copying back to array
//all data movement is done during the course of the merges
static void mergesortNoCopy(int[] a, int[] b, int l, int r) {
    if (r - l <= M) {
        insertionsort(a + l, l, r - l);
        return;
    }
    int m = (l + r) / 2;
    //switch arrays to msort b thus recursively writing results to b
    mergesortNoCopy(b, a, l, m); //merge sort left
    mergesortNoCopy(b, a, m + 1, r); //merge sort right
    //merge partitions of b into a
    merge(a, l, b, l, m - l + 1, b, m + 1, r - m); //merge
}

static void mergesort(int[] a) {
    int[] aux = Arrays.copyOf(a, a.length);
    mergesortNoCopy(a, aux, 0, a.length - 1);
}

Some other possible improvements:

Stop if already sorted.

Check if the largest item in first half ≤ smallest item in second half. Helps for partially-ordered arrays.

// after split, before merge
if (a[mid] <= a[mid + 1]) return;

EDIT: here is a good document I found on different versions of Mergesort and improvements thereof.