The art of computer programming

Donald Ervin Knuth

Mentioned in questions and answers.

I am working to become a scientific programmer. I have enough background in math and statistics, but I am rather lacking in programming background. I have found it very hard to learn how to use a language for scientific programming, because most references for scientific programming are close to trivial.

My work involves statistical/financial modelling, and none of it involves physics models. Currently, I use Python extensively with NumPy and SciPy. I have also done R and Mathematica. I know enough C/C++ to read code, and I have no experience with Fortran.

I don't know whether this is a good list of languages for a scientific programmer. If it is, what is a good reading list for learning the syntax and design patterns of these languages in scientific settings?

This might be useful: The Nature of Mathematical Modeling.

Writing Scientific Software: A Guide to Good Style is a good book with overall advice for modern scientific programming.

I'm a scientific programmer who entered the field in the past two years. I do more biology and physics modeling, but I bet what you're looking for is pretty similar. While I was applying to jobs and internships, there were two things that I didn't think would be important to know, but which caused me to miss out on opportunities. One was MATLAB, which has already been mentioned. The other was database design -- no matter what area of scientific programming you're in, there's probably going to be a lot of data that has to be managed somehow.

The book Database Design for Mere Mortals by Michael Hernandez was recommended to me as being a good start and helped me out a lot in my preparation. I would also make sure you at least understand some basic SQL if you don't already.

One issue scientific programmers face is maintaining a repository of code (and data) that others can use to reproduce your experiments. In my experience this is a skill not required in commercial development.

Here are some readings on this:

These are in the context of computational biology but I assume it applies to most scientific programming.

Also, look at Python Scripting for Computational Science.

Recently I needed to do weighted random selection of elements from a list, both with and without replacement. While there are well-known and good algorithms for unweighted selection, and some for weighted selection without replacement (such as modifications of the reservoir algorithm), I couldn't find any good algorithms for weighted selection with replacement. I also wanted to avoid the reservoir method, as I was selecting a significant fraction of the list, which is small enough to hold in memory.

Does anyone have any suggestions on the best approach in this situation? I have my own solutions, but I'm hoping to find something more efficient, simpler, or both.

I'd recommend you start by looking at section 3.4.2 of Donald Knuth's Seminumerical Algorithms.

If your arrays are large, there are more efficient algorithms in chapter 3 of Principles of Random Variate Generation by John Dagpunar. If your arrays are not terribly large or you're not concerned with squeezing out as much efficiency as possible, the simpler algorithms in Knuth are probably fine.

I want to generate a sequence of unique random numbers in the range 00000001 to 99999999.

So the first one might be 00001010, the second 40002928 etc.

The easy way is to generate a random number and store it in the database, and each subsequent time do it again and check the database to see whether the number already exists; if so, generate a new one, check again, and so on. But that doesn't feel right; I could be regenerating a number maybe 100 times as the number of generated items grows.

Is there a smarter way?

EDIT: As always, I forgot to say WHY I wanted this; it will probably make things clearer and may suggest an alternative. We want to generate an order number for a booking, so we could just use 000001, 000002, etc. But we don't want to give competitors a clue about how many orders are created (because it's not a high-volume market, and we don't want them to know whether we are on order 30 after two months or on order 100). So we want an order number that is random, yet unique.

In case you happen to have access to a library and you want to dig into and understand the issue well, take a look at

The Art of Computer Programming, Volume 2: Seminumerical Algorithms

by Donald E. Knuth. Chapter 3 is all about random numbers.

I have to write a program that can calculate 2 to the power 2010 and find the sum of its digits. For example:

2^12 gives 4096, so 4 + 0 + 9 + 6 = 19.

Now I need to do the same for 2^2010.

Please help me to understand.

GMP is perhaps the best, fastest free multi-architecture library for this. It provides a solid foundation for such calculations, including not only addition, but parsing from strings, multiplication, division, scientific operations, etc.

For literature on the algorithms themselves, I highly recommend The Art of Computer Programming, Volume 2: Seminumerical Algorithms by Donald Knuth. This book is considered by many to be the best single reference for the topic. This book explains from the ground up how such arithmetic can take place on a machine that can only do 32-bit arithmetic.

If you want to implement this calculation from scratch without using any tools, the following code block requires only the following additional methods to be supplied:

unsigned int divModByTen(unsigned int *num, unsigned int length);
bool isZero(unsigned int *num, unsigned int length);

divModByTen should replace num in memory with the value of num / 10 and return the remainder. The implementation will take some effort, unless a library is used. isZero just checks whether the number is all zeros in memory. Once we have these, we can use the following code sample:

unsigned int mod10;
int decimalDigitSum;

unsigned int hugeNumber[64];
memset(hugeNumber, 0, sizeof(hugeNumber));
hugeNumber[62] = 0x04000000; // bit 2010 = word 62, bit 26 (least-significant word first)
// at this point, hugeNumber is 2^2010 encoded in binary in memory

decimalDigitSum = 0;
while (!isZero(hugeNumber, 64)) {
    mod10 = divModByTen(&hugeNumber[0], 64);
    decimalDigitSum += mod10;
}

printf("Digit Sum: %d", decimalDigitSum);

Given a series of randomly generated data, how can I figure out how random it actually is? Is R a good tool for this, or MATLAB? What other questions can these tools answer about randomly generated data? Is there another tool better suited for this?

First you need to decide what kind of randomness you're testing for. Do you have in mind a uniform distribution inside some range? That's usually what people have in mind, though you may have some other flavor of randomness such as a normal distribution.

Once you have a candidate distribution, you can test the goodness of fit to that distribution. The Kolmogorov-Smirnov test is a good general-purpose test. I believe it's called ks.test in R. But I also believe it assumes distinct values, so that could be a problem if you're sampling from such a small range of values that the same value appears more than once.

S. Lott mentioned Knuth's Seminumerical Algorithms in the comments. That book has a good introduction to the chi-squared test and the Kolmogorov-Smirnov tests for goodness of fit.

If you do suspect you have uniform random values, the DIEHARD test that Dirk Eddelbuettel mentioned is a standard test.

I have recently read an article about fast sqrt calculation. Therefore, I have decided to ask the SO community and its experts to help me find out which STL algorithms or mathematical calculations can be implemented faster with programming hacks.

It would be great if you can give examples or links.

Thanks in advance.

This is where you really need to listen to project managers and MBAs. What you're suggesting is re-implementing parts of the STL and/or the standard C library. There is an associated cost in terms of implementation time and maintenance burden, so you shouldn't do it unless you really, genuinely need to, as John points out. The rule is simple: is the calculation you're doing slowing you down (i.e., are you CPU-bound)? If not, don't create your own implementation just for the sake of it.

Now, if you're really interested in fast maths, there are a few places you can start. The GNU multiple-precision library (GMP) implements many algorithms from Modern Computer Arithmetic and Seminumerical Algorithms that are all about doing maths on arbitrary-precision integers and floats insanely fast. The guys who write it optimise in assembly per build platform - it is about as fast as you can get in single-core mode. This is the most general case I can think of for optimised maths, i.e., one that isn't specific to a certain domain.

Bringing my first and second paragraphs together with what thkala has said, consider that GMP/MPIR have optimised assembly versions per CPU architecture and OS they support. Really. It's a big job, but it is what makes those libraries so fast on the specific small subset of programming problems they target.

Sometimes domain-specific enhancements can be made. This is about understanding the problem in question. For example, when doing finite-field arithmetic in Rijndael's finite field you can, based on the knowledge that the field has characteristic 2 and elements with 8 terms, assume that your integers are of size uint8_t and that addition/subtraction are equivalent to XOR operations. How does this work? Well, basically, if you add or subtract two elements of the field, each coefficient is either zero or one. If both are zero or both are one, the result is zero. If they are different, the result is one. Term by term, that is equivalent to XOR across an 8-bit binary string, where each bit represents a term in the polynomial. Multiplication is also relatively efficient. You can bet that Rijndael was designed to take advantage of this kind of result.

That's a very specific result. It depends entirely on what you're doing to make things efficient. I can't imagine many STL functions are optimised purely for CPU speed, because amongst other things the STL provides: collections via templates, which are about memory; file access, which is about storage; exception handling; etc. In short, being really fast is a narrow subset of what the STL does and what it aims to achieve. Also, note that optimisation can be viewed from different angles. For example, if your app is heavy on IO, you are IO-bound. Having a massively efficient square-root calculation isn't really helpful then, since "slowness" really means waiting on the disk/OS/your file-parsing routine.

In short, you as a developer of an STL library are trying to build an "all round" library for many different use cases.

But, since these things are always interesting, you might well be interested in bit twiddling hacks. I can't remember where I saw that, but I've definitely stolen that link from somebody else on here.

I have been thinking recently about how floating-point math works on computers, and I find it hard to understand all the technical details behind the formulas. I need to understand the basics of addition, subtraction, multiplication, division, and remainder. With these I will be able to build trig functions and formulas.

I can guess at some of it, but it's a bit unclear to me. I know that a fixed-point number can be made by dividing a 4-byte integer into a sign flag, a radix, and a mantissa. With this we have a 1-bit flag, a 5-bit radix, and a 10-bit mantissa. A 32-bit word is perfect for a floating-point value :)

To add two floats, can I simply add the two mantissas and add the carry to the 5-bit radix? Is this a way to do floating-point math (or fixed-point math, to be honest), or am I completely wrong?

All the explanations I've seen use formulas, multiplications, etc., and they look complex for something I suspect is a bit simpler. I need an explanation aimed more at beginning programmers and less at mathematicians.

Run, don't walk, to get Knuth's Seminumerical Algorithms which contains wonderful intuition and algorithms behind doing multiprecision and floating point arithmetic.

I have a double which is:

double mydouble = 10;

and I want 10^12, so 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10 * 10. I tried

double newDouble = pow(10, 12);

and NSLog prints: pow=-1.991886

which doesn't make much sense... I think pow isn't my friend, right?

Here's how to compute x^12 with the fewest multiplications.

y = x*x*x; y *= y; y *= y;

The method comes from Knuth's Seminumerical Algorithms, section 4.6.3.

I'm writing a performance-critical, number-crunching C++ project where 70% of the time is used by the 200 line core module.

I'd like to optimize the core using inline assembly, but I'm completely new to this. I do, however, know some x86 assembly dialects, including those used by GCC and NASM.

All I know:

I have to put the assembler instructions in _asm{} where I want them to be.


  • I have no clue where to start. What is in which register at the moment my inline assembly comes into play?

I really like assembly, so I'm not going to be a nay-sayer here. It appears that you've profiled your code and found the 'hotspot', which is the correct way to start. I also assume that the 200 lines in question don't use a lot of high-level constructs like vector.

I do have to give one bit of warning: if the number-crunching involves floating-point math, you are in for a world of pain, specifically a whole set of specialized instructions, and a college term's worth of algorithmic study.

All that said: if I were you, I'd step through the code in question in the VS debugger, using the Disassembly view. If you feel comfortable reading the code as you go along, that's a good sign. After that, do a Release compile (Debug turns off optimization) and generate an ASM listing for that module. Then, if you think you see room for improvement, you have a place to start. Other people's answers have linked to the MSDN documentation, which is really pretty skimpy but still a reasonable start.

I want to calculate the slope of a line.

public sealed class Point
{
    public System.Numerics.BigInteger X = 0;
    public System.Numerics.BigInteger Y = 0;

    public double CalculateSlope (Point point)
    {
        return ((point.Y - this.Y) / (point.X - this.X));
    }
}

I know that BigInteger has a DivRem function that returns the division result plus the remainder but am not sure how to apply it to get a double. The numbers I'm dealing with are far far beyond the range of Int64.MaxValue so the remainder itself could be out of range to calculate by conventional division.

EDIT: Not sure if it helps but I'm dealing with only positive integers (>=1).

IMPORTANT: I only need a few decimal points of precision (5 should be good enough for my purpose).

Get BigRational from CodePlex. It's part of Microsoft's Base Class Library, so it's a work-in-progress for .NET. Once you have that, do something like:

System.Numerics.BigInteger x = GetDividend() ;
System.Numerics.BigInteger y = GetDivisor() ;

BigRational r     = new BigRational( x , y ) ;
double      value = (double) r ;

Dealing with the inevitable overflow/underflow/loss of precision is, of course, another problem.

Since you can't drop the BigRational library into your code, evidently, the other approach would be to get out the right algorithms book and roll your own...

The easy way, of course, of "rolling one's own" here, since a rational number is represented as the ratio (division) of two integers, is to grab the explicit conversion to double operator from the BigRational class and tweak it to suit. It took me about 15 minutes.

About the only significant modification I made is in how the sign of the result is set when the result is positive or negative zero/infinity. While I was at it, I converted it to a BigInteger extension method for you:

public static class BigIntExtensions
{
  public static double DivideAndReturnDouble( this BigInteger x , BigInteger y )
  {
    // The Double value type represents a double-precision 64-bit number with
    // values ranging from -1.79769313486232e308 to +1.79769313486232e308.
    // Values that do not fit into this range are returned as +/-Infinity.
    if ( SafeCastToDouble(x) && SafeCastToDouble(y) )
    {
      return (Double) x / (Double) y;
    }

    // kick it old-school and figure out the sign of the result
    bool isNegativeResult = ( ( x.Sign < 0 && y.Sign > 0 ) || ( x.Sign > 0 && y.Sign < 0 ) ) ;

    // scale the numerator to preserve the fraction part through the integer division
    BigInteger denormalized = ( x * s_bnDoublePrecision ) / y ;
    if ( denormalized.IsZero )
    {
      return isNegativeResult ? BitConverter.Int64BitsToDouble(unchecked((long)0x8000000000000000)) : 0d; // underflow to -/+0
    }

    Double result   = 0              ;
    bool   isDouble = false          ;
    int    scale    = DoubleMaxScale ;

    while ( scale > 0 )
    {
      if ( !isDouble )
      {
        if ( SafeCastToDouble(denormalized) )
        {
          result   = (Double) denormalized;
          isDouble = true;
        }
        else
        {
          denormalized = denormalized / 10 ;
        }
      }
      else
      {
        result = result / 10 ;
      }
      scale-- ;
    }

    if ( !isDouble )
      return isNegativeResult ? Double.NegativeInfinity : Double.PositiveInfinity;
    else
      return result;
  }

  private const           int        DoubleMaxScale      = 308 ;
  private static readonly BigInteger s_bnDoublePrecision = BigInteger.Pow( 10 , DoubleMaxScale ) ;
  private static readonly BigInteger s_bnDoubleMaxValue  = (BigInteger) Double.MaxValue ;
  private static readonly BigInteger s_bnDoubleMinValue  = (BigInteger) Double.MinValue ;

  private static bool SafeCastToDouble( BigInteger value )
  {
    return s_bnDoubleMinValue <= value && value <= s_bnDoubleMaxValue ;
  }
}

I have a data file with a large number of values (53,000,000+) and I would like to pull out a random subset of n of these values (say, 2,000,000). I implemented a Perl script that pulls the list into memory, uses the Fisher-Yates method to shuffle the array, and then prints out the first n values in the shuffled list. However, this shuffling process is taking a lot of time, even on much smaller test sets (50,000 values).

I'm looking for a more efficient, scalable way to identify a random subset of a huge set of values and print it out. Any suggestions?

Update: Based on the answers and some more searching, it looks like the correct terminology is "random sampling".

Don't shuffle, it's unnecessarily expensive.

There's a simple linear algorithm discussed in Jon Bentley's "Programming Pearls" (which Bentley says he learnt from Knuth's "Seminumerical Algorithms"). Use this method instead.

There are some Perl implementations about:

These two snippets implement Algorithm S (3.4.2) and Algorithm R (3.4.2) from Knuth's The Art of Computer Programming. The first randomly selects N items from an array of elements and returns a reference to an array containing the selected elements. Note that it will not necessarily consider all of the elements in the list.

The second randomly selects N items from a file of indeterminate size and returns an array containing the selected elements. Records in the file are assumed to be one per line, and the lines are chomped while reading. This requires only one pass through the list. A slight modification can be made to use the snippet in situations where N records would exceed memory limitations; however, this requires slightly more than one pass (/msg me if you need this explained).

I'm implementing my own BigNumber class in C# for educational purposes. To start, I intend it to cover basic arithmetic, relational operators, and certain math methods. The values will be stored in a byte array.

Could you guys give me some tips on how I would design such a class or rather the proper way of designing such a class ?


I'm not asking for help on how to implement the specific operators and methods. I'd like to know how the class should be structured internally.

I did that once in C++. I recommend that you read The Art of Computer Programming. Volume 2 has all the details of the algorithms for implementing big numbers. It's a great resource (for this and many other problems.)

The book should be available from most public libraries around you (or any university library).

BTW. No need to read the whole book, if you just want you can just use it as a reference for the algorithms that you need.

UPDATE: As for the API, you should try to mimic the existing numeric APIs in .NET, something like Int32.

As for the internal class design, it should be pretty straightforward because there should be very few units interacting. You could abstract the "storage" (byte array) part away and iterate over the "digits" using standard iterators over some generic storage provider. This would allow you to change to use int arrays for example. If you do this then you can automatically change the base of your numbers and enable your implementation to store "more" per digit. This implies that the base of the operations won't be static but would be determined by the "digit" size.

I had fun implementing mine, it's a simple but nice project. In my case I didn't go fancy with the internal design. Good luck!

I'm trying to build a memory game, and I want to ask how I can generate random numbers where each appears exactly twice, like 1-1, 2-2, 3-3. I'll paste the function I created; tell me whether I have to create another function just to enforce that only a pair of each number is created.

// function to fill the table
void preencher_mesa(int matriz[4][4], int dificuldade)
{
    int i, j;
    int lim_col, lim_linha; // limits of the matrix

    for (i = 0; i < 4; i++)
        for (j = 0; j < 4; j++)
            matriz[i][j] = 0;

    if (dificuldade == 1)       // difficulty == 1
    {
        lim_col = 3;
        lim_linha = 2;
    }
    else if (dificuldade == 2)  // difficulty == 2
    {
        lim_col = 4;
        lim_linha = 2;
    }
    else if (dificuldade == 3)  // difficulty == 3
    {
        lim_col = 4;
        lim_linha = 4;
    }

    for (i = 0; i < lim_linha; i++)
        for (j = 0; j < lim_col; j++)
        {
            if (dificuldade == 1)
                matriz[i][j] = (rand() % 3) + 1;
            else if (dificuldade == 2)
                matriz[i][j] = (rand() % 6) + 1;
            else if (dificuldade == 3)
                matriz[i][j] = (rand() % 8) + 1;
        }

    mostrar_mesa(matriz); // show table
}

If you have a 3x2 matrix that should be filled with the digits/numbers 1, 1, 2, 2, 3, 3 in some random permutation, then you could do something like:

  1. Allocate an array (vector) of the right size — 6 for the current example.
  2. Populate the array with the correct values — 1, 1, 2, 2, 3, 3 for the current example.
  3. Use an appropriate technique to shuffle the array, and then copy the shuffled data into the target 2D array.
  4. Or select a digit at random from the initial 6 options, then (if necessary) move the last digit into the hole and select the next digit from the remaining 5 options, etc.

You can use the Fisher-Yates shuffling algorithm. You might check in your copy of Knuth The Art of Computer Programming, Volume 2: Seminumerical Algorithms. Or you can look for expositions on Stack Overflow (such as Algorithm to select a single random combination of values, chosen because one of my Google searches also came across it).

Judging from the comments, you want duplicates from your rand() surrogate, so this should work:

int duprand(void)
{
    static int mode = 0;
    static int value = 0;
    if (mode == 0)
    {
        mode = 1;
        value = rand();
    }
    else
        mode = 0;
    return value;
}

Or, more succinctly:

int duprand(void)
{
    static int mode = 0;
    static int value = 0;
    if (mode == 0)
        value = rand();
    mode = !mode;
    return value;
}

Simply call duprand() each time you want a random number. You will get the same value twice in a row. This code doesn't provide a resynchronization method; if you want one, you can write one easily enough:

void sync_duprand(void)
{
    int i = duprand();
    int j = duprand();
    if (i != j)
        i = duprand();
}

What I really wanted was ...

#include <stdio.h>
#include <stdlib.h>

extern void shuffle(int *array, int n);

/*
** rand_int() and shuffle() copied verbatim (but reformatted) from
** - an answer by Roland Illig
** (
*/

static int rand_int(int n)
{
    int limit = RAND_MAX - RAND_MAX % n;
    int rnd;

    do
    {
        rnd = rand();
    } while (rnd >= limit);
    return rnd % n;
}

void shuffle(int *array, int n)
{
    int i, j, tmp;

    for (i = n - 1; i > 0; i--)
    {
        j = rand_int(i + 1);
        tmp = array[j];
        array[j] = array[i];
        array[i] = tmp;
    }
}

/* New code - but hardly novel code */
static void dump_matriz(int matriz[4][4])
{
    for (int i = 0; i < 4; i++)
    {
        for (int j = 0; j < 4; j++)
            printf("  %d", matriz[i][j]);
        putchar('\n');
    }
}

int main(void)
{
    int matriz[4][4];

    int *base = &matriz[0][0];
    for (int i = 0; i < 8; i++)
    {
        *base++ = i + 1;
        *base++ = i + 1;
    }

    dump_matriz(matriz);
    shuffle(&matriz[0][0], 16);
    dump_matriz(matriz);

    return 0;
}

Sample output:

  1  1  2  2
  3  3  4  4
  5  5  6  6
  7  7  8  8
  1  7  8  6
  6  2  5  8
  2  4  7  3
  3  5  1  4

Note that because there's no call to srand(), the permutation is fixed. (You might get a different result from what I show, but running this test multiple times will produce the same result each time on your machine.) Add a call to srand() with an appropriate initialization, and you get different sequences. Hack and chop to suit your requirements for smaller matrices.

Can anyone help me point out known shuffle techniques that are considered secure?

Any paper or technique-name reference would help (my searches turned up nothing decisive).

Appreciate any kind of help

In theory, a perfectly random implementation of something like the Fisher-Yates algorithm would yield a completely random shuffle. In practice, however, Fisher-Yates is susceptible to things like modulo bias. See some of the pitfalls in the relevant section of the Wikipedia entry and in How Not To Shuffle - The Knuth Fisher-Yates Algorithm.

Knuth's classic The Art of Computer Programming (Volume 2) discusses a possibly suitable algorithm by MacLaren and Marsaglia.

Finally, see also Cryptographic Shuffling of Random and Pseudorandom Sequences.

Could someone tell me whether C++ and Matlab use the same floating-point computation implementations? Will I get the same values in C++ as I would in Matlab?

Currently I have these discrepancies from translating my Matlab code into C++:

Matlab: R = 1.0000000001623, I = -3.07178893432791e-010, C = -3.79693498864242e-011

C++:    R = 1.00000000340128 I = -3.96890964537988e-009  Z = 2.66864907949582e-009

If not what is the difference and where can I find more about floating point computation implementations?


Although it's not clear what your numbers actually are, the relative difference of the first (and largest) numbers is about 1e-8, which is the relative tolerance of many double precision algorithms.

Floating point numbers are only an approximation of the real number system, and their finite size (64 bits for double precision) limits their precision. Because of this finite precision, operations that involve floating point numbers can incur round-off error, and are thus not strictly associative. What this means is that A+(B+C) != (A+B)+C. The difference between the two is usually small, depending on their relative sizes, but it's not always zero.

What this means is that you should expect small differences in the relative and absolute values when you compare an algorithm coded in Matlab to one in C++. The difference may be in the libraries (i.e., there's no guarantee that Matlab uses the system math library for routines like sqrt), or it may just be that your C++ and Matlab implementations order their operations differently.

The section on floating point comparison tests in Boost::Test discusses this a bit, and has some good references. In particular, you should probably read What Every Computer Scientist Should Know About Floating-Point Arithmetic and consider picking up a copy of Knuth's TAOCP Vol. II.

I am trying to implement floating point operations in a microcontroller and so far I have had ample success.

The problem lies in the way I do multiplication. On my computer the following works fine:

unsigned long long gig,mm1,mm2;
unsigned long m,m1,m2;
mm1 = f1.float_parts.mantissa;
mm2 = f2.float_parts.mantissa;

m1 = f1.float_parts.mantissa;
m2 = f2.float_parts.mantissa;

gig = mm1*mm2; //this works fine I get all the bits I need since they are all long long, but won't work in the mcu

gig = m1*m2; //this does not work; to be precise, it gives only the 32 least significant bits, but it works on the mcu

So you can see that my problem is that the microcontroller will throw an undefined reference to __muldi3 if I try gig = mm1*mm2 there.

And if I try with the smaller data types, it only keeps the least significant bits, which I don't want. I need the 23 most significant bits of the product.

Does anyone have any ideas as to how I can do this?

The comments in the FreeBSD implementation of __muldi3() have a good explanation of the required procedure, see muldi3.c. If you want to go straight to the source (always a good idea!), according to the comments this code was based on an algorithm described in Knuth's The Art of Computer Programming vol. 2 (2nd ed), section 4.3.3, p. 278. (N.B. the link is for the 3rd edition.)