Computer Architecture

John L. Hennessy, David A. Patterson

Mentioned 8

This best-selling title, considered for over a decade to be essential reading for every serious student and practitioner of computer design, has been updated throughout to address the most important trends facing computer designers today. In this edition, the authors bring their trademark method of quantitative analysis not only to high performance desktop machine design, but also to the design of embedded and server systems. They have illustrated their principles with designs from all three of these domains, including examples from consumer electronics, multimedia and web technologies, and high performance computing. The book retains its highly rated features: Fallacies and Pitfalls, which share the hard-won lessons of real designers; Historical Perspectives, which provide a deeper look at computer design history; Putting It All Together, which presents a design example that illustrates the principles of the chapter; Worked Examples, which challenge the reader to apply the concepts, theories and methods in smaller scale problems; and Cross-Cutting Issues, which show how the ideas covered in one chapter interact with those presented in others. In addition, a new feature, Another View, presents brief design examples in one of the three domains other than the one chosen for Putting It All Together. The authors present a new organization of the material as well, reducing the overlap with their other text, Computer Organization and Design: A Hardware/Software Approach 2/e, and offering more in-depth treatment of advanced topics in multithreading, instruction level parallelism, VLIW architectures, memory hierarchies, storage devices and network technologies. Also new to this edition is the adoption of the MIPS 64 as the instruction set architecture. In addition to several online appendixes, two new appendixes will be printed in the book: one contains a complete review of the basic concepts of pipelining, the other provides solutions to a selection of the exercises.
Both will be invaluable to the student or professional learning on her own or in the classroom. Hennessy and Patterson continue to focus on fundamental techniques for designing real machines and for maximizing their cost/performance.

* Presents state-of-the-art design examples including:
  * IA-64 architecture and its first implementation, the Itanium
  * Pipeline designs for Pentium III and Pentium IV
  * The cluster that runs the Google search engine
  * EMC storage systems and their performance
  * Sony Playstation 2
  * Infiniband, a new storage area and system area network
  * SunFire 6800 multiprocessor server and its processor the UltraSPARC III
  * Trimedia TM32 media processor and the Transmeta Crusoe processor
* Examines quantitative performance analysis in the commercial server market and the embedded market, as well as the traditional desktop market. Updates all the examples and figures with the most recent benchmarks, such as SPEC 2000.
* Expands coverage of instruction sets to include descriptions of digital signal processors, media processors, and multimedia extensions to desktop processors.
* Analyzes capacity, cost, and performance of disks over two decades. Surveys the role of clusters in scientific computing and commercial computing.
* Presents a survey, taxonomy, and the benchmarks of errors and failures in computer systems.
* Presents detailed descriptions of the design of storage systems and of clusters.
* Surveys memory hierarchies in modern microprocessors and the key parameters of modern disks.
* Presents a glossary of networking terms.

Mentioned in questions and answers.

In "C# 4 in a Nutshell", the author shows that this class can sometimes write 0 without MemoryBarrier, though I can't reproduce it on my Core 2 Duo:

using System;
using System.Threading;

public class Foo
{
    int _answer;
    bool _complete;
    public void A()
    {
        _answer = 123;
        //Thread.MemoryBarrier();    // Barrier 1
        _complete = true;
        //Thread.MemoryBarrier();    // Barrier 2
    }
    public void B()
    {
        //Thread.MemoryBarrier();    // Barrier 3
        if (_complete)
        {
            //Thread.MemoryBarrier();       // Barrier 4
            // Assumed from the question's wording: this is the write that can show 0.
            Console.WriteLine(_answer);
        }
    }
}

private static void ThreadInverteOrdemComandos()
{
    Foo obj = new Foo();
}



This need seems crazy to me. How can I recognize all the cases in which this can occur? I think that if a processor changes the order of operations, it needs to guarantee that the behavior doesn't change.

Do you bother to use Barriers?

If you are ever touching data from two different threads, this can occur. This is one of the tricks that processors use to increase speed - you could build processors that didn't do this, but they would be much slower, so no one does that anymore. You should probably read something like Hennessy and Patterson to recognize all of the various types of race conditions.

I always use some sort of higher level tool like a monitor or a lock, but internally they are doing something similar or are implemented with barriers.
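
A minimal sketch of the "use a lock instead of explicit barriers" approach described above, written in Python rather than the question's C# (where the `lock` statement plays the corresponding role). The class and field names mirror the question; the `threading.Lock`, the method bodies, and the `run_once` helper are illustrative assumptions, not code from the book or the original answer. Because the writer and the reader take the same lock, the reader sees either both writes or neither.

```python
import threading


class Foo:
    """Lock-protected variant of the question's Foo class (illustrative)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._answer = 0
        self._complete = False

    def a(self):
        # Writer: both fields are updated while holding the lock.
        with self._lock:
            self._answer = 123
            self._complete = True

    def b(self):
        # Reader: takes the same lock, so it never sees _complete
        # set without also seeing the new _answer.
        with self._lock:
            if self._complete:
                return self._answer
            return None


def run_once():
    """Hypothetical helper: race one writer against one reader."""
    foo = Foo()
    results = []
    writer = threading.Thread(target=foo.a)
    reader = threading.Thread(target=lambda: results.append(foo.b()))
    writer.start()
    reader.start()
    writer.join()
    reader.join()
    return results[0]
```

Whichever thread wins the race, `run_once()` returns either `None` (the reader ran first) or `123`, never `0`.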

I've been working in C and CPython for the past 3 - 5 years. Consider that my base of knowledge here.

If I were to issue an assembly instruction such as MOV AL, 61h to a processor that supported it, what exactly is inside the processor that interprets this code and dispatches it as voltage signals? How would such a simple instruction likely be carried out?

Assembly even feels like a high level language when I try to think of the multitude of steps contained in MOV AL, 61h or even XOR EAX, EBX.

EDIT: I read a few comments asking why I put this as embedded when the x86-family is not common in embedded systems. Welcome to my own ignorance. Now I figure that if I'm ignorant about this, there are likely others ignorant of it as well.

It was difficult for me to pick a favorite answer considering the effort you all put into your answers, but I felt compelled to make a decision. No hurt feelings, fellas.

I often find that the more I learn about computers the less I realize I actually know. Thank you for opening my mind to microcode and transistor logic!

EDIT #2: Thanks to this thread, I have just comprehended why XOR EAX, EAX is faster than MOV EAX, 0h. :)

This is a big question, and at most universities there's an entire semester-long class to answer it. So, rather than give you some terribly butchered summary in this little box, instead I'll direct you to the textbook that has the whole truth: Computer Organization and Design: The Hardware/Software Interface by Patterson and Hennessy.

This is a question that requires more than an answer on StackOverflow to explain.

To learn about this all the way from the most basic electronic components up to basic machine code, read The Art of Electronics, by Horowitz and Hill. To learn more about computer architecture, read Computer Organization and Design by Patterson and Hennessy. If you want to get into more advanced topics, read Computer Architecture: A Quantitative Approach, by Hennessy and Patterson.

By the way, The Art of Electronics also has a companion lab manual. If you have the time and resources available, I would highly recommend doing the labs; I actually took the classes taught by Tom Hayes, in which we built a variety of analog and digital circuits, culminating in building a computer from a 68k chip, some RAM, some PLDs, and some discrete components. You would enter machine code directly into RAM using a hexadecimal keypad; it was a blast, and a great way to get hands on experience at the very lowest levels of a computer.
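
As a very rough software analogy for the fetch, decode, and execute steps the question asks about, here is a sketch of an interpreter for three x86 encodings: `B0 ib` (MOV AL, imm8), `B8 id` (MOV EAX, imm32), and `31 /r` with register operands (XOR r/m32, r32). The function name `run`, the register dictionary, and the byte constants are illustrative assumptions; a real processor does this with decoders, control logic, and possibly microcode in hardware, not with a loop like this.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

MOV_AL_61 = bytes([0xB0, 0x61])            # MOV AL, 61h   (2 bytes)
XOR_EAX_EAX = bytes([0x31, 0xC0])          # XOR EAX, EAX  (2 bytes)
MOV_EAX_0 = bytes([0xB8, 0, 0, 0, 0])      # MOV EAX, 0h   (5 bytes)


def run(code, regs=None):
    """Fetch, decode, and execute a tiny subset of 32-bit x86 machine code."""
    state = dict.fromkeys(REG32, 0)
    state.update(regs or {})
    pc = 0  # offset of the next instruction byte
    while pc < len(code):
        opcode = code[pc]                          # fetch
        if opcode == 0xB0:                         # decode: MOV AL, imm8
            imm = code[pc + 1]
            # execute: replace only the low 8 bits of EAX
            state["EAX"] = (state["EAX"] & 0xFFFFFF00) | imm
            pc += 2
        elif opcode == 0xB8:                       # decode: MOV EAX, imm32
            state["EAX"] = int.from_bytes(code[pc + 1:pc + 5], "little")
            pc += 5
        elif opcode == 0x31:                       # decode: XOR r/m32, r32
            modrm = code[pc + 1]
            if modrm >> 6 != 0b11:
                raise NotImplementedError("only register operands are handled")
            src = REG32[(modrm >> 3) & 0b111]
            dst = REG32[modrm & 0b111]
            state[dst] ^= state[src]               # execute
            pc += 2
        else:
            raise NotImplementedError(f"opcode {opcode:#04x}")
    return state
```

For example, `run(MOV_AL_61)["EAX"]` is `0x61`, and both `XOR_EAX_EAX` and `MOV_EAX_0` leave EAX at zero. The XOR form needs two bytes where the MOV form needs five, which is one commonly cited reason for preferring it.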

I am preparing for a microprocessor exam. If the use of a program counter is to hold the address of the next instruction, what is the use of the stack pointer?

Should you ever crave deeper understanding, I heartily recommend Patterson and Hennessy as an intro and Hennessy and Patterson as an intermediate to advanced text. They're pricey, but truly nonpareil; I just wish either or both were available when I got my Master's degree and entered the workforce designing chips, systems, and parts of system software for them (but, alas!, that was WAY too long ago;-). Stack pointers are so crucial (and the distinction between a microprocessor and any other kind of CPU so utterly meaningful in this context... or, for that matter, in ANY other context, in the last few decades...!-) that I doubt anything but a couple of thorough from-the-ground-up refreshers can help!-)
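
To make the program counter versus stack pointer distinction in the question concrete, here is a sketch of a toy machine. The instruction names (`CALL`, `RET`, `PRINT`, `HALT`), the 16-word memory, and the downward-growing stack are all assumptions chosen for illustration, not any particular microprocessor. The program counter tracks which instruction runs next; the stack pointer tracks where the return address was saved so that `RET` can find its way back.

```python
def run(program):
    """Run a toy program; return the (pc, sp) trace and the printed output."""
    memory = [0] * 16      # small RAM region used only as the stack
    pc = 0                 # program counter: index of the next instruction
    sp = len(memory)       # stack pointer: stack grows toward lower addresses
    trace, output = [], []
    while True:
        op, arg = program[pc]
        trace.append((pc, sp))
        pc += 1            # default: fall through to the next instruction
        if op == "CALL":
            sp -= 1
            memory[sp] = pc    # push the return address
            pc = arg           # jump to the subroutine
        elif op == "RET":
            pc = memory[sp]    # pop the return address
            sp += 1
        elif op == "PRINT":
            output.append(arg)
        elif op == "HALT":
            return trace, output
        else:
            raise ValueError(f"unknown instruction {op!r}")


PROGRAM = [
    ("PRINT", "main start"),     # 0
    ("CALL", 4),                 # 1: call the subroutine at index 4
    ("PRINT", "main end"),       # 2: execution resumes here after RET
    ("HALT", None),              # 3
    ("PRINT", "in subroutine"),  # 4
    ("RET", None),               # 5
]
```

Running `run(PROGRAM)` shows the stack pointer dropping from 16 to 15 while the subroutine executes and returning to 16 afterwards, while the program counter jumps from 1 to 4 and later from 5 back to 2.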

Could anyone give me some pointers as to the best way in which to learn how to do very low latency programming? I have many programming books but I've never seen one which focused (or helped) on writing extremely fast code. Or are books not the best way forward?

Some advice from an expert would be really appreciated!

EDIT: I think I'm referring more to CPU/Memory bound.

[C++ programmer]:

Ultra-low-latency programming is hard, much harder than people suspect when they first start down the path. There are some techniques and "tricks" you can employ, such as I/O completion ports, multi-core utilization, highly optimized synchronization techniques, and shared memory; the list goes on forever. (edit) It's not as simple as "code-profile-refactor-repeat" because you can write excellent code that is robust and fast, but will never be truly ultra-low latency code.

Unfortunately there is no one single resource I know of that will show you how it's done. Programmers specializing in (and good at) ultra low-latency code are among the best in the business and the most experienced. And with good reason. Because if there is a silver bullet solution to becoming a good low-latency programmer, it is simply this: you have to know a lot about everything. And that knowledge is not easy to come by. It takes years (decades?) of experience and constant study.

As far as the study itself is concerned, here's a few books I found useful or especially insightful for one reason or another:

I'm looking for a layman's introduction to computer hardware and organization. Here are some of the topics I would like to cover.

  1. Brief intro to electronics.

  2. Gates and state machines, intro to register transfer and timing.

  3. Basic CPU design. Control.

  4. Microprogrammed CPU design.

  5. Cache systems.

  6. Memory hierarchy: registers, cache, RAM

  7. Virtual memory organization.

  8. Disk storage systems.

  9. Internal buses: front-side, memory, PCI

  10. Internal buses for storage: IDE, SATA, SCSI

  11. External buses: USB and FireWire

  12. Display systems and GPUs

I would prefer free resources online, but if nothing is available a book is fine as well. I have no background with hardware so an introductory text would be wonderful. Also I'm sorry if this isn't directly programming but I don't know where else to ask.

The Art of Electronics by Horowitz and Hill is a great one for electronics hobbyists.

For computer architecture: Computer Organization and Design: The Hardware/Software Interface

For RTL design: VHDL for Programmable Logic

I would recommend the book "Code" by Charles Petzold. It covers a lot of how the low level of a computer works from a layman's perspective. Not everything on your list is included, but it will give you a good start.

Tanenbaum's Structured Computer Organization was my intro into the 'levels' of computers. It's quite logical, approaching each level built on the previous.

I've often thought of doing a similar one, stretching from quantum physics through classical physics, electronics, integrated circuits, microcode, machine code, compilers, interpreters, VMs and so on, but I fear that would be about as possible as Knuth's 12-volume series. I hope he has a child to carry on the work :-).

As mentioned already Code: The Hidden Language of Computer Hardware and Software is a great book that covers the fundamentals.

Here are a couple of other books:

Computer Architecture: A Quantitative Approach

The Essentials of Computer Organization and Architecture

Upgrading and Repairing PCs

Here's a good site:

PC Architecture

It's easy to find and understand the function definition for Amdahl's Law, but all of the working examples I was able to find were either too vague or too academic/cerebral for my tiny pea brain to understand.

Amdahl's Law takes two parameters: F, the % of a task that cannot be improved via multi-threading, and N, the number of threads to use.

How does one calculate F with any degree of accuracy?

How do you look at a piece of code and determine whether that will be improved by multi-threading?

It's relatively easy to say which parts of your code certainly won't benefit from multi-threading: sequential parts. If you have to carry out a series of small steps in order, multi-threading won't help because you always need to wait for one step to be done before starting the next. Many common tasks aren't (necessarily) sequential in this sense: for example, searching a list for a number of items. If you want to extract every red item from a list, you can share parts of the list among several threads and collect all the red items from each part into a final result list. The difficulty in concurrent programming lies in finding efficient ways of doing this for real problems.

At a lower level you can talk about data dependency: a particular instruction or block depends on a previous block if it uses the results of that block's calculations in its own. So (pseudocode):

Block one:
load r1 into r2
add r1 to r3 into r4

Block two:
load r4 into r1
add 3 to r4 into r4

block two depends on block one: they must be executed in order. Here:

Block one:
load r1 into r2
add r1 to r3 into r4

Block two:
load r1 into r3
add 3 to r1 into r1

that isn't the case. This isn't directly useful for concurrency, but hopefully it illustrates the point more concretely. It also illustrates another problem in handling concurrency: as abstract blocks of functionality these two can be run in parallel, but in the concrete example given here they read and write some of the same registers, so a compiler/pipeliner/whatever would have to do more work to make them run together. This is all very complex, but is covered beautifully in

Which other parts don't benefit from multi-threading depends on your programming environment and machine architecture.
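
The two register-level examples above can be checked mechanically. The sketch below is an illustration, not part of the original answer: it represents each instruction as a pair of register sets (read, written) and classifies conflicts using the standard hazard names RAW (read after write, a true dependency), WAR (write after read), and WAW (write after write). The function name `hazards` and the block encodings are assumptions, and treating a whole block as one set of reads and writes is a conservative simplification.

```python
def hazards(first, second):
    """Classify register conflicts between two blocks run in program order.

    Each block is a list of (reads, writes) pairs, one per instruction.
    """
    reads1 = set().union(*(r for r, _ in first))
    writes1 = set().union(*(w for _, w in first))
    reads2 = set().union(*(r for r, _ in second))
    writes2 = set().union(*(w for _, w in second))
    return {
        "RAW": writes1 & reads2,   # second needs a result computed by first
        "WAR": reads1 & writes2,   # second overwrites a register first reads
        "WAW": writes1 & writes2,  # both blocks write the same register
    }


# Block one in both examples: load r1 into r2; add r1 to r3 into r4
BLOCK_ONE = [({"r1"}, {"r2"}), ({"r1", "r3"}, {"r4"})]

# First example's block two: load r4 into r1; add 3 to r4 into r4
DEPENDENT = [({"r4"}, {"r1"}), ({"r4"}, {"r4"})]

# Second example's block two: load r1 into r3; add 3 to r1 into r1
INDEPENDENT = [({"r1"}, {"r3"}), ({"r1"}, {"r1"})]
```

`hazards(BLOCK_ONE, DEPENDENT)` reports `r4` under RAW, which is why those blocks must run in order. `hazards(BLOCK_ONE, INDEPENDENT)` reports no RAW entry but lists `r1` and `r3` under WAR: no data flows between the blocks, yet they share register names, which is the extra work (for example, renaming registers) the answer alludes to.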

As for how to get a percentage, there's probably some hand-waving involved in a practical case - I doubt you'll ever get a precise number. If you divide your code up into functional units and profile the execution time in each, that would give you a roughly appropriate weighting. Then if one part that takes up 90% of the execution time can be improved with multi-threading, you say that 90% of your 'task' can be so improved.
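
The profiling approach above can be sketched in a few lines. Using the question's definition of F (the fraction that cannot be improved) and N (the number of threads), Amdahl's Law gives a speedup of 1 / (F + (1 - F) / N). The function names, the timing dictionary, and the section labels below are hypothetical; real profile data would come from whatever profiler you use.

```python
def amdahl_speedup(serial_fraction, threads):
    """Speedup predicted by Amdahl's Law for serial fraction F and N threads."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / threads)


def serial_fraction_from_profile(timings, parallelizable):
    """Estimate F from per-section times and the set of parallelizable sections."""
    total = sum(timings.values())
    serial = sum(t for name, t in timings.items() if name not in parallelizable)
    return serial / total


# Hypothetical profile: 90% of the time is spent in a section that can be threaded.
timings = {"setup": 1.0, "compute": 9.0}
f = serial_fraction_from_profile(timings, parallelizable={"compute"})
speedup_4 = amdahl_speedup(f, 4)
```

With this made-up profile, F is 0.1 and four threads give a predicted speedup of about 3.08, not 4. No matter how many threads are added, the speedup stays below 1 / F = 10.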

I need to learn the basics of operating systems, kernels, and CPU architectures, since some jobs require that background.

Is there a good book or online resource that I can refer to?

I don't know if you had a specific OS in mind, but one of the best books on how the Windows operating system works "under the hood" is called Windows Internals. It describes in detail how everything from the kernel, to device drivers, and the file system all work.

If you're looking for a good book on how CPUs and processors work, in general, I recommend Computer Architecture: A Quantitative Approach. Very good info there!

Also, some good resources on how CPUs work, with perspective to programmers, can be found from the Intel technical library. Everything is free to download there and it makes for some good reading!

I want to go backwards and learn more about how compilers, processors and memory operate on my programs. I am also interested in the physics on which all of this depends. Any good references or books would be appreciated...

Pick up a book on "Computer Organization" or "Computer Architecture" on Amazon. This is what we used when I was in college. It's not too thick, and will give you the basics, from the gate level all of the way up to how memory is organized and programs are written. If, after this, you want to look deeper into the physics, then you'll want to pick up a book on semiconductor physics. (But if I were you I'd just start by looking up "logic gate", "diode", and "transistor" on Wikipedia!)

Feynman has a nice bit on the Physics of Computation:

which addresses the second part of your question.

My first suggestion was going to be Code which has been suggested already. A better, but harder, book on the subject of processors is Computer Organization & Design by Patterson & Hennessy. You might look for an older edition on Amazon or They'll be a lot cheaper and have basically the same information.

These will both teach you the basics of how a processor works, assembly language, etc. This will help you understand how your program will be interpreted and thus, what sort of performance bottlenecks might exist based on your design.