Computer Organization and Design

John L. Hennessy, David A. Patterson


This book trains the student with the concepts needed to lay a solid foundation for joining this exciting field. More importantly, this book provides a framework for thinking about computer organization and design that will enable the reader to continue the lifetime of learning necessary for staying at the forefront of this competitive discipline. --John Crawford, Intel Fellow, Director of Microprocessor Architecture, Intel

The performance of software systems is dramatically affected by how well software designers understand the basic hardware technologies at work in a system. Similarly, hardware designers must understand the far reaching effects their design decisions have on software applications. For readers in either category, this classic introduction to the field provides a deep look into the computer. It demonstrates the relationship between the software and hardware and focuses on the foundational concepts that are the basis for current computer design. Using a distinctive learning by evolution approach the authors present each idea from its first principles, guiding readers through a series of worked examples that incrementally add more complex instructions until they ha


Mentioned in questions and answers.

I've been working in C and CPython for the past 3 - 5 years. Consider that my base of knowledge here.

If I were to use an assembly instruction such as MOV AL, 61h to a processor that supported it, what exactly is inside the processor that interprets this code and dispatches it as voltage signals? How would such a simple instruction likely be carried out?

Assembly even feels like a high level language when I try to think of the multitude of steps contained in MOV AL, 61h or even XOR EAX, EBX.

EDIT: I read a few comments asking why I put this as embedded when the x86-family is not common in embedded systems. Welcome to my own ignorance. Now I figure that if I'm ignorant about this, there are likely others ignorant of it as well.

It was difficult for me to pick a favorite answer considering the effort you all put into your answers, but I felt compelled to make a decision. No hurt feelings, fellas.

I often find that the more I learn about computers the less I realize I actually know. Thank you for opening my mind to microcode and transistor logic!

EDIT #2: Thanks to this thread, I have just comprehended why XOR EAX, EAX is faster than MOV EAX, 0h. :)
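To make the instructions named in this question a little more concrete, here is a minimal sketch in Python of the bytes the processor actually receives for them. The encodings are the standard 32-bit-mode forms (B0+r imm8 for MOV to an 8-bit register, 31 C0 for XOR EAX, EAX, B8 imm32 for MOV EAX, 0); the function name `decode_mov_r8_imm8` is made up for illustration. The size difference is only one commonly cited reason for preferring the XOR form, and nothing here models timing.

```python
# 8-bit register names in x86 encoding order (register number 0..7).
REG8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]


def decode_mov_r8_imm8(code: bytes) -> str:
    """Decode the two-byte 'MOV r8, imm8' form: opcode B0+r, then the immediate."""
    opcode, imm = code[0], code[1]
    if not 0xB0 <= opcode <= 0xB7:
        raise ValueError("not a MOV r8, imm8 instruction")
    return f"MOV {REG8[opcode - 0xB0]}, {imm:02X}h"


# Machine code for the two ways of zeroing EAX discussed in the question.
XOR_EAX_EAX = bytes([0x31, 0xC0])                    # XOR EAX, EAX
MOV_EAX_ZERO = bytes([0xB8, 0x00, 0x00, 0x00, 0x00])  # MOV EAX, 0

print(decode_mov_r8_imm8(bytes([0xB0, 0x61])))
print(len(XOR_EAX_EAX), "bytes vs", len(MOV_EAX_ZERO), "bytes")
```

The decoder in a real processor does this kind of field extraction with combinational logic rather than with table lookups in software.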

This is a big question, and at most universities there's an entire semester-long class to answer it. So, rather than give you some terribly butchered summary in this little box, I'll direct you to the textbook that has the whole truth: Computer Organization and Design: The Hardware/Software Interface by Patterson and Hennessy.

This is a question that requires more than an answer on StackOverflow to explain.

To learn about this all the way from the most basic electronic components up to basic machine code, read The Art of Electronics, by Horowitz and Hill. To learn more about computer architecture, read Computer Organization and Design by Patterson and Hennessy. If you want to get into more advanced topics, read Computer Architecture: A Quantitative Approach, by Hennessy and Patterson.

By the way, The Art of Electronics also has a companion lab manual. If you have the time and resources available, I would highly recommend doing the labs; I actually took the classes taught by Tom Hayes, in which we built a variety of analog and digital circuits, culminating in building a computer from a 68k chip, some RAM, some PLDs, and some discrete components. You would enter machine code directly into RAM using a hexadecimal keypad; it was a blast, and a great way to get hands on experience at the very lowest levels of a computer.

Why is the L1 cache smaller than the L2 cache in most processors?

For those interested in this type of question, my university recommends Computer Architecture: A Quantitative Approach and Computer Organization and Design: The Hardware/Software Interface. Of course, if you don't have time for this, a quick overview is available on Wikipedia.

I understand how code is compiled to assembly, and that assembly is a 1:1 replacement with binary codes. Can somebody help me understand how binary is connected to the hardware? How is the binary physically read and run? How does an if statement work in the hardware?

From google searches I'm thinking that maybe my question title should be "how is binary data put on a line of a bus" but I wasn't sure.


(Vastly simplified)

The binary (say, a string of bits from a line of machine code/asm) is loaded into memory from, say, disk. The processor logic then sends a request to the memory controller to load the contents of that memory into a processor-local register. The processor then interprets those contents as an instruction to carry out.
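The fetch-and-interpret cycle described above, and the question of how an if statement works, can be sketched with a toy machine in Python. Everything here is hypothetical: the four opcodes, the two-byte instruction format, and the names `run` and `if_equals_three` are invented for illustration and do not correspond to any real processor.

```python
# Hypothetical opcodes for a toy accumulator machine (not a real ISA).
LOADI, SUBI, BEQZ, HALT = 0x01, 0x02, 0x03, 0xFF


def run(memory: bytes) -> int:
    """Fetch, decode, and execute two-byte instructions until HALT."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # single accumulator register
    while True:
        opcode, operand = memory[pc], memory[pc + 1]  # fetch
        pc += 2                                       # default: fall through
        if opcode == LOADI:                           # decode and execute
            acc = operand
        elif opcode == SUBI:
            acc = (acc - operand) & 0xFF              # 8-bit wraparound
        elif opcode == BEQZ:
            if acc == 0:
                pc = operand                          # "if": overwrite the PC
        elif opcode == HALT:
            return acc
        else:
            raise ValueError(f"unknown opcode {opcode:#04x}")


def if_equals_three(x: int) -> int:
    """Machine code for: if (x == 3) result = 1; else result = 0;"""
    program = bytes([
        LOADI, x,    # address 0: acc = x
        SUBI, 3,     # address 2: acc = x - 3
        BEQZ, 10,    # address 4: jump to address 10 if acc == 0
        LOADI, 0,    # address 6: else branch
        HALT, 0,     # address 8
        LOADI, 1,    # address 10: then branch
        HALT, 0,     # address 12
    ])
    return run(program)


print(if_equals_three(3), if_equals_three(5))
```

The point of the sketch is that a conditional is nothing more than a choice between letting the program counter advance and loading it with a different address.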

I learned this level of stuff by doing microcoding at college.

In reality there are many more steps that could occur, depending on the processor's complexity and power. The processor is made up of various parts (ALU, registers, etc.) that cooperate in fetching instructions and data and in processing them. If you are interested in this level of understanding, and I commend you for asking the question, I'd say get a book on computer architecture. I used Structured Computer Organization by Tanenbaum at college.

This is a huge, very complicated topic. The best textbook I've seen on the subject is Patterson/Hennessy's "Computer Organization and Design", which has many editions.

Other than suggesting you read it, I wouldn't dare try to cram a semester-long class into a 500-character answer box.

I am reading the book 'Computer Organization and Design' by Patterson and Hennessy and got interested in MIPS.

I am unsure how to find the range of a jump/branch instruction, and how to determine the number of branch/jump instructions required to get to a specific address.

Can someone provide an explanation of how this has to be calculated i.e. Considering PC at a specific address and finding the number of branch/jump instructions needed to go to a different address? For example, what if PC is at 0x10001010, what is the range of addresses of branch and jump instructions?

Or can you direct me to some online resource or book which would help me in getting a better understanding of these?

The following is all for MIPS-32.

Branch instructions (B, BEQ, BNE, etc.) have a 16-bit signed word offset field, allowing a branch to an address roughly +/- 128 KB from the instruction following the branch. A jump (J) instruction specifies an address within the current 256 MB region: the target is the most significant 4 bits of the PC (strictly, of the address of the instruction after the jump) followed by the 26-bit instruction field shifted left by 2 bits (this is not a relative address). To transfer control to an arbitrary address anywhere in the 4 GB address space, use JR (jump register), which jumps to an address contained in a general-purpose register.

It takes either a single branch or jump instruction, or a register load followed by a JR, to reach an arbitrary address, depending on how far away the address is.
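As a rough worked example for the PC value in the question, the ranges can be computed as in the Python sketch below. The function names are made up for illustration, the code assumes word-aligned addresses, and it does not handle wraparound at the ends of the address space.

```python
def branch_range(pc: int) -> tuple[int, int]:
    """Lowest and highest targets of a branch located at address pc.

    The 16-bit signed word offset is added to the address of the
    instruction after the branch (pc + 4).
    """
    base = pc + 4
    lowest = base + (-(1 << 15) << 2)        # offset -32768 words
    highest = base + (((1 << 15) - 1) << 2)  # offset +32767 words
    return lowest, highest


def jump_range(pc: int) -> tuple[int, int]:
    """Lowest and highest targets of a J instruction located at address pc."""
    region = (pc + 4) & 0xF0000000           # upper 4 bits are kept
    return region, region | 0x0FFFFFFC       # 26-bit word index, shifted left 2


def cheapest_transfer(pc: int, target: int) -> str:
    """Pick the shortest way to reach target from an instruction at pc."""
    low, high = branch_range(pc)
    if low <= target <= high:
        return "branch"
    low, high = jump_range(pc)
    if low <= target <= high:
        return "jump"
    return "register load + JR"


pc = 0x10001010
print([hex(a) for a in branch_range(pc)])
print([hex(a) for a in jump_range(pc)])
```

For a PC of 0x10001010 this gives a branch range of 0x0FFE1014 to 0x10021010 and a jump range of 0x10000000 to 0x1FFFFFFC.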

The best book for MIPS programming is still See MIPS Run. You can also find MIPS architecture reference manuals at (registration required). The most relevant document is MIPS32® Architecture for Programmers Volume II: The MIPS32® Instruction Set.

I'm looking for a layman's introduction to computer hardware and organization. Here are some of the topics I would like to cover.

  1. Brief intro to electronics.

  2. Gates and state machines, intro to register transfer and timing.

  3. Basic CPU design. Control.

  4. Microprogrammed CPU design.

  5. Cache systems.

  6. Memory hierarchy: registers, cache, RAM

  7. Virtual memory organization.

  8. Disk storage systems.

  9. Internal busses: front side, memory, PCI

  10. Internal busses for storage: IDE, SATA, SCSI

  11. External busses: USB and FireWire

  12. Display systems and GPUs

I would prefer free resources online, but if nothing is available a book is fine as well. I have no background with hardware so an introductory text would be wonderful. Also I'm sorry if this isn't directly programming but I don't know where else to ask.

The Art of Electronics by Horowitz and Hill is a great book on electronics for hobbyists.

For computer architecture: Computer Organization and Design: The Hardware/Software Interface.

For RTL design: VHDL for Programmable Logic.

I would recommend the book "Code" by Charles Petzold. It covers a lot of how the low level of a computer works from a layman's perspective. Not everything on your list is included, but it will give you a good start.

Tanenbaum's Structured Computer Organization was my intro into the 'levels' of computers. It's quite logical, approaching each level built on the previous.

I've often thought of doing a similar one, stretching from quantum physics through classical physics, electronics, integrated circuits, microcode, machine code, compilers, interpreters, VMs and so on, but I fear that would be about as possible as Knuth's 12-volume series. I hope he has a child to carry on the work :-).

As mentioned already Code: The Hidden Language of Computer Hardware and Software is a great book that covers the fundamentals.

Here are a couple of other books:

Computer Architecture: A Quantitative Approach

The Essentials of Computer Organization and Architecture

Upgrading and Repairing PCs

Here's a good site:

PC Architecture

I have written a project, which uses some basic functions in openssl such as RAND_bytes and des_ecb_encrypt.

My computer has an i7-2600 (4 cores and 8 logical CPUs). When I run my project with 4 threads, it takes 10 seconds. When I run it with 8 threads, it also takes 10 seconds.

What I mean is that hyper-threading doesn't give me any performance improvement. The result is the same on Linux.

I found a post here that says hyper-threading gives no improvement in some situations. I also found one here that gives some intuitive results.

However, I have tried to write and to find simple tests that show hyper-threading giving no apparent improvement. Sadly, I could not find any.

So my question is whether there are simple tests that show hyper-threading giving no performance improvement.

I have written a project, which uses some basic functions in openssl such as RAND_bytes and des_ecb_encrypt... My computer has an i7-2600 (4 cores and 8 logical CPUs). When I run my project with 4 threads, it takes 10 seconds. When I run it with 8 threads, it also takes 10 seconds.

When using RDRAND (which RAND_bytes will do in this case), the bus is the limiting factor. Throughput should peak at around 800 MB/sec. It does not matter how many threads you have: the bus cannot transfer data fast enough. See Intel rdrand instruction revisited.
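The effect of a shared limit like this can be sketched with a very rough model in Python. It takes the 800 MB/sec figure above at face value; the per-thread rate of 250 MB/sec and the 8000 MB workload are hypothetical numbers chosen only to show the shape of the curve, and the function names are invented for illustration.

```python
def aggregate_throughput(threads: int, per_thread_mb_s: float,
                         shared_limit_mb_s: float = 800.0) -> float:
    """Total throughput is capped by the shared resource, however many threads run."""
    return min(threads * per_thread_mb_s, shared_limit_mb_s)


def run_time_s(total_mb: float, threads: int, per_thread_mb_s: float,
               shared_limit_mb_s: float = 800.0) -> float:
    """Time to move total_mb of data at the capped aggregate rate."""
    rate = aggregate_throughput(threads, per_thread_mb_s, shared_limit_mb_s)
    return total_mb / rate


PER_THREAD_MB_S = 250.0   # hypothetical rate of one unconstrained thread
WORKLOAD_MB = 8000.0      # hypothetical amount of data

for threads in (2, 4, 8):
    print(threads, "threads:", run_time_s(WORKLOAD_MB, threads, PER_THREAD_MB_S), "s")
```

Once the threads together can saturate the shared limit, adding more of them leaves the run time unchanged, which matches the 4-thread versus 8-thread observation in the question.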

If you used AES, then you might see a better speedup than with DES/3DES. Your i7-2600 has AES-NI, which can achieve about 1.3 cycles per byte; that should be about two to three times faster than AES in software. To ensure you are using the AES-NI instructions, you have to use the EVP_* interfaces.

I found a post here that says hyper-threading gives no improvement in some situations. I also found one here that gives some intuitive results.

I think @selalerer and @Mats Petersson answered your question. The problem does not scale linearly and there's a maximum speedup you will encounter. Intel states it's about 30%.

Intel's newest architecture favors out-of-order execution over hyper-threading because it's supposed to be more efficient. Read about the Silvermont processor cores.

But if you want a formal deep dive, then see a book on computer engineering. Here's the book we used when I studied it in college: Computer Organization and Design (it's probably a bit dated now).

However, I have tried to write and to find simple tests that show hyper-threading giving no apparent improvement.

OpenSSL also has a benchmarking app. See the source code in <openssl source>/apps/speed.c.

Also, benchmarking apps have their own personalities. An encryption stress test may not reveal the differences as prominently as you hope to see them. See, for example, Benchmarking Tools.

I am trying to convert a IEEE single precision binary format number to an integer. I am using the following loop:


I need to do this 7 more times in my program. My question is if there is a library to do this, that takes a 32 bit input and gives an integer output? If yes how do I include that function in my program? Thank you.

update: will the snippet above work correctly?

For behavioral code use either $rtoi() or $realtobits()

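real in1;
integer     a1;
reg  [63:0] b1; // reg rather than wire, because it is assigned procedurally
initial begin
  a1 = $rtoi(in1);       // Truncates fractional part
  b1 = $realtobits(in1); // Raw 64-bit double pattern; probably not what you want
end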

You can use $bitstoreal() if you need to cast a bit vector to a real type.

EDIT: So if I follow your comments correctly, you're building a model of a floating-point ALU that works on 32-bit data values. In this case you could use real data types since Verilog can handle this format natively. Of course, you won't be able to detect certain situations

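task realAdd(input [31:0] in1, input [31:0] in2, output [31:0] out);
  real rIn1, rIn2, rOut;
  begin
    // $bitstoreal() expects a 64-bit double pattern; see the note below
    rIn1 = $bitstoreal(in1);
    rIn2 = $bitstoreal(in2);
    rOut = rIn1 + rIn2;
    // $realtobits() returns 64 bits, so only the low 32 bits are kept here
    out = $realtobits(rOut);
  end
endtask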


These functions all use double precision so you'll need to do some trivial bit extensions to handle single precision inputs, and some non-trivial bounds checking/truncation on the output. You can avoid this by using SystemVerilog, which has the $bitstoshortreal()/$shortrealtobits() functions that work on single precision values.
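For checking a Verilog implementation against known-good values, a small reference model can help. The sketch below is in Python rather than Verilog, and the function names are made up for illustration. It converts a 32-bit single precision bit pattern to an integer by extracting the fields by hand, truncating toward zero, and compares the result with Python's own float conversion. It assumes the caller does any 32-bit range check on the result, since Python integers do not overflow.

```python
import struct


def float32_bits_to_int(bits: int) -> int:
    """Truncate an IEEE 754 single precision bit pattern to an integer."""
    sign = (bits >> 31) & 0x1
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    if exponent == 0xFF:
        raise ValueError("NaN or infinity has no integer value")
    if exponent == 0:
        return 0  # zero and subnormals have magnitude below 1
    significand = fraction | (1 << 23)  # restore the implicit leading 1
    shift = exponent - 127 - 23         # unbias, then account for 23 fraction bits
    if shift >= 0:
        magnitude = significand << shift
    else:
        magnitude = significand >> -shift  # drops the fractional part
    return -magnitude if sign else magnitude


def reference_int(bits: int) -> int:
    """Same conversion using the platform's float handling, for comparison."""
    value = struct.unpack(">f", struct.pack(">I", bits))[0]
    return int(value)  # int() truncates toward zero


for pattern in (0x41200000, 0xC0490FDB, 0x3F000000, 0x4B000000):
    print(hex(pattern), float32_bits_to_int(pattern), reference_int(pattern))
```

The same field extraction and shift map directly onto a barrel shifter in hardware, with the bounds checking mentioned above added on top.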

If you want hardware for this, Computer Organization & Design has a description of a multi cycle implementation. As Andy posted, there may be better resources out there for your case. These are not simple to design.