The Design of the UNIX Operating System

Maurice J. Bach

This is the first, and still the most comprehensive, book to describe the sophisticated workings of the UNIX System V kernel--the internal algorithms, the structures that form the basis of the UNIX operating system, and their relationship to the programming interface. System programmers will gain a better understanding of how the kernel works and will be able to compare algorithms used in the UNIX system to algorithms used in other operating systems. Programmers on UNIX systems will gain a deeper understanding of how their programs interact with the system and can thereby write more efficient programs.

Mentioned in questions and answers.

Programming isn't my main job, though I enjoy it and sometimes get paid for it. For many years now I've been hearing about Linux, and my friends have shown me many *nixes (or *nici?), though I stick with Mac OS.

Do you think there are any parts of the Linux kernel that I could enjoy looking at, that would help me understand what the whole thing is about? For example, how is Linux different from Darwin?

I grew up with assembler and DOS, so things like interrupts or low-level C shouldn't be barriers to understanding. But in the end I'm more interested in high-level concepts, like threading or the networking stack - I know different operating systems do them differently. And I'm looking for something fun, easy and enjoyable, like late-night reading.

(Note: made a CW, just in case)

Update: I looked for some docs and started reading:

You might want to read or skim a book that describes the Linux kernel before looking deep into the source itself.

The books that come to mind are:

I am very much interested in Unix and want to learn it inside and out. Can you guys help me by listing some books which can make me a wizard? Ultimately I want to become a Unix programmer.

I am not a novice Unix user.

You want system administration knowledge, or programming knowledge?

For programming:

For system administration:

As other responders have noted, Advanced Programming in the Unix Environment (APUE) is indispensable.

Other books that you might want to consider (these have more of a Linux focus, but are a good way to become familiar with Unix internals):

I have been studying signals in Linux, and I've written a test program to catch SIGINT.

#include <unistd.h>
#include <signal.h>
#include <iostream>

void signal_handler(int signal_no);

int main() {
  signal(SIGINT, signal_handler);   // install the handler for Ctrl+C
  for (int i = 0; i < 10; ++i) {
    std::cout << "I'm sleeping..." << std::endl;
    unsigned int one_ms = 1000;     // 1000 microseconds
    usleep(200 * one_ms);           // sleep about 200 ms per iteration
  }
  return 0;
}

void signal_handler(int signal_no) {
  if (signal_no == SIGINT)
    std::cout << "Oops, you pressed Ctrl+C!\n";
  return;
}

The output looks like this:

I'm sleeping...
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
I'm sleeping...
I'm sleeping...

I understand that when Ctrl+C is pressed, all processes in the foreground process group receive a SIGINT (unless a process chooses to ignore it).

So is it that the shell (bash) AND the instance of the above program both received the signal? And where does the "^C" before each "Oops" come from?

The OS is CentOS, and the shell is bash.

The terminal (tty) driver echoes everything you type, so when you type ^C, that too gets echoed (and in your case the signal is intercepted by your handler). The command stty -echo may or may not be useful to you depending on your needs/constraints; see the man page for stty for more information.
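If you want the programmatic equivalent of stty -echo from inside a program, here is a minimal sketch (assuming a POSIX terminal; the program itself is my illustration, not from the question): clear the ECHO flag with termios, then restore the old settings before exiting.

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void) {
  struct termios old_tio, new_tio;

  if (tcgetattr(STDIN_FILENO, &old_tio) == -1) {   /* save the current terminal settings */
    perror("tcgetattr");
    return 1;
  }
  new_tio = old_tio;
  new_tio.c_lflag &= ~ECHO;                        /* stop the tty driver from echoing input */
  tcsetattr(STDIN_FILENO, TCSANOW, &new_tio);

  printf("Type something and press Enter; it will not be echoed...\n");
  getchar();

  tcsetattr(STDIN_FILENO, TCSANOW, &old_tio);      /* restore echo */
  return 0;
}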

Of course much more goes on at a lower level: any time you communicate with a system via peripherals, device drivers (such as the keyboard driver used to generate the ^C, and the terminal driver that displays everything) are involved. You can dig even deeper, down to the level of assembly/machine language, registers, lookup tables, etc. If you want a more detailed, in-depth level of understanding, the books below are a good place to start:

The Design of the UNIX Operating System is a good reference for this sort of thing. Two more classic references: The Unix Programming Environment and Advanced Programming in the UNIX Environment.

There is a nice summary in this SO question: How does Ctrl-C terminate a child process?

"when youre run a program, for example find, the shell:

  • the shell fork itself
  • and for the child set the default signal handling
  • replace the child with the given command (e.g. with find)
  • when you press CTRL-C, parent shell handle this signal but the child will receive it - with the default action - terminate. (the child can implement signal handling too)"
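As a rough sketch of those steps in C (my illustration, not the shell's actual source; an interactive shell also does job control and much more), the fork/exec sequence with default child signal handling looks something like this:

#include <signal.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void) {
  signal(SIGINT, SIG_IGN);                 /* an interactive shell typically ignores SIGINT itself */

  pid_t pid = fork();                      /* the shell forks itself */
  if (pid == 0) {
    signal(SIGINT, SIG_DFL);               /* restore the default action (terminate) for the child */
    execlp("find", "find", ".", (char *)NULL);   /* replace the child with the command */
    perror("execlp");
    _exit(127);                            /* only reached if exec fails */
  }
  waitpid(pid, NULL, 0);                   /* Ctrl-C now terminates find, not the shell */
  return 0;
}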

I've been interested in programming an operating system for some time. Delving through a few different sites, I've come across an interesting concept (to paraphrase): if you start writing your bootloader with #include, you've already made a fatal mistake.

I've gone through K&R, and it uses stdio throughout every lesson. Having used it throughout learning C, I don't know what I've learned that uses stdio and what doesn't. What can you do in C without stdio?

stdio stands for Standard Input/Output. As the name says, it contains things related to standard I/O. The Wikipedia article on stdio.h lists its contents; you will need stdio.h only if you use them. You can also check the man page for stdio.h for further details.
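To make that concrete, here is a minimal sketch of printing without stdio at all, by making the write system call directly. The inline assembly and the syscall number (1 = write) are specific to x86-64 Linux, and the helper name sys_write is just mine for illustration.

/* no stdio: write(2) invoked directly via the x86-64 Linux syscall ABI */
static long sys_write(int fd, const void *buf, unsigned long len) {
  long ret;
  __asm__ volatile ("syscall"
                    : "=a"(ret)
                    : "a"(1L), "D"((long)fd), "S"(buf), "d"(len)  /* rax = 1 is __NR_write */
                    : "rcx", "r11", "memory");
  return ret;
}

int main(void) {
  const char msg[] = "hello without stdio\n";
  sys_write(1, msg, sizeof msg - 1);   /* fd 1 is standard output */
  return 0;
}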

And for the OS part: writing an OS is much more than simple programming, even if it is an academic one. Before jumping into that you should learn data structures, algorithms, OS theory and much more. The Design of the UNIX Operating System is a very good book for learning about operating systems. Nachos is an academic OS simulation program; you can check that out too. And if you are fanatic about operating systems, you should read Linus Torvalds's autobiography, Just for Fun. It's not a technical book, but you will get a feeling for what it means to write an OS.

As we know, stdout is buffered on Linux. My questions are: 1) Is it a global buffer shared by all processes, or one buffer per process? 2) Where is the buffer: on the stack, on the heap, or in a static area? 3) Who creates it?

Best resource to learn about the C standard library in excruciating detail is P. J. Plauger's The Standard C Library. He describes all the issues that arose while he implemented the library by himself (in MSWord! on a win3.1 laptop! on vacation!).

He also gives detailed information on how to use (and test) every function.

For the Unix (Linux) side, you should start reading about 'inode's, the classic data structure the kernel uses to describe a file (with in-core copies cached in memory). The classic book for this is The Design of the UNIX Operating System by Maurice J. Bach.


Alright, now that you're suitably scolded for not having read all the old books and every related wiki page.

Here's a relevant quote from The Standard C Library, p. 256.

You can, in principle, exercise a certain amount of control over how the I/O functions buffer data for a stream. You must realize, however, that buffering is an optimization based on various conjectures about patterns of I/O. These conjectures are usually correct, and many implementations follow your advice. But they don't have to. An implementation is free to ignore most of your buffering requests.

Nevertheless, if you think a bigger buffer will improve performance or a smaller buffer will save space, you can supply your own candidate buffer. Call the function setvbuf after you open the file, and before you perform any other operations on the stream. (Avoid the older function setbuf, which is less flexible.) You can specify whether I/O should be fully buffered, buffered by text lines, or unbuffered. It just might make a difference in how well your program performs.

...

setbuf -- Use setvbuf instead of this function to get more control.

setvbuf -- As a rule, it is best to let the Standard C library decide how to buffer input/output for you. If you are certain that you want no buffering or line-at-a-time buffering, then use this function to initialize the stream properly. Call setvbuf immediately after you open the stream. Almost any operation on the stream will preempt your right to choose a buffering strategy. Should you specify your own buffer with this call, don't assume that the stream will actually use it. And never alter the contents of the buffer while the stream is open. The mode (third) argument must have one of the values _IOFBF, _IOLBF, or _IONBF, described above. Also see the macro BUFSIZ, described [elsewhere].

...

/* setbuf function */
#include "xstdio.h"

void (setbuf)(FILE *str, char *buf)
    {        /* set up buffer for a stream */
    setvbuf(str, buf, buf ? _IOFBF : _IONBF, BUFSIZ);
    }

/* setvbuf function */
#include <limits.h>
#include <stdlib.h>
#include "xstdio.h"

int (setvbuf)(FILE *str, char *abuf, int smode, size_t size)
    {         /* set up buffer for a stream */
    int mode;
    unsigned char *buf = (unsigned char *)abuf;

    if (str->_Mode & (_MREAD|_MWRITE))
        return (-1);
    mode = smode == _IOFBF ? 0
        : smode == _IOLBF ? _MLBF
        : smode == _IONBF ? _MNBF : -1;
    if (mode == -1)
        return (-1);
    if (size == 0)
        buf = &str->_Cbuf, size = 1;
    else if (INT_MAX < size)
        size = INT_MAX;
    if (buf)
        ;
    else if ((buf = malloc(size)) == NULL)
        return (-1);
    else
        mode |= _MALBUF;
    if (str->_Mode & _MALBUF)
        free(str->_Buf), str->_Mode &= ~_MALBUF;
    str->_Mode |= mode;
    str->_Buf = buf;
    str->_Bend = buf + size;
    str->_Next = buf;
    str->_Rend = buf;
    str->_Wend = buf;
    return (0);
    }

So, at least in this implementation, the default buffer probably lives in the FILE structure and is allocated on the heap. We can also see its smaller sibling, the one-character buffer (str->_Cbuf), which is used for the "unbuffered" case.
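To tie this back to the original question, here is a small sketch (my example): the stdout buffer lives in the process's own address space, one per process, typically heap-allocated by the library or supplied by you, not in some global buffer shared across processes.

#include <stdio.h>

int main(void) {
  static char buf[1 << 16];                  /* our candidate buffer: ordinary per-process memory */

  /* Ask for full buffering into our own buffer; the library is free to ignore the request. */
  if (setvbuf(stdout, buf, _IOFBF, sizeof buf) != 0)
    perror("setvbuf");

  printf("buffered inside this process, handed to the kernel with write(2)\n");
  fflush(stdout);                            /* push the buffered bytes to the kernel now */
  return 0;
}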

I have a confused notion of how segmentation and paging work on x86 Linux machines. I'd be glad if someone could clarify all the steps involved, from start to end.

x86 uses a paged segmentation technique for memory management.

Can anyone please explain what happens from the moment an executable ELF file is loaded from the hard disk into main memory to the time it dies? When compiled, the executable has different sections in it (text, data, stack, heap, bss). How will these be loaded? How will they be set up under the paged segmentation memory technique?

I want to know how the page tables get set up for the loaded program, how the GDT gets set up, and how the registers are loaded. Also, why is it said that logical addresses (the ones processed by the segmentation unit of the MMU) are 48 bits (a 16-bit segment selector plus a 32-bit offset) when it is a 32-bit machine? Where are the other 16 bits stored? If anything accessed from RAM must be 32 bits (4 bytes), how are the remaining 16 bits accessed (to be loaded into the segment registers)?

Thanks in advance. The question covers a lot of ground, but I wanted to get clarification on the entire life cycle of an executable. I'd be glad if someone answers and starts a discussion on this.

Perhaps read the Assembly HOWTO. When a Linux process starts to execute an ELF executable using the execve system call, it is essentially (sort of) mmap-ing some segments (and initializing registers, and a tiny part of the stack). Read also the SVR4 x86 ABI supplement and its x86-64 variant. Don't forget that a Linux process only sees the memory mappings of its own address space and only deals with virtual memory.
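As a quick way to look at those mappings yourself, a small illustrative program (assuming the Linux /proc filesystem) can dump /proc/self/maps and show the text, data, heap, stack and vdso regions that execve and the dynamic loader set up:

#include <stdio.h>

int main(void) {
  FILE *maps = fopen("/proc/self/maps", "r");   /* the kernel's view of this process's address space */
  if (!maps) {
    perror("fopen");
    return 1;
  }

  char line[512];
  while (fgets(line, sizeof line, maps))
    fputs(line, stdout);                        /* one line per mapping: text, data, heap, stack, vdso... */

  fclose(maps);
  return 0;
}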

There are many good books on operating system kernels, notably by A. Tanenbaum and by M. Bach, and some on the Linux kernel.

NB: segment registers are almost unused on Linux.

I'm writing an emulator of an old computer in Java/Swing, and I think I've identified the design problem that I'm having. As idiosyncratic as this application is, I suspect someone will find a "pattern" to this problem.

I should add that I'm still a beginner at OOP, GUIs, and Design Patterns.

The machine has a GUI thread (the Console), with pushbuttons and switches, and a Model thread (the CPU) with which the Console communicates so that Console events can change the CPU's state. The Console, of course, is event-driven from the AWT event queue. The Console communicates with the CPU by queueing messages on a PriorityBlockingQueue, which the CPU receives. As such, the CPU is structured as an event loop, too. So far, so good.

The problem is that, when you push START on the Console, you want the CPU to start executing whatever program it has in its memory. It still needs to respond to switch throws and button-pushes (such as STOP) from the Console, but it mainly needs to sit there and spin through its instruction fetch-decode-execute loop.

And even that wasn't problematic for a while: I had a method called Cycle() that would execute one specific "cycle" of the current instruction and then return, to be immediately redispatched to execute the next cycle. I placed the call to Cycle() inside the CPU's run loop, and polled the message queue after every cycle. If the CPU was stopped, the run loop would simply wait on the message queue.

Now: I'm implementing the I/O instructions, e.g. Read Card. It's necessary for one of the cycles to send a request for data to the peripheral device in question (itself implemented as a GUI/Model running on separate threads) and then wait for the data to arrive. This completely breaks the whole notion of the CPU being a simple event-loop that receives messages and acts upon them without incurring blocking in the process. This new cycle will block. And if the operator hasn't loaded a card deck into the reader, it can block a long time.

I thought about factoring the instruction fetch-decode-execute loop out into a separate "worker" thread, but I don't think it's a good fit, since worker threads (as I understand them) are intended to run asynchronously to completion, and not to keep interacting with their parent thread while running. (In fact, I can't think of why the "worker thread" should ever terminate.) Also, there is currently no synchronization necessary when a cycle needs to access data that can concurrently be modified as a result of console key presses.

So how do I manage to meld "event-driven" processing with a traditional batch-process which needs to explicitly wait for messages before proceeding?

In a typical real system, each separate device does in fact run in parallel with the CPU, so it certainly makes sense to model it in your simulation as a separate thread. You'll want to make sure that you implement some sort of analog to the interrupt system in real CPUs to handle the synchronization when the worker has finished its job (e.g., a new priority queue to catch "I/O interrupts" and the like). You may find it helpful to pick up a copy of The Design of the UNIX Operating System by Maurice Bach. It goes into great detail on how UNIX interacts with the underlying hardware, and may be a good resource for your project.
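The original project is Java/Swing, but here is a language-agnostic sketch in C with POSIX threads (all names are mine, purely illustrative) of that "I/O interrupt" idea: the card reader runs on its own thread and posts a completion message to the queue the CPU loop already services, so the CPU waits only on its own message queue, never on the device directly.

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

enum msg_type { MSG_NONE, MSG_IO_DONE, MSG_STOP };

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
static enum msg_type pending = MSG_NONE;       /* one-slot stand-in for the message queue */

static void post(enum msg_type m) {            /* called from device (or console) threads */
  pthread_mutex_lock(&lock);
  pending = m;
  pthread_cond_signal(&cond);
  pthread_mutex_unlock(&lock);
}

static void *card_reader(void *arg) {          /* device thread: "reads a card", then interrupts */
  (void)arg;
  sleep(1);                                    /* simulate waiting for the operator to load a deck */
  post(MSG_IO_DONE);
  return NULL;
}

int main(void) {
  pthread_t dev;
  pthread_create(&dev, NULL, card_reader, NULL);

  int waiting_for_io = 1;
  while (waiting_for_io) {                     /* the CPU's event loop, waiting on its own queue */
    pthread_mutex_lock(&lock);
    while (pending == MSG_NONE)
      pthread_cond_wait(&cond, &lock);
    if (pending == MSG_IO_DONE) {
      printf("I/O complete: resume the fetch-decode-execute loop\n");
      waiting_for_io = 0;
    }
    pending = MSG_NONE;
    pthread_mutex_unlock(&lock);
  }
  pthread_join(dev, NULL);
  return 0;
}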

As the title suggests, is there any source that systematically covers the knowledge required to understand the Linux kernel?

If there's no such thing, what hardware topics should I cover (I think it's mostly about the computer itself)? And to what depth (I want to learn deeply enough to understand, but not down to the circuit-design level)?

I suggest you buy an embedded board such as the BeagleBoard. The vendor provides the necessary documentation for the chipset used (a TI ARM Cortex-A8 in this case). Start reading that chipset manual. Also, there are many Android and Ubuntu projects already implemented on the BeagleBoard; take them as a reference and see how the drivers for your specific board are written, etc. This will give you an idea of how the kernel and drivers interact with the hardware.

Coming to the software part of the kernel, I suggest you read a good book on operating systems in general and on Linux/Unix in particular. This will give you a fair idea of what a kernel is and how it manages things. Then you can play around with your desktop Linux by writing small kernel modules, inserting and removing them, debugging them, etc.

Also, keep a handy reference of the kernel source with you and make the internet your best friend.

For example, a process waiting for disk I/O to complete will sleep on the address of the buffer header corresponding to the data being transferred. When the interrupt routine for the disk driver notes that the transfer is complete, it calls wakeup on the buffer header. The interrupt uses the kernel stack for whatever process happened to be running at the time, and the wakeup is done from that system process.

Can you please explain the last line in the paragraph, which I have emphasised? It is about waking up a process which has been waiting for some event to occur and has therefore slept. This paragraph is from Galvin. By the way, can you suggest a good book or link for studying Unix operating systems?

Thanks.

There is some process running at the time the interrupt is received. The kernel doesn't change over to some other process context to handle it -- that would take time -- it just does what's necessary in the current context, and lets the scheduler know that the next time it schedules, the waiting process is ready to proceed.

There are a number of good internals books around. I'm fond of the various McKusick et al books, like The Design and Implementation of the FreeBSD Operating System.

Maurice Bach's Design of the Unix Operating System is the most well-known and comprehensive book on the subject.

Possible Duplicate:
context switch during doubly linked list creation

I was reading The Design of the UNIX Operating System (Maurice Bach).

He gives the following example of a doubly linked-list.

struct queue {
    struct queue *forp, *backp;  //pointers (back and forward)
    //other data
} *bp, *bp1;

bp1->forp = bp->forp; //1

bp1->backp = bp; //makes sense
bp->forp = bp1; //makes sense

bp1->forp->backp = bp1; //2

I can't understand the purpose of the statements marked 1 and 2. 1 seems wrong, 2 looks to be redundant.

Is this a valid way to create a doubly linked-list?

The code is correct.

bp points to the head of a doubly linked list. You want to insert the node pointed to by bp1 as the second item in the list (that is what the code does).

To do that you need to set 4 pointers:

bp1->forp should point to the second item in the list, bp->forp (//1 above)

bp1->backp should point to the first item in the list, bp

bp->forp should point to the inserted item, bp1

The back pointer of the second item, bp1->forp->backp should point to the inserted item, bp1. (//2 above)

Edit:

Let's call the structs A, B, C, D... The list (pointed to by bp) consists of A, C, D... before the insert. We want to insert B (pointed to by bp1). <-> denotes the forward and backward pointers.

Before:

bp --> A <-> C <-> D <-> E <-> ...
bp1--> B

After:

bp--> A <-> B <-> C <-> D <-> E <-> ...
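Here is a runnable toy version of that insert (the struct fields and pointer assignments follow the question; the test scaffolding is mine), using a small circular list as in Bach's book so the forward pointer is never null:

#include <stdio.h>

struct queue {
  struct queue *forp, *backp;     /* forward and back pointers */
  char name;                      /* stand-in for "other data" */
};

/* Insert the node pointed to by bp1 right after the node pointed to by bp. */
static void insert_after(struct queue *bp, struct queue *bp1) {
  bp1->forp = bp->forp;           /* 1: new node points forward to the old second item */
  bp1->backp = bp;
  bp->forp = bp1;
  bp1->forp->backp = bp1;         /* 2: old second item points back to the new node */
}

int main(void) {
  struct queue a = { .name = 'A' }, b = { .name = 'B' }, c = { .name = 'C' };

  /* circular list with two nodes: A <-> C */
  a.forp = &c; a.backp = &c;
  c.forp = &a; c.backp = &a;

  insert_after(&a, &b);           /* becomes A <-> B <-> C (circularly) */

  struct queue *p = &a;
  do {
    printf("%c ", p->name);
    p = p->forp;
  } while (p != &a);
  printf("\n");                   /* prints: A B C */
  return 0;
}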

My friends and I are trying to find a way to implement a new (actually old :) ) kernel scheduling algorithm for the SCHED_NORMAL and SCHED_BATCH classes. In other words, we are trying to implement Round Robin or FIFO behavior instead of the fair scheduling algorithm. We have been reading the kernel code and have made some progress, but it is not enough. What is the recommended way to start this process? How can we start implementing these algorithms?

Thank you

We have been reading the kernel code and have made some progress, but it is not enough. What is the recommended way to start this process? How can we start implementing these algorithms?

Well, these concepts are a bit complex and have evolved a great deal, so you should begin by understanding the algorithms and concepts from the references below. Once you understand them, start reading the kernel code. That should be the approach.

You may want to refer to the great classic books which describe these concepts and their implementation in detail:

  1. "The Design Of UNIX Operating System" By Maurice J Bach (Chapter08)

  2. Linux Kernel Development By Robert Love (Chapter04)

I need to design some data structures for something that's roughly a cross between a database and a file system. In fact there's a large overlap in the design of file systems and databases. I've done some work on both, not very in-depth, but still, I have some idea about the topics. However I need more information and I am wondering if there's a good overview of design concepts out there. Things like overviews of algorithms, data structures and the like. The advantages of different types of trees or hash tables for key lookup. Perhaps some information about data ordering and alignment. In short, the experiences of other people in implementing filesystems and databases that would help the next person avoid the same mistakes.

Gray wrote a book titled "Transaction Processing: Concepts and Techniques" - http://www.amazon.com/Transaction-Processing-Concepts-Techniques-Management/dp/1558601902 - which covers a great deal of what you would need to build your own database.

One starting point for file systems would be http://www.amazon.com/The-Design-UNIX-Operating-System/dp/0132017997 - but I suspect you would have to chase down individual papers or Linux source code to keep up with the variety of different file systems created since then.

When we open a file using fopen() in C (on Ubuntu, with gcc) and write to it, is the content written directly to the hard disk address where the file resides, or is it first brought into primary memory? What is the actual process by which a file is written to or read from its location on the hard disk through a C program on Linux?

The C library does not do the actual write to disk; that is the job of the operating system. The C library makes a system call to the kernel to write the data, and it may keep a buffer of its own to minimize the number of system calls. The kernel in turn keeps its own buffers to optimize the real writes to disk. In general, when you are working in C you don't think at this low a level; however, you do need to ensure that you close the file correctly. The actual disk management is the job of the OS.
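A tiny sketch of those layers (the file name and contents are just examples): fputs/fwrite fill the C library's buffer inside the process, fflush or fclose hand the bytes to the kernel with the write system call, and fsync asks the kernel to push its own cached pages out to the disk.

#include <stdio.h>
#include <unistd.h>

int main(void) {
  FILE *fp = fopen("out.txt", "w");
  if (!fp) {
    perror("fopen");
    return 1;
  }

  fputs("hello\n", fp);           /* lands in the C library's buffer, in this process */
  fflush(fp);                     /* buffer -> kernel page cache, via write(2) */
  fsync(fileno(fp));              /* ask the kernel to push its cached pages to the disk */
  fclose(fp);                     /* flushes anything left and releases the FILE */
  return 0;
}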

The Design of the UNIX Operating System by Maurice J. Bach contains a nice explanation of the Unix kernel. You might have a look at it as a starting point.

Can anybody tell me what a 'callout table' is in Unix? Maurice J. Bach gives an explanation in his book The Design of the UNIX Operating System, but I'm having difficulty understanding the examples, especially the one explaining the reason for negative time-out fields. Why are software interrupts used there?

Thanks!

Interrupts stop the current code and start the execution of a high-priority handler; while this handler runs, nothing else can get the CPU. So if you need to do something complex, your interrupt handler would hang the whole system.

The solution: Fill a data structure with all the necessary data and then store this data structure with a pointer to the handler in the call out table. Some service (usually the clock handler) will eventually visit the table and execute the entries one by one in a standard context (i.e. one which doesn't block the process switching).
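Here is a toy, user-space sketch of that idea (all names are mine; Bach's real table stores the times incrementally, relative to the previous entry, and the clock handler only raises a software interrupt to run the expired entries later at lower priority, which is why an entry's time field can already have gone negative by the time the handler actually runs):

#include <stdio.h>

#define NCALL 8

struct callout {
  int ticks;                      /* ticks remaining until the function should fire */
  void (*fn)(void);
};

static struct callout table[NCALL];
static int ncall;

static void timeout(void (*fn)(void), int ticks) {  /* register "call fn after ticks clock ticks" */
  table[ncall].fn = fn;
  table[ncall].ticks = ticks;
  ncall++;
}

static void clock_tick(void) {    /* what a clock interrupt would do, greatly simplified */
  for (int i = 0; i < ncall; i++)
    if (table[i].ticks > 0 && --table[i].ticks == 0)
      table[i].fn();              /* a real kernel defers this to a software interrupt */
}

static void hello(void) {
  printf("callout fired\n");
}

int main(void) {
  timeout(hello, 3);
  for (int t = 0; t < 5; t++)     /* simulate five clock ticks */
    clock_tick();
  return 0;
}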

I can't seem to find this specific implementation detail, or even a pointer to where in an OS book to find this.

Basically, the main thread schedules an async task (to be run later) on itself. So... when does it run?

Does it wait for the run loop to finish? Or does it just randomly interrupt the run-loop in the middle of any function?

I understand the registers will be the same (unless it's a separate thread), but I don't really understand what happens to the instruction pointer and the stack, if anything happens at all.

Thank you

The threads of a process all share the same address space, but not the same CPU registers. How thread scheduling is done depends on the programming language and the O/S. Usually there are explicit scheduling points, such as returning from a system call, blocking while awaiting I/O completion, or, for interpreted languages, the boundaries between p-code instructions. Some O/S implementations also reschedule based on how long a thread has run, for time-based scheduling. Often languages include a function that explicitly offers the CPU to any other thread or process by transferring control to the process or thread scheduler component of the O/S.
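For the specific "async task posted to the same thread" case, a toy single-threaded run loop (purely illustrative, not any particular framework's API) shows the usual behavior: the task is only dequeued at an explicit scheduling point, the top of the loop, so it never interrupts the currently running callback mid-function.

#include <stdio.h>

#define MAXQ 16
typedef void (*task_fn)(void);

static task_fn queue[MAXQ];
static int head, tail;

static void post(task_fn fn) {    /* schedule a task on this same thread */
  queue[tail++ % MAXQ] = fn;
}

static void later(void) {
  printf("async task runs only after the current callback returned\n");
}

static void on_click(void) {
  post(later);                    /* "call me later" */
  printf("still inside on_click; 'later' has not run yet\n");
}

int main(void) {
  post(on_click);
  while (head != tail)            /* the run loop: drain tasks one at a time */
    queue[head++ % MAXQ]();
  return 0;
}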

The act of switching from one thread or process to another is known as a context switch and is carefully tuned code because this is often done thousands of times per second. This can make the code difficult to follow.

The best explanation of this I've ever seen is the classic http://www.amazon.com/The-Design-UNIX-Operating-System/dp/0132017997.

When a process is selected by the long-term scheduler, it enters the ready queue (moving from the new state to the ready state), and all the processes in the ready queue are present in main memory. But when a process is in the new state, where does it reside? In primary memory or secondary memory?

Since processes in the ready queue are stored in primary memory, new processes might reside in secondary memory, but I couldn't find a proper reason for this.

When a process is in the "new" state right after its creation through the fork system call or something similar, it initially resides in main memory. Depending on the amount of available memory, the OS may decide to swap it out to secondary storage or keep it in main memory. Refer to the process state transition diagram (taken from The Design of the UNIX Operating System) below:

[Process state transition diagram from The Design of the UNIX Operating System]

To quote directly from The Design of the UNIX Operating System:

The process enters the state model in the "created" state when the parent process executes the fork system call and eventually moves into a state where it is ready to run (3 or 5). The process scheduler will eventually pick the process to execute, and the process enters the state "kernel running" where it completes its part of the fork system call.