Modern Operating Systems

Andrew S. Tanenbaum


The widely anticipated revision of this worldwide best-seller incorporates the latest developments in operating systems technologies. The Third Edition includes up-to-date material on relevant operating systems such as Linux, Windows, and embedded real-time and multimedia systems. Includes new and updated coverage of multimedia operating systems, multiprocessors, virtual machines, and antivirus software. Covers the internal workings of Windows Vista (Ch. 11); unique even for current publications. Provides information on current research based on Tanenbaum's experiences as an operating systems researcher. A useful reference for programmers.


Mentioned in questions and answers.

What is the technical difference between a process and a thread? I get the feeling a word like 'process' is overused, and there are also hardware and software threads. How about lightweight processes in languages like Erlang? Is there a definitive reason to use one term over the other?

First, let's look at the theoretical aspect. You need to understand what a process is conceptually to understand the difference between a process and a thread and what's shared between them.

We have the following from section 2.2.2 The Classical Thread Model in Modern Operating Systems 3e by Tanenbaum:

The process model is based on two independent concepts: resource grouping and execution. Sometimes it is useful to separate them; this is where threads come in....

He continues:

One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, as well as other resources. These resources may include open files, child processes, pending alarms, signal handlers, accounting information, and more. By putting them together in the form of a process, they can be managed more easily. The other concept a process has is a thread of execution, usually shortened to just thread. The thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from. Although a thread must execute in some process, the thread and its process are different concepts and can be treated separately. Processes are used to group resources together; threads are the entities scheduled for execution on the CPU.

Further down he provides the following table:

Per process items             | Per thread items
Address space                 | Program counter
Global variables              | Registers
Open files                    | Stack
Child processes               | State
Pending alarms                |
Signals and signal handlers   |
Accounting information        |

Let's deal with the hardware multithreading issue. Classically, a CPU would support a single thread of execution, maintaining the thread's state via a single program counter and set of registers. But what happens if there's a cache miss? It takes a long time to fetch data from main memory, and while that's happening the CPU is just sitting there idle. So someone had the idea to basically have two sets of thread state (PC + registers) so that another thread (maybe in the same process, maybe in a different process) can get work done while the other thread is waiting on main memory. There are multiple names and implementations of this concept, such as Hyper-Threading and Simultaneous Multithreading (SMT for short).

Now let's look at the software side. There are basically three ways that threads can be implemented on the software side.

  1. Userspace Threads
  2. Kernel Threads
  3. A combination of the two

All you need to implement threads is the ability to save the CPU state and maintain multiple stacks, which can in many cases be done in user space. The advantages of user-space threads are super-fast thread switching, since you don't have to trap into the kernel, and the ability to schedule your threads the way you like. The biggest drawback is the inability to do blocking I/O (which would block the entire process and all its user threads), which is one of the big reasons we use threads in the first place. Blocking I/O using threads greatly simplifies program design in many cases.

Kernel threads have the advantage of being able to use blocking I/O, in addition to leaving all the scheduling issues to the OS. But each thread switch requires trapping into the kernel which is potentially relatively slow. However, if you're switching threads because of blocked I/O this isn't really an issue since the I/O operation probably trapped you into the kernel already anyway.

Another approach is to combine the two, with multiple kernel threads each having multiple user threads.

So getting back to your question of terminology, you can see that a process and a thread of execution are two different concepts, and your choice of which term to use depends on what you're talking about. Regarding the term "lightweight process", I personally don't see the point in it, since it doesn't really convey what's going on as well as the term "thread of execution".

Recently, I was asked in an interview what the difference is between a process and a thread. Really, I did not know the answer. I thought for a minute and gave a very weird answer.

Threads share the same memory; processes do not. After answering this, the interviewer gave me an evil smile and fired the following questions at me:

Q. Do you know the segments in which a program gets divided?

My answer: Yep (I thought it was an easy one): stack, data, code, heap.

Q. So, tell me which segments threads share?

I could not answer this and ended up saying all of them.

Please, can anybody present the correct and impressive answers for the difference between a process and a thread?

Something that really needs to be pointed out is that there are really two aspects to this question - the theoretical aspect and the implementations aspect.

First, let's look at the theoretical aspect. You need to understand what a process is conceptually to understand the difference between a process and a thread and what's shared between them.

We have the following from section 2.2.2 The Classical Thread Model in Modern Operating Systems 3e by Tanenbaum:

The process model is based on two independent concepts: resource grouping and execution. Sometimes it is useful to separate them; this is where threads come in....

He continues:

One way of looking at a process is that it is a way to group related resources together. A process has an address space containing program text and data, as well as other resources. These resources may include open files, child processes, pending alarms, signal handlers, accounting information, and more. By putting them together in the form of a process, they can be managed more easily. The other concept a process has is a thread of execution, usually shortened to just thread. The thread has a program counter that keeps track of which instruction to execute next. It has registers, which hold its current working variables. It has a stack, which contains the execution history, with one frame for each procedure called but not yet returned from. Although a thread must execute in some process, the thread and its process are different concepts and can be treated separately. Processes are used to group resources together; threads are the entities scheduled for execution on the CPU.

Further down he provides the following table:

Per process items             | Per thread items
Address space                 | Program counter
Global variables              | Registers
Open files                    | Stack
Child processes               | State
Pending alarms                |
Signals and signal handlers   |
Accounting information        |

The above is what you need for threads to work. As others have pointed out, things like segments are OS-dependent implementation details.

This might be in vain, as I know writing an operating system is unbearably complicated (especially by oneself).

  • I don't expect to build the next Linux or Windows.

  • I know it will be horrible, and buggy, and won't work, but that's fine.

I want to write everything myself, in Assembly, C, and (some) C++.

This is a future project, as I'm busy with some other things at the moment and don't have the time immediately, but I figured I would ask it now, so maybe I could get lots of answers to this, and it could build up and be a useful resource for this kind of approach (everything else I have seen involved building off of minix, using an existing bootloader, building it in a virtual booting program thing, etc).

I want to set up one of my older desktops with a monitor, keyboard and mouse, and start working on a blank hard drive.

I want to learn how to write my own bootloader (I've found lots of resources about this, but for completeness, please still add some good ones), my own USB driver (if that's necessary), a CD driver (if that's necessary), etc. Everything, from the ground up.

  • How do I put the code onto the computer? Is it best to do it with a floppy disk? Can most computers do it from a USB stick?

  • What drivers do I need, and can you suggest any references to building those?

  • After the booting sequence--then what? How do I get into protected mode, etc.?

  • How do I manage memory without the help of an operating system? Do I just use whatever addresses I want? No initialization necessary?

  • What will I undoubtedly run into that will confuse me?

  • How can I make it either a command-line O/S or a graphical one?

  • What is a graphical O/S built on? Like, how would I do something like, a command line, with a font, and a picture at the top?

  • Where can I read about setting up a multitasking environment? (i.e., having two graphical-like command lines running side by side).

  • How would I set up a sort of windowing system? How do I display graphics on the screen once simple multitasking is set up?

Believe me, I understand that this is a very complicated project, and I probably will never get around to completing it or writing anything on it of any use.

There are lots of other pieces to this I haven't mentioned, if you think of any, feel free to add those too.

Please put one "topic" per answer--for example, USB drivers, and then maybe a list of resources, things to look out for, etc.

Also, please don't suggest building off of another O/S or pre-existing code. I know I will read a lot of pre-existing code (such as the linux kernel, or example resources, existing drivers, etc) but ultimately I want to do all the writing myself. I know I should build off of something else, and there are lots of other questions on SO about that that I can read if I change my mind and go that route. But this one is all about doing the whole thing from scratch.


I've got lots of great answers to this, mostly about the booting process, file systems and various existing projects to read for reference.

Any suggestions on how to get it graphical? Different video modes and how to work with them, etc?

Take a look at Minix. Study the source code along with "Operating Systems Design and Implementation". Consider making contributions to the project. I think Minix is a really good and promising OS in the making. It is also a well-funded project. That means you might even get paid for your contributions!

First things first. Read, read, read, read, read. You need to have a firm understanding of how the OS works before you can hope to implement your own.

Grab one of Andrew Tanenbaum's books on operating systems. This is the one we used in my OS class in college:

Modern Operating Systems

Modern Operating Systems on Amazon

Despite the ridiculous cover, it's a fantastic read, especially for a textbook. Tanenbaum is really an expert in this area and his explanations of how the OS works underneath the hood are clear and easy to understand. This book is mostly theory, but I believe he also has a book that discusses more of the implementation. I've never read it, though, so I can't comment on it.

That should help you bone up on process management, memory management, filesystems, and everything else your OS kernel needs to do to get it up to a bootable state. From that point on it's basically a matter of writing device drivers for the hardware you need to support, and offering implementations of the C library functions to make kernel calls for things like opening files and devices, reading and writing, passing messages between processes, etc.

Read up on x86 assembly (assuming you are designing this for an x86 machine). That should answer a lot of your questions with regards to moving between processor operating modes.

If you've got any electronics knowledge, it may be easier to start with writing an operating system for an embedded device that has ample documentation, because it will generally be simpler than an x86 PC. I've always wanted to write my own OS as well, and I'm starting with writing a microkernel embedded OS for this development board from Digilent. It can run the soft-core MicroBlaze processor from Xilinx, which has very thorough documentation. It's also got some RAM, flash data storage, LEDs, switches, buttons, VGA output, etc. Plenty of stuff to play around with writing simple drivers for.

One of the benefits of an embedded device is also that you may be able to avoid writing a VGA driver for a long time. In my case, the Digilent development board has an onboard UART, so I can effectively use the serial output as my console to get the whole thing up and booting to a command line with minimal fuss.

Just make sure that whatever you choose to target has a readily available and well-tested compiler for it. You do not want to be writing an OS and a compiler at the same time.

What concepts in Computer Science do you think have made you a better programmer?

My degree was in Mechanical Engineering so having ended up as a programmer, I'm a bit lacking in the basics. There are a few standard CS concepts which I've learnt recently that have given me a much deeper understanding of what I'm doing, specifically:

Language Features

  • Pointers & Recursion (Thanks Joel!)

Data Structures

  • Linked Lists
  • Hashtables


Algorithms

  • Bubble Sorts

Obviously, the list is a little short at the moment so I was hoping for suggestions as to:

  1. What concepts I should understand,
  2. Any good resources for properly understanding them (as Wikipedia can be a bit dense and academic sometimes).

I find it a little funny that you're looking for computer science subjects but find Wikipedia too academic :D

Anyway, here goes, in no particular order:

As a recent graduate from a computer science degree I'd recommend the following:

Some OS concepts

 (memory, I/O, scheduling, processes/threads, multithreading)

[a good book: "Modern Operating Systems, 2nd Edition, Andrew S. Tanenbaum"]

Basic knowledge of computer networks

[a good book by Tanenbaum]

OOP concepts

Finite automata

A programming language (I learnt C first, then C++)

Algorithms (time/space complexity, sort, search, trees, linked lists, stacks, queues)

[a good book: Introduction to Algorithms]

I'm planning to write an operating system and I don't know very much about operating systems. Are there any good resources or books to read in order for me to learn? What are your recommendations?

Operating System Concepts is the book we used at university. It's quite ugly, BUT the information inside is well explained (from basic memory management to how the OS decides what to execute or how to avoid deadlock). Pretty wide coverage.


While old, these books are very good:

Operating System Design with Xinu


Operating System Design-Internetworking With XINU, Vol. II


Operating Systems Design and Implementation (Prentice Hall Software Series)


This book is written by Tanenbaum, the main guy behind Minix, the OS that inspired Linux. It provides good overviews of basic OS concepts like memory management, file systems, processes, etc. The concepts in this book are intimately tied to examples from the Minix OS, which is a good thing.

I think you should start by something like that.

We used Andrew Tanenbaum's Modern Operating Systems at the university I attended. I highly recommend it for its clear explanations of the tradeoffs inherent in many of the design decisions that you'll run up against. This book is a little bit more "fair and balanced" than the Minix book.


I also recommend this book because, despite his net-famous flame war with Linus Torvalds, few of his biases come through in the book. Also, he's a pretty decent writer, and the book is actually entertaining.

I've read various bits and pieces on concurrency, but was hoping to find a single resource that details and compares the various approaches. Ideally taking in threads, co-routines, message passing, actors, futures... whatever else that might be new that I don't know about yet!

Would prefer a lay-coders guide than something overtly theoretical / mathematical.

Thank you.

I recommend An Introduction to Parallel Programming by Pacheco. It's clearly written, and a good intro to parallel programming.

If you don't care about something being tied to a language, then Java Concurrency in Practice is a great resource.

Oracle's online tutorial is free, but probably a bit more succinct than what you're looking for.

That being said, the best teacher for concurrency is probably experience. I'd try to get some practice, myself. Start out by making a simulation of the Dining Philosophers problem. It's a classic.

At first, let's see if you're interested in the topic or not. To grasp the big picture of concurrency, the best practice is to take a look at operating systems books, like Operating Systems: Internals and Design Principles by Stallings or Modern Operating Systems by Tanenbaum. They can give you an intuition about what this is all about.
There's also an old book named Principles of Concurrent Programming by Ben-Ari. If you can find it, it can be helpful.

Besides reading textbooks, it's good to get your hands dirty by writing some concurrent programs. Python is a very good choice if you want to start using threads. Every Python book has a part dedicated to this topic. Also, with a simple search on the web you can find a lot of resources about it, but I give these two higher preference:
Multithreaded Programming (POSIX pthreads Tutorial), a very comprehensive introduction to concurrency and multi-threading. It's mainly about C multi-threading.
The other one is Thread Synchronization Mechanisms in Python.

Now, if you still find yourself interested in concurrent programming, it's time to go deeper. You have almost all the basic knowledge of concurrency; the best practice at this level is to start solving problems and becoming familiar with patterns. To achieve this goal, you can use The Little Book of Semaphores. It's one of the best books in the field, and it's also free. This is a book that can head you toward becoming proficient.

These should be enough if you want to approach concurrent programming, but if you have enough time and you're eager, it's good to take a look at some other paradigms of concurrent programming, like actors, which are used in Erlang. I'd say it's worth reading some chapters of the book Seven Languages in Seven Weeks, especially the chapters about Erlang and Io. At first glance it might seem hard and strange, but it's good to become familiar with other solutions to concurrency.

I have a rookie question about multi-threading. I understand that, when a single user is accessing an application, multiple threads can be used, and they can run parallel if multiple cores are present. If only one processor exists, then threads will run one after another.

But my question is, when multiple users are accessing an application, how are the threads handled?

You need to understand the thread scheduler. In fact, on a single core, the CPU divides its time between multiple threads (so execution is not exactly sequential). On multiple cores, two (or more) threads can run simultaneously. Read the thread article on Wikipedia. I recommend Tanenbaum's OS book.

Ok, so I am reading about synchronization, and I read through various mechanisms such as spinlocks, semaphores, and mutexes to avoid race conditions.

However, these algorithms can't prevent race conditions in SMP when multiple processes access the data exactly at the same time.

For example, suppose thread 1 in processor A runs lock(mutex1); withdraw(1000); unlock(mutex1);

and thread 2 in processor B runs lock(mutex1); deposit(1000); deposit(1000); unlock(mutex1);

When both threads run EXACTLY AT THE SAME TIME, both threads will be in the critical section simultaneously.

The only solution (it would have to be at the hardware level) would be making each processor run slightly offset from the others, but that defeats the purpose of parallelism.

Is there any hardware level support to avoid these situation where multiple processors try to acquire the lock at the exactly same time?

(This is not a problem of atomicity, but rather a problem of exact parallelism, and I wonder how SMP deals with it.)

You can prevent this using atomic instructions like TSL and XCHG.

How do you ensure atomicity for an instruction?

You can disable all interrupts before executing the instruction, then enable them again after the instruction is done. That doesn't help on multicore systems, because disabling interrupts on processor 1 doesn't have any effect on processor 2. On multicore systems the atomicity of an instruction is ensured by preventing other CPUs from accessing the memory bus (a bus lock).

So, if you implement semaphores using these instructions you'll have no problems on SMP.

Implementation of mutex_lock and mutex_unlock using TSL:

    mutex_lock:
        TSL REGISTER, MUTEX ; copy mutex to register and set mutex
        CMP REGISTER, #0    ; was mutex zero?
        JZE ok              ; if it was zero, the mutex was free, so return
        CALL thread_yield   ; else: mutex is busy, schedule another thread
        JMP mutex_lock      ; try again later
    ok: RET                 ; return to caller; critical region entered

    mutex_unlock:
        MOVE MUTEX, #0      ; free mutex
        RET                 ; return to caller

I strongly recommend Curt Schimmel's UNIX® Systems for Modern Architectures: Symmetric Multiprocessing and Caching for Kernel Programmers. Different hardware architectures provide different low-level tools for synchronizing access to data, including some architectures with very nearly no help. Schimmel's book provides algorithms that can work even on those architectures.

I wish I could easily locate my copy to summarize the contents.

I am developing a J2ME application that has a large amount of data to store on the device (in the region of 1MB, but variable). I can't rely on the file system, so I'm stuck with the Record Management System (RMS), which allows multiple record stores, but each has a limited size. My initial target platform, Blackberry, limits each to 64KB.

I'm wondering if anyone else has had to tackle the problem of storing a large amount of data in the RMS, and how they managed it? I'm thinking of having to calculate record sizes and split one data set across multiple stores if it's too large, but that adds a lot of complexity to keep it intact.

There are lots of different types of data being stored, but only one set in particular will exceed the 64KB limit.

I think the most flexible approach would be to implement your own file system on top of the RMS. You can handle the RMS records in a similar way as blocks on a hard drive and use an inode structure or similar to spread logical files over multiple blocks. I would recommend implementing a byte- or stream-oriented interface on top of the blocks, and then possibly making another API layer on top of that for writing special data structures (or simply making your objects serializable to the data stream).

Tanenbaum's classical book on operating systems covers how to implement a simple file system, but I am sure you can find other resources online if you don't like paper.

What is the common theory behind thread communication? I have some primitive idea about how it should work, but something doesn't sit well with me. Is there a way of doing it with interrupts?

Really, it's just the same as any concurrency problem: you've got multiple threads of control, and it's indeterminate which statements on which threads get executed when. That means there are a large number of POTENTIAL execution paths through the program, and your program must be correct under all of them.

In general, the place where trouble can occur is when state is shared among the threads (aka "lightweight processes" in the old days). That happens when there are shared memory areas, shared files, or other shared resources.

To ensure correctness, what you need to do is ensure that these data areas get updated in a way that can't cause errors. To do this, you need to identify "critical sections" of the program, where sequential operation must be guaranteed. Those can be as little as a single instruction or line of code; if the language and architecture ensure that these are atomic, that is, can't be interrupted, then you're golden.

Otherwise, you identify that section and put some kind of guard onto it. The classic way is to use a semaphore, which is an atomic primitive that only allows one thread of control past at a time. These were invented by Edsger Dijkstra, and so have names that come from the Dutch, P and V. When threads come to a P, only one can proceed; all other threads are queued and wait until the executing thread comes to the associated V operation.

Because these primitives are a little primitive, and because the Dutch names aren't very intuitive, some other larger-scale approaches have been developed.

Per Brinch Hansen invented the monitor, which is basically just a data structure whose operations are guaranteed atomic; they can be implemented with semaphores. Monitors are pretty much what Java synchronized statements are based on; they make an object or code block have that particular behavior -- that is, only one thread can be "in" them at a time -- with simpler syntax.

There are other models possible. Haskell and Erlang solve the problem by being functional languages that never allow a variable to be modified once it's created; this means they naturally don't need to worry about synchronization. Some new languages, like Clojure, instead have a structure called "transactional memory", which basically means that when there is an assignment, you're guaranteed the assignment is atomic and reversible.

So that's it in a nutshell. To really learn about it, the best places to look are operating systems texts, like, e.g., Andy Tanenbaum's text.

I am currently in a university operating system class and we are working on the Windows kernel, more precisely WRK, the Windows Research Kernel, for our projects. WRK is based off of Win2k3 Server.

I am however having a real hard time dredging up resources to help learn the basics of OS development, Windows kernel development and just generally getting around the Windows API.

We are using the book Windows Internals by Russinovich, but I was wondering if any of you had some great resources to recommend to me, whether books, online guides, or some old class notes. Thanks!

The third edition of Tanenbaum's Modern Operating Systems has a chapter devoted to the Vista kernel. I haven't looked into that chapter (I only read the Linux one), but as far as big-picture stuff, it's fantastic. I'm not sure what level of detail you're looking for, but that might be a good resource to check out.

What specifically are you looking for? Online resources? For that, OSR Online is one of the better websites. A lot of kernel development knowledge is found in the MS and OSR mailing lists; that's another place to check that might be better than Stack Overflow.

Specifically for books, there are Programming WDM, Developing Drivers with KMDF, and Advanced Windows Debugging. The last one will not teach you so much about the kernel as how to navigate inside it, something you will do quite often if you are writing drivers or researching parts of it.

In order to write drivers, the easiest way is probably to take the Windows driver samples and hack at them, study the results with WinDbg, and learn more.

I'm actually quite surprised that this isn't a quickly googlable question. I was doing some research on file systems and their implementations in this book by Tanenbaum. It lists a number of ways that Unix-based systems implement their file/resource security mechanisms.

I have a basic understanding of domains, groups, user IDs, and protected resources, as well as Access Control Lists. I also understand that there are various ways of managing these ACLs, such as protection matrices (one axis is domains, one is files), file-based permissions (where each file has a list of owners/groups/domains that may read/write/execute the file), and the reciprocal owner ACLs (where each owner has a list of files and RWX permissions).

My main question is: what approach does Mac OS X take in managing these file permissions, or more specifically, their ACLs? What happens when I change these file permissions with chmod for a file? A technical answer would be much appreciated, and I won't accept an answer straight away, to allow for someone to give a detailed answer.

UPDATE: I have come across this post as a guide to the meaning of the octal bits in a file permission mod (using chmod), but it does not allude to the underlying storage of this ACL.

I have been given the task of fixing an embedded operating system written in C/C++. The current thread scheduler being used is very similar to round-robin scheduling, except it lacks one very important feature: the ability to interrupt threads and then return to executing them, thus creating a dependable "slice" of execution time.

My question is, how does one go about interrupting running code, executing another task, and then returning execution gracefully? I believe this behavior requires assembler specific to the architecture. This is the chip that the OS will be running under:

On a side note, this is avionics software, so it must be "deterministic". Part of this is that there is no heap usage; all memory must be bounded.

The current system is a "periodic process" in which the next task must wait for the first to complete. This is simply horrific: if one part of the operating system crashes, let's say the ATN stack, then the entire operating system will be brought to its knees. (Insert crashed airplane here... although this is Class B software, which means the airplane will not crash if the system does.)

disclaimer: Don't use my advice. Find a specialist, if people's well-being depends on a system then don't leave it to chance/hacks/SO advice!

Plane oops

You should be able to write a new procedure which is entered via an interrupt at a known interval, save thread state using the existing scheduling functions, and change thread context. Also, ensure your locking primitives work with the new scheduling and that you don't balls up non-atomic/non-instruction-based T&S locking or anything.

This website gives good information about thread switching, state saving and so on. Ultimately interrupts are specific to your CPU/Hardware. The way you save your thread state will also be dependent on the constraints of the system and the thread structure you are using.

Modern Operating Systems, 3rd Edition contains some good chunks on the theory, but the implementation depends on existing code and best practice for the hardware you are on, as well as other code in the kernel that handles interrupts, signals, and so on.

Also, "Real-Time Systems Design and Analysis" by Phillip A. Laplante might be a good resource for adapting your existing scheduler to the new requirements.

I have shared memory, x writers, y readers, and one parent process. Writers have exclusive access, so while one writer writes, all other readers and writers must wait. Multiple readers can read in parallel. Writers have priority, so for example if 3 readers are reading and one writer wants to write to that shared memory, then once those 3 readers finish their job, no more readers may start and the writer can write. I don't know how to implement this with semaphores, because readers can read in parallel, so the following code will not work: all the readers would end up waiting on that semaphore.



I think something like this is not good, because it is not blocking.

//readers doJob
    return E_WRITER_ACTIVE;


Your problem is a variation of the classic Readers-Writers problem (here with a writer-priority constraint). The following pseudo-code should solve it:

// globals
semaphore room = 1;      // exclusive access to the shared memory
semaphore mutex = 1;     // protects reader_count
semaphore turnstile = 1; // lets a waiting writer block newly arriving readers
int reader_count = 0;

void writer() {
    sem_wait(turnstile); // stop new readers from entering
    sem_wait(room);      // wait until current readers/writers are done

    // safe write context: no-one else can read nor write

    sem_post(room);      // release the shared memory
    sem_post(turnstile); // let readers (and other writers) through again
}

void reader() {
    sem_wait(turnstile); // don't overtake a waiting writer
    sem_post(turnstile);

    sem_wait(mutex);
    if (++reader_count == 1)
        sem_wait(room);  // first reader locks writers out
    sem_post(mutex);

    // safe read context: others can read, but no-one else can write

    sem_wait(mutex);
    if (--reader_count == 0)
        sem_post(room);  // last reader lets writers back in
    sem_post(mutex);
}
For more information on synchronization see this lecture, but I'd recommend reading more about Dijkstra's semaphores online or in a good book like Tanenbaum's Modern Operating Systems.

I have to download thousands of images using an API in a cron job. The code is working fine, but I want to optimize it. For example, if I have to download 1000 images, the master file should divide the job into 10 parts (100 images each) and allocate the work to 10 files (maybe 10 instances of the same file): 10*100 = 1000 images, like multi-threading. I could do it with curl, but it's all on the same server, so is there a faster way to run the files? I want it to behave like multi-threading: the master file is called by the cron job, and the rest of the work runs in parallel. Is $pid = pcntl_fork(); a good and safe way?

Writing a high-performance HTTP crawler/downloader is no easy task. I'll describe how I'd do it. Keep in mind there are a lot of solutions, so if you want to dive deeper into this topic, you may want to read Modern Operating Systems by Andrew S. Tanenbaum.

  1. Put all URLs that need to be downloaded into a database.
  2. The master process connects to the database, determines how many downloads there are, and uses this number to set the download amount for each worker. It then closes the database connection.
  3. The master process uses pcntl_fork() to start the number of workers you wish to run. (Yes, it's safe, but it's not easy to use.)
  4. Each worker connects to the database separately and establishes a read-write lock on the URLs it wishes to process. In the case of MySQL, you'd use SELECT ... FOR UPDATE.
  5. The worker marks the records it processes as in progress with its PID in the database, then releases the database lock.
  6. The worker processes the download, then updates the database to record that it's finished.
  7. Once all workers have finished with their share and exited (see pcntl_wait()), the master process opens a database connection and clears all in-progress flags that any crashed workers may have left. It then closes the database connection.
  8. The process is repeated until all downloads are complete.
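The fork/join skeleton behind steps 3 and 7 can be sketched in C (the same idea pcntl_fork() wraps). The database and the actual downloading are stubbed out here; each worker simply reports how many items of its share it handled via its exit status, and the function names are made up for illustration:

```c
#include <sys/wait.h>
#include <unistd.h>

/* Fork n_workers children, give each a share of n_jobs, and
 * collect the counts they report via their exit statuses. */
int run_pool(int n_workers, int n_jobs) {
    for (int w = 0; w < n_workers; w++) {
        pid_t pid = fork();
        if (pid < 0)
            return -1;     /* fork failed */
        if (pid == 0) {
            /* worker: compute this worker's share of the jobs */
            int share = n_jobs / n_workers + (w < n_jobs % n_workers ? 1 : 0);
            /* ... connect to the DB, lock rows, download each item ... */
            _exit(share);  /* report how many items we handled */
        }
    }
    /* master: wait for every worker and sum up their reports */
    int done = 0, status;
    while (wait(&status) > 0)
        if (WIFEXITED(status))
            done += WEXITSTATUS(status);
    return done;
}
```

Note that an exit status only carries values 0-255, which is why the real design above reports progress through the database (step 6) rather than through the exit code.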

This is a relatively safe path to travel. However, keep in mind that you are playing with something well beyond the knowledge level of an average (or even experienced) PHP coder. You must read up on how the Linux (or Windows) process model works, otherwise you will have an incredibly broken application on your hands.

I'm studying operating systems with this book and following some exercise guides from the university: scheduling, synchronization, memory, file system, I/O...

It's an interesting topic, but I have to take the exam in a month, so I have little time to learn it deeply. I tried reading a Linux scheduler, for example, but I couldn't understand it very well because of my limited C knowledge.

I'm looking for comprehensive material (ideally interactive). I've found this about semaphores (synchronization) that seems really nice, and I'm about to start looking at it.

My suggestion, in addition to searching for interactive learning solutions, is to read some chapters of the Modern Operating Systems book by A. S. Tanenbaum.

I studied it for an Operating Systems and Architecture course (at university, Computer Science, 2nd year) and it's very good, as the author makes the concepts simple to understand!

I'd like to know where the source code for the commands are in the linux tree.

Some commands:

  • mkdir
  • ls
  • dir
  • chmod
  • etc

Right now I'd just like to read the code and understand it. I know there are many more steps to understanding how Linux works, but right now I really want to understand how the OS receives a command and how it calls the correct command.

Download the source of coreutils and get started. Get the source of bash (or another shell) and you can read how the shell itself receives and dispatches commands.
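The part you're calling "how the OS receives the command" is really the shell's job: it looks the name up in $PATH, forks, and the child process execs the program. A minimal sketch of that dispatch in C (a simplification of what bash does, not its actual code):

```c
#include <sys/wait.h>
#include <unistd.h>

/* Run a command the way a shell does: fork, exec in the child,
 * wait in the parent. Returns the command's exit status. */
int run_command(char *const argv[]) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;              /* fork failed */
    if (pid == 0) {
        execvp(argv[0], argv);  /* searches $PATH, replaces this process image */
        _exit(127);             /* only reached if exec failed */
    }
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

`ls`, `mkdir`, `chmod` and so on are just programs found this way; only once they are running do they make system calls such as mkdir(2) or chmod(2), which is where the kernel takes over.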

You may want to read any Linux system programming book to read about the system calls API and know how to use them. Here is a link to unix.stackexchange: What is the best book to learn Linux system programming?

The working of the Linux OS and the working of the commands are different things. If you already know OS basics, you can try reading Understanding the Linux Kernel by Daniel Bovet. Otherwise, you might want to first read a standard OS book by Galvin, Tanenbaum, Deitel, or another author.

Without going into details, how is a Monitor different from an OS?

I read that first there was serial processing in the early days, then monitors, and now operating systems.

Monitor in this context means Batch Monitor.

In the 1950s through the mid-60s, before we had true operating systems, we had batch monitors. You would "program" the job onto punch cards and put them on an input queue that the machine would process one by one.

The programmer would sit in front of a monitor, which would display memory dumps, debugging information, etc. It was an incredibly tedious process.

Of course, the major drawback of a batch monitor was that the CPU was often idle. Because CPU speeds are so much faster than I/O speeds, the machine would spend the majority of its time reading in the cards (I/O) while the CPU waited.

Nowadays, modern operating systems can run several processes at once and optimize CPU utilization. When a process on the run queue needs to do I/O, the OS puts it on another queue, and the CPU starts processing the next job. When the I/O is done, that process is moved back to the run queue. This way, the CPU is always doing something.
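The queue shuffling described above can be sketched with two FIFO queues. This is a toy model, not a real scheduler; the pids and queue capacity are made up for illustration:

```c
/* Toy model of a run queue and an I/O wait queue. */
#define QCAP 8

typedef struct { int pid[QCAP]; int head, tail; } queue_t;

static void q_push(queue_t *q, int pid) { q->pid[q->tail++ % QCAP] = pid; }
static int  q_pop(queue_t *q)           { return q->pid[q->head++ % QCAP]; }
static int  q_empty(const queue_t *q)   { return q->head == q->tail; }

/* Pick the next runnable process, or -1 if the CPU must idle. */
static int schedule(queue_t *run) { return q_empty(run) ? -1 : q_pop(run); }
```

A process that blocks on I/O is popped from the run queue and pushed onto the I/O queue; when its I/O completes, it is pushed back onto the run queue, so the CPU always has the next runnable process at hand.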

After looking up "batch monitor" and not finding many references to it, it seems it is more commonly referred to as a "batch system". Here's a book for reference (you should be able to find a PDF version online): Modern Operating Systems.

I love programming in C, and I want to create an OS using C. But I have no idea where to begin. Some suggestions would be appreciated!

Try reading Modern Operating Systems for an overview of what you're in for.
