Advanced Programming in the UNIX Environment

W. Richard Stevens, Stephen A. Rago


The revision of the definitive guide to Unix system programming is now available in a more portable format.


Mentioned in questions and answers.

I need to run a PHP script as a daemon process (wait for instructions and do stuff). A cron job will not do it for me because actions need to be taken as soon as an instruction arrives. I know PHP is not really the best option for daemon processes due to memory management issues, but for various reasons I have to use PHP in this case. I came across a tool by libslack called Daemon (http://libslack.org/daemon). It seems to help me manage daemon processes, but there haven't been any updates in the last 5 years, so I wonder if you know of other alternatives suitable for my case. Any information will be really appreciated.

If you can, grab a copy of Advanced Programming in the UNIX Environment. The entire Chapter 13 is devoted to daemon programming. The examples are in C, but all the functions you need have wrappers in PHP (basically the pcntl and posix extensions).

In a few words, writing a daemon (this is possible only on *nix-based OSes; Windows uses services) goes like this (a minimal C sketch follows the list):

  1. Call umask(0) to prevent permission issues.
  2. fork() and have the parent exit.
  3. Call setsid().
  4. Setup signal processing of SIGHUP (usually this is ignored or used to signal the daemon to reload its configuration) and SIGTERM (to tell the process to exit gracefully).
  5. fork() again and have the parent exit.
  6. Change the current working dir with chdir().
  7. fclose() stdin, stdout and stderr and don't write to them. The correct way is to redirect them to either /dev/null or a file, but I couldn't find a way to do it in PHP. It is possible to redirect them using the shell when you launch the daemon (you'll have to find out yourself how to do that, I don't know :).
  8. Do your work!
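
For reference, here is what those eight steps look like in C (a minimal sketch, error handling omitted; in PHP the equivalents are pcntl_fork(), posix_setsid(), and friends):

    /* Minimal daemonize() sketch (Linux/glibc assumed, error handling omitted). */
    #include <fcntl.h>
    #include <signal.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <unistd.h>

    static void daemonize(void)
    {
        umask(0);                         /* 1. clear the file mode creation mask      */
        if (fork() != 0) exit(0);         /* 2. parent exits, child carries on         */
        setsid();                         /* 3. become session leader, drop the tty    */
        signal(SIGHUP, SIG_IGN);          /* 4. ignore SIGHUP (or install real         */
                                          /*    handlers for SIGHUP/SIGTERM)           */
        if (fork() != 0) exit(0);         /* 5. second fork: can never regain a tty    */
        chdir("/");                       /* 6. don't keep a filesystem busy           */
        int fd = open("/dev/null", O_RDWR);
        dup2(fd, STDIN_FILENO);           /* 7. point stdin/stdout/stderr at /dev/null */
        dup2(fd, STDOUT_FILENO);
        dup2(fd, STDERR_FILENO);
        if (fd > STDERR_FILENO) close(fd);
        /* 8. do your work */
    }

A PHP port of this would call the corresponding pcntl_*/posix_* functions in the same order.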

Also, since you are using PHP, be careful with cyclic references: prior to PHP 5.3, the garbage collector has no way of collecting them, so the process will leak memory until it eventually crashes.

I've always been a largely independent learner gleaning what I can from Wikipedia and various books. However, I fear that I may have biased my self-education by inadvertent omission of topics and concepts. My goal is to teach myself the equivalent of an undergraduate degree in Computer Science from a top university (doesn't matter which one).

To that end, I've purchased and started reading a few academic textbooks:

As well as a few textbooks I have left over from classes I've taken at a mediocre-at-best state university:

My questions are:

  • What topics aren't covered by this collection?
  • Are there any books that are more rigorous or thorough (or even easier to read) than a book listed here?
  • Are there any books that are a waste of my time?
  • In what order should I read the books?
  • What does an MIT or Stanford (or UCB or CMU ...) undergrad learn that I might miss?

Software engineering books are welcome, but in the context of academic study only please. I'm aware of Code Complete and the Pragmatic Programmer, but I'm looking for a more theoretical approach. Thanks!

I think you can use most of the other books for reference and just absorb Programming Pearls in its entirety. Doing so would make you better than 90% of the programmers I've ever met.

The "Gang of Four" Design Patterns book. The Design Patterns course I took in college was probably the most beneficial class I've ever taken.

First, I wouldn't worry about it. But if you'd like a book to learn some of the abstract CS ideas, I'd recommend The Turing Omnibus or Theoretical Introduction to Programming.

If I were deciding between hiring two programmers and neither had much experience, but one had a CS degree and the other didn't, I'd hire the one with the CS degree. But when you get to comparing two programmers with a dozen years of experience, the degree hardly matters.

I'm in the same boat: studying computer science in my free time after work. These are some of the books I have on my shelf right now:

  1. Applying UML and patterns - Larman
  2. Introduction to algorithms - Cormen
  3. Discrete mathematics and its applications - Rosen
  4. Software Engineering
  5. Advanced Programming in the UNIX Environment

Will update this list as soon as I finish them... :-)

File Structures: An object oriented approach with C++

A lot of good info about block devices and file structuring which you won't find in any of the books you listed. It got a few critical reviews on Amazon because people didn't like his code examples, but the point of the book is to teach the concepts, not give cut and paste code examples.

Also, make sure to get a book on compilers.

Biggest two omissions I see:

For operating systems I prefer Tanenbaum over Silberschatz, but both are good:

And about the order, that would depend on your interests. There aren't many prerequisites; automata theory before compilers is the most obvious one: read the automata book first and then the dragon book.

I don't know all the books you have, but the ones I know are good enough so that may mean the others are decent as well.

You are missing some logic and discrete math books as well.

And let's not forget some database theory books!

Think MUDs/MUCKs, but maybe with avatars or locale illustrations. My language of choice is Ruby.

I need to handle multiple persistent connections with data being asynchronously transferred between the server and its various clients. A single database must be kept up-to-date based on activity occurring in the client sessions. Activity in each client session may require multiple other clients to be immediately updated (a user enters a room; a user sends another user a private message).

This is a goal project and a learning project, so my intention is to re-invent a wheel or two to learn more about concurrent network programming. However, I am new to both concurrent and network programming; previously I have worked almost exclusively in the world of non-persistent, synchronous HTTP requests in web apps. So, I want to make sure that I'm reinventing the right wheels.

Per emboss's excellent answer, I have been starting to look at the internals of certain HTTP servers, since web apps can usually avoid threading concerns due to how thoroughly the issue is abstracted away by the servers themselves.

I do not want to use EventMachine or GServer because I don't yet understand what they do. Once I have a general sense of how they work, what problems they solve and why they're useful, I'll feel comfortable with it. My goal here is not "write a game", but "write a game and learn how some of the lower-level stuff works". I'm also unclear on the boundaries of certain terms; for example, is "I/O-unbound apps" a superset of "event-driven apps"? Vice-versa?

I am of course interested in the One Right Way to achieve my goal, if it exists, but overall I want to understand why it's the right way and why other ways are less preferable.

Any books, ebooks, online resources, sample projects or other tidbits you can suggest are what I'm really after.

The way I am doing things right now is by using IO#select to block on the list of connected sockets, with a timeout of 0.1 seconds. It pushes any information read into a thread-safe read queue, and then whenever it hits the timeout, it pulls data from a thread-safe write queue. I'm not sure if the timeout should be shorter. There is a second thread which polls the socket-handling thread's read queue and processes the "requests". This is better than how I had it working initially, but still might not be ideal.
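
(For reference, the C equivalent of that pattern looks roughly like the sketch below; Ruby's IO#select is a thin wrapper around the same select(2) call. The client_fds bookkeeping and the queues are assumed to live elsewhere.)

    /* Sketch: wait up to 100 ms for any connected socket to become readable.
     * client_fds/client_count are assumed to be maintained elsewhere (accept loop). */
    #include <stdio.h>
    #include <sys/select.h>
    #include <unistd.h>

    void poll_clients(const int *client_fds, int client_count)
    {
        fd_set readable;
        FD_ZERO(&readable);
        int maxfd = -1;
        for (int i = 0; i < client_count; i++) {
            FD_SET(client_fds[i], &readable);
            if (client_fds[i] > maxfd) maxfd = client_fds[i];
        }

        struct timeval timeout = { .tv_sec = 0, .tv_usec = 100 * 1000 };  /* 0.1 s */
        if (select(maxfd + 1, &readable, NULL, NULL, &timeout) <= 0)
            return;                 /* timeout or error: drain the write queue here */

        char buf[4096];
        for (int i = 0; i < client_count; i++) {
            if (FD_ISSET(client_fds[i], &readable)) {
                ssize_t n = read(client_fds[i], buf, sizeof(buf));
                if (n > 0)
                    printf("got %zd bytes from fd %d\n", n, client_fds[i]);
                /* n == 0 means the client hung up; n < 0 is an error */
            }
        }
    }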

I posted this question on Hacker News and got linked to a few resources that I'm working through; anything similar would be great:

Although you probably don't like to hear it, I would still recommend starting by investigating HTTP servers. Programming for them may have seemed boring, synchronous, and non-persistent to you, but that's only because the creators of those servers did such a tremendously good job of hiding the gory details from you. If you think about it, a web server is anything but synchronous (it's not as if millions of people have to wait to read this post until you are done... concurrency :)), and because these beasts do their job so well (yes, I know we yell at them a lot, but at the end of the day most HTTP servers are outstanding pieces of software), they are the definitive starting point if you want to learn about efficient multi-threading. Operating systems, programming language implementations, and games are other good sources, but maybe a bit further away from what you intend to achieve.

If you really intend to get your fingers dirty I would suggest orienting yourself around something like WEBrick first - it ships with Ruby and is entirely implemented in Ruby, so you will learn all about Ruby threading concepts there. But be warned: you'll never get close to the performance of a Rack solution that sits on top of a web server implemented in C, such as thin.

So if you really want to be serious, you would have to roll your own server implementation in C(++) and probably make it support Rack, if you intend to support HTTP. Quite a task, I would say, especially if you want your end result to be competitive. C code can be blazingly fast, but it's all too easy to make it blazingly slow as well; that lies in the nature of low-level stuff. And we haven't discussed memory management and security yet. But if it's really your desire, go for it, but I would first dig into well-known server implementations to get inspiration. See how they work with threads (pooling) and how they implement 'sessions' (you wanted persistence). All the things you desire can be done with HTTP, even better when used with a clever REST interface; existing applications that support all the features you mentioned are living proof of that. So going in that direction would not be entirely wrong.

If you still want to invent your own proprietary protocol, base it on TCP/IP as the lowest acceptable common denominator. Going below that would end up in a project that your grandchildren would probably still be coding on. That's really as low as I would dare to go when it comes to network programming.

Whether you are using it as a library or not, look into EventMachine and its conceptual model. Overlooking event-driven ('non-blocking') IO on your journey would be negligent in the context of learning about/reinventing the right wheels. Here is an appetizer for event-driven programming that explains the benefits of node.js as a web server.

Based on your requirements: asynchronous communication, multiple "subscribers" reacting to "events" that are centrally published; well that really sounds like a good candidate for an event-driven/message-based architecture.


Some books that may be helpful on your journey (Linux/C only, but the concepts are universal):

(Those were the classics)

  • The Linux Programming Interface - if you just intend to buy one book, let it be this one. I'm not entirely through it yet, but it is truly amazing and covers all the topics you need to know about for your adventure

Projects you may want to check out:

I have to read large files in C using the read function. I was just wondering whether the buffer size we use makes any difference in terms of performance. The file sizes may reach tens of GB.

Short version.
It depends. On x86, a buffer size of 4096 bytes is a good start (one page, and also the Advanced Format block size).

Longer version.
In UNIX it depends on the kernel, libc, filesystem, hardware, etc. Not only on versions and compilation options, but also on run-time tunables (e.g. read-ahead setup).

DIY.
Test it! See Advanced Programming in the UNIX Environment, Section 3.9 "I/O Efficiency", for a straightforward way of determining the best read/write buffer size for one particular system.
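
A quick way to run that experiment yourself is a tiny copy program whose buffer size comes from the command line (a sketch along the lines of the APUE test program; error handling kept minimal):

    /* Sketch: copy stdin to stdout with a given buffer size, then time it, e.g.
     *   cc -O2 timed_copy.c -o timed_copy
     *   time ./timed_copy 4096 < bigfile > /dev/null   (repeat with other sizes) */
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(int argc, char *argv[])
    {
        size_t bufsize = (argc > 1) ? (size_t)atol(argv[1]) : 4096;
        char *buf = malloc(bufsize);
        if (buf == NULL) { perror("malloc"); return 1; }

        ssize_t n;
        while ((n = read(STDIN_FILENO, buf, bufsize)) > 0)
            if (write(STDOUT_FILENO, buf, (size_t)n) != n) { perror("write"); return 1; }
        if (n < 0) { perror("read"); return 1; }

        free(buf);
        return 0;
    }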

I am wondering if you can lock only a line or a single character in a file in Linux, while the rest of the file remains accessible to other processes. I received a task about simulating transactions on a file with C/C++ under Linux. Please give me an answer, and if the answer is yes, give me some links where I could take a peek to get this task done.

Thanks, Madicemickael

Yes, this is possible.

The Unix way to do this is via fcntl or lockf. Whichever you choose, make sure to use only that one and not mix the two. Have a look at this question (with answer) about it: fcntl, lockf, which is better to use for file locking?.

If you can, have a look at section 14.3 in Advanced Programming in the UNIX Environment.
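
To make it concrete, a byte-range write lock with fcntl() looks roughly like this (a sketch; the file name is hypothetical and error handling is minimal). Only the locked region is affected; other processes can still lock the rest of the file:

    /* Sketch: lock byte 42 of a file for writing, do the "transaction", unlock. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("ledger.dat", O_RDWR);   /* hypothetical file name */
        if (fd < 0) { perror("open"); return 1; }

        struct flock lk = {
            .l_type   = F_WRLCK,    /* exclusive (write) lock          */
            .l_whence = SEEK_SET,
            .l_start  = 42,         /* first byte of the locked region */
            .l_len    = 1,          /* lock exactly one byte           */
        };
        if (fcntl(fd, F_SETLKW, &lk) < 0) { perror("fcntl(F_SETLKW)"); return 1; }

        /* ... read-modify-write that byte here ... */

        lk.l_type = F_UNLCK;        /* release the lock */
        fcntl(fd, F_SETLK, &lk);
        close(fd);
        return 0;
    }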

Where are some lists of system calls on UNIX?

This wasn't my original question, but thanks anyway :)

What you should really do is pick up a copy of "Advanced Programming in the Unix Environment" by W. Richard Stevens. This is the classic book on how to program Unix-like OSes. The book is old, and Mac OS / iPhone OS are a bit different from most traditional flavors of Unix, but the book is a great way to learn the basics and get a feel for how the APIs are supposed to be used. Check it out at Amazon.

I am very much interested in Unix and want to learn it inside and out. Can you guys help me by listing some books that can make me a wizard? Ultimately I want to become a Unix programmer.

I am not a novice Unix user.

You want system administration knowledge, or programming knowledge?

For programming:

For system administration:

As other responders have noted, Advanced Programming in the Unix Environment (APUE) is indispensable.

Other books that you might want to consider (these have more of a Linux focus, but are a good way to become familiar with Unix internals):

How do I migrate to a *nix platform after spending more than 10 years on Windows? Which flavor will be easiest to handle and make me comfortable, so that maybe later I can switch over to more standard *nix flavors? I have been postponing this for a while now. Help me with the extra push.

Linux is the most accessible and has the most mature desktop functionality. BSD (in its various flavours) has less userspace baggage and would be easier to understand at a fundamental level. In this regard it is more like a traditional Unix than a modern Linux distribution. Some might view this as a good thing (and from certain perspectives it is) but will be more alien to someone familiar with Windows.

The main desktop distributions are Ubuntu and Fedora. These are both capable systems but differ somewhat in their userspace architecture. The tooling for the desktop environment and the default configuration for system security work a bit differently on Ubuntu than on most other Linux or Unix flavours, but this is of little relevance to development. From a user perspective either of these would be a good start.

From the perspective of a developer, all modern flavours of Unix and Linux are very similar and share essentially the same developer tool chain. If you want to learn about the system from a programmer's perspective, there is relatively little to choose between them.

Most Unix programming can be accomplished quite effectively with a programmer's editor such as vim or emacs, both of which come in text-mode and windowing flavours. These editors are very powerful but have rather quirky user interfaces - the interfaces are unusual but contribute significantly to the power of the tools. If you are not comfortable with these tools, this posting discusses several other editors that offer a user experience closer to common Windows tooling.

There are several IDEs such as Eclipse that might be of more interest to someone coming off Windows/Visual Studio.

Some postings on Stackoverflow that discuss linux/unix resources are:

If you have the time and want to do a real tour of the nuts and bolts Linux From Scratch is a tutorial that goes through building a linux installation by hand. This is quite a good way to learn in depth.

For programming, get a feel for C/unix from K&R and some of the resources mentioned in the questions linked above. The equivalent of Petzold, Prosise and Richter in the Unix world are W Richard Stevens' Advanced Programming in the Unix Environment and Unix Network Programming vol. 1 and 2.

Learning one of the dynamic languages such as Perl or Python if you are not already familiar with these is also a useful thing to do. As a bonus you can get good Windows ports of both the above from Activestate which means that these skills are useful on both platforms.

If you're into C++ take a look at Qt. This is arguably the best cross-platform GUI toolkit on the market and (again) has the benefit of a skill set and tool chain that is transferable back to Windows. There are also several good books on the subject and (as a bonus) it also works well with Python.

Finally, Cygwin is a unix emulation layer that runs on Windows and gives a substantially unix-like environment. Architecturally, Cygwin is a port of glibc and the crt (the GNU tool chain's base libraries) as an adaptor on top of Win32. This emulation layer makes it easy to port unix/linux apps onto Cygwin. The platform comes with a pretty complete set of software - essentially a full linux distribution hosted on a Windows kernel. It allows you to work in a unix-like way on Windows without having to maintain a separate operating system installation. If you don't want to run VMs, multiple boots or multiple PCs, it may be a way of easing into unix.

I have been studying signals in Linux. And I've done a test program to capture SIGINT.

#include <unistd.h>
#include <signal.h>
#include <iostream>
void signal_handler(int signal_no);
int main() {
  signal(SIGINT, signal_handler);
  for (int i = 0; i < 10; ++i) {
    std::cout << "I'm sleeping..." << std::endl;
    unsigned int one_ms = 1000;
    usleep(200 * one_ms);
  }
  return 0;
}
void signal_handler(int signal_no) {
  if (signal_no == SIGINT)
    std::cout << "Oops, you pressed Ctrl+C!\n";
  return;
}

While the output looks like this:

I'm sleeping...
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
^COops, you pressed Ctrl+C!
I'm sleeping...
I'm sleeping...
I'm sleeping...

I understand that when pressing Ctrl+C, all processes in the foreground process group receive a SIGINT (unless a process chooses to ignore it).

So is it that the shell (bash) AND the instance of the above program both received the signal? Where does the "^C" before each "Oops" come from?

The OS is CentOS, and the shell is bash.

The shell echoes everything you type, so when you type ^C, that too gets echoed (and in your case intercepted by your signal handler). The command stty -echo may or may not be useful to you depending on your needs/constraints, see the man page for stty for more information.
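
If you want to change that behaviour from inside the program instead of via stty, the terminal attributes can be adjusted with termios. The sketch below clears ECHOCTL, the flag that makes the driver echo control characters as ^X; note that ECHOCTL is a BSD/Linux extension rather than POSIX, so this assumes a Linux/glibc system:

    /* Sketch: stop the terminal driver from echoing control chars such as ^C.
     * ECHOCTL is a BSD/Linux extension (not POSIX); _DEFAULT_SOURCE exposes it on glibc. */
    #define _DEFAULT_SOURCE
    #include <stdio.h>
    #include <termios.h>
    #include <unistd.h>

    int main(void)
    {
        struct termios tio;
        if (tcgetattr(STDIN_FILENO, &tio) < 0) { perror("tcgetattr"); return 1; }
        tio.c_lflag &= ~ECHOCTL;              /* keep ECHO, drop the "^C" echo   */
        /* tio.c_lflag &= ~ECHO;                 the stty -echo equivalent       */
        if (tcsetattr(STDIN_FILENO, TCSANOW, &tio) < 0) { perror("tcsetattr"); return 1; }
        /* ... run the loop from the question; restore the old attributes on exit ... */
        return 0;
    }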

Of course much more goes on at a lower level; any time you communicate with a system via peripherals, device drivers (such as the keyboard driver you use to generate the ^C and the terminal driver that displays everything) are involved. You can dig even deeper at the level of assembly/machine language, registers, lookup tables, etc. If you want a more detailed, in-depth level of understanding, the books below are a good place to start:

The Design of the Unix OS is a good reference for these sort of things. Two more classic references: Unix Programming Environment and Advanced Programming in the UNIX Environment

Nice summary here in this SO question How does Ctrl-C terminate a child process?

"When you run a program, for example find, the shell:

  • the shell forks itself
  • and for the child sets the default signal handling
  • replaces the child with the given command (e.g. with find)
  • when you press CTRL-C, the parent shell handles this signal, but the child will receive it - with the default action - terminate. (The child can implement signal handling too.)" A stripped-down C sketch of this sequence follows.
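
In code, roughly (a hypothetical, simplified sketch of a shell running a foreground command; error handling omitted):

    /* Sketch: what a shell roughly does for a foreground command like "find". */
    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    void run_foreground(char *const argv[])
    {
        pid_t pid = fork();                 /* 1. the shell forks itself              */
        if (pid == 0) {
            signal(SIGINT, SIG_DFL);        /* 2. child gets default signal handling  */
            execvp(argv[0], argv);          /* 3. child is replaced by the command    */
            perror("execvp");
            _exit(127);
        }
        int status;
        waitpid(pid, &status, 0);           /* 4. Ctrl-C sends SIGINT to the foreground
                                                  process group; the child's default
                                                  action is to terminate               */
    }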

I was just thrust into Linux programming (Red Hat) after several years of C++ on Win32. So I am not looking for the basics of programming. Rather I am looking to get up to speed with things unique to the Linux programming world, such as packages, etc. In other words, I need to know everything in https://www.redhat.com/courses/rhd251_red_hat_linux_programming/details/ without spending 3K. Any ideas of how I can acquire that knowledge quickly (and relatively cheaply)?

Update: The things that I am used to doing on Windows, like building .exe and .dll files using VC++, creating install scripts, etc., are just done differently on Linux. There, you use things like yum, make and make install, etc. Things like Dependency Walker that I take for granted in the Windows world constantly send me to Google while working on Linux. Is there a 'set' of new skills somewhere that I can browse, or is this more of a learn-as-you-go situation?

The primary problem is this: as a very experienced programmer on Windows, I am having to ask simple questions like what's the difference between /usr/bin and /usr/local/bin, and I would like to be prepared.

For POSIX and such I can recommend Advanced Programming in the UNIX Environment and having a bookmark to The single UNIX Specification.

For GCC/GDB and those tools I'm afraid I can't give you any good recommendation.

Hope that helps anyway.

Edit: Duck was slightly faster.

Edited because I had to leave a meeting when I originally submitted this, but wanted to complete the information

Half of that material is learning about development in a Unix-like environment, and for that, I'd recommend a book since it's tougher to filter out useful information from the start.

I'd urge you to go to a bookstore and browse through these books:

  • Advanced Programming in the Unix Environment by Stevens and Rago - this book covers threads, networking, IPC, signals, files, process management
  • Unix Network Programming, Volume 1 by Stevens - This book is focused on network programming techniques, design - you might not need this until much later
  • Unix/Linux System Administration - This book covers the system administration side of things, like the directory structure of most Unix and Linux file systems (Linux distributions are more diverse than their Unix-named counterparts in how they structure their file systems)

Other information accessible online:

  • GCC Online Manual - the comprehensive GNU GCC documentation

  • Beej's network programming guide - A really well written tutorial to network programming with the use of the BSD API. If you have done work with winsock, this should be mostly familiar to you.
  • Red Hat Enterprise Linux 5's Deployment Guide - talks specifically about Red Hat EL 5's basic administrative/deployment, like installing with package manager, a Red Hat system's directory structure...
  • make - Wikipedia article that will have links to the various make documentation out there
  • binutils - These are the Linux tools used for manipulating object/binaries.
  • GNU Build System - Wikipedia article about the traditional build system of GNU software, using autoconf/automake/autogen

Additionally, you will want to learn about ldd, which is like dependency walker in Windows. It lists a target binary's dependencies, if it has any.

And for Debugging, check out this StackOverflow thread which talks about a well written GDB tutorial and also links to an IBM guide.

Happy reading.

I know there are a number of questions about senior project ideas but I am specifically looking for a project that involves Unix system programming in C or (preferably) C++. I have the book which I used for one quarter but haven't had a chance to use since. I want to find a project that will give me as much experience with Unix system calls as possible.

My ideas so far:

  • Packet analyzer
  • Web server

Also, I would like to create a GUI for the application. Since it will be written in C or C++, I am leaning towards Qt4 since I would like to be able to run it on Mac OS X. I would appreciate recommendations in this area as well.

EDIT: As suggested by some answers, it does not have to have a GUI. That was just an idea. Although I can't think of many project ideas that don't involve one.

I would lean away from a GUI.

  • It's a lot of work to learn GUI programming and the framework.
  • It takes a lot longer to get the GUI looking good than it does to do the algorithms.
  • It's hard to test.
  • It doesn't really 'count' as CS, so your prof is not going to be impressed by pretty colors.

For a project you need something with a good theory behind it - some analyzable algorithm - just writing some large software that does something useful isn't really the point.

But back to your original question:
Packet analyzer - yes, but so what? You print some packets out, like Ethereal does. What are you going to do that's more interesting than writing a network version of "ls"?

Web server - do you have any ideas for something that existing web servers don't do? Is there some interesting corner of the http protocol that Apache/IIS doesn't make use of?

I've searched for a while for a good book that covers server design patterns. I'm looking for something along the lines of the Gang of Four.

Concepts include:

-- Threaded vs Process vs combo based solutions
-- How to triage requests properly. i.e. I expect only limited requests from any domain, so I may only allocate a certain number of workers per domain.
-- Worker timeouts
-- poll/select/epoll use cases
-- And those things I don't know!

Any suggestions please!

Thanks!

Advanced Programming in the Unix Environment, 2nd Edition is a fantastic resource for learning the details of Unix systems programming. It's extremely well written (one of my favorite books in the English language), the depth is excellent, and the focus on four common environments (at the time of publication) helps ensure that it is well rounded. It's not too badly out of date - new features in newer operating systems may be fantastic for specific problems, but this book really covers the basics very well.

The downside, of course, is that APUE 2nd Edition misses out on some fantastic third-party tools such as libevent, which can make programming sockets-based servers significantly easier. (It automatically picks the 'best' of select(2), poll(2), epoll, kqueue, and Windows event handling for the platform.)

As for choosing between threads and processes, it comes down to: how much memory sharing do you want / need between tasks? If each process can run relatively isolated, processes provide better memory protection and no speed penalty. If processes need to interact with each other's objects, or objects 'owned' by a single thread, then threads provide better primitives for sharing data. (But many would argue that the shared memory of threads is an invitation to fun and exciting bugs. It Depends.)
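
A tiny way to see that difference (a sketch; compile with -pthread): a counter incremented in a forked child is invisible to the parent, while the same increment from a thread is shared:

    /* Sketch: the same global, modified once from a forked child and once from a thread. */
    #include <pthread.h>
    #include <stdio.h>
    #include <sys/wait.h>
    #include <unistd.h>

    static int counter = 0;

    static void *bump(void *arg)
    {
        (void)arg;
        counter++;                      /* threads share the address space */
        return NULL;
    }

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {                 /* child works on its own (copy-on-write) copy */
            counter++;
            _exit(0);
        }
        waitpid(pid, NULL, 0);
        printf("after fork + increment in child: %d\n", counter);   /* still 0 */

        pthread_t t;
        pthread_create(&t, NULL, bump, NULL);
        pthread_join(t, NULL);
        printf("after thread increment:          %d\n", counter);   /* now 1  */
        return 0;
    }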

Two very useful books:

The book Enterprise Integration Patterns provides a consistent vocabulary and visual notation to describe large-scale integration solutions across many implementation technologies. It also explores in detail the advantages and limitations of asynchronous messaging architectures. You will learn how to design code that connects an application to a messaging system, how to route messages to the proper destination and how to monitor the health of a messaging system. The patterns in the book are technology-agnostic and come to life with examples implemented in different messaging technologies, such as SOAP, JMS, MSMQ, .NET, TIBCO and other EAI Tools.

For my school project I am implementing a shell and I need help with job control. If we type a command, say cat &, then because of the & it should run in the background, but it's not working. I have this code:

{
  int pid;  
  int status;  
  pid = fork();  
  if (pid == 0) {  
    fprintf(stderr, "Child Job pid = %d\n", getpid());  
    execvp(arg1, arg2);  
  } 
  pid=getpid();  
  fprintf(stderr, "Child Job pid is = %d\n", getpid());      
  waitpid(pid, &status, 0);  
}

Rather than just going straight to waiting, you should set up a signal handler for the SIGCHLD signal. SIGCHLD is sent whenever a child process stops or is terminated. Check out the GNU description of process completion.

The end of this article has a sample handler (which I've more or less copied and pasted below). Try modeling your code off of it.

 #include <errno.h>
 #include <stdio.h>
 #include <sys/types.h>
 #include <sys/wait.h>

 /* WAIT_ANY is a glibc name for -1, i.e. "wait for any child process". */
 void sigchld_handler (int signum) {
     int pid, status, serrno;
     serrno = errno;                 /* save errno: a handler must not clobber it */
     while (1) {
         pid = waitpid(WAIT_ANY, &status, WNOHANG);
         if (pid < 0) {
             perror("waitpid");
             break;
         }
         if (pid == 0)
           break;                    /* no more exited children right now */
         /* customize here.
            notice_termination is in this case some function you would provide
            that would report back to your shell.
         */
         notice_termination (pid, status);
     }
     errno = serrno;                 /* restore errno */
 }
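
To actually receive SIGCHLD you still have to install the handler; with sigaction that looks roughly like this (a sketch, reusing sigchld_handler from above):

    /* Sketch: install sigchld_handler before forking children. SA_RESTART keeps
     * slow system calls (like read) from failing with EINTR when a child exits. */
    #include <signal.h>
    #include <string.h>

    void install_sigchld_handler(void)
    {
        struct sigaction sa;
        memset(&sa, 0, sizeof(sa));
        sa.sa_handler = sigchld_handler;   /* the handler from the snippet above */
        sigemptyset(&sa.sa_mask);
        sa.sa_flags = SA_RESTART;
        sigaction(SIGCHLD, &sa, NULL);
    }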

Another good source of information on this subject is Advanced Programming in the UNIX Environment, chapters 8 and 10.

Where is a good place to start if one is interested in Unix systems programming?

Any recommended reading, tutorials etc that are aimed at the beginner?

What knowledge is needed to start with systems programming?

Start with Mark Rochkind's "Advanced Unix Programming" if you can find it. Then graduate to Stevens' "Advanced Programming in the Unix Environment".

I discovered this too, for anyone interested. Apparently it is the "new standard" for Linux programming:

The Linux Programming Interface: A Linux and UNIX System Programming Handbook

Stevens is the bible. Read and understand this and his other books and you have most of what you need.

I'd like to do some hobby development of command line applications for UNIX in C. To narrow that down, I'd like to focus on the BSD family, specifically FreeBSD as my development machine is a Mac OS X 10.7 Lion box.

Searches for UNIX development have returned some books from Addison-Wesley, but I cannot find adequate documentation for FreeBSD. If there is a good general book on developing for either BSD or AT&T UNIX, I would be interested in that. Please note I prefer books, as I learn best that way.

Thanks,

Scott

Stevens "Advanced Programming in the Unix Environment". It covers FreeBSD but it's not FreeBSD specific. It is Unix specific, and covers all the bases you require.

I guess you could take a look at these:

Programming with POSIX Threads

The sockets Networking API

Interprocess Communications

Advanced Programming in the UNIX environment

The first three are very specific and would serve only if you need to focus on that particular subject. The last link is a highly rated book on Amazon that you may be interested in.

All in all, if you already have a grip on threads, IPC, networking, and the filesystem, all you need is the internet, because there is widely available documentation for the POSIX API.

I am working on a Debian system and need to make some processes communicate with each other, so I am looking for some advice or documentation...

As an imposed rule, I cannot use any library such as Boost, so I am trying to choose between System V IPC and POSIX IPC facilities, but I have not found any good documentation about the latter. Could you please help me?

Also, I have been looking for an IPC best-practices manual or something like that... Do you know of any?

Thanks in advance (and forgive me for my English )

The following are great books describing all you are asking about:

Unix Systems Programming, Robbins and Robbins.

Advanced Programming in the UNIX Environment, Stevens.

They both do a great job of covering the SysV and POSIX IPC approaches and are staples of the college CS curriculum.
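
To give a flavour of the POSIX side, a message-queue round trip looks roughly like this (a sketch; the queue name is made up, and on Linux you link with -lrt):

    /* Sketch: POSIX message queue send/receive in one process (normally two). */
    #include <fcntl.h>
    #include <mqueue.h>
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        struct mq_attr attr = { .mq_maxmsg = 10, .mq_msgsize = 128 };
        mqd_t mq = mq_open("/demo_queue", O_CREAT | O_RDWR, 0600, &attr);
        if (mq == (mqd_t)-1) { perror("mq_open"); return 1; }

        const char *msg = "hello";
        mq_send(mq, msg, strlen(msg) + 1, 0);            /* priority 0 */

        char buf[128];
        ssize_t n = mq_receive(mq, buf, sizeof(buf), NULL);
        if (n >= 0)
            printf("received: %s\n", buf);

        mq_close(mq);
        mq_unlink("/demo_queue");                        /* remove the queue name */
        return 0;
    }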

Is a shell a normal CLI application, or is it different from an application that accepts input from standard input, and outputs the result on standard output?

A shell reads standard input, writes to standard output/error as appropriate, and executes other programs. If you are interested in what it takes to write one, I would recommend reading "UNIX Systems Programming" by Kay Robbins and Steve Robbins. I haven't read this edition; the original was named "Practical UNIX Programming". It does contain sections devoted to process management, including writing a very basic shell. If you haven't read "Advanced Programming in the UNIX Environment" by Stevens, then I would suggest reading it as well.
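
To see how small the core of a shell really is, here is a toy sketch of that read/fork/exec/wait loop (no quoting, pipes, redirection, or job control):

    /* Toy shell sketch: read a line, split on whitespace, fork + exec, wait. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        char line[1024];
        for (;;) {
            printf("toysh> ");
            fflush(stdout);
            if (fgets(line, sizeof(line), stdin) == NULL)
                break;                          /* EOF (Ctrl-D) ends the shell */

            char *argv[64];
            int argc = 0;
            for (char *tok = strtok(line, " \t\n"); tok != NULL && argc < 63;
                 tok = strtok(NULL, " \t\n"))
                argv[argc++] = tok;
            argv[argc] = NULL;
            if (argc == 0)
                continue;

            pid_t pid = fork();
            if (pid == 0) {                     /* child: become the requested program */
                execvp(argv[0], argv);
                perror("execvp");
                _exit(127);
            }
            waitpid(pid, NULL, 0);              /* parent: wait for the foreground job */
        }
        return 0;
    }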