TCP/IP Illustrated: The protocols

W. Richard Stevens, Gary R. Wright

Mentioned 27

This book's innovative approach helps readers at all levels to truly understand how TCP/IP really works. Rather than just describing what the RFCs say the protocol suite should do, TCP/IP Illustrated uses a popular diagnostic tool so you can actually watch the protocols in action. By forcing certain conditions to occur (connection establishment, timeout and retransmission, fragmentation, etc.) and watching the results, Rich Stevens provides insight into how the protocols work, and why certain design decisions were made. Written in his well-known style with lots of examples, Stevens shows how current, popular TCP/IP implementations operate (SunOS 4.1.3, Solaris 2.2, System V Release 4, BSD/386, AIX 3.2.2, and 4.4 BSD), and relates these real-world implementations to the RFC standards.

Mentioned in questions and answers.

Are there any good books for a relatively new but not totally new *nix user to get a bit more in-depth knowledge (so no "Linux for dummies")? For the most part, I'm not looking for something to read through from start to finish. Rather, I'd like something that I can pick up and read in chunks when I need to know how to do something or whenever I have one of those "how do I do that again?" moments. Some areas that I'd like to see are:

  • command line administration
  • bash scripting
  • programming (although I'd like something that isn't just relevant for C programmers)

I'd like this to be as platform-independent as possible (meaning it has info that's relevant for any linux distro as well as BSD, Solaris, OS X, etc), but the unix systems that I use the most are OS X and Debian/Ubuntu. So if I would benefit the most from having a more platform-dependent book, those are the platforms to target.

If I can get all this in one book, great, but I'd rather have a bit more in-depth material than coverage of everything. So if there are any books that cover just one of these areas, post it. Hell, post it even if it's not relevant to any of those areas and you think it's something that a person in my position should know about.

I've wiki'd this post - could those with sufficient rep add in items to it.

System administration, general usage books

Programming:

Specific tools (e.g. Sendmail)

Various of the books from O'Reilly and other publishers cover specific topics. Some of the key ones are:

Some of these books have been in print for quite a while and are still relevant. Consequently they are also often available secondhand at much less than list price. Amazon marketplace is a good place to look for such items. It's quite a good way to do a shotgun approach to topics like this for not much money.

As an example, in New Zealand technical books are usuriously expensive due to a weak kiwi peso (as the $NZ is affectionately known in expat circles) and a tortuously long supply chain. You could spend 20% of a week's after-tax pay for a starting graduate on a single book. When I was living there just out of university I used this type of market a lot, often buying books for 1/4 of their list price, including the cost of shipping to New Zealand. If you're not living in a location with tier-1 incomes I recommend this.

E-Books and on-line resources (thanks to israkir for reminding me):

  • The Linux Documentation Project (www.tldp.org) has many specific topic guides known as HowTos that also often concern third-party OSS tools and will be relevant to other Unix variants. It also has a series of FAQs and guides.

  • Unix Guru's Universe is a collection of unix resources with a somewhat more old-school flavour.

  • Google. There are many, many unix and linux resources on the web. Search strings like unix commands or learn unix will turn up any amount of online resources.

  • Safari. This is a subscription service, but you can search the texts of quite a large number of books. I can recommend this as I've used it. They also do site licences for corporate customers.

Some of the philosophy of Unix:

I recommend the Armadillo book from O'Reilly for command line administration and shell scripting.

Jason,

Unix Programming Environment by Kernighan and Pike will give you solid foundations on all things Unix and should cover most of your questions regarding shell command line scripting etc.

The Armadillo book by O'Reilly will add the administration angle. It has served me well!

Good luck!

The aforementioned Unix Power Tools is a must. Other classics are sed & awk and Mastering Regular Expressions. I also like some books from the O'Reilly "Cookbook" series:

Think MUDs/MUCKs but maybe with avatars or locale illustrations. My language of choice is ruby.

I need to handle multiple persistent connections with data being asynchronously transferred between the server and its various clients. A single database must be kept up-to-date based on activity occurring in the client sessions. Activity in each client session may require multiple other clients to be immediately updated (a user enters a room; a user sends another user a private message).

This is a goal project and a learning project, so my intention is to re-invent a wheel or two to learn more about concurrent network programming. However, I am new to both concurrent and network programming; previously I have worked almost exclusively in the world of non-persistent, synchronous HTTP requests in web apps. So, I want to make sure that I'm reinventing the right wheels.

Per emboss's excellent answer, I have been starting to look at the internals of certain HTTP servers, since web apps can usually avoid threading concerns due to how thoroughly the issue is abstracted away by the servers themselves.

I do not want to use EventMachine or GServer because I don't yet understand what they do. Once I have a general sense of how they work, what problems they solve and why they're useful, I'll feel comfortable with them. My goal here is not "write a game", but "write a game and learn how some of the lower-level stuff works". I'm also unclear on the boundaries of certain terms; for example, is "I/O-unbound apps" a superset of "event-driven apps"? Vice-versa?

I am of course interested in the One Right Way to achieve my goal, if it exists, but overall I want to understand why it's the right way and why other ways are less preferable.

Any books, ebooks, online resources, sample projects or other tidbits you can suggest are what I'm really after.

The way I am doing things right now is by using IO#select to block on the list of connected sockets, with a timeout of 0.1 seconds. It pushes any information read into a thread-safe read queue, and then whenever it hits the timeout, it pulls data from a thread-safe write queue. I'm not sure if the timeout should be shorter. There is a second thread which polls the socket-handling thread's read queue and processes the "requests". This is better than how I had it working initially, but still might not be ideal.

I posted this question on Hacker News and got linked to a few resources that I'm working through; anything similar would be great:

Although you probably don't want to hear it, I would still recommend investigating HTTP servers first. Programming for them may have seemed boring, synchronous, and non-persistent to you, but that's only because the creators of the servers did such a tremendous job of hiding the gory details from you. If you think about it, a web server is anything but synchronous (it's not as if millions of people have to wait to read this post until you are done... concurrency :). Because these beasts do their job so well (yeah, I know we yell at them a lot, but at the end of the day most HTTP servers are outstanding pieces of software), they are the definite starting point if you want to learn about efficient multi-threading. Operating systems and implementations of programming languages or games are another good source, but maybe a bit further away from what you intend to achieve.

If you really intend to get your hands dirty, I would suggest orienting yourself on something like WEBrick first. It ships with Ruby and is implemented entirely in Ruby, so you will learn all about Ruby threading concepts there. But be warned: you'll never get close to the performance of a Rack solution that sits on top of a web server implemented in C, such as thin.

So if you really want to be serious, you would have to roll your own server implementation in C(++) and probably make it support Rack, if you intend to support HTTP. Quite a task, I would say, especially if you want your end result to be competitive. C code can be blazingly fast, but it's all too easy to be blazingly slow as well; that lies in the nature of low-level stuff. And we haven't discussed memory management and security yet. If it's really your desire, go for it, but I would first dig into well-known server implementations for inspiration. See how they work with threads (pooling) and how they implement 'sessions' (you wanted persistence). All the things you desire can be done with HTTP, even better when used with a clever REST interface; existing applications that support all the features you mentioned are living proof of that. So going in that direction would not be entirely wrong.

If you still want to invent your own proprietary protocol, base it on TCP/IP as the lowest acceptable common denominator. Going beyond that would end up in a project that your grand-children would probably still be coding on. That's really as low as I would dare to go when it comes to network programming.

Whether you are using it as a library or not, look into EventMachine and its conceptual model. Overlooking event-driven ('non-blocking') IO in your journey would be negligent in the context of learning about/reinventing the right wheels. An appetizer for event-driven programming explaining the benefits of node.js as a web server.

Based on your requirements: asynchronous communication, multiple "subscribers" reacting to "events" that are centrally published; well that really sounds like a good candidate for an event-driven/message-based architecture.
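To make the event-driven idea a bit more concrete, here is a minimal sketch in C (the language the books below use) of one turn of a select()-based loop: wait with a timeout, then read once from every descriptor that is ready. The names event_loop_turn and data_handler are made up for illustration and are not part of EventMachine or any other library. A real server would also handle writes and disconnects, and would have to respect the FD_SETSIZE limit of select().

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type: invoked once per descriptor that delivered data. */
typedef void (*data_handler)(int fd, const char *data, size_t len);

/* One turn of the loop. Waits up to timeout_ms for any of fds[0..nfds-1] to
 * become readable, then reads once from each ready descriptor.
 * Returns the total number of bytes read, 0 if nothing was read before the
 * timeout, or -1 if select() failed. on_data may be NULL. */
long event_loop_turn(const int *fds, size_t nfds, int timeout_ms,
                     data_handler on_data)
{
    fd_set rfds;
    int maxfd = -1;

    FD_ZERO(&rfds);
    for (size_t i = 0; i < nfds; i++) {
        FD_SET(fds[i], &rfds);
        if (fds[i] > maxfd)
            maxfd = fds[i];
    }

    struct timeval tv;
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    int n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
    if (n <= 0)
        return n;               /* 0 = timeout, -1 = error */

    long total = 0;
    char buf[4096];
    for (size_t i = 0; i < nfds; i++) {
        if (!FD_ISSET(fds[i], &rfds))
            continue;
        ssize_t got = read(fds[i], buf, sizeof(buf));
        if (got > 0) {
            total += got;
            if (on_data)
                on_data(fds[i], buf, (size_t)got);
        }
        /* got == 0 means the peer closed; a real loop would drop the fd here. */
    }
    return total;
}
```

Calling this in a loop and broadcasting from the handler to the other connected clients is roughly the "publish an event to subscribers" shape described above. The timeout only bounds how long an idle turn blocks; it does not delay data that is already waiting.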


Some books that may be helpful on your journey (Linux/C only, but the concepts are universal):

(Those were the classics)

  • The Linux Programming Interface - if you intend to buy just one book, let it be this one. I'm not entirely through it yet, but it is truly amazing and covers all the topics you need to know about for your adventure.

Projects you may want to check out:

So I want to learn all about networks. Well below the socket, down to raw sockets and stuff. And I want to understand hubs, routers, access points, etc. For example, I'd like to be able to write my own software to do this kind of stuff.* Is there a great source for this kind of information?

I know that I'm asking a LOT here, and that to fully explain it all requires from high level down to low level. I guess I'm looking for a source similar in scope and depth to Applied Cryptography, but about networks.

Thanks to anyone who can help to point me (and others like me?) in the right direction.

* Yes, I realize using any of my hand-crafted network stack code would be a huge security issue, and am only looking to do it to learn :)

Similar Question: here. However I'm looking for more than just 'what's below TCP/UDP sockets?'.

Edited for Clarification: The depth I'm talking about is above the driver level. So assuming that the bits can make it to and from the other end of the wire, what next?

I learned IP networking from TCP/IP Illustrated. Highly recommended.

In order to cover for my (glaring) lack of knowledge in the basics of networking, I'm looking for a book which would ideally cover:

-> 1 or 2 chapters on the transport layer: tcp, udp...

-> 1 or 2 chapters on the application layer: http, dns...

-> the rest of the book would be devoted to practical ways of sending data across the wire using Java-related technologies. This would involve discussions of existing products (e.g. Hessian, protobuf, Thrift, TIBCO...), performance comparisons, case studies, etc.

Does such a book exist ?

Edit: Thanks for all the answers so far... however most of the books listed focus heavily on the lower levels of the networking stack (ie. tcp/ip, network administration...). This is one-half of the answer only. I'm still eager to hear suggestions about the other half: discussions around the "state of the art" options available to the Java developer to ferry data around, what products/frameworks are available and how do they compare.

For a TCP/IP text (Not Java centric)

For a Java Networking book I would go with this. Most books are very dated and do not cover the newer stuff, this one covers NIO as well as uses generics in the examples.

If you want to improve your networking basics, it would be better to look at books that cover networking fundamentals. Once you are comfortable with those, you can start with the networking section of the Java tutorial and explore the appropriate Java libraries. Networking is an area of its own, and understanding it is independent of any programming language.

That said, some of the networking books which I have found helpful are :

Internetworking with TCP/IP, Vol 1 by Douglas Comer

TCP/IP Illustrated Vol. 1 by W. Richard Stevens

Computer Networks by Andrew S. Tanenbaum

As a primer on networking in general, I'd recommend TCP/IP Network Administration, Third Edition, by Craig Hunt. This book provides a chapter on the TCP/IP stack, another on Addressing and routing and the remainder of the book covers in reasonable depth most common network services and diagnostic tools.

For a heavyweight reference, get TCP/IP Illustrated, Vol 1: The Protocols, by W. Richard Stevens. If you become obsessed with networks, buy or borrow volumes 2 and 3.

As far as a Java-specific networking introduction goes, I'd suggest Java Network Programming, Third Edition, by Elliotte Rusty Harold. This book does take some criticism, but I still believe it's a good introduction and an approachable read.

It's a general book for Java beginners but the part about networking is very, VERY clear and easy to grasp.

Head First Java, 2nd Edition

Every day I write Web applications, and I have a good understanding of HTTP. However, I want to close the gaps in my knowledge of network architecture. I'm not a sysadmin, so a hard-core sysadmin reference book would probably be a bit much for me, but I'm also not looking for a book on how to write code in any way. I'm interested in the mechanisms underneath all that fun Web code I write.

Any recommendations?

I have a question regarding how the Internet checksum is calculated. I couldn't find a good explanation in the book, so I'm asking here. I'm not sure if this is the correct place to ask, so I'm sorry if it's the wrong place.

Look at the following example. Two messages are sent: 10101001 and 00111001. The checksum is calculated with 1's complement. So far I understand. But how is the sum calculated? At first I thought it might be XOR, but that doesn't seem to be the case.

              10101001
              00111001
              --------
   Sum        11100010
   Checksum:  00011101

Then they calculate whether the message arrived OK. Once again, how is the sum calculated?

               10101001
               00111001
               00011101
               --------
   Sum         11111111
   Complement  00000000  means that the pattern is O.K.

If by Internet checksum you mean the TCP checksum, there's a good explanation here and even some code.

When you're calculating the checksum, remember that it's not just a function of the data but also of the "pseudo header", which puts the source IP, destination IP, protocol, and length of the TCP segment into the data to be checksummed. This ties the TCP metadata to some data in the IP header.

TCP/IP Illustrated Vol 1 is a good reference for this and explains it all in detail.
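As for how the sum in the question's example is formed: it is ordinary binary addition, with any carry out of the top bit added back into the low end (ones' complement addition), not XOR. The sketch below reproduces the 8-bit example from the question and then shows the same idea with the 16-bit words that IP, UDP and TCP actually use. The function names are invented for this illustration, and the 16-bit version checksums only the bytes it is given; it does not build the pseudo header for you.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of 8-bit words: add normally, then fold any carry
 * out of bit 7 back into the low bits ("end-around carry"). */
uint8_t ones_sum8(const uint8_t *words, size_t n)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += words[i];
    while (sum >> 8)
        sum = (sum & 0xFF) + (sum >> 8);
    return (uint8_t)sum;
}

/* The checksum is the bitwise complement of that sum. */
uint8_t checksum8(const uint8_t *words, size_t n)
{
    return (uint8_t)~ones_sum8(words, n);
}

/* Same technique over 16-bit big-endian words. An odd trailing byte is
 * treated as if it were padded with a zero byte. */
uint16_t inet_checksum16(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)
        sum += (uint32_t)data[i] << 8;
    while (sum >> 16)
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}
```

With the two bytes from the question, 10101001 + 00111001 gives 11100010 with no carry, and complementing it gives 00011101. The receiver adds all three bytes, gets 11111111, and its complement 00000000 means the data checks out. The end-around carry only shows up when the addition overflows: 11110000 + 00100000 overflows to 1 00010000, which folds to 00010001.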

I recently started attending two classes in school that focus on networking, one regarding distributed systems and another regarding computer networks in general. After completing the first labs for the two classes, I now have a pretty good understanding of network protocol and socket concepts with both C and Java.

Now I'm trying to move beyond the basic concepts and become better at communication class and object design, network design patterns, intermediate socket/stream management conventions, important libraries, and general *nix network programming intermediate techniques in either C or OO languages.

Can you suggest any resources that you've had success with?

Unix network programming by Richard Stevens is a must-have book which discusses many advanced network programming techniques. I've been doing network programming for years, and even now hardly a day goes by without me looking up something in this great reference.

Stevens' know-it-all book about network programming is too detailed to start with, and it uses a wrapper library that hides the real socket programming.

I would recommend starting with Beej's Guide to Network Programming and then moving on to TCP/IP Sockets in C. They give a lot of good basics about network programming and provide a platform from which to finally go through the Stevens book.

Stevens' other books, like the TCP/IP Illustrated series, cover all the conceptual parts of networks.

I learned network programming from the Linux Socket Programming by Example book by Warren W. Gay.

OK, first of all I'd like to mention that what I'm doing is completely ethical, and yes, I am port scanning.

The program runs fine when the port is open, but when I get to a closed socket the program halts for a very long time because there is no time-out clause. The code is below.

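#include <iostream>      // std::cout
#include <sys/socket.h>  // socket(), connect()
#include <netinet/in.h>  // struct sockaddr_in, htons()
#include <arpa/inet.h>   // inet_addr()
#include <unistd.h>      // close()

using namespace std;

int main(){

    int err, net;
    struct sockaddr_in sa = {};   // zero the struct before filling in fields

    sa.sin_family = AF_INET;

    sa.sin_port = htons(xxxx);                          // port to probe (elided in the post)
    sa.sin_addr.s_addr = inet_addr("xxx.xxx.xxx.xxx");  // target address (elided in the post)

    net = socket(AF_INET, SOCK_STREAM, IPPROTO_TCP);
    // Blocking connect(): with no timeout set, this is the call that hangs.
    err = connect(net, (struct sockaddr *)&sa, sizeof(sa));

    if(err >= 0){ cout << "Port is Open"; }
    else { cout << "Port is Closed"; }

    close(net);
}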

I found this on stack overflow but it just doesn't make sense to me using a select() command.

The question is: can we make the connect() function time out so we don't wait a year for it to come back with an error?

If you're dead-set on using blocking IO to get this done, you should investigate the setsockopt() call, specifically the SO_SNDTIMEO flag (or other flags, depending on your OS).

Be forewarned these flags are not reliable/portable and may be implemented differently on different platforms or different versions of a given platform.

The traditional/best way to do this is via the nonblocking approach which uses select(). In the event you're new to sockets, one of the very best books is TCP/IP Illustrated, Volume 1: The Protocols. It's at Amazon at: http://www.amazon.com/TCP-Illustrated-Protocols-Addison-Wesley-Professional/dp/0201633469
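For what it's worth, the nonblocking approach usually looks something like the sketch below: put the socket in nonblocking mode, start the connect, wait for writability with select() and a timeout, then read SO_ERROR to find out whether the connect actually succeeded. The function names connect_with_timeout and port_is_open are made up for this example, it handles IPv4 only, and it assumes the descriptor is below FD_SETSIZE. Treat it as a starting point rather than a finished scanner.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <unistd.h>

/* Returns 0 if the connection was established within timeout_ms,
 * -1 otherwise (errno is ETIMEDOUT when the wait ran out). */
int connect_with_timeout(int fd, const struct sockaddr *sa, socklen_t len,
                         int timeout_ms)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    int rc = connect(fd, sa, len);
    if (rc == -1 && errno == EINPROGRESS) {
        fd_set wfds;
        struct timeval tv;

        FD_ZERO(&wfds);
        FD_SET(fd, &wfds);
        tv.tv_sec = timeout_ms / 1000;
        tv.tv_usec = (timeout_ms % 1000) * 1000;

        /* The socket becomes writable when the connect finishes or fails. */
        rc = select(fd + 1, NULL, &wfds, NULL, &tv);
        if (rc == 0) {
            errno = ETIMEDOUT;
            rc = -1;
        } else if (rc > 0) {
            int err = 0;
            socklen_t elen = sizeof(err);
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &elen) == -1) {
                rc = -1;
            } else if (err != 0) {
                errno = err;    /* e.g. ECONNREFUSED for a closed port */
                rc = -1;
            } else {
                rc = 0;
            }
        }
    }

    int saved = errno;
    fcntl(fd, F_SETFL, flags);  /* restore the original (blocking) mode */
    errno = saved;
    return rc;
}

/* Returns 1 if a TCP connection to ip:port succeeds within timeout_ms. */
int port_is_open(const char *ip, unsigned short port, int timeout_ms)
{
    struct sockaddr_in sa;
    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return 0;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return 0;

    int ok = connect_with_timeout(fd, (struct sockaddr *)&sa, sizeof(sa),
                                  timeout_ms) == 0;
    close(fd);
    return ok;
}
```

A closed port on a reachable host normally fails fast with ECONNREFUSED; the long hang in the question is typical of a filtered port or an unreachable host, which is where the timeout earns its keep.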

Recently I started following this guide to get myself started on downloading files from the internet. I read it and came up with the following code to download the HTTP body of a website. The only problem is, it's not working. The code stops at the recv() call. It does not crash, it just keeps on running. Is this my fault? Am I using the wrong approach? I intend to use the code not just to download the contents of .html files, but also to download other files (zip, png, jpg, dmg ...). I hope there's somebody that can help me. This is my code:

#include <stdio.h>
#include <sys/socket.h> /* SOCKET */
#include <netdb.h> /* struct addrinfo */
#include <stdlib.h> /* exit() */
#include <string.h> /* memset() */
#include <errno.h> /* errno */
#include <unistd.h> /* close() */
#include <arpa/inet.h> /* IP Conversion */

#include <stdarg.h> /* va_list */

#define SERVERNAME "developerief2.site11.com"
#define PROTOCOL "80"
#define MAXDATASIZE 1024*1024

void errorOut(int status, const char *format, ...);
void *get_in_addr(struct sockaddr *sa);

int main (int argc, const char * argv[]) {
    int status;

    // GET ADDRESS INFO
    struct addrinfo *infos; 
    struct addrinfo hints;

    // fill hints
    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_PASSIVE;
    hints.ai_family = AF_UNSPEC;

    // get address info
    status = getaddrinfo(SERVERNAME, 
                         PROTOCOL, 
                         &hints, 
                         &infos);
    if(status != 0)
        errorOut(-1, "Couldn't get address information: %s\n", gai_strerror(status));

    // MAKE SOCKET
    int sockfd;

    // loop, use first valid
    struct addrinfo *p;
    for(p = infos; p != NULL; p = p->ai_next) {
        // CREATE SOCKET
        sockfd = socket(p->ai_family, 
                        p->ai_socktype, 
                        p->ai_protocol);
        if(sockfd == -1)
            continue;

        // TRY TO CONNECT
        status = connect(sockfd, 
                         p->ai_addr, 
                         p->ai_addrlen);
        if(status == -1) {
            close(sockfd);
            continue;
        }

        break;
    }

    if(p == NULL) {
        fprintf(stderr, "Failed to connect\n");
        return 1;
    }

    // LET USER KNOW
    char printableIP[INET6_ADDRSTRLEN];
    inet_ntop(p->ai_family,
              get_in_addr((struct sockaddr *)p->ai_addr),
              printableIP,
              sizeof(printableIP));
    printf("Connection to %s\n", printableIP);

    // GET RID OF INFOS
    freeaddrinfo(infos);

    // RECEIVE DATA
    ssize_t receivedBytes;
    char buf[MAXDATASIZE];
    printf("Start receiving\n");
    receivedBytes = recv(sockfd, 
                         buf, 
                         MAXDATASIZE-1, 
                         0);
    printf("Received %d bytes\n", (int)receivedBytes);
    if(receivedBytes == -1)
        errorOut(1, "Error while receiving\n");

    // null terminate
    buf[receivedBytes] = '\0';

    // PRINT
    printf("Received Data:\n\n%s\n", buf);

    // CLOSE
    close(sockfd);

    return 0;
}

void *get_in_addr(struct sockaddr *sa) {
    // IP4
    if(sa->sa_family == AF_INET)
        return &(((struct sockaddr_in *) sa)->sin_addr);

    return &(((struct sockaddr_in6 *) sa)->sin6_addr);
}

void errorOut(int status, const char *format, ...) {
    va_list args;
    va_start(args, format);
    vfprintf(stderr, format, args);
    va_end(args);
    exit(status);
}

If you want to grab files using HTTP, then libcURL is probably your best bet in C. However, if you are using this as a way to learn network programming, then you are going to have to learn a bit more about HTTP before you can retrieve a file.

What you are seeing in your current program is that you need to send an explicit request for the file before you can retrieve it. I would start by reading through RFC2616. Don't try to understand it all - it is a lot to read for this example. Read the first section to get an understanding of how HTTP works, then read sections 4, 5, and 6 to understand the basic message format.

Here is an example of what an HTTP request for the stackoverflow Questions page looks like:

GET http://stackoverflow.com/questions HTTP/1.1\r\n
Host: stackoverflow.com:80\r\n
Connection: close\r\n
Accept-Encoding: identity, *;q=0\r\n
\r\n

I believe that is a minimal request. I added the CRLFs explicitly to show that a blank line is used to terminate the request header block as described in RFC2616. If you leave out the Accept-Encoding header, then the result document will probably be transferred as a gzip-compressed stream, since HTTP allows for this explicitly unless you tell the server that you do not want it.
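In C, sending such a request before calling recv() could look roughly like the sketch below. The helper names build_get_request and send_all are invented for this example. It uses the shorter "GET /path" form of the request line, which RFC2616 describes as the most common form when talking directly to the origin server, and keeps the same headers as the request shown above.

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Formats a GET request for path on host into buf.
 * Returns the request length, or -1 if it does not fit in cap bytes or if
 * host/path contain CR or LF (which would corrupt the header block). */
int build_get_request(char *buf, size_t cap, const char *host, const char *path)
{
    if (strpbrk(host, "\r\n") != NULL || strpbrk(path, "\r\n") != NULL)
        return -1;

    int n = snprintf(buf, cap,
                     "GET %s HTTP/1.1\r\n"
                     "Host: %s\r\n"
                     "Connection: close\r\n"
                     "Accept-Encoding: identity, *;q=0\r\n"
                     "\r\n",   /* blank line ends the header block */
                     path, host);
    if (n < 0 || (size_t)n >= cap)
        return -1;
    return n;
}

/* send() may accept fewer bytes than requested, so loop until all are sent.
 * Returns 0 on success, -1 on error. */
int send_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t sent = send(fd, buf, len, 0);
        if (sent <= 0)
            return -1;
        buf += sent;
        len -= (size_t)sent;
    }
    return 0;
}
```

In the program from the question, these would be called on sockfd after connect() succeeds and before the recv(). Because the request says "Connection: close", the response can then be read by calling recv() in a loop until it returns 0; a single recv() is not guaranteed to return the whole response.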

The server response also contains HTTP headers for the meta-data describing the response. Here is an example of a response from the previous request:

HTTP/1.1 200 OK\r\n
Server: nginx\r\n
Date: Sun, 01 Aug 2010 13:54:56 GMT\r\n
Content-Type: text/html; charset=utf-8\r\n
Connection: close\r\n
Cache-Control: private\r\n
Content-Length: 49731\r\n
\r\n
\r\n
\r\n
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" ... 49,667 bytes follow

This simple example should give you an idea of what you are getting into if you want to grab files using HTTP. This is the best-case, simplest example. This isn't something that I would undertake lightly, but it is probably the best way to learn and appreciate HTTP.

If you are looking for a simple way to learn network programming, this is a decent way to start. I would recommend picking up a copy of TCP/IP Illustrated, Volume 1 and UNIX Network Programming, Volume 1. These are probably the best way to really learn how to write network-based applications. I would probably start by writing an FTP client since FTP is a much simpler protocol to start with.

If you are trying to learn the details associated with HTTP, then:

  1. Buy HTTP: the Definitive Guide and read it
  2. Read RFC2616 until you understand it
    • Try examples using telnet server 80 and typing in requests by hand
    • Download the cURL client and use the --verbose and --include command line options so that you can see what is happening
  3. Read Fielding's dissertation until HTTP really makes sense.

Just don't plan on writing your own HTTP client for enterprise use. You do not want to do that, trust me as one who has been maintaining such a mistake for a little while now...

I have noticed that viewing images or websites that are hosted on US servers (I'm in Europe) is considerably slower. The main reason would be the latency because of the distance.

But if 1 packet takes n milliseconds to be received, can't this be alleviated by sending more packets simultaneously?

Does this actually happen or are the packets sent one by one? And if yes, what determines how many packets can be sent simultaneously (something to do with the cable, I guess)?

TCP uses what's called a sliding window. The window size, X, is basically the amount of buffer space the receiver has for re-assembling out-of-order packets. The sender can send X bytes past the last acknowledged byte (sequence number N, say). This way you can fill the pipe between sender and receiver with X unacknowledged bytes, on the assumption that the packets will likely get there; if they don't, the receiver will let you know by not acknowledging the missing packets. On each response packet the receiver sends a cumulative acknowledgment, saying "I've got all the bytes up to byte M." This lets it ack multiple packets at once.

Imagine a client sending 3 packets, X, Y, and Z, starting at sequence number N. Due to routing, Y arrives first, then Z, and then X. Y and Z will be buffered in the destination's stack, and when X arrives the receiver will ack N + (the combined lengths of X, Y, and Z). This advances the start of the sliding window, allowing the client to send additional packets.
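The receiver side of that scenario can be modelled in a few lines of C. This is only a toy model with invented names (struct receiver, receiver_on_segment): it tracks the next expected sequence number, parks segments that arrive early, and returns the cumulative ACK after each arrival. It ignores sequence-number wraparound, overlapping segments, and the window limit, all of which a real stack has to deal with.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 16

struct segment {
    uint32_t seq;   /* sequence number of the first byte */
    uint32_t len;   /* number of bytes in the segment */
};

struct receiver {
    uint32_t next_expected;               /* next in-order byte wanted */
    struct segment pending[MAX_PENDING];  /* segments that arrived early */
    size_t npending;
};

void receiver_init(struct receiver *r, uint32_t initial_seq)
{
    r->next_expected = initial_seq;
    r->npending = 0;
}

/* Processes one arriving segment and returns the cumulative ACK, i.e. the
 * sequence number of the next byte the receiver is still waiting for. */
uint32_t receiver_on_segment(struct receiver *r, uint32_t seq, uint32_t len)
{
    if (seq != r->next_expected) {
        /* Out of order: buffer it if it is ahead of us and there is room,
         * then repeat the old ACK. */
        if (seq > r->next_expected && r->npending < MAX_PENDING) {
            r->pending[r->npending].seq = seq;
            r->pending[r->npending].len = len;
            r->npending++;
        }
        return r->next_expected;
    }

    /* In order: advance, then absorb any buffered segments that now fit. */
    r->next_expected += len;
    int advanced = 1;
    while (advanced) {
        advanced = 0;
        for (size_t i = 0; i < r->npending; i++) {
            if (r->pending[i].seq == r->next_expected) {
                r->next_expected += r->pending[i].len;
                r->pending[i] = r->pending[--r->npending];
                advanced = 1;
                break;
            }
        }
    }
    return r->next_expected;
}
```

With N = 1000 and three 100-byte packets, feeding in Y (seq 1100) and then Z (seq 1200) returns an ACK of 1000 both times, and feeding in X (seq 1000) returns 1300, which acknowledges all three at once.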

It's possible with selective acknowledgement to ack portions of the sliding window and ask the sender to retransmit just the lost portions. In the classic scheme, if Y was lost the sender would have to resend Y and Z. Selective acknowledgement means the sender can just resend Y. Take a look at the Wikipedia page.

Regarding speed, one thing that may slow you down is DNS. That adds an additional round-trip, if the IP isn't cached, before you can even request the image in question. If it's not a common site this may be the case.

TCP/IP Illustrated, Volume 1, by W. Richard Stevens is tremendous if you want to learn more. The title sounds funny, but the packet diagrams and annotated arrows from one host to the other really make this stuff easier to understand. It's one of those books that you can learn from and then end up keeping as a reference. It's one of my 3 go-to books on networking projects.

I am looking for a primer to learn TCP/IP basic knowledge.

Can someone give me some suggestions on books or online resources?

thank you

// Update the title based on comments

Stevens' TCP/IP Illustrated is still a very good resource to learn the basics of protocols.

Hi,

Let us say that there are 10 packets, 1-10, and the 6th packet gets dropped because of a network fault. Does TCP resend all packets from 6-10 or just the 6th?

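Without selective acknowledgment, it may resend all packets 6 through 10, although many implementations first retransmit only the missing packet and let the next cumulative ACK cover whatever the receiver had already buffered. In fact, since the receiver only tells the sender the last sequence number that arrived in order, the sender may choose to split up the data differently (i.e. by consolidating packets 6 through 10 into one bigger packet) when resending.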

However, I should note that in all my years of socket programming, I've never actually needed to know that detail. I've never written an actual TCP driver, which is the only place you'd need to know that information.

The TCP/IP Illustrated series of books is an excellent resource for this.

I'm having a problem doing network programming. I'm using TCP to communicate between server and client. My code is working, but so far I can't detect whether the data was sent successfully or failed. I have the following questions:

  1. How does one check whether bytes sent over a TCP socket were sent successfully?
  2. How does acknowledgment (ACK) work in the TCP protocol?
  3. How does one do secure communication using socket programming?

You can explain in C#, Java or PHP.

An explanation of how TCP/IP works is best left as a research exercise, rather than an ad hoc SO question. While the usual first stop might be Wikipedia, that probably doesn't give you a good introduction from a conceptual level. In the comments there is also a link to a TCP tutorial.

However in essence, TCP is, as the name says, a transmission control protocol that provides a mechanism for the reliable delivery of a byte stream. Given a stream of bytes, TCP basically ensures that the receiver receives the bytes in the correct order. If a byte in the middle of a stream got "lost", then TCP can detect this and will arrange its re-transmission. All of this is transparent to the receiving party. The magic of TCP is that the receiving party just reads from the socket, and the data appears intact and in the correct order.

When you utilise a TCP socket, you don't deal with ACKs and NACKs and retransmissions. All of that is transparent to your application.

Now, you can detect that the other end has gone away, but you can't know that the other end definitely did or did not receive your message. This is the Two Generals' Problem.

But really, if you're interested in how TCP works read W Richard Stevens' TCP/IP Illustrated. Not trying to palm you off, but to understand it, you really do need to go away and read about it (either online or on dead trees).

I'm starting a new job soon with a manufacturer and supplier of fibre-optic multiplexers. I'm not expected to be a techie, but can anyone recommend some books on networking (not necessarily just optical) that would give me a good foundation. My current networking knowledge is minimal.

For a basic introduction to the internet and the full-stack of technologies... Take a look at Stevens' TCP/IP Illustrated, Vol I or Doug Comer's Internetworking with TCP/IP, Vol I. You should be able to find either one in your public library...

For cabling and network technician's books:

I've heard about these TCP/IP books which apparently seem to focus on TCP/IP in UNIX

TCP-Illustrated-Vol.1
TCP-Illustrated-Vol.2

Apart from the code introduced in these books, are there any differences from the TCP/IP implementation on Windows?

If yes, can you suggest some other TCP/IP books for the Windows platform?

Network Programming for Microsoft Windows is THE book. It's quite old, but that doesn't matter. It's not a simple subject, the book can be a bit dense and it's not a hand holding 'for dummies' book, but I've yet to find a better book on the subject.

Of course, you'll probably also want to keep the two Stevens books that you link to on hand, as they're great for the platform-independent material.

As for differences, well, Windows more or less implements the BSD socket API but also provides alternative APIs which are often more appropriate for the Windows platform. Things like overlapped I/O and IO Completion Ports are a much more sensible route to take on Windows if you are looking for scalability and server side coding. The BSD API is probably fine for simple servers and single threaded clients. If you need some example code for IO Completion Port based designs then I have some here.

Has anyone got experience implementing a web server? I have the following questions:

Q1 - What major problems could come up when designing and implementing a web server?

Q2 - What major technologies could be used to solve the problems in Q1?

Q3 - Are there any books related to this area? I know Apache is open source; is there any book addressing it?

This could be a big problem. Any comments will be deeply appreciated, be it general or detailed.

Many thanks.

I worked on a simple one written in C for a university course. Our version implemented HTTP version 0.9, which is much simpler than 1.0 or 1.1.

We started by reading the specs (here you find the RFC for HTTP/1.1). We had this book as a reference for the course. It's a very good read! There you can find in detail how TCP and IP work. It builds the basis for network programming. Another good reference book is "Unix Network Programming" (same author) or, if you already have some background, you might take a look at Beej's Guide to Network Programming.

The experience was very enlightening for me: how a server works, how to read specs and, in general, Unix programming. My suggestions: if you want to try implementing one, start with a small subset of the specs and use a high-level programming language.

As others said, there's probably no need for yet another web server, but it's a good learning exercise.

If I open a browser and send a request to http://255.255.255.255, is it possible to have a web server on the same subnet listen and respond?

HTTP servers use the TCP protocol, and broadcast packets can only be sent through the stateless UDP protocol.

Quoting W. Richard Stevens from his classic book TCP/IP Illustrated (Chapter 12):

Broadcasting and multicasting only apply to UDP, where it makes sense for an application to send a single message to multiple recipients. TCP is a connection-oriented protocol that implies a connection between two hosts (specified by IP addresses) and one process on each host (specified by port numbers).

So, I guess the answer is no.
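For contrast, this is roughly what broadcasting looks like with UDP, as a sketch in Python. The function names and the port number are made up for illustration; SO_BROADCAST is the standard socket option that has to be enabled before the stack will accept a send to the broadcast address.

```python
import socket

def make_broadcast_socket():
    """Create a UDP socket that is allowed to send to a broadcast address."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Without SO_BROADCAST, sending to 255.255.255.255 is refused by the stack.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    return sock

def broadcast(sock, payload, port):
    """Send one datagram to every host on the local subnet (limited broadcast)."""
    return sock.sendto(payload, ("255.255.255.255", port))

# Usage (9999 is a hypothetical port; whether the send succeeds depends on the
# local network configuration):
#   sock = make_broadcast_socket()
#   broadcast(sock, b"anyone there?", 9999)
```

There is no TCP equivalent, because connect() needs exactly one peer to complete the handshake with.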

I have several programs listening to the same multicast stream. Will this double the traffic compared with only one program listening, or is the traffic/bandwidth usage the same? Thanks!

The short answer is no, the amount of traffic is the same. I'll caveat that with "in most cases". Multicast packets are written to the wire using a MAC address constructed from the multicast group address. Joining a multicast group is essentially telling the NIC to listen to the appropriate MAC address. This makes each listener receive the same ethernet frame. The caveat has to do with how multicast routing may or may not work. If you have a multicast aware router then multicast traffic may traverse the router onto other networks if someone has joined the group on another subnet.
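To illustrate how the MAC address is constructed from the group address: for IPv4 over Ethernet, the low 23 bits of the group address are copied into the fixed prefix 01:00:5e. A small sketch in Python (the function name multicast_mac is made up for illustration):

```python
import ipaddress

def multicast_mac(group):
    """Map an IPv4 multicast group address to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF          # keep only the low 23 bits of the group
    mac = (0x01005E << 24) | low23        # fixed 01:00:5e prefix, then those 23 bits
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

print(multicast_mac("224.0.0.1"))    # 01:00:5e:00:00:01
print(multicast_mac("239.255.0.1"))  # 01:00:5e:7f:00:01
```

Since only 23 of the 28 group bits survive, several groups share one MAC address (224.0.0.1 and 224.128.0.1, for example), so the IP layer still has to filter on the full group address.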

I recommend reading "TCP/IP Illustrated, Volume 1" if you plan on doing a lot of network programming. This is the best way to really understand how all of the protocols fit together.

I'm using Boost asio to send a TCP message. I set the NO_DELAY option because this is a 'real time' control system. I see the PSH flag set in the message using Wireshark. I am happy with the performance and it is working as expected.

For interest, I decided to turn the NO_DELAY off and measure the performance difference.

I swapped my existing code:

m_tcpSocket.open(boost::asio::ip::tcp::v4());
boost::asio::ip::tcp::no_delay noDelayOption(true);
m_tcpSocket.set_option(noDelayOption);

// snip create endpoint
m_tcpSocket.connect(m_tcpServerEndpoint);

// snip build message
m_tcpSocket.send(boost::asio::buffer(pDataBuffer, size));

for

boost::asio::ip::tcp::no_delay noDelayOption(false);
m_tcpSocket.set_option(noDelayOption);

and I still see the PSH flag set.

I also tried removing the set_option code and still see it set.

In Wireshark I see:

104 - 105  SYN
105 - 104  SYN, ACK
104 - 105  ACK
104 - 105  PSH, ACK + my message
105 - 104  ACK

where 104 and 105 are IP addresses of my 2 PCs. I am also surprised that the message with my data has an ACK.

How do I turn NO_DELAY off?

Your code looks as though it is properly setting TCP_NODELAY on or off. To set TCP_NODELAY off, use:

socket.set_option(boost::asio::ip::tcp::no_delay(false));

The TCP RFC defines PSH as the push function. In short, it is a flag that tells the receiver that the sender has no more data queued, so the receiver should forward the data up the protocol stack. Boost.Asio maps its API to BSD sockets, and BSD sockets do not provide a way to control the PSH flag. It is usually set by the kernel within the protocol stack when it empties its send buffer, which is why it shows up regardless of the TCP_NODELAY setting.

From TCP/IP Illustrated:

This flag is conventionally used to indicate that the buffer at the side sending the packet has been emptied in conjunction with sending the packet. In other words, when the packet with the PSH bit field set left the sender, the sender had no more data to send.

[...]

Push (the receiver should pass this data to the application as soon as possible—not reliably implemented or used).
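For comparison, here is the same toggle at the BSD socket level, which is what the Boost.Asio no_delay option wraps. This is a sketch in Python and the helper name set_nodelay is made up for illustration. It only shows that TCP_NODELAY can be switched on and off and read back; there is no corresponding option for PSH.

```python
import socket

def set_nodelay(sock, enabled):
    """Turn TCP_NODELAY on or off and return the value the stack reports back."""
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1 if enabled else 0)
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(set_nodelay(sock, True))    # Nagle's algorithm disabled
print(set_nodelay(sock, False))   # Nagle's algorithm enabled again (the default)
sock.close()
```

With a single small message on an otherwise idle connection, Nagle's algorithm has nothing to wait for, so a capture would be expected to look the same with either setting.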

I understand the basics of networking such as LANs and so on. I know what many of the protocols are and how to build a client/server socket program in C. But what I really want is a very good understanding of how networks actually work, not only from a programming aspect but also from an application aspect. I am looking for some material (preferably a book) which will give me a very good foundation to build on. I am undecided between becoming a programmer or a UNIX admin, so I really should learn and know how to apply networking fundamentals.

Does any such concise resource exist? Would it be better to go the more academic route by buying a networking book (such as those from Tanenbaum or Kurose), or is it better to go the IT route, possibly looking into network admin texts or certification books?

Thank you all so much.

For fundamentals, you may want to get the W. Richard Stevens Classic, TCP/IP Illustrated, and possibly his other books as well. There will not be any more of them, either.

I'm having a mental block for the words describing data flow in a communications protocol + google isn't helping, due to information glut.

In the following scenarios A and B are communicating to each other.

  • command or request: a packet of data going from A to B indicating that B should take some kind of action
  • response: a packet of data going from B to A in response to a particular packet that A has previously sent to B.
  • acknowledge or ACK: a specific kind of response that just indicates Yes I got that packet of data. (negative acknowledge or NAK indicates No there was some problem receiving data)
  • {X}: unsolicited information either from A to B, or B to A, which is neither a response, nor a request for the recipient to take action. Examples: datalogging packets, notification packets, etc.

I can't think of what to call {X}, I'm having a brain cramp.

Also are there other common words in this category? Where would you look them up?

I would follow the terminology in Stevens' TCP/IP Illustrated.

So you have request, response and acknowledgement; push and poll are also used, if I remember correctly.

Is there any quick guide to understanding the basic concepts of computer networking, like the layers of TCP/IP networking, and how to use them in a programming language like C? I am not talking about books but about tutorials available on the net.

If you really want a good insight into TCP/IP, then unfortunately I need to point you at this book:

"TCP/IP Illustrated, Vol. 1: The Protocols" by W. Richard Stevens

So I started to learn the TCP/IP protocol stack, but in all the sources I found, TCP is described too vaguely. The main thing I want to know is how TCP is actually implemented in software: how application protocols communicate with TCP, and what the interface for this communication is. For now I think that application protocols are implemented directly in the application program: say, a browser implements HTTP, and that protocol communicates with a centralized implementation of TCP in the OS. Is that correct? I lack sources to learn this from. Please recommend something to read.

Note: while your question is leaning towards being broad, I am answering it since I think that it is a good introductory question.

TCP is a layer-4 (or transport-layer) protocol. Network applications sit on top of it (and other layer-4 protocols like UDP). Applications can interface with layer-4 protocols via a socket interface (http://linux.die.net/man/7/socket). HTTP is likewise an application protocol that runs on top of TCP and uses the socket interface. Besides HTTP, there are many other well-known applications that run on top of TCP, like Telnet, BGP, etc.
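As a rough illustration of that layering, the sketch below (Python; the helper names serve_once, http_get and demo are made up, and the "server" is a stand-in that returns a fixed reply) implements just enough of HTTP in the application and hands the bytes to the OS's TCP implementation through the socket interface.

```python
import socket
import threading

def serve_once(listener):
    """Stand-in server: accept one connection and send a fixed HTTP/1.0 reply."""
    conn, _ = listener.accept()
    with conn:
        conn.recv(4096)  # the request; assumed to fit in one recv() in this sketch
        conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 2\r\n\r\nhi")

def http_get(host, port, path="/"):
    """The application builds the HTTP text; TCP (in the OS) only moves the bytes."""
    with socket.create_connection((host, port)) as sock:
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:          # server closed the connection: reply is complete
                break
            chunks.append(data)
    return b"".join(chunks)

def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
    listener.listen(1)
    worker = threading.Thread(target=serve_once, args=(listener,))
    worker.start()
    try:
        host, port = listener.getsockname()
        return http_get(host, port)
    finally:
        worker.join()
        listener.close()
```

Everything HTTP-specific (the request line, headers, parsing the reply) lives in the program; sequence numbers, ACKs and retransmission never appear because they stay below the socket interface.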

One of the best books for understanding the basics of TCP and its options is "TCP/IP Illustrated, Vol. 1: The Protocols" by Richard Stevens. It talks about how TCP works and covers the various options. Here is a link: http://www.amazon.com/TCP-Illustrated-Vol-Addison-Wesley-Professional/dp/0201633469

Once you have read that, you probably should read the RFC itself: http://www.ietf.org/rfc/rfc793.txt

For implementation details, you can read the second volume: "TCP/IP Illustrated: The Implementation, Vol. 2". Here is a link: http://www.amazon.com/TCP-IP-Illustrated-Implementation-Vol/dp/020163354X . While this book talks about the BSD implementation, it should still help you understand the basic mechanics of how a TCP implementation works.

I am learning Computer Networking this semester, and I find it quite interesting to learn why the Internet is designed the way it is today. I also enjoy reading the papers referred to in the teaching slides, like End-to-End Arguments in System Design and The Design Philosophy of the DARPA Internet Protocols.

Could you recommend some other interesting papers/articles to me, especially those related to higher layers like the TCP/IP protocols?

Many thanks for your reply.

If you want an introduction to what (not why) the TCP/IP protocols are, a classic book is TCP/IP Illustrated, Volume 1: The Protocols.

The tcpdump log below is copied from a test I was running recently. At the beginning everything went very smoothly. Then the client side finally overwhelmed a router, and a lot of packets [# - 6176] got dropped (I never saw ACKs for them). Then at 6177 a retransmission is triggered because the RTO timer expired.

So here are the questions:

  1. When there is a retransmission, what happens to the sender-side congestion window (snd_cwnd)? The OS is Linux kernel 3.4.42. It is said that snd_cwnd is reduced to 1 when there is a retransmission. If this is the case, why can packets 6179 and 6180 still be sent?
  2. Why did 6179 and 6180 not get ACKed? Instead, an ACK does arrive at 6178, which means packets can get through.

6174    2.881075    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6379071 Ack=1 Win=13824 Len=1358 TSval=4294945643 TSecr=2532115493
6175    2.881094    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6380429 Ack=1 Win=13824 Len=1358 TSval=4294945643 TSecr=2532115493
6176    2.881114    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6381787 Ack=1 Win=13824 Len=1358 TSval=4294945643 TSecr=2532115493
6177    3.227347    10.203.85.190   207.198.102.53  TCP 1426    [TCP Retransmission] 58206 > 80 [ACK] Seq=5887475 Ack=1 Win=13824 Len=1358 TSval=4294945685 TSecr=2532115493
6178    3.323055    207.198.102.53  10.203.85.190   TCP 68  http > 58206 [ACK] Seq=1 Ack=5888833 Win=980480 Len=0 TSval=2532115623 TSecr=4294945685
6179    3.326368    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6383145 Ack=1 Win=13824 Len=1358 TSval=4294945694 TSecr=2532115623
6180    3.326454    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6384503 Ack=1 Win=13824 Len=1358 TSval=4294945694 TSecr=2532115623
6181    3.727429    10.203.85.190   207.198.102.53  TCP 1426    [TCP Retransmission] 58206 > 80 [ACK] Seq=5888833 Ack=1 Win=13824 Len=1358 TSval=4294945735 TSecr=2532115623
6182    3.813101    207.198.102.53  10.203.85.190   TCP 68  80 > 58206 [ACK] Seq=1 Ack=5890191 Win=980480 Len=0 TSval=2532115746 TSecr=4294945735
6183    3.813606    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6385861 Ack=1 Win=13824 Len=1358 TSval=4294945743 TSecr=2532115746
6184    3.813822    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6387219 Ack=1 Win=13824 Len=1358 TSval=4294945743 TSecr=2532115746
6185    4.197341    10.203.85.190   207.198.102.53  TCP 1426    [TCP Retransmission] 58206 > 80 [ACK] Seq=5890191 Ack=1 Win=13824 Len=1358 TSval=4294945782 TSecr=2532115746
6186    4.294162    207.198.102.53  10.203.85.190   TCP 68  80 > 58206 [ACK] Seq=1 Ack=5891549 Win=980480 Len=0 TSval=2532115866 TSecr=4294945782
6187    4.297450    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6388577 Ack=1 Win=13824 Len=1358 TSval=4294945792 TSecr=2532115866
6188    4.297675    10.203.85.190   207.198.102.53  TCP 1426    58206 > 80 [ACK] Seq=6389935 Ack=1 Win=13824 Len=1358 TSval=4294945792 TSecr=2532115866

When you send a TCP packet, a retransmission timer is started (for each packet); if the ACK has not shown up when the timer expires, the packet is retransmitted. This happens multiple times (the count is OS-specific and configurable), and if all of the tries are unsuccessful the connection fails. For more information about the TCP/IP implementation in Linux, I highly recommend you refer to:

Understanding Linux Network Internals

For more information about TCP refer to:

TCP/IP Illustrated
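It may also help to check the sequence arithmetic in the trace from the question. The sketch below (Python) uses only numbers copied from that tcpdump output; the function names are made up for illustration. It shows that each ACK is cumulative and covers exactly the one retransmitted segment, and says nothing by itself about how Linux adjusts snd_cwnd.

```python
SEGMENT_LEN = 1358  # Len= of every data segment in the trace

# Packet number -> Seq of the retransmitted segment / Ack number of the reply.
retransmissions = {6177: 5887475, 6181: 5888833, 6185: 5890191}
acks = {6178: 5888833, 6182: 5890191, 6186: 5891549}

def next_expected(seq, length=SEGMENT_LEN):
    """A cumulative ACK carries the sequence number of the next byte expected."""
    return seq + length

def outstanding_segments(highest_sent, ack, length=SEGMENT_LEN):
    """Full-size segments that were sent but are not yet covered by the ACK."""
    return (highest_sent - ack) // length

# Each ACK acknowledges exactly one retransmitted segment and nothing beyond it.
for (rtx_no, seq), (ack_no, ack) in zip(retransmissions.items(), acks.items()):
    print(rtx_no, "->", ack_no, next_expected(seq) == ack)

# Packet 6179 starts at Seq=6383145, so when ACK 5888833 arrives this many
# segments are still unacknowledged:
print(outstanding_segments(6383145, 5888833))  # 364
```

So the ACK in 6178 only confirms bytes up to 5888833; it cannot acknowledge 6179 or 6180 while the receiver is still missing the data in between, which fits with the next retransmission starting at exactly 5888833.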

I'm stuck with some tasks related to TCP sockets under Windows OS so I need to know the mechanism of how Windows handles TCP packets in & out. Please correct me if my understanding below is wrong:

Using WinSock when a TCP client wants to establish TCP connection to certain IP:port, winsock.connect(dest_IP,dest_Port) is called. Then

1) The WinSock library on the client will create and send a TCP SYN packet to the destination address.

2) When the client receives other peer's reply with SYN-ACK packet, the WinSock object fires an event called "on_connect" for the client application to handle from thereon.

3) The last ACK packet of the protocol is somehow sent to finish the 3-way-handshake (by the WinSock library or by the OS itself - I don't know). Q1: Who sends it?

I wonder what happens under the hood when I craft a raw SYN-packet using winPCap and send it to the peer. If the dest-IP replies with a SYN-ACK packet then:

Q2: How does the OS (windows) handle that SYN-ACK packet without a relevant winsock object bound to it? Will it automatically follow the 3way-handshake to form a TCP connection or simply drop the packet?

Q3: Can I somehow use winPCap (under admin privilege) to prevent the certain packet from being sent by Windows?

TCP is a connection-oriented protocol, i.e., (a) a communication session is established between the peers; and (b) the data is sent as a stream, which means it is delivered to the application in the same order as it was sent. Because of the connection-oriented property, TCP is a stateful protocol. Each peer maintains the state of the TCP connection based on the messages sent and received. For easier understanding, let us call the peer which listens the server and the peer that connects the client.

On the implementation level, the data structure used to maintain the state is Transmission Control Block (TCB). As soon as a TCP socket is opened, a TCB is created to hold its state. Further, accept creates a new TCB for the actual data socket on the server. Below is a state diagram of TCP which describes how states are changed in the TCB. With this one, it becomes clear what happens when unsolicited control packets are sent.

TCP State diagram

Figure: Tcp state diagram. Licensed under CC BY-SA 3.0 via Commons - Src: wiki

First, your understanding of the 3-way handshake is right. This is easily verifiable in the state diagram that both TCBs end up on the ESTABLISHED state. Coming to Q1, since a solicited connection was created, the client TCB would be waiting for a SYN+ACK. Once received, the network stack (either on Linux or Windows) automatically responds with ACK. For practical purposes here, you may consider the network stack is part of the OS. Also note that there is hardly any use to give control for the user application to respond for SYN+ACK. Therefore, it is handled by the stack.
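One way to see that the stack, not the application, completes the handshake is the small sketch below (Python, loopback only; the function name is made up for illustration). The server side never calls accept(), yet the client's connect() still returns successfully, because the kernel has already exchanged SYN, SYN+ACK and ACK on the listening socket's behalf.

```python
import socket

def connect_without_accept():
    """Connect to a listening socket whose owner never calls accept()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))   # port 0: let the OS choose a free port
    listener.listen(1)                # the stack now answers SYNs for this port
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.settimeout(5.0)
    try:
        # Returns once the three-way handshake is done; no application code
        # on the listening side has run for this connection.
        client.connect(listener.getsockname())
        # getpeername() only succeeds on a connected socket.
        return client.getpeername() == listener.getsockname()
    finally:
        client.close()
        listener.close()

print(connect_without_accept())
```

The connection simply sits in the listener's backlog until the application gets around to calling accept().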

To answer Q2, let us again trace the state diagram. Although you have asked about WinSock specifically, the same operations happen on any implementation of TCP for an unsolicited SYN. The server TCB assumes the SYN was legitimate and therefore responds with SYN+ACK. However, there are two possible cases on the client here: (a) there is no TCB for that IP:port on the client, and (b) the connection is being torn down.

In case (a), notice that the client can create a TCB, but it will be in the CLOSED state. Receiving a SYN+ACK in this state is an unusual event, and the response is to send an RST (reset) message to actively break the connection. In some implementations no message is sent, leading to a timeout: the SYN+ACK just times out on the server TCB and its state returns to LISTEN. In fact, a timeout also happens when a client is in the ESTABLISHED state but suddenly receives an unsolicited SYN+ACK! More such cases are dealt with in this article; I have pasted the relevant one below.

Unsolicited ACK

Case (b) can be generalized to say that the client TCB is in one of the CLOSE states. If a SYN+ACK is received when the client is in FIN_WAIT 1, FIN_WAIT 2 or CLOSE_WAIT, the client responds with a FIN (for FIN_WAIT 1, FIN_WAIT 2) or a FIN/ACK (for CLOSE_WAIT). If it is in any other state, the client does not respond and times out.

Answering Q3: it is possible to filter out packets of connections. Typically this is done with a firewall or a program like ipfilter in Linux, where a rule can be written to allow or disallow packets. In Windows 7, you can either set filter lists or develop an application using the Windows Filtering Platform APIs. You asked whether you can remove only a certain packet; I must say I have not tried that, but from MSDN it looks like you can.

Other useful references:

http://tangentsoft.net/wskfaq/articles/debugging-tcp.html

TCP/IP Illustrated by Stevens, http://tangentsoft.net/wskfaq/reviews/tcpillus.html